Commit 67c3e115 authored by Sebastian Schüpbach

update README.md

parent 0c7734db
Pipeline #25854 passed with stages
in 6 minutes and 2 seconds
@@ -17,3 +17,14 @@ This service “preprocesses” the data for media metadata enrichment in two wa
- The data does not contain a thumbnail or other accessible media files: The data is sent directly to the import-process-normalization topic, skipping the media metadata enrichment entirely.
The rationale behind the second step is to allow more fine-grained resource control in our Kubernetes cluster and to accelerate the import process. AV processing and analysis is resource- and time-intensive, so it makes sense to provide more resources to this step (resources which image analysis does not need). Data which does not require this processing step, on the other hand, can circumvent this potential bottleneck by taking a “faster line” via the image enrichment service or by going directly to the normalization service.
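The routing idea described above can be sketched roughly as follows. This is only an illustration in Python, not the service's actual code: the `Route` enum, the `choose_route` function and its two boolean flags are hypothetical names, and both the way the service detects media files and the precedence of AV over image are assumptions.

```python
from enum import Enum


class Route(Enum):
    """Hypothetical labels for the three possible destinations."""
    AV = "av"          # resource- and time-intensive AV enrichment
    IMAGE = "image"    # "faster line" via the image enrichment service
    IGNORE = "ignore"  # nothing to enrich, go straight to normalization


def choose_route(has_av_media: bool, has_image_media: bool) -> Route:
    """Pick a destination for one document message.

    Assumption: AV takes precedence when both kinds of media are present.
    """
    if has_av_media:
        return Route.AV
    if has_image_media:
        return Route.IMAGE
    return Route.IGNORE
```

Keeping the decision in one small function makes it easy to see that only AV data ever reaches the heavily provisioned step.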
## Configuration
In order to work as expected, the service needs the following environment variables to be set:
* `APPLICATION_ID`: Kafka Streams application identifier (see [Kafka documentation](https://kafka.apache.org/documentation/#streamsconfigs_application.id) for details)
* `TOPIC_IN`: Kafka topic where incoming document messages are read from
* `TOPIC_OUT_AV`: Kafka topic where document messages describing an AV resource are written to
* `TOPIC_OUT_IMAGE`: Kafka topic where document messages describing an image resource are written to
* `TOPIC_OUT_IGNORE`: Kafka topic where document messages describing a resource which can't be enriched are written to
* `TOPIC_PROCESS`: Kafka topic where status reports are written to
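
As a minimal sketch of how these variables might be checked at startup: the snippet below is written in Python for illustration only, and `REQUIRED_VARS` and `load_settings` are hypothetical names that do not come from the service itself. It assumes that all six variables are mandatory and that an empty value counts as missing.

```python
import os

# The variable names listed above; treating all of them as required is an assumption.
REQUIRED_VARS = (
    "APPLICATION_ID",
    "TOPIC_IN",
    "TOPIC_OUT_AV",
    "TOPIC_OUT_IMAGE",
    "TOPIC_OUT_IGNORE",
    "TOPIC_PROCESS",
)


def load_settings(env=None):
    """Return the required settings as a dict, or fail listing what is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Failing early with the full list of missing names avoids starting a stream that can read from `TOPIC_IN` but has nowhere to write.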