Unverified Commit 9e8efbe7 authored by Sebastian Schüpbach
update README

parent 7c0ce26e
Pipeline #18551 passed with stages
in 5 minutes and 53 seconds
# Media Metadata Preprocessor
The Media Metadata Preprocessor has two tasks.
This service “preprocesses” the data for media metadata enrichment in two ways:
First, it acts as a filter for
separating records which can't be enriched from those which can. This helps to
accelerate the media metadata enrichment by not "congesting" the enrichment
pipeline with unsuitable records.
Second, it adds the `ebucore:isDistributedOn` property. It can have the
values `audio`, `image` and `video` for locally held or at least remotely
accessible media files and `youtube`, `vimeo`, `srfaudio`, `srfvideo` and
eventually `zem` for media files which are delivered by an external player
(and therefore are not directly accessible for further analysing).
- It defines the `ebucore:isDistributedOn` property, which indicates the player through which the media file is delivered. It can have the following values:
  - `srfaudio`: Audio files via SRF Play
  - `srfvideo`: Video files via SRF Play
  - `vimeo`: Vimeo player
  - `youtube`: YouTube player
  - `zem`: ZEM player
  - `audio`: Local or remote audio files via Memobase player
  - `image`: Local or remote image files via Memobase player
  - `video`: Local or remote video files via Memobase player
- It decides how much information can be extracted from the referenced media file. The following variants are covered (in decreasing order of priority):
  - The referenced media file is accessible (i.e. it can be downloaded directly for analysis) and is either video or audio: the data is sent to the `import-process-av-enrichment` topic.
  - The data references a poster image, or the referenced media file is itself an accessible image file: the data is sent to the `import-process-image-enrichment` topic.
  - The data contains neither a thumbnail nor any other accessible media file: the data is sent directly to the `import-process-normalization` topic, skipping media metadata enrichment entirely.
The rationale behind the second step is to allow more fine-grained resource control in our Kubernetes cluster and to accelerate the import process. AV processing and analysis is resource- and time-intensive, so it makes sense to give this step more resources, which image analysis does not need. Data that does not require this step, on the other hand, can bypass the potential bottleneck by taking a “faster line” via the image enrichment service, or by going directly to the normalization service.
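
The routing priority described in the second step can be illustrated with a minimal sketch. It is not the service's actual implementation: the `Record` class, its fields (`media_type`, `media_accessible`, `has_poster_image`) and the `select_topic` function are hypothetical names introduced here for illustration. Only the three topic names come from this README.

```python
from dataclasses import dataclass
from typing import Optional

# Topic names as given in the README.
AV_TOPIC = "import-process-av-enrichment"
IMAGE_TOPIC = "import-process-image-enrichment"
NORMALIZATION_TOPIC = "import-process-normalization"


@dataclass
class Record:
    """Hypothetical, simplified view of an incoming record."""
    media_type: Optional[str]   # assumed values: "audio", "video", "image" or None
    media_accessible: bool      # assumed flag: file can be downloaded for analysis
    has_poster_image: bool      # assumed flag: record references a poster image


def select_topic(record: Record) -> str:
    """Pick the output topic, checking the variants in decreasing priority."""
    # 1. Accessible audio or video file: full AV enrichment.
    if record.media_accessible and record.media_type in ("audio", "video"):
        return AV_TOPIC
    # 2. Poster image present, or the media file is itself an accessible image.
    if record.has_poster_image or (
        record.media_accessible and record.media_type == "image"
    ):
        return IMAGE_TOPIC
    # 3. Nothing to analyse: skip media metadata enrichment.
    return NORMALIZATION_TOPIC
```

Because the checks run in order, an accessible video that also has a poster image goes to the AV topic, while a video delivered only through an external player but with a poster image goes to the image topic.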