Commit d759be9f authored by Sebastian Schüpbach

Update README.md, helm-charts/templates/app-config.yaml, helm-charts/values.yaml, helm-charts/helm-values/di-media-converter-prod.yaml, helm-charts/helm-values/di-media-converter-stage.yaml, helm-charts/helm-values/di-media-converter-test.yaml files
parent 894ce734
Pipeline #29391 passed with stages in 8 minutes and 18 seconds

# Media Converter
The __Media Converter__ is an intermediary service that takes care of media file post-processing
upon receiving notifications about changes in the Fedora repository. More precisely, the service
1. extracts the path to the media file from the Fedora API based on the received ID,
2. transcodes the file if required (e.g. jpg -> jp2),
3. copies the file to a designated directory accessible by the media server or the Cantaloupe IIIF image server, and
4. updates the Mediaserver database with the path to the copied file.
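Step 3 above boils down to copying a file into a directory that the media providers can read. The following is a minimal sketch in Python, not the service's own code: the helper name `copy_to_media_folder` and the `<media root>/<record ID>/<file name>` layout are hypothetical, and the media root would correspond to the mounted folder configured via `MEDIA_FOLDER_ROOT_PATH`.

```python
import shutil
import tempfile
from pathlib import Path


def copy_to_media_folder(source: Path, media_root: Path, record_id: str) -> Path:
    """Copy `source` into a per-record directory below `media_root`.

    Returns the path of the copy. The `<media_root>/<record_id>/<file name>`
    layout is a hypothetical convention used for illustration only.
    """
    # Hypothetical: replace slashes so an ID such as "coll/rec-1" maps to a
    # single directory level below the media root.
    safe_id = record_id.replace("/", "_")
    target_dir = media_root / safe_id
    target_dir.mkdir(parents=True, exist_ok=True)  # create the directory if missing
    target = target_dir / source.name
    shutil.copy2(source, target)  # copy file content plus metadata such as timestamps
    return target


if __name__ == "__main__":
    # Demo in a throw-away directory; in the service the media root would
    # correspond to MEDIA_FOLDER_ROOT_PATH.
    with tempfile.TemporaryDirectory() as tmp:
        source = Path(tmp) / "scan.jpg"
        source.write_bytes(b"\xff\xd8")  # placeholder bytes, not a real image
        print(copy_to_media_folder(source, Path(tmp) / "media", "coll/rec-1"))
```

Step 4 (updating the Mediaserver database with the returned path) is deliberately left out, since its schema is not described here.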
## Components
The __Media Converter__ is responsible for preparing media files for consumption by end users. This comprises:
* Copying files from the source folder (the sFTP directory) to a dedicated media directory directly accessible by media file providers such as the [media server](https://gitlab.switch.ch/memoriav/memobase-2020/services/streaming-server) or the [IIIF image server](https://gitlab.switch.ch/memoriav/memobase-2020/services/cantaloupe-docker). Besides the distribution copies, these files can also be preview images for videos ("thumbnails")
* For audio files, repackaging the content in an MPEG-4 container
* Creating small "snippets" from audio files, which the frontend in turn uses to create sonograms
## Copying
The service gets the needed media files via the [Media File Distributor](https://gitlab.switch.ch/memoriav/memobase-2020/services/import-process/media-distributor-service), which in turn reads directly from the collections directory on the sFTP server. The fetched files are written to the respective media file directory. In the Memobase workflow, these directories are mounted directly in the service containers that need them.

The service is packaged as a Docker image. It comprises the following components:
- The main application, responsible for communicating with [Kafka](https://kafka.apache.org),
[Fedora](https://duraspace.org/fedora/) and the
[Mediaserver](https://gitlab.switch.ch/memoriav/memobase-2020/services/streaming-server) DB as well as handling the
media files
- `ffmpeg` for transcoding video and audio
- ImageMagick for transcoding images

For a comprehensive list, consult the [Dockerfile](./Dockerfile).
## Conversions
* Audio files: Repackages files in an MPEG-4 container using `ffmpeg` and moves the moov atom to the beginning of the file (`-movflags faststart`)
* Image files: Copies files as-is
* Video files: Copies files as-is
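The conversion rules above can be sketched as a small dispatch that returns an `ffmpeg` argument list for audio files and nothing for images and videos. This is a minimal sketch under stated assumptions, not the service's implementation: the extension sets and the `conversion_command` helper are hypothetical, selecting only audio streams (`-map 0:a`) is an assumption, and whether a plain remux (`-c:a copy`) works depends on the source codec; some codecs may have to be re-encoded, e.g. with `audio_codec="aac"`.

```python
from pathlib import Path
from typing import List, Optional

# Hypothetical extension sets; the service's actual file-type detection may differ.
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac", ".m4a"}
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".jp2"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".mkv", ".webm", ".avi"}


def conversion_command(source: Path, target_dir: Path,
                       audio_codec: str = "copy") -> Optional[List[str]]:
    """Return ffmpeg arguments for audio files, or None for files copied as-is."""
    suffix = source.suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        target = target_dir / (source.stem + ".mp4")
        return [
            "ffmpeg",
            "-y",  # overwrite an existing target file
            "-i", str(source),
            "-map", "0:a",  # assumption: keep only the audio streams of the input
            "-c:a", audio_codec,  # "copy" remuxes the audio without re-encoding
            "-movflags", "faststart",  # move the moov atom to the beginning of the file
            str(target),
        ]
    if suffix in IMAGE_EXTENSIONS or suffix in VIDEO_EXTENSIONS:
        return None  # images and videos are copied unchanged
    raise ValueError(f"unsupported media file: {source.name}")
```

Passing the returned list to `subprocess.run(cmd, check=True)` would invoke a locally installed `ffmpeg`; returning the arguments instead of running them keeps the decision logic testable without media files.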
## Creating snippets
To provide content for sonograms and a teaser on the frontend, the service produces small snippets containing the first `AUDIO_SNIPPET_DURATION` seconds of each audio file. Keeping the snippet short helps avoid a sonogram that is just a solid black bar, which is what happens when the sonogram of a lengthy audio track is compressed to a width that fits the icon size used in the frontend.
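The reasoning about snippet length can be checked with a little arithmetic, and the cut itself maps onto `ffmpeg`'s `-t` output option. In this sketch, `snippet_command` and `seconds_per_pixel` are hypothetical helpers, the 100-pixel icon width is an assumed figure, and stream copy (`-c copy`) is an assumption; the service may re-encode the snippet instead.

```python
from pathlib import Path
from typing import List


def snippet_command(source: Path, target: Path, duration_s: int) -> List[str]:
    """Build ffmpeg arguments that keep only the first `duration_s` seconds of `source`."""
    if duration_s <= 0:
        raise ValueError("snippet duration must be positive")
    return [
        "ffmpeg",
        "-y",
        "-i", str(source),
        "-t", str(duration_s),  # as an output option, -t stops writing after this many seconds
        "-c", "copy",  # assumption: the snippet keeps the source codec
        str(target),
    ]


def seconds_per_pixel(track_seconds: float, width_px: int) -> float:
    """Seconds of audio that each pixel column of a sonogram must summarise."""
    return track_seconds / width_px


if __name__ == "__main__":
    print(snippet_command(Path("track.mp4"), Path("track-snippet.mp4"), 30))
    # Hypothetical 100-px icon: a one-hour track vs. a 30-second snippet.
    print(seconds_per_pixel(3600, 100), seconds_per_pixel(30, 100))
```

With a one-hour track and a 100-pixel icon, each pixel column has to summarise 36 seconds of audio, whereas a 30-second snippet needs only 0.3 seconds per column, which is why short snippets keep the sonogram's structure visible.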
## Configuration
To configure the application, you can either modify the commented
[example configuration file](./src/main/resources/app.yml) and deploy it as a configuration in your cluster, or set the
respective environment variables. For a comprehensive list of possible Kafka producer settings, see
[`KafkaSettings.kt`](https://gitlab.switch.ch/memoriav/memobase-2020/libraries/service-utilities/-/blob/master/src/main/kotlin/settings/KafkaSettings.kt#L124) in the service-utilities library.
The service requires the following environment variables:
* `KAFKA_BOOTSTRAP_SERVERS`: Comma-separated list of Kafka bootstrap server addresses
* `TOPIC_IN`: Kafka topic from which incoming messages are read
* `TOPIC_PROCESS`: Kafka topic to which status reports are written
* `CLIENT_ID`: Kafka client ID
* `GROUP_ID`: Kafka consumer group ID
* `AUDIO_SNIPPET_DURATION`: Number of seconds taken from the beginning of an audio track to produce the snippet
* `EXTERNAL_BASE_URL`: Base URL under which the resources are available
* `MEDIA_FOLDER_ROOT_PATH`: Path to the mounted media folder (i.e. the folder the media files are copied to)
* `DISTRIBUTOR_URL`: Address of the respective Media File Distributor instance
* `CONNECTION_RETRY_AFTER_MS`: Delay in milliseconds before reconnecting to the Media File Distributor
* `CONNECTION_MAX_RETRIES`: Maximum number of connection retries
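The variables listed above can be read and validated once at startup, and the two connection settings combined into a simple retry loop around calls to the Media File Distributor. This is a minimal Python sketch, not the service's own implementation: `Settings`, `load_settings` and `with_retries` are hypothetical names, and treating `CONNECTION_MAX_RETRIES` as the number of attempts *after* the first one is an assumption.

```python
import os
import time
from dataclasses import dataclass
from typing import Callable, List, Mapping, TypeVar

T = TypeVar("T")

REQUIRED_VARIABLES = [
    "KAFKA_BOOTSTRAP_SERVERS", "TOPIC_IN", "TOPIC_PROCESS", "CLIENT_ID", "GROUP_ID",
    "AUDIO_SNIPPET_DURATION", "EXTERNAL_BASE_URL", "MEDIA_FOLDER_ROOT_PATH",
    "DISTRIBUTOR_URL", "CONNECTION_RETRY_AFTER_MS", "CONNECTION_MAX_RETRIES",
]


@dataclass(frozen=True)
class Settings:
    bootstrap_servers: List[str]
    topic_in: str
    topic_process: str
    client_id: str
    group_id: str
    audio_snippet_duration: int
    external_base_url: str
    media_folder_root_path: str
    distributor_url: str
    retry_after_ms: int
    max_retries: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read all required variables, failing fast with a list of missing names."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return Settings(
        # Split the comma-separated list and drop surrounding whitespace.
        bootstrap_servers=[s.strip() for s in env["KAFKA_BOOTSTRAP_SERVERS"].split(",") if s.strip()],
        topic_in=env["TOPIC_IN"],
        topic_process=env["TOPIC_PROCESS"],
        client_id=env["CLIENT_ID"],
        group_id=env["GROUP_ID"],
        audio_snippet_duration=int(env["AUDIO_SNIPPET_DURATION"]),
        external_base_url=env["EXTERNAL_BASE_URL"],
        media_folder_root_path=env["MEDIA_FOLDER_ROOT_PATH"],
        distributor_url=env["DISTRIBUTOR_URL"],
        retry_after_ms=int(env["CONNECTION_RETRY_AFTER_MS"]),
        max_retries=int(env["CONNECTION_MAX_RETRIES"]),
    )


def with_retries(action: Callable[[], T], settings: Settings,
                 sleep: Callable[[float], None] = time.sleep) -> T:
    """Run `action`; on ConnectionError wait and retry up to `max_retries` more times."""
    if settings.max_retries < 0:
        raise ValueError("CONNECTION_MAX_RETRIES must not be negative")
    attempt = 0
    while True:
        try:
            return action()
        except ConnectionError:
            if attempt >= settings.max_retries:
                raise  # retries exhausted: propagate the last error
            attempt += 1
            sleep(settings.retry_after_ms / 1000)  # convert milliseconds to seconds
```

The `sleep` parameter is injectable so the retry behaviour can be exercised without actually waiting; failing fast in `load_settings` surfaces a misconfigured ConfigMap at startup rather than on the first message.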
`helm-charts/helm-values/di-media-converter-prod.yaml`
@@ -8,7 +8,6 @@ k8sLimitsMemory: "11Gi"
kafkaConfigs: prod-kafka-bootstrap-servers
inputTopicName: mb-di-processed-records-prod
reportingTopicName: mb-di-reporting-prod
applicationId: prod-iiif-manifest-creator
groupId: prod-media-converter
clientId: prod-media-converter-client
`helm-charts/helm-values/di-media-converter-stage.yaml`
@@ -8,7 +8,6 @@ k8sLimitsMemory: "11Gi"
kafkaConfigs: prod-kafka-bootstrap-servers
inputTopicName: mb-di-processed-records-stage
reportingTopicName: mb-di-reporting-stage
applicationId: stage-iiif-manifest-creator
groupId: stage-media-converter
clientId: stage-media-converter-client
`helm-charts/helm-values/di-media-converter-test.yaml`
@@ -8,7 +8,6 @@ k8sLimitsMemory: "4Gi"
kafkaConfigs: test-kafka-bootstrap-servers
inputTopicName: mb-di-processed-records-prod
reportingTopicName: mb-di-reporting-prod
applicationId: test-iiif-manifest-creator
groupId: test-media-converter
clientId: test-media-converter-client
`helm-charts/templates/app-config.yaml`
@@ -4,7 +4,6 @@ metadata:
name: "{{ .Values.k8sGroupId }}-{{ .Values.k8sName }}-{{ .Values.k8sEnvironment}}-config"
namespace: "{{ .Values.k8sNamespace }}"
data:
APPLICATION_ID: "{{ .Values.applicationId }}"
TOPIC_IN: "{{ .Values.inputTopicName }}"
TOPIC_PROCESS: "{{ .Values.reportingTopicName }}"
CLIENT_ID: "{{ .Values.clientId }}"
@@ -14,4 +13,4 @@ data:
MEDIA_FOLDER_ROOT_PATH: "{{ .Values.mediaFolderRootPath }}"
DISTRIBUTOR_URL: "{{ .Values.distributorUrl }}"
CONNECTION_RETRY_AFTER_MS: "{{ .Values.connectionRetryAfterMs }}"
CONNECTION_MAX_RETRIES: "{{ .Values.connectionMaxRetries }}"
`helm-charts/values.yaml`
@@ -16,7 +16,6 @@ k8sLimitsMemory: placeholder
kafkaConfigs: placeholder
inputTopicName: placeholder
reportingTopicName: placeholder
applicationId: placeholder
groupId: placeholder
clientId: placeholder
@@ -29,4 +28,4 @@ mediaFolderRootPath: placeholder
mediaVolumeClaimName: placeholder
connectionRetryAfterMs: placeholder
connectionMaxRetries: placeholder