Unverified Commit 833da6b0 authored by Sebastian Schüpbach

update manifest and README


Signed-off-by: Sebastian Schüpbach <sebastian.schuepbach@unibas.ch>
parent d53895cb
Pipeline #22377 passed with stages in 5 minutes and 42 seconds
Dockerfile:
 FROM openjdk:8-jre-slim-buster
 ADD target/scala-2.12/app.jar /app/app.jar
-ENTRYPOINT ["java", "-XX:+HeapDumpOnOutOfMemoryError", "-XX:HeapDumpPath=/data/heapdump-%p.hprof", "-jar", "/app/app.jar"]
+ENTRYPOINT ["java", "-jar", "/app/app.jar"]
# Import Process Delete
Creates deletion messages for Fedora Ingester based on timestamp, session id,
collection id, institution id, or record id.
This service runs through the reports of a specified service (normally
`import-process-bridge`) and assembles distinct ids of all non-fatal messages
matching certain criteria. It subsequently sends a message for each found id.
These messages are meant to be consumed by services which run deletion
processes for certain endpoints (e.g. see [`record-deleter`](https://gitlab.switch.ch/memoriav/memobase/services/postprocessing/record-deleter)).
## Usage
There are a number of filters you can apply to the parsed reports:
- time-based (`--created-before`, `--created-after`)
- session id-based (`--session-filter`)
- record id-based (`--record-filter`)
- record set id-based (`--record-set-filter`)
- institution id-based (`--institution-filter`)
Furthermore, you have to provide a `session-id` (this is used by subsequent
services to distinguish between deletion batches), and you can optionally
request a dry run (`--dry-run`). Beware that the dry-run flag can be ignored by
services downstream if they choose to discard the respective message header.
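A minimal sketch of an invocation, combining two filters with a dry run and the mandatory trailing session id; all values, including the datetime format, are placeholders:

```sh
# Hypothetical dry run: restrict deletions to one record set ingested before a given date.
java -jar app.jar \
  --record-set-filter "<record-set-id>" \
  --created-before "<datetime>" \
  --dry-run \
  "<your-session-id>"
```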
Finally, there are a couple of more or less static settings which are set via
environment variables:
- `KAFKA_BOOTSTRAP_SERVERS`: a comma-separated list of Kafka brokers
- `CLIENT_ID`: name of the client. This is only used as a prefix, since a
random id is generated every time the service starts
- `TOPIC_IN`: name of input topic
- `TOPIC_OUT`: name of output topic
- `POLL_TIMEOUT`: polling timeout in milliseconds, i.e. the time the Kafka
consumer waits for new messages before sending an empty batch, which is the
signal for the service to terminate. **Set this value with care**: too low a
value could lead to "message loss", while too high a value means higher
latency.
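As a minimal sketch, assuming a local Kafka broker and placeholder topic names (the real Memobase values live in the deployment manifest below):

```sh
# Hypothetical environment for a local run; all values are placeholders.
export KAFKA_BOOTSTRAP_SERVERS="localhost:9092"
export CLIENT_ID="import-process-delete"   # a random id is appended at start-up
export TOPIC_IN="<report-topic>"
export TOPIC_OUT="<deletion-topic>"
export POLL_TIMEOUT="20000"                # milliseconds
java -jar app.jar --session-filter "<session-id>" "<your-session-id>"
```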
@@ -14,22 +14,18 @@ spec:
     spec:
       containers:
       - name: import-process-administrator-container
-        volumeMounts:
-        - name: media-volume
-          mountPath: /data
-        args: # Customise to match your needs
-        - "--record-set-filter <id>"
-        - "--record-filter <id>"
-        - "--institution-filter <id>"
-        - "--session-filter <id>"
-        - "--created-after <datetime>"
-        - "--created-before <datetime>"
-        - "<your-session-id>"
+        args: [ # Customise to match your needs
+          "--record-set-filter", "<id>",
+          "--record-filter", "<id>",
+          "--institution-filter", "<id>",
+          "--session-filter", "<id>",
+          "--created-after", "<datetime>",
+          "--created-before", "<datetime>",
+          "<your-session-id>"
+        ]
         image: cr.gitlab.switch.ch/memoriav/memobase-2020/utilities/import-process-delete:latest
         imagePullPolicy: Always
         env:
         - name: JOB_ID
           value: import-process-delete
         - name: KAFKA_BOOTSTRAP_SERVERS
           value: mb-ka1.memobase.unibas.ch:9092,mb-ka2.memobase.unibas.ch:9092,mb-ka3.memobase.unibas.ch:9092
         - name: CLIENT_ID
@@ -39,11 +35,7 @@ spec:
         - name: TOPIC_OUT
           value: import-process-test-delete
         - name: POLL_TIMEOUT
-          value: "60000"
+          value: "20000"
       restartPolicy: Never
-      volumes:
-      - name: media-volume
-        persistentVolumeClaim:
-          claimName: media-volume-claim
   backoffLimit: 1
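Assuming the manifest above is saved as `import-process-delete-job.yaml` (file name, job name, and namespace are placeholders, not taken from the repository), the job could be launched and followed like this:

```sh
# Hypothetical deployment commands; file name, job name and namespace are placeholders.
kubectl apply -f import-process-delete-job.yaml -n <namespace>
kubectl logs -f job/import-process-delete -n <namespace>
```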