# Import Process Delete

This service runs through the reports of a specified service (normally
`import-process-bridge`) and collects the distinct ids of all non-fatal messages
matching certain criteria. It then sends one message for each id found.
These messages are meant to be consumed by services which run deletion
processes for certain endpoints (see e.g.
[`record-deleter`](https://gitlab.switch.ch/memoriav/memobase/services/postprocessing/record-deleter)).
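
The collection step can be pictured with the following minimal Python sketch. It is not the service's actual implementation: the report fields `id` and `status`, the `"FATAL"` marker and the shape of the outgoing messages are assumptions made purely for illustration.

```python
from typing import Callable, Iterable, Iterator

# Hypothetical report shape: the keys "id" and "status" are assumptions,
# not the real report schema.
Report = dict


def collect_distinct_ids(
    reports: Iterable[Report],
    matches: Callable[[Report], bool],
) -> list[str]:
    """Return ids of non-fatal, matching reports, each once, in first-seen order."""
    seen: set[str] = set()
    ordered: list[str] = []
    for report in reports:
        if report["status"] == "FATAL":
            continue  # fatal reports are skipped
        if not matches(report):
            continue  # report does not meet the filter criteria
        if report["id"] not in seen:
            seen.add(report["id"])
            ordered.append(report["id"])
    return ordered


def deletion_messages(ids: Iterable[str]) -> Iterator[tuple[str, str]]:
    """Yield one (key, value) pair per id; this payload layout is assumed."""
    for record_id in ids:
        yield record_id, record_id
```

Keeping a set next to the list makes the duplicate check cheap while preserving the order in which ids were first seen.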

There are a number of filters you can apply to the parsed reports:

- time-based (`--created-before`, `--created-after`)
- session id-based (`--session-filter`)
- record id-based (`--record-filter`)
- record set id-based (`--record-set-filter`)
- institution id-based (`--institution-filter`)
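
As a rough illustration of how such filters might combine, here is a Python sketch. It assumes that all filters which are set must match (logical AND), that id filters compare for equality and that the time bounds are exclusive. None of this is confirmed by the service itself, and the report field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass
class Filters:
    # Each attribute mirrors one CLI flag; None means "filter not set".
    created_before: Optional[datetime] = None
    created_after: Optional[datetime] = None
    session: Optional[str] = None
    record: Optional[str] = None
    record_set: Optional[str] = None
    institution: Optional[str] = None

    def matches(self, report: dict) -> bool:
        """Return True if the report passes every filter that is set."""
        created = report["created"]  # hypothetical field name
        if self.created_before is not None and not created < self.created_before:
            return False
        if self.created_after is not None and not created > self.created_after:
            return False
        # (wanted value, hypothetical report field) pairs for the id filters
        id_checks = [
            (self.session, "session_id"),
            (self.record, "record_id"),
            (self.record_set, "record_set_id"),
            (self.institution, "institution_id"),
        ]
        return all(
            wanted is None or report[field] == wanted
            for wanted, field in id_checks
        )
```

With no filter set, `Filters().matches(report)` accepts every report.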

Furthermore, you have to provide a `session-id` (this is used by subsequent
services to distinguish between deletion batches), and you can optionally
perform a dry run (`--dry-run`). Beware that downstream services can ignore the
dry-run setting if they choose to discard the respective message header.
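
The following Python sketch shows why a dry run depends on downstream cooperation. The header names `sessionId` and `dryRun`, and the assumption that the session id also travels in a header, are hypothetical and not taken from the service.

```python
def build_message(record_id: str, session_id: str, dry_run: bool) -> dict:
    """Build an outgoing message; header names are assumptions for illustration."""
    headers = {"sessionId": session_id}
    if dry_run:
        headers["dryRun"] = "true"
    return {"key": record_id, "value": record_id, "headers": headers}


def should_delete(message: dict) -> bool:
    """A downstream consumer that honours the dry-run header."""
    return message["headers"].get("dryRun") != "true"


def should_delete_ignoring_header(message: dict) -> bool:
    """A downstream consumer that discards the header and always deletes."""
    return True
```

The producer can only mark a message as a dry run. Whether anything is actually deleted is decided by the consumer.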

Finally, there are a few more or less static settings which are set via
environment variables:

- `KAFKA_BOOTSTRAP_SERVERS`: a comma-separated list of Kafka brokers
- `CLIENT_ID`: name of the client. This is only used as the prefix, since every
  time the service is started a random id is generated
- `TOPIC_IN`: name of input topic
- `TOPIC_OUT`: name of output topic
- `POLL_TIMEOUT`: polling timeout in milliseconds (i.e. the time the Kafka
  consumer waits for new messages before returning an empty batch, which is the
  signal for the service to terminate). **Set this value with care**: a value
  that is too low can lead to "message loss", because the service may terminate
  before all messages have arrived, while a value that is too high means higher
  latency.
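
To make these settings and the termination rule concrete, here is a minimal Python sketch. It is an illustration under assumptions rather than the service's code: the format of the random client id suffix is invented, and `poll` stands in for whatever Kafka consumer call is actually used.

```python
import os
import uuid
from dataclasses import dataclass
from typing import Callable, Iterator, List, Mapping


@dataclass(frozen=True)
class Settings:
    bootstrap_servers: List[str]
    client_id: str
    topic_in: str
    topic_out: str
    poll_timeout_ms: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the settings from the environment; a missing variable raises KeyError."""
    return Settings(
        bootstrap_servers=[s.strip() for s in env["KAFKA_BOOTSTRAP_SERVERS"].split(",")],
        # CLIENT_ID is only a prefix; the suffix format here is an assumption.
        client_id=f'{env["CLIENT_ID"]}-{uuid.uuid4().hex[:8]}',
        topic_in=env["TOPIC_IN"],
        topic_out=env["TOPIC_OUT"],
        poll_timeout_ms=int(env["POLL_TIMEOUT"]),
    )


def drain(poll: Callable[[int], list], timeout_ms: int) -> Iterator:
    """Yield messages until a poll returns an empty batch, then stop."""
    while True:
        batch = poll(timeout_ms)
        if not batch:
            return  # an empty batch is the termination signal
        yield from batch
```

Because the first empty batch ends the loop, any message that arrives after a poll has timed out is never read. This is the "message loss" that a timeout set too low can cause.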