Thoughts on implementation
The following notes cover important points concerning the implementation of the work-transformer.
General workflow
- The needed fields are extracted or generated from the CBS dump topic via the MF workflow (see "Needed fields in Elasticsearch" below)
- Create bulk messages depending on the type of workflow
  - Initial workflow: Add the work id to the KTable
  - Incremental workflow: Look up whether the key is already in the ES index
    - Yes: Check whether the resource already belongs to the work cluster. If yes, ignore the resource. If no, add an update message that also includes the existing ES values of the affected fields (otherwise the existing values of a field are overwritten)
    - No: Create a bulk update message
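The incremental branch above can be sketched as follows. This is a minimal illustration, not the actual implementation: the index name `work`, the helper names `merge` and `bulk_lines`, the shape of the `resource` dict, and the title separator are all assumptions made for the example. It relies on the Elasticsearch bulk `update` action with a partial `doc`, where a list sent in the update replaces the stored list, which is why the merged values are sent.

```python
import json
from typing import List, Optional

INDEX = "work"            # hypothetical index name
TITLE_SEPARATOR = " / "   # assumption: the notes do not specify a separator


def merge(old: List[str], new: List[str]) -> List[str]:
    """Append values from `new` that are not yet in `old`, keeping order."""
    merged = list(old)
    for value in new:
        if value not in merged:
            merged.append(value)
    return merged


def bulk_lines(work_id: str, resource: dict, existing: Optional[dict]) -> List[str]:
    """Return the bulk (NDJSON) lines for one resource, or [] if nothing to do.

    `existing` is the document currently stored under `work_id`,
    or None if the key is not in the index yet.
    """
    action = {"update": {"_index": INDEX, "_id": work_id}}
    if existing is None:
        # Key not in the index: create the document through an upsert.
        doc = {
            "bf:hasInstance": [resource["id"]],
            "dct:contributor": merge([], resource["contributors"]),
            "dct:title": resource["title"],
        }
        return [json.dumps(action), json.dumps({"doc": doc, "doc_as_upsert": True})]
    if resource["id"] in existing["bf:hasInstance"]:
        # Resource already belongs to the work cluster: ignore it.
        return []
    # Send the merged values, because a partial update replaces a field's list.
    doc = {
        "bf:hasInstance": existing["bf:hasInstance"] + [resource["id"]],
        "dct:contributor": merge(existing["dct:contributor"], resource["contributors"]),
        "dct:title": existing["dct:title"] + TITLE_SEPARATOR + resource["title"],
    }
    return [json.dumps(action), json.dumps({"doc": doc})]
```

The returned lines would be joined with newlines (plus a trailing newline) to form the body of a bulk request.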
Needed fields in Elasticsearch
The following fields must be generated by the microservice:
- `@context`: Link to context definition (constant: `https://data.swissbib.ch/work/context.jsonld`)
- `@id`: `https://data.swissbib.ch/work/` + value of MARC field `986 $b` (only if value of field `986 $a` equals `SWISSBIB`)
- `@type`: Type (constant: `http://bibframe.org/vocab/Work`)
- `dct:contributor`: List of contributor ids (persons and organisations) of clustered resources
- `bf:hasInstance`: List of ids of clustered resources
- `dct:title`: Concatenated titles of clustered resources
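As an illustration of how these fields fit together, the sketch below builds one work document from a MARC field 986 and a list of clustered resources. The function name `build_work`, the dict shapes used for the 986 subfields and the resources, the de-duplication of contributor ids, and the title separator are assumptions for the example; only the constants and the `986 $a` / `986 $b` rule come from the list above.

```python
from typing import List, Optional

CONTEXT = "https://data.swissbib.ch/work/context.jsonld"
WORK_TYPE = "http://bibframe.org/vocab/Work"
WORK_BASE = "https://data.swissbib.ch/work/"
TITLE_SEPARATOR = " / "  # assumption: the separator is not specified


def build_work(field_986: dict, resources: List[dict]) -> Optional[dict]:
    """Build a work document, or return None if no work id can be derived.

    `field_986` maps subfield codes to values, e.g. {"a": "SWISSBIB", "b": "123"}.
    Each resource is a dict with "id", "contributors" and "title" (hypothetical shape).
    """
    # The @id is only generated when 986 $a equals SWISSBIB and $b is present.
    if field_986.get("a") != "SWISSBIB" or "b" not in field_986:
        return None

    # Collect contributor ids across the cluster, dropping repeats (assumption).
    contributors: List[str] = []
    for resource in resources:
        for contributor in resource["contributors"]:
            if contributor not in contributors:
                contributors.append(contributor)

    return {
        "@context": CONTEXT,
        "@id": WORK_BASE + field_986["b"],
        "@type": WORK_TYPE,
        "dct:contributor": contributors,
        "bf:hasInstance": [resource["id"] for resource in resources],
        "dct:title": TITLE_SEPARATOR.join(resource["title"] for resource in resources),
    }
```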
Open questions
- When should the content of the KTable be flushed?
- How can we achieve a deletion?
Edited by Sebastian Schüpbach