Copyright (C) 2018-2022 The Open Library Foundation
This software is distributed under the terms of the Apache License, Version 2.0. See the file "LICENSE" for more information.
- Introduction
- Compiling
- Docker
- Installing the module
- Deploying the module
- Maximum upload file size and java heap memory setup
- Scalability
- Interaction with Kafka
- Other system properties
mod-data-import is responsible for uploading files (see documentation for file uploading), handling them initially, and sending records on for further processing (see documentation for file processing).
mvn install
See that it says "BUILD SUCCESS" near the end.
Build the docker container with:
docker build -t mod-data-import .
Test that it runs with:
docker run -t -i -p 8081:8081 mod-data-import
Follow the Deploying Modules sections of the Okapi Guide and Reference, which describe the process in detail.
First of all you need a running Okapi instance. (Note that specifying an explicit 'okapiurl' might be needed.)
cd .../okapi
java -jar okapi-core/target/okapi-core-fat.jar dev
We need to declare the module to Okapi:
curl -w '\n' -X POST -D - \
-H "Content-type: application/json" \
-d @target/ModuleDescriptor.json \
http://localhost:9130/_/proxy/modules
That ModuleDescriptor tells Okapi what the module is called, what services it provides, and how to deploy it.
Next we need to deploy the module. There is a deployment descriptor in target/DeploymentDescriptor.json. It tells Okapi to start the module on 'localhost'.
Deploy it via Okapi discovery:
curl -w '\n' -D - -s \
-X POST \
-H "Content-type: application/json" \
-d @target/DeploymentDescriptor.json \
http://localhost:9130/_/discovery/modules
Then we need to enable the module for the tenant:
curl -w '\n' -X POST -D - \
-H "Content-type: application/json" \
-d @target/TenantModuleDescriptor.json \
http://localhost:9130/_/proxy/tenants/<tenant_name>/modules
The current implementation supports storing uploaded files only in LOCAL_STORAGE (the file system of the module). This has two implications:
- a request to process a file can be handled only by the instance of the module that stored it, which prevents mod-data-import from scaling
- the size of a file that can be uploaded is limited by the Java heap memory allocated to the module. The Java heap must be at least as large as the expected maximum file size plus 10 percent.
File size | Java heap size |
---|---|
256 MB | 270+ MB |
512 MB | 560+ MB |
1 GB | 1.1+ GB |
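The 10 percent rule above can be turned into a quick calculation. The following is a minimal sketch only: the helper name `heap_mb_for` is hypothetical, sizes are in whole megabytes, and the result is rounded up, so it comes out slightly above the approximate figures in the table. `-Xmx` is the standard JVM option for the maximum heap size; how it is passed to the module depends on the deployment.

```shell
#!/bin/sh
# Hypothetical helper: heap size in MB for a given maximum upload size in MB,
# using the "file size plus 10 percent" rule and rounding up.
heap_mb_for() {
  file_mb=$1
  echo $(( (file_mb * 110 + 99) / 100 ))
}

# Example: build a JVM heap option for uploads of up to 512 MB.
echo "-Xmx$(heap_mb_for 512)m"
```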
To initiate processing of a file, the user must choose a Job Profile. That information is crucial because it contains the instructions on what to do with the uploaded file. However, this selection happens after the file is uploaded and reaches mod-data-import as a separate request, which only the instance holding the file can serve. External storage is therefore required to make mod-data-import scalable.

The module can read its configuration settings from mod-configuration. To allow deployment of multiple instances, the same persistent volume must be mounted on every instance at the mount point defined by the value of the data.import.storage.path property.
- data.import.storage.type - type of data storage used for uploaded files. The default value is LOCAL_STORAGE, which is currently the only implemented type.
- data.import.storage.path - path where uploaded files will be stored
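As an illustration of how these two settings might be supplied through mod-configuration, the sketch below prints configuration entries as JSON. Several details are assumptions rather than facts from this README: the helper name `config_entry` is hypothetical, the `module` value `DATA_IMPORT` and the path `./storage/upload` are illustrative, and the commented-out request assumes that `OKAPI_URL`, `OKAPI_TENANT` and `OKAPI_TOKEN` are set in the environment. Check the mod-configuration documentation before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: print a mod-configuration entry as JSON.
# The "module" value DATA_IMPORT is an assumption, not confirmed by this README.
config_entry() {
  printf '{"module":"DATA_IMPORT","configName":"%s","value":"%s"}\n' "$1" "$2"
}

# Example entries; the path is an illustrative value.
config_entry data.import.storage.type LOCAL_STORAGE
config_entry data.import.storage.path ./storage/upload

# To send an entry to mod-configuration (assumes OKAPI_URL, OKAPI_TENANT and
# OKAPI_TOKEN are set in the environment):
# config_entry data.import.storage.path ./storage/upload |
#   curl -w '\n' -X POST -D - \
#     -H "Content-type: application/json" \
#     -H "X-Okapi-Tenant: $OKAPI_TENANT" \
#     -H "X-Okapi-Token: $OKAPI_TOKEN" \
#     -d @- "$OKAPI_URL/configurations/entries"
```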
All modules involved in data import (mod-data-import, mod-source-record-manager, mod-source-record-storage, mod-inventory, mod-invoice) communicate directly via Kafka. Therefore, to enable data import, Kafka must be set up properly and all the necessary parameters must be set for each of these modules.
Properties that are required for mod-data-import to interact with Kafka:
- KAFKA_HOST
- KAFKA_PORT
- OKAPI_URL
- ENV (unique environment ID)
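A small pre-flight check can catch a missing property before the module is started. The sketch below is illustrative: the function name `missing_kafka_vars` is hypothetical, and the example values (including the Okapi URL taken from the commands earlier in this document) are assumptions for a local setup, not defaults of the module.

```shell
#!/bin/sh
# Report any of the required variables that are unset or empty.
# Prints the missing names and returns non-zero if at least one is missing.
missing_kafka_vars() {
  missing=""
  for name in KAFKA_HOST KAFKA_PORT OKAPI_URL ENV; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || missing="$missing $name"
  done
  [ -z "$missing" ] && return 0
  echo "missing:$missing"
  return 1
}

# Example values (assumptions for a local setup).
KAFKA_HOST=localhost
KAFKA_PORT=9092
OKAPI_URL=http://localhost:9130
ENV=folio

missing_kafka_vars && echo "all Kafka-related variables are set"
```

How the variables are then handed to the module (for example as container environment variables) depends on the deployment.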
Initial handling of the uploaded file means splitting it into chunks and sending the records for processing in other modules. The chunk size can be adjusted per file type; otherwise the default values are used:
- "file.processing.marc.raw.buffer.chunk.size": 50 - applicable to MARC files in binary format
- "file.processing.marc.json.buffer.chunk.size": 50 - applicable to JSON files with MARC data in JSON format
- "file.processing.marc.xml.buffer.chunk.size": 10 - applicable to XML files with MARC data in XML format
- "file.processing.edifact.buffer.chunk.size": 10 - applicable to EDIFACT files
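To get a feel for what these defaults mean, the number of chunks for a file can be estimated with a ceiling division. This is a sketch under the assumption that the chunk size counts records per chunk; the helper name `chunk_count` is hypothetical and is not part of the module.

```shell
#!/bin/sh
# Hypothetical helper: number of chunks for a file, assuming the chunk size
# is a number of records per chunk (ceiling division).
chunk_count() {
  records=$1
  chunk_size=$2
  echo $(( (records + chunk_size - 1) / chunk_size ))
}

# 100 binary MARC records with the default chunk size of 50.
chunk_count 100 50
# 25 MARC XML records with the default chunk size of 10.
chunk_count 25 10
```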
See project MODDATAIMP at the FOLIO issue tracker.
The raml-module-builder framework.
Other modules.
Other FOLIO Developer documentation is at dev.folio.org
The scripts directory contains a shell script, load-marc-data-into-folio.sh, and a file with a sample of 100 MARC records, sample100.marc. This script can be used to upload any batch of MARC files automatically, using the same sequence of WSAPI operations as the Secret Button. First, log in to a FOLIO backend service using the Okapi command-line utility or any other means that leaves definitions of the Okapi URL, tenant and token in the .okapi file in the home directory. Then run the script, passing the MARC file as its argument:
scripts$ echo OKAPI_URL=https://folio-snapshot-stable-okapi.dev.folio.org > ~/.okapi
scripts$ echo OKAPI_TENANT=diku >> ~/.okapi
scripts$ okapi login
username: diku_admin
password: ************
Login successful. Token saved to /Users/mike/.okapi
scripts$ ./load-marc-data-into-folio.sh sample100.marc
=== Stage 1 ===
=== Stage 2 ===
=== Stage 3 ===
HTTP/2 204
date: Thu, 27 Aug 2020 11:55:28 GMT
x-okapi-trace: POST mod-authtoken-2.6.0-SNAPSHOT.73 http://10.36.1.38:9178/data-import/uploadDefinitions/123a8d01-e389-4893-a53e-cc2de846471d/processFiles.. : 202 7078us
x-okapi-trace: POST mod-data-import-1.11.0-SNAPSHOT.140 http://10.36.1.38:9175/data-import/uploadDefinitions/123a8d01-e389-4893-a53e-cc2de846471d/processFiles.. : 204 6354us
scripts$