2020-08-06

Developing OCR-D APIs

Rationale

OCR-D has focused on developing consistent command line interfaces, based on processor-provided metadata and a convention that maps mets:fileGrp/mets:file to directories and files in the file system.

While these design patterns have proven effective and comfortable to develop with, the result is a very low-level API. In phase 3 of OCR-D, there will be much more emphasis on scalable solutions that are easy to deploy and integrate into existing software and workflows.

There will be a need for a client-server API that is more abstract than the "low-level" command line interface. We should plan and implement such APIs cooperatively to ensure interoperability.

Possible API protocols

Hot Folder

Workspaces can be placed in a specific directory via FTP or similar.

Workers try to move the workspace to their local storage.
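
The claiming step could be sketched as follows. This is only an illustration, assuming the hot folder and the worker's local storage are on the same file system, so that a rename is atomic and only one of several competing workers succeeds. The function name claim_workspace and the directory layout are hypothetical, not part of OCR-D.

```python
import os
from pathlib import Path
from typing import Optional


def claim_workspace(hot_folder: Path, local_storage: Path) -> Optional[Path]:
    """Try to claim one workspace by moving it out of the hot folder.

    Assumption: both directories are on the same file system, so os.rename
    is atomic. If several workers race for the same workspace, only one
    rename succeeds and the others move on to the next candidate.
    """
    for candidate in sorted(hot_folder.iterdir()):
        if not candidate.is_dir():
            continue
        target = local_storage / candidate.name
        try:
            os.rename(candidate, target)
        except OSError:
            # Another worker was faster, or the move failed for another
            # reason (e.g. the target already exists); try the next one.
            continue
        return target
    return None
```

If the hot folder and the local storage are on different file systems, the rename fails and a different claiming mechanism (for example a lock or marker file) would be needed.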

Message Queue

Workspaces are stored at a location with a URL.

A message containing the command line and the URL of the workspace is posted to a queue.
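
A minimal sketch of such a message, assuming JSON as the serialization. The field names workspace_url and command, the example URL and the file group names are illustrative assumptions, and Python's in-process queue.Queue merely stands in for a real message broker.

```python
import json
import queue
from typing import List


def make_message(workspace_url: str, command: List[str]) -> str:
    """Serialize a job as JSON: where the workspace is and what to run on it."""
    return json.dumps({"workspace_url": workspace_url, "command": command})


def handle_message(raw: str) -> List[str]:
    """Decode a message and return the command line a worker would execute."""
    message = json.loads(raw)
    # A real worker would first fetch message["workspace_url"] and then run
    # the command (e.g. with subprocess.run) inside its local copy.
    return list(message["command"])


jobs: "queue.Queue[str]" = queue.Queue()  # stand-in for a real message broker
jobs.put(make_message(
    "https://example.org/workspaces/ws1.zip",  # hypothetical workspace URL
    ["ocrd-dummy", "-I", "OCR-D-IMG", "-O", "OCR-D-OUT"],
))
command = handle_message(jobs.get())
```

Passing the command line as a list of arguments rather than a single string avoids shell quoting problems on the worker side.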

HTTP/REST

JSON-RPC

Open Questions

How do we translate CLI to HTTP/REST calls?
How should workspace storage work - would a reference implementation of a "workspace repository" help?

Planned integrations

OCRD-Butler

In-house solution at Stabi Berlin:

Python
Flask for the web interface
Celery/Redis for jobs
OpenAPI (Swagger) definitions

OCR4all

Kitodo / Goobi

DSpace

Kubernetes

HPC / Singularity

Visual Library

MyCoRe

Design discussion 2020-08-06

CLI is the lowest-level API
Technology and architecture for scaling and load-balancing CLI processes is outside OCR-D's scope (backend)
Consider common integration guidelines:
- Specify file format for exchangeable workflows
- Mechanism for parameterizing workflows (for workflow "inheritance")
- Define standard workflows that work well for certain groups of works
- Keep standard workflows in a Git repository
- Digitization software user interfaces should offer both workflow and language selection
- "Language" is to be understood more broadly than for ABBYY backends (which Kitodo/Goobi, Visual Library and DWork all support, so we can build on that) and should be based more on the training material
Design an OCR-D HTTP interface
- discovery of available processors, workflows, parameter sets, models
- discovery of RAM, CPU cores, GPUs, available slots, load
- auth: authentication and authorization
- jobs: list all, list one
- processor: list available, run one
- workspace: list all, search by job/processor
- => Define as OpenAPI/Swagger
- => All projects with API needs should consider API development/maintenance a low-effort but continuous task
An OCR-D Training HTTP interface to provide training services:
- Revisit okralact's API and implementation
- Specify how (work-specific) ground truth should be serialized in a workspace
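
To make the "processor" and "jobs" resources of the proposed OCR-D HTTP interface more concrete, here is a rough sketch of how requests could be dispatched. The paths (/processor, /jobs), the job fields and the QUEUED state are assumptions made for illustration only; nothing here is an agreed OCR-D API, and an actual definition would be written as OpenAPI/Swagger. The dispatcher is a plain function so it can be tried without a web server.

```python
import uuid
from typing import Any, Dict, Optional, Tuple

# In-memory stand-ins for what a real server would discover or persist.
PROCESSORS = ["ocrd-dummy"]  # hypothetical list of installed processors
JOBS: Dict[str, Dict[str, Any]] = {}


def handle(method: str, path: str, body: Optional[dict] = None) -> Tuple[int, Any]:
    """Map an HTTP method and path to a (status code, JSON-serializable) pair."""
    parts = [p for p in path.split("/") if p]
    # processor: list available
    if method == "GET" and parts == ["processor"]:
        return 200, PROCESSORS
    # processor: run one (creates a job instead of blocking until done)
    if method == "POST" and len(parts) == 2 and parts[0] == "processor":
        if parts[1] not in PROCESSORS:
            return 404, {"error": "unknown processor"}
        job_id = uuid.uuid4().hex
        JOBS[job_id] = {
            "id": job_id,
            "processor": parts[1],
            "request": body or {},
            "state": "QUEUED",
        }
        return 201, JOBS[job_id]
    # jobs: list all
    if method == "GET" and parts == ["jobs"]:
        return 200, list(JOBS.values())
    # jobs: list one
    if method == "GET" and len(parts) == 2 and parts[0] == "jobs":
        job = JOBS.get(parts[1])
        return (200, job) if job else (404, {"error": "unknown job"})
    return 404, {"error": "no such route"}
```

For example, handle("POST", "/processor/ocrd-dummy", {"workspace": "ws1"}) returns status 201 and a job record whose id can then be looked up with handle("GET", "/jobs/<id>").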
