Stefan Weil edited this page Sep 18, 2020 · 2 revisions

Developing OCR-D APIs

Rationale

OCR-D has focused on developing consistent command line interfaces, based on processor-provided metadata and a convention that maps mets:fileGrp/mets:file to directories and files in the file system.

While this has proven to be an effective set of design patterns that is comfortable to develop in, it is a very low-level API. With phase 3 of OCR-D, there will be much more emphasis on scalable solutions that are easy to deploy and integrate into existing software and workflows.

There will be a need for a client-server API that is more abstract than the "low-level" command line interface. We should plan and implement such APIs cooperatively to ensure interoperability.

Possible API protocols

Hot Folder

Workspaces can be placed in a specific directory via FTP or similar.

Workers try to move the workspace to their local storage.
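
The "try to move" step can double as a simple locking mechanism: if several workers race for the same workspace, only one move succeeds. The following is a minimal sketch of that idea, not an agreed OCR-D design. The function name `claim_workspace` and the directory layout are hypothetical, and the sketch assumes that the hot folder and the worker's storage are on the same file system, where a rename is atomic on POSIX systems.

```python
import os
from pathlib import Path
from typing import Optional


def claim_workspace(hot_folder: Path, local_storage: Path) -> Optional[Path]:
    """Try to claim one workspace by moving it out of the hot folder.

    Returns the new location of the claimed workspace, or None if
    nothing could be claimed.
    """
    local_storage.mkdir(parents=True, exist_ok=True)
    for candidate in sorted(hot_folder.iterdir()):
        if not candidate.is_dir():
            continue
        target = local_storage / candidate.name
        try:
            # Assumption: same file system, so the rename either fully
            # succeeds for exactly one worker or fails.
            os.rename(candidate, target)
        except OSError:
            # Another worker was faster, or the move is not possible;
            # try the next candidate.
            continue
        return target
    return None
```

If the worker's storage lives on a different file system, the rename fails and this sketch simply skips the workspace. A real implementation would then need a different claiming step, for example renaming inside the hot folder first and copying afterwards.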

Message Queue

Workspaces are stored at a location with a URL.

A message containing the command line and the URL of the workspace is posted on a queue.
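
To make the message-queue idea concrete, here is a minimal sketch of what such a message could look like. The JSON field names `command_line` and `workspace_url`, the helper names and the example URL are assumptions made for illustration only. Python's in-process `queue.Queue` stands in for a real message broker.

```python
import json
import queue
import shlex
from typing import Any, Dict


def make_job_message(command_line: str, workspace_url: str) -> str:
    """Serialize a job as JSON (the field names are hypothetical)."""
    return json.dumps({
        # Split the command line once so workers need no shell parsing.
        "command_line": shlex.split(command_line),
        "workspace_url": workspace_url,
    })


def take_next_job(jobs: "queue.Queue[str]") -> Dict[str, Any]:
    """Fetch and decode the next job message from the queue."""
    job = json.loads(jobs.get())
    jobs.task_done()
    return job


# Stand-in for a real broker; the URL is a placeholder.
jobs: "queue.Queue[str]" = queue.Queue()
jobs.put(make_job_message(
    "ocrd-dummy -I OCR-D-IMG -O OCR-D-DUMMY",
    "https://example.org/workspaces/ws1",
))
job = take_next_job(jobs)
```

A worker would then fetch the workspace from `job["workspace_url"]`, run `job["command_line"]` inside it and store the result back at the same location.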

HTTP/REST

JSON-RPC

Open Questions

  • How do we translate CLI to HTTP/REST calls?
  • How should workspace storage work - would a reference implementation of a "workspace repository" help?
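
One conceivable answer to the first question is a mechanical mapping between a JSON request body and the processor command line. The sketch below assumes a hypothetical request body with the fields `mets`, `input_file_grp`, `output_file_grp` and `parameters`, and maps them to the processor CLI options `-m`, `-I`, `-O` and `-p`. The function name `request_to_argv` is also an assumption, and the processor and parameter in the test are for illustration only.

```python
import json
from typing import Any, Dict, List


def request_to_argv(processor: str, body: Dict[str, Any]) -> List[str]:
    """Translate a (hypothetical) JSON request body into a CLI call."""
    # Assumption: the METS file defaults to mets.xml in the workspace.
    argv = [processor, "-m", body.get("mets", "mets.xml")]
    argv += ["-I", body["input_file_grp"], "-O", body["output_file_grp"]]
    parameters = body.get("parameters")
    if parameters:
        # Pass all processor parameters as a single JSON string.
        argv += ["-p", json.dumps(parameters, sort_keys=True)]
    return argv
```

The resulting list could be handed to `subprocess.run(argv, check=True)` in the workspace directory. How the workspace gets there is exactly the second open question.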

Planned integrations

OCRD-Butler

In-house solution at Stabi Berlin:

  • Python
  • flask for web interface
  • celery/redis for jobs
  • OpenAPI (Swagger) definitions

OCR4all

Kitodo / Goobi

DSpace

Kubernetes

HPC / Singularity

Visual Library

MyCoRe

Design discussion 2020-08-06

  • CLI is the lowest-level API
  • Technology and architecture for scaling and load-balancing CLI processes is outside OCR-D's scope (backend)
  • Consider common integration guidelines:
    • Specify file format for exchangeable workflows
    • Mechanism for parameterizing workflows (for workflow "inheritance")
    • Define standard workflows that work well for certain groups of works
    • Keep standard workflows in a Git repository
    • Digitization software user interfaces should offer both workflow and language selection
    • "Language" is to be understood more broadly than for ABBYY backends (which Kitodo/Goobi, Visual Library and DWork all support, so we can build on that) and is based more on the training material
  • Design an OCR-D HTTP interface
    • discovery of available processors, workflows, parameter sets, models
    • discovery of RAM, CPU cores, GPUs, available slots, load
    • auth: authentication and authorization
    • jobs: list all, list one
    • processor: list available, run one
    • workspace: list all, search by job/processor
    • => Define as OpenAPI/Swagger
    • => All projects with API needs should consider API development/maintenance a low-effort but continuous task
  • An OCR-D Training HTTP interface to provide training services:
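
As a rough illustration of the general OCR-D HTTP interface item in the list above (not the training interface), the jobs, processor and workspace operations could be written down as an OpenAPI document. The sketch below builds such a document as a plain Python dictionary. All paths, parameter names and the helper functions are hypothetical and were not agreed on in the discussion. Discovery and auth are left out to keep it short.

```python
from typing import Any, Dict, List, Optional


def operation(summary: str,
              parameters: Optional[List[Dict[str, Any]]] = None) -> Dict[str, Any]:
    """Build a minimal OpenAPI operation object."""
    op: Dict[str, Any] = {
        "summary": summary,
        "responses": {"200": {"description": "OK"}},
    }
    if parameters:
        op["parameters"] = parameters
    return op


def param(name: str, location: str) -> Dict[str, Any]:
    """Build a string parameter; path parameters are always required."""
    return {
        "name": name,
        "in": location,
        "required": location == "path",
        "schema": {"type": "string"},
    }


# Hypothetical paths covering the operations named in the list above.
SPEC: Dict[str, Any] = {
    "openapi": "3.0.3",
    "info": {"title": "OCR-D HTTP interface (sketch)", "version": "0.0.1"},
    "paths": {
        "/job": {"get": operation("List all jobs")},
        "/job/{job_id}": {
            "get": operation("List one job", [param("job_id", "path")]),
        },
        "/processor": {"get": operation("List available processors")},
        "/processor/{name}": {
            "post": operation("Run one processor", [param("name", "path")]),
        },
        "/workspace": {
            "get": operation(
                "List all workspaces, optionally filtered by job or processor",
                [param("job", "query"), param("processor", "query")],
            ),
        },
    },
}
```

Serializing `SPEC` with `json.dumps(SPEC, indent=2)` gives a file that OpenAPI tooling can read, which would make it easier for the integrating projects to review the interface together.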
