Uwe Hartwig edited this page May 24, 2024 · 13 revisions

OCR-D - evaluation of workflows and intermediate results

Problem statement

Which data and tools can we use to objectively measure quality and compare results of both complete workflows and individual steps (beyond final text Character Error Rate) on a non-representative sample?

Tasks

Preprocessing

  1. Pixel-by-pixel comparison (e.g. for binarization: what percentage of black pixels in the output are also black in the GT)
  2. Connected-component statistics, specialised measures (e.g. of vertical/horizontal projection profiles)
  3. RGB/Grayscale entropy or other general purpose image measures
  4. downstream segmentation quality/accuracy
  5. downstream recognition quality (gotcha)
  6. compare image histogram data with an ideal histogram, or apply simple histogram classification (especially for grayscale images), to determine what kind of and how much preprocessing is needed in terms of saturation, hue, contrast, brightness, ...
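Task 6 could be sketched roughly as follows: derive brightness and contrast from the grayscale histogram and classify the preprocessing need. The thresholds here are illustrative assumptions, not calibrated values.

```python
import numpy as np

def histogram_check(gray: np.ndarray) -> dict:
    """Derive brightness and contrast from the grayscale histogram of a
    page image to classify how much preprocessing it needs (task 6).
    The thresholds are illustrative assumptions, not calibrated values."""
    hist, _ = np.histogram(gray, bins=256, range=(0, 256))
    levels = np.arange(256)
    total = hist.sum()
    brightness = float((hist * levels).sum() / total)  # mean gray level
    contrast = float(np.sqrt((hist * (levels - brightness) ** 2).sum() / total))
    return {
        "brightness": brightness,
        "contrast": contrast,
        "needs_brightening": brightness < 100,    # assumed threshold
        "needs_contrast_stretch": contrast < 40,  # assumed threshold
    }

# toy example: a uniformly dark, zero-contrast "page"
page = np.full((100, 100), 60, dtype=np.uint8)
print(histogram_check(page))
```

In practice one would compare the histogram moments against a reference distribution for clean scans rather than fixed cut-offs.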

@cneud: this is the state of the art for binarization evaluation, for example (sadly, the tools have not been made openly available by the authors)
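Task 1 (pixel-by-pixel comparison for binarization) can be sketched in a few lines. This is an illustrative sketch of the percentage measure described above, not any particular tool's implementation:

```python
import numpy as np

def black_pixel_precision(output: np.ndarray, gt: np.ndarray) -> float:
    """Fraction of black (foreground) pixels in the binarized output
    that are also black in the ground truth (task 1 above).

    Both arrays are expected to be masks where nonzero/True = black.
    """
    out_black = output.astype(bool)
    gt_black = gt.astype(bool)
    n_out = out_black.sum()
    if n_out == 0:
        return 1.0  # no foreground predicted: vacuously precise
    return float((out_black & gt_black).sum() / n_out)

# toy example: 3x3 masks; 2 of the 3 black output pixels are black in GT
out = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
gt  = np.array([[1, 0, 0], [0, 0, 0], [0, 0, 1]])
print(black_pixel_precision(out, gt))
```

The symmetric measure (fraction of GT black pixels recovered in the output) gives recall; both are usually reported together.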

Segmentation

https://github.com/OCR-D/ocrd_segment/wiki/SegmentationEvaluation

Recognition / Post-correction

  1. Edit distance of characters/words after text alignment with GT
  2. Edit distance or precision/recall after indexing GT (ignoring reading order and/or textline order and/or reading direction – "bag of words")
  3. (only for post-correction:) precision/recall of correction
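Metrics 1 and 2 above can be sketched as follows; this is a minimal illustration of character edit distance (CER) and order-independent "bag of words" precision/recall, not the implementation used by any of the tools listed below.

```python
from collections import Counter

def levenshtein(a, b) -> int:
    """Plain dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ocr: str, gt: str) -> float:
    """Character error rate: edit distance normalized by GT length."""
    return levenshtein(ocr, gt) / max(len(gt), 1)

def bag_of_words_pr(ocr: str, gt: str):
    """Precision/recall over word multisets, ignoring order (metric 2)."""
    ocr_bag, gt_bag = Counter(ocr.split()), Counter(gt.split())
    hits = sum((ocr_bag & gt_bag).values())
    precision = hits / max(sum(ocr_bag.values()), 1)
    recall = hits / max(sum(gt_bag.values()), 1)
    return precision, recall

print(cer("Grüsse", "Grüße"))  # 2 edits / 5 GT characters = 0.4
print(bag_of_words_pr("the cat sat", "the cat sat down"))
```

Word error rate (WER) follows the same pattern with the edit distance computed over token sequences instead of characters.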

complementing historical dictionaries:

Tools

PRImA LayoutEval

https://www.primaresearch.org/alternative_download_links.html

https://www.dropbox.com/s/ky53r9k79tb0ywz/LayoutEvaluation_1.9.132.zip?dl=0

(partial source code:) https://github.com/PRImA-Research-Lab/prima-layout-eval

(partial documentation:) https://github.com/PRImA-Research-Lab/prima-layout-eval/blob/master/doc/liblayouteval.pdf

ocrd-segment-evaluate

A Free Software reimplementation of PRImA LayoutEval in Python has been started by @bertsky and @wrznr.

Dinglehopper

https://github.com/qurator-spk/dinglehopper

… text alignment CER/WER (mean) per page, visual comparison

cor-asv-ann-evaluate

https://github.com/ASVLeipzig/cor-asv-ann

… text alignment CER (mean+variance) with (multi-OCR) aggregation across pages/documents, confusion statistics, various metrics and normalization options (GT levels)

eddieantonio / ocreval

https://github.com/eddieantonio/ocreval

… ISRI evaluation tools with Unicode fixes

Leptonica / pyleptonica

https://github.com/jsbueno/pyleptonica

impactcentre/ocrevalUAtion

https://github.com/impactcentre/ocrevalUAtion

qurator-spk/ocrd_repair_inconsistencies

https://github.com/qurator-spk/ocrd_repair_inconsistencies

language-tool

A service for dictionary look-ups; 25 languages are already integrated, with access via a web API.

https://github.com/languagetool-org/languagetool

digital-eval

A tool for OCR evaluation of large, structured data sets with more than 1,000 items. It includes string-based evaluation using normalized edit distance, as well as IR metrics and a connector for a dictionary-based metric via language-tool (if present in the containerized mode).

https://github.com/ulb-sachsen-anhalt/digital-eval

Data

Where can we find challenging, yet well-annotated data to test such evaluations?

structural GT (for preprocessing and segmentation)

How about the OCR-D structure GT (1,000 pages, DTA), @tboenig?

textual GT (for recognition)

How about OCR-D structure+text GT?

Articles

Outlook

  • Which eval tools could be wrapped with an OCR-D CLI with manageable effort?
  • What methodology do we use to evaluate processors, parameters and workflows? (GT curation, error/quality aggregation and slicing across metadata, overall CER vs. step-specific measures, …)
  • How should evaluation fit into OCR-D workflow runtime management? (e.g. should missing a threshold value cause the workflow to fail? Or just the page? Should we reuse the ValidationReport mechanics used elsewhere?)
