Konstantin Baierer edited this page Jul 7, 2020 · 13 revisions

OCR-D - evaluation of workflows and intermediate results

Problem statement

Which data and tools can we use to objectively measure quality and compare the results of both complete workflows and individual processing steps (beyond the final text's Character Error Rate), even on a non-representative sample?

Tasks

Preprocessing

  1. Pixel-by-pixel comparison (e.g. for binarization: what percentage of black pixels in the output are also black in the GT)
  2. Connected-component statistics and specialised measures (e.g. of vertical/horizontal projection profiles)
  3. RGB/grayscale entropy or other general-purpose image measures
  4. Downstream segmentation quality/accuracy
  5. Downstream recognition quality (gotcha: this conflates preprocessing errors with errors of the later steps)
  6. ...
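The pixel-by-pixel comparison in item 1 can be sketched as foreground precision/recall between a binarized output and binarized GT. This is only an illustration, not one of the tools listed below: images are modelled as flat lists of 0/1 values (1 = black/foreground); a real implementation would operate on arrays loaded from image files.

```python
def binarization_scores(pred, gt):
    """Precision, recall and F1 of foreground (black) pixels
    in the prediction with respect to the ground truth."""
    assert len(pred) == len(gt), "images must have the same size"
    tp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 1)
    fp = sum(1 for p, g in zip(pred, gt) if p == 1 and g == 0)
    fn = sum(1 for p, g in zip(pred, gt) if p == 0 and g == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Toy 3x3 "images": the prediction misses one GT foreground pixel.
gt   = [1, 1, 0,  0, 1, 0,  0, 0, 0]
pred = [1, 1, 0,  0, 0, 0,  0, 0, 0]
p, r, f = binarization_scores(pred, gt)
print(p, r)  # precision 1.0, recall 2/3
```

Precision here answers the question from item 1 ("what percentage of black pixels in the output are also black in the GT"); recall is the complementary question of how much GT foreground survives.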

Segmentation

https://github.com/OCR-D/ocrd_segment/wiki/SegmentationEvaluation

Recognition / Post-correction

  1. Edit distance on characters/words after text alignment with the GT
  2. Edit distance or precision/recall after indexing the GT (ignoring reading order and/or textline order and/or reading direction – "bag of words")
  3. (post-correction only:) precision/recall of the corrections
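Items 1 and 2 can be sketched as follows. This is a minimal illustration of the two measures, not the API of dinglehopper or cor-asv-ann: CER as edit distance normalised by GT length, and order-independent "bag of words" precision/recall over word multisets.

```python
from collections import Counter

def levenshtein(a, b):
    """Edit distance between two strings via dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                  # deletion
                           cur[j - 1] + 1,               # insertion
                           prev[j - 1] + (ca != cb)))    # substitution
        prev = cur
    return prev[-1]

def cer(ocr, gt):
    """Character error rate: edit distance normalised by GT length."""
    return levenshtein(ocr, gt) / len(gt)

def bag_of_words_scores(ocr, gt):
    """Precision/recall over word multisets, ignoring order entirely."""
    ocr_bag, gt_bag = Counter(ocr.split()), Counter(gt.split())
    common = sum((ocr_bag & gt_bag).values())  # multiset intersection
    return common / sum(ocr_bag.values()), common / sum(gt_bag.values())

print(cer("Fraktvr", "Fraktur"))  # one substitution in 7 characters
print(bag_of_words_scores("the quick fox", "the quick brown fox"))
```

WER works the same way as CER with token lists instead of character strings; the tools listed below additionally handle alignment of whole pages and Unicode normalization, which this sketch ignores.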

Also to be considered: complementing historical dictionaries.

Tools

PRImA LayoutEval

https://www.primaresearch.org/alternative_download_links.html

https://www.dropbox.com/s/ky53r9k79tb0ywz/LayoutEvaluation_1.9.132.zip?dl=0

Partial source code: https://github.com/PRImA-Research-Lab/prima-layout-eval

Partial documentation: https://github.com/PRImA-Research-Lab/prima-layout-eval/blob/master/doc/liblayouteval.pdf

Dinglehopper

https://github.com/qurator-spk/dinglehopper

… text alignment CER/WER (mean) per page, visual comparison

cor-asv-ann-evaluate

https://github.com/ASVLeipzig/cor-asv-ann

… text alignment CER (mean+variance) with (multi-OCR) aggregation across pages/documents, confusion statistics, various metrics and normalization options (GT levels)

Leptonica / pyleptonica

https://github.com/jsbueno/pyleptonica

[name=Robert Sachunsky] Why does this PR appear in the list of eval tools?

[name=kba] Copy/paste error :) The PR is essential, though, to move forward with a potential pyleptonica-based dewarping wrapper for OCR-D

impactcentre/ocrevalUAtion

https://github.com/impactcentre/ocrevalUAtion

Data

Where can we find challenging, yet well-annotated data to test such evaluations?

structural GT (for preprocessing and segmentation)

How about the OCR-D structure GT (1000 pages, DTA), @tboenig?

textual GT (for recognition)

How about the OCR-D structure+text GT?

Articles

Outlook

  • Which eval tools could be wrapped with an OCR-D CLI with manageable effort?
  • What methodology do we use to evaluate processors, parameters and workflows (GT curation; error/quality aggregation and slicing across metadata; overall CER vs. step-specific measures; …)?
  • How should evaluation fit into OCR-D workflow runtime management? (E.g. should missing a threshold value make the workflow fail? Only the page? Should we reuse the ValidationReport mechanics used elsewhere?)
