Evaluation
Which data and tools can we use to objectively measure quality and compare results of both complete workflows and individual steps (beyond final text Character Error Rate) on a non-representative sample?
- Pixel-by-pixel comparison (e.g. for binarization: what percentage of black pixels in the output are also black in the GT)
- Connected-component statistics, specialised measures (e.g. of vertical/horizontal projection profiles)
- RGB/Grayscale entropy or other general purpose image measures
- Downstream segmentation quality/accuracy
- Downstream recognition quality (caveat: errors may stem from the recognition model rather than the step under test)
- Compare the image histogram against an ideal histogram, or apply a simple histogram classification (especially for grayscale images), to determine what kind of preprocessing is needed, and how much, in terms of saturation, hue, contrast, brightness, ...
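The histogram triage idea above can be sketched in a few lines (a minimal illustration using NumPy; `histogram_stats` and its output keys are hypothetical, not part of any OCR-D tool):

```python
import numpy as np

def histogram_stats(gray):
    # Robust brightness/contrast estimates from the grayscale value
    # distribution, which could drive a decision on what preprocessing
    # a page needs (e.g. contrast enhancement before binarization).
    gray = np.asarray(gray, dtype=float).ravel()
    p5, p50, p95 = np.percentile(gray, [5, 50, 95])
    return {
        "brightness": p50,       # median gray value
        "contrast": p95 - p5,    # robust dynamic range, ignores outlier pixels
    }

# Synthetic full-range ramp as a stand-in for a well-exposed page image.
stats = histogram_stats(np.arange(256))
```

A page whose "contrast" value falls below some (empirically chosen) threshold would then be routed to contrast normalization first.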
@cneud: this, for example, is the state of the art for binarization evaluation (sadly, the tools are not made openly available by the authors):
- http://www.cenparmi.concordia.ca/ICFHR2008/Proceedings/papers/cr1133.pdf
- http://docshare02.docshare.tips/files/27693/276932155.pdf
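The pixel-by-pixel comparison from the first bullet can be made concrete as foreground precision/recall/F-measure over the binary masks (a sketch assuming True marks a black/foreground pixel; `binarization_scores` is a hypothetical name, not taken from the papers above):

```python
import numpy as np

def binarization_scores(out, gt):
    # Precision: fraction of output foreground pixels that are also
    # foreground in the ground truth (the "percentage of black pixels
    # in the output that are also black in the GT"); recall: the converse.
    out = np.asarray(out, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    tp = np.logical_and(out, gt).sum()      # foreground in both images
    precision = tp / max(out.sum(), 1)
    recall = tp / max(gt.sum(), 1)
    f_measure = 2 * precision * recall / max(precision + recall, 1e-9)
    return precision, recall, f_measure

# Toy 3x3 masks: the output misses one GT pixel and adds one spurious one.
gt_mask  = [[1, 1, 0], [0, 1, 0], [0, 0, 0]]
out_mask = [[1, 1, 0], [0, 0, 0], [0, 0, 1]]
p, r, f = binarization_scores(out_mask, gt_mask)
```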
- Edit distance of characters/words after text alignment with GT
- Edit distance or precision/recall after indexing GT (ignoring reading order and/or textline order and/or reading direction – "bag of words")
- (only for post-correction:) precision/recall of correction
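A minimal version of these edit-distance measures (plain dynamic-programming Levenshtein; the helper names are hypothetical, and real tools such as those listed below add alignment visualization, normalization and aggregation):

```python
def levenshtein(a, b):
    # Edit distance between two sequences (strings or token lists)
    # via the classic dynamic-programming recurrence.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # deletion
                           cur[j - 1] + 1,              # insertion
                           prev[j - 1] + (ca != cb)))   # substitution
        prev = cur
    return prev[-1]

def cer(gt, ocr):
    # Character error rate: edit distance normalized by GT length.
    return levenshtein(gt, ocr) / max(len(gt), 1)

def wer(gt, ocr):
    # Word error rate: the same measure on word tokens.
    return cer(gt.split(), ocr.split())
```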
Complementing historical dictionaries:
- Canonicalization of historical orthography (via hybrid stochastic and linguistic modelling) at BBAW: http://www.deutschestextarchiv.de/doku/software#cab (note: has no rejection or confidence scoring, so every historical input will receive an analysis)
- Decanonicalization of modern orthography (so-called historical patterns) at CIS: https://github.com/cisocrgroup/Resources/tree/master/lexica (includes references; note: always fuzzy, so each input will yield many different historical candidates)
https://www.primaresearch.org/alternative_download_links.html
https://www.dropbox.com/s/ky53r9k79tb0ywz/LayoutEvaluation_1.9.132.zip?dl=0
(partial source code:) https://github.com/PRImA-Research-Lab/prima-layout-eval
(partial documentation:) https://github.com/PRImA-Research-Lab/prima-layout-eval/blob/master/doc/liblayouteval.pdf
A Free Software reimplementation of PRImA LayoutEval in Python has been started by @bertsky and @wrznr:
- https://github.com/OCR-D/ocrd_segment/blob/master/ocrd_segment/evaluate.py
- https://github.com/OCR-D/ocrd_segment/wiki
- https://github.com/OCR-D/ocrd_segment/wiki/SegmentationEvaluation
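A common building block for such region-level segmentation evaluation is intersection-over-union of matched regions (sketched here for axis-aligned boxes; note that PRImA LayoutEval itself uses more elaborate, scenario-based pixel measures):

```python
def iou(a, b):
    # Intersection over union of two axis-aligned boxes (x0, y0, x1, y1).
    ix0, iy0 = max(a[0], b[0]), max(a[1], b[1])
    ix1, iy1 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(ix1 - ix0, 0) * max(iy1 - iy0, 0)
    def area(r):
        return (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0
```

Detected regions are then typically matched to GT regions above some IoU threshold, from which region-level precision and recall follow.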
https://github.com/qurator-spk/dinglehopper
… text alignment CER/WER (mean) per page, visual comparison
https://github.com/ASVLeipzig/cor-asv-ann
… text alignment CER (mean+variance) with (multi-OCR) aggregation across pages/documents, confusion statistics, various metrics and normalization options (GT levels)
https://github.com/eddieantonio/ocreval
… ISRI evaluation tools with Unicode fixes
https://github.com/jsbueno/pyleptonica
https://github.com/impactcentre/ocrevalUAtion
https://github.com/qurator-spk/ocrd_repair_inconsistencies
Service for dictionary look-ups; 25 languages already integrated; access via web API:
https://github.com/languagetool-org/languagetool
Tool for OCR evaluation of large, structured data sets with more than 1,000 items. Includes string-based evaluation using normalized edit distance, as well as IR metrics and a connector for a dictionary-based metric using LanguageTool (if present in containerized mode).
https://github.com/ulb-sachsen-anhalt/digital-eval
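The order-independent "bag of words" comparison mentioned above can be sketched as a multiset intersection of word counts (a hypothetical helper; tokenization and normalization in real tools like digital-eval will differ):

```python
from collections import Counter

def bag_of_words_scores(gt_text, ocr_text):
    # Word-level precision/recall ignoring reading order and line order:
    # count each word, then intersect the two multisets.
    gt = Counter(gt_text.split())
    ocr = Counter(ocr_text.split())
    hits = sum((gt & ocr).values())              # multiset intersection
    precision = hits / max(sum(ocr.values()), 1)
    recall = hits / max(sum(gt.values()), 1)
    return precision, recall
```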
Where can we find challenging, yet well-annotated data to test such evaluations?
How about the OCR-D structure GT (1,000 pages, DTA), @tboenig?
How about OCR-D structure+text GT?
- Which eval tools could be wrapped with an OCR-D CLI with manageable effort?
- What methodology do we use to evaluate processors, parameters and workflows? (GT curation, error/quality aggregation/slicing across metadata, overall CER vs. step-specific measures, …)
- How should evaluation fit into OCR-D workflow runtime management? (E.g. should missing a threshold value cause the workflow to fail? Or just the page? Should we reuse the ValidationReport mechanics used elsewhere?)
Welcome to the OCR-D wiki, a companion to the OCR-D website.
Articles and tutorials
- Running OCR-D on macOS
- Running OCR-D in Windows 10 with Windows Subsystem for Linux
- Running OCR-D on POWER8 (IBM pSeries)
- Running browse-ocrd in a Docker container
- OCR-D Installation on NVIDIA Jetson Nano and Xavier
- Mapping PAGE to ALTO
- Comparison of OCR formats (outdated)
- A Practitioner's View on Binarization
- How to use the bulk-add command to generate workspaces from existing files
- Evaluation of (intermediary) steps of an OCR workflow
- A quickstart guide to ocrd workspace
- Introduction to parameters in OCR-D
- Introduction to OCR-D processors
- Introduction to OCR-D workflows
- Visualizing (intermediate) OCR-D-results
- Guide to updating ocrd workspace calls for 2.15.0+
- Introduction to Docker in OCR-D
- How to import Abbyy-generated ALTO
- How to create ALTO for DFG Viewer
- How to create searchable fulltext data for DFG Viewer
- Setup native CUDA Toolkit for Qurator tools on Ubuntu 18.04
- OCR-D Code Review Guidelines
- OCR-D Recommendations for Using CI in Your Repository
Expert section on OCR-D workflows
Particular workflow steps
Workflow Guide
- Workflow Guide: preprocessing
- Workflow Guide: binarization
- Workflow Guide: cropping
- Workflow Guide: denoising
- Workflow Guide: deskewing
- Workflow Guide: dewarping
- Workflow Guide: region-segmentation
- Workflow Guide: clipping
- Workflow Guide: line-segmentation
- Workflow Guide: resegmentation
- Workflow Guide: olr-evaluation
- Workflow Guide: text-recognition
- Workflow Guide: text-alignment
- Workflow Guide: post-correction
- Workflow Guide: ocr-evaluation
- Workflow Guide: adaptation-of-coordinates
- Workflow Guide: format-conversion
- Workflow Guide: generic transformations
- Workflow Guide: dummy processing
- Workflow Guide: archiving
- Workflow Guide: recommended workflows