Workflow Guide: post-correction

In this processing step, the recognized text is corrected by statistical error modelling, language modelling, and word modelling (dictionaries, morphology and orthography).

Note: Most tools benefit strongly from input that includes alternative OCR hypotheses. Currently, the models for ocrd-cor-asv-ann-process are optimised for input from specific OCR models, whereas ocrd-cis-postcorrect expects input from a multi-OCR alignment. For more information, see this presentation at vDHd 2021 (held on 23rd May 2021; slides / video in German).

Note: There is some overlap with text alignment here, which can also be used for (or contribute to) post-correction.

Available processors

| Processor | Parameters | Remarks | Example call |
|---|---|---|---|
| ocrd-cor-asv-ann-process | `-P textequiv_level word -P model_file modelname` | Pre-trained models can be found here and here, or downloaded via the OCR-D resource manager (`resmgr`). If you did not download the model with `resmgr`, you must pass its local filesystem path as the value of `model_file`. (Relative paths are resolved from the workspace directory or from the directory in the environment variable `CORASVANN_DATA`.) There is no default `model_file`. | `ocrd-cor-asv-ann-process -I OCR-D-OCR -O OCR-D-PROCESS -P textequiv_level word -P model_file /path/to/model/model.h5` |
| ocrd-cis-postcorrect | `-P profilerPath /path/to/profiler.bash -P profilerConfig ignored -P nOCR 2 -P model /path/to/model/model.zip` | The parameters (including `profilerConfig`) can be specified in a JSON file, which is passed with `-p` as in the example call. If you do not want to use a profiler, set `profilerConfig` to `ignored`. In that case, `profiler.bash` only has to discard its input and print an empty JSON object, i.e. consist of the lines `#!/bin/bash`, `cat > /dev/null` and `echo '{}'`. For `model`, you must pass the local filesystem path as the value. There is no default `model`. | `ocrd-cis-postcorrect -I OCR-D-ALIGN -O OCR-D-CORRECT -p postcorrect.json` |
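The relative-path rule in the `ocrd-cor-asv-ann-process` remarks is easy to get wrong, so before a long run it can help to check where a given `model_file` value would actually be found. The following is a minimal sketch of such a check. `find_model` is a hypothetical helper, not part of ocrd_cor-asv-ann, and it assumes that the workspace directory is searched before `$CORASVANN_DATA`, an order the remarks do not specify. Models installed with `resmgr` are outside its scope.

```shell
#!/bin/bash
# Hypothetical helper, not part of ocrd_cor-asv-ann: report where a model_file
# value would be found. Assumption: for relative values the workspace directory
# is tried first, then the directory named by $CORASVANN_DATA.
find_model() {
  local model="$1" workspace="${2:-.}"
  case "$model" in
    /*)
      # Absolute path: used as-is, it only has to exist.
      if [ -f "$model" ]; then echo "$model"; return 0; fi
      ;;
    *)
      # Relative path: try the workspace directory first ...
      if [ -f "$workspace/$model" ]; then
        echo "$workspace/$model"; return 0
      fi
      # ... then fall back to $CORASVANN_DATA, if it is set and non-empty.
      if [ -n "${CORASVANN_DATA:-}" ] && [ -f "$CORASVANN_DATA/$model" ]; then
        echo "$CORASVANN_DATA/$model"; return 0
      fi
      ;;
  esac
  echo "model_file not found: $model" >&2
  return 1
}
```

For example, `find_model model.h5 /path/to/workspace` prints the path that would be used, or reports `model_file not found` and returns a non-zero status.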
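Running the `ocrd-cis-postcorrect` example call without a profiler needs two files: the dummy `profiler.bash` described in its remarks and the `postcorrect.json` parameter file passed with `-p`. The sketch below writes both into the current directory. It assumes that the JSON keys are simply the `-P` parameter names from the table; the model path is a placeholder that you must replace with the path to your own `model.zip`.

```shell
#!/bin/bash
# Minimal sketch: prepare the files used by the ocrd-cis-postcorrect example call.
set -eu

# Dummy profiler: discard whatever is written to stdin and answer with an
# empty JSON object, i.e. "no profile".
cat > profiler.bash <<'EOF'
#!/bin/bash
cat > /dev/null
echo '{}'
EOF
chmod +x profiler.bash

# Parameter file for `-p postcorrect.json`. Keys mirror the -P options in the
# table (assumption); "model" is a placeholder path you must replace.
cat > postcorrect.json <<EOF
{
  "profilerPath": "$PWD/profiler.bash",
  "profilerConfig": "ignored",
  "nOCR": 2,
  "model": "/path/to/model/model.zip"
}
EOF
```

Afterwards, `ocrd-cis-postcorrect -I OCR-D-ALIGN -O OCR-D-CORRECT -p postcorrect.json` can be run from the same directory. Writing `profilerPath` as an absolute path (taken from `$PWD`) mirrors the `/path/to/profiler.bash` form used in the table.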

Notes on parameter usage

For example:

- Which parameters do you use, and with what values?
- Which parameters are insufficiently documented?
- Which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

For example: which processors worked best with what material? Feel free to post sample images here, too.
