Workflow Guide: post-correction
In this processing step, the recognized text is corrected by statistical error modelling, language modelling, and word modelling (dictionaries, morphology and orthography).
Note: Most tools benefit strongly from input that includes alternative OCR hypotheses. Currently, models for `ocrd-cor-asv-ann-process` are optimised for input from specific OCR models, whereas `ocrd-cis-postcorrect` expects input from multi-OCR alignment. For more information, see this presentation at vDHd 2021 (held on 23rd May 2021) (slides / video in German).
Note: There is some overlap with text alignment here, which can also be used for (or contribute to) post-correction.
Processor | Parameter | Remarks | Call |
---|---|---|---|
`ocrd-cor-asv-ann-process` | `-P textequiv_level word -P model_file modelname` | Pre-trained models can be found here and here, or downloaded via the OCR-D resource manager. If you didn't download the model with `resmgr`, you need to pass the local filesystem path as the value of `model_file`. (Relative paths are resolved from the workspace directory or the environment variable `CORASVANN_DATA`.) There is no default `model_file`. | `ocrd-cor-asv-ann-process -I OCR-D-OCR -O OCR-D-PROCESS -P textequiv_level word -P model_file /path/to/model/model.h5` |
`ocrd-cis-postcorrect` | `-P profilerPath /path/to/profiler.bash -P profilerConfig ignored -P nOCR 2 -P model /path/to/model/model.zip` | The `profilerConfig` parameters can be specified in a JSON file. If you do not want to use a profiler, you can set the value for `profilerConfig` to `ignored`. In this case, your `profiler.bash` should look like this: For `model`, you need to pass the local filesystem path as the parameter value. There is no default `model`. | `ocrd-cis-postcorrect -I OCR-D-ALIGN -O OCR-D-CORRECT -p postcorrect.json` |
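
The `ocrd-cis-postcorrect` call above passes its parameters through a file (`-p postcorrect.json`) rather than through individual `-P` options. The contents of that file are not shown on this page; the following is a minimal sketch, assuming it simply carries the same four parameters listed in the Parameter column. All paths are placeholders, and writing `nOCR` as a JSON number is an assumption.

```shell
# Write the four parameters from the table row into a parameter file.
# Paths are placeholders; adjust them to your local profiler script and model.
cat > postcorrect.json <<'EOF'
{
  "profilerPath": "/path/to/profiler.bash",
  "profilerConfig": "ignored",
  "nOCR": 2,
  "model": "/path/to/model/model.zip"
}
EOF
```

The resulting file can then be passed with `-p postcorrect.json`, as in the Call column; the `-P key value` form sets the same parameters one at a time on the command line.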
E.g.
- which parameters do you use with what values?
- which parameters are insufficiently documented?
- which aspects of a processor should be parameterizable but are not?
E.g. which processors worked best with what material? -- feel free to post sample images here, too.