
Workflow Guide resegmentation

Lena Hinrichsen edited this page Mar 24, 2022 · 4 revisions

In this processing step the segmented text lines can be corrected in order to reduce their overlap.

This can be done in one of two ways. The first works on coordinates, polygonalizing the bounding boxes tightly around the glyphs; ocrd-cis-ocropy-resegment and ocrd-segment-project take this approach. The second works on derived images, clipping pixels that do not belong to a text line to the background color; ocrd-cis-ocropy-clip (on the line level) takes this approach. The coordinate-based approach is usually more accurate, but not always possible (for example, when neighbors intersect heavily, creating non-contiguous contours). The image-based approach is only possible if no preceding workflow step has already annotated derived images (AlternativeImage references) on the line level (see also region-level clipping).

Available processors

| Processor | Parameter | Remarks | Call |
| --- | --- | --- | --- |
| ocrd-cis-ocropy-clip | -P level-of-operation line | | ocrd-cis-ocropy-clip -I OCR-D-SEG-LINE -O OCR-D-CLIP-LINE -P level-of-operation line |
| ocrd-cis-ocropy-resegment | | | ocrd-cis-ocropy-resegment -I OCR-D-SEG-LINE -O OCR-D-RESEG |
| ocrd-segment-project | -P level-of-operation line | | ocrd-segment-project -I OCR-D-SEG-LINE -O OCR-D-RESEG -P level-of-operation line |
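
When scripting a workflow, it can help to keep the three calls listed above in one place and pick one per run. The following is a minimal dry-run sketch, not part of OCR-D: the function name `resegment_cmd` and the method keywords `clip`, `resegment` and `project` are hypothetical, and the file group names are simply the ones used in the calls above. It only prints the command line and does not execute any processor.

```shell
#!/bin/sh
# Dry-run helper (hypothetical, for illustration only): prints the call for
# one of the three line-level resegmentation options listed above.
# Nothing is executed; the output is just the command line as text.
resegment_cmd() {
    method=$1
    # Input file group; defaults to the one used in the calls above.
    input=${2:-OCR-D-SEG-LINE}
    case "$method" in
        clip)
            # Image-based: clip foreign pixels to the background color.
            echo "ocrd-cis-ocropy-clip -I $input -O OCR-D-CLIP-LINE -P level-of-operation line" ;;
        resegment)
            # Coordinate-based: tighten the line polygons.
            echo "ocrd-cis-ocropy-resegment -I $input -O OCR-D-RESEG" ;;
        project)
            # Coordinate-based, via ocrd-segment-project.
            echo "ocrd-segment-project -I $input -O OCR-D-RESEG -P level-of-operation line" ;;
        *)
            echo "unknown method: $method" >&2
            return 1 ;;
    esac
}

# Example: show the coordinate-based call without running it.
resegment_cmd resegment
```

To actually run a step, the printed line can be pasted into a shell inside an OCR-D workspace that already contains the input file group.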

Notes on parameter usage

E.g.

  • which parameters do you use with what values?
  • which parameters are insufficiently documented?
  • which aspects of a processor should be parameterizable but are not?

Notes on document-specific usage

E.g. which processors worked best with what material? -- feel free to post sample images here, too.
