This repository has been archived by the owner on Jan 9, 2024. It is now read-only.

Commit
Merge pull request #1 from bertsky/patch-1
Improve description & documented steps & "no underscores"
mikegerber authored Nov 26, 2019
2 parents cf3ee27 + 30a59d8 commit 2205a44
Showing 2 changed files with 13 additions and 10 deletions.
13 changes: 8 additions & 5 deletions README.md
@@ -1,10 +1,13 @@
# ocrd_repair_inconsistencies

-Automatically fix PAGE-XML order inconsistencies in regions, lines and words.
-Child elements are only reordered if reordering by coordinates
-top-to-bottom/left-to-right fixes the appropriately concatenated `TextEquiv`
-texts of the children to match the parent's `TextEquiv` text. This processor
-does not change reading order, just the order of the XML elements in the file.
+Automatically re-order lines, words and glyphs to become textually consistent with their parents.
+
+PAGE-XML elements with textual annotation are re-ordered by their centroid coordinates
+in top-to-bottom/left-to-right fashion iff such re-ordering fixes the inconsistency
+between their appropriately concatenated `TextEquiv` texts with their parent's `TextEquiv` text.
+
+This processor does not affect `ReadingOrder` between regions, just the order of the XML elements
+below the region level, and only if not contradicting the annotated `textLineOrder`/`readingDirection`.

We wrote this as a one-shot script to fix some files. Use with caution.
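
The re-ordering rule described in the README hunk above can be sketched roughly as follows. This is a minimal illustration, not the processor's actual code: `Element`, `centroid`, `reorder_if_fixing`, `left_to_right` and `top_to_bottom` are hypothetical names, the centroid is approximated by the mean of the polygon vertices, and the joiners (newline between lines, space between words, empty string between glyphs) are an assumption about what "appropriately concatenated" means. The check against `textLineOrder`/`readingDirection` is left out.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


@dataclass
class Element:
    """Hypothetical stand-in for a PAGE-XML line, word or glyph."""
    text: str             # the element's TextEquiv text
    polygon: List[Point]  # outline points as (x, y)


def centroid(polygon: Sequence[Point]) -> Point:
    """Approximate the centroid as the mean of the polygon vertices."""
    xs = [x for x, _ in polygon]
    ys = [y for _, y in polygon]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def left_to_right(element: Element) -> float:
    return centroid(element.polygon)[0]


def top_to_bottom(element: Element) -> float:
    return centroid(element.polygon)[1]


def reorder_if_fixing(
    parent_text: str,
    children: Sequence[Element],
    joiner: str,
    sort_key: Callable[[Element], float],
) -> List[Element]:
    """Return the children sorted by coordinates only if that order makes
    their concatenated text equal to the parent's text; otherwise return
    them in their original order."""
    def joined(elements: Sequence[Element]) -> str:
        return joiner.join(e.text for e in elements)

    if joined(children) == parent_text:
        return list(children)  # already consistent, nothing to repair
    candidate = sorted(children, key=sort_key)
    if joined(candidate) == parent_text:
        return candidate       # sorting fixes the inconsistency
    return list(children)      # sorting does not help, leave untouched


def box(x0: float, y0: float, x1: float, y1: float) -> List[Point]:
    """Axis-aligned rectangle as a four-point polygon."""
    return [(x0, y0), (x1, y0), (x1, y1), (x0, y1)]


# Two words stored in the wrong XML order inside a line reading "hello world".
words = [Element("world", box(60, 0, 100, 10)), Element("hello", box(0, 0, 50, 10))]
fixed = reorder_if_fixing("hello world", words, " ", left_to_right)
print([w.text for w in fixed])  # ['hello', 'world']
```

Returning the original order whenever sorting does not reproduce the parent text mirrors the "iff" in the README: the sketch never re-orders speculatively.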

10 changes: 5 additions & 5 deletions ocrd_repair_inconsistencies/ocrd-tool.json
@@ -1,21 +1,21 @@
{
"tools": {
-"ocrd_repair_inconsistencies": {
-"executable": "ocrd_repair_inconsistencies",
+"ocrd-repair-inconsistencies": {
+"executable": "ocrd-repair-inconsistencies",
"categories": [
"Layout analysis"
],
-"description": "Repair glyph/word/line order inconsistencies",
+"description": "Re-order glyphs/words/lines top-down-left-right when textually inconsistent with their parents",
"input_file_grp": [
"OCR-D-SEG-BLOCK"
],
"output_file_grp": [
"OCR-D-SEG-BLOCK-FIXED"
],
"steps": [
-"layout/segmentation/region",
"layout/segmentation/line",
-"layout/segmentation/words"
+"layout/segmentation/word",
+"layout/segmentation/glyph"
]
]
}
}
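
The "no underscores" rename in this hunk could be checked mechanically with something like the sketch below. The helper `check_tool_names` is hypothetical, and the three rules it applies (tool key starts with `ocrd-`, contains no underscore, and equals the `executable` value) are assumptions inferred from this commit, not quoted from the OCR-D specification.

```python
from typing import Any, Dict, List


def check_tool_names(ocrd_tool: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in the tool names of a parsed
    ocrd-tool.json dictionary (an empty list means nothing was flagged)."""
    problems = []
    for name, tool in ocrd_tool.get("tools", {}).items():
        executable = tool.get("executable", "")
        # Assumed convention: names use the hyphenated "ocrd-" prefix.
        if not name.startswith("ocrd-"):
            problems.append(f"{name}: does not start with 'ocrd-'")
        if "_" in name:
            problems.append(f"{name}: contains an underscore")
        # Assumed convention: the key and the executable agree.
        if executable != name:
            problems.append(f"{name}: executable is '{executable}'")
    return problems


# Tool names before and after this commit (other fields omitted for brevity).
before = {"tools": {"ocrd_repair_inconsistencies": {
    "executable": "ocrd_repair_inconsistencies"}}}
after = {"tools": {"ocrd-repair-inconsistencies": {
    "executable": "ocrd-repair-inconsistencies"}}}

for label, descriptor in (("before", before), ("after", after)):
    print(label, check_tool_names(descriptor))
```

With a real file, the dictionary would come from `json.load` on the opened `ocrd-tool.json`.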
