Tools for Visualizing (intermediate) OCR-D results

Robert Sachunsky edited this page Jun 4, 2022 · 32 revisions

PAGE Tools

The Page Viewer is a stand alone application for viewing page layout and text content of segmentation ground truth and results of page recognition/OCR systems. The natively supported file format is PAGE XML. However, ALTO XML, FineReader XML, and HOCR can be opened as well.

The viewer shows the page layout as a transparent overlay on the document image. Text content and object attributes are displayed as tooltips.

The Page Viewer requires a Java Runtime Environment version 6 or later. Both 32 and 64 bit installations are supported. Supported platforms are: Windows, Linux, and MacOS. (https://www.primaresearch.org/tools/PAGEViewer)

Installation

  • download a pre-built release from GitHub
  • unzip somewhere
  • copy or symlink the startup script from your platform's subdirectory into your search PATH, probably adding --resolve-dir $PWD (or similar) to its arguments. This makes PageViewer resolve relative image paths against the current working directory instead of the location of the XML file, which is more useful for OCR-D workspaces.
    For example, on Linux, add this to your ~/.bash_aliases or ~/.bashrc:
        alias jpageviewer='java -jar ~/path/to/JPageViewer\ 1.4\ \(Linux\,\ 64\ bit\)/JPageViewer.jar --resolve-dir "$PWD"'

Usage

    # cd into workspace directory
    jpageviewer OCR-D-SEG-TESS/PAGE1.xml

(Then continue with the Open button to navigate to the next PAGE file, or close the UI and start a new instance from the shell.)

Advantages

  • Schema support: all PAGE versions, but also ALTO
  • Shows fully recursive regions, including reading order
  • Shows all hierarchy levels from Border to Glyph
  • Platforms: Win, Linux, Mac
  • Recommended usage: viewing

Drawbacks

  • Bugs related to zooming (which breaks tooltips)
  • Sometimes does not open the document if the PAGE file, although valid, is insufficient, and gives no error message indicating the cause
  • Does not show AlternativeImage content
  • Does not rotate image according to annotated skew in the PAGE-XML file
  • Fixed colour scheme
  • No METS or directory navigation (pages have to be opened individually)
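
Because pages have to be opened individually, a small shell loop can step through all PAGE files of one fileGrp directory. The following is only a sketch: the function name open_pages is hypothetical, and it assumes that the viewer command runs in the foreground until its window is closed, so that the next page opens afterwards.

```shell
# Hypothetical helper (not part of PageViewer): run a viewer command once
# for every *.xml file in a directory, one page after the other.
# Usage: open_pages DIR VIEWER [ARGS...]
open_pages() {
    if [ "$#" -lt 2 ]; then
        echo "usage: open_pages DIR VIEWER [ARGS...]" >&2
        return 2
    fi
    local dir=$1 file
    shift
    for file in "$dir"/*.xml; do
        # Skip the unexpanded pattern when the directory has no XML files.
        [ -e "$file" ] || continue
        # Remaining arguments form the viewer command; the page goes last.
        "$@" "$file"
    done
}
```

From a workspace directory this could be called as open_pages OCR-D-SEG-TESS java -jar ~/path/to/JPageViewer.jar --resolve-dir "$PWD", with the jar path adjusted to the local installation. A shell alias such as jpageviewer is expanded only as the first word of a command line, so the full command is passed here instead.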

Aletheia is an advanced system for accurate and yet cost-effective analysis, recognition and annotation of scanned documents. It aids the user with a number of automated and semi-automated tools which were developed and fine-tuned based on feedback from major libraries across Europe and from their digitisation service providers which are using it in a production environment.

Cutting-edge features are, among others, the support of top-down ground truthing with sophisticated split and shrink tools as well as bottom-up ground truthing supporting the aggregation of lower-level elements to more complex structures. The integrated rules and guidelines validator, in combination with powerful correction tools, enable efficient production of highly accurate ground truth as well as standardised electronic renditions of digitised documents.

In addition, special features such as a customisable virtual keyboard and the Aletheia Sans font with extensive coverage of special characters in Unicode have been developed to support working with the complexities of historical documents. (https://www.primaresearch.org/tools/Aletheia)

Aletheia is available either as a free Lite version (which only requires registration via email) or as a Pro version (annual paid subscription with added features and support).

See also the feature comparison for both versions.

Installation

  • unzip somewhere
  • run Aletheia.exe

Advantages

  • Schema support: all PAGE versions, but also ALTO
  • Shows fully recursive regions, including reading order
  • Shows all hierarchy levels from Border to Glyph
  • Offers lots of check/fixup tools for consistency
  • Platforms: Win
  • Recommended usage: editing and viewing
  • Some directory navigation (pages have to be opened collectively)

Drawbacks

  • Does not show AlternativeImage content
  • Does not rotate image according to annotated skew
  • Fixed colour scheme
  • No OCR-D METS navigation, but Aletheia uses its own METS format

Transkribus

Installation

Advantages

  • Supports editing polygons, tables and structural metadata annotations

Drawbacks

  • Not Free Software, recognition backend and trained models are proprietary
  • Commercial software
  • Does not support recent PAGE versions
  • Produces invalid PAGE-XML because of extensions in the same namespace
    (which can be repaired via transkribus-to-prima, though)
  • Enhanced name matching for images and corresponding OCR-Files

Transkribus SWT-Client is an open source alternative client based on the Transkribus desktop client.

Installation

Requires local build. For detailed instructions, please see the project's README.

Advantages

  • Supports editing polygons, tables and structural metadata annotations
  • No registration required, only local working mode
  • Platforms: Win, Linux, Mac with recent OpenJDK included (Win64)
  • Imports recent ALTO 3.0+ with ComponentBlock elements from Tesseract-OCR 4.x+
  • Imports recent PAGE 2019
  • Enhanced name matching for images and corresponding OCR-Files

Drawbacks

  • Only supports export to the older PAGE-XML 2013 format with extensions in the same namespace
    (which can be repaired via transkribus-to-prima, though)
  • Only supports the region-line-word hierarchy; no glyphs or super regions possible

LAREX

Installation

  • native: as described in the README
  • Docker:
    • docker pull bertsky/larex and then run the container, e.g.
      docker run --rm -u 0:$GROUPS -p 8080:8080 -v path/to/workspace:/data bertsky/larex
    • docker pull maxnth/larex and then run the container, e.g.
      docker run --rm -u 0:$GROUPS -p 8080:8080 -v path/to/workspace:/home/books -v path/to/larex.config:/larex.config maxnth/larex

Usage

  • go to http://localhost:8080/Larex with your browser (preferably Chrome/Chromium)

Advantages

  • Very efficient for large amounts of pages (fast, has keyboard shortcuts for everything), esp. for text correction
  • Offers custom auto-segmentation, including reading order
  • Variable colour scheme
  • Platforms: Linux or Docker-capable
  • Recommended usage: editing and viewing

Drawbacks

  • Does not show Border or hierarchy levels below TextLine
  • Does not show recursive regions (e.g. table contents)
  • Does not show AlternativeImage content (fixed in current dev version / v0.6)
  • Does not rotate image according to annotated skew (fixed in current dev version / v0.6)
  • No direct METS navigation: expects a custom, flat bookpath directory structure, which needs to be exported from OCR-D fileGrps via ocrd-export-larex (fixed in current dev version / v0.6)

nw-page-editor is an application for editing ground truth information for diverse purposes related to the areas of document processing and text recognition. The edition is done interactively and visually on top of images of scanned documents. Additionally the app supports many keyboard shortcuts to allow more efficient editing, see section Application usage shortcuts.

The app is available in two variants. The first variant is as a desktop application based on the NW.js framework thus making it cross-platform. The second variant is as a web application that allows remote editing by multiple users and can be easily setup via a docker container. (https://github.com/mauvilsa/nw-page-editor)

Installation

Advantages

Drawbacks

  • Custom PAGE extensions when editing

METS Tools

An extensible viewer for OCRD mets.xml files (https://github.com/hnesk/browse-ocrd)

Installation

    sudo make deps-ubuntu # needs the Makefile, i.e. run inside a checkout of the browse-ocrd repository
    pip install browse-ocrd

Usage

    browse-ocrd path/to/mets.xml # or open METS interactively
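
The other usage examples on this page refer to fileGrp directories such as OCR-D-IMG-BIN. To see which fileGrps a workspace offers before opening anything, the USE attributes can be read from the METS file. This is a rough sketch, not an XML parser: the function name list_filegrps is hypothetical, and it assumes that elements carry the mets: prefix and that each opening tag sits on a single line.

```shell
# Hypothetical helper: print the fileGrp names (USE attributes) found in a
# METS file, one per line, sorted and without duplicates.
# Assumes "<mets:fileGrp ... USE="...">" appears on one line per group.
list_filegrps() {
    grep -o '<mets:fileGrp[^>]*USE="[^"]*"' "$1" |
        sed 's/.*USE="\([^"]*\)"/\1/' |
        sort -u
}
```

For example, list_filegrps mets.xml inside a workspace directory prints the names that can then be passed to the viewers as directory names.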

Advantages

  • Schema support: OCR-D METS conventions (https://ocr-d.de/en/spec/mets)
  • Shows pages on all fileGrps, including AlternativeImages (on all hierarchy levels)
  • Shows segmentation (in PageViewer-like colour scheme), with
    • structural elements selectable (Border, ReadingOrder, Region, TextLine, Baseline, Word, Glyph)
    • mouse-over element ID and content, and exact coordinates
    • warnings where polygon path is invalid
    • AlternativeImages (including cropped and deskewed)
  • Shows concatenated text
  • Shows raw PAGE-XML with syntax highlighting
  • Can start JPageViewer for current PAGE-XML
  • Shows rendered HTML (from ocrd-dinglehopper comparison reports)
  • Allows fast zooming into/out of images or text
  • Can show multiple pages or views next to each other
  • Gives page/segment IDs in mouse-over tooltips
  • Platforms: Linux
  • Recommended usage: viewing

Drawbacks

  • No text search currently

Image Viewer and Tools

feh is an X11 image viewer aimed mostly at console users. Unlike most other viewers, it does not have a fancy GUI, but simply displays images. It is controlled via commandline arguments and configurable key/mouse actions. (https://feh.finalrewind.org/)

Installation

    sudo apt install feh

Usage

    # cd into workspace directory
    feh OCR-D-IMG-BIN/

Advantages

  • Exact zoom interpolation
  • Extensive keyboard shortcuts
  • Allows keeping zoom level across pages
  • Very versatile and fast
  • Can browse multiple files, including thumbnail mode

Drawbacks

  • No multi-page TIFF display

evince

Installation

    sudo apt install evince

Usage

    # cd into workspace directory
    evince OCR-D-IMG-BIN/PAGE1.png

Advantages

  • Has multi-page TIFF display

Drawbacks

  • Artefacts and/or decreased sharpness in zoom interpolation
  • Cannot browse multiple files

Use ImageMagick® to create, edit, compose, or convert bitmap images. It can read and write images in a variety of formats (over 200) including PNG, JPEG, GIF, HEIC, TIFF, DPX, EXR, WebP, Postscript, PDF, and SVG. ImageMagick can resize, flip, mirror, rotate, distort, shear and transform images, adjust image colors, apply various special effects, or draw text, lines, polygons, ellipses and Bézier curves.

Installation

    sudo apt install imagemagick

Usage

    # cd into workspace directory
    identify -verbose OCR-D-IMG/*.tiff
    compare OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
    display OCR-D-IMG-BIN1/PAGE1.png OCR-D-IMG-BIN2/PAGE1.png PAGE1-BIN1-BIN2.png
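
The single-page comparison above can be extended to whole fileGrps with a loop. The sketch below uses the hypothetical function name compare_groups and assumes that both directories contain PNG files with identical file names, as in the example above; in a real workspace the names may differ between fileGrps and would need to be mapped first.

```shell
# Hypothetical helper: run ImageMagick's compare on every PNG that exists
# under the same file name in two directories, writing difference images
# into a third directory.
# Usage: compare_groups DIR1 DIR2 OUTDIR
compare_groups() {
    local grp1=$1 grp2=$2 out=$3 img name
    mkdir -p "$out"
    for img in "$grp1"/*.png; do
        [ -e "$img" ] || continue
        name=$(basename "$img")
        if [ -e "$grp2/$name" ]; then
            # compare exits with 1 when the images differ, which is not
            # an error here; only other non-zero codes are reported.
            compare "$img" "$grp2/$name" "$out/$name" ||
                [ "$?" -eq 1 ] ||
                echo "compare failed for $name" >&2
        else
            echo "no counterpart for $name in $grp2" >&2
        fi
    done
}
```

Called as compare_groups OCR-D-IMG-BIN1 OCR-D-IMG-BIN2 diffs from the workspace directory, this leaves one difference image per page in diffs/, which can then be browsed with feh diffs/.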

Advantages

Drawbacks
