
Stitch ocr detections workflow block #765

Merged (3 commits merged on Nov 1, 2024)


reiffd7 (Contributor) commented Oct 31, 2024

Description

I created a transformation workflow block called "Stitch OCR Detections". At its core, this block combines detection class names into a text string based on the detections' spatial positions. It is intended for object detection OCR models whose class names are individual characters. User-configurable parameters control how the output is assembled, based on the text's reading direction and the image's size.

This transformation takes OCR detection results and reconstructs the original text by:

  1. Grouping text detections into lines based on their vertical (y) positions (for the vertical reading directions, their horizontal (x) positions)
  2. Sorting detections within each row by horizontal (x) position
  3. Concatenating the detected text in reading order
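A minimal sketch of these three steps for left-to-right text, assuming a simplified input. It is not the block's actual implementation: the `Detection` record and `stitch_left_to_right` function are hypothetical, rows are formed greedily by comparing each box's vertical center against the first box of the current row, and joining rows with a newline is an assumption, since the description does not say how lines are separated.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    # Hypothetical minimal detection record: a bounding box plus its class name.
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    class_name: str


def stitch_left_to_right(detections: List[Detection], tolerance: float = 10) -> str:
    if not detections:
        return ""
    # Step 1: sort by vertical center so rows are discovered top to bottom,
    # then start a new row whenever a box is too far below the row's first box.
    by_y = sorted(detections, key=lambda d: (d.y_min + d.y_max) / 2)
    rows: List[List[Detection]] = [[by_y[0]]]
    row_anchor = (by_y[0].y_min + by_y[0].y_max) / 2
    for det in by_y[1:]:
        center_y = (det.y_min + det.y_max) / 2
        if center_y - row_anchor <= tolerance:
            rows[-1].append(det)  # close enough vertically: same row
        else:
            rows.append([det])    # too far: start a new row
            row_anchor = center_y
    # Step 2: within each row, order boxes by horizontal position.
    # Step 3: concatenate class names in reading order (newline between rows is assumed).
    lines = ["".join(d.class_name for d in sorted(row, key=lambda d: d.x_min)) for row in rows]
    return "\n".join(lines)


dets = [
    Detection(30, 0, 40, 10, "C"),
    Detection(0, 2, 10, 12, "A"),
    Detection(15, 1, 25, 11, "B"),
    Detection(0, 30, 10, 40, "D"),
]
result = stitch_left_to_right(dets, tolerance=5)
# result == "ABC\nD": A, B and C share a row; D sits on the next row.
```

Comparing against the row's first box (rather than the previous box) keeps a long, slightly slanted line from drifting into the line below it; other grouping rules are equally plausible.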

The block supports two configurable parameters:

Reading Direction (dropdown)

  • "left_to_right": Standard left-to-right reading (e.g., English)
  • "right_to_left": Right-to-left reading (e.g., Arabic)
  • "vertical_top_to_bottom": Vertical reading from top to bottom
  • "vertical_bottom_to_top": Vertical reading from bottom to top
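The four directions can be illustrated by swapping axes, in line with the commit history's note that vertical reading groups detections by the x dimension first. This is a hedged sketch, not the block's code: `Detection`, `center` and `stitch` are hypothetical names, and ordering rows top to bottom, ordering columns left to right, and separating lines with a newline are all assumptions the description does not confirm.

```python
from typing import List, NamedTuple, Tuple

DIRECTIONS = {"left_to_right", "right_to_left", "vertical_top_to_bottom", "vertical_bottom_to_top"}


class Detection(NamedTuple):
    # Hypothetical minimal detection record, not the block's real input type.
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    class_name: str


def center(det: Detection) -> Tuple[float, float]:
    return ((det.x_min + det.x_max) / 2, (det.y_min + det.y_max) / 2)


def stitch(detections: List[Detection], reading_direction: str = "left_to_right",
           tolerance: float = 10) -> str:
    if reading_direction not in DIRECTIONS:
        raise ValueError(f"unknown reading_direction: {reading_direction}")
    if not detections:
        return ""
    vertical = reading_direction.startswith("vertical_")
    # Horizontal text groups rows by y; vertical text groups columns by x.
    group_axis, order_axis = (0, 1) if vertical else (1, 0)
    items = sorted(detections, key=lambda d: center(d)[group_axis])
    lines: List[List[Detection]] = [[items[0]]]
    anchor = center(items[0])[group_axis]
    for det in items[1:]:
        pos = center(det)[group_axis]
        if pos - anchor <= tolerance:
            lines[-1].append(det)  # same row/column as the current anchor
        else:
            lines.append([det])    # start a new row/column
            anchor = pos
    # Right-to-left and bottom-to-top reverse the order within each line.
    reverse = reading_direction in ("right_to_left", "vertical_bottom_to_top")
    texts = [
        "".join(d.class_name for d in
                sorted(line, key=lambda d: center(d)[order_axis], reverse=reverse))
        for line in lines
    ]
    # Rows come out top to bottom and columns left to right (an assumption).
    return "\n".join(texts)


row = [Detection(0, 0, 10, 10, "A"), Detection(20, 0, 30, 10, "B")]
column = [Detection(0, 0, 10, 10, "X"), Detection(0, 20, 10, 30, "Y")]
print(stitch(row, "left_to_right"))              # AB
print(stitch(row, "right_to_left"))              # BA
print(stitch(column, "vertical_top_to_bottom"))  # XY
print(stitch(column, "vertical_bottom_to_top"))  # YX
```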

Tolerance (integer)

Controls how close detections need to be vertically (in pixels) to be considered part of the same line of text. A higher tolerance will group detections that are further apart vertically.
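To show the trade-off the tolerance controls, here is a small sketch of greedy one-dimensional grouping on detection centers. The function `group_by_tolerance` and the sample centers are hypothetical illustrations, not taken from the block: too small a tolerance splits one slightly slanted line, while too large a tolerance merges separate lines, which is why a sensible value depends on the image size.

```python
from typing import List


def group_by_tolerance(centers: List[float], tolerance: float) -> List[List[float]]:
    """Greedy 1-D grouping: a value joins the current group if it lies within
    `tolerance` pixels of that group's first (smallest) value."""
    groups: List[List[float]] = []
    for c in sorted(centers):
        if groups and c - groups[-1][0] <= tolerance:
            groups[-1].append(c)
        else:
            groups.append([c])
    return groups


# Vertical centers of five detections: two text lines, the first slightly slanted.
centers = [10.0, 13.0, 16.0, 40.0, 42.0]
print(group_by_tolerance(centers, tolerance=3))   # [[10.0, 13.0], [16.0], [40.0, 42.0]]
print(group_by_tolerance(centers, tolerance=8))   # [[10.0, 13.0, 16.0], [40.0, 42.0]]
print(group_by_tolerance(centers, tolerance=40))  # [[10.0, 13.0, 16.0, 40.0, 42.0]]
```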

This block is particularly useful for:

  • Converting individual character/word detections into readable text
  • Reconstructing multi-line text from OCR results
  • Maintaining proper reading order of detected text elements
  • Supporting different writing systems and text orientations

Type of change

  • New feature (non-breaking change which adds functionality)

Testing

The changes have been tested through:

  1. Unit Tests

    • Tests for all reading directions
    • Edge cases (empty detections, single characters)
    • Tolerance grouping behavior
    • Multi-line text handling
  2. Integration Testing

    • Tested on local inference server
    • Created workflows with various text orientations:
      • Horizontal text
      • Vertical text
      • Multi-line text
      • Different languages/writing systems

CLAassistant commented Oct 31, 2024

CLA assistant check
All committers have signed the CLA.

refactored block code and created unit tests

fixed some bugs in the unit tests with tolerance and vertical top to bottom

Bump version

Make linters happy

Adding fixes for the block

discovered a bug with reading vertically. fixed it by switching initial grouping to x dimension. adjusted unit tests appropriately

Make linters happy
PawelPeczek-Roboflow merged commit 727ebd0 into main on Nov 1, 2024
58 checks passed
PawelPeczek-Roboflow deleted the stitch_ocr_detections_workflow_block branch on November 1, 2024 at 17:30

4 participants