This repository contains regression test cases for the Mangatan OCR engine. It stores input images, raw OCR data (cached from Google Lens), and the manually verified "expected" output.
Test cases are organized by folder. Each test case consists of three files:
- Image: name.png (or .jpg, .webp, .avif) - The input.
- Raw Data: name.raw.json - Cached raw OCR output (Auto-generated).
- Expected Output: name.expected.json - The corrected, merged text (Manually Edited).
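The naming convention above can be checked mechanically. The following is a minimal sketch and not part of this repository: the helper name `find_incomplete_cases` is hypothetical, and it assumes only the file suffixes listed above.

```python
from pathlib import Path

# Image extensions listed in this README.
IMAGE_SUFFIXES = {".png", ".jpg", ".webp", ".avif"}


def find_incomplete_cases(folder):
    """Return {image name: [missing companion file names]} for one folder.

    Hypothetical helper: it only checks that name.raw.json and
    name.expected.json sit next to each image file.
    """
    missing = {}
    for image in sorted(Path(folder).iterdir()):
        if not image.is_file() or image.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        absent = [
            f"{image.stem}{suffix}"
            for suffix in (".raw.json", ".expected.json")
            if not (image.parent / f"{image.stem}{suffix}").exists()
        ]
        if absent:
            missing[image.name] = absent
    return missing
```

Calling `find_incomplete_cases("complex-layouts")` returns an empty dict when every image has both companion files. If the raw files are git-ignored, a fresh clone may report them as missing until they are regenerated.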
- You must have the Mangatan repository cloned in a sibling directory (e.g., ../Mangatan).
- Rust installed.
- Only the text bubbles should be visible in the input image. Crop any extraneous background.
Create a new folder (or use an existing one) and drop your image file into it.
mkdir -p complex-layouts
cp ~/screenshots/page_01.png complex-layouts/
Run the make command to generate the raw OCR cache and a baseline expected file.
make generate
- What this does: It runs the Mangatan OCR logic against your new image.
- Result: It creates page_01.raw.json (so tests don't hit the API repeatedly) and page_01.expected.json (containing the current automatic output).
Open the generated .expected.json file in your editor. This is the most important step.
The generated file represents what the code currently does, which might be wrong (that's why you are adding a test case!).
- Fix the text: Correct any misread characters.
- Fix the merges: If two bubbles were incorrectly merged, split them into separate objects in the JSON array.
- Fix the order: Ensure the reading order is correct.
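To illustrate the merge fix, the sketch below shows the before and after shape of a split. The `text` field, the sample strings, and the helper `split_entry` are assumptions for illustration; the real schema is whatever `make generate` wrote into your .expected.json, and the edit itself is normally done by hand.

```python
import json

# Hypothetical entry: two bubbles were merged into one object.
# The "text" key is an assumed field name, not the confirmed schema.
before = [
    {"text": "Where are you going? To the station."},
]


def split_entry(entries, index, parts):
    """Return a new list where entries[index] is replaced by one object per part.

    Any other keys of the original object are copied into each new object.
    """
    original = entries[index]
    replacements = [{**original, "text": part} for part in parts]
    return entries[:index] + replacements + entries[index + 1:]


after = split_entry(before, 0, ["Where are you going?", "To the station."])
print(json.dumps(after, ensure_ascii=False, indent=2))
```

If array order encodes reading order, which seems likely but is not confirmed here, list the parts in the order they should be read.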
Note: Do not worry about tightBoundingBox coordinates. The test runner ignores them during comparison.
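A comparison that ignores tightBoundingBox could look roughly like the sketch below. This is not the actual test runner, which lives in the Mangatan repository and may behave differently; the `text` field and the shape of the box object are assumed for illustration.

```python
def strip_key(value, key="tightBoundingBox"):
    """Recursively drop `key` from nested dicts and lists."""
    if isinstance(value, dict):
        return {k: strip_key(v, key) for k, v in value.items() if k != key}
    if isinstance(value, list):
        return [strip_key(item, key) for item in value]
    return value


def same_ignoring_boxes(actual, expected):
    """Compare two parsed JSON documents, ignoring tightBoundingBox."""
    return strip_key(actual) == strip_key(expected)


# Hypothetical data: "text" and the x/y box fields are assumed names.
actual = [{"text": "Hello", "tightBoundingBox": {"x": 1, "y": 2}}]
expected = [{"text": "Hello", "tightBoundingBox": {"x": 0, "y": 0}}]
print(same_ignoring_boxes(actual, expected))  # True
```

List order still matters in this sketch, so a wrong reading order would count as a mismatch.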
Run the validation script to ensure your changes are logically possible (i.e., you didn't add text that doesn't exist in the raw OCR).
make validate
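One plausible reading of "logically possible" is that the expected text must not use characters the raw OCR never produced. The sketch below implements only that reading. The actual rules live in the validation script; the function name and the whitespace handling are assumptions, and whether the real script tolerates corrected characters is not stated here.

```python
from collections import Counter


def unexplained_characters(raw_texts, expected_texts):
    """Return characters that occur more often in the expected text than in the raw OCR text.

    Whitespace is ignored, since merging lines changes spacing.
    """
    def count(texts):
        return Counter(ch for text in texts for ch in text if not ch.isspace())

    # Counter subtraction keeps only positive counts.
    surplus = count(expected_texts) - count(raw_texts)
    return dict(surplus)


raw = ["こん", "にちは"]   # fragments as the OCR might return them
ok = ["こんにちは"]        # merged text uses only raw characters
bad = ["こんにちは!"]      # "!" never appeared in the raw OCR
print(unexplained_characters(raw, ok))   # {}
print(unexplained_characters(raw, bad))  # {'!': 1}
```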
Commit the image and the .expected.json.
Note: *.raw.json files are usually ignored by git to keep the repo size down, but check the repository's .gitignore to confirm before committing.
git add complex-layouts/page_01.png complex-layouts/page_01.expected.json
git commit -m "test: add complex layout regression case"
git push
Make a pull request to add your test case to the repository.