OCR dataset generator

Training data generator for Text Detection and Text Recognition. The training data is generated in the format expected by each supported OCR system. The supported OCR systems are doctr, mmocr and paddleocr.

At the moment, the datasets that can be used to generate the training data are:

FUNSD: https://guillaumejaume.github.io/FUNSD/
IAM: https://fki.tic.heia-fr.ch/databases/iam-handwriting-database
SROIE: https://paperswithcode.com/paper/icdar2019-competition-on-scanned-receipt-ocr
XFUND: https://github.com/doc-analysis/XFUND (de,es,fr,it,ja,pt,zh)

Setup

Install the requirements:

```
pip3 install -r requirements.txt
```

Generate training data

Before generating the training data, check ./config/config.json. This JSON file specifies:

output: the output of the training data, which is stored in ./output/
ocr-system: the OCR system that will be trained; the choices are doctr, mmocr and paddleocr

tasks: specifies whether the training data is for detection (det), recognition (rec) or both.

```
"tasks": ["det"]        # only det
"tasks": ["rec"]        # only rec
"tasks": ["det", "rec"] # both
```

datasets: specifies which datasets are used to generate the training data. To select a dataset, set it to "y"; otherwise set it to "n". For example:
```
"dataset1": "y",        # selected
"dataset2": {
    "sub1": "n",        # not selected
    "sub2": "y"         # selected
}
```
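The fields above can be checked before running the generator. The following is a minimal sketch, assuming that `ocr-system`, `tasks` and `datasets` are top-level keys of ./config/config.json with the shapes shown above. `validate_config` and `selected_datasets` are hypothetical helpers, not part of this repository, and naming sub-datasets as "parent/child" is an illustrative choice. The `#` comments in the examples above are annotations only: standard JSON has no comment syntax, so `json.loads` would reject them.

```python
from __future__ import annotations

import json

# Allowed values as listed in this README; these checks are a sketch,
# not the validation actually performed by generate.py.
SUPPORTED_OCR_SYSTEMS = {"doctr", "mmocr", "paddleocr"}
SUPPORTED_TASKS = {"det", "rec"}


def validate_config(config: dict) -> None:
    """Raise ValueError if "ocr-system" or "tasks" hold unsupported values."""
    ocr_system = config.get("ocr-system")
    if ocr_system not in SUPPORTED_OCR_SYSTEMS:
        raise ValueError(f"unsupported ocr-system: {ocr_system!r}")
    tasks = config.get("tasks")
    if not isinstance(tasks, list) or not tasks:
        raise ValueError('tasks must be a non-empty list, e.g. ["det"]')
    unknown = set(tasks) - SUPPORTED_TASKS
    if unknown:
        raise ValueError(f"unsupported tasks: {sorted(unknown)}")


def selected_datasets(datasets: dict, prefix: str = "") -> list[str]:
    """Return every dataset set to "y"; nested entries become "parent/child"."""
    selected = []
    for name, value in datasets.items():
        full_name = f"{prefix}{name}"
        if isinstance(value, dict):
            # Sub-datasets: recurse, using the parent name as a prefix.
            selected.extend(selected_datasets(value, prefix=f"{full_name}/"))
        elif value == "y":
            selected.append(full_name)
        # Any other value (normally "n") leaves the dataset unselected.
    return selected


if __name__ == "__main__":
    # Comment-free JSON mirroring the examples above.
    config = json.loads("""
    {
        "ocr-system": "mmocr",
        "tasks": ["det", "rec"],
        "datasets": {
            "dataset1": "y",
            "dataset2": {"sub1": "n", "sub2": "y"}
        }
    }
    """)
    validate_config(config)
    print(selected_datasets(config["datasets"]))  # ['dataset1', 'dataset2/sub2']
```

Because `selected_datasets` recurses, it does not depend on how deeply sub-datasets are nested.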

When everything is set up, run:

```
python3 generate.py
```

Repository: xReniar/OCR-Dataset-Generator