Pilgen - (WIP: optimizing)

Python Image-Label dataset Generator for OCR

Generate

python3 generate.py --lang ug --count 100 --out-dir data/

This command writes 100 images to the folder data/images/. Filenames follow the pattern 'word_{}.jpg'.format(line_num), for example:

data/images/word_1.jpg
data/images/word_2.jpg
...
data/images/word_100.jpg

It also writes a gt.txt file in which each line follows the pattern '{}\t{}'.format(filepath, word), for example:

data/images/word_1.jpg	ئانا
data/images/word_2.jpg	تىلىم
...
data/images/word_100.jpg	گۈللە
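
To use the generated labels in a training script, the gt.txt format above can be read back with a few lines of Python. This is only a minimal sketch: the function name read_gt is hypothetical and not part of Pilgen, and the path data/gt.txt is an assumption, since the exact location of gt.txt is not stated here.

```python
from typing import List, Tuple


def read_gt(gt_path: str) -> List[Tuple[str, str]]:
    """Parse a tab-separated ground-truth file into (image_path, word) pairs.

    Hypothetical helper, not part of Pilgen itself.
    """
    pairs = []
    # The labels are Uyghur text, so read the file explicitly as UTF-8.
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab only: left is the file path, right is the word.
            filepath, word = line.split("\t", 1)
            pairs.append((filepath, word))
    return pairs
```

For example, read_gt("data/gt.txt") (assumed path) would return a list of (filepath, word) tuples, one per generated image.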

Supported languages

ug - Uyghur (Uighur)
Other languages may be added later.

FAQ

How do I use my own corpus?

Ref: #2

Are Uyghur words separated in the image?

Ref: #2

Test

python3 test.py

Development environment

Ubuntu 18.04.1
Python 3.6.9

Author

Salam Hiyali

Contribute

Feel free to contribute.

License

MIT
