Python Image-Label Dataset Generator for OCR
python3 generate.py --lang ug --count 100 --out-dir data/
This command writes 100 images to the folder data/images/. Filenames follow the pattern 'word_{}.jpg'.format(line_num), for example:
data/images/word_1.jpg
data/images/word_2.jpg
...
data/images/word_100.jpg
It also writes a gt.txt file in which each line follows the pattern '{}\t{}'.format(filepath, word), like below:
data/images/word_1.jpg ئانا
data/images/word_2.jpg تىلىم
...
data/images/word_100.jpg گۈللە
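To consume the generated labels, the tab-separated format described above can be read back into (filepath, word) pairs. The following is a minimal sketch, not part of this repository: the helper name `read_ground_truth` is hypothetical, and the location of gt.txt (shown as `data/gt.txt` in the usage comment) is an assumption, since only the file name is stated above.

```python
def read_ground_truth(gt_path):
    """Return a list of (filepath, word) pairs from a tab-separated gt.txt.

    Hypothetical helper: assumes one '{filepath}\t{word}' record per line,
    encoded as UTF-8.
    """
    pairs = []
    with open(gt_path, encoding="utf-8") as f:
        for line in f:
            line = line.rstrip("\r\n")
            if not line:
                continue  # skip blank lines
            # Split on the first tab only, so the word is kept intact.
            filepath, word = line.split("\t", 1)
            pairs.append((filepath, word))
    return pairs


# Usage (the path is an assumption; adjust it to where gt.txt is written):
# for filepath, word in read_ground_truth("data/gt.txt"):
#     print(filepath, word)
```

Splitting on the first tab only keeps the label intact even if a word ever contains whitespace.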
- ug - Uyghur (Uighur)
- Other languages may be added later
- How do I use my own corpus?
Ref: #2
- Why are Uyghur words separated in the image?
Ref: #2
python3 test.py
- Ubuntu 18.04.1
- Python 3.6.9
Salam Hiyali
Feel free
MIT