This is the implementation of the paper "SEED: Semantics Enhanced Encoder-Decoder Framework for Scene Text Recognition" This code is based on the aster.pytorch, we sincerely thank ayumiymk for his awesome repo and help.
PyTorch == 1.1.0
torchvision == 0.3.0
fasttext == 0.9.1
Details can be found in requirements.txt
- Download the pretrained language model (bin) from here
- Update the path in the lib/tools/create_all_synth_lmdb.py
- Run the lib/tools/create_all_synth_lmdb.py
- Note: it may result in large storage space, you can modify the datasets/dataset.py to generate the word embedding in an online way
- Update the path in train.sh, then
sh train.sh
- Update the path in the test.sh, then
sh test.sh
Checkpoint | IIIT5K | IC13-1015 | IC13-857 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
---|---|---|---|---|---|---|---|---|
OneDrive BaiduYun(key: x54e) | 93.4 | 93.5 | 94.5 | 79.8 | 75.8 | 88.4 | 82.0 | 84.0 |
- Existing methods replace the predicted word with the nearest lexicon word under the metric of edit distance (ED). With the semantic information, we can choose the most semantics similar (SS) word based on the nearest edit distance.
Methods | IIIT5K-50 | IIIT5K-1K | SVT-50 | IC13 | IC15 |
---|---|---|---|---|---|
ED | 99.06 | 97.87 | 96.36 | 97.44 | 87.76 |
ED + SS | 99.27 | 97.93 | 96.45 | 97.64 | 88.07 |
- Directly use word embedding from the pre-trained LM during training and inference.
IIIT5K | IC13 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
---|---|---|---|---|---|---|
94.6 | 93.8 | 85.0 | 79.6 | 90.9 | 84.2 | 85.4 |
- We try to use Aggregation Cross-Entropy as the global information instead of the semantics. This part of code will be released in next few days.
IIIT5K | IC13 | IC15-1811 | IC15-2077 | SVT | SVTP | CUTE |
---|---|---|---|---|---|---|
93.8 | 91.3 | 78.7 | - | 90.1 | 81.6 | 81.9 |
@inproceedings{qiao2020seed,
title={{SEED}: Semantics enhanced encoder-decoder framework for scene text recognition},
author={Qiao, Zhi and Zhou, Yu and Yang, Dongbao and Zhou, Yucan and Wang, Weiping},
booktitle={CVPR},
year={2020},
}
@article{shi2018aster,
title={{ASTER}: An attentional scene text recognizer with flexible rectification},
author={Shi, Baoguang and Yang, Mingkun and Wang, Xinggang and Lyu, Pengyuan and Yao, Cong and Bai, Xiang},
journal={TPAMI},
volume={41},
number={9},
pages={2035--2048},
year={2018},
publisher={IEEE}
}