
MANGO

This code repository contains the implementation of the paper MANGO: A Mask Attention Guided One-Stage Scene Text Spotter (AAAI 2021).

Preparing Dataset

Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT.

The formatted training datalist can be found in demo/text_spotting/datalist
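For orientation, the snippet below is a minimal, hedged sketch for inspecting one of these datalist files. It assumes the datalist is a JSON file (the exact field layout in demo/text_spotting/datalist may differ), and the file name used here is only a placeholder.

```python
import json

# Illustrative only: peek at a formatted datalist file.
# The file name is a placeholder and the exact field layout may differ.
datalist_path = 'demo/text_spotting/datalist/icdar2013_train_datalist.json'

with open(datalist_path, 'r', encoding='utf-8') as f:
    datalist = json.load(f)

print('number of entries:', len(datalist))

# Show one entry, whether the top level is a dict keyed by image path or a list.
first = next(iter(datalist.items())) if isinstance(datalist, dict) else datalist[0]
print('first entry:', first)
```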

Train On Your Own Dataset

1. First, download the pre-trained model, which was trained on SynthText and SynthCurve.

2. Modify the paths (ann_file, img_prefix, work_dir, etc.) in the config file demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py (an illustrative sketch of these fields is given at the end of this section).

3. Run the following commands:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/
>>> bash dist_train.sh

Notice: Online validation is enabled by default. If you want to disable it to save training time, you may modify the startup script and add the --no-validate flag.
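As a rough guide for step 2, the sketch below shows the kind of fields you would typically edit. It assumes the usual mmdetection-style Python config layout; the exact key names and nesting in mango_r50_ete_finetune_ic13.py may differ, and every path shown is a placeholder.

```python
# Illustrative excerpt only -- not the actual contents of
# mango_r50_ete_finetune_ic13.py. Key names follow common mmdetection
# conventions and all paths are placeholders.
data = dict(
    train=dict(
        ann_file='/path/to/your/train_datalist.json',  # formatted training datalist
        img_prefix='/path/to/your/train_images/',      # root directory of the training images
    ),
    val=dict(
        ann_file='/path/to/your/val_datalist.json',
        img_prefix='/path/to/your/val_images/',
    ),
)

work_dir = '/path/to/save/checkpoints_and_logs/'       # where checkpoints and logs are written
load_from = '/path/to/pretrained_mango_model.pth'      # pre-trained model downloaded in step 1
```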

Train From Scratch

If you want to reproduce the model's performance from scratch, please follow these steps:

1. First, pre-train the attention module on SynthText, which contains character-level annotations. See demo/text_spotting/mango/configs/mango_r50_att_pretrain.py for more details.

2. Second, train the model end-to-end on SynthText and SynthCurve, which contain only word-level annotations. See demo/text_spotting/mango/configs/mango_r50_ete_pretrain.py for more details.

Notice: At the beginning of training, the attention module and the recognition module are trained together to prevent the attention module from collapsing. The pre-trained model produced by these steps is the one provided above.

3. Third, fine-tune the model on the mixed real datasets (ICDAR2013~2019 and Total-Text). See demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py for more details.

4. Finally, fine-tune on ICDAR2013, ICDAR2015 and Total-Text separately for testing and evaluation.

Notice: Fine-tune on ICDAR2015 with num_grid=60, and on ICDAR2013 and Total-Text with num_grid=40, as sketched below.
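The snippet below is only an illustrative sketch of how such an override might look in an mmdetection-style config; the nesting under the model dict and the head name are assumptions, so check the actual MANGO config for where num_grid really lives.

```python
# Illustrative only: overriding num_grid for dataset-specific fine-tuning.
# 'mask_att_head' is a hypothetical key name -- consult the real config file.
model = dict(
    mask_att_head=dict(
        num_grid=60,   # 60 for ICDAR2015; use 40 for ICDAR2013 and Total-Text
    ),
)
```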

Offline Inference and Evaluation

We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script, and then start testing:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> bash test_ic13.sh
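For reference, the parameters mentioned above might look roughly like the following. This is an illustrative sketch rather than the actual contents of the testing script, and the default values and encoding are assumptions.

```python
# Illustrative only: typical knobs adjusted before running test_ic13.sh.
# Names mirror the parameters mentioned above; the values here are assumptions.
test_setting = dict(
    iou_constraint=0.5,   # IoU threshold used when matching predictions to ground truth
    lexicon_type=0,       # which lexicon to use, e.g. generic / weak / strong (assumed encoding)
)
```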

The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.

Visualization

We provide a script to visualize the intermediate outputs of the model, including the segmentation results, the activated grid map, the text predictions and the attention maps. You can modify the paths (test_dataset, config_file, etc.) in the script, and then generate the visualization results:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> python vis.py
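The variables below are a hedged sketch of the kind of settings usually edited near the top of such a script; the actual variable names and structure of vis.py may differ, and all paths are placeholders.

```python
# Illustrative only: settings of the sort typically edited before running vis.py.
# Variable names and paths are placeholders; check the real script.
config_file = '/path/to/mango_r50_ete_finetune_ic13.py'   # model config used for testing
checkpoint_file = '/path/to/trained_mango_model.pth'      # trained weights to visualize
test_dataset = '/path/to/test_datalist.json'              # datalist of images to run through the model
out_dir = './vis/'                                        # where the visualization images are saved
```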

Some visualization results are shown:

./vis/img_154_seg.jpg (segmentation), ./vis/img_154_cate.jpg (activated grid map), ./vis/img_154_text.jpg (text prediction), ./vis/img_154_cma.gif (attention map)

Trained Model Download

All of the models are re-implemented and trained based on the open-source framework mmdetection, so the results might differ slightly from the reported results.

Results on various datasets and download links for the trained models:

| Pipeline | Pretrained-Dataset | Links |
| --- | --- | --- |
| resnet50+fpn+CMA+lstm | SynthText, SynthCurve | cfg, pth (Access Code: S50M) |
| resnet101+fpn+CMA+lstm | SynthText, SynthCurve | cfg, pth (Access Code: 6uc3) |

| Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End (General / Weak / Strong) | Word Spotting (General / Weak / Strong) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR2013 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | None | L-1440 | 86.9 / 90.0 / 90.5 | 90.1 / 94.1 / 94.8 | - |
| ICDAR2013 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 84.9 / 88.6 / 89.5 | 88.4 / 92.7 / 93.7 | cfg, pth (Access Code: Al5m) |
| ICDAR2013 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 88.0 / 90.3 / 90.4 | 90.7 / 93.8 / 94.0 | cfg, pth (Access Code: SS27) |
| ICDAR2015 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | ICDAR2015 | L-1800 | 67.3 / 78.9 / 81.8 | 70.3 / 83.1 / 86.4 | - |
| ICDAR2015 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 70.8 / 77.4 / 80.7 | 73.8 / 81.1 / 85.0 | cfg, pth (Access Code: 6pdl) |
| ICDAR2015 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 72.8 / 79.8 / 82.4 | 75.7 / 83.4 / 86.6 | cfg, pth (Access Code: 1J0F) |

| Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End (None / Full) | Word Spotting (None / Full) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total-Text (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | Total-Text | L-1600 | - / - | 72.9 / 83.6 | - |
| Total-Text | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 68.9 / 78.9 | 71.7 / 82.7 | cfg, pth (Access Code: 4PwC) |
| Total-Text | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 70.2 / 79.9 | 73.0 / 83.9 | cfg, pth (Access Code: H32x) |

Citation

If you find this repository helpful to your research, please feel free to cite us:

@inproceedings{qiao2021mango,
  title={MANGO: A Mask Attention Guided One-Stage Scene Text Spotter},
  author={Qiao, Liang and Chen, Ying and Cheng, Zhanzhan and Xu, Yunlu and Niu, Yi and Pu, Shiliang and Wu, Fei},
  booktitle={Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI)},
  pages={2467-2476},
  year={2021}
}

License

This project is released under the Apache 2.0 License.

Contact

If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.