MANGO

This repository contains the implementation of the paper MANGO: A Mask Attention Guided One-Stage Scene Text Spotter (AAAI 2021).

Preparing Dataset

Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT.

The formatted training datalist can be found in demo/text_spotting/datalist.

Train On Your Own Dataset

1. First, download the pre-trained model, which was trained on SynthText and SynthText_Curve.

2. Modify the paths (ann_file, img_prefix, work_dir, etc.) in the config file demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py.

3. Run the following commands in the command line:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/
>>> bash dist_train.sh

Notice: Online validation is implemented and enabled by default. To disable it and save training time, add the --no-validate option to the startup script.
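As a rough illustration of step 2, the sketch below rewrites the path fields on a plain Python dictionary. The dictionary layout, the example paths and the helper name with_local_paths are assumptions made for this example; only the field names ann_file, img_prefix and work_dir come from the steps above, and the real config file may nest them differently.

```python
import posixpath
from copy import deepcopy

# Hypothetical, simplified stand-in for the path fields in the config file.
example_cfg = {
    "work_dir": "/path/to/work_dir/",
    "data": {
        "train": {
            "ann_file": "/path/to/datalist/train_datalist.json",
            "img_prefix": "/path/to/images/",
        },
        "val": {
            "ann_file": "/path/to/datalist/test_datalist.json",
            "img_prefix": "/path/to/images/",
        },
    },
}


def with_local_paths(cfg, datalist_dir, image_dir, work_dir):
    """Return a copy of cfg whose paths point at local directories."""
    updated = deepcopy(cfg)
    updated["work_dir"] = work_dir
    for split_cfg in updated["data"].values():
        # Keep the datalist file name, replace only its directory.
        file_name = posixpath.basename(split_cfg["ann_file"])
        split_cfg["ann_file"] = posixpath.join(datalist_dir, file_name)
        split_cfg["img_prefix"] = image_dir
    return updated


local_cfg = with_local_paths(
    example_cfg,
    datalist_dir="/data/mango/datalist",
    image_dir="/data/mango/images/",
    work_dir="/data/mango/work_dir/",
)
print(local_cfg["data"]["train"]["ann_file"])
print(local_cfg["work_dir"])
```

Working on a copy leaves the original settings untouched, which makes it easier to compare a local setup against the shipped defaults.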

Train From Scratch

To reproduce the model's performance from scratch, follow these steps:

1. First, pre-train the attention module on SynthText, which contains character-level annotations. See demo/text_spotting/mango/configs/mango_r50_att_pretrain.py for more details.

2. Second, train end-to-end on SynthText and SynthCurve using only word-level annotations. See demo/text_spotting/mango/configs/mango_r50_ete_pretrain.py for more details.

Notice: At the beginning of training, the attention module and the recognition module are trained together to prevent the attention module from collapsing. The pre-trained model is provided as mentioned above.

3. Third, fine-tune the model on the mixed real dataset (including ICDAR2013~2019 and Total-Text). See demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py for more details.

4. Finally, fine-tune on ICDAR2013, ICDAR2015 and Total-Text separately for testing and evaluation.

Notice: Fine-tune on ICDAR2015 with num_grid=60, and on ICDAR2013 and Total-Text with num_grid=40.
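The dataset-specific num_grid values from the notice above can be kept in a small lookup. The second helper assumes that num_grid is the side length S of an S x S grid laid over the image, with each text instance assigned to the cell that contains its center point. That reading follows the paper's description and is an assumption about this code base, as are the helper names.

```python
# num_grid values taken from the notice above.
NUM_GRID = {"ICDAR2013": 40, "ICDAR2015": 60, "Total-Text": 40}


def num_grid_for(dataset):
    """Look up the num_grid setting used when fine-tuning on one dataset."""
    return NUM_GRID[dataset]


def grid_cell(center_x, center_y, img_w, img_h, num_grid):
    """Map a text center point to its (row, col) cell in a num_grid x num_grid grid.

    Assumption: cells split the image evenly along both axes.
    """
    # Clamp so that points on the right or bottom border stay inside the grid.
    col = min(int(center_x / img_w * num_grid), num_grid - 1)
    row = min(int(center_y / img_h * num_grid), num_grid - 1)
    return row, col


# Compare the cell of the same point under the two grid sizes.
print(grid_cell(640, 360, 1280, 720, num_grid_for("ICDAR2013")))
print(grid_cell(640, 360, 1280, 720, num_grid_for("ICDAR2015")))
```

Under this assumption, a finer grid lets nearby instances fall into different cells, which may be why the larger value is used for ICDAR2015.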

Offline Inference and Evaluation

We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script, and then start testing:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> bash test_ic13.sh
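To make the role of the iou_constraint parameter mentioned above concrete, here is a minimal sketch of an end-to-end match test. It assumes axis-aligned boxes for brevity (the benchmarks also use quadrilaterals and polygons), a case-insensitive transcription comparison, and a default threshold of 0.5. The function names are illustrative and are not taken from the testing script.

```python
def box_iou(box_a, box_b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    # Width and height of the overlap are zero when the boxes do not intersect.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_e2e_match(pred_box, pred_text, gt_box, gt_text, iou_constraint=0.5):
    """A prediction counts only if it overlaps enough AND the text agrees."""
    overlaps = box_iou(pred_box, gt_box) > iou_constraint
    same_text = pred_text.upper() == gt_text.upper()
    return overlaps and same_text


gt = (0, 0, 10, 10)
pred = (2, 0, 10, 10)
print(box_iou(pred, gt))
print(is_e2e_match(pred, "Hotel", gt, "HOTEL"))
print(is_e2e_match(pred, "Motel", gt, "HOTEL"))
```

Raising iou_constraint in such a scheme makes localization stricter without changing the text comparison.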

The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.
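The lexicon_type parameter selects which vocabulary is used to correct recognized words (the result tables distinguish General, Weak and Strong lexicons). A common way to apply a lexicon, sketched here as an assumption and not as the evaluation tool's actual logic, is to replace each prediction with the lexicon word at the smallest edit distance. The function names and the max_distance cutoff are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (row-by-row dynamic programming)."""
    prev = list(range(len(b) + 1))
    for i, ch_a in enumerate(a, start=1):
        curr = [i]
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            curr.append(min(prev[j] + 1,          # delete ch_a
                            curr[j - 1] + 1,      # insert ch_b
                            prev[j - 1] + cost))  # substitute or keep
        prev = curr
    return prev[-1]


def correct_with_lexicon(word, lexicon, max_distance=2):
    """Return the closest lexicon entry, or the word itself if none is close enough."""
    if not lexicon:
        return word
    best = min(lexicon, key=lambda entry: edit_distance(word.upper(), entry.upper()))
    if edit_distance(word.upper(), best.upper()) <= max_distance:
        return best
    return word


print(correct_with_lexicon("HOTEI", ["HOTEL", "MOTEL", "EXIT"]))
print(correct_with_lexicon("XYZXYZ", ["HOTEL"]))
```

A smaller, image-specific lexicon leaves fewer candidates to confuse, which is consistent with the Strong column scoring highest in the result tables.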

Visualization

We provide a script to visualize the intermediate outputs of the model, including the segmentation, the activated grid map, the text prediction and the attention map. You can modify the paths (test_dataset, config_file, etc.) in the script, and then generate the visualization results:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> python vis.py
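An attention map is usually inspected by rescaling it to the image intensity range and blending it over the input. The sketch below does this on nested lists with no third-party dependencies. It illustrates the general idea only and is not taken from vis.py, whose actual rendering may differ; the function names and the alpha value are illustrative.

```python
def normalize_map(att):
    """Min-max rescale a 2D map of floats to integer intensities in [0, 255]."""
    flat = [v for row in att for v in row]
    lo, hi = min(flat), max(flat)
    span = hi - lo
    if span == 0:
        # A constant map carries no contrast; render it as all zeros.
        return [[0 for _ in row] for row in att]
    return [[round(255 * (v - lo) / span) for v in row] for row in att]


def blend(gray_img, heat, alpha=0.4):
    """Alpha-blend a heat map over a grayscale image of the same size."""
    return [
        [round((1 - alpha) * pixel + alpha * h) for pixel, h in zip(img_row, heat_row)]
        for img_row, heat_row in zip(gray_img, heat)
    ]


attention = [[0.0, 0.2], [1.0, 0.6]]
image = [[100, 200], [50, 0]]
heat = normalize_map(attention)
print(heat)
print(blend(image, heat))
```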

Some visualization results are shown:

./vis/img_154_seg.jpg ./vis/img_154_cate.jpg ./vis/img_154_text.jpg ./vis/img_154_cma.gif

Trained Model Download

All of the models are re-implemented and trained based on the open-source framework mmdetection, so the results might differ slightly from the reported results.

Results on various datasets and trained models download:

| Pipeline | Pretrained-Dataset | Links |
| --- | --- | --- |
| resnet50+fpn+CMA+lstm | SynthText, SynthCurve | cfg, pth (Access Code: S50M) |
| resnet101+fpn+CMA+lstm | SynthText, SynthCurve | cfg, pth (Access Code: 6uc3) |

| Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End (General) | End-to-End (Weak) | End-to-End (Strong) | Word Spotting (General) | Word Spotting (Weak) | Word Spotting (Strong) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR2013 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | None | L-1440 | 86.9 | 90.0 | 90.5 | 90.1 | 94.1 | 94.8 | - |
| ICDAR2013 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 84.9 | 88.6 | 89.5 | 88.4 | 92.7 | 93.7 | cfg, pth (Access Code: Al5m) |
| ICDAR2013 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 88 | 90.3 | 90.4 | 90.7 | 93.8 | 94.0 | cfg, pth (Access Code: SS27) |
| ICDAR2015 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | ICDAR2015 | L-1800 | 67.3 | 78.9 | 81.8 | 70.3 | 83.1 | 86.4 | - |
| ICDAR2015 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 70.8 | 77.4 | 80.7 | 73.8 | 81.1 | 85 | cfg, pth (Access Code: 6pdl) |
| ICDAR2015 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 72.8 | 79.8 | 82.4 | 75.7 | 83.4 | 86.6 | cfg, pth (Access Code: 1J0F) |

| Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End (None) | End-to-End (Full) | Word Spotting (None) | Word Spotting (Full) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total-Text (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | Total-Text | L-1600 | - | - | 72.9 | 83.6 | - |
| Total-Text | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 68.9 | 78.9 | 71.7 | 82.7 | cfg, pth (Access Code: 4PwC) |
| Total-Text | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 70.2 | 79.9 | 73 | 83.9 | cfg, pth (Access Code: H32x) |

Citation

If you find this repository helpful to your research, please cite us:

@inproceedings{qiao2021mango,
  title={MANGO: A Mask Attention Guided One-Stage Scene Text Spotter},
  author={Qiao, Liang and Chen, Ying and Cheng, Zhanzhan and Xu, Yunlu and Niu, Yi and Pu, Shiliang and Wu, Fei},
  booktitle={Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI)},
  pages={2467-2476},
  year={2021}
}

License

This project is released under the Apache 2.0 license.

Contact

If you have any suggestions or problems, please contact the author at qiaoliang6@hikvision.com.