
MANGO

This code repository contains the implementation of the paper MANGO: A Mask Attention Guided One-Stage Scene Text Spotter (AAAI 2021).

Preparing Dataset

Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT.

The formatted training datalist can be found in demo/text_spotting/datalist
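For orientation, the snippet below is a minimal, hedged sketch for inspecting one of these datalist files. It assumes the datalist is a JSON file (the exact field layout in demo/text_spotting/datalist may differ), and the file name used here is only a placeholder.

```python
import json

# Illustrative only: peek at a formatted datalist file.
# The file name is a placeholder and the exact field layout may differ.
datalist_path = 'demo/text_spotting/datalist/icdar2013_train_datalist.json'

with open(datalist_path, 'r', encoding='utf-8') as f:
    datalist = json.load(f)

print('number of entries:', len(datalist))

# Show one entry, whether the top level is a dict keyed by image path or a list.
first = next(iter(datalist.items())) if isinstance(datalist, dict) else datalist[0]
print('first entry:', first)
```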

Train On Your Own Dataset

1. First, download the pre-trained model, which was trained on SynthText and SynthCurve.

2. Modify the paths (ann_file, img_prefix, work_dir, etc.) in the config file demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py (an illustrative sketch of these fields is given at the end of this section).

3. Run the following commands:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/
>>> bash dist_train.sh

Notice: Online validation is enabled by default. If you want to disable it to save training time, you may modify the startup script and add the --no-validate flag.
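As a rough guide for step 2, the sketch below shows the kind of fields you would typically edit. It assumes the usual mmdetection-style Python config layout; the exact key names and nesting in mango_r50_ete_finetune_ic13.py may differ, and every path shown is a placeholder.

```python
# Illustrative excerpt only -- not the actual contents of
# mango_r50_ete_finetune_ic13.py. Key names follow common mmdetection
# conventions and all paths are placeholders.
data = dict(
    train=dict(
        ann_file='/path/to/your/train_datalist.json',  # formatted training datalist
        img_prefix='/path/to/your/train_images/',      # root directory of the training images
    ),
    val=dict(
        ann_file='/path/to/your/val_datalist.json',
        img_prefix='/path/to/your/val_images/',
    ),
)

work_dir = '/path/to/save/checkpoints_and_logs/'       # where checkpoints and logs are written
load_from = '/path/to/pretrained_mango_model.pth'      # pre-trained model downloaded in step 1
```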

Train From Scratch

If you want to reproduce the model's performance from scratch, please follow these steps:

1. First, pre-train the attention module on SynthText, which contains character-level annotations. See demo/text_spotting/mango/configs/mango_r50_att_pretrain.py for more details.

2. Second, train the model end-to-end on SynthText and SynthCurve, which contain only word-level annotations. See demo/text_spotting/mango/configs/mango_r50_ete_pretrain.py for more details.

Notice: At the beginning of training, the attention module and the recognition module are trained together to prevent the attention module from collapsing. The pre-trained model produced by these steps is the one provided above.

3. Third, fine-tune the model on the mixed real datasets (ICDAR2013~2019 and Total-Text). See demo/text_spotting/mango/configs/mango_r50_ete_finetune_ic13.py for more details.

4. Finally, fine-tune on ICDAR2013, ICDAR2015 and Total-Text separately for testing and evaluation.

Notice: Fine-tune on ICDAR2015 with num_grid=60, and on ICDAR2013 and Total-Text with num_grid=40, as sketched below.
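The snippet below is only an illustrative sketch of how such an override might look in an mmdetection-style config; the nesting under the model dict and the head name are assumptions, so check the actual MANGO config for where num_grid really lives.

```python
# Illustrative only: overriding num_grid for dataset-specific fine-tuning.
# 'mask_att_head' is a hypothetical key name -- consult the real config file.
model = dict(
    mask_att_head=dict(
        num_grid=60,   # 60 for ICDAR2015; use 40 for ICDAR2013 and Total-Text
    ),
)
```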

Offline Inference and Evaluation

We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script, and then start testing:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> bash test_ic13.sh
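For reference, the parameters mentioned above might look roughly like the following. This is an illustrative sketch rather than the actual contents of the testing script, and the default values and encoding are assumptions.

```python
# Illustrative only: typical knobs adjusted before running test_ic13.sh.
# Names mirror the parameters mentioned above; the values here are assumptions.
test_setting = dict(
    iou_constraint=0.5,   # IoU threshold used when matching predictions to ground truth
    lexicon_type=0,       # which lexicon to use, e.g. generic / weak / strong (assumed encoding)
)
```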

The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.

Visualization

We provide a script to visualize the intermediate outputs of the model, including the segmentation results, the activated grid map, the text predictions and the attention maps. You can modify the paths (test_dataset, config_file, etc.) in the script, and then generate the visualization results:

>>> cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mango/tools/
>>> python vis.py
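The variables below are a hedged sketch of the kind of settings usually edited near the top of such a script; the actual variable names and structure of vis.py may differ, and all paths are placeholders.

```python
# Illustrative only: settings of the sort typically edited before running vis.py.
# Variable names and paths are placeholders; check the real script.
config_file = '/path/to/mango_r50_ete_finetune_ic13.py'   # model config used for testing
checkpoint_file = '/path/to/trained_mango_model.pth'      # trained weights to visualize
test_dataset = '/path/to/test_datalist.json'              # datalist of images to run through the model
out_dir = './vis/'                                        # where the visualization images are saved
```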

Some visualization results are shown:

./vis/img_154_seg.jpg (segmentation), ./vis/img_154_cate.jpg (activated grid map), ./vis/img_154_text.jpg (text prediction), ./vis/img_154_cma.gif (attention map)

Trained Model Download

All of the models are re-implemented and trained based on the open-source framework mmdetection, so the results might differ slightly from the reported results.

Results on various datasets and download links for the trained models:

| Pipeline | Pretrained-Dataset | Links |
| --- | --- | --- |
| resnet50+fpn+CMA+lstm | SynthText, SynthCurve | cfg, pth (Access Code: S50M) |
| resnet101+fpn+CMA+lstm | SynthText, SynthCurve | cfg, pth (Access Code: 6uc3) |

| Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End (General / Weak / Strong) | Word Spotting (General / Weak / Strong) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR2013 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | None | L-1440 | 86.9 / 90.0 / 90.5 | 90.1 / 94.1 / 94.8 | - |
| ICDAR2013 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 84.9 / 88.6 / 89.5 | 88.4 / 92.7 / 93.7 | cfg, pth (Access Code: Al5m) |
| ICDAR2013 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | None | L-1440 | 88.0 / 90.3 / 90.4 | 90.7 / 93.8 / 94.0 | cfg, pth (Access Code: SS27) |
| ICDAR2015 (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | ICDAR2015 | L-1800 | 67.3 / 78.9 / 81.8 | 70.3 / 83.1 / 86.4 | - |
| ICDAR2015 | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 70.8 / 77.4 / 80.7 | 73.8 / 81.1 / 85.0 | cfg, pth (Access Code: 6pdl) |
| ICDAR2015 | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | ICDAR2015 | L-1800 | 72.8 / 79.8 / 82.4 | 75.7 / 83.4 / 86.6 | cfg, pth (Access Code: 1J0F) |

| Dataset | Backbone | Pretrained | Mix-Finetune | Specific-Finetune | Test Scale | End-to-End (None / Full) | Word Spotting (None / Full) | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total-Text (Reported) | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, COCO-Text, Total-Text | Total-Text | L-1600 | - / - | 72.9 / 83.6 | - |
| Total-Text | ResNet-50 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 68.9 / 78.9 | 71.7 / 82.7 | cfg, pth (Access Code: 4PwC) |
| Total-Text | ResNet-101 | SynthText, SynthCurve | ICDAR2013, ICDAR2015, ICDAR2017_MLT, ICDAR2019_MLT, Total-Text | Total-Text | L-1600 | 70.2 / 79.9 | 73.0 / 83.9 | cfg, pth (Access Code: H32x) |

Citation

If you find this repository helpful to your research, please feel free to cite us:

@inproceedings{qiao2021mango,
  title={MANGO: A Mask Attention Guided One-Stage Scene Text Spotter},
  author={Qiao, Liang and Chen, Ying and Cheng, Zhanzhan and Xu, Yunlu and Niu, Yi and Pu, Shiliang and Wu, Fei},
  booktitle={Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence (AAAI)},
  pages={2467-2476},
  year={2021}
}

License

This project is released under the Apache 2.0 License.

Contact

If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.