This repository contains the implementation of a simple Mask-RCNN based text spotter. Many advanced text spotters are built on this framework, e.g.,
- Mask TextSpotter: An End-to-End Trainable Neural Network for Spotting Text with Arbitrary Shapes (ECCV 2018)
- Towards Unconstrained End-to-End Text Spotting (ICCV 2019)
- All You Need Is Boundary: Toward Arbitrary-Shaped Text Spotting (AAAI 2020)
- ...
Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT.
The formatted training datalists can be found in demo/text_spotting/datalist.
1. Download the pre-trained model, which was trained on SynthText and COCO-Text.
2. Modify the paths (ann_file, img_prefix, work_dir, etc.) in the config files.
3. Modify the paths in the training script and run the following bash commands:
cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/
bash dist_train.sh
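For step 2, the fields to edit usually look like the fragment below. This is only a sketch in the mmdetection config style: the directory layout and file names are placeholders, not the ones shipped with this repository, and the actual config files may nest these fields differently.

```python
# Hypothetical dataset root: replace with your own location.
data_root = "/path/to/datasets/"

data = dict(
    train=dict(
        # Annotation datalist for training (placeholder file name).
        ann_file=data_root + "datalist/train_datalist.json",
        # Prefix prepended to every image path in the datalist.
        img_prefix=data_root + "images/",
    ),
    val=dict(
        # Annotation datalist for online validation (placeholder file name).
        ann_file=data_root + "datalist/test_datalist.json",
        img_prefix=data_root + "images/",
    ),
)

# Directory where checkpoints and logs are written (placeholder path).
work_dir = "/path/to/work_dir/mask_rcnn_spot/"
```

Keeping a single `data_root` variable means only one line has to change when the datasets are moved.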
Notice: Online validation is enabled by default. To disable it and save training time, add the --no-validate option to the startup script.
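As an illustration of what such a switch does, the sketch below shows how a training entry point can parse a `--no-validate` flag with `argparse`. It is an assumption about the general pattern, not a copy of this repository's training script; the `parse_args` helper and the config path are made up for the example.

```python
import argparse


def parse_args(argv=None):
    """Parse the command line of a hypothetical training entry point."""
    parser = argparse.ArgumentParser(description="Train a text spotter (illustrative)")
    parser.add_argument("config", help="path to the training config file")
    parser.add_argument(
        "--no-validate",
        action="store_true",
        help="skip online validation during training",
    )
    return parser.parse_args(argv)


# Passing the flag turns validation off; omitting it keeps validation on.
args = parse_args(["configs/example.py", "--no-validate"])
validate = not args.no_validate
print("online validation enabled:", validate)
```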
If you want to reproduce the model's performance from scratch, please follow these steps:
1. End-to-end pre-training using SynthText and COCO-Text. See demo/text_spotting/mask_rcnn_spot/configs/mask_rcnn_r50_conv6_e2e_pretrain.py for more details.
2. Fine-tune the model on the mixed real datasets (including ICDAR2013, ICDAR2015, ICDAR2017-MLT, and Total-Text). See demo/text_spotting/mask_rcnn_spot/configs/mask_rcnn_r50_conv6_e2e_finetune_ic13.py for more details.
Notice: Online validation is enabled by default. To disable it and save training time, add the --no-validate option to the startup script.
We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script and start testing:
cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/tools/
bash test_ic13.sh
The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.
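To make the role of iou_constraint concrete, here is a minimal sketch of how an end-to-end match can be decided: a prediction counts as correct when its box overlaps a ground-truth box by more than the IoU threshold and the transcriptions agree. This is a simplification, not the evaluation tool's code: it assumes axis-aligned boxes (real benchmarks use quadrilaterals or polygons), ignores lexicons, compares text case-insensitively, and the function names and the 0.5 default are assumptions.

```python
def box_iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    # Clamp to zero so disjoint boxes yield an empty intersection.
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_correct(pred_box, pred_text, gt_box, gt_text, iou_constraint=0.5):
    """True if the boxes overlap enough and the texts match (case-insensitive)."""
    overlaps = box_iou(pred_box, gt_box) > iou_constraint
    return overlaps and pred_text.lower() == gt_text.lower()


pred = ((0, 0, 10, 10), "Hello")
gt = ((0, 0, 10, 8), "hello")
print(box_iou(pred[0], gt[0]))
print(is_correct(pred[0], pred[1], gt[0], gt[1]))
```

Raising `iou_constraint` makes the localization requirement stricter, so the same prediction can flip from a hit to a miss without any change in the recognized text.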
We provide a script to visualize the intermediate outputs of the model. You can modify the paths (test_dataset, config_file, etc.) in the script and start generating visualization results:
cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/tools/
python vis.py
Some visualization results are shown:
All of the models are re-implemented and trained on top of the open-source framework mmdetection.
Note: The trained model based on mask_rcnn_r50_fpn+res32+bilstm+attention below uses only SynthText pre-training and does not use random crop, color jitter, or the mix-train strategy, so its reported performance is slightly worse than that of mask_rcnn_r50_fpn+conv6+bilstm+attention.
Results on various datasets and trained model downloads:
| Pipeline | Pretrained-Dataset | Links |
|---|---|---|
| mask_rcnn_r50_fpn+conv6+bilstm+attention | SynthText, COCO-Text | |
| mask_rcnn_r50_fpn+res32+bilstm+attention | SynthText | |
| Dataset | Backbone | Pretrained | Finetune | Test Scale | End-to-End (General) | End-to-End (Weak) | End-to-End (Strong) | Word Spotting (General) | Word Spotting (Weak) | Word Spotting (Strong) | Links |
|---|---|---|---|---|---|---|---|---|---|---|---|
| ICDAR2013 | ResNet-50, Conv-6x | SynthText, COCO-Text | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1440 | 82.1 | 85.6 | 86.1 | 85.6 | 89.9 | 90.5 | |
| ICDAR2013 | ResNet-50, ResNet-32 | SynthText | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1440 | 82.7 | 86.0 | 86.6 | 86.1 | 90.4 | 91.1 | |
| ICDAR2015 | ResNet-50, Conv-6x | SynthText, COCO-Text | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-2000 | 66.3 | 75.3 | 78.4 | 66.7 | 78.1 | 81.7 | |
| ICDAR2015 | ResNet-50, ResNet-32 | SynthText | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-2000 | 62.9 | 72.2 | 75.7 | 63.5 | 75.0 | 79.1 | |
| Dataset | Backbone | Pretrained | Finetune | Test Scale | End-to-End (None) | End-to-End (Full) | Word Spotting (None) | Word Spotting (Full) | Links |
|---|---|---|---|---|---|---|---|---|---|
| Total-Text | ResNet-50, Conv-6x | SynthText, COCO-Text | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1350 | 63.6 | 72.2 | 66.1 | 76.5 | |
| Total-Text | ResNet-50, ResNet-32 | SynthText | ICDAR2013, ICDAR2015, ICDAR2017_MLT, Total-Text | L-1350 | 62.8 | 71.5 | 65.2 | 75.8 | |
@inproceedings{He_2017,
title={Mask R-CNN},
author={He, Kaiming and Gkioxari, Georgia and Dollar, Piotr and Girshick, Ross},
booktitle={2017 IEEE International Conference on Computer Vision (ICCV)},
year={2017}
}
This project is released under the Apache 2.0 license.
If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.