Dynamic Low-Resolution Distillation

This code repository contains the implementation of the paper Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting (ECCV 2022).

Preparing Dataset

Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT.

The formatted training datalists can be found in demo/text_spotting/datalist.

Train From Scratch

If you want to reproduce the model's performance from scratch, please follow these steps:

1. Download the pre-trained model, which was trained on SynthText & COCO-Text (pth (Access Code: yu09)). See demo/text_spotting/mask_rcnn_spot/readme.md for more details.

2. Train the multi-scale teacher model on ICDAR2013, ICDAR2015, ICDAR2017-MLT and Total-Text, initialized from the pre-trained model from step 1 (L307 in mask_rcnn_pretrain_teacher.py). The teacher model also serves as the Vanilla Multi-Scale competitor. See demo/text_spotting/dld/configs/mask_rcnn_pretrain_teacher.py for more details.

Just modify the required paths in the config file (img_prefixes, ann_files, work_dir, load_from, etc.) and then run the following script:

cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/dld/
bash dist_train_teacher.sh
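Multi-scale training of this kind typically samples a new input scale for each iteration and resizes the image while keeping its aspect ratio. The sketch below is only an illustration of that idea; the scale list and the resize rule are assumptions, not the repository's exact configuration (see the config file for the real settings).

```python
import random

# Hypothetical candidate short-side scales for the multi-scale teacher;
# the actual values live in mask_rcnn_pretrain_teacher.py.
TRAIN_SCALES = [640, 768, 896, 1024, 1280]

def sample_train_size(width, height, scales=TRAIN_SCALES, seed=None):
    """Pick a random short-side scale and resize, keeping the aspect ratio."""
    rng = random.Random(seed)
    short_side = rng.choice(scales)
    ratio = short_side / min(width, height)
    return round(width * ratio), round(height * ratio)
```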

3. Initialize the teacher and student models with the trained model obtained in step 2 (L360-361 in mask_rcnn_distill.py), then distill the student model end-to-end on the mixed real dataset (ICDAR2013, ICDAR2015, ICDAR2017-MLT and Total-Text). The results on the separate testing datasets are reported based on the same model. See demo/text_spotting/dld/configs/mask_rcnn_distill.py for more details.

Just modify the required paths in the config file (img_prefixes, ann_files, work_dir, load_from, etc.) and then run the following script:

cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/dld/
bash dist_train_distill.sh
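Two ingredients drive the distillation step: a cost term weighted by γ that pushes the student toward cheaper (lower-resolution) inputs, and a distillation loss that pulls the low-resolution student toward the full-resolution teacher. The sketch below is a loose conceptual illustration of that trade-off, not the paper's actual objective (which operates on detection and recognition outputs); all names and the scoring rule here are assumptions.

```python
import math

def select_resolution(quality_by_res, cost_by_res, gamma):
    """Illustrative selector: maximize quality minus gamma-weighted cost.
    Larger gamma favors cheaper, lower-resolution inputs."""
    return max(quality_by_res,
               key=lambda r: quality_by_res[r] - gamma * cost_by_res[r])

def kd_loss(student_logits, teacher_logits, temperature=2.0):
    """Toy distillation loss: cross-entropy between temperature-softened
    teacher and student distributions."""
    def softmax(logits):
        exps = [math.exp(x / temperature) for x in logits]
        total = sum(exps)
        return [e / total for e in exps]
    p_t, p_s = softmax(teacher_logits), softmax(student_logits)
    return -sum(t * math.log(s) for t, s in zip(p_t, p_s))
```

Under this toy scoring, a small γ keeps the high-resolution input while a larger γ accepts a small quality drop for a large cost saving, which mirrors the γ=0.1 vs γ=0.3 rows in the results tables below.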

Notice: online validation is enabled by default. If you want to disable it to save training time, add the --no-validate flag in the startup script.

Offline Inference and Evaluation

We provide a demo of forward inference and evaluation. You can modify the parameters (iou_constraint, lexicon_type, etc.) in the testing script and then start testing. For example:

cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/tools/
bash test_ic13.sh

The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.
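End-to-end evaluation counts a prediction as correct when its box overlaps a ground-truth box beyond the IoU threshold (the iou_constraint parameter) and the recognized text matches. The toy sketch below illustrates that matching rule for axis-aligned boxes only; the real evaluation tool handles polygons and lexicons.

```python
def iou(a, b):
    """IoU of two axis-aligned boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def count_matches(preds, gts, iou_constraint=0.5):
    """Greedy matching: a prediction is a hit if IoU passes the constraint
    and the transcription matches (case-insensitive)."""
    used, hits = set(), 0
    for p_box, p_text in preds:
        for i, (g_box, g_text) in enumerate(gts):
            if i in used:
                continue
            if iou(p_box, g_box) >= iou_constraint and p_text.lower() == g_text.lower():
                used.add(i)
                hits += 1
                break
    return hits
```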

Trained Model Download

All of the models are re-implemented and trained based on the open-source framework mmdetection.

Results on various datasets and download links for the trained models:

| Dataset | Training Method | Input Size | End-to-End (General / Weak / Strong) | Word Spotting (General / Weak / Strong) | FLOPS | Links |
| --- | --- | --- | --- | --- | --- | --- |
| ICDAR2013 | Vanilla Multi-Scale | S-768 | 82.9 / 86.6 / 86.9 | 86.3 / 91.0 / 91.4 | 142.9G | cfg, pth (Access Code: BD63) |
| ICDAR2013 | DLD (γ=0.1) | Dynamic | 82.7 / 85.7 / 86.5 | 86.1 / 89.9 / 90.9 | 71.5G | cfg, pth (Access Code: 32Y9) |
| ICDAR2013 | DLD (γ=0.3) | Dynamic | 81.6 / 84.4 / 85.6 | 84.9 / 88.6 / 90.0 | 41.6G | cfg, pth (Access Code: Vi12) |
| ICDAR2015 | Vanilla Multi-Scale | S-1280 | 69.5 / 74.4 / 78.0 | 71.7 / 77.2 / 81.4 | 517.2G | cfg, pth (Access Code: BD63) |
| ICDAR2015 | DLD (γ=0.1) | Dynamic | 70.9 / 75.7 / 79.0 | 73.3 / 78.6 / 82.4 | 298.8G | cfg, pth (Access Code: 32Y9) |
| ICDAR2015 | DLD (γ=0.3) | Dynamic | 69.3 / 73.5 / 78.1 | 71.2 / 76.4 / 81.1 | 148.3G | cfg, pth (Access Code: Vi12) |

| Dataset | Training Method | Input Size | End-to-End (None / Full) | Word Spotting (None / Full) | FLOPS | Links |
| --- | --- | --- | --- | --- | --- | --- |
| Total-Text | Vanilla Multi-Scale | S-896 | 62.3 / 71.4 | 65.2 / 75.9 | 206.7G | cfg, pth (Access Code: BD63) |
| Total-Text | DLD (γ=0.1) | Dynamic | 63.9 / 73.7 | 66.4 / 77.8 | 103.0G | cfg, pth (Access Code: 32Y9) |
| Total-Text | DLD (γ=0.3) | Dynamic | 61.9 / 71.9 | 64.0 / 75.9 | 62.1G | cfg, pth (Access Code: Vi12) |
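The FLOPS column quantifies the cost savings reported in the tables; for example, on ICDAR2013 the γ=0.1 model roughly halves the compute of the multi-scale baseline, and γ=0.3 cuts it by about 71%. A quick check of that arithmetic:

```python
def savings(baseline_gflops, dld_gflops):
    """Fraction of FLOPs saved relative to the baseline."""
    return 1 - dld_gflops / baseline_gflops

# ICDAR2013: Vanilla Multi-Scale 142.9G vs DLD gamma=0.1 (71.5G) and gamma=0.3 (41.6G)
print(f"{savings(142.9, 71.5):.0%}")  # prints "50%"
print(f"{savings(142.9, 41.6):.0%}")  # prints "71%"
```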

Citation:

@inproceedings{chen2022dynamic,
  title={Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting},
  author={Chen, Ying and Qiao, Liang and Cheng, Zhanzhan and Pu, Shiliang and Niu, Yi and Li, Xi},
  booktitle={ECCV},
  year={2022}
}

License

This project is released under the Apache 2.0 license.

Contact

If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.