This code repository contains the implementation of the paper Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting (ECCV 2022).
Original images can be downloaded from: Total-Text, ICDAR2013, ICDAR2015, ICDAR2017_MLT.
The formatted training datalists can be found in demo/text_spotting/datalist.
If you want to re-implement the model's performance from scratch, please follow these steps:
1. Download the pre-trained model, which was trained on SynthText and COCO-Text (pth, Access Code: yu09). See demo/text_spotting/mask_rcnn_spot/readme.md for more details.
2. Train the multi-scale teacher model on ICDAR2013, ICDAR2015, ICDAR2017-MLT, and Total-Text, initialized from the step-1 pre-trained model (L307 in mask_rcnn_pretrain_teacher.py). The teacher model also serves as the Vanilla Multi-Scale competitor in the result tables below. See demo/text_spotting/dld/configs/mask_rcnn_pretrain_teacher.py for more details.
Just modify the required paths in the config file (`img_prefixes`, `ann_files`, `work_dir`, `load_from`, etc.) and then run the following script (an illustrative sketch of these path fields appears after the script):
```bash
cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/dld/
bash dist_train_teacher.sh
```
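For reference, these fields are plain Python assignments in the mmdetection-style config. The sketch below is illustrative only: every path and datalist file name is a placeholder for your local copies, not a released default.

```python
# Illustrative path settings for mask_rcnn_pretrain_teacher.py.
# All paths and file names below are placeholders; point them at your local
# image folders and at the datalists from demo/text_spotting/datalist.
img_prefixes = [
    '/path/to/ICDAR2013/',
    '/path/to/ICDAR2015/',
    '/path/to/ICDAR2017_MLT/',
    '/path/to/Total-Text/',
]
ann_files = [
    '/path/to/datalist/icdar2013_train.json',
    '/path/to/datalist/icdar2015_train.json',
    '/path/to/datalist/icdar2017_mlt_train.json',
    '/path/to/datalist/total_text_train.json',
]
work_dir = '/path/to/workspace/dld_teacher/'            # logs and checkpoints go here
load_from = '/path/to/synthtext_cocotext_pretrain.pth'  # the step-1 pre-trained model
```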
3. Initialize both the teacher and the student with the trained model obtained in step 2 (L360-361 in mask_rcnn_distill.py), and then distill the student model end-to-end on the mixed real dataset (including ICDAR2013, ICDAR2015, ICDAR2017-MLT, and Total-Text). The results on the separate test datasets are all reported from this single model. See demo/text_spotting/dld/configs/mask_rcnn_distill.py for more details.
Just modify the required paths in the config file (`img_prefixes`, `ann_files`, `work_dir`, `load_from`, etc.) and then run the following script (a sketch of the checkpoint initialization appears after the script):
```bash
cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/dld/
bash dist_train_distill.sh
```
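The teacher/student initialization mentioned in step 3 amounts to a pair of checkpoint paths in the config. A minimal sketch, with hypothetical field names (the actual ones are defined around L360-361 of the released mask_rcnn_distill.py):

```python
# Hypothetical initialization for mask_rcnn_distill.py; the real field names
# live around L360-361 of the released config. Both the teacher and the
# student are initialized from the step-2 teacher checkpoint.
teacher_load_from = '/path/to/workspace/dld_teacher/latest.pth'
student_load_from = '/path/to/workspace/dld_teacher/latest.pth'
work_dir = '/path/to/workspace/dld_distill/'
```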
Notice: online validation is enabled by default. If you want to disable it to save training time, add the `--no-validate` flag to the startup script.
We also provide a demo of forward inference and evaluation. You can modify the parameters (`iou_constraint`, `lexicon_type`, etc.) in the testing script and then start testing (an illustrative sketch of these parameters follows the example). For example:
```bash
cd $DAVAR_LAB_OCR_ROOT$/demo/text_spotting/mask_rcnn_spot/tools/
bash test_ic13.sh
```
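As a rough guide, the evaluation parameters named above are simple scalar settings inside the testing script. The sketch below is a guess at their shape, not the released defaults; in particular, the numeric lexicon convention is an assumption (ICDAR protocols distinguish strong, weak, and generic lexicons).

```python
# Illustrative evaluation settings; consult the released test scripts
# (e.g. test_ic13.sh and the config it references) for the actual values.
iou_constraint = 0.5   # IoU threshold for matching predictions to ground truth
lexicon_type = 0       # which lexicon to evaluate with; the strong/weak/generic
                       # numeric mapping here is an assumption
```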
The offline evaluation tool can be found in davarocr/demo/text_spotting/evaluation/.
All models are re-implemented and trained based on the open-source framework mmdetection.
Results on various datasets and trained model downloads (γ controls DLD's accuracy-efficiency trade-off; a larger γ favors lower input resolutions and thus fewer FLOPS):
| Dataset | Training Method | Input Size | End-to-End (General) | End-to-End (Weak) | End-to-End (Strong) | Word Spotting (General) | Word Spotting (Weak) | Word Spotting (Strong) | FLOPS | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ICDAR2013 | Vanilla Multi-Scale | S-768 | 82.9 | 86.6 | 86.9 | 86.3 | 91.0 | 91.4 | 142.9G | |
| ICDAR2013 | DLD (γ=0.1) | Dynamic | 82.7 | 85.7 | 86.5 | 86.1 | 89.9 | 90.9 | 71.5G | |
| ICDAR2013 | DLD (γ=0.3) | Dynamic | 81.6 | 84.4 | 85.6 | 84.9 | 88.6 | 90.0 | 41.6G | |
| ICDAR2015 | Vanilla Multi-Scale | S-1280 | 69.5 | 74.4 | 78.0 | 71.7 | 77.2 | 81.4 | 517.2G | |
| ICDAR2015 | DLD (γ=0.1) | Dynamic | 70.9 | 75.7 | 79.0 | 73.3 | 78.6 | 82.4 | 298.8G | |
| ICDAR2015 | DLD (γ=0.3) | Dynamic | 69.3 | 73.5 | 78.1 | 71.2 | 76.4 | 81.1 | 148.3G | |
| Dataset | Training Method | Input Size | End-to-End (None) | End-to-End (Full) | Word Spotting (None) | Word Spotting (Full) | FLOPS | Links |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Total-Text | Vanilla Multi-Scale | S-896 | 62.3 | 71.4 | 65.2 | 75.9 | 206.7G | |
| Total-Text | DLD (γ=0.1) | Dynamic | 63.9 | 73.7 | 66.4 | 77.8 | 103.0G | |
| Total-Text | DLD (γ=0.3) | Dynamic | 61.9 | 71.9 | 64.0 | 75.9 | 62.1G | |
```
@inproceedings{chen2022dynamic,
  title={Dynamic Low-Resolution Distillation for Cost-Efficient End-to-End Text Spotting},
  author={Chen, Ying and Qiao, Liang and Cheng, Zhanzhan and Pu, Shiliang and Niu, Yi and Li, Xi},
  booktitle={ECCV},
  year={2022}
}
```
This project is released under the Apache 2.0 License.
If you have any suggestions or problems, please feel free to contact the author at qiaoliang6@hikvision.com.