A synthetic benchmark database for scene text removal is now released by the Deep Learning and Vision Computing Lab of South China University of Technology. The database can be downloaded through the following links:
- Yunpan: (link: https://pan.baidu.com/s/1wwBwgm-n2A7iykoD0i37iQ PASSWORD: vk8f) (Size = 6.3G).
- Google Drive: (link: https://drive.google.com/open?id=1l_yJm1vWV7TF7vDcaVa7FqZLfW7ASYeo) (Size = 6.3G).
In addition, we collected 1000 images from the ICDAR 2017 MLT subdataset that contain only English text to enlarge the real data; the background images (labels) were generated by manually erasing the text. This database can be downloaded through the following links:
- Yunpan: (link: https://pan.baidu.com/s/1WBvB1kS1BcmgrDi9c1Me9Q PASSWORD: knr7).
- Google Drive: (link: https://drive.google.com/file/d/1G0d6yQwYEDhJdZH-S8mYWTXltJRG3Mg1/view?usp=sharing).

Note: The real scene text removal dataset can only be used for non-commercial research purposes. Scholars or organizations who want to use the database should first fill in this Application Form and send it to us via email (lianwen.jin@gmail.com). We will send you the decompression password after your letter has been received and approved.
The training set of the synthetic database consists of a total of 8000 images and the test set contains 800 images; all training and test samples are resized to 512 × 512. The synthetic dataset is generated as described in "Ankush Gupta, Andrea Vedaldi, Andrew Zisserman, Synthetic Data for Text Localisation in Natural Images, CVPR 2016"; the generation code and more synthetic text images can be found at https://github.com/ankush-me/SynthText. In addition, all real scene text images are also resized to 512 × 512.
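If you prepare extra data of your own, the images and their erased-background labels need to match this 512 × 512 resolution. The sketch below shows one way to do the resizing with MXNet; the file names are hypothetical placeholders and this is not the repository's official preprocessing code.

```python
# Minimal sketch (not the official preprocessing script): load a scene-text
# image and its text-free label, and resize both to 512 x 512 with MXNet.
import mxnet as mx

def load_pair(image_path, label_path, size=512):
    """Return the input image and its erased-background label at size x size."""
    img = mx.image.imread(image_path)          # HWC uint8 NDArray
    lbl = mx.image.imread(label_path)
    img = mx.image.imresize(img, size, size)   # arguments are width, height
    lbl = mx.image.imresize(lbl, size, size)
    return img, lbl

# Hypothetical file names, for illustration only.
img, lbl = load_pair('train/images/0001.jpg', 'train/labels/0001.jpg')
print(img.shape)                               # (512, 512, 3)
```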
For more details, please refer to our AAAI 2019 paper: http://arxiv.org/abs/1812.00723
- MXNet==1.3.1
- Python 2.
- NVIDIA GPU + CUDA 8.0.
- Matplotlib.
- NumPy.
- Clone this repository:
git clone https://github.com/HCIILAB/Scene-Text-Removal
You can refer to the example provided in this repository for how to organize the data.
To train our model, you may need to change the dataset path, the network parameters, etc. Then run the following command (an example invocation with placeholder values follows the argument list):
python train.py \
--trainset_path=[the path of dataset] \
--checkpoint=[path save the model] \
--gpu=[use gpu] \
--lr=[Learning Rate] \
--n_epoch=[Number of iterations]
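For example, assuming a training set under ./dataset/train and a single GPU (all of these values are placeholders for illustration, not recommended settings):

python train.py \
--trainset_path=./dataset/train \
--checkpoint=./checkpoints \
--gpu=0 \
--lr=0.0002 \
--n_epoch=500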
To generate results for the input images, you can use test.py. Please run the following command (a quick way to view inputs and outputs side by side is sketched after the argument list):
python test.py \
--test_image=[the path of test images] \
--model=[which model to test] \
--vis=[vis images] \
--result=[path to save the output images]
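To eyeball individual results, Matplotlib (already listed in the requirements) can display an input image next to its erased output. This is just a viewing aid and is independent of the --vis option; the file paths below are hypothetical placeholders.

```python
# Minimal sketch: show an input scene-text image next to the erased result.
import matplotlib.pyplot as plt
import matplotlib.image as mpimg

inp = mpimg.imread('test_images/0001.jpg')   # hypothetical input path
out = mpimg.imread('results/0001.jpg')       # hypothetical output path

fig, axes = plt.subplots(1, 2, figsize=(10, 5))
axes[0].imshow(inp); axes[0].set_title('input');  axes[0].axis('off')
axes[1].imshow(out); axes[1].set_title('erased'); axes[1].axis('off')
plt.show()
```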
To evaluate the model's performance over a dataset, you can find the evaluation metrics in PythonCode.zip from this website.
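For a rough sanity check before running the official scripts in PythonCode.zip, a simple image-quality measure such as PSNR between an erased output and its ground-truth background can be computed as follows; this is only an illustration, not the official evaluation code.

```python
# Rough PSNR illustration (not the official evaluation script).
import numpy as np

def psnr(output, target, max_val=255.0):
    """Peak signal-to-noise ratio between two uint8 images of equal shape."""
    mse = np.mean((output.astype(np.float64) - target.astype(np.float64)) ** 2)
    if mse == 0:
        return float('inf')
    return 10.0 * np.log10((max_val ** 2) / mse)
```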
Please download the ImageNet-pretrained model vgg16 (PASSWORD: 8tof) and put it under
root/.mxnet/models/
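If you have internet access, MXNet's Gluon model zoo can also download and cache the ImageNet-pretrained VGG16 under ~/.mxnet/models automatically; whether the training script finds it there depends on the file name it expects, so the manual download above remains the documented route.

```python
# Optional alternative (assumes internet access): let Gluon's model zoo
# cache the ImageNet-pretrained VGG16 under ~/.mxnet/models.
from mxnet.gluon.model_zoo import vision

net = vision.vgg16(pretrained=True)  # downloads vgg16-*.params on first use
```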
Please consider citing our paper when you use our database:
@article{zhang2019EnsNet,
  title   = {EnsNet: Ensconce Text in the Wild},
  author  = {Shuaitao Zhang and Yuliang Liu and Lianwen Jin and Yaoxiong Huang and Songxuan Lai},
  journal = {AAAI},
  year    = {2019}
}
Suggestions and opinions about this dataset (both positive and negative) are warmly welcomed. Please contact the authors by sending an email to eestzhang@mail.scut.edu.cn.
The synthetic database can only be used for non-commercial research purposes.
For commercial use, please contact Dr. Lianwen Jin: lianwen.jin@gmail.com.
Copyright 2018, Deep Learning and Vision Computing Lab, South China University of Technology. http://www.dlvc-lab.net