A tool for scene text data augmentation. It is provided to help reduce overfitting and improve the robustness of models.
The current version focuses on the shape of cropped scene text images. A version covering both detection and recognition tasks will be released later.
We recommend Anaconda for managing the versions of your dependencies. For example:
conda install boost=1.67.0
Build the library:
mkdir build
cd build
cmake -D CUDA_USE_STATIC_CUDA_RUNTIME=OFF ..
make
Copy Augment.so to the target folder and follow demo.py to use the tool:
cp Augment.so ..
cd ..
python demo.py
- Distortion
- Stretch
- Perspective
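To give a feel for what a perspective-style augmentation does to a cropped text image, here is a minimal NumPy sketch. It is not the tool's implementation: it swaps in a plain four-point homography with nearest-neighbour sampling, and the names `homography_from_points`, `warp_perspective`, `random_perspective`, and the `max_shift` parameter are hypothetical, chosen for illustration only.

```python
import numpy as np


def homography_from_points(src, dst):
    """Solve for the 3x3 matrix H that maps four src points onto four dst points."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y])
    A = np.array(rows, dtype=np.float64)
    b = np.array(dst, dtype=np.float64).reshape(-1)  # [u0, v0, u1, v1, ...]
    h = np.linalg.solve(A, b)
    return np.append(h, 1.0).reshape(3, 3)  # bottom-right entry fixed to 1


def warp_perspective(img, H):
    """Inverse-map every output pixel through H^-1 (nearest-neighbour sampling)."""
    h, w = img.shape[:2]
    ys, xs = np.mgrid[0:h, 0:w]
    pts = np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    src = np.linalg.inv(H) @ pts
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    # Pixels that map outside the source image stay zero (black).
    out = np.zeros((h * w,) + img.shape[2:], dtype=img.dtype)
    out[valid] = img[sy[valid], sx[valid]]
    return out.reshape(img.shape)


def random_perspective(img, max_shift=0.3, rng=None):
    """Shrink the left or right edge of the image by a random amount."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = img.shape[:2]
    src = [(0, 0), (w - 1, 0), (w - 1, h - 1), (0, h - 1)]
    shift = rng.uniform(0, max_shift) * h / 2
    if rng.random() < 0.5:
        # Left edge becomes shorter.
        dst = [(0, shift), (w - 1, 0), (w - 1, h - 1), (0, h - 1 - shift)]
    else:
        # Right edge becomes shorter.
        dst = [(0, 0), (w - 1, shift), (w - 1, h - 1 - shift), (0, h - 1)]
    return warp_perspective(img, homography_from_points(src, dst))
```

Calling `random_perspective(image)` on an H x W or H x W x C array returns an array of the same shape. Passing a seeded `rng` makes the result reproducible.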
Transforming an image of size (H: 64, W: 200) takes less than 3 ms on a 2.0 GHz CPU. The process can be accelerated further by running the augmentation on the fly in multi-process batch samplers, for example by setting "num_workers" in a PyTorch DataLoader.
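The on-the-fly, multi-process setup mentioned above could look roughly like the following PyTorch sketch. The class `AugmentedTextDataset`, the helper `build_loader`, and the `flip_placeholder` function are hypothetical names used only for illustration. The placeholder stands in for a real augmentation call and is not part of this tool.

```python
import numpy as np
import torch
from torch.utils.data import DataLoader, Dataset


class AugmentedTextDataset(Dataset):
    """Applies `augment` to one cropped text image each time it is indexed."""

    def __init__(self, images, augment):
        self.images = images    # list of H x W x C uint8 arrays of equal size
        self.augment = augment  # hypothetical callable: ndarray -> ndarray

    def __len__(self):
        return len(self.images)

    def __getitem__(self, index):
        image = self.augment(self.images[index])
        # HWC uint8 -> CHW float32 in [0, 1]
        array = np.ascontiguousarray(image.transpose(2, 0, 1))
        return torch.from_numpy(array).float() / 255.0


def flip_placeholder(image):
    """Stand-in augmentation (horizontal flip); module-level so workers can pickle it."""
    return image[:, ::-1]


def build_loader(images, augment, batch_size=32, num_workers=4):
    """Each worker process augments its own samples while the model trains."""
    dataset = AugmentedTextDataset(images, augment)
    return DataLoader(
        dataset,
        batch_size=batch_size,
        shuffle=True,
        num_workers=num_workers,
    )
```

Because the augmentation runs inside `__getitem__`, every epoch sees freshly transformed images and the work is spread across `num_workers` processes. On platforms that start workers with spawn (Windows, macOS), create and iterate the loader inside an `if __name__ == "__main__":` guard.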
We compare the accuracy of CRNN trained only on the small training set of each benchmark, with and without data augmentation.
Training setting | IIIT5K | IC13 | IC15 |
---|---|---|---|
Without Data Augmentation | 40.8% | 6.8% | 8.7% |
With Data Augmentation | 53.4% | 9.6% | 24.9% |
@inproceedings{schaefer2006image,
title={Image deformation using moving least squares},
author={Schaefer, Scott and McPhail, Travis and Warren, Joe},
booktitle={ACM transactions on graphics (TOG)},
volume={25},
number={3},
pages={533--540},
year={2006},
organization={ACM}
}
The tool combines @cxcxcxcx's imgwarp-opencv and @Yati Sagade's opencv-ndarray-conversion. Thanks for their contributions.
The tool is free for academic research purposes only.