Incentive Award at AI Challenge 2021
- Vietnamese Scene Text Recognition
- Dataset
- Detection
- Run on Ubuntu or Windows
- Run on Docker
- Training and Evaluation
- Acknowledgement
- Recognition
Our dataset is based on the VinText dataset, extended with additional images collected from Google.
Name | #Images | #Text Instances |
---|---|---|
VinTextV2 | 3,800 | About 92,000 |
Dataset Variant | Input Format | Download Link |
---|---|---|
Original | x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT | Download here |
Converted Dataset | COCO format | Download here |
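The original label format can be read with a few lines of Python. The following is a minimal sketch, not code from this repository: the function name `parse_label_line` is hypothetical, and it assumes one text instance per line with numeric coordinates. It splits on the first eight commas only, so a transcript that itself contains commas is kept whole.

```python
from typing import List, Tuple


def parse_label_line(line: str) -> Tuple[List[Tuple[int, int]], str]:
    """Parse one 'x1,y1,x2,y2,x3,y3,x4,y4,TRANSCRIPT' line.

    Returns the four corner points and the transcript text.
    """
    # Split on the first 8 commas only, so commas inside the transcript survive.
    parts = line.rstrip("\r\n").split(",", 8)
    if len(parts) != 9:
        raise ValueError(f"expected 8 coordinates and a transcript, got: {line!r}")
    # Assumption: coordinates are numeric; fractional values are truncated.
    coords = [int(float(value)) for value in parts[:8]]
    points = list(zip(coords[0::2], coords[1::2]))
    return points, parts[8]
```

Label files containing Vietnamese text should be opened with `encoding="utf-8"` before their lines are passed to this function.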
Extract the data and copy the folder to Vietnamese-Language-Detection-and-Recognition/Detection/datasets/.
datasets
└───vintext
    └───test.json
    └───train.json
    └───train_images
    └───test_images
└───evaluation
    └───gt_vintext.zip
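Before training, it can help to confirm that the extracted files are where the configs expect them. The sketch below is illustrative only: `missing_entries` and `EXPECTED` are hypothetical names, and the nesting (JSON files and image folders under vintext/, the zip under evaluation/) is an assumption about the layout listed above.

```python
from pathlib import Path
from typing import List

# Assumed nesting; adjust if your extracted archive is laid out differently.
EXPECTED = [
    "vintext/train.json",
    "vintext/test.json",
    "vintext/train_images",
    "vintext/test_images",
    "evaluation/gt_vintext.zip",
]


def missing_entries(datasets_dir: str, expected: List[str] = EXPECTED) -> List[str]:
    """Return the expected files or folders that do not exist under datasets_dir."""
    root = Path(datasets_dir)
    return [rel for rel in expected if not (root / rel).exists()]
```

Calling `missing_entries("Detection/datasets")` from the repository root returns an empty list when everything is in place.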
You can download our label tool here.
- python=3.7
- torch==1.8.0
conda create -n fb -y python=3.7
conda activate fb
# For CPU users
conda install pytorch==1.8.0 torchvision==0.9.0 torchaudio==0.8.0 cpuonly -c pytorch
# For GPU users
conda install pytorch=1.8.0 torchvision=0.9.0 cudatoolkit=11.1 -c pytorch -c conda-forge
python -m pip install ninja yacs cython matplotlib tqdm opencv-python shapely scipy tensorboardX pyclipper Polygon3 weighted-levenshtein editdistance
# Install Detectron2
python -m pip install 'git+https://github.com/facebookresearch/detectron2.git'
git clone https://github.com/Ton2808/Vietnamese-Language-Detection-and-Recognition.git
cd Vietnamese-Language-Detection-and-Recognition
python setup.py build develop
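After installation, a quick check that the main packages are importable can save a failed first run. This is a minimal sketch with a hypothetical helper name, `check_packages`. The module names in the usage note (for example `cv2` for opencv-python) are the usual import names for those packages, not something this repository specifies.

```python
import importlib.util
from typing import Dict, Iterable


def check_packages(names: Iterable[str]) -> Dict[str, bool]:
    """Map each top-level module name to whether it can be found in this environment."""
    # find_spec locates a module without importing it, so this stays fast.
    return {name: importlib.util.find_spec(name) is not None for name in names}
```

For example, `check_packages(["torch", "torchvision", "detectron2", "cv2", "shapely", "pyclipper"])` reports `False` for anything still missing from the `fb` environment.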
Prepare folders:
mkdir sample_input
mkdir sample_output
Copy your images to sample_input/. Output images will be saved in sample_output/.
python demo/demo.py --config-file configs/BAText/VinText/attn_R_50.yaml --input sample_input/ --output sample_output/ --opts MODEL.WEIGHTS ./save_models/bbox.pth
Qualitative Results on VinText
Download and unzip FantasticBeasts.zip:
unzip FantasticBeasts.zip
cd FantasticBeasts/Docker
Open Docker and load the Docker image:
# For CPU users
docker load -i fantasticbeasts-aic-cpu.tar
# For GPU users with NVIDIA Docker toolkit
docker load -i fantasticbeasts-aic-gpu.tar
Create the input and output folders and note their absolute paths:
mkdir test_data
mkdir submission_output
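Run the Docker image, replacing the bracketed placeholders with your folder paths and the image ID (listed by docker images):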
# For CPU users
docker run --mount type=bind,source=[test_data_folder_path],target=/home/ml/AIC/aicsolution/data/test_data --mount type=bind,source=[submission_output_folder_path],target=/home/ml/AIC/aicsolution/data/submission_output [IMAGE ID] /bin/bash run.sh
# For GPU users with NVIDIA Docker toolkit
nvidia-docker run -it --rm --gpus all --mount type=bind,source=[test_data_folder_path],target=/home/ml/AIC/aicsolution/data/test_data --mount type=bind,source=[submission_output_folder_path],target=/home/ml/AIC/aicsolution/data/submission_output [IMAGE ID] /bin/bash run.sh
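Bind mounts need absolute host paths, which is easy to get wrong when typing the command by hand. The sketch below assembles the same arguments from Python. `build_docker_run` is a hypothetical helper, not part of the shipped image. For the GPU case it uses plain `docker run --gpus all` instead of the `nvidia-docker` wrapper, which assumes Docker 19.03 or newer with the NVIDIA Container Toolkit installed.

```python
import os
from typing import List

# Container-side data directory, taken from the commands above.
CONTAINER_ROOT = "/home/ml/AIC/aicsolution/data"


def build_docker_run(image_id: str, test_data_dir: str, output_dir: str,
                     gpu: bool = False) -> List[str]:
    """Build the argument list for running run.sh inside the image."""

    def bind(source: str, name: str) -> str:
        # Bind mounts require an absolute path on the host.
        return f"type=bind,source={os.path.abspath(source)},target={CONTAINER_ROOT}/{name}"

    command = ["docker", "run"]
    if gpu:
        # Assumption: Docker with native --gpus support replaces nvidia-docker.
        command += ["--rm", "--gpus", "all"]
    command += ["--mount", bind(test_data_dir, "test_data")]
    command += ["--mount", bind(output_dir, "submission_output")]
    command += [image_id, "/bin/bash", "run.sh"]
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Using a list rather than a single string avoids shell quoting problems when a path contains spaces.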
For training, we used the pre-trained model tt_attn_R_50 from the ABCNet repository for initialization:
python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_tt_attn_R_50_checkpoint
Example:
python tools/train_net.py --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./tt_attn_R_50.pth
The trained model is saved in output/batext/vintext/ and is then used for evaluation:
python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS path_to_trained_model_checkpoint
Example:
python tools/train_net.py --eval-only --config-file configs/BAText/VinText/attn_R_50.yaml MODEL.WEIGHTS ./output/batext/vintext/trained_model.pth
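To avoid typing the checkpoint name by hand, the most recent one can be looked up in the output folder. This is a sketch under an assumption: it expects the training script to use Detectron2's default checkpointer, which records the newest file name in a `last_checkpoint` file. If that file is absent, the hypothetical helper `find_checkpoint` falls back to the most recently modified .pth file.

```python
from pathlib import Path
from typing import Optional


def find_checkpoint(output_dir: str = "output/batext/vintext") -> Optional[Path]:
    """Return the latest checkpoint in output_dir, or None if there is none."""
    root = Path(output_dir)
    marker = root / "last_checkpoint"
    if marker.is_file():
        # Assumption: the marker holds the file name of the newest checkpoint.
        candidate = root / marker.read_text().strip()
        if candidate.is_file():
            return candidate
    # Fallback: pick the most recently modified .pth file.
    checkpoints = sorted(root.glob("*.pth"), key=lambda p: p.stat().st_mtime)
    return checkpoints[-1] if checkpoints else None
```

The returned path can be passed as the `MODEL.WEIGHTS` value in the evaluation command.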
This repository is built on ABCNet.
We use TransformerOCR for recognition. Install VietOCR:
pip install vietocr==0.3.7
Extract the data and copy the folder to Vietnamese-Language-Detection-and-Recognition/recognition/dataset/.
datasets
└───train
└───valid
└───train.txt
└───valid.txt
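The train.txt and valid.txt files list each cropped image with its transcript. The sketch below assumes the VietOCR convention of one `image_path<TAB>text` pair per line. Check this against the downloaded files before relying on it. The helper names `write_annotations` and `read_annotations` are hypothetical.

```python
from typing import Iterable, List, Tuple


def write_annotations(pairs: Iterable[Tuple[str, str]], annotation_path: str) -> None:
    """Write (image_path, text) pairs, one tab-separated pair per line."""
    with open(annotation_path, "w", encoding="utf-8") as handle:
        for image_path, text in pairs:
            if "\t" in image_path or "\t" in text or "\n" in text:
                raise ValueError(f"tab or newline would corrupt the file: {text!r}")
            handle.write(f"{image_path}\t{text}\n")


def read_annotations(annotation_path: str) -> List[Tuple[str, str]]:
    """Read the pairs back, skipping blank lines."""
    pairs = []
    with open(annotation_path, encoding="utf-8") as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line:
                image_path, text = line.split("\t", 1)
                pairs.append((image_path, text))
    return pairs
```

Reading and writing with `encoding="utf-8"` keeps Vietnamese diacritics intact regardless of the platform default.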
- Take the detection results from the detection task, then crop the bounding-box regions out of the images to use for training and prediction.
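The cropping step can be sketched as follows. This is not the authors' cropping code: `quad_to_crop_box` is a hypothetical helper that takes the simplest approach, an axis-aligned rectangle around the four detected corners, clamped to the image bounds. Rotated or curved text would need a perspective warp instead.

```python
from typing import Sequence, Tuple


def quad_to_crop_box(coords: Sequence[float], width: int, height: int,
                     pad: int = 0) -> Tuple[int, int, int, int]:
    """Return (left, top, right, bottom) around x1,y1,...,x4,y4, clamped to the image."""
    if len(coords) != 8:
        raise ValueError("expected 8 values: x1,y1,x2,y2,x3,y3,x4,y4")
    xs, ys = coords[0::2], coords[1::2]
    left = max(0, int(min(xs)) - pad)
    top = max(0, int(min(ys)) - pad)
    # Right and bottom are exclusive, hence the +1 to include the last pixel.
    right = min(width, int(max(xs)) + 1 + pad)
    bottom = min(height, int(max(ys)) + 1 + pad)
    if right <= left or bottom <= top:
        raise ValueError("box lies outside the image")
    return left, top, right, bottom
```

The returned tuple follows the (left, upper, right, lower) order that Pillow's `Image.crop` expects, so `image.crop(quad_to_crop_box(coords, *image.size))` yields the text region.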
Follow this Jupyter Notebook for a quick start: Train AIC vocr-2.
This repository is built on VietOCR.