Implementation of our approach for the Automatic Understanding of Visual Advertisements Challenge (1st place in the 2018 challenge).
You need the following packages:
- chainer
- chainercv
- keras
- cupy
- gensim
- nltk
- pandas
- pytables
- parse
We also provide a Dockerfile to set up the dependencies.
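To quickly check that the dependencies are importable in your environment, here is a minimal sketch (note that pytables is imported as tables):

import importlib

# Check that each required package can be imported (pytables imports as "tables").
for pkg in ["chainer", "chainercv", "keras", "cupy", "gensim", "nltk", "pandas", "tables", "parse"]:
    try:
        importlib.import_module(pkg)
        print("{}: OK".format(pkg))
    except ImportError as err:
        print("{}: missing ({})".format(pkg, err))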
We use Google word2vec to compute word embeddings. Download GoogleNews-vectors-negative300.bin.gz here and set WORD2VEC_PATH:
export WORD2VEC_PATH=/path/to/Word2Vec/GoogleNews-vectors-negative300.bin
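To verify that the embeddings load from that path, here is a minimal sketch using gensim (the training script may load them differently):

import os
from gensim.models import KeyedVectors

# Load the binary Google News vectors pointed to by WORD2VEC_PATH.
w2v = KeyedVectors.load_word2vec_format(os.environ["WORD2VEC_PATH"], binary=True)
print(w2v["advertisement"].shape)  # 300-dimensional vectors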
You can get the competition dataset here.
Download the training/test datasets and extract them into the data directory.
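For example, assuming the downloads are zip archives (the archive names below are placeholders for the actual file names), they can be extracted with:

import zipfile

# Extract each downloaded archive into the data directory.
for archive in ["train.zip", "test.zip"]:  # placeholder names
    with zipfile.ZipFile(archive) as zf:
        zf.extractall("data")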
We also use OCR results. Download the OCR results (figshare) and save them in the data directory.
Before training, pre-compute Faster-RCNN features of ad images.
VA_DATASET_ROOT=/path/to/VisualAdvertisementDataset/ python script/save_feat.py
Alternatively, you can download precomputed Faster-RCNN features (figshare) and copy them to data/frcnn_feat/.
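Either way, you can check that the features are in place before training; the sketch below just lists the directory (the exact file layout expected by the training script is defined in the repository):

import glob
import os

feat_dir = "data/frcnn_feat"
# Count whatever feature files are present; the exact naming is up to save_feat.py.
files = glob.glob(os.path.join(feat_dir, "*"))
print("{} files found in {}".format(len(files), feat_dir))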
To train our full model, run
python script/train.py --model_name ocr+vis --text_net cnn
An output directory will be created under /output/checkpoint/, and the trained model and some other output files will be saved there.
To evaluate a model, run
python script/train.py --eval /path/to/output/directory
Download the two figshare items: the Chainer model file, and the tokenizer and word embeddings (figshare).
Copy wordvec.npy and tokenizer.pickle to the data directory, then run
python script/train.py --eval /path/to/directory/of/Chainer_model_file
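If evaluation fails to start, you can sanity-check the downloaded files first; here is a minimal sketch, assuming wordvec.npy holds a NumPy embedding matrix and tokenizer.pickle a pickled tokenizer object (their exact contents are defined by the repository):

import pickle
import numpy as np

# Inspect the pretrained word embedding matrix.
wordvec = np.load("data/wordvec.npy")
print("embedding matrix shape:", wordvec.shape)

# Inspect the pickled tokenizer object.
with open("data/tokenizer.pickle", "rb") as f:
    tokenizer = pickle.load(f)
print("tokenizer type:", type(tokenizer))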
We included some code snippets for visualization. See notebook/visualize inference.ipynb.