TensorFlow implementation of "Towards Generating Stylized Image Captions via Adversarial Training".
If you use our code or models, please cite our paper:
```
@article{nezami2019towards,
  title={Towards Generating Stylized Image Captions via Adversarial Training},
  author={Nezami, Omid Mohamad and Dras, Mark and Wan, Stephen and Paris, Cecile and Hamey, Len},
  journal={arXiv preprint arXiv:1908.02943},
  year={2019}
}
```
We pretrain our models on the Microsoft COCO dataset and then train them on the SentiCap dataset.
- Python 2.7.12
- Numpy 1.15.2
- Hickle
- Python-skimage
- Tensorflow 1.8.0
- Download the Microsoft COCO dataset (neutral image caption data) and the SentiCap dataset (sentiment-bearing image caption data).
- Resize the downloaded images to [224, 224] and put them in "./images" (a resize sketch follows this list).
- Preprocess the COCO image caption data and place it in "./data/neutral". You can do this with prepro.py and the ResNet-152 network trained on ImageNet, which generates a [7, 7, 2048] feature map (we use the Res5c layer of the network); a feature-extraction sketch follows this list.
- Preprocess the SentiCap image caption data in the same way, placing its positive part in "./data/positive" and its negative part in "./data/negative".
- Pretrain the generator and the discriminator using "./data/neutral" (python model_train.py).
- Train the generator and the discriminator using "./data/positive" for the positive model and "./data/negative" for the negative model (python model_train.py).
- Download the pretrained models and unzip them in "./models".
- python model_test.py
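
A minimal sketch of the resize step using scikit-image, assuming the downloaded images sit in a hypothetical "./raw_images" directory; only the [224, 224] target size and the "./images" destination come from the steps above:

```python
import os

import numpy as np
from skimage.io import imread, imsave
from skimage.transform import resize

SRC_DIR = "./raw_images"  # hypothetical location of the downloaded images
DST_DIR = "./images"      # destination used by the steps above

if not os.path.exists(DST_DIR):
    os.makedirs(DST_DIR)

for name in os.listdir(SRC_DIR):
    image = imread(os.path.join(SRC_DIR, name))
    # skimage's resize returns floats in [0, 1]; convert back to uint8 for saving
    resized = (resize(image, (224, 224)) * 255).astype(np.uint8)
    imsave(os.path.join(DST_DIR, name), resized)
```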
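And a sketch of the feature-extraction step. The actual pipeline lives in prepro.py; this version assumes the TF-Slim ResNet-152 port and a local "resnet_v1_152.ckpt" checkpoint (both assumptions, not part of this repo), keeping only the final [7, 7, 2048] feature map, whose shape matches the Res5c layer described above:

```python
import numpy as np
import hickle
import tensorflow as tf
from tensorflow.contrib.slim.nets import resnet_v1

slim = tf.contrib.slim

images = tf.placeholder(tf.float32, [None, 224, 224, 3])
with slim.arg_scope(resnet_v1.resnet_arg_scope()):
    # global_pool=False keeps the final [7, 7, 2048] convolutional map
    # (the last block's output, matching the Res5c shape used above)
    features, _ = resnet_v1.resnet_v1_152(
        images, num_classes=None, is_training=False, global_pool=False)

saver = tf.train.Saver()
with tf.Session() as sess:
    saver.restore(sess, "resnet_v1_152.ckpt")  # hypothetical checkpoint path
    # Stand-in batch; real inputs are the resized images minus the
    # per-channel ImageNet means expected by ResNet-v1.
    batch = np.zeros([1, 224, 224, 3], dtype=np.float32)
    maps = sess.run(features, {images: batch})  # shape [1, 7, 7, 2048]
    hickle.dump(maps, "./data/neutral/features.hkl")  # hypothetical file name
```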
| | BLEU-1 | BLEU-4 | METEOR | ROUGE-L | CIDEr | SPICE |
|---|---|---|---|---|---|---|
| ATTEND-GAN | 56.55% | 13.05% | 18.35% | 44.45% | 62.85% | 16.05% |
ATTEND-GAN is inspired by Self-critical Sequence Training and SeqGAN in TensorFlow.