- Most of the code was taken from the original repo, but the model creation is different, so the results differ.
Image captioning is a task that combines computer vision and natural language processing. Given an image, the model describes what is going on in it in plain English.
- Keras With Tensorflow back-end
- InceptionV3 for encoding
- LSTM for decoding
- Both greedy search and beam search were used.
- Hyperparameters used
Hyperparameter | Value |
---|---|
Embedding size | 300 |
Vocabulary size | 8256 |
Dropout | 0.5 |
Batch Size | 128 |
LSTM 1 Output | 256 |
LSTM 2 Output | 1000 |
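
The difference between the two decoding strategies can be illustrated with a minimal sketch. This is not the code from this repo: `TOY_MODEL` and `next_word_probs` are hypothetical stand-ins for the LSTM decoder, which in the real project would predict the next word from the InceptionV3 image features and the partial caption. Greedy search keeps only the single most likely word at each step, while beam search keeps the `beam_width` best partial captions and can therefore recover from a locally attractive but globally worse first word.

```python
import math

# Hypothetical toy "model": maps a partial caption to a distribution over
# the next word. A real decoder would run the LSTM on the image features
# and the partial caption instead of using a lookup table.
TOY_MODEL = {
    ("<start>",): {"a": 0.6, "the": 0.4},
    ("<start>", "a"): {"dog": 0.55, "cat": 0.45},
    ("<start>", "the"): {"dog": 0.9, "cat": 0.1},
}


def next_word_probs(caption):
    """Return {word: probability} for the next word (assumed interface)."""
    # Any caption not in the table is followed by the end token.
    return TOY_MODEL.get(tuple(caption), {"<end>": 1.0})


def greedy_search(max_len=10):
    """Pick the single most probable word at every step."""
    caption = ["<start>"]
    while len(caption) < max_len:
        probs = next_word_probs(caption)
        word = max(probs, key=probs.get)
        caption.append(word)
        if word == "<end>":
            break
    return caption


def beam_search(beam_width=3, max_len=10):
    """Keep the beam_width most probable partial captions at every step."""
    beams = [(0.0, ["<start>"])]  # (log probability, caption)
    for _ in range(max_len - 1):
        candidates = []
        for score, caption in beams:
            if caption[-1] == "<end>":
                # Finished captions are carried over unchanged.
                candidates.append((score, caption))
                continue
            for word, p in next_word_probs(caption).items():
                candidates.append((score + math.log(p), caption + [word]))
        # Keep only the highest-scoring candidates.
        beams = sorted(candidates, key=lambda c: c[0], reverse=True)[:beam_width]
        if all(caption[-1] == "<end>" for _, caption in beams):
            break
    return beams[0][1]


print(" ".join(greedy_search()))
print(" ".join(beam_search(beam_width=2)))
```

In this toy setup, greedy search commits to "a" because it is the most likely first word, whereas beam search with a width of 2 also keeps "the" alive and ends up with a caption of higher overall probability. With `beam_width=1` the two strategies coincide.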
I have also written a blog post describing my experience of implementing the project. You can find it here.
If you want to use pretrained weights for the LSTM model, you can download them here.
The Flickr8k dataset can be downloaded here.
Sometimes beam search does a great job.
- Keras 2.1.6
- Tensorflow 1.7.0
- Numpy
- Pandas
- Pickle
- PIL
- Tqdm
1) CS231n Winter 2016 Lesson 10 Recurrent Neural Networks, Image Captioning and LSTM
2) Another implementation of image captioning model.