Automatic-image-Captioning

Captioning generated by a CNN encoder (ResNet-101) and a decoder using an LSTM with attention and beam search

Description

The notebook re-uses a pre-trained model and part of the code developed by Sagar Vinodababu in his tutorial available on GitHub: https://github.com/sgrvinod/a-PyTorch-Tutorial-to-Image-Captioning

The principles are described in the paper "Show, Attend and Tell" (https://arxiv.org/abs/1502.03044).

The model takes an image as input, encodes its key features with the convolutional blocks of ResNet-101 (the encoder), and uses an attention-based LSTM decoder to generate a caption for the image word by word, with beam search to select the most likely word sequence.
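As a rough illustration, here is a minimal PyTorch sketch of the encoder stage. The class name, the fixed 14×14 output grid, and the torchvision weight flag are assumptions following the "Show, Attend and Tell" setup; the notebook itself loads the author's pre-trained checkpoint rather than building the encoder from scratch.

```python
import torch
import torchvision

class Encoder(torch.nn.Module):
    """Extracts a spatial feature grid from an image with ResNet-101."""

    def __init__(self, encoded_size=14):
        super().__init__()
        resnet = torchvision.models.resnet101(weights="IMAGENET1K_V1")
        # Keep only the convolutional blocks: drop the final average
        # pooling and the classification head, since we want a feature
        # map rather than class scores.
        self.backbone = torch.nn.Sequential(*list(resnet.children())[:-2])
        # Resize the feature map to a fixed grid for the decoder.
        self.pool = torch.nn.AdaptiveAvgPool2d((encoded_size, encoded_size))

    def forward(self, images):
        # images: (batch, 3, H, W) -> features: (batch, 2048, 14, 14)
        features = self.pool(self.backbone(images))
        # Return (batch, 14, 14, 2048) so the decoder's attention can
        # weight each of the 14*14 spatial locations at every step.
        return features.permute(0, 2, 3, 1)
```

At each decoding step, the LSTM attends over these 14×14 = 196 locations, and beam search keeps the k highest-scoring partial captions instead of greedily committing to a single word at a time.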

Additionally, the areas of the picture most relevant to the prediction of each word are highlighted and displayed as part of the result.
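A minimal sketch of that overlay is shown below, assuming the decoder returns one coarse 14×14 attention map per generated word, as in the tutorial. The function name is hypothetical, and `image`, `words`, and `alphas` are placeholders for the notebook's actual outputs.

```python
import numpy as np
import matplotlib.pyplot as plt
import skimage.transform

def show_attention(image, words, alphas):
    """Overlay each word's attention weights on the input image.

    image:  (H, W, 3) array, assumed square (e.g. 336x336)
    words:  list of generated caption tokens
    alphas: one (14, 14) attention map per token
    """
    cols = 5
    rows = int(np.ceil(len(words) / cols))
    for t, word in enumerate(words):
        plt.subplot(rows, cols, t + 1)
        plt.imshow(image)
        # Smoothly upsample the coarse 14x14 map to image resolution.
        alpha = skimage.transform.pyramid_expand(
            np.asarray(alphas[t]), upscale=image.shape[0] // 14, sigma=8)
        plt.imshow(alpha, alpha=0.7, cmap="Greys_r")
        plt.title(word)
        plt.axis("off")
    plt.tight_layout()
    plt.show()
```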

You can visit Sagar Vinodababu's tutorial for a detailed walkthrough.

Installation

The model uses pre-trained weights and a word dictionary available from the author here: https://drive.google.com/open?id=189VY65I_n4RTpQnmLGj7IzVnOF6dmePC

  • collect the pre-trained weights and the word dictionary
  • download the notebook
  • replicate the following folder structure

  • insert your own images in an "images" folder
  • run the notebook (PyTorch); a minimal loading sketch is shown below
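The weights and word dictionary are loaded roughly as follows. The filenames here are assumptions matching the tutorial repository's naming scheme; use the actual names of the files downloaded from the Google Drive link.

```python
import json
import torch

# Assumed filenames, following the tutorial's naming convention.
checkpoint = torch.load(
    "BEST_checkpoint_coco_5_cap_per_img_5_min_word_freq.pth.tar",
    map_location="cpu")
encoder = checkpoint["encoder"].eval()   # ResNet-101 feature extractor
decoder = checkpoint["decoder"].eval()   # attention LSTM decoder

with open("WORDMAP_coco_5_cap_per_img_5_min_word_freq.json") as f:
    word_map = json.load(f)              # token -> integer index
rev_word_map = {v: k for k, v in word_map.items()}  # index -> token
```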

Results
