An implementation of the model from the paper "Skeleton Key: Image Captioning by Skeleton-Attribute Decomposition".
The model uses TensorFlow. Preprocessing the captions requires Stanford CoreNLP, and you need to download the COCO dataset first.
Use create_data.py to create the skeleton-attribute dataset from COCO.
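
Before running the preprocessing, it can help to check that the downloaded COCO caption annotations load correctly. The snippet below is a minimal sketch, not part of this repository and not a description of what create_data.py does internally. It assumes the standard COCO caption JSON layout (a top-level "annotations" list whose entries carry "image_id" and "caption"); the function names and the example file path are hypothetical.

```python
import json
from collections import defaultdict


def group_captions_by_image(coco):
    """Map each image_id to the list of its caption strings.

    `coco` is a dict in the standard COCO caption annotation layout.
    """
    captions = defaultdict(list)
    for ann in coco["annotations"]:
        captions[ann["image_id"]].append(ann["caption"].strip())
    return dict(captions)


def load_coco_captions(path):
    """Load a COCO caption annotation file and group captions per image."""
    with open(path, encoding="utf-8") as f:
        return group_captions_by_image(json.load(f))


if __name__ == "__main__":
    # Tiny in-memory sample standing in for a real annotation file
    # (for example annotations/captions_train2014.json; path is an assumption).
    sample = {
        "annotations": [
            {"image_id": 1, "id": 10, "caption": "A brown dog runs on grass. "},
            {"image_id": 1, "id": 11, "caption": "A dog playing outside."},
            {"image_id": 2, "id": 12, "caption": "Two people ride bikes."},
        ]
    }
    for image_id, caps in group_captions_by_image(sample).items():
        print(image_id, len(caps), caps[0])
```

If every image maps to several captions (COCO typically provides about five per image), the annotations are in the expected shape for preprocessing.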
Download the pre-trained model from Drive and put it under ./model.
Use run_inference.py to test the model on the 5000-image test split.