Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering (PyTorch implementation)
This repository aims to implement the CVPR 2018 paper named in the title using PyTorch.
For simplicity, region detection is done with YOLOv3 (the paper uses Faster R-CNN), and only the image captioning model is implemented; the visual question answering model is not.
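
To illustrate how a captioning model can attend over detected regions, here is a minimal sketch of soft attention over per-region features. It is not this repository's code: the class name `RegionAttention`, the layer names, and all dimensions are hypothetical. It assumes the detector output has already been converted to a tensor of shape (batch, regions, feat_dim), and it does not cover how features are extracted from YOLOv3.

```python
import torch
import torch.nn as nn


class RegionAttention(nn.Module):
    """Soft attention over per-region image features (hypothetical helper)."""

    def __init__(self, feat_dim: int, hidden_dim: int, attn_dim: int):
        super().__init__()
        self.proj_feat = nn.Linear(feat_dim, attn_dim)      # projects each region feature
        self.proj_hidden = nn.Linear(hidden_dim, attn_dim)  # projects the decoder state
        self.score = nn.Linear(attn_dim, 1)                 # one scalar score per region

    def forward(self, feats: torch.Tensor, hidden: torch.Tensor):
        # feats: (batch, regions, feat_dim); hidden: (batch, hidden_dim)
        joint = torch.tanh(self.proj_feat(feats) + self.proj_hidden(hidden).unsqueeze(1))
        # Normalize the scores over the region axis so they sum to 1 per image.
        weights = torch.softmax(self.score(joint).squeeze(-1), dim=1)
        # Weighted sum of region features: (batch, feat_dim)
        attended = (weights.unsqueeze(-1) * feats).sum(dim=1)
        return attended, weights


# Usage with random stand-in data: 2 images, 5 regions each (assumed sizes).
torch.manual_seed(0)
attn = RegionAttention(feat_dim=8, hidden_dim=16, attn_dim=4)
feats = torch.randn(2, 5, 8)
hidden = torch.randn(2, 16)
attended, weights = attn(feats, hidden)
```

In a captioning decoder, the attended vector would typically be fed to the language LSTM at each time step, with `hidden` taken from the attention LSTM's current state.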
requirements:
hogehoge