Sejong-VLI/V2T-CLIP4Caption-Reproduction

Reproducing CLIP4Caption

Note: This implementation does not use the TSN sampling described in the CLIP4Caption paper. Even with only the original sampling method from CLIP4Clip, it achieves results similar to (even slightly better than) those reported in the CLIP4Caption paper. While reproducing the results, it was observed that TSN sampling did not reach performance comparable to that reported in the paper.
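To make the difference between the two sampling strategies concrete, here is a minimal sketch. It assumes that the "original sampling method" means picking evenly spaced frames across the whole video, and that TSN sampling means splitting the video into equal segments and taking one frame from each (a random one, or the segment centre when no random generator is passed). The function names `uniform_indices` and `tsn_indices` are illustrative only and are not taken from this repository, UniVL, or CLIP4Clip.

```python
import random
from typing import List, Optional


def uniform_indices(num_frames: int, num_samples: int) -> List[int]:
    """Evenly spaced frame indices over the whole video, endpoints included."""
    if num_frames <= num_samples:
        return list(range(num_frames))
    if num_samples == 1:
        return [0]
    step_count = num_samples - 1
    # Integer arithmetic avoids floating-point rounding surprises.
    return [(i * (num_frames - 1)) // step_count for i in range(num_samples)]


def tsn_indices(num_frames: int, num_segments: int,
                rng: Optional[random.Random] = None) -> List[int]:
    """One index per equal-length segment.

    With `rng` the index is drawn at random inside each segment;
    without it the centre of each segment is used.
    """
    if num_frames <= num_segments:
        return list(range(num_frames))
    picks = []
    for i in range(num_segments):
        start = (i * num_frames) // num_segments
        end = ((i + 1) * num_frames) // num_segments  # exclusive bound
        if rng is not None:
            picks.append(rng.randrange(start, end))
        else:
            picks.append((start + end - 1) // 2)
    return picks


print(uniform_indices(10, 4))                # [0, 3, 6, 9]
print(tsn_indices(10, 4))                    # [0, 3, 5, 8]
print(tsn_indices(10, 4, random.Random(0)))  # one random frame per segment
```

Uniform sampling always returns the same frames for a given video, while the randomized TSN variant changes its selection from call to call, which is one plausible source of the differing results noted above.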

Paper: Mingkang Tang, Zhanyu Wang, Zhenhua Liu, Fengyun Rao, Dian Li, and Xiu Li. 2021. CLIP4Caption: CLIP for Video Caption. In Proceedings of the 29th ACM International Conference on Multimedia (MM '21). Association for Computing Machinery, New York, NY, USA, 4858–4862. https://dl.acm.org/doi/10.1145/3474085.3479207

Setup

Run the commands below from the repository's main folder to download the BERT files ahead of time. This avoids a download conflict during distributed pretraining.

mkdir -p modules/bert-base-uncased
cd modules/bert-base-uncased/
wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased-vocab.txt
mv bert-base-uncased-vocab.txt vocab.txt
wget https://s3.amazonaws.com/models.huggingface.co/bert/bert-base-uncased.tar.gz
tar -xvf bert-base-uncased.tar.gz
rm bert-base-uncased.tar.gz
cd ../../

Prepare the conda environment:

conda create -n clip4caption python=3.6.9 tqdm boto3 requests pandas
conda activate clip4caption
pip install torch==1.10.2 torchvision --extra-index-url https://download.pytorch.org/whl/cu113
pip install git+https://github.com/Maluuba/nlg-eval.git@master
pip install pycocoevalcap
pip install pickle5
pip install opencv-python==4.5.5.62

Download the pretrained weight of UniVL:

mkdir -p ./weight
wget -P ./weight https://github.com/microsoft/UniVL/releases/download/v0/univl.pretrained.bin
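Before extracting features or training, it can help to confirm that the downloads above landed where expected. The following is a small sketch meant to be run from the main folder. The two paths come from the commands above; the helper name `missing_files` and the `EXPECTED_FILES` tuple are illustrative and not part of this repository. Files unpacked from the BERT archive are not checked, since their names are not listed here.

```python
from pathlib import Path
from typing import Iterable, List

# Paths produced by the setup commands, relative to the main folder.
EXPECTED_FILES = (
    "modules/bert-base-uncased/vocab.txt",
    "weight/univl.pretrained.bin",
)


def missing_files(root: str, expected: Iterable[str] = EXPECTED_FILES) -> List[str]:
    """Return the expected relative paths that are not regular files under `root`."""
    root_path = Path(root)
    return [rel for rel in expected if not (root_path / rel).is_file()]


if __name__ == "__main__":
    missing = missing_files(".")
    if missing:
        print("Missing files:")
        for rel in missing:
            print("  " + rel)
    else:
        print("All expected files are present.")
```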

Extract the Video Features

Follow the instructions written here

Training & Evaluation

The shell scripts for training and evaluating the model are provided here. You may need to modify them to suit your setup.

References

This repository is based on UniVL and CLIP4Clip.
