Contrastive Video Question Answering via Video Graph Transformer

Abstract

This repo holds the code for our paper CoVGT accepted to IEEE T-PAMI'23. The work extends our preliminary publication at ECCV'22. We highlight the following differences compared to the conference version:

Joint supervised and self-supervised contrastive objectives to optimize VGT.
Replacement of BERT with a stronger language model (e.g., RoBERTa) for QA embedding.
Extended results on Causal-VidQA and STAR-QA, plus more comprehensive ablation studies.

The code is based on VGT.

Illustration of contrastive learning strategy

Todo

Release features for the other datasets. To request them, please email the first author and state your reason, as the data is strictly for research purposes.

Environment

Assuming you have installed Anaconda3 and CUDA (version > 11.0) and have a GPU with at least 24 GB of memory, set up the environment as follows:

>conda create -n videoqa python==3.8.16
>conda activate videoqa
>git clone https://github.com/doc-doc/CoVGT.git
>cd CoVGT
>pip install -r requirements.txt
>conda install pytorch==1.8.1 torchvision==0.9.1 cudatoolkit=11.1 -c pytorch -c nvidia
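Once the environment is installed, a quick check against the stated requirements (CUDA version > 11.0, GPU memory >= 24G) can save a failed run. The following is a minimal sketch and not part of the CoVGT repo: the helper names `parse_version` and `env_ok` are hypothetical, and the thresholds are simply the two numbers quoted above.

```python
from typing import Tuple

# Thresholds taken from the requirements stated in this README.
MIN_CUDA = (11, 0)        # CUDA version must be strictly greater than 11.0
MIN_GPU_MEM_GIB = 24.0    # GPU memory must be at least 24G


def parse_version(text: str) -> Tuple[int, ...]:
    """Turn a version string such as '11.1' into a comparable tuple (11, 1)."""
    return tuple(int(part) for part in text.split(".") if part.isdigit())


def env_ok(cuda_version: str, gpu_mem_gib: float,
           min_mem_gib: float = MIN_GPU_MEM_GIB) -> bool:
    """Hypothetical helper: True if both stated requirements are met."""
    return parse_version(cuda_version) > MIN_CUDA and gpu_mem_gib >= min_mem_gib


if __name__ == "__main__":
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment.")
    else:
        if torch.cuda.is_available():
            # total_memory is reported in bytes; convert to GiB.
            mem_gib = torch.cuda.get_device_properties(0).total_memory / 1024 ** 3
            print(f"CUDA {torch.version.cuda}, {mem_gib:.1f} GiB, "
                  f"ok={env_ok(torch.version.cuda, mem_gib)}")
        else:
            print("No CUDA device is visible to PyTorch.")
```

Cards sold as 24 GB may report slightly less than 24 GiB of total memory, so `min_mem_gib` is left as a parameter that can be relaxed.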

Preparation

Please create a data folder outside this repo, so you have two folders in your workspace 'workspace/data/' and 'workspace/CoVGT/'.

Below we use NExT-QA as an example to get you familiar with the code. Please download the related video features and QA annotations from the links provided in the Results and Resources section. Note that the QA annotations will be in workspace/CoVGT/datasets/nextqa/ after you clone this repo; save the video features into workspace/data/nextqa/ and the checkpoint files into workspace/data/save_models/nextqa/. Change the default paths in global_parameters.py and args.py for your own datasets.
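The folder layout described above can be checked before running anything. This is a minimal sketch rather than a script shipped with the repo: `expected_layout` and `missing_dirs` are hypothetical names, and the workspace root is assumed to be a directory called `workspace` relative to where the script runs.

```python
from pathlib import Path
from typing import Dict


def expected_layout(workspace: Path) -> Dict[str, Path]:
    """Folders this README expects for the NExT-QA example."""
    return {
        "QA annotations": workspace / "CoVGT" / "datasets" / "nextqa",
        "video features": workspace / "data" / "nextqa",
        "checkpoints": workspace / "data" / "save_models" / "nextqa",
    }


def missing_dirs(workspace: Path) -> Dict[str, Path]:
    """Return the expected folders that do not exist yet."""
    return {name: path
            for name, path in expected_layout(workspace).items()
            if not path.is_dir()}


if __name__ == "__main__":
    # Assumed workspace root; adjust to your own location.
    for name, path in missing_dirs(Path("workspace")).items():
        print(f"missing {name}: {path}")
```

An empty result means all three folders are in place; the script does not check the contents of the folders.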

Inference

./shells/next_test.sh 0

Evaluation

python eval_next.py --folder CoVGT_FTCoWV --mode test

Results and Resources

Table 1. VideoQA Accuracy (%) on Test Set.

Cross-Modal Pretrain	NExT-QA	Causal-VidQA	STAR	TGIF-QA (Action)	TGIF-QA (Trans)	TGIF-QA (FrameQA)	TGIF-QA-R* (Action)	TGIF-QA-R* (Trans)	MSRVTT-QA
-	59.4	59.1	44.0	94.7	97.6	61.6	60.8	73.8	38.3
WebVid0.18M	59.7	60.8	46.2	91.3	96.2	61.7	61.0	73.2	40.0
-	feats	feats	feats	feats	feats	feats	feats	feats	feats
-	videos	videos	videos	videos	videos	videos	videos	videos	videos
-	Q&A	Q&A	Q&A	Q&A	Q&A	Q&A	Q&A	Q&A	Q&A

(The feature files are identical to those of VGT. We have merged some files of the same dataset to avoid too many links.)
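To read the effect of cross-modal pretraining off Table 1, the two accuracy rows can be subtracted column by column. The sketch below only restates the numbers from the table; the function name `pretrain_gains` is hypothetical.

```python
from typing import Dict

# Column names and accuracy (%) values copied from Table 1.
DATASETS = [
    "NExT-QA", "Causal-VidQA", "STAR",
    "TGIF-QA (Action)", "TGIF-QA (Trans)", "TGIF-QA (FrameQA)",
    "TGIF-QA-R* (Action)", "TGIF-QA-R* (Trans)", "MSRVTT-QA",
]
NO_PRETRAIN = [59.4, 59.1, 44.0, 94.7, 97.6, 61.6, 60.8, 73.8, 38.3]
WEBVID_PRETRAIN = [59.7, 60.8, 46.2, 91.3, 96.2, 61.7, 61.0, 73.2, 40.0]


def pretrain_gains() -> Dict[str, float]:
    """Accuracy change (percentage points) from WebVid0.18M pretraining."""
    return {
        name: round(with_pt - without_pt, 1)
        for name, without_pt, with_pt in zip(DATASETS, NO_PRETRAIN, WEBVID_PRETRAIN)
    }


if __name__ == "__main__":
    for name, gain in pretrain_gains().items():
        print(f"{name}: {gain:+.1f}")
```

On these numbers, pretraining raises accuracy in six of the nine columns and lowers it on TGIF-QA (Action), TGIF-QA (Trans), and TGIF-QA-R* (Trans).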

Train

We have provided all the scripts in the folder 'shells'. You can start training by specifying the GPU IDs after the script name. (If you have multiple GPUs, separate the IDs with a comma: ./shells/nextqa_train.sh 0,1)

./shells/nextqa_train.sh 0

This trains the model and saves it to the folder 'save_models/nextqa/CoVGT/'. You should get accuracies of around 60.1% and 59.4% on the val and test sets, respectively.
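For reference, accuracy in multi-choice VideoQA is the percentage of questions whose predicted answer matches the ground truth. The sketch below illustrates that computation only; it is not the logic of eval_next.py, and the dictionary format (question ID mapped to an answer index) is an assumption made for the example.

```python
from typing import Dict


def accuracy(predictions: Dict[str, int], answers: Dict[str, int]) -> float:
    """Percentage of questions whose predicted answer index is correct.

    Both dictionaries map a (hypothetical) question ID to an answer index.
    Questions with no prediction count as wrong.
    """
    if not answers:
        raise ValueError("answers must not be empty")
    correct = sum(1 for qid, ans in answers.items() if predictions.get(qid) == ans)
    return 100.0 * correct / len(answers)


if __name__ == "__main__":
    # Toy data, not taken from any dataset.
    preds = {"q1": 0, "q2": 3, "q3": 1}
    truth = {"q1": 0, "q2": 2, "q3": 1}
    print(f"{accuracy(preds, truth):.1f}%")
```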

Result Visualization (NExT-QA)

Citations

@ARTICLE {xiao2023contrastive,
author = {Junbin Xiao and Pan Zhou and Angela Yao and Yicong Li and Richang Hong and Shuicheng Yan and Tat Seng Chua},
journal = {IEEE Transactions on Pattern Analysis and Machine Intelligence},
title = {Contrastive Video Question Answering via Video Graph Transformer},
year = {2023},
volume = {45},
number = {11},
issn = {1939-3539},
pages = {13265-13280},
doi = {10.1109/TPAMI.2023.3292266},
publisher = {IEEE Computer Society},
address = {Los Alamitos, CA, USA},
month = {nov}
}

@inproceedings{xiao2022video,
  title={Video Graph Transformer for Video Question Answering},
  author={Xiao, Junbin and Zhou, Pan and Chua, Tat-Seng and Yan, Shuicheng},
  booktitle={European Conference on Computer Vision},
  pages={39--58},
  year={2022},
  organization={Springer}
}

Notes

If you use any resources from this repo, please kindly cite our paper and acknowledge the source.

License

This repository is released under the Apache 2.0 license as found in the LICENSE file.
