Unsupervised Vision-and-Language Pre-training via Retrieval-based Multi-Granular Alignment
Mingyang Zhou*, Licheng Yu*, Amanpreet Singh, Mengjiao Wang, Zhou Yu, Ning Zhang
This is the official repository of our CVPR 2022 (Oral) work UVLP, a retrieval-based unsupervised vision-and-language pre-training framework. This repository provides code for end-to-end pre-training and for fine-tuning on the NLVR2 and RefCOCO+ tasks.
To use the code, set up the conda virtual environment with the following commands.
conda create -n mmf python=3.7
conda activate mmf
git clone https://github.com/zmykevin/UVLP.git
cd UVLP
pip install --editable .
Our code supports only Linux with NVIDIA GPUs. We tested it on Ubuntu 18.04 with A100 cards.
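As an optional sanity check (not part of the repository), you can confirm that the editable install succeeded and that PyTorch sees your GPUs:

# Optional post-install check: verify the editable mmf install and GPU visibility.
import torch
import mmf  # installed above via `pip install --editable .`

print("mmf version:", getattr(mmf, "__version__", "unknown"))
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())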
Download the pre-trained checkpoints and dataset from here.
The visual features for Conceptual Captions (CC) are too large to upload to the cloud drive. You can generate the CC features for pre-training with the following steps:
- Download the CC features from the VinVL repository.
- Change these lines based on your saved feature path (dataset_cc.json is included in the downloaded tar file), then run the following command to generate the CC visual features:
python data_preparation/convert_cc_vinvl.py
The script prepares the CC training visual features for you. The validation CC image features are included in the downloaded tar file.
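Optionally, you can sanity-check the annotation file before running the conversion. The snippet below is not part of the released scripts and assumes dataset_cc.json is a standard JSON file; adjust the path and parsing to your copy.

# Hypothetical helper (not part of the repo): count the CC entries listed in
# dataset_cc.json from the downloaded tar file before converting features.
import json

with open("/PATH/TO/dataset_cc.json") as f:  # adjust to your saved path
    entries = json.load(f)
print(f"Found {len(entries)} Conceptual Captions entries")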
Launch Pretraining
After preparing the visual features, update the dataset directory in the pretraining config file to point to your saved visual feature directory. You can then launch pretraining with the following command:
mmf_run config=projects/visual_bert/configs/masked_conceptual_captions/pretrain.yaml \
run_type=train_val \
model=visual_bert \
dataset=masked_conceptual_captions,masked_conceptual_captions_image_tag,masked_conceptual_captions_image_phrase,itm_conceptual_captions \
env.save_dir=/PATH/TO/SAVE \
training.fp16=True
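The fine-tuning commands below consume the pre-trained checkpoint through checkpoint.resume_file. As an optional check (assuming best.ckpt under your save directory is a standard PyTorch checkpoint saved by MMF), you can confirm the file loads before fine-tuning:

# Optional check: make sure the pre-trained checkpoint loads on CPU.
# Assumption: best.ckpt is a regular torch checkpoint; the exact top-level
# keys may differ, so we only print them for inspection.
import torch

ckpt = torch.load("/PATH/TO/MODEL/best.ckpt", map_location="cpu")
print(list(ckpt.keys()) if isinstance(ckpt, dict) else type(ckpt))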
NLVR2
Download data
Download the NLVR2 dataset from this link. Change the dataset path in the config file to point to where you saved the downloaded data.
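If you want to inspect the config before editing it, the sketch below (not part of the repo) loads it with OmegaConf, which ships with MMF. The exact field holding the dataset directory depends on the dataset config, so locate it in the printed YAML and edit the file directly.

# Hedged example: print the NLVR2 fine-tuning config to locate the dataset
# directory field before editing it. OmegaConf is an MMF dependency.
from omegaconf import OmegaConf

cfg = OmegaConf.load("projects/visual_bert/configs/nlvr2/vinvl_defaults.yaml")
print(OmegaConf.to_yaml(cfg))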
Finetuning
mmf_run config=projects/visual_bert/configs/nlvr2/vinvl_defaults.yaml \
run_type=train_val_test \
model=visual_bert \
dataset=nlvr2 \
checkpoint.resume_pretrained=True \
checkpoint.resume_file=/PATH/TO/MODEL/best.ckpt \
env.save_dir=/PATH/TO/SAVE \
training.fp16=True
RefCOCO+
Download data
Download the RefCOCO+ dataset from this link. Change the dataset path in the config file to point to where you saved the downloaded data.
Finetuning
mmf_run config=projects/visual_bert/configs/refcoco/vinvl_defaults.yaml \
run_type=train_val_test \
model=visual_bert \
dataset=refcoco \
checkpoint.resume_pretrained=True \
checkpoint.resume_file=/PATH/TO/MODEL/best.ckpt \
env.save_dir=/PATH/TO/SAVE \
training.fp16=True
If you find this code useful for your research, please consider citing:
@inproceedings{zhou2022uvlp,
author = {Mingyang Zhou and
Licheng Yu and
Amanpreet Singh and
Mengjiao Wang and
Zhou Yu and
Ning Zhang},
title = {Unsupervised Vision-and-Language Pre-training via Retrieval-based
Multi-Granular Alignment},
booktitle = {CVPR},
year = {2022},
}
Our code is developed on top of MMF. We thank the authors and our colleagues at Meta AI for their helpful discussions on the code implementation. We also thank the anonymous reviewers for their constructive feedback.
License: BSD