[feat] Add VinVL checkpoint and configs (facebookresearch#1159)
Summary:
Pull Request resolved: facebookresearch#1159

Add the VinVL model checkpoints to the model zoo.
Add the VinVL default configs.

Test Plan: Imported from OSS

Reviewed By: ebsmothers, apsdehal

Differential Revision: D32698108

Pulled By: Ryan-Qiyu-Jiang

fbshipit-source-id: 2a841d9293cee97b77a5847454e4348df99d8d00
Ryan-Qiyu-Jiang authored and facebook-github-bot committed Dec 17, 2021
1 parent ea4a776 commit 7634dc7
Showing 3 changed files with 89 additions and 0 deletions.
15 changes: 15 additions & 0 deletions mmf/configs/zoo/models.yaml
@@ -564,3 +564,18 @@ villa:
    - url: mmf://models/uniter/villa.pretrained.tar.gz
      file_name: villa.pretrained.tar.gz
      hashcode: 7a8f31421ef644fddc99bd142a0090660573dd526a779d025253c3fd996754fc

vinvl:
  defaults: ${vinvl.coco_ir}
  coco_ir:
    version: 1.0_2020_11_29
    resources:
    - url: mmf://models/vinvl/vinvl.finetuned.coco_ir.tar.gz
      file_name: vinvl.finetuned.coco_ir.tar.gz
      hashcode: 47b07303c15f1e78143b3e98a9be1e70af5e7e5a37552bc10792178703060afa
  pretrained:
    version: 1.0_2020_11_29
    resources:
    - url: mmf://models/vinvl/vinvl.pretrained.tar.gz
      file_name: vinvl.pretrained.tar.gz
      hashcode: d2bd6a96d89f8b4210b33ea792acc361243b86c332bb831d384b48f47a40d4e0
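Each `hashcode` above is a SHA-256 digest of the corresponding checkpoint archive. A minimal sketch of verifying a downloaded archive against its zoo entry (the helper name and chunked-read approach are illustrative, not MMF's actual download code):

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Hash the file in 1 MiB chunks so large checkpoint archives
    never need to be loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Compare against the zoo entry, e.g.:
# sha256_of_file("vinvl.pretrained.tar.gz") should equal
# "d2bd6a96d89f8b4210b33ea792acc361243b86c332bb831d384b48f47a40d4e0"
```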
22 changes: 22 additions & 0 deletions projects/vinvl/README.md
@@ -0,0 +1,22 @@
# VinVL

This repository contains the code for the PyTorch implementation of the VinVL model, released originally in the [Oscar repo](https://github.com/microsoft/Oscar). Please cite the following papers if you are using the VinVL model from MMF:

* Zhang, P., Li, X., Hu, X., Yang, J., Zhang, L., Wang, L., ... & Gao, J. (2021). *VinVL: Revisiting visual representations in vision-language models*. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (pp. 5579-5588). ([arXiv](https://arxiv.org/abs/2101.00529))
```
@article{li2020oscar,
  title={Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks},
  author={Li, Xiujun and Yin, Xi and Li, Chunyuan and Hu, Xiaowei and Zhang, Pengchuan and Zhang, Lei and Wang, Lijuan and Hu, Houdong and Dong, Li and Wei, Furu and Choi, Yejin and Gao, Jianfeng},
  journal={ECCV 2020},
  year={2020}
}
@article{zhang2021vinvl,
  title={VinVL: Making Visual Representations Matter in Vision-Language Models},
  author={Zhang, Pengchuan and Li, Xiujun and Hu, Xiaowei and Yang, Jianwei and Zhang, Lei and Wang, Lijuan and Choi, Yejin and Gao, Jianfeng},
  journal={CVPR 2021},
  year={2021}
}
```

Please see [https://mmf.sh/docs/projects/vinvl](https://mmf.sh/docs/projects/vinvl) for more details on how to use the VinVL model.
52 changes: 52 additions & 0 deletions projects/vinvl/configs/vqa2/defaults.yaml
@@ -0,0 +1,52 @@
model_config:
  vinvl:
    do_pretraining: false
    heads:
      vqa2:
        type: mlp
        num_labels: 3129

dataset_config:
  vqa2:
    processors:
      text_processor:
        type: vinvl_text_tokenizer
        params:
          from_pretrained: bert-base-uncased
          corrupt_probability: 0
          tokenizer_config:
            type: bert-base-uncased
            params:
              do_lower_case: true
          mask_probability: 0

training:
  clip_gradients: false
  lr_scheduler: true
  max_updates: 44000
  checkpoint_interval: 4000
  evaluation_interval: 4000
  batch_size: 256 # 32 per GPU * 8 GPUs
  find_unused_parameters: false
  log_interval: 1000

optimizer:
  type: adam_w
  params:
    lr: 1e-4
    eps: 1e-8
    weight_decay: 1e-2

scheduler:
  type: warmup_cosine
  params:
    num_warmup_steps: 4400
    num_training_steps: ${training.max_updates}

evaluation:
  metrics:
  - type: vqa_accuracy

datasets:
- vqa2
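The `warmup_cosine` scheduler ramps the learning rate linearly over the first 4400 updates (10% of `max_updates`) and then decays it along a half cosine toward zero. A minimal sketch of the multiplier applied to the base `lr` of 1e-4, assuming the standard linear-warmup plus cosine-decay formulation (the function name is illustrative, not MMF's internal API):

```python
import math

def warmup_cosine_factor(step, num_warmup_steps=4400, num_training_steps=44000):
    """Learning-rate multiplier at a given update step."""
    if step < num_warmup_steps:
        # Linear warmup: multiplier climbs from 0 to 1.
        return step / max(1, num_warmup_steps)
    # Cosine decay: multiplier falls from 1 at the end of warmup to 0
    # at num_training_steps.
    progress = (step - num_warmup_steps) / max(1, num_training_steps - num_warmup_steps)
    return 0.5 * (1.0 + math.cos(math.pi * progress))
```

With the values above, the effective learning rate peaks at 1e-4 at update 4400 and reaches roughly half that around update 24200.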
