Download

Datasets

We provide the extracted image region features, object tags, and the original text annotations for each downstream task.

path/to/azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/datasets/TASK_NAME' <target folder> --recursive

TASK_NAME can be one of coco_caption, nocaps, coco_ir, vqa, gqa, or nlvr2.
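To fetch several datasets in one go, the command above can be generated per task. The following is a minimal sketch, not part of the official tooling: the helper name `dataset_download_command`, the `./datasets` target folder, and the `path/to/azcopy` placeholder are assumptions for illustration. It only builds and prints the commands; nothing is downloaded.

```python
import shlex

# Base URL and task names are taken from the command and list above.
BASE_URL = "https://biglmdiag.blob.core.windows.net/vinvl/datasets"
DATASET_TASKS = ("coco_caption", "nocaps", "coco_ir", "vqa", "gqa", "nlvr2")


def dataset_download_command(task_name, target_folder, azcopy="path/to/azcopy"):
    """Return the azcopy argument list for one downstream-task dataset.

    Hypothetical helper: `azcopy` is a placeholder for the local executable.
    """
    if task_name not in DATASET_TASKS:
        raise ValueError(f"unknown TASK_NAME: {task_name!r}")
    return [azcopy, "copy", f"{BASE_URL}/{task_name}", target_folder, "--recursive"]


if __name__ == "__main__":
    # Print one command per task so each can be reviewed before running it.
    for task in DATASET_TASKS:
        print(shlex.join(dataset_download_command(task, "./datasets")))
```

The returned list can be passed to `subprocess.run(..., check=True)` once the azcopy path points at a real executable.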

Pre-trained Models

We provide pre-trained Oscar+ models with BERT-base and BERT-large architectures; their names start with base and large, respectively.

path/to/azcopy copy 'https://biglmdiag.blob.core.windows.net/vinvl/model_ckpts/TASK_NAME' <target folder> --recursive

TASK_NAME can be one of image_captioning (including nocaps), coco_ir, vqa, gqa, nlvr2, or od_models.

The models are trained with both image region features and object tags. The image region features are extracted by a Faster R-CNN with ResNet-101, using object and attribute annotations from Visual Genome. The object tags come from one of three sources: 1) the same Visual Genome model, named -vg-labels; 2) a model trained on object annotations from Open Images V5, named -oid-labels; or 3) no object tags provided, serving as a baseline, named -no-labels.
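The naming convention can be read off a checkpoint name mechanically. The sketch below is illustrative only: `describe_checkpoint` is a hypothetical helper, and the example name `base-vg-labels` is made up to show the pattern, not a guaranteed folder name in the storage container.

```python
def describe_checkpoint(name):
    """Infer model size and object-tag source from a checkpoint name.

    Assumes the conventions described above: a `base`/`large` prefix and
    one of the `-vg-labels`, `-oid-labels`, `-no-labels` markers.
    """
    if name.startswith("base"):
        size = "BERT-base"
    elif name.startswith("large"):
        size = "BERT-large"
    else:
        size = "unknown"

    tag_sources = {
        "-vg-labels": "Visual Genome tags",
        "-oid-labels": "Open Images V5 tags",
        "-no-labels": "no object tags (baseline)",
    }
    # Pick the first marker that appears anywhere in the name.
    tags = next(
        (desc for marker, desc in tag_sources.items() if marker in name),
        "unknown",
    )
    return size, tags


if __name__ == "__main__":
    # Hypothetical name used only to illustrate the convention.
    print(describe_checkpoint("base-vg-labels"))
```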

Pre-extracted Image Features

For ease of use, we make pre-extracted features available for all pretraining datasets and downstream tasks. Features are stored in TSV (tab-separated values) format and can be used in pretraining and in downstream tasks such as COCO Image-Text Retrieval.
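Because the feature files are large, it helps to stream them one row at a time rather than load them whole. The sketch below shows only generic TSV reading. The column layout of the feature files is not described here, so the code makes no assumption about what each column holds; `iter_tsv_rows` and the file name `features.tsv` are hypothetical.

```python
def iter_tsv_rows(path):
    """Yield each line of a tab-separated file as a list of string fields.

    Rows are read lazily, so memory use stays flat even for very large files.
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            # Drop the trailing newline, then split on tab characters.
            yield line.rstrip("\n").split("\t")


if __name__ == "__main__":
    import os

    # Hypothetical local file; inspect the first row to learn the layout.
    if os.path.exists("features.tsv"):
        first_row = next(iter_tsv_rows("features.tsv"))
        print(len(first_row), "columns in the first row")
```

Inspecting the first row of a downloaded file is a cheap way to confirm the column count before writing a decoder for it.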

Note that every link below points to a folder. We recommend downloading with the following AzCopy command.

path/to/azcopy copy <folder-link> <target-address> --recursive

COCO 2014 Train/Val Image Features (~50G)

COCO 2014 Test Image Features (~16G)

COCO 2015 Test Image Features (~32G)

GQA All Image Features (~62G)

NLVR2 Train/Dev/Test Image Features (~28G)

Flickr30k All Image Features (~14G)

Google Conceptual Captions Image Features (Huge, ~960G, split into 12 chunks)

SBU Image Features (Huge, ~280G, split into 4 chunks)

Open Images Detection Image Features (Huge, ~530G, split into 8 chunks)

Oscar+ pretraining corpus

Small corpus

Medium corpus

Large corpus

We have tried our best to ensure that there is no data contamination between the pretraining corpus and the test sets of downstream tasks. Specifically, we use two methods. (1) We use the COCO image IDs of Visual Genome and Flickr30k images. (2) For COCO, Visual Genome and Flickr30k, we calculate the pair-wise L2 norm between two images after resizing them to the same size.
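The second check can be illustrated with a small, dependency-free sketch. This is not the authors' implementation: the nearest-neighbour resize, the 8x8 comparison size, and the use of grayscale pixel lists are all assumptions chosen to keep the example self-contained. A distance near zero suggests the two images are near-duplicates; the threshold to use is not stated in this document.

```python
import math


def resize_nearest(image, height, width):
    """Resize a grayscale image (list of pixel rows) by nearest-neighbour sampling."""
    src_h, src_w = len(image), len(image[0])
    return [
        [image[r * src_h // height][c * src_w // width] for c in range(width)]
        for r in range(height)
    ]


def l2_distance(image_a, image_b, size=(8, 8)):
    """L2 norm of the pixel-wise difference after resizing both images to `size`.

    `size` is an assumed comparison resolution, not a value from the source.
    """
    height, width = size
    a = resize_nearest(image_a, height, width)
    b = resize_nearest(image_b, height, width)
    return math.sqrt(
        sum(
            (pa - pb) ** 2
            for row_a, row_b in zip(a, b)
            for pa, pb in zip(row_a, row_b)
        )
    )


if __name__ == "__main__":
    small = [[0, 1], [2, 3]]
    upscaled = resize_nearest(small, 4, 4)
    # The same picture at two resolutions compares as identical.
    print(l2_distance(small, upscaled, size=(4, 4)))
```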

Note

We recommend downloading large files with AzCopy, which is faster. AzCopy executable tools can be downloaded here. Decompress the tar file and put the executable in any path. To download from any URL above, the command is:

path/to/azcopy copy <URL> <local_path>

# for example, downloading coco_caption.zip
path/to/azcopy copy https://biglmdiag.blob.core.windows.net/oscar/datasets/coco_caption.zip <local_path>
