Cross-Modal Contrastive Learning for Robust Reasoning in VQA

This repository is a PyTorch implementation of Cross-Modal Contrastive Learning for Robust Reasoning in VQA. It supports the METER, ViLT, and ALBEF backbones.

Data preparation and pretrained models

Follow the METER and ViLT instructions to prepare the datasets, and download the pretrained checkpoints released for the corresponding backbones.

Finetune on VQA data

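Train with single-node multi-GPU

Set --nproc_per_node to the number of GPUs on the node. The example below uses 1, which starts a single process on one GPU.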

python -m torch.distributed.launch --nproc_per_node=1 main.py \
    --vision_encoder clip16 --kwargs \
    num_workers=4 \
    data_root=path/to/datasets/in/arrows/ \
    per_gpu_batchsize=8 \
    exp_name=finetune_vqa \
    resume_from=result/official_released/meter_clip16_288_roberta_pretrain.ckpt \
    load_pretrained_meter=1 \
    draw_false_image_vqa=1 \
    model_type=cmcl \
    cl_type=nce_graph \
    nce_lambda=0.5 \
    test_only=0 \
    dataset_name=vqacp \
    max_epochs=10
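
As a rough aid for choosing per_gpu_batchsize, the sketch below computes the effective global batch size under plain data-parallel training. It assumes one process per GPU and no gradient accumulation; this README does not say whether main.py accumulates gradients, so the helper (global_batch_size, a hypothetical name that is not part of this repo) is illustrative only.

```python
def global_batch_size(per_gpu_batchsize: int, gpus_per_node: int, num_nodes: int = 1) -> int:
    """Samples consumed per optimizer step across all GPUs.

    Assumption: one process per GPU and no gradient accumulation.
    """
    if min(per_gpu_batchsize, gpus_per_node, num_nodes) < 1:
        raise ValueError("all arguments must be positive integers")
    return per_gpu_batchsize * gpus_per_node * num_nodes


if __name__ == "__main__":
    # The command above: per_gpu_batchsize=8 with --nproc_per_node=1.
    print(global_batch_size(8, gpus_per_node=1))
    # The same setting on a hypothetical 4-GPU node.
    print(global_batch_size(8, gpus_per_node=4))
```

If the global batch size changes a lot when you add GPUs, the learning rate may need retuning.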

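Train with multi-node multi-GPU

The example below shows the command for the first machine (rank 0). Assuming main_dist.py follows the usual PyTorch convention for these flags, run it on every machine with a distinct --rank from 0 to $num_machines - 1, keeping --dist-url pointed at the rank 0 machine.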

python main_dist.py --dist-url tcp://$server_ip:$port_id --world-size $num_machines --rank 0 --multiprocessing-distributed \
    --vision_encoder clip16 --kwargs \
    num_workers=4 \
    data_root=path/to/datasets/in/arrows/ \
    per_gpu_batchsize=8 \
    exp_name=finetune_vqa \
    resume_from=result/official_released/meter_clip16_288_roberta_pretrain.ckpt \
    load_pretrained_meter=1 \
    draw_false_image_vqa=1 \
    model_type=cmcl \
    cl_type=nce_graph \
    nce_lambda=0.5 \
    test_only=0 \
    dataset_name=vqacp \
    max_epochs=10
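
One way to generate the per-machine launch commands is sketched below. It assumes --rank is the machine index and --world-size is the number of machines, as the command above suggests; the helper name node_commands and the example address and port are placeholders, not part of this repo.

```python
import shlex
from typing import List, Sequence


def node_commands(server_ip: str, port: int, num_machines: int,
                  extra_args: Sequence[str]) -> List[str]:
    """Build one launch command per machine, differing only in --rank.

    Assumption: --rank is the machine index (0 .. num_machines - 1).
    """
    commands = []
    for rank in range(num_machines):
        argv = [
            "python", "main_dist.py",
            "--dist-url", f"tcp://{server_ip}:{port}",
            "--world-size", str(num_machines),
            "--rank", str(rank),
            "--multiprocessing-distributed",
            *extra_args,
        ]
        # shlex.join quotes arguments so each line can be pasted into a shell.
        commands.append(shlex.join(argv))
    return commands


if __name__ == "__main__":
    # Placeholder address and port; only a few of the kwargs are shown.
    args = ["--vision_encoder", "clip16", "--kwargs",
            "per_gpu_batchsize=8", "dataset_name=vqacp"]
    for cmd in node_commands("10.0.0.1", 29500, 2, args):
        print(cmd)
```

Run each printed command on its own machine; all of them must be able to reach the address given in --dist-url.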
