MDETR (Kamath et al., 2021) is a multimodal reasoning model that detects objects in an image conditioned on a text query. TorchMultimodal provides example scripts for running MDETR on phrase grounding and visual question answering tasks.
Prior to running any of the MDETR tasks, you should

- follow the TorchMultimodal installation instructions in the README, and
- install the MDETR requirements via `pip install -r examples/mdetr/requirements.txt`.
In phrase grounding, the objective is to associate noun phrases in the caption of an (image, text) pair with regions in the image. Phrase grounding is not straightforward to evaluate when a single phrase refers to multiple distinct boxes in the image, and different papers handle this case differently. One protocol, referred to as the Any-Box protocol, considers a prediction correct based on the maximal IoU value between the predicted box and all ground-truth boxes for the phrase. Under this protocol, the pretrained MDETR checkpoint can be evaluated directly on the holdout set without further fine-tuning. We provide a script for evaluating MDETR on the phrase grounding task using this protocol. For additional details, see Appendix D of the MDETR paper.
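To make the Any-Box accounting concrete, here is a minimal sketch (the helper names are hypothetical, not the evaluation code shipped with the example). A predicted box for a phrase is scored against every ground-truth box for that phrase and counts as correct if its best IoU clears the matching threshold (commonly 0.5):

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2), axis-aligned

def iou(a: Box, b: Box) -> float:
    # Intersection-over-union of two axis-aligned boxes.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def any_box_correct(pred: Box, gt_boxes: List[Box], thresh: float = 0.5) -> bool:
    # Any-Box protocol: correct if the maximal IoU over all (non-empty)
    # ground-truth boxes for the phrase reaches the threshold.
    return max(iou(pred, gt) for gt in gt_boxes) >= thresh
```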
First, make sure you have completed the prerequisites above.
To run the evaluation script, you will need to download the Flickr30k dataset. This includes the images, the standard annotations, and additional annotations used by MDETR.

1. Download the Flickr30k images here. You will need to fill out the form to request access before receiving the download link by e-mail.
```bash
# Download the Flickr30k images following the link above, then extract them.
tar -xvzf flickr30k-images.tar.gz
# MDETR expects a separate directory per dataset split; symlinks suffice.
# Use absolute targets so the links don't dangle.
ln -s "$(pwd)/flickr30k-images" flickr30k-images/train
ln -s "$(pwd)/flickr30k-images" flickr30k-images/val
ln -s "$(pwd)/flickr30k-images" flickr30k-images/test
```
2. Download the annotations and split mappings from flickr30k_entities.
```bash
# Use the raw file URLs; the GitHub blob pages are HTML, not the files themselves.
wget https://github.com/BryanPlummer/flickr30k_entities/raw/master/annotations.zip
unzip annotations.zip
wget https://raw.githubusercontent.com/BryanPlummer/flickr30k_entities/master/train.txt
wget https://raw.githubusercontent.com/BryanPlummer/flickr30k_entities/master/val.txt
wget https://raw.githubusercontent.com/BryanPlummer/flickr30k_entities/master/test.txt
```
3. Download and unzip the MDETR custom annotations.
```bash
wget "https://zenodo.org/record/4729015/files/mdetr_annotations.tar.gz?download=1" -O mdetr_annotations.tar.gz
tar -xvzf mdetr_annotations.tar.gz
```
4. Modify the fields in `phrase_grounding.json` based on the locations of the files from steps (1)-(3). E.g., if the root directory for (1)-(3) is `/data`, you should use the following (a quick sanity check for these paths is sketched after this list):
```json
{
  "combine_datasets": ["flickr"],
  "combine_datasets_val": ["flickr"],
  "GT_type": "separate",
  "flickr_img_path": "/data/flickr30k-images",
  "flickr_dataset_path": "/data/flickr30k/",
  "flickr_ann_path": "/data/OpenSource"
}
```
5. Run the evaluation script. (A snippet for pre-fetching the pretrained checkpoint follows this list.)
```bash
cd examples/mdetr

# Run on CPU
python phrase_grounding.py --resume https://pytorch.s3.amazonaws.com/models/multimodal/mdetr/pretrained_resnet101_checkpoint.pth --ema --eval --dataset_config phrase_grounding.json

# Run on two GPUs
CUBLAS_WORKSPACE_CONFIG=:4096:8 torchrun --nproc_per_node=2 phrase_grounding.py --resume https://pytorch.s3.amazonaws.com/models/multimodal/mdetr/pretrained_resnet101_checkpoint.pth --ema --eval --dataset_config phrase_grounding.json
```
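Before launching step (5), it can help to verify that the paths configured in step (4) actually exist; a minimal sketch, assuming the field names shown above:

```python
import json
from pathlib import Path

# Hypothetical pre-flight check: confirm the dataset paths in the config
# point at real locations before kicking off an evaluation run.
with open("phrase_grounding.json") as f:
    cfg = json.load(f)

for key in ("flickr_img_path", "flickr_dataset_path", "flickr_ann_path"):
    path = Path(cfg[key])
    print(f"{key}: {path} [{'ok' if path.exists() else 'MISSING'}]")
```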
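The `--resume` flag accepts the checkpoint URL directly. If you want to pre-fetch and cache the checkpoint ahead of time, `torch.hub` can download it; a short sketch, where the printed keys depend on how the checkpoint was serialized:

```python
import torch

# Download (and cache under the torch hub dir) the pretrained checkpoint,
# then list its top-level keys as a quick integrity check.
CKPT_URL = (
    "https://pytorch.s3.amazonaws.com/models/multimodal/mdetr/"
    "pretrained_resnet101_checkpoint.pth"
)
ckpt = torch.hub.load_state_dict_from_url(CKPT_URL, map_location="cpu")
print(list(ckpt.keys()))
```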
Visual question answering: coming soon.