1. Prepare environment
Clone our repository, then create a Python environment and activate it with the following commands:
```
cd MiniGPT-4
conda env create -f environment.yml
conda activate minigptv
```
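Optionally, run a quick sanity check to confirm the environment is usable; this assumes environment.yml installs PyTorch with CUDA support, which the training step below requires.
```
# Should print the PyTorch version and a non-zero GPU count on a training node
python -c "import torch; print(torch.__version__, torch.cuda.device_count())"
```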
2. Prepare the pretrained LLM weights
MiniGPT-v2 is based on Llama-2-chat-7B. Download the corresponding LLM weights from the following Hugging Face space, for example with the Hugging Face download tooling (a sketch is shown below).
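One possible way to fetch the weights is the huggingface_hub CLI sketched here; the repository id and local path are illustrative, and the gated Llama-2 repositories require a Hugging Face token with the license accepted.
```
# Assumes `pip install -U huggingface_hub` and an HF token with Llama-2 access granted
huggingface-cli download meta-llama/Llama-2-7b-chat-hf \
  --local-dir ./llama-2-7b-chat-hf \
  --token $HF_TOKEN
```
After downloading, point the LLM weight path in the training and evaluation configs to this folder (check the config files for the exact key name).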
3. Prepare the pretrained model checkpoints
Download the stage-2 pretrained MiniGPT4-v2 checkpoint from here and put it at MLLM-DataEngine-v2/MiniGPT-4/checkpoint_stage2.pth
4. Download the training datasets
| Image source | Download path |
| --- | --- |
| COCO 2014 images | images, captions |
| COCO VQA | vqa train, vqa val |
| Visual Genome | images part1, images part2, image meta data |
| TextCaps | images, annotations |
| RefCOCO | annotations |
| RefCOCO+ | annotations |
| RefCOCOg | annotations |
| OKVQA | annotations |
| AOK-VQA | annotations |
| OCR-VQA | annotations |
| GQA | images, annotations |
| Filtered Flickr-30k | annotations |
| Multi-task conversation | annotations |
| Filtered unnatural instruction | annotations |
| LLaVA | Complex reasoning, Detailed description, Conversation |
Download the MLLM-DataEngine generated data from Hugging Face or OpenDataLab, and put dataengine_minigpt4.json under:
```
train_dataset
└── data_engine
    └── dataengine_minigpt4.json
...
```
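For convenience, the full train_dataset skeleton used in the steps below can be created up front; this is only a sketch, and the folder names simply mirror the directory trees listed in this section.
```
# Create the folders referenced by the dataset layouts below
mkdir -p train_dataset/{data_engine,COCO2014,vqav2,vg,textcaps,okvqa,aokvqa,ocrvqa,gqa,filtered_flickr,multitask_conversation,unnatural_instructions,llava}
mkdir -p train_dataset/refcoco/{refcoco,refcoco+,refcocog}
# Assuming dataengine_minigpt4.json was downloaded to the current directory
mv dataengine_minigpt4.json train_dataset/data_engine/
```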
Download the COCO 2014 images and captions, and put them as follows:
```
train_dataset
└── COCO2014
    ├── train
    └── coco_karpathy_train.json
...
```
Download the VQAv2 train and validation json files
```
train_dataset
├── vqav2
│   ├── vqa_train.json
│   └── vqa_val.json
```
Download the Visual Genome images and annotation files
```
train_dataset
├── vg
│   ├── VG_100K
│   ├── VG_100K_2
│   ├── region_descriptions.json
│   └── image_data.json
...
```
Download the TextCaps images and annotation files
```
train_dataset
├── textcaps
│   ├── train_images
│   └── TextCaps_0.1_train.json
```
Download the RefCOCO, RefCOCO+, RefCOCOg annotation files
```
train_dataset
├── refcoco
│   ├── refcoco
│   │   ├── instances.json
│   │   ├── refs(google).p
│   │   └── refs(unc).p
│   ├── refcoco+
│   │   ├── instances.json
│   │   └── refs(unc).p
│   └── refcocog
│       ├── instances.json
│       ├── refs(google).p
│       └── refs(umd).p
...
```
Download the OKVQA annotation files
```
train_dataset
├── okvqa
│   └── okvqa_train.json
```
Download the AOK-VQA annotation dataset
```
export AOKVQA_DIR=YOUR_DATASET_PATH
mkdir -p ${AOKVQA_DIR}
curl -fsSL https://prior-datasets.s3.us-east-2.amazonaws.com/aokvqa/aokvqa_v1p0.tar.gz | tar xvz -C ${AOKVQA_DIR}
```
```
train_dataset
├── aokvqa
│   └── aokvqa_v1p0_train.json
```
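The tar command above extracts the AOK-VQA annotations into ${AOKVQA_DIR}; a hedged sketch of copying the train split into the layout shown above:
```
mkdir -p train_dataset/aokvqa
# Assumes the tarball extracted aokvqa_v1p0_train.json directly into ${AOKVQA_DIR}
cp ${AOKVQA_DIR}/aokvqa_v1p0_train.json train_dataset/aokvqa/
```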
Download the OCR-VQA annotation files, and download the images with the loadDataset.py script
```
train_dataset
├── ocrvqa
│   ├── images
│   └── dataset.json
```
Download the GQA annotation files and images
```
train_dataset
├── gqa
│   ├── images
│   └── train_balanced_questions.json
```
Download the filtered Flickr-30k images (fill in the form on the official website, or download from Kaggle) and the annotation files
```
train_dataset
├── filtered_flickr
│   ├── images
│   ├── captiontobbox.json
│   ├── groundedcaption.json
│   └── phrasetobbox.json
...
```
Download the multi-task conversation dataset
```
train_dataset
├── multitask_conversation
│   └── multitask_conversation.json
...
```
Download the filtered unnatural instruction annotation files (we removed the very long sentences from the original Unnatural Instructions dataset)
```
train_dataset
├── unnatural_instructions
│   └── filtered_unnatural_instruction.json
```
Download the LLaVA annotation files
```
train_dataset
├── llava
│   ├── conversation_58k.json
│   ├── detail_23k.json
│   └── complex_reasoning_77k.json
```
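Before launching training, a quick check that the main annotation files listed above are in place can save a failed run; the list below just mirrors the directory trees in this section and can be extended as needed.
```
# Report any of the expected annotation files that are still missing
for f in \
  data_engine/dataengine_minigpt4.json \
  COCO2014/coco_karpathy_train.json \
  vqav2/vqa_train.json \
  vg/image_data.json \
  textcaps/TextCaps_0.1_train.json \
  refcoco/refcoco/instances.json \
  okvqa/okvqa_train.json \
  aokvqa/aokvqa_v1p0_train.json \
  ocrvqa/dataset.json \
  gqa/train_balanced_questions.json \
  filtered_flickr/phrasetobbox.json \
  multitask_conversation/multitask_conversation.json \
  unnatural_instructions/filtered_unnatural_instruction.json \
  llava/complex_reasoning_77k.json; do
  [ -f "train_dataset/$f" ] || echo "missing: train_dataset/$f"
done
```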
5. Stage-3 training
We perform stage-3 training on 8 A100 GPUs, which takes 8-10 hours. Run the following command to train the model:
```
torchrun --master-port $RANDOM --nproc_per_node 8 train.py --cfg-path train_configs/minigptv2_finetune_dataengine.yaml
```
6. Evaluation
- For evaluation on downstream datasets, first download the evaluation datasets and put the folder under MLLM-DataEngine-v2/MiniGPT-4.
- Change the ckpt key in eval_configs/minigptv2_benchmark_evaluation.yaml to the model you trained. To reproduce the results in the paper, download the model from here and set ckpt to dataengine_minigpt4v2.pth instead; a command-line sketch of this edit is shown below.
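For example, assuming the checkpoint path is stored under a ckpt: entry in that YAML file (check the exact key and indentation before running), the edit can be scripted as a one-liner; the checkpoint path here is a placeholder.
```
# Point the evaluation config at your trained (or downloaded) checkpoint
sed -i 's|ckpt: .*|ckpt: "/path/to/your/checkpoint.pth"|' \
  eval_configs/minigptv2_benchmark_evaluation.yaml
```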
- Download the SEED-Bench images (not video frames) and put them under evaluation_dataset/SEED-Bench-image
- Inference on SEED-Bench
```
torchrun --master-port $RANDOM --nproc_per_node 1 eval_scripts/eval_vqa.py --cfg-path ./eval_configs/minigptv2_benchmark_evaluation.yaml --dataset seed
```
- Calculate results
```
python eval_scripts/convert_seed_for_submission_minigpt4.py \
    --annotation-file ./evaluation_dataset/seed/SEED-Bench-image.json \
    --result-file ./evaluation_results/seed.jsonl
```
- Inference on MMBench
```
torchrun --master-port $RANDOM --nproc_per_node 1 eval_scripts/eval_vqa.py --cfg-path ./eval_configs/minigptv2_benchmark_evaluation.yaml --dataset mmbench
```
- Convert results to MMBench format
```
python eval_scripts/convert_mmbench_for_submission.py \
    --annotation-file evaluation_dataset/mmbench/mmbench_dev_20230712.tsv \
    --result-file evaluation_results/mmbench.jsonl \
    --output-file evaluation_results/mmbench.xlsx
```
- Submit the results to the MMBench evaluation server
- COCO2014 val: download the COCO2014 validation images and put them under evaluation_dataset/coco2014_val/
- VizWiz: download the VizWiz validation set images from here and put them under evaluation_dataset/vizwiz/vizwiz_images
- VSR: download the VSR images from here and put them under evaluation_dataset/vsr/vsr_images
- Inference on OKVQA, VizWiz, and VSR
```
torchrun --master-port $RANDOM --nproc_per_node 1 eval_scripts/eval_vqa.py --cfg-path ./eval_configs/minigptv2_benchmark_evaluation.yaml --dataset okvqa,vizwiz,vsr
```
| Incremental Dataset | Data Amount | SEED | MMB | OKVQA | VizWiz | VSR |
| --- | --- | --- | --- | --- | --- | --- |
| None (baseline) | - | 49.21 | 38.83 | 56.03 | 53.08 | 61.37 |
| MLLM-DataEngine | 270k | 63.83 | 52.92 | 56.87 | 54.39 | 62.43 |