A GPT-4V annotated preference dataset for large vision language models.
[Project Page] [Datasets] [Silkie Model] [Paper]
The instructions are sampled from various domains to cover different capabilities of LVLMs.
We construct a model pool consisting of 12 LVLMs (a sketch of how responses could be collected from such a pool follows the list):
- GPT-4V
- LLaVA-series
  - LLaVA-v1.5-7B
  - LLaVA-v1.5-13B
  - LLaVA-RLHF-7b-v1.5-224
  - LLaVA-RLHF-13b-v1.5-336
- Qwen-VL-7B
- IDEFICS-9b-Instruct
- Fuyu-8B
- InstructBLIP-series
  - InstructBLIP-Vicuna-7B
  - InstructBLIP-Vicuna-13B
- VisualGLM-6B
- MMICL-Vicuna-13B
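As a minimal sketch of how responses could be collected from such a pool for later preference annotation, consider the snippet below. The helper `generate_response` and the choice of sampling four models per instruction are illustrative assumptions, not the exact pipeline used to build VLFeedback.

```python
import random

# Model pool mirroring the list above.
MODEL_POOL = [
    "GPT-4V",
    "LLaVA-v1.5-7B", "LLaVA-v1.5-13B",
    "LLaVA-RLHF-7b-v1.5-224", "LLaVA-RLHF-13b-v1.5-336",
    "Qwen-VL-7B", "IDEFICS-9b-Instruct", "Fuyu-8B",
    "InstructBLIP-Vicuna-7B", "InstructBLIP-Vicuna-13B",
    "VisualGLM-6B", "MMICL-Vicuna-13B",
]

def generate_response(model_name: str, instruction: str, image_path: str) -> str:
    # Placeholder: in practice this would call the corresponding LVLM's
    # inference code; a dummy string is returned here for illustration.
    return f"[{model_name}] answer to '{instruction}' about {image_path}"

def collect_responses(instruction: str, image_path: str, num_models: int = 4):
    """Sample a few models from the pool and record their responses,
    ready to be ranked (e.g. by GPT-4V) into preference pairs."""
    sampled = random.sample(MODEL_POOL, num_models)
    return [
        {"model": name, "response": generate_response(name, instruction, image_path)}
        for name in sampled
    ]
```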
We select Qwen-VL-Chat as the backbone model and perform DPO on our dataset.
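For readers unfamiliar with DPO, the core objective is sketched below in plain PyTorch. The tensor names and the `beta=0.1` default are assumptions for illustration; the actual training scripts handle this computation internally.

```python
import torch
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps: torch.Tensor,
             policy_rejected_logps: torch.Tensor,
             ref_chosen_logps: torch.Tensor,
             ref_rejected_logps: torch.Tensor,
             beta: float = 0.1) -> torch.Tensor:
    """Direct Preference Optimization loss.

    Each input is a batch of per-example sequence log-probabilities
    (log-prob of the response summed over its tokens) under the policy
    being trained and under the frozen reference model. beta = 0.1 is a
    common default, not necessarily the value used for Silkie.
    """
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # Encourage the policy to assign a larger implicit reward to the
    # preferred response than to the rejected one.
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```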
The resulting model, Silkie, achieves comprehensive improvements across various benchmarks.
To run our training scripts, create a virtual environment and install the dependencies first.
```bash
conda create -n silkie python=3.10 && conda activate silkie
pip install -r requirements.txt
```
Our training scripts support both single-node and multi-node training.
We provide a `launch_dpo.py` script that handles both cases. If you want to launch a job locally, you can use:

```bash
python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR
```
If you want to launch a job on a Slurm cluster, specify `GPUS_PER_NODE` in `launch_dpo.py` and run:

```bash
python launch_dpo.py --config dpo_config/example.yaml --working $WORKING_DIR --gpus $NUM_GPUS
```
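To make the configuration step concrete, the sketch below writes a hypothetical DPO config to a YAML file. Every key and value here (model path, beta, learning rate, batch sizes, output directory) is an assumption about typical DPO hyperparameters; the fields actually expected by `launch_dpo.py` are the ones defined in `dpo_config/example.yaml`.

```python
import yaml  # PyYAML

# Hypothetical configuration; the real schema is defined by the repository's
# dpo_config/example.yaml, not by these keys.
example_config = {
    "model_name_or_path": "Qwen/Qwen-VL-Chat",
    "dataset_path": "path/to/vlfeedback",      # placeholder path
    "beta": 0.1,                               # common DPO default
    "learning_rate": 1e-5,
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 8,
    "num_train_epochs": 1,
    "output_dir": "outputs/silkie-dpo",
}

with open("my_dpo_config.yaml", "w") as f:
    yaml.safe_dump(example_config, f, sort_keys=False)
```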
```bibtex
@article{2023vlfeedback,
  author  = {Lei Li and Zhihui Xie and Mukai Li and Shunian Chen and Peiyi Wang and Liang Chen and Yazheng Yang and Benyou Wang and Lingpeng Kong},
  title   = {Silkie: Preference Distillation for Large Visual Language Models},
  journal = {arXiv preprint arXiv:2312.10665},
  year    = {2023}
}
```
We would like to thank the authors of trl and Qwen-VL for their great work.