What you can find in this repo:
- Selective EOS Supervision
  - training code
  - trained model checkpoints
- Scoring EOS Supervision
  - data filtering code
  - filtered data
- CHAIR evaluation
  - evaluation scripts
  - our test set data
- others
Follow the instructions of LLaVA to prepare the environment, data (LLaVA-Instruction-150K), and pretrained models (e.g., LLaVA-1.5-7b).
Train the model with Selective EOS Supervision. The default configuration is set to train the llava-1.5-7b model with Detail23k for one epoch:

```bash
cd LLaVA
bash scripts/v1_5/selective_eos_finetune.sh
```
The main modifications to the original LLaVA code for Selective EOS Supervision are detailed in ./assets/selective-eos-supervision.md.
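As a generic illustration of how selective supervision can be wired into the loss (this is not the paper's selection rule, which is described in the note above), positions whose labels are set to the ignore index are simply excluded from the language-modeling loss, so supervision can be kept only at chosen positions such as EOS positions:

```python
import torch
import torch.nn.functional as F

IGNORE_INDEX = -100  # positions with this label are excluded from the loss


def selective_label_mask(labels, keep_mask):
    """Return a copy of `labels` where positions outside `keep_mask` are set
    to IGNORE_INDEX, so only the selected positions (e.g., chosen EOS
    positions) contribute to the loss.

    labels:    (batch, seq_len) token ids
    keep_mask: (batch, seq_len) bool, True where supervision is kept
    """
    masked = labels.clone()
    masked[~keep_mask] = IGNORE_INDEX
    return masked


def lm_loss(logits, labels):
    """Standard shifted next-token cross-entropy that respects IGNORE_INDEX."""
    shift_logits = logits[:, :-1, :].contiguous()
    shift_labels = labels[:, 1:].contiguous()
    return F.cross_entropy(
        shift_logits.view(-1, shift_logits.size(-1)),
        shift_labels.view(-1),
        ignore_index=IGNORE_INDEX,
    )
```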
Our models (LoRA weights) finetuned with Selective EOS Supervision:
| Basic Model | Finetuning Data | Checkpoint |
|---|---|---|
| llava-1.5-7b | Detail23k | llava-v1.5-7b-selective-23k-lora |
| llava-1.5-7b | LLaVA-Instruction-150K | llava-v1.5-7b-selective-150k-lora |
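A minimal sketch of loading one of these checkpoints with LLaVA's standard loader (assuming the usual `load_pretrained_model` API from the LLaVA codebase; pass the LoRA weights as `model_path` and the base checkpoint as `model_base`):

```python
from llava.mm_utils import get_model_name_from_path
from llava.model.builder import load_pretrained_model

model_path = "yuezih/llava-v1.5-7b-selective-23k-lora"  # LoRA weights from the table
model_base = "liuhaotian/llava-v1.5-7b"                 # base model checkpoint

# Returns the tokenizer, the model with LoRA weights merged onto the base,
# the image processor, and the maximum context length.
tokenizer, model, image_processor, context_len = load_pretrained_model(
    model_path=model_path,
    model_base=model_base,
    model_name=get_model_name_from_path(model_path),
)
```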
For the LLaVA codebase, due to constraints related to DeepSpeed, we currently have no efficient way to score a dataset with a standalone script. Our scoring therefore relies on the training process, i.e., for each training step:
- Score the data in the minibatch and save the scores;
- Skip the loss backward (this can be achieved by modifying the trainer code; see the sketch below).
The core code for data scoring is provided in ./LLaVA/llava/model/language_model/llava_llama_filter.py.
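A minimal sketch of this idea, assuming a HuggingFace `Trainer` subclass (the class name, score format, and file path below are illustrative; the actual scoring logic lives in the file above):

```python
import json

import torch
from transformers import Trainer


class ScoringTrainer(Trainer):
    """Illustrative sketch: use the training loop only to score minibatches.

    Each step runs a forward pass, records a per-batch score, and skips the
    backward pass so the model weights are never updated.
    """

    def __init__(self, *args, score_file="scores.jsonl", **kwargs):
        super().__init__(*args, **kwargs)
        self.score_file = score_file

    def training_step(self, model, inputs, *args, **kwargs):
        inputs = self._prepare_inputs(inputs)
        with torch.no_grad():  # no gradients -> "cancel loss backward"
            outputs = model(**inputs)
        # Illustrative score: the batch loss.
        with open(self.score_file, "a") as f:
            f.write(json.dumps({"score": outputs.loss.item()}) + "\n")
        # Return a detached scalar; since no backward pass was run, the
        # optimizer step that follows has no gradients to apply.
        return outputs.loss.detach()
```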
Our data filtered with Scoring EOS Supervision:
| Basic Data | Filtered Data |
|---|---|
| LLaVA-Instruction-150K | LLaVA-Instruction-150K-filtered [OneDrive] |
Instruction tune the LLaVA-7b model on our filtered data with:

```bash
cd LLaVA
bash scripts/finetune_qlora_filtered.sh
```
The test set used in our paper for CHAIR evaluation is provided in ./CHAIR-eval/data/chair-500.jsonl. The data is randomly sampled from the MSCOCO validation set with a random seed of 0.
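For reference only, a sketch of how a 500-image subset can be drawn from the val2014 annotations with seed 0 (the exact ordering and sampling call behind chair-500.jsonl are not specified here, so treat the released file as the source of truth):

```python
import json
import random

# Assumes the MSCOCO annotation files downloaded in a later step.
with open("./CHAIR-eval/data/MSCOCO/annotation/annotations/instances_val2014.json") as f:
    coco = json.load(f)

random.seed(0)
sampled_images = random.sample(coco["images"], 500)
print(len(sampled_images), sampled_images[0]["file_name"])
```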
We provide two ways to collect the test set images:
- A Python script to collect images from the original MSCOCO images via softlinks (a sketch of this step is shown after this list). Please specify the path to your own MSCOCO images. The script will create a folder ./CHAIR-eval/data/chair-500 for the CHAIR images:

  ```bash
  python ./CHAIR-eval/prepare_data.py
  ```

- A OneDrive link to download the 500 images. Unzip the images to ./CHAIR-eval/data/chair-500.
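A sketch of what the softlink collection does, assuming the val2014 image directory and that each line of chair-500.jsonl stores the image file name in an `image` field (both are assumptions; the provided prepare_data.py is the reference):

```python
import json
import os

coco_image_dir = "/path/to/MSCOCO/val2014"   # set to your own MSCOCO image path
target_dir = "./CHAIR-eval/data/chair-500"
os.makedirs(target_dir, exist_ok=True)

with open("./CHAIR-eval/data/chair-500.jsonl") as f:
    for line in f:
        item = json.loads(line)
        name = item["image"]                 # assumed field holding the file name
        # Create a softlink to the original image instead of copying it.
        os.symlink(
            os.path.join(coco_image_dir, name),
            os.path.join(target_dir, name),
        )
```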
Use the following commands to download the MSCOCO detection annotation files, which will be used for CHAIR evaluation:

```bash
cd ./CHAIR-eval/data
wget http://images.cocodataset.org/annotations/annotations_trainval2014.zip
mkdir MSCOCO
unzip -d MSCOCO/annotation annotations_trainval2014.zip
```
We provide a script for CHAIR inference and evaluation. Set your model in the following script and then run it:

```bash
bash ./CHAIR-eval/eval.sh
```

- `MODEL_NAME`: the LoRA weights, e.g., yuezih/llava-v1.5-7b-selective-23k-lora
- `MODEL_BASE`: the base model checkpoint, e.g., liuhaotian/llava-v1.5-7b
The first-time evaluation can be slow because of the ground-truth object set construction. Subsequent evaluations will be faster with the cache.
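For reference, the reported CHAIR metrics are simple ratios once the mentioned and hallucinated object sets have been extracted per caption (that extraction, and the cached ground-truth object set, are handled by the evaluation script); a minimal sketch, with an assumed record format:

```python
def chair_scores(records):
    """records: list of dicts, one per generated caption, each holding the
    lists of 'mentioned' COCO objects and 'hallucinated' objects (those not
    in the image's ground-truth object set). Returns (CHAIR_s, CHAIR_i)."""
    num_captions = len(records)
    num_hallucinated_captions = sum(1 for r in records if r["hallucinated"])
    num_objects = sum(len(r["mentioned"]) for r in records)
    num_hallucinated_objects = sum(len(r["hallucinated"]) for r in records)

    chair_s = num_hallucinated_captions / num_captions        # sentence-level
    chair_i = num_hallucinated_objects / max(num_objects, 1)  # instance-level
    return chair_s, chair_i
```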
If you find this repo helpful, please consider citing our paper:
```bibtex
@misc{yue2024less,
      title={Less is More: Mitigating Multimodal Hallucination from an EOS Decision Perspective},
      author={Zihao Yue and Liang Zhang and Qin Jin},
      year={2024},
      eprint={2402.14545},
      archivePrefix={arXiv},
      primaryClass={cs.CL}
}
```
This repo is built on LLaVA (models) and OPERA (CHAIR evaluation). Many thanks for their efforts. The use of our code should also follow the original licenses.