🔥 RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models
This repository contains the official PyTorch implementation of the paper: "RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models".
- 2024.12.16: Updated paper and project page
- 2024.05.29: Built project page
- 2024.05.29: RITUAL paper released online
- 2024.05.28: Code released
TL;DR: RITUAL is a simple yet effective anti-hallucination approach for LVLMs. It leverages basic image transformations (e.g., vertical and horizontal flips) to improve LVLM accuracy without external models or additional training. By conditioning on both the original and the transformed image, the model can refine its predictions, significantly reducing hallucinations in both discriminative and descriptive tasks.
When conditioned on the original image alone, the probabilities of the correct (blue) and hallucinated (red) responses are similar, so the hallucinated response can easily be sampled. RITUAL introduces an additional probability distribution conditioned on the transformed image, under which the likelihood of hallucination is significantly reduced. The response is then sampled from a linear combination of the two distributions, yielding more accurate and reliable outputs.
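The sampling step can be summarized in a few lines. Below is a minimal sketch of the idea, not the code in this repository; the model interface (an `images` keyword and per-token logits) and the mixing weight `alpha` are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def ritual_next_token(model, input_ids, image, transformed_image, alpha=0.3):
    """Sample the next token from a linear combination of two distributions:
    one conditioned on the original image, one on the transformed image."""
    with torch.no_grad():
        # Assumed HuggingFace-style forward pass; take the last-position logits.
        logits_orig = model(input_ids, images=image).logits[:, -1, :]
        logits_trans = model(input_ids, images=transformed_image).logits[:, -1, :]

    # Mixing the two conditionals pulls probability mass away from the
    # hallucinated response, which is unlikely under the transformed view.
    probs = (1 - alpha) * F.softmax(logits_orig, dim=-1) \
            + alpha * F.softmax(logits_trans, dim=-1)
    return torch.multinomial(probs, num_samples=1)
```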
In RITUAL, the original image V undergoes a random transformation, producing a transformed view. In RITUAL+, the model instead evaluates several candidate transformations and selects the one most beneficial for answering the question at hand, further improving reliability. These transformed images serve as complementary inputs, allowing the model to draw on multiple visual perspectives to reduce hallucinations; a sketch of both variants follows.
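The sketch below illustrates the two variants; the transformation pool and the scoring function are illustrative assumptions, not necessarily the exact choices made in the paper.

```python
import random
import torchvision.transforms.functional as TF

# Hypothetical pool of basic transformations; the paper's exact set may differ.
TRANSFORMS = {
    "horizontal_flip": TF.hflip,
    "vertical_flip": TF.vflip,
    "rotate_90": lambda img: TF.rotate(img, 90),
}

def ritual_transform(image):
    """RITUAL: apply one transformation drawn uniformly at random."""
    name = random.choice(list(TRANSFORMS))
    return TRANSFORMS[name](image), name

def ritual_plus_transform(image, score_fn):
    """RITUAL+: score every candidate view and keep the most beneficial one.
    `score_fn` is an assumed callable rating how helpful a transformed view
    is for the given question (e.g., the model's answer confidence)."""
    name = max(TRANSFORMS, key=lambda n: score_fn(TRANSFORMS[n](image)))
    return TRANSFORMS[name](image), name
```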
conda create -n RITUAL python=3.10
conda activate RITUAL
git clone https://github.com/sangminwoo/RITUAL.git
cd RITUAL
pip install -r requirements.txt
Model checkpoint preparation
- LLaVA-1.5: Download LLaVA-1.5 merged 7B
- InstructBLIP: Download InstructBLIP
- POPE:
bash eval_bench/scripts/pope_eval.sh
- You need to specify `model` and `model_path`.
- CHAIR:
bash eval_bench/scripts/chair_eval.sh
- You need to specify `model`, `model_path`, and `type`.
- MME:
bash experiments/cd_scripts/mme_eval.sh
- You need to specify `model` and `model_path`.
Dataset preparation
- Please download and extract the MSCOCO 2014 dataset from this link into your data path for evaluation.
- For MME evaluation, see this link.
Results are reported on both MME-Fullset and MME-Hallucination.
This codebase borrows from several excellent projects, most notably VCD, OPERA, and LLaVA. Many thanks to the authors for generously sharing their code!
If you find this repository helpful for your project, please consider citing our work :)
@article{woo2024ritual,
title={RITUAL: Random Image Transformations as a Universal Anti-hallucination Lever in Large Vision Language Models},
author={Woo, Sangmin and Jang, Jaehyuk and Kim, Donguk and Choi, Yubin and Kim, Changick},
journal={arXiv preprint arXiv:2405.17821},
year={2024},
}