Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models
This repository contains the source code for Reefknot, a multimodal benchmark for relation hallucination evaluation proposed in our paper “Reefknot: A Comprehensive Benchmark for Relation Hallucination Evaluation, Analysis and Mitigation in Multimodal Large Language Models”.
Hallucination issues persistently plague current multimodal large language models (MLLMs). Existing research primarily focuses on object-level and attribute-level hallucinations, sidelining the more sophisticated relation hallucinations that require advanced reasoning abilities from MLLMs. Moreover, recent benchmarks concerning relation hallucinations lack in-depth evaluation and effective mitigation. To address these challenges, we introduce Reefknot, the first comprehensive benchmark specifically targeting relation hallucinations, consisting of over 20,000 samples derived from real-world scenarios. Specifically, we first provide a systematic definition of relation hallucinations, integrating perspectives from the perceptive and cognitive domains. We then construct a relation-based corpus from the representative scene graph dataset Visual Genome (VG).
Our comprehensive evaluation across three distinct tasks reveals a substantial shortcoming in the ability of current MLLMs to mitigate relation hallucinations. Finally, we propose a novel confidence-based mitigation strategy tailored to the relation hallucination problem.
We first identify relation triplets from the Visual Genome (VG) dataset (Phase a) and filter them (Phase b). We then extract the semantic triplets (Phase c) and categorize their relations (Phase d). Next, a relation-based question set is constructed covering three question types (Phase e). Finally, dataset quality is ensured by three rounds of expert-based validation (Phase f).
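To make Phase (e) concrete, the sketch below shows how a single scene-graph triplet could be turned into a Yes/No question. The function name and the question template are illustrative assumptions, not the exact templates used to construct Reefknot.

```python
# Illustrative only: turning a Visual Genome (subject, relation, object) triplet
# into a Yes/No question. The template below is an assumption for demonstration.
def triplet_to_yesno(subject: str, relation: str, obj: str) -> dict:
    """Build a Yes/No query from a scene-graph triplet that is present in the image."""
    return {
        "query_prompt": f"Is the {subject} {relation} the {obj}? Please answer yes or no.",
        "label": "yes",  # the triplet exists in the scene graph, so the answer is yes
    }

print(triplet_to_yesno("man", "riding", "horse"))
```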
- Download the images from the Visual Genome dataset and merge the two image folders into one (a minimal merging sketch follows this list).
- Clone our repository and set up the environment:
git clone https://github.com/Lumos0917/RLC-bench.git
cd Reefknot
conda create -yn Reefknot python=3.9
conda activate Reefknot
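The sketch below merges the two Visual Genome image parts into one folder. The folder names VG_100K and VG_100K_2 match the official Visual Genome download; adjust the paths if your local layout differs.

```python
# Merge the two downloaded Visual Genome image folders into a single directory.
# VG_100K / VG_100K_2 and the destination name are assumptions about your local paths.
import shutil
from pathlib import Path

src_dirs = [Path("VG_100K"), Path("VG_100K_2")]  # the two downloaded image parts
dst_dir = Path("vg_images")                      # merged folder, later passed as --image-folder
dst_dir.mkdir(exist_ok=True)

for src in src_dirs:
    for img in src.glob("*.jpg"):
        shutil.copy2(img, dst_dir / img.name)
```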
- Our dataset consists of three jsonl files: YESNO.jsonl, Multichoice.jsonl, and VQA.jsonl. Each case in a jsonl file includes the following fields:
  - image_id: Image ID in the Visual Genome dataset
  - query_prompt: Question text
  - label: Ground-truth label
  - relation_type: Type of relation, either perception or cognition
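A minimal loading sketch, assuming only the field names listed above, is given below.

```python
# Load one of the benchmark files and inspect a few samples.
# The image-path pattern is an assumption; adjust it to your merged image folder.
import json

with open("YESNO.jsonl", "r", encoding="utf-8") as f:
    samples = [json.loads(line) for line in f if line.strip()]

for sample in samples[:3]:
    image_path = f"PATH_TO_IMAGE_FOLDER/{sample['image_id']}.jpg"
    print(sample["relation_type"], sample["query_prompt"], sample["label"], image_path)
```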
We use the mitigation code on LLaVA as an example.
- Download the LLaVA code:
git clone https://github.com/haotian-liu/LLaVA.git
cd LLaVA
- Download the LLaVA checkpoint and the Vision Encoder checkpoint from their respective repositories.
- Move infer_LLaVA_yesandno.py and DTC.py to ./llava/eval.
- Run infer.sh, which contains the following code:
export CUDA_VISIBLE_DEVICES=0
python ./infer_LLaVA_yesandno.py \
--model-path PATH_TO_LLaVA_CHECKPOINT \
--question-file PATH_TO_QUESTION_FILE \
--image-folder PATH_TO_IMAGE_FOLDER \
--answers-file PATH_TO_ANSWER_FILE \
--temperature 0 \
--conv-mode vicuna_v1 \
--apha APHA \
--layer LAYER_NUM \
--threshold ENT_THRESHOLD \
--model_type llava-v1.5-13b
For the hyperparameters, we use apha=0.1, layer=38, and threshold=0.9.
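For intuition, here is a minimal sketch of a confidence-based detect-then-calibrate step in the spirit of the mitigation strategy. It assumes the (base-2) entropy of the answer-token distribution is used to detect low confidence and that an intermediate-layer distribution is mixed in with weight apha; the helper names and the exact calibration rule are our own illustrative assumptions, not the implementation in DTC.py.

```python
# Illustrative confidence-based detect-then-calibrate sketch (NOT the code in DTC.py).
# Assumptions: `final_probs` is the answer-token distribution from the last layer,
# `mid_probs` is the distribution decoded from an intermediate layer (cf. --layer),
# and apha/threshold correspond to the flags in infer.sh.
import numpy as np

def detect_then_calibrate(final_probs: np.ndarray,
                          mid_probs: np.ndarray,
                          apha: float = 0.1,
                          threshold: float = 0.9) -> np.ndarray:
    """Return a calibrated answer-token distribution."""
    # Detection: high entropy of the final distribution signals low confidence,
    # i.e. a potential relation hallucination.
    entropy = -np.sum(final_probs * np.log2(final_probs + 1e-12))
    if entropy <= threshold:
        return final_probs  # confident prediction: keep it unchanged
    # Calibration: mix in the intermediate-layer distribution with weight apha.
    calibrated = (1 - apha) * final_probs + apha * mid_probs
    return calibrated / calibrated.sum()

# Toy usage with a near-uniform (uncertain) yes/no distribution.
print(detect_then_calibrate(np.array([0.52, 0.48]), np.array([0.9, 0.1])))
```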