Learning Conformal Abstention Policies for Adaptive Risk Management in Large Language and Vision-Language Models

Please give us a star ⭐ if you find this work useful

News

2025.2.08 🚀 Released the paper on arXiv.
2025.2.04 🚀 Released the codebase.

🐳 Model Zoo

Vision-Language Models (VLMs) 🖼️

Model Series	Model Name	Parameters	Architecture
LLaVA	LLaVA-v1.6-34B	34B	Vision-Language
	LLaVA-v1.6-13B	13B	Vision-Language
	LLaVA-v1.6-7B	7B	Vision-Language
Lightweight	MoE-LLaVA-Phi2	2.7B	Vision-Language
	MobileVLM-v2	7B	Vision-Language
Other VLMs	mPLUG-Owl2	7B	Vision-Language
	Qwen-VL-Chat	7B	Vision-Language
	Yi-VL	6B	Vision-Language
	CogAgent-VQA	7B	Vision-Language

Large Language Models (LLMs) 📚

Model Series	Model Name	Parameters	Architecture
Yi	Yi-34B	34B	Language
Qwen	Qwen-14B	14B	Language
	Qwen-7B	7B	Language
Llama-2	Llama-2-13B	13B	Language
	Llama-2-7B	7B	Language

📊 Evaluation

Key Improvements Over Baselines 🚀

Hallucination Detection: Up to 22.19% improvement in AUROC
Uncertainty Estimation: 21.17% boost in uncertainty-guided selective generation (AUARC)
Calibration: 70-85% reduction in calibration error
Coverage: Consistently meets 90% coverage target while reducing prediction set size
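
To make the coverage and prediction-set-size figures above concrete, here is a minimal sketch of plain split conformal prediction on multiple-choice option probabilities. It is not the learned abstention policy from the paper: the nonconformity score (1 minus the probability of the true option), the synthetic logits, and every function name below are assumptions chosen for illustration.

```python
import math
import random

def conformal_threshold(cal_probs, cal_labels, alpha=0.1):
    """Split-conformal threshold using the score 1 - p(true option)."""
    scores = sorted(1.0 - p[y] for p, y in zip(cal_probs, cal_labels))
    n = len(scores)
    # Finite-sample corrected rank: ceil((n + 1) * (1 - alpha)).
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        return float("inf")  # too few calibration points: keep every option
    return scores[k - 1]

def prediction_set(probs, qhat):
    """Indices of all options whose score 1 - p is within the threshold."""
    return [i for i, p in enumerate(probs) if 1.0 - p <= qhat]

def softmax(logits):
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def synthetic_example(rng, n_options=4):
    """Toy stand-in for model logits: the true option gets a boost (assumption)."""
    label = rng.randrange(n_options)
    logits = [rng.gauss(0.0, 1.0) for _ in range(n_options)]
    logits[label] += 1.5
    return softmax(logits), label

rng = random.Random(0)
cal = [synthetic_example(rng) for _ in range(1000)]
test = [synthetic_example(rng) for _ in range(1000)]

# Calibrate for a 90% coverage target (alpha = 0.1), then evaluate on held-out data.
qhat = conformal_threshold([p for p, _ in cal], [y for _, y in cal], alpha=0.1)
sets = [prediction_set(p, qhat) for p, _ in test]
coverage = sum(y in s for s, (_, y) in zip(sets, test)) / len(test)
avg_set_size = sum(len(s) for s in sets) / len(test)
print(f"coverage={coverage:.3f}, average set size={avg_set_size:.2f}")
```

Coverage is the fraction of test questions whose true option falls inside the prediction set. A smaller average set size at the same coverage indicates a more informative uncertainty estimate.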

Benchmarks 🔖

For detailed results, please refer to the paper.

🤖 Getting started

Six groups of models can be launched from a single environment: LLaVA, CogVLM, Yi-VL, Qwen-VL, InternLM-XComposer, and MoE-LLaVA. Create this environment with the following commands:

python3 -m venv venv
source venv/bin/activate
pip install git+https://github.com/haotian-liu/LLaVA.git 
pip install git+https://github.com/PKU-YuanGroup/MoE-LLaVA.git --no-deps
pip install deepspeed==0.9.5
pip install -r requirements.txt
pip install xformers==0.0.23 --no-deps

The mPLUG-Owl model can be launched from the following environment:

python3 -m venv venv_mplug
source venv_mplug/bin/activate
git clone https://github.com/X-PLUG/mPLUG-Owl.git
cd mPLUG-Owl/mPLUG-Owl2
git checkout 74f6be9f0b8d42f4c0ff9142a405481e0f859e5c
pip install -e .
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
cd ../../
pip install -r requirements.txt

Monkey models can be launched from the following environment:

python3 -m venv venv_monkey
source venv_monkey/bin/activate
git clone https://github.com/Yuliang-Liu/Monkey.git
cd ./Monkey
pip install -r requirements.txt
pip install git+https://github.com/haotian-liu/LLaVA.git --no-deps
cd ../
pip install -r requirements.txt

To check all models, run scripts/test_model_logits.sh.

To work with Yi-VL:

apt-get install git-lfs
cd ../
git clone https://huggingface.co/01-ai/Yi-VL-6B

Model logits

To get model logits on the four benchmarks, run the command from scripts/run.sh.

To train the abstention model with RL

bash scripts/train_all_models.sh

To evaluate the abstention model + uncertainty quantification benchmark

bash scripts/evaluate_policies.sh
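
For readers unfamiliar with AUARC, the selective-generation metric cited under Key Improvements, the sketch below shows one common discrete approximation of the area under the accuracy-rejection curve. It is an illustration only: the function name and the toy inputs are hypothetical, and the exact formulation used by the evaluation script may differ.

```python
def auarc(confidences, correct):
    """Approximate area under the accuracy-rejection curve.

    Answers are ranked from most to least confident. For every retained
    prefix (i.e. after rejecting the least confident answers), accuracy is
    computed on what is kept; the metric is the mean of those accuracies.
    """
    order = sorted(range(len(confidences)),
                   key=lambda i: confidences[i], reverse=True)
    hits = 0
    accuracies = []
    for kept, i in enumerate(order, start=1):
        hits += correct[i]
        accuracies.append(hits / kept)
    return sum(accuracies) / len(accuracies)

# Toy inputs (assumed): one confidence per answer, 1 = correct, 0 = wrong.
confidences = [0.9, 0.8, 0.6, 0.3]
correct = [1, 1, 0, 1]
print(f"AUARC = {auarc(confidences, correct):.4f}")
```

A confidence score that ranks correct answers above incorrect ones pushes AUARC above the plain accuracy, since rejecting low-confidence answers first removes mostly errors.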
