InstructRAG

Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales
[arXiv] [Website] [Model] [Dataset] [X Summary]

InstructRAG is a simple yet effective RAG framework that lets LMs explicitly denoise retrieved content by generating rationales, improving verifiability and trustworthiness.

InstructRAG Key Features:

  • 🤖 Self-Synthesis: Leverage instruction-tuned LMs to generate their OWN supervision for denoising.
  • 🔌 Easy-to-Use: Support both in-context learning (ICL) and supervised fine-tuning (SFT).
  • 🚀 Effectiveness: Up to 8.3% better results across 5 benchmarks (Table 5).
  • 💪 Noise Robustness: Robust to increased noise ratios in various scenarios (Figure 3).
  • 🔁 Task Transferability: InstructRAG can also solve out-of-domain unseen tasks (Figure 4).

Please also see our paper and X summary for more details.
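For intuition, the sketch below illustrates the self-synthesis idea: an instruction-tuned LM is prompted to explain how the (possibly noisy) retrieved documents lead to the known ground-truth answer, and the resulting rationale is then used as denoising supervision (a training target for SFT, or a demonstration for ICL). The prompt wording and the generate helper are illustrative assumptions, not the exact templates used in this repo.

# Illustrative sketch of rationale self-synthesis (not the repo's exact prompt).
# `generate` stands in for any instruction-tuned LM call and is a hypothetical helper.
def synthesize_rationale(question, documents, answer, generate):
    """Ask the LM to explain how the (possibly noisy) documents support the answer."""
    context = "\n\n".join(f"Document {i + 1}: {doc}" for i, doc in enumerate(documents))
    prompt = (
        f"{context}\n\n"
        f"Question: {question}\n"
        f"The correct answer is: {answer}\n"
        "Please explain how to identify the relevant information in the documents "
        "(and ignore irrelevant or misleading content) to arrive at this answer."
    )
    # The generated rationale becomes the training target (InstructRAG-FT) or an
    # in-context demonstration (InstructRAG-ICL) for denoised answer generation.
    return generate(prompt)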

🔗 Quick Links

  • Installation
  • Training Script
  • Evaluation
  • Generation Example
  • Model Checkpoints
  • Bugs or Questions?
  • Citation

Installation

Run the following script to create a Python virtual environment and install all required packages.

bash setup.sh

Alternatively, you can directly create a conda environment from the provided configuration file.

conda env create -f environment.yml
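After installation, a quick sanity check along the following lines can confirm that PyTorch sees your GPUs before launching training. This is a minimal sketch: the environment name instrag comes from the repo, and it assumes torch is installed by setup.sh or environment.yml.

# Quick environment check (run inside the activated `instrag` environment).
import torch

print("PyTorch version:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
print("GPU count:", torch.cuda.device_count())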

Training Script

To train the model (i.e., InstructRAG-FT), activate the environment and run the following training script. The training config is set for 4x H100 80GB GPUs; you may need to adjust NUM_DEVICE and PER_DEVICE_BATCH_SIZE to match your compute environment.

conda activate instrag
bash train.sh
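When scaling down from the default 4x H100 setup, keeping the effective batch size constant usually means compensating with gradient accumulation. The arithmetic below is illustrative: the variable names mirror the script's NUM_DEVICE and PER_DEVICE_BATCH_SIZE, while TARGET_EFFECTIVE_BATCH_SIZE and the accumulation variable are assumptions, not values taken from train.sh.

# Keep the effective batch size constant when changing the GPU count or
# per-device batch size (illustrative arithmetic, not part of train.sh).
NUM_DEVICE = 2                    # e.g., two GPUs instead of four
PER_DEVICE_BATCH_SIZE = 2         # reduced to fit smaller GPU memory
TARGET_EFFECTIVE_BATCH_SIZE = 32  # hypothetical target; check the actual training config

grad_accum_steps = TARGET_EFFECTIVE_BATCH_SIZE // (NUM_DEVICE * PER_DEVICE_BATCH_SIZE)
print(f"Use {grad_accum_steps} gradient accumulation steps "
      f"to keep an effective batch size of {TARGET_EFFECTIVE_BATCH_SIZE}.")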

Evaluation

There are two instantiations of our framework:

  • InstructRAG-ICL: training-free & easy-to-adapt
  • InstructRAG-FT: trainable & better performance

Use the following script to evaluate InstructRAG in both training-free and trainable settings. You can specify the task and model by adjusting DATASET and MODEL in eval.sh.

conda activate instrag
bash eval.sh
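For reference, open-domain QA evaluations of this kind typically score a prediction as correct if any gold answer string appears in the generation. The snippet below is a hedged sketch of that metric, not necessarily identical to the scorer invoked by eval.sh.

# Simple substring-match accuracy, a common metric for open-domain QA
# (an illustrative sketch, not the repo's exact implementation).
import string

def normalize(text: str) -> str:
    text = text.lower()
    return "".join(ch for ch in text if ch not in string.punctuation)

def accuracy(predictions, gold_answers):
    """predictions: list[str]; gold_answers: list[list[str]] of acceptable answers."""
    hits = sum(
        any(normalize(ans) in normalize(pred) for ans in answers)
        for pred, answers in zip(predictions, gold_answers)
    )
    return hits / len(predictions)

print(accuracy(["Paris is the capital of France."], [["Paris"]]))  # 1.0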

Generation Example

The following case study shows that InstructRAG can effectively identify relevant information in noisy input and leverage its own knowledge to correctly answer questions when required. Red text denotes irrelevant or inaccurate model generations, while green text denotes content relevant to the question.

Model Checkpoints

Below is the full list of InstructRAG models fine-tuned on each dataset in our work.

Dataset             HF Model Repo                              Retriever
PopQA               meng-lab/PopQA-InstructRAG-FT              Contriever
TriviaQA            meng-lab/TriviaQA-InstructRAG-FT           Contriever
Natural Questions   meng-lab/NaturalQuestions-InstructRAG-FT   DPR
ASQA                meng-lab/ASQA-InstructRAG-FT               GTR
2WikiMultiHopQA     meng-lab/2WikiMultiHopQA-InstructRAG-FT    BM25
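These checkpoints are standard Hugging Face model repos, so they should load with the transformers AutoClasses as sketched below (assuming transformers and torch are installed and a GPU with sufficient memory is available; the prompt format and generation settings are illustrative, not the repo's exact inference setup).

# Load a fine-tuned InstructRAG-FT checkpoint from the Hugging Face Hub.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

repo_id = "meng-lab/PopQA-InstructRAG-FT"
tokenizer = AutoTokenizer.from_pretrained(repo_id)
model = AutoModelForCausalLM.from_pretrained(repo_id, torch_dtype=torch.bfloat16).to("cuda")

# Illustrative prompt: retrieved documents followed by the question.
prompt = "Document 1: ...\n\nQuestion: ...\nAnswer:"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))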

Bugs or Questions?

If you have any questions about the code or the paper, feel free to email Zhepei (zhepei.wei@virginia.edu). If you encounter any problems when using the code, or want to report a bug, feel free to open an issue! Please describe the problem in detail so we can help you better and more quickly.

Citation

Please cite our paper if you find the repo helpful in your work:

@article{wei2024instructrag,
  title={{InstructRAG}: Instructing Retrieval-Augmented Generation via Self-Synthesized Rationales},
  author={Wei, Zhepei and Chen, Wei-Lin and Meng, Yu},
  year={2024}
}
