Official Implementation of the paper: InstrAug: Automatic Instruction Augmentation for Multimodal Instruction Fine-tuning
InstrAug is a framework for instruction augmentation. It can expand an existing small instruction set to one up to 30x larger. The whole InstrAug pipeline includes (as illustrated in the figure below):
- Meta-prompt Generation
- Augmented Instruction Generation and Rule-based Filtering
  - Multi-temp sampling ($\rm MIns+_{\rm MT}$)
  - Iterative rephrasing ($\rm MIns+_{\rm Iter}$)
- Instruction-following Dataset Construction
We apply InstrAug to Multimodal Instruction Fine-tuning (MIFT) benchmarks and test on 12 downstream tasks from MultiInstruct and InstructBLIP-Bench, as well as the whole MMMU benchmark. The results show that models fine-tuned on the instruction-augmented dataset (59K) are competitive with, or even exceed, those trained on non-augmented but larger datasets (564K).
The file structure of this repository is shown below; only important folders/files are listed:
.
├── IBLIP                    # Implementation code on InstructBLIP
├── OFA                      # Implementation code on OFA
├── MultiInstruct            # Code to create MINS+
│   ├── llama                # Code to generate augmented instructions using LLaMA
│   ├── mminstr_dataset      # Folder to store the MINS and MINS+ datasets
│   └── instruction_data     # Folder to store the original and generated instruction sets
├── LICENSE
└── README.md
Please refer to the README.md under each folder for more details.
Please cite our paper if you find this work useful for your research and applications:
@misc{han2024robust,
      title={Towards Robust Instruction Tuning on Multimodal Large Language Models},
      author={Wei Han and Hui Chen and Soujanya Poria},
      year={2024},
      eprint={2402.14492},
      archivePrefix={arXiv},
}