
LLaVA-NDiNO

🤗📚 Datasets | 🤗💻 Models

Repository for the paper "LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language"

Introduction

LLaVA-NDiNO is a family of models trained to optimize performance for the Italian language. Specifically, the models have been trained with three different approaches, applied either individually or in sequence:

  • Language Adaptation: pre-training the model on a rich collection of image-text data
  • Instruction-Tuning: fine-tuning the model on instruction-following image-text data where the model answer is brief
  • Long Instruction-Tuning: fine-tuning the model on instruction-following image-text data where the model answer is long (a sketch of the data format follows this list)
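For illustration, an instruction-following sample roughly follows the data format of the original LLaVA codebase. This is a hedged sketch: the field names come from that codebase, and the image path and texts are invented:

```python
# Hedged sketch of one instruction-tuning sample in the LLaVA data format.
# Field names follow the original LLaVA codebase; image path and texts are invented.
sample = {
    "id": "000001",
    "image": "000001.jpg",
    "conversations": [
        {"from": "human", "value": "<image>\nCosa è raffigurato nell'immagine?"},  # "What is shown in the image?"
        # Instruction-Tuning uses brief answers like this one; Long Instruction-Tuning
        # replaces it with a detailed, multi-sentence answer.
        {"from": "gpt", "value": "Un gatto seduto su un divano."},  # "A cat sitting on a sofa."
    ],
}
```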

This repository provides everything we used for training and evaluation. Please note that this work relied on the LLaVA-NeXT codebase for the training procedure; we modified a single script, which we provide in this repository (llava_train_modified.py).

Repository Structure

  • 📁 lmms-eval-tasks: contains the task implementations to be added to the lmms-eval library to reproduce the evaluation results on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
  • 📁 requirements: contains the Singularity definition file to build the container used for the training step
  • 📄 convert_llava_weights.py: script used to convert the LLaVA-NeXT checkpoint produced by the original codebase into the HuggingFace format
  • 📄 evaluate.sh: template script to evaluate the models on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
  • 📄 evaluate_ppl.py: script to evaluate the models on the Perplexity metric
  • 📄 llava_train_modified.py: modified training script from the original LLaVA-NeXT repository that applies the LLaMA 3 chat template without a system prompt (see the sketch after this list)
  • 📄 train_from_llm.sh: template script to train a LLaVA-NeXT model from a pre-trained LLM
  • 📄 train_from_lmm.sh: template script to train a LLaVA-NeXT model from a pre-trained LLaVA-NeXT model
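As a rough illustration of what llava_train_modified.py changes, the snippet below applies the LLaMA 3 chat template to a conversation that deliberately contains no system message. This is a minimal sketch using the transformers library; the actual training script also handles image tokens and batching:

```python
# Minimal sketch: apply the LLaMA 3 chat template without a system prompt.
# The conversation intentionally omits a {"role": "system", ...} entry.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

messages = [
    {"role": "user", "content": "Cosa vedi nell'immagine?\n<image>"},  # "What do you see in the image?"
    {"role": "assistant", "content": "Un gatto su un divano."},        # "A cat on a sofa."
]

# tokenize=False returns the formatted prompt string instead of token ids
print(tokenizer.apply_chat_template(messages, tokenize=False))
```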

Usage

To train a model, you should:

  1. Build the Singularity container using the definition file in the requirements folder
  2. Run train_from_llm.sh to start from a pre-trained LLM, or train_from_lmm.sh to start from a pre-trained LLaVA-NeXT model
  3. Convert the resulting checkpoint into the HuggingFace format with convert_llava_weights.py

To evaluate a model, you should:

  1. Add the task implementations from the lmms-eval-tasks folder to the lmms-eval library
  2. Run evaluate.sh to evaluate on the Italian versions of GQA, POPE, SeedBENCH, OK-VQA, MTVQA and EXAMS-V
  3. Run evaluate_ppl.py to evaluate the models on the Perplexity metric (sketches of model loading and perplexity computation follow below)
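After conversion with convert_llava_weights.py, a checkpoint can be loaded like any other LLaVA-NeXT model in the HuggingFace format. A minimal inference sketch follows; the hub ID and image path are placeholders, not necessarily the published names:

```python
# Minimal inference sketch for a converted checkpoint in the HuggingFace format.
# The model ID and image path below are placeholders.
import torch
from PIL import Image
from transformers import LlavaNextForConditionalGeneration, LlavaNextProcessor

model_id = "swapUniba/LLaVA-NDiNO"  # placeholder hub ID
processor = LlavaNextProcessor.from_pretrained(model_id)
model = LlavaNextForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

image = Image.open("example.jpg")
conversation = [
    {
        "role": "user",
        "content": [
            {"type": "image"},
            {"type": "text", "text": "Descrivi l'immagine."},  # "Describe the image."
        ],
    }
]
prompt = processor.apply_chat_template(conversation, add_generation_prompt=True)
inputs = processor(images=image, text=prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```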
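For reference, perplexity over a token sequence is the exponential of the mean token-level cross-entropy. Below is a minimal text-only sketch with a placeholder backbone; evaluate_ppl.py in this repository may compute it over image-text inputs and differ in details:

```python
# Minimal perplexity sketch for a text-only input, assuming a causal LM.
# The model ID is a placeholder; evaluate_ppl.py may differ (e.g. image-text inputs).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Meta-Llama-3-8B-Instruct"  # placeholder backbone
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.float16, device_map="auto"
)

text = "Il gatto è seduto sul divano."  # "The cat is sitting on the sofa."
inputs = tokenizer(text, return_tensors="pt").to(model.device)
with torch.no_grad():
    # Passing labels=input_ids makes the model return the mean cross-entropy loss
    loss = model(**inputs, labels=inputs["input_ids"]).loss
print(torch.exp(loss).item())  # perplexity = exp(mean negative log-likelihood)
```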

Citation

@inproceedings{musacchioLLaVANDiNO,
  title={LLaVA-NDiNO: Empowering LLMs with Multimodality for the Italian Language},
  author={Musacchio, Elio and Siciliani, Lucia and Basile, Pierpaolo and Semeraro, Giovanni},
  booktitle={Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with the 23rd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)},
  year={2024}
}
