
Maya: Multimodal Multilingual LLM

A multimodal LLM supporting eight languages: English, Chinese, French, Spanish, Russian, Japanese, Arabic, and Hindi.

Contents

  • Install
  • Model Weights and Dataset
  • Train
  • Evaluation
  • Citation
  • Contributors
  • Acknowledgement

Install

The following steps were tested with CUDA 12.4.

  1. Clone this repository and navigate to the maya directory
git clone https://github.com/nahidalam/maya
cd maya
  2. Install the package
conda create -n maya python=3.10 -y
conda activate maya
pip install --upgrade pip  # enable PEP 660 support
pip install -e .
  3. Install additional packages for training
pip install -e ".[train]"
pip install flash-attn==2.6.3 --no-build-isolation --no-cache-dir
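
To verify the editable install, a quick import check helps. This is a sketch that assumes the package installs under the llava namespace, as in the upstream LLaVA codebase this repo builds on:

python -c "import llava; print('maya environment OK')"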

Model Weights and Dataset

Model weights and datasets are available on HuggingFace.
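
As a sketch, the weights could be pulled locally with huggingface-cli; the repo ID below is a placeholder, so substitute the actual HuggingFace repo linked above:

# Placeholder repo ID; use the actual HuggingFace repo for Maya.
huggingface-cli download nahidalam/maya --local-dir ./checkpoints/maya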

Train

Pretraining

To pretrain the projection layer:

  • get the pretraining dataset from HuggingFace and keep it in /dev/data/LLaVA_Pretrain
  • get the images with wget https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip and keep them in /dev/data/images (see the staging sketch after this list)
  • then run

bash scripts/maya/pretrain_aya_siglip.sh
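
A minimal staging sketch, assuming the annotation file name published in the upstream liuhaotian/LLaVA-Pretrain repo (blip_laion_cc_sbu_558k.json) and that huggingface-cli is available from the install above:

# Create the expected directories.
mkdir -p /dev/data/LLaVA_Pretrain /dev/data/images

# Download the pretraining annotations (file name assumed from the upstream repo).
huggingface-cli download liuhaotian/LLaVA-Pretrain blip_laion_cc_sbu_558k.json \
    --repo-type dataset --local-dir /dev/data/LLaVA_Pretrain

# Fetch and unpack the image archive.
wget https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip -P /dev/data
unzip -q /dev/data/images.zip -d /dev/data/images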

Instruction Tuning

Please download the annotations from MBZUAI/palo_multilingual_dataset and the images for each dataset shown in the layout below.

After downloading everything, organize the data as follows under /dev/data/instruction_tune_dataset/:

instruction_tune_dataset
    ├── coco
    │   └── train2017
    ├── gqa
    │   └── images
    ├── ocr_vqa
    │   └── images
    ├── textvqa
    │   └── train_images
    └── vg
        ├── VG_100K
        └── VG_100K_2

Place palo_multilingual_dataset.json at /dev/data/annotations/palo_multilingual_dataset.json.
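
A small sanity check, written as a sketch against the layout above, can confirm everything landed in the right place:

# Verify the expected image directories and the annotation file exist.
for d in coco/train2017 gqa/images ocr_vqa/images textvqa/train_images vg/VG_100K vg/VG_100K_2; do
    [ -d "/dev/data/instruction_tune_dataset/$d" ] || echo "missing: $d"
done
[ -f /dev/data/annotations/palo_multilingual_dataset.json ] || echo "missing: palo_multilingual_dataset.json"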

Make sure to keep the pretrained model at a path that you specify in the scripts/maya/finetune_aya_siglip.sh script through the --pretrain_mm_mlp_adapter flag.
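
For illustration, the flag inside the script would point at the projector checkpoint produced by the pretraining step; the path below is a placeholder, and mm_projector.bin is the file name used by the upstream LLaVA codebase, so adjust if your run differs:

# Hypothetical excerpt from scripts/maya/finetune_aya_siglip.sh
    --pretrain_mm_mlp_adapter /dev/data/checkpoints/maya_pretrain/mm_projector.bin \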

Then run

bash scripts/maya/finetune_aya_siglip.sh

Evaluation

For multilingual evaluation using the PALO multilingual test dataset:

  • Download the PALO evaluation dataset into LLaVA/playground/data/eval (create the directory structure first if it doesn't exist):
    mkdir -p LLaVA/playground/data/eval
    cd LLaVA/playground/data/eval
    git clone https://huggingface.co/datasets/MBZUAI/multilingual-llava-bench-in-the-wild

  • Specifically, the test images can be found here
  • Run the evaluation script:
bash scripts/v1_5/eval/eval_all_languages.sh \
    "model_base" \
    "model_path" \
    "model_name" \
    "your-openai-api-key"

Citation

If you find Maya useful for your research and applications, please cite using this BibTeX:

@misc{alam2024mayainstructionfinetunedmultilingual,
      title={Maya: An Instruction Finetuned Multilingual Multimodal Model}, 
      author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth. S and Snehanshu Mukherjee and Alham Fikri Aji},
      year={2024},
      eprint={2412.07112},
      archivePrefix={arXiv},
      primaryClass={cs.CV},
      url={https://arxiv.org/abs/2412.07112}, 
}

Contributors

In no particular order

Acknowledgement

  • This codebase is based on LLaVA. Thank you for the easily understandable codebase.
  • This project would not be possible without the support of Cohere and their Aya-35B API grant. We are thankful to Sara Hooker, Madeline, Shivalika, Shristhi, and the entire Cohere for AI team for their support.
  • We thank Pytho for their generous GPU grant.
