- Models and Dataset at HuggingFace
- Paper: arXiv
- Try Maya Model: Demo
Maya is a multimodal LLM supporting 8 languages: English, Chinese, French, Spanish, Russian, Japanese, Arabic, and Hindi.
The following steps were tested with CUDA 12.4.
- Clone this repository and navigate to the maya directory
git clone https://github.com/nahidalam/maya
cd maya
- Install Package
conda create -n maya python=3.10 -y
conda activate maya
pip install --upgrade pip # enable PEP 660 support
pip install -e .
- Install additional packages for training cases
pip install -e ".[train]"
pip install flash-attn==2.6.3 --no-build-isolation --no-cache-dir
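Optionally, a quick sanity check that the CUDA build of PyTorch and flash-attn installed correctly (a minimal sketch; it only relies on the version and CUDA-availability attributes exposed by those packages):

```bash
# Optional sanity check: confirm PyTorch sees the GPU and flash-attn imports cleanly.
python -c "import torch; print(torch.__version__, torch.cuda.is_available())"
python -c "import flash_attn; print(flash_attn.__version__)"
```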
To pretrain the projection layer:
- Get the pretraining dataset from HuggingFace and keep it in /dev/data/LLaVA_Pretrain
- Get the images with
wget https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip
and keep them in /dev/data/images (see the sketch below)
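A minimal sketch of these two steps, assuming wget and unzip are available and that the annotation file follows the upstream LLaVA-Pretrain layout (blip_laion_cc_sbu_558k.json):

```bash
# Sketch: fetch the LLaVA-Pretrain annotations and images into the paths used above.
mkdir -p /dev/data/LLaVA_Pretrain /dev/data/images

# Pretraining annotations (filename as in the upstream LLaVA-Pretrain dataset repo).
wget -P /dev/data/LLaVA_Pretrain \
  https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/blip_laion_cc_sbu_558k.json

# Images archive, unpacked into /dev/data/images.
wget -P /dev/data https://huggingface.co/datasets/liuhaotian/LLaVA-Pretrain/resolve/main/images.zip
unzip /dev/data/images.zip -d /dev/data/images
```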
Then run
bash scripts/maya/pretrain_aya_siglip.sh
Please download the annotations from MBZUAI/palo_multilingual_dataset and all images from the links below.
- COCO: train2017
- GQA: images
- OCR-VQA: download script
- TextVQA: train_val_images
- VisualGenome: part1, part2
After downloading all of them, organize the data as follows in /dev/data/instruction_tune_dataset/:
instruction_tune_dataset
├── coco
│ └── train2017
├── gqa
│ └── images
├── ocr_vqa
│ └── images
├── textvqa
│ └── train_images
└── vg
├── VG_100K
└── VG_100K_2
Put the palo_multilingual_dataset.json file at /dev/data/annotations/palo_multilingual_dataset.json
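A minimal sketch that creates the layout above (the commented unzip/cp lines use placeholder archive and file locations; substitute wherever you saved each download):

```bash
# Sketch: create the expected instruction-tuning layout.
DATA=/dev/data/instruction_tune_dataset
mkdir -p "$DATA"/coco/train2017 \
         "$DATA"/gqa/images \
         "$DATA"/ocr_vqa/images \
         "$DATA"/textvqa/train_images \
         "$DATA"/vg/VG_100K \
         "$DATA"/vg/VG_100K_2 \
         /dev/data/annotations

# Example: unpack each downloaded archive into its slot (archive names are placeholders).
# unzip train2017.zip -d "$DATA"/coco/
# unzip gqa_images.zip -d "$DATA"/gqa/

# Annotations file expected by the finetuning script.
# cp palo_multilingual_dataset.json /dev/data/annotations/palo_multilingual_dataset.json
```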
Make sure the checkpoint produced by the pretraining stage is stored at a path that you specify in the scripts/maya/finetune_aya_siglip.sh script through the --pretrain_mm_mlp_adapter flag (see the sketch below).
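For example (the checkpoint path below is a placeholder; LLaVA-style pretraining typically saves the projector weights as mm_projector.bin):

```bash
# Illustrative fragment only, not the full training command; point the flag at your own
# pretraining output:
#   --pretrain_mm_mlp_adapter /dev/checkpoints/maya_pretrain/mm_projector.bin
# To locate the flag inside the script:
grep -n "pretrain_mm_mlp_adapter" scripts/maya/finetune_aya_siglip.sh
```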
Then run
bash scripts/maya/finetune_aya_siglip.sh
For multilingual evaluation using the PALO multilingual test dataset:
- Download the PALO evaluation dataset: create the LLaVA/playground/data/eval directory if it doesn't exist, then clone https://huggingface.co/datasets/MBZUAI/multilingual-llava-bench-in-the-wild into it (see the sketch after this list).
- Specifically, the test images can be found here
- Run the evaluation script
bash scripts/v1_5/eval/eval_all_languages.sh \
"model_base" \
"model_path" \
"model_name" \
"your-openai-api-key"
If you find Maya useful for your research and applications, please cite using this BibTeX:
@misc{alam2024mayainstructionfinetunedmultilingual,
title={Maya: An Instruction Finetuned Multilingual Multimodal Model},
author={Nahid Alam and Karthik Reddy Kanjula and Surya Guthikonda and Timothy Chung and Bala Krishna S Vegesna and Abhipsha Das and Anthony Susevski and Ryan Sze-Yin Chan and S M Iftekhar Uddin and Shayekh Bin Islam and Roshan Santhosh and Snegha A and Drishti Sharma and Chen Liu and Isha Chaturvedi and Genta Indra Winata and Ashvanth. S and Snehanshu Mukherjee and Alham Fikri Aji},
year={2024},
eprint={2412.07112},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2412.07112},
}
In no particular order
- Team Leads: Nahid Alam, Karthik Reddy, Surya Guthikonda
- Timothy Chung
- Abhipsha Das
- Bala Krishna S Vegesna
- Iftekhar Uddin
- Drishti Sharma
- Roshan Santhosh
- Shayekh Islam
- Isha Chaturvedi
- Chen Liu
- Snegha A
- Anthony Susevski
- Ashvanth.S
- Genta Indra Winata
- Ryan Chan
- Sangyeon Kim
- Snehanshu
- This codebase is based on LLaVA. Thank you for the easily understandable codebase.
- This project would not be possible without the support of Cohere and their Aya-35B API grant. We are thankful to Sara Hooker, Madeline, Shivalika, Shristhi and the entire Cohere for AI team for their support.
- We thank Pytho for their generous GPU grant.