steven640pixel/GalleryGPT

GalleryGPT

This repository contains the code, data and model for the paper titled "GalleryGPT: Analyzing Paintings with Large Multimodal Models".

Latest News 🔥

Install Packages

git clone https://github.com/steven640pixel/GalleryGPT.git
cd GalleryGPT
conda create -n gallery_gpt python=3.10 -y
conda activate gallery_gpt
pip install -e .
pip install protobuf

Enable DeepSpeed and FlashAttention (for training)

pip install -e ".[train]"
pip install flash-attn --no-build-isolation
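
After installation, it can help to confirm that the optional training dependencies are importable before launching a long job. The sketch below is hypothetical and not part of the repository. It assumes the import names `llava` (installed by `pip install -e .`), `deepspeed` (assumed to come from the `.[train]` extra, as in upstream LLaVA) and `flash_attn` (the import name of the flash-attn package). It uses `importlib.util.find_spec`, so nothing is actually imported.

```python
import importlib.util

# Hypothetical mapping: import name -> install command from this README.
# "deepspeed" is ASSUMED to be pulled in by the ".[train]" extra.
REQUIRED_MODULES = {
    "llava": "pip install -e .",
    "deepspeed": 'pip install -e ".[train]"',
    "flash_attn": "pip install flash-attn --no-build-isolation",
}


def check_environment(modules=REQUIRED_MODULES):
    """Return {module_name: found?} without importing the modules themselves."""
    return {name: importlib.util.find_spec(name) is not None for name in modules}


if __name__ == "__main__":
    for name, found in check_environment().items():
        hint = "" if found else f"  (try: {REQUIRED_MODULES[name]})"
        print(f"{name:12s} {'OK' if found else 'MISSING'}{hint}")
```

Run it inside the `gallery_gpt` environment. Any module reported as MISSING is printed with the install command that is expected to provide it.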

Data Preparation

Download the PaintingForm dataset. The file "train_samples_tuning.json" contains the formal-analysis annotations of paintings used for instruction fine-tuning.

[Figure: overall pipeline for constructing PaintingForm]

Place the data in the repository root (or another directory of your choice) with the following structure:

├── art_images_data/
│   ├── images/0.png
│   ├── images/1.png
│   ├── ...
├── train_samples_tuning.json
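
A quick consistency check between the annotations and the image folder can catch path problems before fine-tuning starts. The sketch below is hypothetical. It assumes that "train_samples_tuning.json" is a JSON list of LLaVA-style records and that each record's "image" field is a path relative to `art_images_data/` (for example `images/0.png`). The exact schema is an assumption, so check it against your copy of the file.

```python
import json
from pathlib import Path


def find_missing_images(annotation_file, image_root):
    """Return image paths referenced in the annotations that are missing on disk.

    ASSUMPTION: the file is a list of dicts and each dict may carry an
    "image" key whose value is relative to `image_root`. Records without
    an "image" key are skipped.
    """
    records = json.loads(Path(annotation_file).read_text(encoding="utf-8"))
    root = Path(image_root)
    missing = []
    for record in records:
        image = record.get("image")
        if image is not None and not (root / image).is_file():
            missing.append(image)
    return missing


if __name__ == "__main__":
    missing = find_missing_images("train_samples_tuning.json", "art_images_data")
    print(f"{len(missing)} referenced images are missing")
```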

Run LoRA Fine-tuning

Download ShareGPT4V-7B as the base model, place it in ./share4v/llava-7b, and replace its config.json with the one in this repository's root directory. Then run the training script:

sh finetune_task_lora.sh
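
The config.json replacement step can be scripted so that the original ShareGPT4V file is not lost. The helper below is a hypothetical sketch. The paths follow this README (`config.json` in the repository root, base model in `./share4v/llava-7b`). The `config.json.orig` backup name is our own choice and is not part of the GalleryGPT instructions.

```python
import shutil
from pathlib import Path


def install_config(repo_config="config.json", base_model_dir="share4v/llava-7b"):
    """Copy the repository's config.json over the base model's, keeping a backup.

    The backup name "config.json.orig" is a hypothetical convention; it is
    written only once so that re-running the helper keeps the true original.
    """
    target = Path(base_model_dir) / "config.json"
    if not target.parent.is_dir():
        raise FileNotFoundError(f"base model directory not found: {target.parent}")
    if target.exists():
        backup = target.with_name("config.json.orig")
        if not backup.exists():
            shutil.copy2(target, backup)  # preserve the ShareGPT4V original
    shutil.copy2(repo_config, target)  # install the GalleryGPT config
    return target
```

Call `install_config()` from the repository root before running `sh finetune_task_lora.sh`. Restoring the base model later only requires copying `config.json.orig` back.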

Checkpoints and Inference

Download the base model (replacing its config.json with the one in this repository's root directory, as above) and the LoRA checkpoint. Then run inference:

cd llava/eval
python run_llava.py --model-path llava-lora-model --model-base share4v/llava-7b --image-file your/image/path --query "your question"
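
To analyze several paintings with the same question, the command above can be wrapped in a small loop. The sketch below is hypothetical. It uses only the four flags shown in this README, and it assumes it is launched from `llava/eval` like the command above. `build_command` and `analyze_folder` are illustrative names, not functions from the repository.

```python
import shlex
import subprocess
import sys
from pathlib import Path


def build_command(image_file, query,
                  model_path="llava-lora-model", model_base="share4v/llava-7b"):
    """Build the run_llava.py argument list using only the README's flags."""
    return [
        sys.executable, "run_llava.py",  # same interpreter as the caller
        "--model-path", str(model_path),
        "--model-base", str(model_base),
        "--image-file", str(image_file),
        "--query", query,
    ]


def analyze_folder(image_dir, query, pattern="*.png"):
    """Run inference once per matching image (assumes cwd is llava/eval)."""
    for image in sorted(Path(image_dir).glob(pattern)):
        cmd = build_command(image, query)
        print("$", shlex.join(cmd))
        subprocess.run(cmd, check=True)
```

Each call starts a new process and reloads the 7B model. This is acceptable for a handful of images, but it becomes slow for large folders.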

Results on PaintingForm

Captioning metrics on the PaintingForm test set (higher is better):

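| Model           | BLEU  | GLEU  | METEOR | ROUGE |
|-----------------|-------|-------|--------|-------|
| LLaVA-1.5-7B    | 9.87  | 14.59 | 26.19  | 26.37 |
| Qwen-VL-Chat-7B | 13.65 | 16.42 | 29.78  | 26.72 |
| ShareGPT4V-7B   | 12.38 | 16.14 | 31.53  | 26.63 |
| GalleryGPT-7B   | 21.23 | 21.68 | 37.62  | 31.34 |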
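
To read the table as margins, the short calculation below compares GalleryGPT-7B with the strongest baseline on each metric. The scores are copied from the table, and `gains_over_best_baseline` is a hypothetical helper. The gains are absolute differences in the table's own units.

```python
# Scores copied from the results table above.
SCORES = {
    "LLaVA-1.5-7B":    {"BLEU": 9.87,  "GLEU": 14.59, "METEOR": 26.19, "ROUGE": 26.37},
    "Qwen-VL-Chat-7B": {"BLEU": 13.65, "GLEU": 16.42, "METEOR": 29.78, "ROUGE": 26.72},
    "ShareGPT4V-7B":   {"BLEU": 12.38, "GLEU": 16.14, "METEOR": 31.53, "ROUGE": 26.63},
    "GalleryGPT-7B":   {"BLEU": 21.23, "GLEU": 21.68, "METEOR": 37.62, "ROUGE": 31.34},
}


def gains_over_best_baseline(scores, ours="GalleryGPT-7B"):
    """For each metric, return (best baseline, absolute gain rounded to 2 dp)."""
    baselines = {m: s for m, s in scores.items() if m != ours}
    result = {}
    for metric, value in scores[ours].items():
        best = max(baselines, key=lambda m: baselines[m][metric])
        result[metric] = (best, round(value - baselines[best][metric], 2))
    return result


if __name__ == "__main__":
    for metric, (best, gain) in gains_over_best_baseline(SCORES).items():
        print(f"{metric:7s} +{gain:.2f} over {best}")
```

With these numbers, the largest margin is in BLEU (+7.58 over Qwen-VL-Chat-7B) and the smallest is in ROUGE (+4.62 over Qwen-VL-Chat-7B). The strongest METEOR baseline is ShareGPT4V-7B (+6.09).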

Cases

[Figure: formal analysis generation examples]

[Figure: dialogue examples]

Acknowledgement

This project is built on top of the amazing LLaVA and ShareGPT4V repositories. We thank their authors for their contributions.

If you find our work helpful to your research, please consider citing us with this BibTeX:

@inproceedings{MM24GalleryGPT,
  author    = {Yi Bin and
               Wenhao Shi and
               Yujuan Ding and
               Zhiqiang Hu and
               Zheng Wang and
               Yang Yang and
               See-Kiong Ng and
               Heng Tao Shen},
  title     = {GalleryGPT: Analyzing Paintings with Large Multimodal Models},
  booktitle = {Proceedings of the 32nd ACM International Conference on Multimedia, 28 October – 1 November, 2024, Melbourne, Australia.},
  year      = {2024},
}
