The first vision-language model designed specifically for the marine domain. It serves as a powerful marine AI assistant that generates more sensitive, informative, and scientific responses.
Ziqiang Zheng, Jipeng Zhang, Tuan-Anh Vu, Shizhe Diao, Yue Him Wong Tim, Sai-Kit Yeung
[Mar.2 2024] We include LLaVA1.5 for comparison and embed LLaVA into our MarineGPT. Pre-trained models will be uploaded soon!
[Feb.26 2024] MarineGPT now supports GEMMA, and we have released the pre-trained GEMMA models.
[Feb.19 2024] We released the pre-trained models of MarineGPT.
Coming Soon.
Key Contributions:
- MarineGPT - A domain-specific (marine) MLLM with instruction-following tuning, enabling fine-grained marine object recognition and sensitive, informative, and scientific responses.
- Marine-5M Dataset (~5M) - A large-scale, diverse, broad-coverage marine image-text dataset for aligning the visual and language modalities.
- A marine-specific data generation pipeline to create diverse (image, instruction, output) instruction-following training data.
Potential Applications of MarineGPT:
- Scale up Marine Organism Recognition.
- Monitoring.
- Centralized Platform.
- Interdisciplinary Research.
- General Public Access.
Large language models (LLMs), such as ChatGPT/GPT-4, have proven to be powerful tools for improving the user experience as AI assistants. Subsequent work has proposed multi-modal large language models (MLLMs), which empower LLMs to sense inputs from multiple modalities by constructing a joint semantic space (e.g., a visual-text space). Despite the significant success of LLMs and MLLMs, their use in domain-specific applications that require domain-specific knowledge and expertise has been less explored, especially for the marine domain. Unlike general-purpose MLLMs, a marine-specific MLLM is required to yield much more sensitive, informative, and scientific responses. In this work, we demonstrate that existing MLLMs optimized on huge amounts of readily available general-purpose training data show minimal ability to understand domain-specific intents and to generate informative and satisfactory responses. To address these issues, we propose MarineGPT, the first vision-language model specially designed for the marine domain, unlocking the secrets of the ocean to the public. We present our Marine-5M dataset with more than 5 million marine image-text pairs to inject domain-specific marine knowledge into our model and achieve better marine vision and language alignment. MarineGPT not only pushes the boundaries of marine understanding to the general public but also offers a standard protocol for adapting a general-purpose assistant to downstream domain-specific experts. We pave the way for a wide range of marine applications while providing valuable data and pre-trained models for future research in both academic and industrial communities.
- Recognizing various marine objects.
- Fine-grained marine object recognition.
- Comprehensive multi-round conversation.
1. Prepare the code and the environment
Clone our repository, then create a Python environment and activate it with the following commands:
git clone https://github.com/hkust-vgd/MarineGPT
cd MarineGPT
conda env create -f environment.yml
conda activate marinegpt
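Once the environment is active, a quick check that key packages resolve can catch a failed install early. The following is a minimal sketch, not part of the repository: the package names in `REQUIRED_PACKAGES` are assumptions chosen for illustration, and the authoritative list is whatever `environment.yml` declares.

```python
import importlib.util

# Hypothetical list for illustration only; environment.yml is the real source of truth.
REQUIRED_PACKAGES = ["torch", "yaml"]


def missing_packages(names):
    """Return the top-level package names that cannot be found in the active environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED_PACKAGES)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All checked packages were found.")
```

`importlib.util.find_spec` only locates a package without importing it, so the check stays fast even for heavy dependencies.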
2. Prepare the pretrained LLM weights
MarineGPT is based on Vicuna V0 7B/13B. Please download the corresponding LLM weights from the following Hugging Face space by cloning the repository with git-lfs.
Vicuna V0 13B | Vicuna V0 7B |
---|---|
Download | Download |
Then, set the variable llama_model in the model config file to the LLM weight path.
### modify the path of LLM weights in Line 16 of marinegpt/configs/models/marinegpt.yaml
llama_model: "/path/to/LLM_weights/"
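If you prefer not to edit the file by hand, the path can be set with a small script. This is a sketch under assumptions: `set_config_value` is a hypothetical helper (not part of MarineGPT), and it assumes the key sits on a single `key: value` line, as in the snippet above. Any trailing comment on that line is dropped.

```python
import re


def set_config_value(config_text, key, value):
    """Replace the value on the first `key: ...` line, keeping its indentation.

    Hypothetical helper for illustration; it treats the config as plain text
    rather than parsing it as YAML.
    """
    pattern = re.compile(rf"^([ \t]*{re.escape(key)}[ \t]*:[ \t]*).*$", re.MULTILINE)
    # A function replacement avoids backslash-escape handling in the new value.
    new_text, count = pattern.subn(
        lambda m: f'{m.group(1)}"{value}"', config_text, count=1
    )
    if count == 0:
        raise KeyError(f"{key!r} not found in config text")
    return new_text


if __name__ == "__main__":
    # Stand-in config text; the real file is marinegpt/configs/models/marinegpt.yaml.
    sample = 'model:\n  arch: example\n  llama_model: "/old/path/"\n'
    print(set_config_value(sample, "llama_model", "/path/to/LLM_weights/"))
```

To apply it to the real file, read the file contents into a string, pass them through the helper, and write the result back.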
MarineGPT also supports GEMMA-2B/7B. Please download the corresponding LLM weights from the following Hugging Face space by cloning the repository with git-lfs.
Pre-trained GEMMA models
GEMMA 2B | GEMMA 7B |
---|---|
Download | Download |
GEMMA models after instruction tuning
GEMMA 2B-it | GEMMA 7B-it |
---|---|
Download | Download |
Then, set the variable gemma_model in the model config file to the LLM weight path.
### modify the path of LLM weights in Line 16 of marinegpt/configs/models/marinegpt_gemma.yaml
llama_model: "/path/to/GEMMA_weights/"
We also plan to support LLaMA and LLaMA 2 versions of MarineGPT and will release the trained weights very soon.
Vicuna
MarineGPT Stage 1 (Vicuna 13B) | MarineGPT Stage 2 (Vicuna 13B) | MarineGPT Stage 1 (Vicuna 7B) | MarineGPT Stage 2 (Vicuna 7B) |
---|---|---|---|
Download | Download | Download | Download |
GEMMA
MarineGPT Stage 2 (GEMMA 2B) | MarineGPT Stage 2 (GEMMA 2B-it) | MarineGPT Stage 2 (GEMMA 7B) | MarineGPT Stage 2 (GEMMA 7B-it) |
---|---|---|---|
Download | Download | Download | Download |
For MarineGPT, set the path to the pretrained checkpoint in the evaluation config file in eval_configs/marinegpt_eval.yaml at Line 11.
3. Launching Demo Locally
Vicuna
For MarineGPT, run
python demo.py --cfg-path eval_configs/marinegpt_eval.yaml --gpu-id 0
Please specify the path of the pre-trained checkpoint (stage 1 or stage 2; Vicuna 7B or Vicuna 13B) in eval_configs/marinegpt_eval.yaml at Line 11.
### modify the path of pre-trained ckpts in Line 11 of eval_configs/marinegpt_eval.yaml
ckpt: './ckpt/vicuna_7B/stage1/marinegpt_vicuna_7B_stage1_ckpt.pth'
GEMMA
For MarineGPT, run
python demo.py --cfg-path eval_configs/marinegpt_gemma_eval.yaml --gpu-id 0 --model_type gemma_model
Please specify the path of pre-trained checkpoints in eval_configs/marinegpt_gemma_eval.yaml at Line 11.
### modify the path of pre-trained ckpts in Line 11 of eval_configs/marinegpt_gemma_eval.yaml
ckpt: './ckpt/gemma_2B/stage2/marinegpt_gemma_2B_stage2_ckpt.pth'
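The two example `ckpt` values above follow the same naming pattern, which can be handy to generate when switching between backbones. The sketch below only reproduces that pattern: `checkpoint_path` is a hypothetical helper, and the assumption that other backbone, size, and stage combinations use the same layout is not confirmed by the repository.

```python
import os


def checkpoint_path(backbone, size, stage, root="./ckpt"):
    """Build a checkpoint path following the pattern of the two README examples.

    Hypothetical helper: the layout for other combinations is an assumption.
    """
    filename = f"marinegpt_{backbone}_{size}_stage{stage}_ckpt.pth"
    return f"{root}/{backbone}_{size}/stage{stage}/{filename}"


if __name__ == "__main__":
    path = checkpoint_path("vicuna", "7B", 1)
    print(path)
    # Warn before launching the demo if the file has not been downloaded yet.
    if not os.path.isfile(path):
        print("Checkpoint not found; download it and check the ckpt entry in the eval config.")
```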
4. Other applications
MarineGPT can also generate feature embeddings and captions for images.
### generate the feature embedding for retrieval
python generate_embeddings.py --cfg-path eval_configs/marinegpt_eval.yaml --gpu-id 0 --img_path ./img_path --output_path ./output_path
### generate captions for images
python generate_captions_for_imgs.py --cfg-path eval_configs/marinegpt_eval.yaml --gpu-id 0 --img_path ./img_path
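Once embeddings are available, retrieval typically ranks gallery images by similarity to a query embedding. The following is a minimal sketch using cosine similarity in plain Python. It is an illustration rather than the repository's method: the output format of generate_embeddings.py is not documented here, so the code assumes the embeddings have already been loaded into a dictionary mapping image names to lists of floats, and `retrieve` is a hypothetical function.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, gallery, top_k=3):
    """Return the top_k (name, score) pairs from gallery, most similar first.

    `gallery` is assumed to map image names to embedding vectors.
    """
    scored = [(name, cosine_similarity(query, vec)) for name, vec in gallery.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


if __name__ == "__main__":
    # Toy 2-D vectors standing in for real image embeddings.
    gallery = {"img_a": [1.0, 0.0], "img_b": [0.0, 1.0], "img_c": [1.0, 1.0]}
    for name, score in retrieve([1.0, 0.0], gallery, top_k=2):
        print(name, round(score, 3))
```

For a large gallery, a vectorized implementation or an approximate nearest-neighbor index would be a better fit than this pure-Python loop.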
1. Datasets
We will provide more details of our training data.
2. Implementation Details
Stage 1 (pre-training): please refer to train_configs/marinegpt_stage1_pretrain.yaml
Stage 2 (finetuning): please refer to train_configs/marinegpt_stage2_finetune.yaml
More implementation details will be added soon.
- BLIP2 The model architecture of MarineGPT follows BLIP-2. Please check this great open-source work if you are not familiar with VLMs!
- MiniGPT-4 Our code is mainly based on MiniGPT-4. Thanks for their contributions to the whole community.
- Lavis Our project is also built upon Lavis!
- Vicuna A powerful, open-source LLM for understanding user intents!
- LLaVA A powerful and open-source MLLM!
If you find MarineGPT helpful, please consider citing:
@misc{zheng2023marinegpt,
title={MarineGPT: Unlocking Secrets of "Ocean" to the Public},
author={Ziqiang Zheng and Jipeng Zhang and Tuan-Anh Vu and Shizhe Diao and Yue Him Wong Tim and Sai-Kit Yeung},
year={2023},
eprint={2310.13596},
archivePrefix={arXiv},
primaryClass={cs.CV}
}