
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving


This repository contains the implementation of the paper:

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang*, Chengjian Feng*, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang†, Lin Ma
*Equal Contribution †Corresponding Authors

🔥 Updates

  • 2024.12: We released the DriveMM paper on arXiv!
  • 2024.12: We released the models and inference code!

✨ Highlights

🔥 We propose DriveMM, a novel all-in-one large multimodal model equipped with the general capabilities to execute a wide range of AD tasks and the generalization ability to transfer effectively to new datasets.

🔥 We introduce comprehensive benchmarks for evaluating autonomous driving LMMs, covering six public datasets, four input types, and thirteen challenging tasks. To the best of our knowledge, this is the first work to use multiple benchmarks to evaluate autonomous driving LMMs.

🔥 We present a curriculum principle for pre-training and fine-tuning on both diverse multimodal data and AD data. DriveMM achieves state-of-the-art performance and consistently outperforms models trained on individual datasets across all evaluated benchmarks.

🏁 Getting Started

Installation

1. Clone this repository and navigate to the DriveMM folder:

git clone https://github.com/zhijian11/DriveMM
cd DriveMM

2. Install the inference package:

conda create -n drivemm python=3.10 -y
conda activate drivemm
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"

3. Run the DriveMM inference demo:

  • Download the checkpoint and place it in the ckpt/ folder.
cd scripts/inference_demo
python demo_image.py # for image input 
python demo_video.py # for video input
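The demo scripts take a camera image (or video) plus a text question and return the model's answer. As an illustrative sketch only: DriveMM builds on LLaVA-NeXT, where image inputs are typically represented by an `<image>` placeholder token prepended to the user question before tokenization. The token name, helper function, and six-camera example below are assumptions for illustration; the repository's actual API lives in scripts/inference_demo/demo_image.py.

```python
# Hypothetical sketch of LLaVA-style prompt construction (not the repo's API).
DEFAULT_IMAGE_TOKEN = "<image>"  # common LLaVA-NeXT placeholder convention

def build_prompt(question: str, num_images: int = 1) -> str:
    """Prepend one placeholder token per camera image to the question."""
    placeholders = "\n".join([DEFAULT_IMAGE_TOKEN] * num_images)
    return f"{placeholders}\n{question}"

# Example: a six-camera surround view, as used by multi-view AD datasets.
prompt = build_prompt("Is it safe to change lanes to the left?", num_images=6)
print(prompt)
```

Multi-view datasets such as nuScenes provide six surround cameras, which is why a multimodal AD model may consume several image placeholders per query.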

✅ TODO

  • DriveMM models
  • DriveMM inference code
  • DriveMM evaluation code
  • DriveMM training data
  • DriveMM training code

😊 Acknowledgements

This project references some excellent open-source repositories (LLaVA-NeXT). Thanks for their wonderful work and contributions to the community.

📌 Citation

If you find DriveMM helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{huang2024drivemm,
  title={DriveMM: All-in-One Large Multimodal Model for Autonomous Driving},
  author={Huang, Zhijian and Feng, Chengjian and Yan, Feng and Xiao, Baihui and Jie, Zequn and Zhong, Yujie and Liang, Xiaodan and Ma, Lin},
  journal={arXiv preprint arXiv:2412.07689},
  year={2024}
}
