
DriveMM: All-in-One Large Multimodal Model for Autonomous Driving


This repository contains the implementation of the paper:

DriveMM: All-in-One Large Multimodal Model for Autonomous Driving
Zhijian Huang*, Chengjian Feng*, Feng Yan, Baihui Xiao, Zequn Jie, Yujie Zhong, Xiaodan Liang†, Lin Ma
*Equal Contribution †Corresponding Authors

🔥 Updates

  • 2024.12: We released the DriveMM paper on arXiv!
  • 2024.12: We released the models and inference code!

✨ Highlights

🔥 We propose DriveMM, a novel all-in-one large multimodal model equipped with the general capabilities to execute a wide range of AD tasks and the generalization ability to transfer effectively to new datasets.

🔥 We introduce comprehensive benchmarks for evaluating autonomous driving LMMs, covering six public datasets, four input types, and thirteen challenging tasks. To the best of our knowledge, this is the first work to use multiple benchmarks to evaluate autonomous driving LMMs.

🔥 We present a curriculum principle for pre-training and fine-tuning on both diverse multimodal data and AD data. DriveMM achieves state-of-the-art performance and consistently outperforms models trained on individual datasets across all evaluated benchmarks.

🏁 Getting Started

Installation

1. Clone this repository and navigate to the DriveMM folder:

git clone https://github.com/zhijian11/DriveMM
cd DriveMM

2. Install the inference package:

conda create -n drivemm python=3.10 -y
conda activate drivemm
pip install --upgrade pip  # Enable PEP 660 support.
pip install -e ".[train]"

3. Run the DriveMM inference demo:

  • Download the checkpoint and place it in the ckpt/ folder.
cd scripts/inference_demo
python demo_image.py # for image input 
python demo_video.py # for video input
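The demo scripts take a camera image (or video) plus a text question and return the model's answer. As an illustrative sketch only: DriveMM builds on LLaVA-NeXT, where image inputs are typically represented by an `<image>` placeholder token prepended to the user question before tokenization. The token name, helper function, and six-camera example below are assumptions for illustration; the repository's actual API lives in scripts/inference_demo/demo_image.py.

```python
# Hypothetical sketch of LLaVA-style prompt construction (not the repo's API).
DEFAULT_IMAGE_TOKEN = "<image>"  # common LLaVA-NeXT placeholder convention

def build_prompt(question: str, num_images: int = 1) -> str:
    """Prepend one placeholder token per camera image to the question."""
    placeholders = "\n".join([DEFAULT_IMAGE_TOKEN] * num_images)
    return f"{placeholders}\n{question}"

# Example: a six-camera surround view, as used by multi-view AD datasets.
prompt = build_prompt("Is it safe to change lanes to the left?", num_images=6)
print(prompt)
```

Multi-view datasets such as nuScenes provide six surround cameras, which is why a multimodal AD model may consume several image placeholders per query.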

✅ TODO

  • DriveMM models
  • DriveMM inference code
  • DriveMM evaluation code
  • DriveMM training data
  • DriveMM training code

😊 Acknowledgements

This project references some excellent open-source repositories (LLaVA-NeXT). Thanks for their wonderful work and contributions to the community.

📌 Citation

If you find DriveMM helpful for your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.

@article{huang2024drivemm,
  title={DriveMM: All-in-One Large Multimodal Model for Autonomous Driving},
  author={Huang, Zhijian and Feng, Chengjian and Yan, Feng and Xiao, Baihui and Jie, Zequn and Zhong, Yujie and Liang, Xiaodan and Ma, Lin},
  journal={arXiv preprint arXiv:2412.07689},
  year={2024}
}
