Bo Jiang1, Shaoyu Chen1, Bencheng Liao1, Xingyu Zhang2, Wei Yin2, Qian Zhang2, Chang Huang2, Wenyu Liu1, Xinggang Wang1,📧
1 Huazhong University of Science and Technology, 2 Horizon Robotics, 📧 corresponding author
[2024-12-08]:
We have released the code and weights of Senna-VLM, along with the training and evaluation scripts.
[2024-10-29]:
Senna arXiv paper released. Code/Models are coming soon. Please stay tuned! ☕️
- Senna is an autonomous driving system that integrates a Large Vision-Language Model with an end-to-end model to improve planning safety, robustness and generalization.
- Senna achieves SOTA planning performance and demonstrates strong cross-scenario generalization and transferability.
git clone git@github.com:hustvl/Senna.git
conda create -n senna python=3.10 -y
conda activate senna
pip install -r requirements.txt
We provide a script that generates the QA data required for Senna training. The script uses LLaVA-v1.6-34b to generate scene descriptions and planning explanations. Run it as follows:
sh data_tools/senna_nusc_converter.sh
| Method | Model Size | Base LLM | Input Views | Tokens per Image | Download |
|---|---|---|---|---|---|
| Senna | 7B | vicuna-7b-v1.5 | 6 views | 128 | Hugging Face |
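As a quick sanity check on the table above, the snippet below computes the total number of image tokens per frame. It is only a sketch: it assumes each of the 6 views is encoded into 128 tokens and that the tokens from all views are simply concatenated. The variable names are illustrative and do not come from the Senna codebase.

```shell
# Values taken from the model table above.
num_views=6
tokens_per_image=128

# Total image tokens per frame, assuming every view contributes the same
# number of tokens and all views are concatenated (an assumption).
total_image_tokens=$((num_views * tokens_per_image))
echo "Image tokens per frame: $total_image_tokens"
```

Under these assumptions the result is 768 image tokens per frame, which gives a rough idea of how much of the LLM context is spent on visual input.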
For Stage-1 Mix Pre-training:
sh train_tools/pretrain_senna_llava.sh
For Stage-2 Driving Fine-tuning and Stage-3 Planning Fine-tuning (full-parameter fine-tuning):
sh train_tools/train_senna_llava.sh
For Stage-2 Driving Fine-tuning and Stage-3 Planning Fine-tuning (LoRA fine-tuning):
sh train_tools/train_senna_llava_lora.sh
In our experiments, full-parameter fine-tuning outperformed LoRA fine-tuning, so we recommend full-parameter fine-tuning. However, if your machine has limited GPU memory (e.g., only 24GB), you may consider LoRA fine-tuning as an alternative.
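The choice between the two fine-tuning scripts could be automated along the following lines. This is a minimal sketch, not part of the Senna repository: the helper `choose_finetune_script` is hypothetical, and the 24576 MiB cutoff is an assumption derived from the "only 24GB" example above, not a measured requirement. It only prints a suggestion and does not start training.

```shell
#!/bin/sh
# Hypothetical helper: suggest a fine-tuning script from total GPU memory (MiB).
# The 24576 MiB (24 GB) cutoff is an assumption based on the note above.
choose_finetune_script() {
    mem_mib="$1"
    if [ "$mem_mib" -le 24576 ]; then
        echo "train_tools/train_senna_llava_lora.sh"
    else
        echo "train_tools/train_senna_llava.sh"
    fi
}

# Read the first GPU's total memory in MiB, if nvidia-smi is available.
mem=""
if command -v nvidia-smi >/dev/null 2>&1; then
    mem=$(nvidia-smi --query-gpu=memory.total --format=csv,noheader,nounits 2>/dev/null \
        | head -n 1 | tr -d ' ')
fi

# Treat a missing or non-numeric reading as limited memory (falls back to LoRA).
case "$mem" in
    ''|*[!0-9]*) mem=0 ;;
esac

echo "Suggested script: $(choose_finetune_script "$mem")"
```

On a multi-GPU machine this looks at the first GPU only; adjust the query if your GPUs differ in size.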
You can evaluate the accuracy of Senna meta-action planning using the script below.
sh eval_tools/senna_plan_cmd_eval_multi_img.sh
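To illustrate what a meta-action accuracy figure means, the sketch below assumes accuracy is the fraction of predictions that exactly match the ground-truth label. The input format and the meta-action labels are hypothetical toy data; they are not taken from the Senna evaluation script, whose actual metric definition may differ.

```shell
# Hypothetical toy data: each line is "<predicted meta-action> <ground-truth meta-action>".
# Accuracy is assumed here to be exact-match rate over all lines.
accuracy=$(awk '{ total++; if ($1 == $2) correct++ }
                END { printf "%.2f", correct / total }' <<'EOF'
keep_straight keep_straight
turn_left turn_left
accelerate decelerate
keep_straight keep_straight
EOF
)
echo "Meta-action accuracy: $accuracy"
```

With three of the four toy predictions matching, the computed value is 0.75.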
By running the visualization script below, you can overlay the predicted meta-actions and front-view scene descriptions onto the front-view image and save the results to the specified path.
sh eval_tools/senna_plan_visualization.sh
Senna is built upon the LLaVA codebase. We sincerely thank its contributors for their great work!
If you find Senna useful in your research or applications, please consider giving us a star 🌟 and citing it with the following BibTeX entry.
@article{jiang2024senna,
title={Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving},
author={Bo Jiang and Shaoyu Chen and Bencheng Liao and Xingyu Zhang and Wei Yin and Qian Zhang and Chang Huang and Wenyu Liu and Xinggang Wang},
year={2024},
eprint={2410.22313},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2410.22313},
}