Official code of “MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning”

xiaomi-mlab/MindDrive


MindDrive: A Vision-Language-Action Model for Autonomous Driving Utilizing Language as Action in Online Reinforcement Learning

Haoyu Fu¹*, Diankun Zhang²*, Zongchuang Zhao¹,
Jianfeng Cui², Hongwei Xie²†, Bing Wang², Guang Chen², Dingkang Liang¹†, Xiang Bai¹

¹ Huazhong University of Science & Technology, ² Xiaomi EV

(*) Equal contribution. (†) Project leader.

Paper PDF | Project Page

Abstract

Current Vision-Language-Action (VLA) paradigms in autonomous driving rely primarily on Imitation Learning (IL), which introduces inherent challenges such as distribution shift and causal confusion. Online reinforcement learning (RL) offers a promising way to address these issues through trial-and-error learning. However, applying online RL to VLA models in autonomous driving is hindered by inefficient exploration in continuous action spaces. To overcome this limitation, we propose MindDrive, a VLA framework comprising a large language model (LLM) with two distinct sets of LoRA parameters. With one set, the LLM serves as a Decision Expert for scenario reasoning and driving decision-making; with the other, it acts as an Action Expert that dynamically maps linguistic decisions into feasible trajectories. By feeding trajectory-level rewards back into the reasoning space, MindDrive enables trial-and-error learning over a finite set of discrete linguistic driving decisions instead of operating directly in a continuous action space. This approach balances optimal decision-making in complex scenarios, human-like driving behavior, and efficient exploration in online RL. MindDrive achieves strong closed-loop performance on the challenging Bench2Drive benchmark, with a Driving Score (DS) of 78.04 and a Success Rate (SR) of 55.09%. To the best of our knowledge, this is the first work to demonstrate the effectiveness of online RL for a VLA model in autonomous driving.
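To make the idea of trial-and-error learning over a finite set of discrete decisions more concrete, here is a minimal toy sketch. It is not the MindDrive training code: it swaps in a plain REINFORCE update on a softmax policy, and the decision vocabulary (`DECISIONS`), the reward function (`toy_trajectory_reward`), and all hyperparameters are hypothetical stand-ins chosen only for illustration. In the actual framework the policy would be the LLM's Decision Expert and the reward would come from closed-loop rollouts of the trajectories produced by the Action Expert.

```python
import math
import random

# Hypothetical decision vocabulary (an assumption, not the paper's actual set).
DECISIONS = ["accelerate", "keep speed", "decelerate", "stop"]


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def toy_trajectory_reward(decision):
    """Stand-in for a trajectory-level reward from a closed-loop rollout.

    Purely illustrative: pretend that only "decelerate" leads to a
    successful trajectory in this scenario.
    """
    return 1.0 if decision == "decelerate" else 0.0


def train(steps=2000, lr=0.1, seed=0):
    """REINFORCE with a running-mean baseline over the discrete decisions."""
    rng = random.Random(seed)
    logits = [0.0] * len(DECISIONS)  # policy parameters (toy replacement for an LLM)
    baseline = 0.0                   # running estimate of the average reward
    for _ in range(steps):
        probs = softmax(logits)
        # Explore by sampling one discrete decision from the current policy.
        idx = rng.choices(range(len(DECISIONS)), weights=probs)[0]
        reward = toy_trajectory_reward(DECISIONS[idx])
        advantage = reward - baseline
        baseline += 0.05 * (reward - baseline)
        # For a softmax policy, d log pi(a) / d logit_k = 1[k == a] - pi_k.
        for k in range(len(logits)):
            grad = (1.0 if k == idx else 0.0) - probs[k]
            logits[k] += lr * advantage * grad
    return softmax(logits)


if __name__ == "__main__":
    for decision, p in zip(DECISIONS, train()):
        print(f"{decision:>12}: {p:.3f}")
```

Because the policy only has to choose among a handful of discrete options, every sampled decision yields a usable learning signal, which is the intuition behind exploring in the decision space rather than directly in a continuous trajectory space.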

Overview

News

[2025/12/16] ArXiv paper release.

Currently Supported Features

  • MindDrive Inference Framework
  • Closed-loop Evaluation
  • MindDrive Checkpoint
  • MindDrive Training Framework

Results and Checkpoints

ORION and other baselines

| Method | L2 (m) at 2 s | Driving Score | Success Rate (%) | Config | Download | Eval Json |
|---|---|---|---|---|---|---|
| UniAD-Tiny | 0.80 | 40.73 | 13.18 | config | Hugging Face/Baidu Cloud | Json |
| UniAD-Base | 0.73 | 45.81 | 16.36 | config | Hugging Face/Baidu Cloud | Json |
| VAD | 0.91 | 42.35 | 15.00 | config | Hugging Face/Baidu Cloud | Json |
| ORION-7B | 0.68 | 77.74 | 54.62 | config | Hugging Face | Json |
| MindDrive-0.5B | 0.73 | 78.04 | 55.09 | config | - | - |

Citation

If this work is helpful for your research, please consider citing:

@article{fu2025minddrive,
  title={MindDrive: A Vision-Language-Action Model for Autonomous Driving via Online Reinforcement Learning},
  author={Haoyu Fu and Diankun Zhang and Zongchuang Zhao and Jianfeng Cui and Hongwei Xie and Bing Wang and Guang Chen and Dingkang Liang and Xiang Bai},
  journal={arXiv Preprint arXiv:2512.13636},  
  year={2025},
}
@inproceedings{fu2025orion,
  title={ORION: A Holistic End-to-End Autonomous Driving Framework by Vision-Language Instructed Action Generation},
  author={Haoyu Fu and Diankun Zhang and Zongchuang Zhao and Jianfeng Cui and Dingkang Liang and Chong Zhang and Dingyuan Zhang and Hongwei Xie and Bing Wang and Xiang Bai},
  booktitle={Proceedings of the IEEE/CVF International Conference on Computer Vision},
  year={2025}
}
