This is a living document!
Our vision is to enable vLLM to run seamlessly on Ascend NPU. We are fully committed to making vLLM one of the best engines for Ascend NPU. In Q1 2025, we provided initial support for vLLM on Ascend NPU.
In 2025 Q2, we will focus on four themes: vLLM Ascend for Production, Performance Optimization, Key Features, and Ecosystem Connect.
1. Performance Optimization
We will focus on performance optimization of dense models (Qwen/Llama/Qwen-VL) and MoE models (DeepSeek V3/R1), so that users of vLLM Ascend can trust its performance to be competitive on Ascend NPU.
- (P0) [V0] Support torch.compile (aka Graph mode): support aclgraph #426
- (P0) (v0.7.3 only) MindIE-Turbo integration: `pip install vllm-ascend[mindie-turbo]` @MengqingCao
- (P0) vLLM V1 engine improvement: RFC link @wangxiyuan
- (P1) Performance report for DeepSeek R1, Qwen3, Qwen2.5, Qwen2.5-VL: Add official Performance Guide Doc
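The MindIE-Turbo integration above is exposed as an optional extra; a minimal install sketch, assuming the extra is published for the v0.7.3 release line (exact index and version pinning may differ in your environment):

```shell
# Install vLLM Ascend with the optional MindIE-Turbo extra (v0.7.3 only).
# Quoting keeps the [mindie-turbo] extra from being expanded by the shell.
pip install "vllm-ascend[mindie-turbo]"
```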
2. vLLM Ascend for Production
Aligned with vLLM, vLLM Ascend is designed for production. The first official version for vLLM 0.7.3 will be published, and we will also actively promote the key features of vLLM v0.8.x/V1 to production availability.
- (P0) Stable Plugin Architecture for hardware platforms @wangxiyuan
- (P0) CI: Model/Feature coverage: [RFC]: E2E CI test for key features #413 @MengqingCao
- (P0) Testing: Performance test @leo-pony @hfadzxy
- (P0) Testing: Accuracy test @leo-pony @hfadzxy
- (P0) Testing: Stress and longevity test (downstream)
3. Key Features
We will focus on the integration and support of key lifecycle workflows of model training (SFT / RL) and inference (single node / distributed).
3.1 Workflows
Cluster Scale Serving
- (P0) MLA enhancements: Support multistream of MLA vector operations #1135 , [Perf]remove unnecessary padding before MLA V1 prefill #917
- (P0) Distributed EP: EP + DP [Feature] Implement EP-compatible fused_moe #121
- (P0) Prefill Decode Disaggregation: 1P1D, xPyD: [RFC]: P/D Disaggregation Support #841
- (P0) EPLB: Add static EPLB #1116
Core feature support
- (P0) LoRA / MultiLora: [RFC]: Join the MultiLora and MultiLora Dynammic Serving feature develop #396 @ZhengJun9
- (P1) Structured Output on V1: [Feature]: Add Support for Guided Decoding (Structured Output) #177 @shen-shanshan
- (Help wanted) Prompt adapter
RLHF
3.2 Models support
- (P0) Quantization support: w8a8 (DeepSeek R1 with 2 nodes): [quantization] Support w8a8 quantization #580
- (P1) Support for upcoming models: Qwen3 / DeepSeek-R2 / Llama 4 / DeepSeek DRM series
- (P1) Qwen-Omni thinker: Add qwen2.5 vl multimodal feature for vllm-ascend v1 #736
- (P1) Model format support: GGUF
- (Help wanted) Quantization support: w4a16/w4a8 (DeepSeek R1 with 1 node)
- (Help wanted) Whisper
- (Help wanted) enc-dec
- (Help wanted) Gemma
3.3 User / Developer Experience
Distributions
- (P0) Docker image (mirror)
- (P0) Python Wheel: add workflow to build and release wheel #775
Docs
- Developer Design doc: [RFC]: Doc enhancement #1248
- Developer Evaluation doc: [Doc]: Add developer guide docs for OpenCompass evaluation #367
Dashboard
- Perf Dashboard
- Accuracy Dashboard: https://vllm-ascend.readthedocs.io/en/latest/developer_guide/evaluation/accuracy_report/index.html
3.4 Hardware support
- 310 series support: [Platform] Add initial experimental support for Altlas 300I series #1333
4. Ecosystem Connect
Seamless integration of key lifecycle components with vLLM Ascend is essential, so we are also actively connecting with the ecosystem.
- (P1) [SFT] LLaMA-Factory: Support inference with vLLM-Ascend hiyouga/LLaMA-Factory#7739
- (P1) [RLHF] verl: [RFC][sub roadmap][25Q2] Add Ascend NPU support for verl volcengine/verl#900
- (P1) [RLHF] OpenRLHF: [WIP] support Ascend NPU backend OpenRLHF/OpenRLHF#605
- (P1) [RLHF] MindSpeed-RL: https://github.com/Ascend/MindSpeed-RL
- (P1) [RLHF] TRL: 🧗 Add Ascend NPU support for vLLM server huggingface/trl#3286
- (P1) [Deploy] GPUStack Support vllm-ascend gpustack/gpustack#1495
If any item you want is not on the roadmap, your suggestions and contributions are strongly welcomed! Please feel free to comment in this thread, open a feature request, or create an RFC.
Historical Roadmap: #71