This is a living document!
Note that vLLM Ascend 0.7.3 (matching vLLM v0.7.3) is the main release for 2025 Q1; see more in link.
Supported models tracking issue: #260
Hardware Plugin
Basic support
Initial vLLM Ascend releases will focus on basic hardware compatibility.
- (P0) Chunked Prefill
- (P1) Automatic Prefix Caching (Improve performance)
- (P1) Speculative decoding
- (P1) Guided Decoding: [Feature]: Add Support for Guided Decoding (Structured Output) #177
- (P1) Multi-step scheduler: support multistep decoding main #222
- LoRA
- Prompt adapter
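Of the items above, speculative decoding is the easiest to illustrate in isolation: a cheap draft model proposes several tokens, and the expensive target model verifies them, so the output matches what the target alone would produce. A minimal greedy sketch, with toy callables standing in for real draft/target models (all names here are illustrative, not vLLM APIs):

```python
from typing import Callable, List

def speculative_decode(
    target_next: Callable[[List[int]], int],  # expensive model: next token given context
    draft_next: Callable[[List[int]], int],   # cheap model: next token given context
    prompt: List[int],
    max_new: int,
    k: int = 4,
) -> List[int]:
    """Greedy speculative decoding: the draft proposes up to k tokens,
    the target accepts the longest agreeing prefix, then contributes one
    corrected token at the first disagreement."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # Draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            t = draft_next(ctx)
            proposal.append(t)
            ctx.append(t)
        # Target verifies: accept tokens while its own choice matches the draft's.
        accepted = 0
        for t in proposal:
            if target_next(out) == t:
                out.append(t)
                accepted += 1
            else:
                break
        if accepted < len(proposal):
            # First disagreement: take the target's token instead.
            out.append(target_next(out))
    return out[: len(prompt) + max_new]
```

Whether the draft agrees often (fast path, k tokens per target pass) or never (slow path, one token per pass), the generated sequence is identical to plain greedy decoding with the target model, which is the correctness property this feature must preserve.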
Feature support
- (P0) V1 engine support: [RFC] V1 engine support #9
- (P1) EP support: Expert Parallelism (EP) Support for DeepSeek Models vllm#12583
- (P1) Custom op [RFC]: Add support for custom ops #156
- (P1) MTP (v0.7.3): Add MTP support for deepseek #236
- Disaggregated prefill: [Feature][Disaggregated] Support XpYd disaggregated prefill with MooncakeStore vllm#12957
- Scheduler plugin: [do not merge] V1 scheduler interface vllm#12544
- RLHF post-train support - verl: Add Ascend NPU support for verl volcengine/verl#338
- RLHF post-train support - OpenRLHF: [WIP] support Ascend NPU backend OpenRLHF/OpenRLHF#605
Model support
- (P0) DeepSeek V3 / DeepSeek R1: [New Model]: DeepSeek V3 / R1 #72
- (P0) Llama3
- (P0) Qwen2.5
- (P0) Qwen2-VL: [New Model]: Qwen2-VL #246
- (P0) Qwen2.5-VL: [New Model]: Qwen2.5-VL #75
- (P1) BAAI/bge-m3: [New Model]: BAAI/bge-m3 #235
- MiniCPM
- GLM4
- InternLM
- llava
- GLM4v
- InternVL
Performance
- Add a vllm-ascend perf website like vLLM's https://perf.vllm.ai/
- Focus on improving performance for Llama3, Qwen2.5, Qwen2-VL, and DeepSeek V3/R1
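A perf website like the one above boils down to tracking a small set of numbers per model, chiefly generation throughput in tokens per second, usually reported as the best of several timed runs. A self-contained sketch of that measurement; `fake_generate` is a stub standing in for a real engine call (e.g. vLLM's `LLM.generate` on an Ascend NPU), and all names are illustrative:

```python
import time
from typing import Callable, List

def measure_throughput(
    generate: Callable[[List[str]], List[List[int]]],
    prompts: List[str],
    runs: int = 3,
) -> float:
    """Run generate() several times over the same batch and report
    generated tokens per second for the fastest run."""
    best = float("inf")
    total_tokens = 0
    for _ in range(runs):
        start = time.perf_counter()
        outputs = generate(prompts)  # list of token-id lists, one per prompt
        elapsed = time.perf_counter() - start
        total_tokens = sum(len(o) for o in outputs)
        best = min(best, elapsed)
    return total_tokens / best

# Stand-in for a real engine: 32 tokens per prompt, instantly.
def fake_generate(prompts: List[str]) -> List[List[int]]:
    return [[0] * 32 for _ in prompts]
```

Taking the best of N runs discounts one-off jitter (cold caches, background load), which matters when the website plots small regressions across commits.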
Quality
- Full UT coverage
- Model e2e test
- Multi card/node e2e test
Docs
- README
- vllm-ascend website: https://vllm-ascend.readthedocs.org/
- Quick start / Installation / Tutorial
- User guide: supported feature / models
- Developer guide: Contributing / Versioning policy
CI and Developer Productivity
- vllm-ascend Docker image: [CI] Add container image build ci #64
- Ascend CI for main / dev branch: [Core] Init vllm-ascend #3