@@ -4,37 +4,37 @@ The feature support principle of vLLM Ascend is: **aligned with the vLLM**. We a
 
 You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is the feature support status of vLLM Ascend:
 
-| Feature                       | vLLM V0 Engine | vLLM V1 Engine  | Next Step                                                              |
-| ----------------------------- | -------------- | --------------- | ---------------------------------------------------------------------- |
-| Chunked Prefill               | 🟢 Functional  | 🟢 Functional   | Functional, see detail note: [Chunked Prefill][cp]                     |
-| Automatic Prefix Caching      | 🟢 Functional  | 🟢 Functional   | Functional, see detail note: [vllm-ascend #732][apc]                   |
-| LoRA                          | 🟢 Functional  | 🟢 Functional   | [vllm-ascend #396][multilora], [vllm-ascend #893][v1 multilora]        |
-| Prompt adapter                | 🔴 No plan     | 🔴 No plan      | This feature has been deprecated by vLLM.                              |
-| Speculative decoding          | 🟢 Functional  | 🟢 Functional   | Basic support                                                          |
-| Pooling                       | 🟢 Functional  | 🟡 Planned      | CI needed and adapting more models; V1 support relies on vLLM support. |
-| Enc-dec                       | 🔴 No plan     | 🟡 Planned      | Planned for 2025.06.30                                                 |
-| Multi Modality                | 🟢 Functional  | 🟢 Functional   | [Tutorial][multimodal], optimizing and adapting more models            |
-| LogProbs                      | 🟢 Functional  | 🟢 Functional   | CI needed                                                              |
-| Prompt logProbs               | 🟢 Functional  | 🟢 Functional   | CI needed                                                              |
-| Async output                  | 🟢 Functional  | 🟢 Functional   | CI needed                                                              |
-| Multi step scheduler          | 🟢 Functional  | 🔴 Deprecated   | [vllm #8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]    |
-| Best of                       | 🟢 Functional  | 🔴 Deprecated   | [vllm #13361][best_of], CI needed                                      |
-| Beam search                   | 🟢 Functional  | 🟢 Functional   | CI needed                                                              |
-| Guided Decoding               | 🟢 Functional  | 🟢 Functional   | [vllm-ascend #177][guided_decoding]                                    |
-| Tensor Parallel               | 🟢 Functional  | 🟢 Functional   | CI needed                                                              |
-| Pipeline Parallel             | 🟢 Functional  | 🟢 Functional   | CI needed                                                              |
-| Expert Parallel               | 🔴 No plan     | 🟢 Functional   | CI needed; no plan on V0 support                                       |
-| Data Parallel                 | 🔴 No plan     | 🟢 Functional   | CI needed; no plan on V0 support                                       |
-| Prefill Decode Disaggregation | 🟢 Functional  | 🟢 Functional   | 1P1D available; working on xPyD and V1 support.                        |
-| Quantization                  | 🟢 Functional  | 🟢 Functional   | W8A8 available, CI needed; working on more quantization method support |
-| Graph Mode                    | 🔴 No plan     | 🔵 Experimental | Experimental, see detail note: [vllm-ascend #767][graph_mode]          |
-| Sleep Mode                    | 🟢 Functional  | 🟢 Functional   | level=1 available, CI needed; working on V1 support                    |
+| Feature                       | Status          | Next Step                                                              |
+| ----------------------------- | --------------- | ---------------------------------------------------------------------- |
+| Chunked Prefill               | 🟢 Functional   | Functional, see detail note: [Chunked Prefill][cp]                     |
+| Automatic Prefix Caching      | 🟢 Functional   | Functional, see detail note: [vllm-ascend #732][apc]                   |
+| LoRA                          | 🟢 Functional   | [vllm-ascend #396][multilora], [vllm-ascend #893][v1 multilora]        |
+| Prompt adapter                | 🔴 No plan      | This feature has been deprecated by vLLM.                              |
+| Speculative decoding          | 🟢 Functional   | Basic support                                                          |
+| Pooling                       | 🟢 Functional   | CI needed and adapting more models; V1 support relies on vLLM support. |
+| Enc-dec                       | 🟡 Planned      | vLLM should support this feature first.                                |
+| Multi Modality                | 🟢 Functional   | [Tutorial][multimodal], optimizing and adapting more models            |
+| LogProbs                      | 🟢 Functional   | CI needed                                                              |
+| Prompt logProbs               | 🟢 Functional   | CI needed                                                              |
+| Async output                  | 🟢 Functional   | CI needed                                                              |
+| Multi step scheduler          | 🔴 Deprecated   | [vllm #8779][v1_rfc], replaced by [vLLM V1 Scheduler][v1_scheduler]    |
+| Best of                       | 🔴 Deprecated   | [vllm #13361][best_of]                                                 |
+| Beam search                   | 🟢 Functional   | CI needed                                                              |
+| Guided Decoding               | 🟢 Functional   | [vllm-ascend #177][guided_decoding]                                    |
+| Tensor Parallel               | 🟢 Functional   | Make TP > 4 work with graph mode.                                      |
+| Pipeline Parallel             | 🟢 Functional   | Write official guide and tutorial.                                     |
+| Expert Parallel               | 🟢 Functional   | Dynamic EPLB support.                                                  |
+| Data Parallel                 | 🟢 Functional   | Data Parallel support for Qwen3 MoE.                                   |
+| Prefill Decode Disaggregation | 🚧 WIP          | Working on [1P1D][1P1D] and xPyD.                                      |
+| Quantization                  | 🟢 Functional   | W8A8 available; working on more quantization methods (W4A8, etc.).     |
+| Graph Mode                    | 🔵 Experimental | Experimental, see detail note: [vllm-ascend #767][graph_mode]          |
+| Sleep Mode                    | 🟢 Functional   |                                                                        |
 
 - 🟢 Functional: Fully operational, with ongoing optimizations.
 - 🔵 Experimental: Experimental support; interfaces and functions may change.
 - 🚧 WIP: Under active development, will be supported soon.
 - 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
-- 🔴 No plan / Deprecated: No plan for V0, or deprecated by vLLM V1.
+- 🔴 No plan / Deprecated: No plan, or deprecated by vLLM.
 
 [v1_user_guide]: https://docs.vllm.ai/en/latest/getting_started/v1_user_guide.html
 [multimodal]: https://vllm-ascend.readthedocs.io/en/latest/tutorials/single_npu_multimodal.html
@@ -47,3 +47,5 @@ You can check the [support status of vLLM V1 Engine][v1_user_guide]. Below is th
 [graph_mode]: https://github.com/vllm-project/vllm-ascend/issues/767
 [apc]: https://github.com/vllm-project/vllm-ascend/issues/732
 [cp]: https://docs.vllm.ai/en/stable/performance/optimization.html#chunked-prefill
+[1P1D]: https://github.com/vllm-project/vllm-ascend/pull/950
+[ray]: https://github.com/vllm-project/vllm-ascend/issues/1751
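As a usage pointer for two of the 🟢 rows above: chunked prefill and automatic prefix caching are toggled through vLLM's standard engine flags rather than anything Ascend-specific. A minimal sketch, assuming a working vllm-ascend install; the model name is a placeholder, and exact behavior on Ascend should be checked against the notes linked in the table:

```shell
# Hedged sketch: serve a model with chunked prefill and automatic prefix
# caching enabled via vLLM's standard CLI flags (model name is a placeholder).
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-chunked-prefill \
    --enable-prefix-caching
```

The same options can be passed programmatically as `LLM(..., enable_chunked_prefill=True, enable_prefix_caching=True)` in the offline API.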