Commit 98636e6
[Build][0.7.3] Integrate MindIE Turbo into vLLM Ascend (#708)
### What this PR does / why we need it?

Integrate MindIE Turbo into vLLM Ascend:
- Added support for MindIE Turbo in `setup.py` via an optional dependency under `extras_require`.
- Enhanced the `try_register_lib` utility to log specific exceptions when MindIE Turbo is not found.
- Updated documentation to include instructions for installing MindIE Turbo.
- Improved documentation with links to Bisheng compiler optimization tutorials and MindIE Turbo documentation.

### Does this PR introduce _any_ user-facing change?

NO

### How was this patch tested?

CI passed

---------

Signed-off-by: MengqingCao <cmq0113@163.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>
1 parent e066683 commit 98636e6
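The commit message mentions that `try_register_lib` now logs the specific exception when MindIE Turbo is absent. A minimal sketch of that pattern follows; the function name comes from the commit message, but the signature and log wording here are assumptions, not the actual vLLM Ascend implementation:

```python
import importlib
import logging

logger = logging.getLogger(__name__)

def try_register_lib(lib_name: str, lib_info: str = "") -> None:
    """Try to import an optional library; log instead of raising if it is absent."""
    try:
        importlib.import_module(lib_name)
        logger.info("%s imported successfully.", lib_name)
    except Exception as e:
        # Log the specific exception so users can tell a missing package
        # apart from a broken installation.
        logger.warning("Failed to import %s: %r. %s", lib_name, e, lib_info)

# Missing plugin: logs a warning instead of crashing the caller.
try_register_lib("mindie_turbo", "Install it for performance acceleration.")
```

The key behavior is that a missing optional plugin degrades to a log message rather than an import error at startup.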

File tree

13 files changed: +139 −27 lines


docs/requirements-docs.txt

Lines changed: 1 addition & 0 deletions
@@ -7,3 +7,4 @@ sphinx-togglebutton
 myst-parser
 msgspec
 sphinx-substitution-extensions
+snowballstemmer<3.0.0

docs/source/conf.py

Lines changed: 2 additions & 0 deletions
@@ -72,6 +72,8 @@
     # This value should be updated when cut down release.
     'pip_vllm_ascend_version': "0.7.3rc2",
     'pip_vllm_version': "0.7.3",
+    # The matching MindIE Turbo for vLLM Ascend
+    'pip_mindie_turbo_version': "2.0rc1",
     # CANN image tag
     'cann_image_tag': "8.1.rc1-910b-ubuntu22.04-py3.10",
 }
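The version key added to `conf.py` is consumed by `sphinx-substitution-extensions`, which replaces `|pip_mindie_turbo_version|` placeholders in the docs' code blocks. A simplified sketch of the mapping (the dict name and the `render` helper here are illustrative assumptions; the real substitution is done by Sphinx at build time):

```python
# Simplified sketch: how a |pip_mindie_turbo_version| placeholder resolves
# against the version dict defined in conf.py. The actual work is done by
# sphinx-substitution-extensions; this only illustrates the mapping.
substitutions = {
    'pip_vllm_ascend_version': "0.7.3rc2",
    'pip_vllm_version': "0.7.3",
    'pip_mindie_turbo_version': "2.0rc1",
}

def render(template: str, subs: dict) -> str:
    # Replace each |key| marker with its configured value.
    for key, value in subs.items():
        template = template.replace(f"|{key}|", value)
    return template

print(render("pip install mindie_turbo==|pip_mindie_turbo_version|", substitutions))
# → pip install mindie_turbo==2.0rc1
```

Centralizing versions this way means a release bump only touches `conf.py`, not every install snippet.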

docs/source/installation.md

Lines changed: 17 additions & 0 deletions
@@ -195,6 +195,16 @@ The default workdir is `/workspace`, vLLM and vLLM Ascend code are placed in `/v
 
 :::::
 
+## (Optional) Install MindIE Turbo
+
+Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
 ## Extra information
 
 ### Verify installation
@@ -254,3 +264,10 @@ Prompt: 'The president of the United States is', Generated text: ' a very import
 Prompt: 'The capital of France is', Generated text: ' Paris. The oldest part of the city is Saint-Germain-des-Pr'
 Prompt: 'The future of AI is', Generated text: ' not bright\n\nThere is no doubt that the evolution of AI will have a huge'
 ```
+
+### Compile Enhancement
+
+To get more performance gains by optimizing Python and torch-npu with the Bisheng compiler, follow these official tutorials:
+
+[Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
+[Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
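Since MindIE Turbo is optional, code or docs verification steps may want to check for its presence without importing it. A minimal sketch, assuming the importable module name matches the `mindie_turbo` package installed above:

```python
import importlib.util

def has_mindie_turbo() -> bool:
    # find_spec checks importability without executing the module,
    # so this is safe even if the plugin is broken or absent.
    return importlib.util.find_spec("mindie_turbo") is not None

print("MindIE Turbo available:", has_mindie_turbo())
```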

docs/source/quick_start.md

Lines changed: 14 additions & 1 deletion
@@ -33,6 +33,15 @@ docker run --rm \
 
 The default workdir is `/workspace`; vLLM and vLLM Ascend code are placed in `/vllm-workspace` and installed in [development mode](https://setuptools.pypa.io/en/latest/userguide/development_mode.html) (`pip install -e`) so developers can immediately pick up code changes without reinstalling.
 
+## (Optional) Install MindIE Turbo
+
+Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
 ## Usage
 
 You can use the ModelScope mirror to speed up downloads:
@@ -130,4 +139,8 @@ INFO: Application shutdown complete.
 
 Finally, you can exit the container by using `ctrl-D`.
 ::::
-:::::
+:::::
+
+### Performance enhancement related environment variables in MindIE Turbo
+
+Currently, some performance enhancement features in MindIE Turbo have certain scenario restrictions. For these features, environment variables control whether they are enabled. For the related environment variables, see the [official documentation](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0010.html).
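The environment-variable gating described above can be sketched as follows. The variable name used here is purely hypothetical, not an actual MindIE Turbo flag; see the linked official documentation for the real variables:

```python
import os

def feature_enabled(var_name: str) -> bool:
    # Treat "1" as enabled; anything else (or unset) as disabled.
    # This mirrors the common on/off env-var convention; MindIE Turbo's
    # actual variables and accepted values are defined in its docs.
    return os.environ.get(var_name, "0") == "1"

os.environ["EXAMPLE_TURBO_FEATURE"] = "1"   # hypothetical variable name
print(feature_enabled("EXAMPLE_TURBO_FEATURE"))   # → True
```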

docs/source/tutorials/multi_node.md

Lines changed: 8 additions & 0 deletions
@@ -30,6 +30,14 @@ docker run --rm \
 -it quay.io/ascend/vllm-ascend:|vllm_ascend_version| bash
 ```
 
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
 Choose one machine as the head node and the others as worker nodes, then start Ray on each machine:
 
 :::{note}

docs/source/tutorials/multi_npu.md

Lines changed: 8 additions & 0 deletions
@@ -27,6 +27,14 @@ docker run --rm \
 -it $IMAGE bash
 ```
 
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
 Setup environment variables:
 
 ```bash

docs/source/tutorials/single_npu.md

Lines changed: 22 additions & 1 deletion
@@ -26,6 +26,14 @@ docker run --rm \
 -it $IMAGE bash
 ```
 
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
 Setup environment variables:
 
 ```bash
@@ -90,7 +98,20 @@ docker run --rm \
 -p 8000:8000 \
 -e VLLM_USE_MODELSCOPE=True \
 -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
--it $IMAGE \
+-it $IMAGE bash
+```
+
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
+Run the following script to start the vLLM server:
+
+```
 vllm serve Qwen/Qwen2.5-7B-Instruct --max_model_len 26240
 ```

docs/source/tutorials/single_npu_multimodal.md

Lines changed: 22 additions & 1 deletion
@@ -26,6 +26,14 @@ docker run --rm \
 -it $IMAGE bash
 ```
 
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
 Setup environment variables:
 
 ```bash
@@ -143,7 +151,20 @@ docker run --rm \
 -p 8000:8000 \
 -e VLLM_USE_MODELSCOPE=True \
 -e PYTORCH_NPU_ALLOC_CONF=max_split_size_mb:256 \
--it $IMAGE \
+-it $IMAGE bash
+```
+
+(Optional) Install MindIE Turbo for performance acceleration:
+
+```{code-block} bash
+:substitutions:
+
+pip install mindie_turbo==|pip_mindie_turbo_version|
+```
+
+Run the following script to start the vLLM server:
+
+```
 vllm serve Qwen/Qwen2.5-VL-7B-Instruct --dtype bfloat16 --max_model_len 16384 --max-num-batched-tokens 16384
 ```
149170
Lines changed: 36 additions & 19 deletions
@@ -1,21 +1,38 @@
 # Feature Support
 
-| Feature | Supported | CI Coverage | Guidance Document | Current Status | Next Step |
-|--------------------------|-----------|-------------|-------------------|---------------------------|--------------------|
-| Chunked Prefill || | | NA | Rely on CANN 8.1 NNAL package release |
-| Automatic Prefix Caching || | | Basic functions available | Rely on CANN 8.1 NNAL package release |
-| LoRA || | | NA | Plan in 2025.06.30 |
-| Prompt adapter || | | NA | Plan in 2025.06.30 |
-| Speculative decoding || | | Basic functions available | Need fully test |
-| Pooling || | | Basic functions available(Bert) | Need fully test and add more models support|
-| Enc-dec || | | NA | Plan in 2025.06.30|
-| Multi Modality || || Basic functions available(LLaVA/Qwen2-vl/Qwen2-audio/internVL)| Improve perforamance, and add more models support |
-| LogProbs || | | Basic functions available | Need fully test |
-| Prompt logProbs || | | Basic functions available | Need fully test |
-| Async output || | | Basic functions available | Need fully test |
-| Multi step scheduler || | | Basic functions available | Need fully test, Find more details at [<u> Blog </u>](https://blog.vllm.ai/2024/09/05/perf-update.html#batch-scheduling-multiple-steps-ahead-pr-7000), [<u> RFC </u>](https://github.com/vllm-project/vllm/issues/6854) and [<u>issue</u>](https://github.com/vllm-project/vllm/pull/7000) |
-| Best of || | | Basic functions available | Need fully test |
-| Beam search || | | Basic functions available | Need fully test |
-| Guided Decoding || | | Basic functions available | Find more details at the [<u>issue</u>](https://github.com/vllm-project/vllm-ascend/issues/177) |
-| Tensor Parallel || | | Basic functions available | Need fully test |
-| Pipeline Parallel || | | Basic functions available | Need fully test |
+The feature support principle of vLLM Ascend is: **aligned with vLLM**. We are also actively collaborating with the community to accelerate support.
+
+vLLM Ascend offers overall functional support for most features in vLLM, and usage is the same as in vLLM except for some limits.
+
+```{note}
+MindIE Turbo is an optional performance optimization plugin. Find more information about the feature support of MindIE Turbo here(UPDATE_ME_AS_A_LINK).
+```
+
+| Feature | vLLM Ascend | MindIE Turbo | Notes |
+|-------------------------------|----------------|-----------------|------------------------------------------------------------------------|
+| V1Engine | 🔵 Experimental| 🔵 Experimental| Will enhance in v0.8.x |
+| Chunked Prefill | 🟢 Functional | 🟢 Functional | / |
+| Automatic Prefix Caching | 🟢 Functional | 🟢 Functional | [Usage Limits][#732](https://github.com/vllm-project/vllm-ascend/issues/732) |
+| LoRA | 🟢 Functional | 🟢 Functional | / |
+| Prompt adapter | 🟡 Planned | 🟡 Planned | / |
+| Speculative decoding | 🟢 Functional | 🟢 Functional | [Usage Limits][#734](https://github.com/vllm-project/vllm-ascend/issues/734) |
+| Pooling | 🟢 Functional | 🟢 Functional | / |
+| Enc-dec | 🟡 Planned | 🟡 Planned | / |
+| Multi Modality | 🟢 Functional | 🟢 Functional | / |
+| LogProbs | 🟢 Functional | 🟢 Functional | / |
+| Prompt logProbs | 🟢 Functional | 🟢 Functional | / |
+| Async output | 🟢 Functional | 🟢 Functional | / |
+| Multi step scheduler | 🟢 Functional | 🟢 Functional | / |
+| Best of | 🟢 Functional | 🟢 Functional | / |
+| Beam search | 🟢 Functional | 🟢 Functional | / |
+| Guided Decoding | 🟢 Functional | 🟢 Functional | / |
+| Tensor Parallel | 🟢 Functional | ⚡ Optimized | / |
+| Pipeline Parallel | 🟢 Functional | ⚡ Optimized | / |
+| Expert Parallel | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
+| Data Parallel | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
+| Prefill Decode Disaggregation | 🟢 Functional | 🟢 Functional | todo |
+| Quantization | 🟡 Planned | 🟢 Functional | Will support in v0.8.x |
+| Graph Mode | 🟡 Planned | 🟡 Planned | Will support in v0.8.x |
+| Sleep Mode | 🟢 Functional | 🟢 Functional | [Usage Limits][#733](https://github.com/vllm-project/vllm-ascend/issues/733) |
+| MTP | 🟢 Functional | 🟢 Functional | [Usage Limits][#734](https://github.com/vllm-project/vllm-ascend/issues/734) |
+| Custom Scheduler | 🟢 Functional | 🟢 Functional | [Usage Limits][#788](https://github.com/vllm-project/vllm-ascend/issues/788) |

setup.py

Lines changed: 1 addition & 1 deletion
@@ -368,7 +368,7 @@ def _read_requirements(filename: str) -> List[str]:
     install_requires=get_requirements(),
     ext_modules=ext_modules,
     cmdclass=cmdclass,
-    extras_require={},
+    extras_require={"mindie_turbo": ["mindie-turbo==2.0rc1"]},
     entry_points={
         "vllm.platform_plugins": ["ascend = vllm_ascend:register"],
         "vllm.general_plugins":
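With the `extras_require` entry above, the plugin should also be installable as a pip extra (standard setuptools behavior: `pip install vllm-ascend[mindie_turbo]`), and consuming code typically guards the optional import. A minimal sketch of that guard:

```python
# Standard guarded-import pattern for an optional dependency declared
# via extras_require. If the extra was not installed, the flag is False
# and callers fall back to the unaccelerated path.
try:
    import mindie_turbo  # noqa: F401
    HAS_MINDIE_TURBO = True
except ImportError:
    HAS_MINDIE_TURBO = False

print("MindIE Turbo enabled:", HAS_MINDIE_TURBO)
```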
