Commit 7ddce50

wangxiyuan authored and Yikun committed
[Doc] Add release note for 0.7.3

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

1 parent 98636e6 commit 7ddce50
File tree: 8 files changed, +48 −209 lines changed


docs/source/conf.py

Lines changed: 2 additions & 2 deletions

````diff
@@ -67,10 +67,10 @@
     # the branch of vllm-ascend, used in vllm-ascend clone and image tag
     # - main branch: 'main'
     # - vX.Y.Z branch: latest vllm-ascend release tag
-    'vllm_ascend_version': 'v0.7.3rc2',
+    'vllm_ascend_version': 'v0.7.3',
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    'pip_vllm_ascend_version': "0.7.3rc2",
+    'pip_vllm_ascend_version': "0.7.3",
     'pip_vllm_version': "0.7.3",
     # The maching MindIE Turbo for vLLM Ascend
     'pip_mindie_turbo_version': "2.0rc1",
````

docs/source/developer_guide/contributing.zh.md

Lines changed: 0 additions & 102 deletions
This file was deleted.

docs/source/developer_guide/versioning_policy.md

Lines changed: 7 additions & 6 deletions

````diff
@@ -65,19 +65,20 @@ As shown above:
 
 Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
-| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
-|--------------|--------------| --- | --- | --- |
-| v0.7.3rc2 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 |
-| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
-| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
+| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
+|--------------|--------------| --- | --- | --- | --- |
+| v0.7.3 | v0.7.3 | 3.9 - 3.11 | 8.1.0 | 2.5.1 / 2.5.1 | 2.0rc1 |
+| v0.7.3rc2 | v0.7.3 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 | / |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 | / |
+| v0.7.1rc1 | v0.7.1 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 | / |
 
 ## Release cadence
 
 ### release window
 
 | Date | Event |
 |------------|-------------------------------------------|
-| 2025.04.30 | Final release, v0.7.3(The official release rely on the release of torch-npu and CANN8.1, so it's delayed)|
+| 2025.05.08 | Final release, v0.7.3 |
 | 2025.04.17 | Release candidates, v0.8.4rc1 |
 | 2025.03.28 | Release candidates, v0.7.3rc2 |
 | 2025.03.14 | Release candidates, v0.7.3rc1 |
````
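For scripting against the matrix above, the rows can be captured as a small lookup table. This is an illustrative sketch only: the dict layout and helper name are my own, with the values transcribed from the updated table (the `/` entries in the MindIE Turbo column are modeled as `None`).

```python
# Release compatibility matrix, transcribed from versioning_policy.md.
# A "/" in the MindIE Turbo column (no matching release) is modeled as None.
COMPAT = {
    "v0.7.3":    {"vllm": "v0.7.3", "python": "3.9 - 3.11", "cann": "8.1.0",
                  "torch_npu": "2.5.1", "mindie_turbo": "2.0rc1"},
    "v0.7.3rc2": {"vllm": "v0.7.3", "python": "3.9 - 3.11", "cann": "8.0.0",
                  "torch_npu": "2.5.1.dev20250320", "mindie_turbo": None},
    "v0.7.3rc1": {"vllm": "v0.7.3", "python": "3.9 - 3.11", "cann": "8.0.0",
                  "torch_npu": "2.5.1.dev20250308", "mindie_turbo": None},
    "v0.7.1rc1": {"vllm": "v0.7.1", "python": "3.9 - 3.11", "cann": "8.0.0",
                  "torch_npu": "2.5.1.dev20250218", "mindie_turbo": None},
}

def stable_cann(release: str) -> str:
    """Return the Stable CANN version matched to a vllm-ascend release."""
    return COMPAT[release]["cann"]

print(stable_cann("v0.7.3"))  # 8.1.0
```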

docs/source/developer_guide/versioning_policy.zh.md

Lines changed: 0 additions & 79 deletions
This file was deleted.

docs/source/installation.md

Lines changed: 4 additions & 3 deletions

````diff
@@ -7,6 +7,7 @@ This document describes how to install vllm-ascend manually.
 - OS: Linux
 - Python: >= 3.9, < 3.12
 - A hardware with Ascend NPU. It's usually the Atlas 800 A2 series.
+- Firmware: Ascend HDK >= 24.1RC1
 - Software:
 
 | Software | Supported version | Note |
@@ -119,7 +120,7 @@ First install system dependencies:
 
 ```bash
 apt update -y
-apt install -y gcc g++ libnuma-dev
+apt install -y gcc g++ libnuma-dev git
 ```
 
 You can install `vllm` and `vllm-ascend` from **pre-built wheel**:
@@ -269,5 +270,5 @@ Prompt: 'The future of AI is', Generated text: ' not bright\n\nThere is no doubt
 
 Get more performance gains by optimizing Python and torch-npu with the Bisheng compiler, please follow these official turtorial:
 
-[Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
-[Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
+- [Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
+- [Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
````
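The Python requirement the diff adds context around (>= 3.9, < 3.12) can be checked up front before attempting the install. A minimal sketch; the helper name is my own, not part of vllm-ascend:

```python
import sys

def python_supported(version=None) -> bool:
    """True if the interpreter satisfies vllm-ascend's >= 3.9, < 3.12 range."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 9) <= (major, minor) < (3, 12)

print(python_supported((3, 11, 0)))  # True
print(python_supported((3, 12, 0)))  # False
```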

docs/source/user_guide/release_notes.md

Lines changed: 25 additions & 0 deletions

````diff
@@ -1,5 +1,30 @@
 # Release note
 
+## v0.7.3
+
+🎉 Hello, World!
+
+We are excited to announce the release of 0.7.3 for vllm-ascend. This is the first official release. The functionality, performance, and stability of this release have been fully tested and verified. We encourage you to try it out and provide feedback. We'll post bug-fix versions in the future if needed. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+
+### Highlights
+- This release includes all features landed in the previous release candidates ([v0.7.1rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.1rc1), [v0.7.3rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc1), [v0.7.3rc2](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc2)), and all of them are fully tested and verified. Visit the official doc to get the detailed [feature](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/user_guide/suppoted_features.html) and [model](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/user_guide/supported_models.html) support matrix.
+- Upgrade CANN to 8.1.RC1 to enable the chunked prefill and automatic prefix caching features. You can now enable them.
+- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu, so users no longer need to install torch-npu by hand; version 2.5.1 of torch-npu will be installed automatically. [#662](https://github.com/vllm-project/vllm-ascend/pull/662)
+- Integrate MindIE Turbo into vLLM Ascend to improve the performance of DeepSeek V3/R1 and the Qwen2 series. [#708](https://github.com/vllm-project/vllm-ascend/pull/708)
+
+### Core
+- LoRA, Multi-LoRA and dynamic serving are supported now. The performance will be improved in the next release. Please follow the official doc for more usage information. Thanks for the contribution from China Merchants Bank. [#700](https://github.com/vllm-project/vllm-ascend/pull/700)
+
+### Model
+- The performance of Qwen2-VL and Qwen2.5-VL is improved. [#702](https://github.com/vllm-project/vllm-ascend/pull/702)
+- The performance of the `apply_penalties` and `topKtopP` ops is improved. [#525](https://github.com/vllm-project/vllm-ascend/pull/525)
+
+### Other
+- Fixed an issue that could lead to a CPU memory leak. [#691](https://github.com/vllm-project/vllm-ascend/pull/691) [#712](https://github.com/vllm-project/vllm-ascend/pull/712)
+- A new environment variable `SOC_VERSION` is added. If you hit any SoC detection error when building with custom ops enabled, please set `SOC_VERSION` to a suitable value. [#606](https://github.com/vllm-project/vllm-ascend/pull/606)
+- The openEuler container image is now supported with the v0.7.3-openeuler tag. [#665](https://github.com/vllm-project/vllm-ascend/pull/665)
+- The prefix cache feature works on the V1 engine now. [#559](https://github.com/vllm-project/vllm-ascend/pull/559)
+
 ## v0.7.3rc2
 
 This is 2nd release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
````
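The `SOC_VERSION` escape hatch added in the release note amounts to preferring an explicit override before auto-detection. The following is a hypothetical sketch of that pattern, not the actual vllm-ascend build logic; the function name, error message, and the example value are mine:

```python
import os
from typing import Optional

def resolve_soc_version(detected: Optional[str] = None) -> str:
    """Prefer an explicit SOC_VERSION override; else use the detected value."""
    explicit = os.environ.get("SOC_VERSION")
    if explicit:
        return explicit
    if detected is None:
        raise RuntimeError("SoC detection failed; set SOC_VERSION explicitly")
    return detected

os.environ["SOC_VERSION"] = "Ascend910B1"  # example value only
print(resolve_soc_version())  # Ascend910B1
```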

docs/source/user_guide/supported_models.md

Lines changed: 0 additions & 12 deletions

````diff
@@ -12,18 +12,6 @@
 | QwQ-32B |||
 | MiniCPM || |
 | LLama3.1/3.2 |||
-| Mistral | | Need test |
-| DeepSeek v2.5 | |Need test |
-| Gemma-2 | |Need test|
-| Baichuan | |Need test|
 | Internlm |||
-| ChatGLM || Plan in Q2|
 | InternVL2.5 |||
-| GLM-4v | |Need test|
 | Molomo |||
-| LLaVA1.5 | | Need test|
-| Mllama | |Need test|
-| LLaVA-Next | |Need test|
-| LLaVA-Next-Video | |Need test|
-| Phi-3-Vison/Phi-3.5-Vison | |Need test|
-| Ultravox | |Need test|
````

docs/source/user_guide/suppoted_features.md

Lines changed: 10 additions & 5 deletions

````diff
@@ -4,11 +4,7 @@ The feature support principle of vLLM Ascend is: **aligned with the vLLM**. We a
 
 vLLM Ascend offers the overall functional support of the most features in vLLM, and the usage keep the same with vLLM except for some limits.
 
-```{note}
-MindIE Turbo is an optional performace optimization plugin. Find more information about the feature support of MindIE Turbo here(UPDATE_ME_AS_A_LINK).
-```
-
-| Feature | vLLM Ascend | MindIE Turbo | Notes |
+| Feature | vLLM Ascend | vLLM Ascend (+ MindIE Turbo) | Notes |
 |-------------------------------|----------------|-----------------|------------------------------------------------------------------------|
 | V1Engine | 🔵 Experimental| 🔵 Experimental| Will enhance in v0.8.x |
 | Chunked Prefill | 🟢 Functional | 🟢 Functional | / |
@@ -36,3 +32,12 @@
 | Sleep Mode | 🟢 Functional | 🟢 Functional | [Usage Limits][#733](https://github.com/vllm-project/vllm-ascend/issues/733) |
 | MTP | 🟢 Functional | 🟢 Functional | [Usage Limits][#734](https://github.com/vllm-project/vllm-ascend/issues/734) |
 | Custom Scheduler | 🟢 Functional | 🟢 Functional | [Usage Limits][#788](https://github.com/vllm-project/vllm-ascend/issues/788) |
+
+
+*MindIE Turbo is an LLM inference engine acceleration plug-in library on Ascend hardware. Find more information [here](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).*
+
+- 🟢 Functional: Fully operational, with ongoing optimizations.
+
+- 🔵 Experimental: Experimental support; interfaces and functions may change.
+
+- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
````
