Commit 7ddce50

wangxiyuan authored and Yikun committed
[Doc] Add release note for 0.7.3

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
Signed-off-by: Yikun Jiang <yikunkero@gmail.com>

1 parent 98636e6 commit 7ddce50
File tree: 8 files changed, +48 −209 lines changed


docs/source/conf.py

Lines changed: 2 additions & 2 deletions

````diff
@@ -67,10 +67,10 @@
     # the branch of vllm-ascend, used in vllm-ascend clone and image tag
     # - main branch: 'main'
     # - vX.Y.Z branch: latest vllm-ascend release tag
-    'vllm_ascend_version': 'v0.7.3rc2',
+    'vllm_ascend_version': 'v0.7.3',
     # the newest release version of vllm-ascend and matched vLLM, used in pip install.
     # This value should be updated when cut down release.
-    'pip_vllm_ascend_version': "0.7.3rc2",
+    'pip_vllm_ascend_version': "0.7.3",
     'pip_vllm_version': "0.7.3",
     # The maching MindIE Turbo for vLLM Ascend
     'pip_mindie_turbo_version': "2.0rc1",
````

docs/source/developer_guide/contributing.zh.md

Lines changed: 0 additions & 102 deletions
This file was deleted.

docs/source/developer_guide/versioning_policy.md

Lines changed: 7 additions & 6 deletions

````diff
@@ -65,19 +65,20 @@ As shown above:
 
 Following is the Release Compatibility Matrix for vLLM Ascend Plugin:
 
-| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
-|--------------|--------------| --- | --- | --- |
-| v0.7.3rc2 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 |
-| v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
-| v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
+| vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
+|--------------|--------------| --- | --- | --- | --- |
+| v0.7.3 | v0.7.3 | 3.9 - 3.11 | 8.1.0 | 2.5.1 / 2.5.1 | 2.0rc1 |
+| v0.7.3rc2 | v0.7.3 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 | / |
+| v0.7.3rc1 | v0.7.3 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 | / |
+| v0.7.1rc1 | v0.7.1 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 | / |
 
 ## Release cadence
 
 ### release window
 
 | Date | Event |
 |------------|-------------------------------------------|
-| 2025.04.30 | Final release, v0.7.3(The official release rely on the release of torch-npu and CANN8.1, so it's delayed)|
+| 2025.05.08 | Final release, v0.7.3 |
 | 2025.04.17 | Release candidates, v0.8.4rc1 |
 | 2025.03.28 | Release candidates, v0.7.3rc2 |
 | 2025.03.14 | Release candidates, v0.7.3rc1 |
````
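For scripting against the matrix above, the rows can be captured as a small lookup table. This is an illustrative sketch only: the dict layout and helper name are my own, with the values transcribed from the updated table (the `/` entries in the MindIE Turbo column are modeled as `None`).

```python
# Release compatibility matrix, transcribed from versioning_policy.md.
# A "/" in the MindIE Turbo column (no matching release) is modeled as None.
COMPAT = {
    "v0.7.3":    {"vllm": "v0.7.3", "python": "3.9 - 3.11", "cann": "8.1.0",
                  "torch_npu": "2.5.1", "mindie_turbo": "2.0rc1"},
    "v0.7.3rc2": {"vllm": "v0.7.3", "python": "3.9 - 3.11", "cann": "8.0.0",
                  "torch_npu": "2.5.1.dev20250320", "mindie_turbo": None},
    "v0.7.3rc1": {"vllm": "v0.7.3", "python": "3.9 - 3.11", "cann": "8.0.0",
                  "torch_npu": "2.5.1.dev20250308", "mindie_turbo": None},
    "v0.7.1rc1": {"vllm": "v0.7.1", "python": "3.9 - 3.11", "cann": "8.0.0",
                  "torch_npu": "2.5.1.dev20250218", "mindie_turbo": None},
}

def stable_cann(release: str) -> str:
    """Return the Stable CANN version matched to a vllm-ascend release."""
    return COMPAT[release]["cann"]

print(stable_cann("v0.7.3"))  # 8.1.0
```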

docs/source/developer_guide/versioning_policy.zh.md

Lines changed: 0 additions & 79 deletions
This file was deleted.

docs/source/installation.md

Lines changed: 4 additions & 3 deletions

````diff
@@ -7,6 +7,7 @@ This document describes how to install vllm-ascend manually.
 - OS: Linux
 - Python: >= 3.9, < 3.12
 - A hardware with Ascend NPU. It's usually the Atlas 800 A2 series.
+- Firmware: Ascend HDK >= 24.1RC1
 - Software:
 
 | Software | Supported version | Note |
@@ -119,7 +120,7 @@ First install system dependencies:
 
 ```bash
 apt update -y
-apt install -y gcc g++ libnuma-dev
+apt install -y gcc g++ libnuma-dev git
 ```
 
 You can install `vllm` and `vllm-ascend` from **pre-built wheel**:
@@ -269,5 +270,5 @@ Prompt: 'The future of AI is', Generated text: ' not bright\n\nThere is no doubt
 
 Get more performance gains by optimizing Python and torch-npu with the Bisheng compiler, please follow these official turtorial:
 
-[Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
-[Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
+- [Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
+- [Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
````
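The Python requirement the diff adds context around (>= 3.9, < 3.12) can be checked up front before attempting the install. A minimal sketch; the helper name is my own, not part of vllm-ascend:

```python
import sys

def python_supported(version=None) -> bool:
    """True if the interpreter satisfies vllm-ascend's >= 3.9, < 3.12 range."""
    major, minor = (version or sys.version_info)[:2]
    return (3, 9) <= (major, minor) < (3, 12)

print(python_supported((3, 11, 0)))  # True
print(python_supported((3, 12, 0)))  # False
```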

docs/source/user_guide/release_notes.md

Lines changed: 25 additions & 0 deletions

````diff
@@ -1,5 +1,30 @@
 # Release note
 
+## v0.7.3
+
+🎉 Hello, World!
+
+We are excited to announce the release of 0.7.3 for vllm-ascend. This is the first official release. The functionality, performance, and stability of this release have been fully tested and verified. We encourage you to try it out and provide feedback. We'll post bug-fix versions in the future if needed. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
+
+### Highlights
+- This release includes all features landed in the previous release candidates ([v0.7.1rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.1rc1), [v0.7.3rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc1), [v0.7.3rc2](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc2)), and all of them are fully tested and verified. Visit the official doc to get the detailed [feature](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/user_guide/suppoted_features.html) and [model](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/user_guide/supported_models.html) support matrix.
+- Upgrade CANN to 8.1.RC1 to enable the chunked prefill and automatic prefix caching features. You can now enable them.
+- Upgrade PyTorch to 2.5.1. vLLM Ascend no longer relies on the dev version of torch-npu, so users no longer need to install torch-npu by hand; version 2.5.1 of torch-npu will be installed automatically. [#662](https://github.com/vllm-project/vllm-ascend/pull/662)
+- Integrate MindIE Turbo into vLLM Ascend to improve the performance of DeepSeek V3/R1 and the Qwen2 series. [#708](https://github.com/vllm-project/vllm-ascend/pull/708)
+
+### Core
+- LoRA, Multi-LoRA and dynamic serving are supported now. The performance will be improved in the next release. Please follow the official doc for more usage information. Thanks for the contribution from China Merchants Bank. [#700](https://github.com/vllm-project/vllm-ascend/pull/700)
+
+### Model
+- The performance of Qwen2-VL and Qwen2.5-VL is improved. [#702](https://github.com/vllm-project/vllm-ascend/pull/702)
+- The performance of the `apply_penalties` and `topKtopP` ops is improved. [#525](https://github.com/vllm-project/vllm-ascend/pull/525)
+
+### Other
+- Fixed an issue that could lead to a CPU memory leak. [#691](https://github.com/vllm-project/vllm-ascend/pull/691) [#712](https://github.com/vllm-project/vllm-ascend/pull/712)
+- A new environment variable `SOC_VERSION` is added. If you hit any SoC detection error when building with custom ops enabled, please set `SOC_VERSION` to a suitable value. [#606](https://github.com/vllm-project/vllm-ascend/pull/606)
+- The openEuler container image is now supported with the v0.7.3-openeuler tag. [#665](https://github.com/vllm-project/vllm-ascend/pull/665)
+- The prefix cache feature works on the V1 engine now. [#559](https://github.com/vllm-project/vllm-ascend/pull/559)
+
 ## v0.7.3rc2
 
 This is 2nd release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
````
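The `SOC_VERSION` escape hatch added in the release note amounts to preferring an explicit override before auto-detection. The following is a hypothetical sketch of that pattern, not the actual vllm-ascend build logic; the function name, error message, and the example value are mine:

```python
import os
from typing import Optional

def resolve_soc_version(detected: Optional[str] = None) -> str:
    """Prefer an explicit SOC_VERSION override; else use the detected value."""
    explicit = os.environ.get("SOC_VERSION")
    if explicit:
        return explicit
    if detected is None:
        raise RuntimeError("SoC detection failed; set SOC_VERSION explicitly")
    return detected

os.environ["SOC_VERSION"] = "Ascend910B1"  # example value only
print(resolve_soc_version())  # Ascend910B1
```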

docs/source/user_guide/supported_models.md

Lines changed: 0 additions & 12 deletions

````diff
@@ -12,18 +12,6 @@
 | QwQ-32B |||
 | MiniCPM || |
 | LLama3.1/3.2 |||
-| Mistral | | Need test |
-| DeepSeek v2.5 | |Need test |
-| Gemma-2 | |Need test|
-| Baichuan | |Need test|
 | Internlm |||
-| ChatGLM || Plan in Q2|
 | InternVL2.5 |||
-| GLM-4v | |Need test|
 | Molomo |||
-| LLaVA1.5 | | Need test|
-| Mllama | |Need test|
-| LLaVA-Next | |Need test|
-| LLaVA-Next-Video | |Need test|
-| Phi-3-Vison/Phi-3.5-Vison | |Need test|
-| Ultravox | |Need test|
````

docs/source/user_guide/suppoted_features.md

Lines changed: 10 additions & 5 deletions

````diff
@@ -4,11 +4,7 @@ The feature support principle of vLLM Ascend is: **aligned with the vLLM**. We a
 
 vLLM Ascend offers the overall functional support of the most features in vLLM, and the usage keep the same with vLLM except for some limits.
 
-```{note}
-MindIE Turbo is an optional performace optimization plugin. Find more information about the feature support of MindIE Turbo here(UPDATE_ME_AS_A_LINK).
-```
-
-| Feature | vLLM Ascend | MindIE Turbo | Notes |
+| Feature | vLLM Ascend | vLLM Ascend (+ MindIE Turbo) | Notes |
 |-------------------------------|----------------|-----------------|------------------------------------------------------------------------|
 | V1Engine | 🔵 Experimental| 🔵 Experimental| Will enhance in v0.8.x |
 | Chunked Prefill | 🟢 Functional | 🟢 Functional | / |
@@ -36,3 +32,12 @@
 | Sleep Mode | 🟢 Functional | 🟢 Functional | [Usage Limits][#733](https://github.com/vllm-project/vllm-ascend/issues/733) |
 | MTP | 🟢 Functional | 🟢 Functional | [Usage Limits][#734](https://github.com/vllm-project/vllm-ascend/issues/734) |
 | Custom Scheduler | 🟢 Functional | 🟢 Functional | [Usage Limits][#788](https://github.com/vllm-project/vllm-ascend/issues/788) |
+
+
+*MindIE Turbo is an LLM inference engine acceleration plug-in library on Ascend hardware. Find more information [here](https://www.hiascend.com/document/detail/zh/mindie/20RC1/AcceleratePlugin/turbodev/mindie-turbo-0001.html).*
+
+- 🟢 Functional: Fully operational, with ongoing optimizations.
+
+- 🔵 Experimental: Experimental support; interfaces and functions may change.
+
+- 🟡 Planned: Scheduled for future implementation (some may have open PRs/RFCs).
````
