4 changes: 2 additions & 2 deletions docs/source/conf.py
@@ -67,10 +67,10 @@
# the branch of vllm-ascend, used in vllm-ascend clone and image tag
# - main branch: 'main'
# - vX.Y.Z branch: latest vllm-ascend release tag
- 'vllm_ascend_version': 'v0.7.3rc2',
+ 'vllm_ascend_version': 'v0.7.3',
# the newest release version of vllm-ascend and matched vLLM, used in pip install.
# This value should be updated when a new release is cut.
- 'pip_vllm_ascend_version': "0.7.3rc2",
+ 'pip_vllm_ascend_version': "0.7.3",
'pip_vllm_version': "0.7.3",
# The matching MindIE Turbo for vLLM Ascend
'pip_mindie_turbo_version': "2.0rc1",
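These keys drive the version strings rendered throughout the docs. For instance, the MindIE Turbo pin above would translate into an install command roughly like the following (a sketch; the PyPI package name is an assumption here):

```bash
# A sketch of installing the MindIE Turbo build named by
# 'pip_mindie_turbo_version' above; the package name is an assumption.
pip install mindie-turbo==2.0rc1
```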
102 changes: 0 additions & 102 deletions docs/source/developer_guide/contributing.zh.md

This file was deleted.

13 changes: 7 additions & 6 deletions docs/source/developer_guide/versioning_policy.md
@@ -65,19 +65,20 @@ As shown above:

The following is the release compatibility matrix for the vLLM Ascend plugin:

- | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu |
- |--------------|--------------| --- | --- | --- |
- | v0.7.3rc2 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 |
- | v0.7.3rc1 | v0.7.3 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 |
- | v0.7.1rc1 | v0.7.1 | 3.9 - 3.12 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 |
+ | vllm-ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
+ |--------------|--------------| --- | --- | --- | --- |
+ | v0.7.3 | v0.7.3 | 3.9 - 3.11 | 8.1.0 | 2.5.1 / 2.5.1 | 2.0rc1 |
+ | v0.7.3rc2 | v0.7.3 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250320 | / |
+ | v0.7.3rc1 | v0.7.3 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250308 | / |
+ | v0.7.1rc1 | v0.7.1 | 3.9 - 3.11 | 8.0.0 | 2.5.1 / 2.5.1.dev20250218 | / |
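A quick way to check a running environment against the v0.7.3 row of this matrix is sketched below (it assumes torch_npu exposes `__version__` the same way torch does):

```bash
# Verify the Python, torch, and torch_npu versions expected by the
# v0.7.3 row of the matrix above.
python --version                                              # expect 3.9 - 3.11
python -c "import torch; print(torch.__version__)"            # expect 2.5.1
python -c "import torch_npu; print(torch_npu.__version__)"    # expect 2.5.1
```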

## Release cadence

### Release window

| Date | Event |
|------------|-------------------------------------------|
- | 2025.04.30 | Final release, v0.7.3(The official release rely on the release of torch-npu and CANN8.1, so it's delayed)|
+ | 2025.05.08 | Final release, v0.7.3 |
| 2025.04.17 | Release candidates, v0.8.4rc1 |
| 2025.03.28 | Release candidates, v0.7.3rc2 |
| 2025.03.14 | Release candidates, v0.7.3rc1 |
79 changes: 0 additions & 79 deletions docs/source/developer_guide/versioning_policy.zh.md

This file was deleted.

7 changes: 4 additions & 3 deletions docs/source/installation.md
@@ -7,6 +7,7 @@ This document describes how to install vllm-ascend manually.
- OS: Linux
- Python: >= 3.9, < 3.12
- Hardware with an Ascend NPU, usually the Atlas 800 A2 series.
+ - Firmware: Ascend HDK >= 24.1RC1
- Software:

| Software | Supported version | Note |
@@ -119,7 +120,7 @@ First install system dependencies:

```bash
apt update -y
- apt install -y gcc g++ libnuma-dev
+ apt install -y gcc g++ libnuma-dev git
```

You can install `vllm` and `vllm-ascend` from **pre-built wheel**:
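The exact commands are elided in this diff; a minimal sketch of the pre-built-wheel route, assuming the default package indexes, is:

```bash
# A sketch of the pre-built wheel route; the original doc's exact
# commands (index URLs, extras) are elided in this diff.
pip install vllm==0.7.3
pip install vllm-ascend==0.7.3
```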
@@ -269,5 +270,5 @@ Prompt: 'The future of AI is', Generated text: ' not bright\n\nThere is no doubt

To get more performance gains, optimize Python and torch-npu with the Bisheng compiler by following these official tutorials:

- [Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
- [Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
+ - [Optimizing Python with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0063.html)
+ - [Optimizing torch-npu with Bisheng](https://www.hiascend.com/document/detail/zh/Pytorch/600/ptmoddevg/trainingmigrguide/performance_tuning_0058.html)
25 changes: 25 additions & 0 deletions docs/source/user_guide/release_notes.md
@@ -1,5 +1,30 @@
# Release note

## v0.7.3

🎉 Hello, World!

We are excited to announce the release of vllm-ascend 0.7.3. This is the first official release. The functionality, performance, and stability of this release have been fully tested and verified. We encourage you to try it out and provide feedback. We'll publish bugfix releases in the future if needed. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.

### Highlights
- This release includes all features landed in the previous release candidates ([v0.7.1rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.1rc1), [v0.7.3rc1](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc1), [v0.7.3rc2](https://github.com/vllm-project/vllm-ascend/releases/tag/v0.7.3rc2)), and all of them are fully tested and verified. Visit the official doc for the detailed [feature](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/user_guide/suppoted_features.html) and [model](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev/user_guide/supported_models.html) support matrices.
- CANN is upgraded to 8.1.RC1, which enables the chunked prefill and automatic prefix caching features; you can enable them now (see the sketch after this list).
- PyTorch is upgraded to 2.5.1. vLLM Ascend no longer relies on a dev version of torch-npu, so users don't need to install torch-npu by hand; torch-npu 2.5.1 is installed automatically. [#662](https://github.com/vllm-project/vllm-ascend/pull/662)
- MindIE Turbo is integrated into vLLM Ascend to improve the performance of DeepSeek V3/R1 and the Qwen 2 series. [#708](https://github.com/vllm-project/vllm-ascend/pull/708)
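As a sketch of the highlighted chunked prefill and prefix caching features, both can be switched on via vLLM's standard engine flags (the model name is just an example placeholder):

```bash
# Enable chunked prefill and automatic prefix caching on an
# OpenAI-compatible server; the model is an example placeholder.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-chunked-prefill \
    --enable-prefix-caching
```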

### Core
- LoRA, Multi-LoRA, and dynamic serving are now supported (a sketch follows below); performance will be improved in the next release. Please follow the official doc for more usage information. Thanks to China Merchants Bank for the contribution. [#700](https://github.com/vllm-project/vllm-ascend/pull/700)
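A minimal sketch of LoRA serving with vLLM's standard flags (the adapter name and path are hypothetical placeholders):

```bash
# Serve a base model with a LoRA adapter attached; "my-adapter" and
# its path are hypothetical placeholders.
vllm serve Qwen/Qwen2.5-7B-Instruct \
    --enable-lora \
    --lora-modules my-adapter=/path/to/lora/adapter
```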

### Model
- The performance of Qwen2-VL and Qwen2.5-VL is improved. [#702](https://github.com/vllm-project/vllm-ascend/pull/702)
- The performance of the `apply_penalties` and `topKtopP` ops is improved. [#525](https://github.com/vllm-project/vllm-ascend/pull/525)

### Other
- Fixed an issue that could lead to a CPU memory leak. [#691](https://github.com/vllm-project/vllm-ascend/pull/691) [#712](https://github.com/vllm-project/vllm-ascend/pull/712)
- A new environment variable, `SOC_VERSION`, is added. If you hit a SoC detection error when building with custom ops enabled, set `SOC_VERSION` to a suitable value (see the sketch after this list). [#606](https://github.com/vllm-project/vllm-ascend/pull/606)
- An openEuler container image is now supported with the v0.7.3-openeuler tag. [#665](https://github.com/vllm-project/vllm-ascend/pull/665)
- The prefix cache feature now works on the V1 engine. [#559](https://github.com/vllm-project/vllm-ascend/pull/559)
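Minimal sketches for the build, image, and V1-engine items above; the SoC value, registry path, and model name are assumptions to adjust for your environment:

```bash
# Build with custom ops when SoC auto-detection fails; the value is an
# example, use the one matching your NPU.
SOC_VERSION=Ascend910B1 pip install -v -e .

# Pull the openEuler-based image; the registry path is an assumption.
docker pull quay.io/ascend/vllm-ascend:v0.7.3-openeuler

# Try the prefix cache feature on the V1 engine.
VLLM_USE_V1=1 vllm serve Qwen/Qwen2.5-7B-Instruct --enable-prefix-caching
```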

## v0.7.3rc2

This is the 2nd release candidate of v0.7.3 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey.
12 changes: 0 additions & 12 deletions docs/source/user_guide/supported_models.md
@@ -12,18 +12,6 @@
| QwQ-32B | ✅ ||
| MiniCPM |✅| |
| LLama3.1/3.2 | ✅ ||
- | Mistral | | Need test |
- | DeepSeek v2.5 | |Need test |
- | Gemma-2 | |Need test|
- | Baichuan | |Need test|
| Internlm | ✅ ||
- | ChatGLM | ❌ | Plan in Q2|
| InternVL2.5 | ✅ ||
- | GLM-4v | |Need test|
| Molomo | ✅ ||
- | LLaVA1.5 | | Need test|
- | Mllama | |Need test|
- | LLaVA-Next | |Need test|
- | LLaVA-Next-Video | |Need test|
- | Phi-3-Vison/Phi-3.5-Vison | |Need test|
- | Ultravox | |Need test|