2 changes: 1 addition & 1 deletion README.md
@@ -52,7 +52,7 @@ Please use the following recommended versions to get started quickly:

| Version | Release type | Doc |
|------------|--------------|--------------------------------------|
-|v0.11.0rc0|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
+|v0.11.0rc1|Latest release candidate|[QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
|v0.9.1|Latest stable version|[QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|

## Contributing
2 changes: 1 addition & 1 deletion README.zh.md
@@ -53,7 +53,7 @@ The vLLM Ascend plugin (`vllm-ascend`) is a community-maintained plugin that lets vLLM run on Ascend NP

| Version | Release type | Doc |
|------------|--------------|--------------------------------------|
-|v0.11.0rc0| Latest RC version |See [QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
+|v0.11.0rc1| Latest RC version |See [QuickStart](https://vllm-ascend.readthedocs.io/en/latest/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/latest/installation.html) for more details|
|v0.9.1| Latest official/stable version |See [QuickStart](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/quick_start.html) and [Installation](https://vllm-ascend.readthedocs.io/en/v0.9.1-dev/installation.html) for more details|

## Contributing
2 changes: 2 additions & 0 deletions docs/source/community/versioning_policy.md
@@ -22,6 +22,7 @@ The table below is the release compatibility matrix for vLLM Ascend release.

| vLLM Ascend | vLLM | Python | Stable CANN | PyTorch/torch_npu | MindIE Turbo |
|-------------|--------------|------------------|-------------|--------------------|--------------|
+| v0.11.0rc1 | v0.11.0 | >= 3.9, < 3.12 | 8.3.RC1 | 2.7.1 / 2.7.1 | |
| v0.11.0rc0 | v0.11.0rc3 | >= 3.9, < 3.12 | 8.2.RC1 | 2.7.1 / 2.7.1.dev20250724 | |
| v0.10.2rc1 | v0.10.2 | >= 3.9, < 3.12 | 8.2.RC1 | 2.7.1 / 2.7.1.dev20250724 | |
| v0.10.1rc1 | v0.10.1/v0.10.1.1 | >= 3.9, < 3.12 | 8.2.RC1 | 2.7.1 / 2.7.1.dev20250724 | |
@@ -49,6 +50,7 @@ For main branch of vLLM Ascend, we usually make it compatible with the latest vL

| Date | Event |
|------------|-------------------------------------------|
+| 2025.11.10 | Release candidates, v0.11.0rc1 |
| 2025.09.30 | Release candidates, v0.11.0rc0 |
| 2025.09.16 | Release candidates, v0.10.2rc1 |
| 2025.09.04 | Release candidates, v0.10.1rc1 |
6 changes: 3 additions & 3 deletions docs/source/conf.py
@@ -65,14 +65,14 @@
# the branch of vllm, used in vllm clone
# - main branch: 'main'
# - vX.Y.Z branch: 'vX.Y.Z'
-'vllm_version': 'v0.11.0rc3',
+'vllm_version': 'v0.11.0',
# the branch of vllm-ascend, used in vllm-ascend clone and image tag
# - main branch: 'main'
# - vX.Y.Z branch: latest vllm-ascend release tag
-'vllm_ascend_version': 'v0.11.0rc0',
+'vllm_ascend_version': 'v0.11.0rc1',
# the newest release version of vllm-ascend and matched vLLM, used in pip install.
# This value should be updated when cut down release.
-'pip_vllm_ascend_version': "0.11.0rc0",
+'pip_vllm_ascend_version': "0.11.0rc1",
'pip_vllm_version': "0.11.0",
# CANN image tag
'cann_image_tag': "8.3.rc1-910b-ubuntu22.04-py3.11",
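For reference, a minimal sketch of the install commands these `pip_*` values feed into, assuming these exact versions are published on PyPI:

```bash
# Sketch only: versions are taken from pip_vllm_version / pip_vllm_ascend_version above.
pip install vllm==0.11.0
pip install vllm-ascend==0.11.0rc1
```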
2 changes: 1 addition & 1 deletion docs/source/faqs.md
@@ -3,7 +3,7 @@
## Version Specific FAQs

- [[v0.9.1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/2643)
-- [[v0.11.0rc0] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/3222)
+- [[v0.11.0rc1] FAQ & Feedback](https://github.com/vllm-project/vllm-ascend/issues/3222)
> **Collaborator:** Please update the issue number.


## General FAQs

32 changes: 32 additions & 0 deletions docs/source/user_guide/release_notes.md
@@ -1,5 +1,37 @@
# Release Notes

## v0.11.0rc1 - 2025.11.10

This is the first release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.11.0-dev) to get started.
v0.11.0 will be the next official release of vLLM Ascend; we'll publish it in the next few days. Any feedback is welcome to help us improve v0.11.0.

### Highlights
- CANN is upgraded to 8.3.RC1, and torch-npu is upgraded to 2.7.1. [#3945](https://github.com/vllm-project/vllm-ascend/pull/3945) [#3896](https://github.com/vllm-project/vllm-ascend/pull/3896)
- PrefixCache and Chunked Prefill are enabled by default; see the opt-out sketch after this list. [#3967](https://github.com/vllm-project/vllm-ascend/pull/3967)
- W4A4 quantization is supported now. [#3427](https://github.com/vllm-project/vllm-ascend/pull/3427) The official tutorial is available [here](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/single_npu_qwen3_w4a4.html).
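Since both features are now on by default, here is a minimal opt-out sketch for debugging, assuming the standard vLLM engine flags behave the same on Ascend (the model name is a placeholder):

```bash
# Hypothetical opt-out: disable the new defaults to compare behavior.
vllm serve Qwen/Qwen3-8B \
  --no-enable-prefix-caching \
  --no-enable-chunked-prefill
```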

### Core
- Performance of the Qwen3 and DeepSeek V3 series models is improved.
- The Mooncake layerwise connector is supported now [#2602](https://github.com/vllm-project/vllm-ascend/pull/2602). Find the tutorial [here](https://docs.vllm.ai/projects/ascend/en/latest/tutorials/multi_node_pd_disaggregation_mooncake.html).
- MTP > 1 is supported now. [#2708](https://github.com/vllm-project/vllm-ascend/pull/2708)
- [Experimental] Graph mode `FULL_DECODE_ONLY` is supported now, and `FULL` will land in the next few weeks; see the sketch after this list. [#2128](https://github.com/vllm-project/vllm-ascend/pull/2128)
- Pooling models, such as bge-m3, are supported now. [#3171](https://github.com/vllm-project/vllm-ascend/pull/3171)
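As a hedged illustration of the experimental graph mode above: the mode name matches vLLM's `CompilationConfig.cudagraph_mode` field, and the model name is a placeholder, not a tested configuration:

```bash
# Sketch: request the experimental FULL_DECODE_ONLY graph mode
# through vLLM's compilation config (JSON passed on the CLI).
vllm serve Qwen/Qwen3-8B \
  --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}'
```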

### Other
- Refactored the MoE module to make it clearer and easier to understand; performance has improved in both quantized and non-quantized scenarios.
- Refactored the model registration module to make it easier to maintain. We'll remove this module in Q4 2025. [#3004](https://github.com/vllm-project/vllm-ascend/pull/3004)
- Torchair is deprecated. We'll remove it once the performance of ACL Graph is good enough; the deadline is Q1 2026.
- The LLMDatadist KV Connector is deprecated. We'll remove it in Q1 2026.
- Refactored the linear module to support the flashcomm1 and flashcomm2 features from the [FlashComm](https://arxiv.org/pdf/2412.04964) paper. [#3004](https://github.com/vllm-project/vllm-ascend/pull/3004) [#3334](https://github.com/vllm-project/vllm-ascend/pull/3334)

### Known issues
> **Collaborator:** Each known issue should have a related issue link.

- Memory may leak and the service may get stuck after serving for a long time. This is a torch-npu bug; we'll upgrade torch-npu to fix it soon.
- The accuracy of Qwen2.5-VL is degraded. This is caused by a CANN bug; we'll fix it soon.
- For long-sequence inputs, there is sometimes no response and KV cache usage keeps growing. This is a scheduler bug; we are working on it.
- Qwen2-Audio doesn't work by default; we're fixing it. A temporary workaround is to set `--gpu-memory-utilization` to a suitable value, such as 0.8.
- When running Qwen3-Next with expert parallelism enabled, set the `HCCL_BUFFSIZE` environment variable to a suitable value, such as 1024.
- The accuracy of DeepSeek3.2 with ACL Graph is incorrect. A temporary workaround is to set `cudagraph_capture_sizes` to a suitable value based on the input batch size. A combined sketch of these workarounds follows this list.
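A minimal sketch combining the workarounds above; model names and capture sizes are illustrative assumptions, not tested configurations:

```bash
# Qwen2-Audio: cap NPU memory utilization (value is illustrative).
vllm serve Qwen/Qwen2-Audio-7B-Instruct --gpu-memory-utilization 0.8

# Qwen3-Next with expert parallelism: enlarge the HCCL buffer (MB).
export HCCL_BUFFSIZE=1024

# DeepSeek3.2 with ACL Graph: pin capture sizes near the expected batch sizes
# (hypothetical model ID and sizes).
vllm serve deepseek-ai/DeepSeek-V3.2-Exp \
  --compilation-config '{"cudagraph_capture_sizes": [1, 8, 16]}'
```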

## v0.11.0rc0 - 2025.09.30

This is the special release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.