Commit 9c2b88c

[Doc] Add release note for v0.11.0rc1

Signed-off-by: wangxiyuan <wangxiyuan1007@gmail.com>
1 parent 892f1ee commit 9c2b88c

File tree

1 file changed: +30 −0 lines changed
docs/source/user_guide/release_notes.md

Lines changed: 30 additions & 0 deletions
@@ -1,5 +1,35 @@
# Release Notes

## v0.11.0rc1 - 2025.11.04
This is the first release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.11.0-dev) to get started.
v0.11.0 will be the next official release version of vLLM Ascend. We'll release it in the next few days. Any feedback is welcome to help us improve v0.11.0.
### Highlights
- CANN is upgraded to 8.3.RC1, and torch-npu is upgraded to 2.7.1. [#3945](https://github.com/vllm-project/vllm-ascend/pull/3945) [#3896](https://github.com/vllm-project/vllm-ascend/pull/3896)
- PrefixCache and Chunked Prefill are enabled by default. ~~[#3300](https://github.com/vllm-project/vllm-ascend/pull/3300)~~
- W4A4 quantization is supported now. [#3427](https://github.com/vllm-project/vllm-ascend/pull/3427)
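For illustration only, a minimal serving sketch for W4A4 weights. This assumes the model was already quantized offline and that vLLM Ascend's `--quantization ascend` backend applies; the model path is a placeholder, so check the vLLM Ascend quantization guide for your version before relying on it:

```shell
# Hedged sketch, not an official recipe: serve weights that were quantized
# offline (e.g. to W4A4) through vLLM Ascend's quantization backend.
# `/path/to/w4a4-model` is a placeholder path, not a real checkpoint.
vllm serve /path/to/w4a4-model --quantization ascend
```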
### Core
- Performance of the Qwen and DeepSeek series models is improved.
- The Mooncake layerwise connector is supported now. [#2602](https://github.com/vllm-project/vllm-ascend/pull/2602)
- MTP > 1 is supported now. [#2708](https://github.com/vllm-project/vllm-ascend/pull/2708)
- [Experimental] Graph mode `FULL_DECODE_ONLY` is supported now! `FULL` will land in the next few weeks. [#2128](https://github.com/vllm-project/vllm-ascend/pull/2128)
- Pooling models, such as bge-m3, are supported now. [#3171](https://github.com/vllm-project/vllm-ascend/pull/3171)
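As a rough illustration of the experimental graph mode above: in vLLM's CLI the graph mode is typically selected through the compilation config. The model name below is only a placeholder, and the exact flag shape should be verified against the vLLM Ascend docs for your version:

```shell
# Hedged sketch: opt in to the experimental FULL_DECODE_ONLY ACL graph mode
# via vLLM's compilation config. Qwen/Qwen3-8B is a placeholder model name.
vllm serve Qwen/Qwen3-8B \
    --compilation-config '{"cudagraph_mode": "FULL_DECODE_ONLY"}'
```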
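A minimal pooling sketch, assuming vLLM's standard embedding task CLI and its OpenAI-compatible `/v1/embeddings` endpoint; the model name, port, and input text are illustrative:

```shell
# Hedged sketch: serve bge-m3 as an embedding (pooling) model, then query
# the OpenAI-compatible embeddings endpoint on the default port 8000.
vllm serve BAAI/bge-m3 --task embed

curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "input": "vLLM Ascend supports pooling models."}'
```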
### Other
- Refactored the MoE module to make it clearer and easier to understand; performance is improved in both quantized and non-quantized scenarios.
- Refactored the model register module to make it easier to maintain. We'll remove this module in Q4 2025. [#3004](https://github.com/vllm-project/vllm-ascend/pull/3004)
- Torchair is deprecated. We'll remove it once the performance of ACL Graph is good enough, no later than Q1 2026.
- LLMDatadist KV Connector is deprecated. We'll remove it in Q1 2026.
- Refactored the linear module to support the flashcomm1 and flashcomm2 features from the [flashcomm](https://arxiv.org/pdf/2412.04964) paper. [#3004](https://github.com/vllm-project/vllm-ascend/pull/3004) [#3334](https://github.com/vllm-project/vllm-ascend/pull/3334)
### Known issues
- Sleep mode doesn't work as expected. It will be fixed in the next release.
- Memory may leak and the service may hang after serving for a long time. This is a torch-npu bug; we'll upgrade torch-npu to fix it soon.
- For long-sequence inputs, the service sometimes returns no response and KV cache usage keeps growing. This is a scheduler bug; we are working on it.
- The accuracy of Qwen2.5-VL is degraded. This is caused by a CANN bug; we'll fix it soon.
## v0.11.0rc0 - 2025.09.30

This is the special release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.
