# Release Notes

## v0.11.0rc1 - 2025.11.04

This is the first release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.11.0-dev) to get started.
v0.11.0 will be the next official release of vLLM Ascend; we'll publish it in the next few days. Any feedback is welcome to help us improve v0.11.0.

### Highlights
- CANN is upgraded to 8.3.RC1 and torch-npu is upgraded to 2.7.1. [#3945](https://github.com/vllm-project/vllm-ascend/pull/3945) [#3896](https://github.com/vllm-project/vllm-ascend/pull/3896)
- Prefix cache and chunked prefill are enabled by default. [#3300](https://github.com/vllm-project/vllm-ascend/pull/3300)
- W4A4 quantization is supported now. [#3427](https://github.com/vllm-project/vllm-ascend/pull/3427)

### Core
- Performance of the Qwen and DeepSeek model series is improved.
- The Mooncake layerwise connector is supported now. [#2602](https://github.com/vllm-project/vllm-ascend/pull/2602)
- MTP > 1 is supported now. [#2708](https://github.com/vllm-project/vllm-ascend/pull/2708)
- [Experimental] Graph mode `FULL_DECODE_ONLY` is supported now, and `FULL` will land in the next few weeks. [#2128](https://github.com/vllm-project/vllm-ascend/pull/2128)
- Pooling models, such as bge-m3, are supported now. [#3171](https://github.com/vllm-project/vllm-ascend/pull/3171)

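As a quick way to try the new pooling support, a sketch like the following should work. This is only an illustration, not an official recipe: the Hugging Face model id `BAAI/bge-m3` and the `--task embed` flag are assumptions that may differ across vLLM versions, so check the official doc linked above for the exact invocation.

```shell
# Launch an OpenAI-compatible embedding (pooling) server.
# BAAI/bge-m3 is an assumed model id; flag names may vary by vLLM version.
vllm serve BAAI/bge-m3 --task embed

# Query the embeddings endpoint once the server is up.
curl http://localhost:8000/v1/embeddings \
  -H "Content-Type: application/json" \
  -d '{"model": "BAAI/bge-m3", "input": "vLLM Ascend release notes"}'
```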
### Other
- The MoE module is refactored to make it clearer and easier to understand, and performance is improved in both quantized and non-quantized scenarios.
- The model registration module is refactored to make it easier to maintain. We'll remove this module in Q4 2025. [#3004](https://github.com/vllm-project/vllm-ascend/pull/3004)
- Torchair is deprecated. We'll remove it once the performance of ACL Graph is good enough, no later than Q1 2026.
- The LLMDatadist KV connector is deprecated. We'll remove it in Q1 2026.
- The linear module is refactored to support the flashcomm1 and flashcomm2 features from the [FlashComm](https://arxiv.org/pdf/2412.04964) paper. [#3004](https://github.com/vllm-project/vllm-ascend/pull/3004) [#3334](https://github.com/vllm-project/vllm-ascend/pull/3334)

### Known issues
- Sleep mode doesn't work as expected. It will be fixed in the next release.
- Memory may leak and the service may get stuck after serving for a long time. This is a bug in torch-npu; we'll upgrade it and fix this soon.
- For long-sequence inputs, there is sometimes no response and KV cache usage keeps growing. This is a scheduler bug; we are working on it.
- The accuracy of Qwen2.5-VL is not very good. This is a bug caused by CANN; we'll fix it soon.

## v0.11.0rc0 - 2025.09.30

This is a special release candidate of v0.11.0 for vLLM Ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to get started.