docs/source/community/governance.md
Requires approval from existing Maintainers. The vLLM community has the final decision-making authority.
Maintainers will be granted write permissions to the [vllm-project/vllm-ascend](https://github.com/vllm-project/vllm-ascend) GitHub repo (`Can read, clone, and push to this repository. Can also manage issues and pull requests`).
## Nominating and Removing Maintainers
### The Principles
- Membership in vLLM Ascend is granted to individuals on a merit basis after they have demonstrated strong expertise in vLLM / vLLM Ascend through contributions, reviews, and discussions.
- For membership in the maintainer group, an individual must demonstrate strong and continued alignment with the overall vLLM / vLLM Ascend principles.
- Maintainers who do not actively participate over a long period of time may be moved to 'emeritus' status under light criteria.
- The membership is for an individual, not a company.
### Nomination and Removal
- Nomination: Anyone can nominate someone to become a maintainer (self-nomination included). All existing maintainers are responsible for evaluating the nomination. The nominator should provide information about the candidate's strengths as a maintainer, including but not limited to review quality, contribution quality, and community involvement.
- Removal: Anyone can nominate a person for removal from the maintainer position (self-nomination included). All existing maintainers are responsible for evaluating the nomination. The nominator should provide information about the nominee, including but not limited to lack of activity, conflict with the overall direction, and other information that makes them unfit to be a maintainer.
This feature is currently experimental. In future versions, there may be behavioral changes around configuration, coverage, and performance.
This guide provides instructions for using Ascend Graph Mode with vLLM Ascend. Please note that graph mode is only available on the V1 Engine, and only the Qwen and DeepSeek series models are well tested in v0.9.0rc1. We'll stabilize and generalize it in the next release.
## Getting Started
Starting from v0.9.0rc1 with the V1 Engine, vLLM Ascend runs models in graph mode by default to keep the same behavior as vLLM. If you hit any issues, please feel free to open an issue on GitHub and fall back to eager mode temporarily by setting `enforce_eager=True` when initializing the model.
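
For example, a minimal fallback sketch might look like the following (it assumes the same Qwen checkpoint used in the examples below; any supported model works):

```python
from vllm import LLM

# Temporarily disable graph mode while debugging by forcing eager execution.
model = LLM(model="Qwen/Qwen2-7B-Instruct", enforce_eager=True)
outputs = model.generate("Hello, how are you?")
```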
There are two kinds of graph mode supported by vLLM Ascend:
- **ACLGraph**: This is the default graph mode supported by vLLM Ascend. In v0.9.0rc1, only Qwen series models are well tested.
- **TorchAirGraph**: This is the GE graph mode. In v0.9.0rc1, only DeepSeek series models are supported.
## Using ACLGraph
ACLGraph is enabled by default. Taking Qwen series models as an example, simply using the V1 Engine is enough.
offline example:
```python
import os

from vllm import LLM

# Environment variable values must be strings; this enables the V1 Engine.
os.environ["VLLM_USE_V1"] = "1"

model = LLM(model="Qwen/Qwen2-7B-Instruct")
outputs = model.generate("Hello, how are you?")
```
online example:
```shell
vllm serve Qwen/Qwen2-7B-Instruct
```
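
Once the server is up, it exposes an OpenAI-compatible API. Below is a minimal querying sketch, assuming the `openai` Python client is installed and the server is listening on the default port 8000:

```python
from openai import OpenAI

# vLLM's OpenAI-compatible server does not require a real API key.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

completion = client.completions.create(
    model="Qwen/Qwen2-7B-Instruct",
    prompt="Hello, how are you?",
    max_tokens=64,
)
print(completion.choices[0].text)
```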
## Using TorchAirGraph
If you want to run DeepSeek series models with graph mode, you should use [TorchAirGraph](https://www.hiascend.com/document/detail/zh/Pytorch/700/modthirdparty/torchairuseguide/torchair_0002.html). In this case, additional configuration is required.
offline example:
```python
import os

from vllm import LLM

# Environment variable values must be strings; this enables the V1 Engine.
os.environ["VLLM_USE_V1"] = "1"

# TorchAirGraph is enabled through additional_config.
model = LLM(model="deepseek-ai/DeepSeek-R1-0528", additional_config={"torchair_graph_config": {"enable": True}})
```
docs/source/user_guide/release_notes.md
# Release note
## v0.9.0rc1 - 2025.06.09
This is the first release candidate of v0.9.0 for vllm-ascend. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/) to start the journey. From this release, the V1 Engine is recommended. The V0 Engine code is frozen and will no longer be maintained. Please set the environment variable `VLLM_USE_V1=1` to enable the V1 Engine.
### Highlights
- DeepSeek works with graph mode now. Follow the [official doc](https://vllm-ascend.readthedocs.io/en/latest/user_guide/graph_mode.html) to give it a try. [#789](https://github.com/vllm-project/vllm-ascend/pull/789)
- Qwen series models work with graph mode now. It is enabled by default with the V1 Engine. Please note that in this release, only Qwen series models are well tested with graph mode. We'll stabilize and generalize it in the next release. If you hit any issues, please feel free to open an issue on GitHub and fall back to eager mode temporarily by setting `enforce_eager=True` when initializing the model.
### Core
- The performance of the multi-step scheduler has been improved. Thanks for the contribution from China Merchants Bank. [#814](https://github.com/vllm-project/vllm-ascend/pull/814)
- LoRA, Multi-LoRA, and Dynamic Serving are supported for the V1 Engine now. Thanks for the contribution from China Merchants Bank. [#893](https://github.com/vllm-project/vllm-ascend/pull/893)
- The prefix cache and chunked prefill features work now. [#782](https://github.com/vllm-project/vllm-ascend/pull/782) [#844](https://github.com/vllm-project/vllm-ascend/pull/844)
- Spec decode and MTP features work with the V1 Engine now. [#874](https://github.com/vllm-project/vllm-ascend/pull/874) [#890](https://github.com/vllm-project/vllm-ascend/pull/890)
- The DP feature works with DeepSeek now. [#1012](https://github.com/vllm-project/vllm-ascend/pull/1012)
- The input embedding feature works with the V0 Engine now. [#916](https://github.com/vllm-project/vllm-ascend/pull/916)
- The sleep mode feature works with the V1 Engine now. [#1084](https://github.com/vllm-project/vllm-ascend/pull/1084)
### Model
- Qwen2.5 VL works with the V1 Engine now. [#736](https://github.com/vllm-project/vllm-ascend/pull/736)
- Llama 4 works now. [#740](https://github.com/vllm-project/vllm-ascend/pull/740)
- A new DeepSeek model variant with dual-batch overlap (DBO) is added. Please set `VLLM_ASCEND_ENABLE_DBO=1` to use it. [#941](https://github.com/vllm-project/vllm-ascend/pull/941)
### Other
- Online serving with Ascend quantization works now. [#877](https://github.com/vllm-project/vllm-ascend/pull/877)
- A batch of bugs for graph mode and MoE models have been fixed. [#773](https://github.com/vllm-project/vllm-ascend/pull/773) [#771](https://github.com/vllm-project/vllm-ascend/pull/771) [#774](https://github.com/vllm-project/vllm-ascend/pull/774) [#816](https://github.com/vllm-project/vllm-ascend/pull/816) [#817](https://github.com/vllm-project/vllm-ascend/pull/817) [#819](https://github.com/vllm-project/vllm-ascend/pull/819) [#912](https://github.com/vllm-project/vllm-ascend/pull/912) [#897](https://github.com/vllm-project/vllm-ascend/pull/897) [#961](https://github.com/vllm-project/vllm-ascend/pull/961) [#958](https://github.com/vllm-project/vllm-ascend/pull/958) [#913](https://github.com/vllm-project/vllm-ascend/pull/913) [#905](https://github.com/vllm-project/vllm-ascend/pull/905)
- A batch of performance improvement PRs have been merged. [#784](https://github.com/vllm-project/vllm-ascend/pull/784) [#803](https://github.com/vllm-project/vllm-ascend/pull/803) [#966](https://github.com/vllm-project/vllm-ascend/pull/966) [#839](https://github.com/vllm-project/vllm-ascend/pull/839) [#970](https://github.com/vllm-project/vllm-ascend/pull/970) [#947](https://github.com/vllm-project/vllm-ascend/pull/947) [#987](https://github.com/vllm-project/vllm-ascend/pull/987) [#1085](https://github.com/vllm-project/vllm-ascend/pull/1085)
- From this release, a binary wheel package is released as well. [#775](https://github.com/vllm-project/vllm-ascend/pull/775)
- The contributor doc site has been [added](https://vllm-ascend.readthedocs.io/en/latest/community/contributors.html).
### Known Issue
- In some cases, the vLLM process may crash with ACLGraph enabled. We're working on this issue and it'll be fixed in the next release.
- Multi-node data parallel doesn't work with this release. This is a known issue in vLLM and has been fixed on the main branch. [#18981](https://github.com/vllm-project/vllm/pull/18981)
## v0.7.3.post1 - 2025.05.29
This is the first post release of 0.7.3. Please follow the [official doc](https://vllm-ascend.readthedocs.io/en/v0.7.3-dev) to start the journey. It includes the following changes: