
Commit 0b5e300

move variable to additional config
Signed-off-by: chenwaner <861645847@qq.com>
Parent: 8740191

4 files changed: +20 -20 lines changed

docs/source/user_guide/additional_config.md

Lines changed: 17 additions & 15 deletions
@@ -24,28 +24,29 @@ LLM(model="Qwen/Qwen3-8B", additional_config={"config_key":"config_value"})
 
 The following table lists the additional configuration options available in vLLM Ascend:
 
-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
-| `ascend_scheduler_config` | dict | `{}` | The config options for ascend scheduler |
-| `expert_tensor_parallel_size` | str | `1` | Expert tensor parallel size the model to use. |
+| Name                          | Type | Default | Description                                    |
+| ----------------------------- | ---- | ------- | ---------------------------------------------- |
+| `torchair_graph_config`       | dict | `{}`    | The config options for torchair graph mode     |
+| `ascend_scheduler_config`     | dict | `{}`    | The config options for ascend scheduler        |
+| `expert_tensor_parallel_size` | str  | `1`     | Expert tensor parallel size the model to use.  |
 
 The details of each config option are as follows:
 
 **torchair_graph_config**
 
-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `enabled` | bool | `False` | Whether to enable torchair graph mode |
-| `use_cached_graph` | bool | `False` | Whether to use cached graph |
-| `graph_batch_sizes` | list[int] | `[]` | The batch size for torchair graph cache |
-| `graph_batch_sizes_init` | bool | `False` | Init graph batch size dynamically if `graph_batch_sizes` is empty |
+| Name                     | Type      | Default | Description                                                        |
+| ------------------------ | --------- | ------- | ------------------------------------------------------------------ |
+| `enabled`                | bool      | `False` | Whether to enable torchair graph mode                              |
+| `use_cached_graph`       | bool      | `False` | Whether to use cached graph                                        |
+| `graph_batch_sizes`      | list[int] | `[]`    | The batch size for torchair graph cache                            |
+| `graph_batch_sizes_init` | bool      | `False` | Init graph batch size dynamically if `graph_batch_sizes` is empty  |
+| `enable_kv_nz`           | bool      | `False` | Whether to enable kvcache NZ layout                                |
 
 **ascend_scheduler_config**
 
-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine|
+| Name      | Type | Default | Description                                      |
+| --------- | ---- | ------- | ------------------------------------------------ |
+| `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine |
 
 ascend_scheduler_config also support the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `chunked_prefill_enabled: true` to ascend_scheduler_config as well.
 
@@ -59,7 +60,8 @@ A full example of additional configuration is as follows:
         "enabled": true,
         "use_cached_graph": true,
         "graph_batch_sizes": [1, 2, 4, 8],
-        "graph_batch_sizes_init": true
+        "graph_batch_sizes_init": true,
+        "enable_kv_nz": false
     },
     "ascend_scheduler_config": {
         "enabled": true,

vllm_ascend/ascend_config.py

Lines changed: 2 additions & 0 deletions
@@ -55,6 +55,8 @@ def __init__(self, torchair_graph_config):
             "graph_batch_sizes_init", False)
         self.enable_multistream_shared_expert = torchair_graph_config.get(
             "enable_multistream_shared_expert", False)
+        self.enable_kv_nz = torchair_graph_config.get(
+            "enable_kv_nz", False)
 
         if not isinstance(self.graph_batch_sizes, list):
            raise TypeError("graph_batch_sizes must be list[int]")
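The added lines are plain dict lookup with a safe default, so an absent "enable_kv_nz" key silently falls back to False. A self-contained sketch of the same pattern; the class name here is hypothetical, only the .get() calls mirror the diff:

    class TorchairGraphConfigSketch:
        """Hypothetical stand-in for the config object built in ascend_config.py."""

        def __init__(self, torchair_graph_config: dict):
            # Missing keys fall back to a conservative default.
            self.enabled = torchair_graph_config.get("enabled", False)
            self.enable_kv_nz = torchair_graph_config.get("enable_kv_nz", False)

    cfg = TorchairGraphConfigSketch({"enable_kv_nz": True})
    assert cfg.enable_kv_nz and not cfg.enabled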

vllm_ascend/attention/mla_v1.py

Lines changed: 1 addition & 2 deletions
@@ -13,7 +13,6 @@
1313

1414
from vllm_ascend.ascend_config import get_ascend_config
1515
from vllm_ascend.attention.attention_v1 import AscendAttentionState
16-
import vllm_ascend.envs as envs_ascend
1716
from vllm_ascend.ops.attention import vanilla_chunked_prefill_mla
1817

1918
if TYPE_CHECKING:
@@ -444,9 +443,9 @@ def __init__(
444443
self.kv_a_proj_with_mqa = kwargs.get('kv_a_proj_with_mqa', None)
445444
self.kv_a_layernorm = kwargs.get('kv_a_layernorm', None)
446445

447-
self.enable_kv_nz = envs_ascend.VLLM_ENABLE_KV_NZ
448446
ascend_config = get_ascend_config()
449447
self.torchair_graph_enabled = ascend_config.torchair_graph_config.enabled
448+
self.enable_kv_nz = ascend_config.torchair_graph_config.enable_kv_nz
450449

451450
def _v_up_proj_and_o_proj(self, x):
452451
# Convert from (B, N, L) to (N, B, L)
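Net effect of this hunk: the switch is read per engine from the Ascend config rather than once per process from the environment. A hedged before/after sketch; get_ascend_config is the accessor shown in the diff, and the env read is how the removed path worked:

    import os

    # Before: a process-wide environment variable, parsed when envs.py was read.
    enable_kv_nz = bool(int(os.getenv("VLLM_ENABLE_KV_NZ", "0")))

    # After: a per-engine setting carried by additional_config, e.g.
    #   ascend_config = get_ascend_config()
    #   enable_kv_nz = ascend_config.torchair_graph_config.enable_kv_nz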

vllm_ascend/envs.py

Lines changed: 0 additions & 3 deletions
@@ -55,9 +55,6 @@
5555
# Find more detail here: https://www.hiascend.com/document/detail/zh/canncommercial/81RC1/developmentguide/opdevg/ascendcbestP/atlas_ascendc_best_practices_10_0043.html
5656
"VLLM_ENABLE_MC2":
5757
lambda: bool(int(os.getenv("VLLM_ENABLE_MC2", '0'))),
58-
# Whether to enable the kvcache nz optimization, the default value is False.
59-
"VLLM_ENABLE_KV_NZ":
60-
lambda: bool(int(os.getenv("VLLM_ENABLE_KV_NZ", '0'))),
6158
# Whether to enable the topk optimization. It's disabled by default for experimental support
6259
# We'll make it enabled by default in the future.
6360
"VLLM_ASCEND_ENABLE_TOPK_OPTIMZE":
