@@ -24,28 +24,29 @@ LLM(model="Qwen/Qwen3-8B", additional_config={"config_key":"config_value"})
 
 The following table lists the additional configuration options available in vLLM Ascend:
 
-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `torchair_graph_config` | dict | `{}` | The config options for torchair graph mode |
-| `ascend_scheduler_config` | dict | `{}` | The config options for ascend scheduler |
-| `expert_tensor_parallel_size` | str | `1` | Expert tensor parallel size the model to use. |
+| Name                          | Type | Default | Description                                   |
+| ----------------------------- | ---- | ------- | --------------------------------------------- |
+| `torchair_graph_config`       | dict | `{}`    | The config options for torchair graph mode    |
+| `ascend_scheduler_config`     | dict | `{}`    | The config options for ascend scheduler       |
+| `expert_tensor_parallel_size` | str  | `1`     | Expert tensor parallel size the model to use. |
 
 The details of each config option are as follows:
 
 **torchair_graph_config**
 
-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `enabled` | bool | `False` | Whether to enable torchair graph mode |
-| `use_cached_graph` | bool | `False` | Whether to use cached graph |
-| `graph_batch_sizes` | list[int] | `[]` | The batch size for torchair graph cache |
-| `graph_batch_sizes_init` | bool | `False` | Init graph batch size dynamically if `graph_batch_sizes` is empty |
+| Name                     | Type      | Default | Description                                                       |
+| ------------------------ | --------- | ------- | ----------------------------------------------------------------- |
+| `enabled`                | bool      | `False` | Whether to enable torchair graph mode                             |
+| `use_cached_graph`       | bool      | `False` | Whether to use cached graph                                       |
+| `graph_batch_sizes`      | list[int] | `[]`    | The batch size for torchair graph cache                           |
+| `graph_batch_sizes_init` | bool      | `False` | Init graph batch size dynamically if `graph_batch_sizes` is empty |
+| `enable_kv_nz`           | bool      | `False` | Whether to enable kvcache NZ layout                               |
 
 **ascend_scheduler_config**
 
-| Name | Type | Default | Description |
-| ---- | ---- | ------- | ----------- |
-| `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine|
+| Name      | Type | Default | Description                                      |
+| --------- | ---- | ------- | ------------------------------------------------ |
+| `enabled` | bool | `False` | Whether to enable ascend scheduler for V1 engine |
 
 ascend_scheduler_config also supports the options from [vllm scheduler config](https://docs.vllm.ai/en/stable/api/vllm/config.html#vllm.config.SchedulerConfig). For example, you can add `chunked_prefill_enabled: true` to ascend_scheduler_config as well.
 
@@ -59,7 +60,8 @@ A full example of additional configuration is as follows:
         "enabled": true,
         "use_cached_graph": true,
         "graph_batch_sizes": [1, 2, 4, 8],
-        "graph_batch_sizes_init": true
+        "graph_batch_sizes_init": true,
+        "enable_kv_nz": false
     },
     "ascend_scheduler_config": {
         "enabled": true,
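
For reference, here is a minimal Python sketch of how the options touched by this diff might be assembled and sanity-checked before being passed as `additional_config`. The helper `build_additional_config` and the `TORCHAIR_GRAPH_OPTIONS` mapping are hypothetical names introduced only for illustration; they are not part of vLLM or vLLM Ascend. The option names and types are taken from the `torchair_graph_config` table in the diff, and the checks are an assumption about what a reasonable validation could look like, not the library's own behavior.

```python
# Hypothetical helper, not part of vLLM or vLLM Ascend.
# Option names and types are copied from the torchair_graph_config table in the diff.
TORCHAIR_GRAPH_OPTIONS = {
    "enabled": bool,
    "use_cached_graph": bool,
    "graph_batch_sizes": list,
    "graph_batch_sizes_init": bool,
    "enable_kv_nz": bool,
}


def build_additional_config(torchair_graph_config=None, ascend_scheduler_config=None):
    """Return a dict shaped like the documented additional_config."""
    torchair = dict(torchair_graph_config or {})
    for key, value in torchair.items():
        expected = TORCHAIR_GRAPH_OPTIONS.get(key)
        if expected is None:
            raise ValueError(f"unknown torchair_graph_config option: {key!r}")
        if not isinstance(value, expected):
            raise TypeError(
                f"{key!r} expects {expected.__name__}, got {type(value).__name__}"
            )
    # list[int] in the table: reject bools, which Python treats as ints.
    sizes = torchair.get("graph_batch_sizes", [])
    if not all(isinstance(s, int) and not isinstance(s, bool) for s in sizes):
        raise TypeError("'graph_batch_sizes' expects a list of ints")
    # ascend_scheduler_config is passed through unchecked, because the docs say
    # it also accepts options from the vLLM scheduler config.
    scheduler = dict(ascend_scheduler_config or {})
    return {
        "torchair_graph_config": torchair,
        "ascend_scheduler_config": scheduler,
    }


config = build_additional_config(
    torchair_graph_config={
        "enabled": True,
        "use_cached_graph": True,
        "graph_batch_sizes": [1, 2, 4, 8],
        "graph_batch_sizes_init": True,
        "enable_kv_nz": False,
    },
    ascend_scheduler_config={"enabled": True, "chunked_prefill_enabled": True},
)

# With vLLM Ascend installed, the dict would be passed as in the docs:
# from vllm import LLM
# llm = LLM(model="Qwen/Qwen3-8B", additional_config=config)
```

Rejecting unknown keys up front means a misspelled option such as `enable_kv_nzz` raises an error in this sketch instead of being passed along with the rest of the config.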