[ModelRunnerV1] Adapt kv_cache quant in v1. #685

whx-sjtu · 2025-04-27T13:38:50Z

Pass self.kv_cache_dtype to kv_cache_spec in model_runner_v1 so that kv_cache quantization is supported in v1.
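For readers unfamiliar with the change, here is a minimal sketch of the idea, not the code from this PR. It assumes the runner previously built its KV cache spec with the model dtype and now uses a separate cache dtype. `AttentionSpecSketch`, `ModelRunnerSketch`, `DTYPE_BYTES`, and the "auto" fallback are hypothetical stand-ins for illustration and are not the vLLM or vLLM Ascend classes.

```python
from dataclasses import dataclass

# Bytes per element for the dtypes used in this sketch (illustrative only).
DTYPE_BYTES = {"float16": 2, "bfloat16": 2, "int8": 1}


@dataclass(frozen=True)
class AttentionSpecSketch:
    """Hypothetical per-layer KV cache spec; not the real vLLM class."""
    block_size: int
    num_kv_heads: int
    head_size: int
    dtype: str

    @property
    def page_size_bytes(self) -> int:
        # The factor 2 accounts for storing both keys and values.
        return (2 * self.block_size * self.num_kv_heads
                * self.head_size * DTYPE_BYTES[self.dtype])


class ModelRunnerSketch:
    """Hypothetical runner showing where the cache dtype enters the spec."""

    def __init__(self, model_dtype, cache_dtype="auto",
                 block_size=16, num_kv_heads=8, head_size=128):
        self.dtype = model_dtype
        # Assumption: "auto" means the cache follows the model dtype.
        self.kv_cache_dtype = (model_dtype if cache_dtype == "auto"
                               else cache_dtype)
        self.block_size = block_size
        self.num_kv_heads = num_kv_heads
        self.head_size = head_size

    def get_kv_cache_spec(self, layer_names):
        # Use self.kv_cache_dtype here rather than self.dtype, so a
        # quantized cache is sized with the quantized element width.
        return {
            name: AttentionSpecSketch(
                block_size=self.block_size,
                num_kv_heads=self.num_kv_heads,
                head_size=self.head_size,
                dtype=self.kv_cache_dtype,
            )
            for name in layer_names
        }


runner = ModelRunnerSketch(model_dtype="float16", cache_dtype="int8")
spec = runner.get_kv_cache_spec(["layers.0.attn"])["layers.0.attn"]
print(spec.dtype, spec.page_size_bytes)
```

In this sketch an int8 cache halves the per-block size compared with a float16 cache, which is why the spec needs the cache dtype rather than the model dtype.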

Signed-off-by: hw_whx <wanghexiang7@huawei.com>

set self.kv_cache_dtype to kv_cache_spec in model_runner_v1

b86b000


wangxiyuan mentioned this pull request Apr 28, 2025: [Release]: vLLM Ascend v0.7.3 release checklist #644 (Closed, 46 tasks)

wangxiyuan merged commit abf1faa into vllm-project:v0.7.3-dev Apr 28, 2025
12 checks passed
