Commit b409658

new subclass
Signed-off-by: NickLucche <nlucches@redhat.com>
1 parent 413c104 commit b409658

1 file changed: +3 -1 lines changed

vllm/v1/worker/gpu_model_runner.py

Lines changed: 3 additions & 1 deletion
@@ -4574,7 +4574,9 @@ def get_kv_cache_spec(self) -> dict[str, KVCacheSpec]:
         kv_cache_spec: dict[str, KVCacheSpec] = {}
         attn_layers = get_layers_from_vllm_config(self.vllm_config, AttentionLayerBase)
         for layer_name, attn_module in attn_layers.items():
-            if (kv_tgt_layer := attn_module.kv_sharing_target_layer_name) is not None:
+            if isinstance(attn_module, Attention) and (
+                kv_tgt_layer := attn_module.kv_sharing_target_layer_name
+            ):
                 # The layer doesn't need its own KV cache and will use that of
                 # the target layer. We skip creating a KVCacheSpec for it, so
                 # that KV cache management logic will act as this layer does
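
The effect of the new guard can be sketched in isolation. This is a minimal illustration, not vLLM code: the three classes are stand-ins, `OtherLayer` and `shared_kv_targets` are hypothetical names, and the assumption is that some `AttentionLayerBase` subclasses do not define `kv_sharing_target_layer_name`. Note also that the new condition is a truthiness test rather than `is not None`, so an empty string would no longer count as a sharing target.

```python
from typing import Optional


class AttentionLayerBase:
    """Stand-in for the vLLM base class (assumption: real class differs)."""


class Attention(AttentionLayerBase):
    """Stand-in that carries the optional KV-sharing target name."""

    def __init__(self, kv_sharing_target_layer_name: Optional[str] = None):
        self.kv_sharing_target_layer_name = kv_sharing_target_layer_name


class OtherLayer(AttentionLayerBase):
    """Hypothetical subclass without kv_sharing_target_layer_name."""


def shared_kv_targets(attn_layers: dict) -> dict:
    """Map each KV-sharing layer to its target, using the guard from the diff."""
    shared = {}
    for layer_name, attn_module in attn_layers.items():
        # isinstance() short-circuits, so the attribute is only read on
        # Attention instances; other subclasses are skipped safely.
        if isinstance(attn_module, Attention) and (
            kv_tgt_layer := attn_module.kv_sharing_target_layer_name
        ):
            shared[layer_name] = kv_tgt_layer
    return shared


def old_guard_raises(attn_module) -> bool:
    """Return True if the pre-commit condition fails with AttributeError."""
    try:
        _ = attn_module.kv_sharing_target_layer_name is not None
    except AttributeError:
        return True
    return False


layers = {
    "layers.0.attn": Attention(),
    "layers.1.attn": Attention(kv_sharing_target_layer_name="layers.0.attn"),
    "layers.2.other": OtherLayer(),
}
result = shared_kv_targets(layers)
```

In this sketch only `layers.1.attn` is reported as sharing a KV cache, while the old condition would raise `AttributeError` as soon as it reached the `OtherLayer` instance.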
