
Commit b18c17f

support TRTLLM FP8 sinks attn kernel

Signed-off-by: elvischenv <219235043+elvischenv@users.noreply.github.com>
1 parent: cf4cd6c

File tree

1 file changed: +0 additions, -5 deletions


vllm/utils/flashinfer.py

Lines changed: 0 additions & 5 deletions
@@ -269,11 +269,6 @@ def use_trtllm_attention(

     # Must use TRTLLM attention if query is FP8 quantized
     if q_dtype == current_platform.fp8_dtype():
-        if has_sinks:
-            raise RuntimeError(
-                "TRTLLM FP8-qkv kernel is not supported for attention sinks. "
-                "Use kv_cache_dtype=auto for now."
-            )
         logger.info_once("Using TRTLLM attention (query is quantized).")
         return True
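For context, here is a minimal sketch of the dispatch behavior this commit changes. The `fp8_dtype` and `use_trtllm_attention` functions below are simplified stand-ins for the real vLLM helpers (which take more parameters and return torch dtypes), not the actual implementation: the point is only that, with the guard removed, an FP8-quantized query now routes to the TRTLLM attention kernel whether or not attention sinks are in use.

```python
def fp8_dtype() -> str:
    # Stand-in for current_platform.fp8_dtype(); the real helper
    # returns a platform-specific torch FP8 dtype.
    return "fp8_e4m3"


def use_trtllm_attention(q_dtype: str, has_sinks: bool) -> bool:
    """Return True when the TRTLLM attention kernel must be used.

    Simplified sketch of the branch touched by this commit.
    """
    # Must use TRTLLM attention if the query is FP8 quantized.
    # Before this commit, has_sinks=True raised a RuntimeError here;
    # now sinks are supported by the TRTLLM FP8 kernel, so the FP8
    # path is taken unconditionally.
    if q_dtype == fp8_dtype():
        return True
    return False


# FP8 query + sinks now selects the TRTLLM kernel instead of raising:
print(use_trtllm_attention("fp8_e4m3", has_sinks=True))  # True
print(use_trtllm_attention("fp16", has_sinks=True))      # False
```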
