
Commit 1b7cfd5

[ROCm][V0][Attention] Revert to the previous FA triton kernel (#18226)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
1 parent da4b69d commit 1b7cfd5

3 files changed (+694, -1083 lines)

vllm/attention/backends/rocm_flash_attn.py

Lines changed: 3 additions & 2 deletions
@@ -770,8 +770,9 @@ def forward(
                                   and layer._v_scale and layer._prob_scale
                                   and self.kv_cache_dtype == "fp8")
                 full_scales = (
-                    layer._q_scale, layer._k_scale, layer._v_scale,
-                    layer._prob_scale) if use_fp8_scales else None
+                    layer._q_scale.item(), layer._k_scale.item(),
+                    layer._v_scale.item(),
+                    layer._prob_scale.item()) if use_fp8_scales else None
                 self.triton_attn_func(
                     query,
                     key,
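
The hunk above changes `full_scales` from a tuple of scale objects to a tuple of values returned by `.item()`. A minimal sketch of that pattern follows, assuming the scale attributes behave like one-element tensors (which the `.item()` calls suggest but the diff does not show). `FakeScalarTensor` and `build_full_scales` are hypothetical names used only for illustration; they are not part of vLLM or PyTorch, and the stand-in class only imitates the `.item()` and truthiness behavior needed here.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class FakeScalarTensor:
    """Hypothetical stand-in for a one-element tensor (not a real vLLM/PyTorch type)."""
    value: float

    def item(self) -> float:
        # Follows the tensor `.item()` convention: return a plain Python number.
        return float(self.value)

    def __bool__(self) -> bool:
        # Assumption: the scale is treated as "set" when its value is non-zero.
        return self.value != 0


def build_full_scales(q, k, v, prob,
                      kv_cache_dtype: str) -> Optional[Tuple[float, ...]]:
    """Hypothetical helper mirroring the condition and tuple in the hunk."""
    use_fp8_scales = bool(q and k and v and prob and kv_cache_dtype == "fp8")
    # After the change, the kernel receives plain floats rather than tensor objects.
    return (q.item(), k.item(), v.item(), prob.item()) if use_fp8_scales else None


scales = build_full_scales(
    FakeScalarTensor(0.5), FakeScalarTensor(0.25),
    FakeScalarTensor(2.0), FakeScalarTensor(1.0), "fp8")
```

With any other `kv_cache_dtype`, or with a zero-valued scale, the helper returns `None`, matching the `if use_fp8_scales else None` branch in the diff.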
