
Commit 7cdd1be

gshtras authored and amitm02 committed
[ROCm][V0][Attention] Revert to the previous FA triton kernel (vllm-project#18226)
Signed-off-by: Gregory Shtrasberg <Gregory.Shtrasberg@amd.com>
Signed-off-by: amit <amit.man@gmail.com>
1 parent 7eca5f2 commit 7cdd1be

File tree

3 files changed: +694 −1083 lines changed


vllm/attention/backends/rocm_flash_attn.py

Lines changed: 3 additions & 2 deletions
@@ -770,8 +770,9 @@ def forward(
                 and layer._v_scale and layer._prob_scale
                 and self.kv_cache_dtype == "fp8")
             full_scales = (
-                layer._q_scale, layer._k_scale, layer._v_scale,
-                layer._prob_scale) if use_fp8_scales else None
+                layer._q_scale.item(), layer._k_scale.item(),
+                layer._v_scale.item(),
+                layer._prob_scale.item()) if use_fp8_scales else None
             self.triton_attn_func(
                 query,
                 key,
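The diff above replaces tensor-valued FP8 scales with plain Python floats via `.item()`, since `torch.Tensor.item()` unwraps a single-element tensor into a native scalar of the kind a Triton kernel expects for scalar arguments. As a rough illustration of the pattern (using a hypothetical `ScaleTensor` stand-in for a 0-dim tensor, so the sketch runs without `torch`):

```python
# Hypothetical stand-in for a 0-dim torch tensor; in the real code these
# are layer._q_scale etc., and .item() returns the wrapped Python float.
class ScaleTensor:
    def __init__(self, value: float):
        self._value = value

    def item(self) -> float:
        # Mirrors torch.Tensor.item(): extract the single element as a
        # native Python scalar.
        return float(self._value)


use_fp8_scales = True
q_scale, k_scale = ScaleTensor(0.5), ScaleTensor(0.25)
v_scale, prob_scale = ScaleTensor(0.125), ScaleTensor(1.0)

# Same shape as the committed code: a tuple of plain floats, or None
# when FP8 scaling is not in use.
full_scales = ((q_scale.item(), k_scale.item(),
                v_scale.item(),
                prob_scale.item())
               if use_fp8_scales else None)

print(full_scales)  # (0.5, 0.25, 0.125, 1.0)
```

Passing native floats rather than tensors means the kernel launch receives true scalars instead of device tensor handles, which is what the reverted Triton FA kernel's signature expects.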

0 commit comments
