[Misc] [ROCm] Prevent surplus tensor reshape (#19803)

zsolt-borbely-htec · web-flow · commit aa20d10a9182 · 2025-06-19T13:57:16.000+08:00
Signed-off-by: Zsolt Borbely <zsolt.borbely@htecgroup.com>
diff --git a/vllm/v1/attention/backends/triton_attn.py b/vllm/v1/attention/backends/triton_attn.py
@@ -376,7 +376,7 @@ def forward(
                     query.reshape(
                         (num_tokens, num_heads * head_size)).contiguous(),
                     layer._q_scale)
-            query = query.reshape((num_tokens, num_heads, head_size))
+                query = query.reshape((num_tokens, num_heads, head_size))
 
         use_local_attn = \
             (self.use_irope and attn_metadata.local_attn_metadata is not None)