
[fix] disable gluon pa for llama #113

Open

gbyu-amd wants to merge 1 commit into main from guanbao/fix_llama

Conversation

gbyu-amd (Contributor) commented Jan 6, 2026

Motivation

Gluon PA is not fully verified with Llama, so disable this path for Llama for now.

Submission Checklist

else:
    # Qwen only uses gluon pa decode when bs=64
    return self.paged_attention_triton if ctx.batch_size == 64 else self.paged_attention_asm

if ATOM_ENABLE_QK_NORM_ROPE_CACHE_QUANT_FUSION:
Contributor commented:

Should we also make a change for paged_attention_triton on line 405?

gbyu-amd (Contributor, Author) replied:

Llama and Qwen will not trigger the if condition on line 404; check line 132. One problem is that some of the if conditions here may not be orthogonal enough across different models (see the sketch below).
The accuracy of Llama is back to normal with this PR:
[image: accuracy results screenshot]
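For illustration, here is a minimal sketch of a model-gated dispatch along those lines. It assumes a hypothetical `model_arch` field on the context and an illustrative method name; the actual ATOM attribute and method names may differ.

```python
# Hedged sketch only: `model_arch` and `select_paged_attention` are
# illustrative names, not the actual ATOM code or API.
def select_paged_attention(self, ctx):
    # Gate the gluon (Triton) decode path explicitly on the model
    # architecture, so one model's conditions cannot leak into another's.
    if ctx.model_arch == "qwen" and ctx.batch_size == 64:
        # Qwen only uses gluon pa decode when bs=64
        return self.paged_attention_triton
    # Llama and all other models fall back to the verified ASM path.
    return self.paged_attention_asm
```

Keying the branch on the architecture makes the gluon path opt-in per model rather than reachable through overlapping batch-size conditions.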

scxiao (Contributor) commented Jan 8, 2026:

But the Qwen3 model failed with the gluon attention here: https://github.com/ROCm/ATOM/actions/runs/20798013041/job/59736412001?pr=56. Do you know why it failed?

gbyu-amd (Contributor, Author) replied:

[image: CI failure screenshot] The gluon PA API was changed on the aiter side, but the integration in ATOM remained unchanged. bernard_ps_pa_upstream provides a fix, which is waiting to be merged.
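As a general defensive pattern against this kind of upstream API drift, an integration can assert the expected call signature at import time. A hedged sketch follows; `gluon_paged_attention` and the parameter names are assumptions, not the actual aiter symbols.

```python
import inspect

def signature_matches(fn, expected_params):
    # Compare fn's leading parameter names against the names the
    # integration was written for; a mismatch signals upstream API drift.
    actual = list(inspect.signature(fn).parameters)
    return actual[: len(expected_params)] == list(expected_params)

# Usage idea (names hypothetical): fail fast at import time instead of
# crashing later inside the decode path.
# if not signature_matches(gluon_paged_attention, ["q", "k_cache", "v_cache"]):
#     raise RuntimeError("aiter gluon PA signature changed; update ATOM integration")
```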
