
Conversation


@amirumoAMD amirumoAMD commented Jan 16, 2026

Motivation

The aim was to provide a proper solution that does not simply skip the kv_scale or output_scale parameters in LLFP4 or LLFP8, but instead loads each parameter correctly.

Technical Details

Small changes to attention_mha so that k_scale and v_scale are loaded as parameters rather than left as NoneType, functions to remap the parameter names of the incoming tensors, and handling of weight loading (see the sketch below).
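
A minimal sketch of the idea, not the PR's exact code: the module layout, the k_scale / v_scale attribute names, and the remapping and loading helpers below are illustrative assumptions about how checkpoint tensor names could be mapped onto registered module parameters.

```python
import torch
import torch.nn as nn


class AttentionMHA(nn.Module):
    """Illustrative attention module that registers KV scales as parameters."""

    def __init__(self, num_heads: int, head_dim: int):
        super().__init__()
        self.num_heads = num_heads
        self.head_dim = head_dim
        # Register the scales as real parameters so the weight loader can fill
        # them, instead of leaving them as NoneType attributes.
        self.k_scale = nn.Parameter(torch.ones(1), requires_grad=False)
        self.v_scale = nn.Parameter(torch.ones(1), requires_grad=False)


def remap_scale_name(name: str) -> str:
    """Hypothetical remapping: translate a checkpoint's fused '.kv_scale'
    entry onto the module's '.k_scale' parameter name; other names pass
    through unchanged."""
    if name.endswith(".kv_scale"):
        return name.replace(".kv_scale", ".k_scale")
    return name


def load_scales(module: nn.Module, state_dict: dict) -> None:
    """Copy any checkpoint tensors whose (remapped) names match parameters."""
    params = dict(module.named_parameters())
    for name, tensor in state_dict.items():
        remapped = remap_scale_name(name)
        if remapped in params:
            params[remapped].data.copy_(tensor)


if __name__ == "__main__":
    attn = AttentionMHA(num_heads=8, head_dim=64)
    ckpt = {"k_scale": torch.tensor([0.5]), "v_scale": torch.tensor([0.25])}
    load_scales(attn, ckpt)
    print(attn.k_scale.item(), attn.v_scale.item())
```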

Test Plan

The trace looks normal, and lm_eval results match vLLM's lm_eval results when run with the same command.

Test Result

All tests passed.

Submission Checklist


@ChuanLi1101 ChuanLi1101 left a comment


Mostly good with minor suggestions.

