[Model][Bugfix] Fix MiDashengLM audio encoder mask by removing incorrect logical_not
#25925
Conversation
Code Review
This pull request provides a crucial bugfix for the MiDashengLM audio encoder. The change correctly removes an erroneous logical_not() call on the attention mask. This inversion was a leftover from a previous implementation and is incompatible with the scaled_dot_product_attention function, which expects True values for unmasked positions. By removing the inversion, this PR ensures the attention mask is correctly interpreted, fixing the generation of incorrect audio embeddings. The change is precise, well-explained, and I find no further issues.
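To make the two conventions concrete, here is a minimal sketch (variable names and shapes are hypothetical, not the PR's actual code) contrasting a hand-written attention path, which inverts a `True = valid` mask to find positions to suppress, with SDPA, which consumes that mask directly:

```python
import torch
import torch.nn.functional as F

# Illustrative only: `valid_mask` is (batch, seq) bool, True = real audio
# frame, False = padding. q, k, v are (batch, heads, seq, head_dim).

def handwritten_attn(q, k, v, valid_mask):
    # A hand-written path inverts the mask to locate positions to suppress.
    scores = (q @ k.transpose(-2, -1)) / (q.size(-1) ** 0.5)
    scores = scores.masked_fill(
        valid_mask.logical_not()[:, None, None, :], float("-inf")
    )
    return torch.softmax(scores, dim=-1) @ v

def sdpa_attn(q, k, v, valid_mask):
    # SDPA's boolean attn_mask means True = "may attend", so the mask is
    # passed through unchanged; carrying the logical_not() over from the
    # hand-written version flips the semantics.
    return F.scaled_dot_product_attention(
        q, k, v, attn_mask=valid_mask[:, None, None, :]
    )
```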
Purpose
In PR #25854, we addressed an issue in MiDashengLM's audio encoder attention. The initial fix used a hand-written attention implementation, which, per review feedback, was replaced with an SDPA-based implementation before merge.
SDPA interprets boolean attention masks in the opposite way from our original hand-written code: `scaled_dot_product_attention` treats `True` as a position that may be attended, whereas the hand-written path inverted the mask to find positions to suppress. To stay consistent, the logical inversion of the mask should have been removed when switching implementations. However, that adjustment was inadvertently omitted from the submitted changes, causing the audio encoder to produce incorrect embeddings.
This PR removes the leftover inversion and restores correct outputs from the MiDashengLM encoder.
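The fix itself is a one-line change of roughly this shape (reconstructed for illustration; variable names are hypothetical, see the diff for the exact code):

```python
import torch.nn.functional as F

# Before (buggy): leftover inversion from the hand-written implementation,
# which made SDPA attend to padding and mask out real audio frames.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask.logical_not())

# After (fixed): SDPA already treats True as "attend", so pass the mask as-is.
out = F.scaled_dot_product_attention(q, k, v, attn_mask=mask)
```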
Test Plan
Test Result
Essential Elements of an Effective PR Description Checklist
(Optional) Documentation update including `supported_models.md` and `examples` for a new model.