[Model][Bugfix] fix ernie45 vl run failed from shared experts optimization #26885
Conversation
… optimization pr Signed-off-by: wangyafeng <wangyafeng@baidu.com>
Code Review
This pull request correctly fixes a crash in the ernie45_vl_moe model that occurred when processing mixed-modality inputs. The original code incorrectly assumed the MoE layer returns a tensor, whereas it actually returns a tuple, leading to an AttributeError. The fix correctly unpacks the tuple and handles the outputs from the shared and regular experts.
However, I've identified a critical issue with the current implementation. For layers that are MoE for one modality (e.g., vision) but a standard MLP for another (e.g., text), the code will crash. This is because it unconditionally tries to unpack a tuple, but the MLP returns a single tensor. I've provided a suggestion to make the code robust by checking the type of the expert module before processing its output, which will resolve this issue.
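A minimal sketch of the suggested robustness fix, assuming PyTorch-style modules; the helper name run_expert_layer, the tuple ordering (shared output, routed output), and the way the two outputs are combined are illustrative assumptions, not the actual vLLM implementation.

import torch
import torch.nn as nn

def run_expert_layer(expert_module: nn.Module, hidden_states: torch.Tensor) -> torch.Tensor:
    # Hypothetical helper: handles both a shared-expert MoE layer
    # (which returns a tuple) and a plain MLP (which returns a tensor).
    output = expert_module(hidden_states)
    if isinstance(output, tuple):
        # Assumed ordering: (shared expert output, routed expert output).
        shared_output, routed_output = output
        return shared_output + routed_output
    # Plain MLP path: already a single tensor.
    return output

Checking the output type (or the module type) this way keeps a single code path working for layers that are MoE for one modality but a standard MLP for another.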
cc @bnellnm
text_token_mask = ~visual_token_mask
final_hidden_states = torch.zeros_like(hidden_states)
final_experts_hidden_states = torch.zeros_like(hidden_states)
final_shard_ouput = (
nit: shard -> shared
done
Thanks for fixing this!
… optimization pr v2 Signed-off-by: wangyafeng <wangyafeng@baidu.com>
@bnellnm Can you trigger the CI?
Sorry for the delay
Purpose
Fix the following issue: because SharedFusedMoE.forward now returns a tuple (#26145), the layer output has no flatten() method.
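A minimal, self-contained sketch of the failure mode and the fix; moe_like_layer is a hypothetical stand-in for a layer that, like SharedFusedMoE after #26145, returns a tuple instead of a single tensor.

import torch

def moe_like_layer(x: torch.Tensor) -> tuple[torch.Tensor, torch.Tensor]:
    # Stand-in for a layer returning (shared expert output, routed expert output).
    return x * 0.5, x * 2.0

x = torch.randn(4, 8)
out = moe_like_layer(x)

# Old pattern: treating the result as a tensor crashes, because a tuple
# has no flatten() method.
# out.flatten()  # AttributeError: 'tuple' object has no attribute 'flatten'

# Fixed pattern: unpack the tuple first, then combine and flatten.
shared_out, routed_out = out
flat = (shared_out + routed_out).flatten()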
Test Plan
Test Result
English translation is