[Model] Use merge_by_field_config for MM models (G)
#26117
Conversation
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request is a great refactoring that simplifies the multimodal input processing for several models by adopting the new merge_by_field_config mechanism. The changes consistently remove manual input processing logic, such as calls to flatten_bn and type checks, and delegate this responsibility to the MultiModalFieldConfig system. This makes the code cleaner and more maintainable.
The refactoring has been applied to gemma3_mm, gemma3n_mm, glm4_1v, glm4v, and granite_speech models. Key improvements include:
- Simplification of the `_parse_and_validate_*_input` methods.
- Removal of redundant helper functions like `_validate_and_reshape_mm_tensor`.
- In `gemma3n_mm.py`, the input data classes were nicely refactored from `TypedDict` to `TensorSchema`, improving type safety and clarity.
- In `glm4v.py`, a subtle but important bug was fixed by explicitly overriding `get_input_embeddings` to use the implementation from `SupportsMultiModal`, ensuring correct multimodal embedding merging.
- In `granite_speech.py`, a small but effective optimization was introduced for calculating `audio_embed_sizes`.
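The `TypedDict`-to-`TensorSchema` change mentioned above replaces unchecked dictionaries with declaratively validated tensor shapes. As a rough illustration of the idea (a minimal stand-in only; names like `validate_shapes` are hypothetical and vLLM's actual `TensorSchema` API differs), a schema maps each field to a symbolic shape, and validation binds symbolic dimensions consistently across fields:

```python
# Hypothetical sketch of declarative shape validation; not vLLM's
# actual TensorSchema implementation.

# Schema: each field maps to a shape tuple; ints must match exactly,
# strings are symbolic dims that must bind consistently across fields.
IMAGE_SCHEMA = {
    "pixel_values": ("bn", 3, "h", "w"),
    "image_sizes": ("bn", 2),
}

def validate_shapes(schema: dict, shapes: dict) -> dict:
    """Check actual tensor shapes against the schema.

    Returns the resolved symbolic-dimension bindings, or raises
    ValueError on any rank or dimension mismatch.
    """
    bindings: dict = {}
    for name, spec in schema.items():
        actual = shapes[name]
        if len(actual) != len(spec):
            raise ValueError(f"{name}: rank {len(actual)} != {len(spec)}")
        for want, got in zip(spec, actual):
            if isinstance(want, int):
                if got != want:
                    raise ValueError(f"{name}: dim {got} != {want}")
            elif bindings.setdefault(want, got) != got:
                # Symbolic dim already bound to a different value.
                raise ValueError(f"{name}: {want}={got}, expected {bindings[want]}")
    return bindings
```

Compared with a `TypedDict`, this catches shape mismatches (e.g. `pixel_values` and `image_sizes` disagreeing on the batch dimension) at parse time instead of deep inside the forward pass.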
The changes are well-executed and consistent across all affected models. The code is now more robust and easier to understand. I have not found any issues of high or critical severity. Great work!
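To illustrate the kind of boilerplate this refactoring removes, here is a hedged, self-contained sketch (not vLLM's actual code; the helper bodies here are simplified stand-ins) of the manual per-model flattening pattern versus declaring batching behavior once per field and letting shared machinery merge the inputs:

```python
# Hypothetical sketch; vLLM's real flatten_bn / merge_by_field_config
# operate on tensors and richer field configs.

def flatten_bn(batched):
    """Flatten a list of per-request lists into one flat list."""
    flat = []
    for per_request in batched:
        flat.extend(per_request)
    return flat

# Before: every model repeats manual type checks and flattening.
def parse_image_input_manually(mm_kwargs):
    pixel_values = mm_kwargs.get("pixel_values")
    if pixel_values is None:
        return None
    if not isinstance(pixel_values, list):
        raise TypeError("expected a list of per-request pixel values")
    return flatten_bn(pixel_values)

# After: the model declares how each field batches, and shared
# machinery (a stand-in for merge_by_field_config) does the work.
FIELD_CONFIG = {"pixel_values": "batched"}

def merge_by_field_config(mm_kwargs, field_config):
    merged = {}
    for name, mode in field_config.items():
        if name not in mm_kwargs:
            continue
        value = mm_kwargs[name]
        merged[name] = flatten_bn(value) if mode == "batched" else value
    return merged
```

The payoff is that the per-model `_parse_and_validate_*_input` methods shrink to (or drop) their manual checks, since the field config carries that information declaratively.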
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: yewentao256 <zhyanwentao@126.com>
The failure is captured once vllm-project/vllm#26117 is merged, and the actual update should be done according to vllm-project/vllm#25676. Signed-off-by: Chendi Xue <Chendi.Xue@intel.com>
Purpose
Part of #26149
Test Plan
Model and tensor schema tests should pass. I have also run the example script on all five models.
Essential Elements of an Effective PR Description Checklist
`supported_models.md` and `examples` for a new model.