[Model] Use merge_by_field_config for MM models (Llava family)
#26280
Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
Code Review
This pull request is a nice refactoring that standardizes multi-modal input processing for the Llava family of models by enabling merge_by_field_config. This change simplifies the model-specific code by removing boilerplate for input validation and flattening, which improves maintainability. It also enables new functionality, such as multi-video support for LlavaNextVideoForConditionalGeneration. The only concern is the addition of trust_remote_code=True in the example file, which poses a security risk. I've added a comment to suggest adding a warning for users.
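To illustrate the kind of flattening boilerplate the review refers to, here is a minimal sketch of merging per-request multi-modal inputs field by field. This is an illustration only, not vLLM's implementation: the function `merge_by_field`, the kind labels `"flat"` and `"batched"`, and the sample field names are all hypothetical.

```python
from typing import Any


def merge_by_field(
    items: list[dict[str, Any]],
    field_kinds: dict[str, str],
) -> dict[str, list[Any]]:
    """Merge per-request inputs into one dict, one rule per field.

    Hypothetical helper for illustration; the kind labels are assumptions:
    - "batched": keep one entry per request
    - "flat": concatenate the per-request sequences into a single list
    """
    merged: dict[str, list[Any]] = {}
    for name, kind in field_kinds.items():
        # Collect this field from every request that provides it.
        values = [item[name] for item in items if name in item]
        if kind == "batched":
            merged[name] = values
        elif kind == "flat":
            merged[name] = [element for value in values for element in value]
        else:
            raise ValueError(f"unknown field kind: {kind!r}")
    return merged


# Two requests with a different number of image patches each.
requests = [
    {"pixel_values": ["patch_a0", "patch_a1"], "image_sizes": (336, 336)},
    {"pixel_values": ["patch_b0"], "image_sizes": (672, 336)},
]
merged = merge_by_field(
    requests,
    {"pixel_values": "flat", "image_sizes": "batched"},
)
```

With the merge done once in shared code, each model no longer needs its own validation and flattening logic, which is the maintainability gain the review describes.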
…m-project#26280) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: Karan Goel <3261985+karan@users.noreply.github.com>
…m-project#26280) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk>
…m-project#26280) Signed-off-by: DarkLight1337 <tlleungac@connect.ust.hk> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Purpose
Part of #26149
Also:
Enable multi-video input for LLaVA-NeXT-Video
Test Plan
Tested all models other than MiniMax-VL (OOM on my setup)
Test Result
Essential Elements of an Effective PR Description Checklist
supported_models.md and examples for a new model.