[Bugfix] Support compile for Transformers multimodal #23095
Conversation
Signed-off-by: raushan <raushan@huggingface.co>
Code Review
This pull request enables torch.compile for multimodal transformer models, which specifically addresses a compilation failure with the Qwen-VL model family. The fix involves specifying dynamic dimensions for arguments in the model's forward pass. While the change is small and targeted, I have a concern that the implementation might be too broad and could potentially impact other multimodal models. My review includes one high-severity comment detailing this concern.
@support_torch_compile(
    dynamic_arg_dims={
        "input_ids": 0,
        "positions": -1,
        "intermediate_tensors": 0,
        "inputs_embeds": 0,
    })  # set `positions` to last dim to support Qwen-mrope
This change hardcodes the dynamic dimension for the positions argument to -1. While the comment indicates this is to support Qwen-mrope, applying this configuration to the generic TransformersForMultimodalLM class may cause issues for other multimodal models.
For many models, the positions tensor has a shape of (num_tokens, 2), where the number of tokens is the dynamic dimension (dim 0). Marking dimension -1 as dynamic would instead target the feature dimension of size 2, potentially causing compilation or runtime errors for non-Qwen models that use this class.
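The dimension-index concern can be sketched in a few lines. The shapes below are assumptions drawn from this discussion (Qwen-mrope uses a 2-D positions tensor with the token dimension last; other layouts vary by model), not a survey of every model in vLLM:

```python
# Illustrative only: shows which dimension a dim index of -1 resolves to
# for a few hypothetical `positions` layouts mentioned in the review.
def resolve_dim(shape: tuple, dim: int) -> int:
    """Map a possibly negative dim index to its positive equivalent."""
    return dim % len(shape)

qwen_mrope_positions = (3, 128)  # assumed Qwen-mrope layout: (3, num_tokens)
plain_positions = (128,)         # typical 1-D layout: (num_tokens,)
paired_positions = (128, 2)      # the (num_tokens, 2) layout the reviewer describes

# dim=-1 marks the token dim for the Qwen-mrope layout (index 1) and still
# resolves to the token dim for a 1-D layout (index 0), but for a
# (num_tokens, 2) layout it would mark the fixed size-2 dim, not dim 0.
print(resolve_dim(qwen_mrope_positions, -1))  # 1
print(resolve_dim(plain_positions, -1))       # 0
print(resolve_dim(paired_positions, -1))      # 1, not the dynamic token dim
```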
To ensure this fix doesn't introduce regressions, consider making this configuration specific to Qwen-VL models. One approach could be to introduce a separate class for Qwen-VL that encapsulates this compilation behavior, and update the model loading logic to use it for the relevant models.
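One way to picture the suggested split is a per-family lookup for the dynamic dims. This is a self-contained sketch of the idea only; the helper name, model-type strings, and dict values are hypothetical and do not reflect vLLM's actual model-loading code:

```python
# Hypothetical sketch: route Qwen-VL models to Qwen-specific dynamic dims
# while the generic multimodal class keeps dim 0 for `positions`.
GENERIC_DYNAMIC_ARG_DIMS = {
    "input_ids": 0,
    "positions": 0,
    "intermediate_tensors": 0,
    "inputs_embeds": 0,
}

# Qwen-mrope needs the last dim of `positions` marked dynamic.
QWEN_MROPE_DYNAMIC_ARG_DIMS = {**GENERIC_DYNAMIC_ARG_DIMS, "positions": -1}

def dynamic_arg_dims_for(model_type: str) -> dict:
    """Pick compile-time dynamic dims by model family (illustrative helper)."""
    if model_type.startswith(("qwen2_vl", "qwen2_5_vl")):
        return QWEN_MROPE_DYNAMIC_ARG_DIMS
    return GENERIC_DYNAMIC_ARG_DIMS

print(dynamic_arg_dims_for("qwen2_vl")["positions"])  # -1
print(dynamic_arg_dims_for("llava")["positions"])     # 0
```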
Isotr0py left a comment
Thanks, LGTM!
As per the title, this ensures that compile is supported for all models. The only issue we had previously was with the Qwen-VL model family, and it is now fixed by marking the position ids' last dim as dynamic.
Apparently, on the `main` branch compile is already enabled because the base model has the decorator, so Qwen-VL simply fails to run if we don't enforce eager. Benchmark results on Vision-Arena for Qwen with and without compilation are below. vLLM is still much faster for this model, which I guess is due to vision attention. For other models like LLaVA there is no difference in throughput.
cc @hmellor @Isotr0py