
Conversation

@zucchini-nlp (Contributor) commented Aug 18, 2025

As per the title, this ensures that compile is supported for all models. The only issue we had previously was with the Qwen-VL model family, and it is now fixed by marking the position IDs' last dim as dynamic.

Apparently, compile is already enabled on the main branch because the base model has the decorator, so Qwen-VL simply fails to run unless we enforce eager mode.
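
For context, a minimal sketch of why the last dim is the right one to mark dynamic. vLLM stores Qwen-style mrope position IDs as (3, num_tokens); the toy shapes below are illustrative only:

import torch

# Standard models: positions is 1-D, so dim -1 and dim 0 are the same
# (token) dimension -- marking the last dim dynamic changes nothing for them.
positions = torch.arange(8)                            # shape: (num_tokens,)

# Qwen2-VL mrope: one row each for the temporal/height/width components,
# so the token dimension is the *last* one.
mrope_positions = torch.zeros(3, 8, dtype=torch.long)  # shape: (3, num_tokens)

# In both layouts, dim -1 is the token dimension that varies across batches.
assert positions.shape[-1] == mrope_positions.shape[-1] == 8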

Benchmark results on Vision-Arena for Qwen with and without compilation are below. vLLM is still much faster for this model, which I suspect is due to vision attention. For other models like LLaVA there is no difference in throughput.

Model: Qwen/Qwen2-VL-2B-Instruct; total prompt tokens: 770,572; total output tokens: 128,000

Transformers backend:
Compiled - Throughput: 10.71 requests/s, 9625.89 total tokens/s, 1371.19 output tokens/s
Eager - Throughput: 9.76 requests/s, 8771.64 total tokens/s, 1249.51 output tokens/s

vLLM backend:
Compiled - Throughput: 12.95 requests/s, 11634.86 total tokens/s, 1657.37 output tokens/s
Eager - Throughput: 12.87 requests/s, 11561.67 total tokens/s, 1646.94 output tokens/s

Another model for comparison: llava-hf/llava-v1.6-mistral-7b-hf
vLLM compiled - Throughput: 3.17 requests/s, 7664.14 total tokens/s, 405.24 output tokens/s
Transformers compiled - Throughput: 3.07 requests/s, 7444.31 total tokens/s, 393.45 output tokens/s

cc @hmellor @Isotr0py

Signed-off-by: raushan <raushan@huggingface.co>
@gemini-code-assist bot left a comment


Code Review

This pull request enables torch.compile for multimodal transformer models, which specifically addresses a compilation failure with the Qwen-VL model family. The fix involves specifying dynamic dimensions for arguments in the model's forward pass. While the change is small and targeted, I have a concern that the implementation might be too broad and could potentially impact other multimodal models. My review includes one high-severity comment detailing this concern.

Comment on lines +684 to +690
@support_torch_compile(
dynamic_arg_dims={
"input_ids": 0,
"positions": -1,
"intermediate_tensors": 0,
"inputs_embeds": 0,
}) # set `positions` to last dim to support Qwen-mrope

Severity: high

This change hardcodes the dynamic dimension for the positions argument to -1. While the comment indicates this is to support Qwen-mrope, applying this configuration to the generic TransformersForMultimodalLM class may cause issues for other multimodal models.

For many models, the positions tensor has a shape of (num_tokens, 2), where the number of tokens is the dynamic dimension (dim 0). Marking dimension -1 as dynamic would incorrectly target the feature dimension of size 2, potentially causing compilation or runtime errors for non-Qwen models that use this class.

To ensure this fix doesn't introduce regressions, consider making this configuration specific to Qwen-VL models. One approach could be to introduce a separate class for Qwen-VL that encapsulates this compilation behavior, and update the model loading logic to use it for the relevant models.
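
For illustration, a rough sketch of that suggestion, assuming vLLM's support_torch_compile decorator and the TransformersForMultimodalLM base class from the file under review; the subclass name is hypothetical, and the corresponding model-loading change is omitted:

from vllm.compilation.decorators import support_torch_compile
from vllm.model_executor.models.transformers import TransformersForMultimodalLM

# Hypothetical subclass that scopes the last-dim-dynamic `positions` to
# Qwen-VL, leaving TransformersForMultimodalLM with the default dim-0 config.
@support_torch_compile(
    dynamic_arg_dims={
        "input_ids": 0,
        "positions": -1,  # mrope positions: (3, num_tokens), token dim is last
        "intermediate_tensors": 0,
        "inputs_embeds": 0,
    })
class TransformersForQwenVL(TransformersForMultimodalLM):
    pass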

@github-actions

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only the fastcheck CI runs, a small and essential subset of tests that quickly catches errors. You can run additional CI tests on top of those by going to your fastcheck build in the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run full CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@Isotr0py (Member) left a comment

Thanks, LGTM!

@Isotr0py Isotr0py enabled auto-merge (squash) August 18, 2025 09:54
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Aug 18, 2025
@Isotr0py Isotr0py merged commit 0e3bb54 into vllm-project:main Aug 18, 2025
41 of 42 checks passed
This pull request was subsequently referenced by commits in princepride/vllm, divakar-amd/vllm_upstream, cyang49/vllm, djmmoss/vllm, epwalsh/vllm, xiao-llm/vllm, zhewenl/vllm, and mengxingkongzhouhan/vllm (Aug 20 - Sep 3, 2025).
@hmellor hmellor moved this to Done in Transformers backend Oct 7, 2025
