
Conversation

@what-in-the-nim (Contributor) commented Sep 5, 2025

This PR removes an unnecessary CUDA sync in _process_image_input and _process_video_input of vllm/model_executor/models/glm4_1v.py by utilising grid_thw_list.

Related PR #22792
Related issue #23884
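
For orientation, here is a minimal before/after sketch of the pattern the PR describes. The function and attribute names are illustrative, not the exact glm4_1v.py code:

```python
import torch

# Hypothetical sketch; names loosely mirror glm4_1v.py but are not the
# exact vLLM implementation.
def _process_image_input_before(pixel_values, grid_thw, visual):
    image_embeds = visual(pixel_values, grid_thw=grid_thw)
    merge_size = visual.spatial_merge_size
    # .tolist() on a CUDA tensor blocks until the GPU catches up:
    # an implicit device -> host synchronization after the main kernel.
    sizes = (grid_thw.prod(-1) // (merge_size * merge_size)).tolist()
    return image_embeds.split(sizes)

def _process_image_input_after(pixel_values, grid_thw, visual):
    # Convert to a Python list once, up front; the later size math is
    # plain CPU arithmetic that overlaps with the GPU-bound visual() call
    # instead of forcing a sync after it.
    grid_thw_list = grid_thw.tolist()
    image_embeds = visual(pixel_values, grid_thw=grid_thw_list)
    merge_size = visual.spatial_merge_size
    sizes = [t * h * w // (merge_size * merge_size) for t, h, w in grid_thw_list]
    return image_embeds.split(sizes)
```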

Copilot AI review requested due to automatic review settings September 5, 2025 16:45
github-actions bot commented Sep 5, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, only fastcheck CI runs, which covers a small, essential subset of CI tests to catch errors quickly.

You can ask your reviewers to trigger select CI tests on top of fastcheck CI.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

If you have any questions, please reach out to us on Slack at https://slack.vllm.ai.

🚀

Copilot AI left a comment


Pull Request Overview

This PR removes unnecessary CUDA synchronization in the GLM-4.1V model by optimizing the image and video preprocessing methods. The changes prevent implicit CUDA syncs that occur when calling .tolist() on GPU tensors during size calculations.

  • Converts grid_thw tensor to list once at the beginning of each method
  • Replaces tensor operations with CPU-based calculations for computing split sizes
  • Uses the pre-converted list for both visual processing and size calculations


Comment on lines +1475 to +1476

Copilot AI Sep 5, 2025


Creating a new tensor from the list and then converting back to a list is inefficient. Consider using numpy operations or native Python calculations, since grid_thw_list is already a Python list.
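
A hedged sketch of what this suggestion might look like; the round-trip shown and the illustrative values are assumptions, not the exact code under review:

```python
import torch

# Illustrative values; grid_thw_list rows are [t, h, w] patch grids.
grid_thw_list = [[1, 24, 32], [1, 16, 16]]
merge_size = 2

# Flagged pattern: list -> tensor -> list round-trips through a new tensor.
sizes_roundtrip = (
    torch.tensor(grid_thw_list, dtype=torch.long).prod(-1)
    // (merge_size * merge_size)
).tolist()

# Suggested alternative: pure-Python arithmetic, no tensor allocation.
sizes_native = [t * h * w // (merge_size * merge_size) for t, h, w in grid_thw_list]

assert sizes_roundtrip == sizes_native  # [192, 64]
```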

Comment on lines +1500 to +1501

Copilot AI Sep 5, 2025


Creating a new tensor from the list and then converting back to a list is inefficient. Consider using numpy operations or native Python calculations, since grid_thw_list is already a Python list.

gemini-code-assist bot left a comment


Code Review

This pull request introduces a performance optimization in vllm/model_executor/models/glm4_1v.py by removing an unnecessary CUDA synchronization. The change involves pre-computing a list from the grid_thw tensor at the beginning of the _process_image_input and _process_video_input functions. This allows the subsequent calculation of sizes to be performed on the CPU, overlapping with the main GPU-bound visual processing task and thus avoiding a synchronization point after the main kernel launch. The implementation is correct and should lead to improved performance. Additionally, the change improves robustness by using torch.long for calculations, preventing potential overflows, and enhances readability by grouping (merge_size * merge_size).
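
A small sketch of the overflow-safe size calculation this review describes; the grid values here are made up for illustration:

```python
import torch

merge_size = 2
# Illustrative [t, h, w] rows for two videos.
grid_thw_list = [[16, 98, 146], [8, 74, 110]]

# dtype=torch.long keeps the t*h*w product in 64-bit integers, so very
# large grids cannot overflow a 32-bit accumulator; the explicit
# (merge_size * merge_size) grouping spells out "patches per merged token".
sizes = (
    torch.tensor(grid_thw_list, dtype=torch.long).prod(-1)
    // (merge_size * merge_size)
).tolist()
print(sizes)  # [57232, 16280]
```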

@ywang96 ywang96 added the ready ONLY add when PR is ready to merge/full CI is needed label Sep 5, 2025
@ywang96 ywang96 enabled auto-merge (squash) September 5, 2025 20:07
@vllm-bot vllm-bot merged commit 8a46602 into vllm-project:main Sep 8, 2025
38 of 41 checks passed
eicherseiji pushed a commit to eicherseiji/vllm that referenced this pull request Sep 9, 2025
…rocess (vllm-project#24332)

Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
skyloevil pushed a commit to skyloevil/vllm that referenced this pull request Sep 13, 2025
…rocess (vllm-project#24332)

Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
FeiDaLI pushed a commit to FeiDaLI/vllm that referenced this pull request Sep 25, 2025
…rocess (vllm-project#24332)

Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 10, 2025
…rocess (vllm-project#24332)

Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>
xuebwang-amd pushed a commit to xuebwang-amd/vllm that referenced this pull request Oct 24, 2025
…rocess (vllm-project#24332)

Signed-off-by: Win <chatcharinsang@gmail.com>
Co-authored-by: Roger Wang <hey@rogerw.io>
Signed-off-by: xuebwang-amd <xuebwang@amd.com>

Labels

ready ONLY add when PR is ready to merge/full CI is needed

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants