Conversation

@attafosu (Contributor):

  • Enables v1 multimodal support
  • Enables Qwen2.5-VL: support for MRope

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
* Style formatting

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

* Extra mops

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

* appease yapf and ruff

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

---------

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@attafosu force-pushed the dev/attafosu/multimodal-qwen2.5-vl branch from 5de633f to d8a23f5 on August 21, 2025 00:28
@attafosu (Contributor, Author):

/run-gaudi-tests

@sys-hab-pt-service (Collaborator):

Only codeowners can request to run Gaudi tests. Contact list: kzawora-intel, xuechendi, mswiniarsk, adobrzyn

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
  token_ids = _async_h2d_tensor(token_ids, torch.int32)
- token_positions = _async_h2d_tensor(token_positions, torch.int32)
+ if not self.uses_mrope:
+     token_positions = _async_h2d_tensor(token_positions, torch.int32)
Collaborator:
Can we avoid hard-coding the tensor to HPU in mrope_token_positions = self._align_and_pad_mrope_positions()? Then we wouldn't need a condition here deciding whether to do the H2D copy for token_positions.
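The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's actual code: build_positions and its shapes are hypothetical stand-ins for the model runner's logic, and device='cpu' is used in the usage example so the sketch runs without Gaudi hardware.

```python
import torch

def _async_h2d_tensor_copy(t, device):
    # Non-blocking copy toward the target device (a no-op when the tensor
    # is already there).
    return t.to(device=device, non_blocking=True)

def build_positions(uses_mrope: bool, seq_len: int, device: str):
    # Hypothetical sketch of the reviewer's suggestion: always materialize
    # the positions tensor on CPU -- including the MRope-aligned variant that
    # _align_and_pad_mrope_positions() would produce -- so that one
    # unconditional H2D copy replaces the device branch at the call site.
    if uses_mrope:
        # Stand-in for self._align_and_pad_mrope_positions(): MRope uses a
        # (3, seq_len) layout for its temporal/height/width position rows.
        positions = torch.zeros((3, seq_len), dtype=torch.int32)
    else:
        positions = torch.arange(seq_len, dtype=torch.int32)
    return _async_h2d_tensor_copy(positions, device)

# Usage: both branches return a tensor already on the target device, so the
# caller no longer needs "if not self.uses_mrope: ..." around the copy.
mrope_pos = build_positions(True, 4, 'cpu')
plain_pos = build_positions(False, 4, 'cpu')
```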

  token_ids_device = _async_h2d_tensor_copy(token_ids, self.device)
- positions_device = _async_h2d_tensor_copy(positions, self.device)
+ positions_device = input_mrope_positions if self.uses_mrope \
+     else _async_h2d_tensor_copy(positions, self.device)
Collaborator:

Same suggestion for input_mrope_positions: let's keep the original logic, where the tensor is first built on CPU and then moved with _async_h2d_tensor_copy.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@attafosu requested a review from xuechendi, August 21, 2025 22:59
mrope_position_tensor = torch.full(out_shape,
                                   padding_gen,
                                   dtype=torch.int32,
                                   device='hpu')
@xuechendi (Collaborator), Aug 21, 2025:
Is this right? I assume we would init a CPU tensor first and use the async H2D function to convert.

@attafosu (Contributor, Author):
Good catch.
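The fix agreed on above might look like the following sketch, under the assumption that the padded MRope position tensor can be created on CPU and then moved with a non-blocking copy; 'cpu' stands in for the real 'hpu' target so the snippet runs without Gaudi hardware, and make_mrope_position_tensor is a hypothetical wrapper, not a function from the PR.

```python
import torch

def make_mrope_position_tensor(out_shape, padding_gen, device='cpu'):
    # Allocate the padded MRope position tensor on CPU first, instead of
    # creating it directly on the device with device='hpu'...
    cpu_tensor = torch.full(out_shape, padding_gen, dtype=torch.int32)
    # ...then move it asynchronously; with device='hpu' this becomes the
    # same H2D transfer the _async_h2d_* helpers in this PR perform. (For
    # the copy to truly overlap with compute, the CPU buffer would also
    # need to be pinned, which this sketch omits.)
    return cpu_tensor.to(device=device, non_blocking=True)

# Usage: a (3, seq_len) MRope position buffer padded with -1.
mrope_position_tensor = make_mrope_position_tensor((3, 5), -1)
```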


def _async_h2d_tensor(data, dtype, device='hpu'):
    if isinstance(data, torch.Tensor):
        return data.to(device=device, dtype=dtype, non_blocking=True)
Collaborator:

Why was this line added?

@attafosu (Contributor, Author):
unnecessary, and removed.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@xuechendi merged commit b8217f6 into vllm-project:main Aug 21, 2025
6 checks passed
mswiniarsk pushed a commit that referenced this pull request Aug 25, 2025
- Enables v1 multimodal support
- Enables Qwen2.5-VL: support for MRope

---------

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Signed-off-by: Marcin Swiniarski <marcin.swiniarski@intel.com>
@attafosu deleted the dev/attafosu/multimodal-qwen2.5-vl branch October 15, 2025 21:27