Conversation

@attafosu (Contributor):

  • Enables v1 multimodal support
  • Enables Qwen2.5-VL: support for MRope

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
* Style formatting

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

* Extra mops

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

* appease yapf and ruff

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>

---------

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@attafosu force-pushed the dev/attafosu/multimodal-qwen2.5-vl branch from 5de633f to d8a23f5 on August 21, 2025 00:28
@attafosu (Contributor, Author):

/run-gaudi-tests

@sys-hab-pt-service (Collaborator):

Only codeowners can request to run Gaudi tests. Contact list: kzawora-intel, xuechendi, mswiniarsk, adobrzyn

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
  token_ids = _async_h2d_tensor(token_ids, torch.int32)
- token_positions = _async_h2d_tensor(token_positions, torch.int32)
+ if not self.uses_mrope:
+     token_positions = _async_h2d_tensor(token_positions, torch.int32)
Collaborator:
Can we avoid hard-coding the tensor to HPU in mrope_token_positions = self._align_and_pad_mrope_positions()? Then we wouldn't need a condition here deciding whether to do the H2D copy for token_positions.
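The suggestion above can be sketched as follows. This is a minimal illustration, not the PR's actual code: build_positions and its shapes are hypothetical stand-ins for the model runner's logic, and device='cpu' is used in the usage example so the sketch runs without Gaudi hardware.

```python
import torch

def _async_h2d_tensor_copy(t, device):
    # Non-blocking copy toward the target device (a no-op when the tensor
    # is already there).
    return t.to(device=device, non_blocking=True)

def build_positions(uses_mrope: bool, seq_len: int, device: str):
    # Hypothetical sketch of the reviewer's suggestion: always materialize
    # the positions tensor on CPU -- including the MRope-aligned variant that
    # _align_and_pad_mrope_positions() would produce -- so that one
    # unconditional H2D copy replaces the device branch at the call site.
    if uses_mrope:
        # Stand-in for self._align_and_pad_mrope_positions(): MRope uses a
        # (3, seq_len) layout for its temporal/height/width position rows.
        positions = torch.zeros((3, seq_len), dtype=torch.int32)
    else:
        positions = torch.arange(seq_len, dtype=torch.int32)
    return _async_h2d_tensor_copy(positions, device)

# Usage: both branches return a tensor already on the target device, so the
# caller no longer needs "if not self.uses_mrope: ..." around the copy.
mrope_pos = build_positions(True, 4, 'cpu')
plain_pos = build_positions(False, 4, 'cpu')
```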

  token_ids_device = _async_h2d_tensor_copy(token_ids, self.device)
- positions_device = _async_h2d_tensor_copy(positions, self.device)
+ positions_device = input_mrope_positions if self.uses_mrope \
+     else _async_h2d_tensor_copy(positions, self.device)
Collaborator:

Same suggestion for input_mrope_positions: let's keep the original logic, where the tensor is first built on CPU and then moved with _async_h2d_tensor_copy.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@attafosu requested a review from xuechendi, August 21, 2025 22:59
mrope_position_tensor = torch.full(out_shape,
                                   padding_gen,
                                   dtype=torch.int32,
                                   device='hpu')
@xuechendi (Collaborator), Aug 21, 2025:
Is this right? I assume we would init a CPU tensor first and use the async H2D function to convert.

@attafosu (Contributor, Author):
Good catch.
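The fix agreed on above might look like the following sketch, under the assumption that the padded MRope position tensor can be created on CPU and then moved with a non-blocking copy; 'cpu' stands in for the real 'hpu' target so the snippet runs without Gaudi hardware, and make_mrope_position_tensor is a hypothetical wrapper, not a function from the PR.

```python
import torch

def make_mrope_position_tensor(out_shape, padding_gen, device='cpu'):
    # Allocate the padded MRope position tensor on CPU first, instead of
    # creating it directly on the device with device='hpu'...
    cpu_tensor = torch.full(out_shape, padding_gen, dtype=torch.int32)
    # ...then move it asynchronously; with device='hpu' this becomes the
    # same H2D transfer the _async_h2d_* helpers in this PR perform. (For
    # the copy to truly overlap with compute, the CPU buffer would also
    # need to be pinned, which this sketch omits.)
    return cpu_tensor.to(device=device, non_blocking=True)

# Usage: a (3, seq_len) MRope position buffer padded with -1.
mrope_position_tensor = make_mrope_position_tensor((3, 5), -1)
```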


def _async_h2d_tensor(data, dtype, device='hpu'):
    if isinstance(data, torch.Tensor):
        return data.to(device=device, dtype=dtype, non_blocking=True)
Collaborator:

Why was this line added?

@attafosu (Contributor, Author):
unnecessary, and removed.

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
@xuechendi merged commit b8217f6 into vllm-project:main Aug 21, 2025
6 checks passed
mswiniarsk pushed a commit that referenced this pull request Aug 25, 2025
- Enables v1 multimodal support
- Enables Qwen2.5-VL: support for MRope

---------

Signed-off-by: attafosu <thomas.atta-fosu@intel.com>
Signed-off-by: Marcin Swiniarski <marcin.swiniarski@intel.com>
@attafosu deleted the dev/attafosu/multimodal-qwen2.5-vl branch October 15, 2025 21:27