
Conversation

@Liangliang-Ma (Contributor) commented Jul 9, 2025

This PR introduces a set of changes to enable running vLLM's tests on the XPU (Intel GPU) backend.

To achieve this, we've made modifications across several files to improve compatibility and stability on XPU.
These include the following adjustments:

  • disable fork for XPU (spawn is required)
  • use a default block size of 64 on XPU
  • handle config in xpu.py
  • distributed: add XPU device setting
  • XPUCommunicator: add broadcast (a minimal sketch follows this list)
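
For context, the broadcast addition can be sketched roughly as below. This is only an illustration of the idea, not the PR's exact code; the class name and the device_group attribute are assumptions modeled on vLLM's other device communicators.

import torch
import torch.distributed as dist


class XpuCommunicatorSketch:
    """Minimal sketch: a device communicator that gains a broadcast op on XPU."""

    def __init__(self, device: torch.device, device_group: dist.ProcessGroup):
        self.device = device              # e.g. torch.device("xpu", local_rank)
        self.device_group = device_group  # process group used for collectives

    def broadcast(self, input_: torch.Tensor, src: int = 0) -> torch.Tensor:
        # Make sure the tensor lives on the local XPU before the collective call.
        input_ = input_.to(self.device)
        dist.broadcast(input_, src=src, group=self.device_group)
        return input_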

This is part of an ongoing effort — our goal is to ensure that all vLLM v1 tests can eventually pass on XPU. We will continue refining and expanding test support in future PRs until full test coverage is achieved.

Co-authored-by: chaojun-zhang <chzhang@habana.ai>

Co-authored-by: zufangzhu <zufang.zhu@intel.com>

Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
github-actions bot commented Jul 9, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

@gemini-code-assist bot (Contributor) left a comment

Summary of Changes

Hello @Liangliang-Ma, I'm Gemini Code Assist [1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances vLLM's support for Intel XPU devices, primarily by enabling and stabilizing the execution of vLLM v1 tests on the XPU backend. It addresses several compatibility challenges, ensuring that the testing infrastructure and core components correctly interact with XPU hardware and its specific requirements for multiprocessing, memory management, and distributed operations.

Highlights

  • XPU Test Compatibility & Stability: The core objective of this PR is to enable and improve the reliability of vLLM v1 tests on Intel XPU devices, moving towards full test coverage.
  • XPU Multiprocessing Method: Modified test utilities to explicitly disable the fork multiprocessing method for XPU, requiring the spawn method for compatibility with XPU's multiprocessing requirements.
  • XPU-Specific Block Size: Adjusted the default block_size in test configurations to 64 for XPU devices, optimizing memory usage for this platform.
  • Distributed Communication for XPU: Added a broadcast operation to the XPUCommunicator to facilitate necessary collective communication in distributed XPU environments.
  • XPU Device and Configuration Handling: Ensured that distributed environments correctly identify and set XPU devices (xpu:{local_rank}), introduced a temporary measure to force eager execution mode for XPU stability, and refined the checks for distributed executor backends. A minimal sketch of the device setup follows below.
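
As a rough sketch of the device-setting point above, and assuming a PyTorch build where torch.xpu is available, the per-rank device binding looks roughly like this (local_rank is a placeholder supplied by the launcher; this is not the PR's exact code):

import torch

def init_xpu_device(local_rank: int) -> torch.device:
    # Bind this process to its XPU, mirroring the usual cuda:{local_rank} convention.
    device = torch.device("xpu", local_rank)
    torch.xpu.set_device(device)
    return device
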
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.

Feature               | Command              | Description
Code Review           | /gemini review       | Performs a code review for the current pull request in its current state.
Pull Request Summary  | /gemini summary      | Provides a summary of the current pull request in its current state.
Comment               | @gemini-code-assist  | Responds in comments when explicitly tagged, both in issue comments and review comments.
Help                  | /gemini help         | Displays a list of available commands.

Customization

To customize your Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.

You can also get AI-powered code generation, chat, and code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

@mergify mergify bot added the v1 label Jul 9, 2025
mergify bot commented Jul 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Liangliang-Ma.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 9, 2025
@gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request enhances test support for the XPU (Intel GPU) backend by introducing several compatibility and stability improvements. The changes include disabling fork for XPU, using a default block size of 64 on XPU, and adding XPU device settings and broadcast functionality. The review focuses on improving code robustness and readability.

tests/utils.py Outdated
Comment on lines 733 to 736
# To use XPU with multiprocessing, must use the 'spawn' start method via 'VLLM_WORKER_MULTIPROC_METHOD=spawn'
if current_platform.is_xpu():
    f(*args, **kwargs)
    return

high

To use XPU with multiprocessing, the start method must be 'spawn'. If current_platform.is_xpu() is true, then the function f should be executed using the 'spawn' start method. Otherwise, the XPU device may not be initialized correctly in the new process, leading to runtime errors.

Suggested change
- # To use XPU with multiprocessing, must use the 'spawn' start method via 'VLLM_WORKER_MULTIPROC_METHOD=spawn'
- if current_platform.is_xpu():
-     f(*args, **kwargs)
-     return
+ # To use XPU with multiprocessing, must use the 'spawn' start method via 'VLLM_WORKER_MULTIPROC_METHOD=spawn'
+ if current_platform.is_xpu():
+     import torch.multiprocessing as mp
+     if mp.get_start_method() != 'spawn':
+         mp.set_start_method('spawn')
+     f(*args, **kwargs)
+     return
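
For reference, the environment-variable route named in the code comment above would be set before any worker processes are created; a minimal sketch using only the variable name quoted in that comment:

import os

# Force the 'spawn' start method for vLLM worker processes on XPU.
os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"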

  disable_log_stats: bool = True,
  tensor_parallel_size: int = 1,
- block_size: int = 16,
+ block_size: int = 16 if not hasattr(torch, 'xpu') else 64,

medium

For improved robustness, it's better to check for XPU availability using torch.xpu.is_available() in addition to hasattr(torch, 'xpu'). This ensures that the XPU device is not only supported but also ready for use.

Suggested change
- block_size: int = 16 if not hasattr(torch, 'xpu') else 64,
+ block_size: int = 16 if not hasattr(torch, 'xpu') or not torch.xpu.is_available() else 64,
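
Put as a standalone check, the availability test the comment describes would be roughly the following (a sketch; default_block_size is a hypothetical helper, not code from the PR):

import torch


def default_block_size() -> int:
    # Use the larger XPU default only when the XPU runtime is actually usable.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return 64
    return 16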

Comment on lines +243 to +244
elif current_platform.is_xpu():
    self.device = torch.device(f"xpu:{local_rank}")

medium

Consider using torch.device directly instead of string formatting for device creation. This can improve code readability and reduce the risk of errors.

self.device = torch.device("xpu", local_rank)
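
Both spellings name the same device object, so the change is purely about readability; a quick illustrative check (assuming a PyTorch build that recognizes the xpu device type):

import torch

local_rank = 0
# torch.device treats the "type:index" string and the (type, index) pair identically.
assert torch.device(f"xpu:{local_rank}") == torch.device("xpu", local_rank)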

Comment on lines 81 to 87
# FIXME: Temporarily forcing eager mode
# remove after t.compile support stabilizes.
if envs.VLLM_USE_V1 and vllm_config.model_config is not None and \
        not vllm_config.model_config.enforce_eager:
    from vllm.config import CompilationLevel
    vllm_config.compilation_config.level = \
        CompilationLevel.NO_COMPILATION

medium

Consider wrapping the long conditional statement in parentheses for better readability and adherence to PEP 8 style guidelines.

Suggested change
- # FIXME: Temporarily forcing eager mode
- # remove after t.compile support stabilizes.
- if envs.VLLM_USE_V1 and vllm_config.model_config is not None and \
-         not vllm_config.model_config.enforce_eager:
-     from vllm.config import CompilationLevel
-     vllm_config.compilation_config.level = \
-         CompilationLevel.NO_COMPILATION
+ # FIXME: Temporarily forcing eager mode
+ # remove after t.compile support stabilizes.
+ if (envs.VLLM_USE_V1 and vllm_config.model_config is not None and
+         not vllm_config.model_config.enforce_eager):
+     from vllm.config import CompilationLevel
+     vllm_config.compilation_config.level = CompilationLevel.NO_COMPILATION

Comment on lines 121 to 123
  elif parallel_config.distributed_executor_backend != "ray" and \
-         parallel_config.distributed_executor_backend != "uni":
+         parallel_config.distributed_executor_backend != "uni" and \
+         parallel_config.distributed_executor_backend != "external_launcher":

medium

Consider wrapping the long conditional statement in parentheses for better readability and adherence to PEP 8 style guidelines.

Suggested change
- elif parallel_config.distributed_executor_backend != "ray" and \
-         parallel_config.distributed_executor_backend != "uni" and \
-         parallel_config.distributed_executor_backend != "external_launcher":
+ elif (parallel_config.distributed_executor_backend != "ray" and
+         parallel_config.distributed_executor_backend != "uni" and
+         parallel_config.distributed_executor_backend != "external_launcher"):
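
A further simplification, not part of the bot's suggestion, would be a membership test that avoids repeating the attribute; _needs_executor_override is a hypothetical helper used only for illustration:

SUPPORTED_BACKENDS = ("ray", "uni", "external_launcher")


def _needs_executor_override(distributed_executor_backend) -> bool:
    # Equivalent to chaining the three inequality checks above.
    return distributed_executor_backend not in SUPPORTED_BACKENDS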

@mergify mergify bot removed the needs-rebase label Jul 9, 2025
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
@Liangliang-Ma (Contributor, Author) commented

Removed the fork decorator modification. We would like to use this one instead: #20649

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
@DarkLight1337 (Member) left a comment

Can you merge in the latest changes and see if the test still passes?

mergify bot commented Jul 9, 2025

This pull request has merge conflicts that must be resolved before it can be
merged. Please rebase the PR, @Liangliang-Ma.

https://docs.github.com/en/pull-requests/collaborating-with-pull-requests/working-with-forks/syncing-a-fork

@mergify mergify bot added the needs-rebase label Jul 9, 2025
@mergify mergify bot removed the needs-rebase label Jul 9, 2025
@Liangliang-Ma (Contributor, Author) commented

> Can you merge in the latest changes and see if the test still passes?

Merged.

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
@DarkLight1337 (Member) commented
Please fix pre-commit

Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
@DarkLight1337 DarkLight1337 enabled auto-merge (squash) July 9, 2025 11:10
@github-actions github-actions bot added the ready ONLY add when PR is ready to merge/full CI is needed label Jul 9, 2025
@dvrogozh (Contributor) left a comment

Which tests are being fixed by these changes? Can this information be added to the PR description, please? If the changes affect many tests, I suggest specifying one test for each change.

@DarkLight1337 DarkLight1337 merged commit a3e4e85 into vllm-project:main Jul 9, 2025
80 checks passed

  def _init_device_properties(self) -> None:
-     pass
+     self.num_sms = None
Contributor

On top of this change, may I suggest an improvement: move these customizations to gpu_model_runner.py, since they are just backend-specific dispatch logic that can be handled more easily without class inheritance? See proposal here:

@Liangliang-Ma (Contributor, Author) commented

Actually, these functions were originally in gpu_model_runner.py and were moved to the per-device model runners for cleaner readability, so I think we could follow this design.

Pradyun92 pushed a commit to Pradyun92/vllm that referenced this pull request Aug 6, 2025
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
npanpaliya pushed a commit to odh-on-pz/vllm-upstream that referenced this pull request Aug 6, 2025
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
jinzhen-lin pushed a commit to jinzhen-lin/vllm that referenced this pull request Aug 9, 2025
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
epwalsh pushed a commit to epwalsh/vllm that referenced this pull request Aug 27, 2025
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>

Labels

ready (ONLY add when PR is ready to merge/full CI is needed), v1
