[XPU][CI] enhance xpu test support #20652
Conversation
Co-authored-by: chaojun-zhang <chzhang@habana.ai> Co-authored-by: zufangzhu <zufang.zhu@intel.com> Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai> Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Summary of Changes
Hello @Liangliang-Ma, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request significantly enhances vLLM's support for Intel XPU devices, primarily by enabling and stabilizing the execution of vLLM v1 tests on the XPU backend. It addresses several compatibility challenges, ensuring that the testing infrastructure and core components correctly interact with XPU hardware and its specific requirements for multiprocessing, memory management, and distributed operations.
Highlights
- XPU Test Compatibility & Stability: The core objective of this PR is to enable and improve the reliability of vLLM v1 tests on Intel XPU devices, moving towards full test coverage.
- XPU Multiprocessing Method: Modified test utilities to explicitly disable the `fork` multiprocessing method for XPU, requiring the `spawn` method for compatibility with XPU's multiprocessing requirements.
- XPU-Specific Block Size: Adjusted the default `block_size` in test configurations to `64` for XPU devices, optimizing memory usage for this platform.
- Distributed Communication for XPU: Added a `broadcast` operation to the `XPUCommunicator` to facilitate necessary collective communication in distributed XPU environments (a rough sketch of such a collective follows after this list).
- XPU Device and Configuration Handling: Ensured that distributed environments correctly identify and set XPU devices (`xpu:{local_rank}`) and introduced temporary measures to force eager execution mode for XPU stability, while also refining checks for distributed executor backends.
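As a rough illustration of the broadcast collective mentioned above: this is a hedged sketch, not the code added by this PR; `XPUCommunicatorSketch` and `device_group` are illustrative names, and it assumes the communicator wraps a `torch.distributed` process group.

```python
import torch
import torch.distributed as dist


class XPUCommunicatorSketch:
    """Illustrative communicator wrapper; not vLLM's actual XPUCommunicator."""

    def __init__(self, group: dist.ProcessGroup):
        # Process group over which collectives run (assumed to be backed by
        # an XPU-capable distributed backend).
        self.device_group = group

    def broadcast(self, tensor: torch.Tensor, src: int = 0) -> torch.Tensor:
        # Broadcast the tensor in place from rank `src` to every rank in the
        # group, then return it for convenience.
        dist.broadcast(tensor, src=src, group=self.device_group)
        return tensor
```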
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page; here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request by creating an issue comment (i.e. a comment on the pull request page) using either `/gemini <command>` or `@gemini-code-assist <command>`. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a `.gemini/` folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
Footnotes
1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
This pull request has merge conflicts that must be resolved before it can be merged.
Code Review
This pull request enhances test support for the XPU (Intel GPU) backend by introducing several compatibility and stability improvements. The changes include disabling `fork` for XPU, using a default block size of 64 on XPU, and adding XPU device settings and `broadcast` functionality. The review focuses on improving code robustness and readability.
tests/utils.py
Outdated
```python
# To use XPU with multiprocessing, must use the 'spawn' start method via 'VLLM_WORKER_MULTIPROC_METHOD=spawn'
if current_platform.is_xpu():
    f(*args, **kwargs)
    return
```
To use XPU with multiprocessing, the start method must be 'spawn'. If current_platform.is_xpu() is true, then the function f should be executed using the 'spawn' start method. Otherwise, the XPU device may not be initialized correctly in the new process, leading to runtime errors.
Suggested change:
```python
# To use XPU with multiprocessing, must use the 'spawn' start method via 'VLLM_WORKER_MULTIPROC_METHOD=spawn'
if current_platform.is_xpu():
    import torch.multiprocessing as mp
    if mp.get_start_method() != 'spawn':
        mp.set_start_method('spawn')
    f(*args, **kwargs)
    return
```
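One note on this design choice: `set_start_method` mutates global state and raises `RuntimeError` if a start method was already chosen elsewhere, so tests often prefer an explicit spawn context instead. A minimal sketch of that alternative (my illustration, not code from this PR or the suggestion):

```python
import multiprocessing as mp


def run_in_spawned_process(f, *args, **kwargs) -> None:
    # Run `f` in a child process created with the 'spawn' start method,
    # leaving the parent's global start method untouched. `f` must be a
    # picklable, module-level callable for spawn to work.
    ctx = mp.get_context("spawn")
    proc = ctx.Process(target=f, args=args, kwargs=kwargs)
    proc.start()
    proc.join()
    if proc.exitcode != 0:
        raise RuntimeError(f"spawned process exited with code {proc.exitcode}")
```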
tests/conftest.py
Outdated
```diff
     disable_log_stats: bool = True,
     tensor_parallel_size: int = 1,
-    block_size: int = 16,
+    block_size: int = 16 if not hasattr(torch, 'xpu') else 64,
```
For improved robustness, it's better to check for XPU availability using `torch.xpu.is_available()` in addition to `hasattr(torch, 'xpu')`. This ensures that the XPU device is not only supported but also ready for use.
Suggested change:
```diff
-    block_size: int = 16 if not hasattr(torch, 'xpu') else 64,
+    block_size: int = 16 if not hasattr(torch, 'xpu') or not torch.xpu.is_available() else 64,
```
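If the availability check grows any further, it could also be factored into a small helper; `_default_block_size` below is a hypothetical name, not something in the repository.

```python
import torch


def _default_block_size() -> int:
    # Hypothetical helper: use 64 only when an XPU device is both present in
    # this torch build and actually usable; otherwise keep the usual 16.
    if hasattr(torch, "xpu") and torch.xpu.is_available():
        return 64
    return 16
```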
```python
elif current_platform.is_xpu():
    self.device = torch.device(f"xpu:{local_rank}")
```
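For context, once the per-rank device string is chosen, a worker typically also binds the process to that device before allocating tensors. A hedged sketch (assuming a PyTorch build that ships the `torch.xpu` module; not the PR's exact code):

```python
import torch


def bind_local_xpu(local_rank: int) -> torch.device:
    # Select the XPU matching this worker's local rank and make it the
    # current device, mirroring the torch.cuda.set_device pattern.
    device = torch.device(f"xpu:{local_rank}")
    torch.xpu.set_device(device)
    return device
```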
vllm/platforms/xpu.py
Outdated
```python
# FIXME: Temporarily forcing eager mode
# remove after t.compile support stabilizes.
if envs.VLLM_USE_V1 and vllm_config.model_config is not None and \
        not vllm_config.model_config.enforce_eager:
    from vllm.config import CompilationLevel
    vllm_config.compilation_config.level = \
        CompilationLevel.NO_COMPILATION
```
Consider wrapping the long conditional statement in parentheses for better readability and adherence to PEP 8 style guidelines.
Suggested change:
```python
# FIXME: Temporarily forcing eager mode
# remove after t.compile support stabilizes.
if (envs.VLLM_USE_V1 and vllm_config.model_config is not None and
        not vllm_config.model_config.enforce_eager):
    from vllm.config import CompilationLevel
    vllm_config.compilation_config.level = CompilationLevel.NO_COMPILATION
```
vllm/platforms/xpu.py
Outdated
```diff
     elif parallel_config.distributed_executor_backend != "ray" and \
-            parallel_config.distributed_executor_backend != "uni":
+            parallel_config.distributed_executor_backend != "uni" and \
+            parallel_config.distributed_executor_backend != "external_launcher":
```
Consider wrapping the long conditional statement in parentheses for better readability and adherence to PEP 8 style guidelines.
Suggested change:
```python
elif (parallel_config.distributed_executor_backend != "ray" and
      parallel_config.distributed_executor_backend != "uni" and
      parallel_config.distributed_executor_backend != "external_launcher"):
```
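A further readability option (my own note, not part of the PR or this review) is to collapse the chained inequality checks into a membership test:

```python
# Hedged sketch: the same condition expressed as a membership test.
_KNOWN_BACKENDS = ("ray", "uni", "external_launcher")


def needs_backend_fallback(distributed_executor_backend: str) -> bool:
    # True when the configured backend is none of the recognized ones.
    return distributed_executor_backend not in _KNOWN_BACKENDS
```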
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Removed the `fork` decorator modification. We would like to use this one instead: #20649
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
DarkLight1337
left a comment
Can you merge in the latest changes and see if the test still passes?
This pull request has merge conflicts that must be resolved before it can be merged.
Merged.
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
Please fix pre-commit.
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com>
dvrogozh
left a comment
Which tests are being fixed by these changes? Can this information be added to the PR description, please? If these affect many, I guess 1 test for each change could be specified.
```diff
     def _init_device_properties(self) -> None:
-        pass
+        self.num_sms = None
```
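For contrast, a hedged sketch (not the repository's exact code): on CUDA backends the SM count is typically read from device properties, whereas the XPU runner above simply records `None`.

```python
import torch


def query_num_sms(device_index: int = 0):
    # On CUDA, the number of streaming multiprocessors is available from the
    # device properties; return None when CUDA is not present.
    if torch.cuda.is_available():
        props = torch.cuda.get_device_properties(device_index)
        return props.multi_processor_count
    return None
```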
On top of this change, may I suggest an improvement: move these customizations to gpu_model_runner.py, as they are just backend-specific dispatch logic that can be handled more easily without class inheritance? See the proposal here:
Actually, these functions were originally in gpu_model_runner.py and were moved to the per-device model runners for cleaner readability, so I think we could follow this design.
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai> Signed-off-by: Jinzhen Lin <linjinzhen@hotmail.com>
Signed-off-by: Ma, Liangliang <liangliang.ma@intel.com> Co-authored-by: zhenwei-intel <zhenweiliu@habana.ai>
This PR introduces a set of changes to enable running vLLM's tests on the XPU (Intel GPU) backend.
To achieve this, we've made various modifications across several files to improve compatibility and stability with XPU. These include adjustments to the multiprocessing start method used in tests, the default test block size, distributed device setup and broadcast support, and XPU-specific configuration handling.
This is part of an ongoing effort — our goal is to ensure that all vLLM v1 tests can eventually pass on XPU. We will continue refining and expanding test support in future PRs until full test coverage is achieved.
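As a rough picture of the kind of v1 check this effort targets, here is a hedged, illustrative smoke test. It is not one of the repository's actual tests; the model name and token count are arbitrary, and it assumes an XPU-enabled build.

```python
import os

import pytest
import torch

from vllm import LLM, SamplingParams

# Per the comment in tests/utils.py, XPU multiprocessing needs 'spawn'.
os.environ.setdefault("VLLM_WORKER_MULTIPROC_METHOD", "spawn")


@pytest.mark.skipif(
    not (hasattr(torch, "xpu") and torch.xpu.is_available()),
    reason="requires an XPU device",
)
def test_xpu_smoke():
    # block_size=64 matches the XPU default this PR introduces for tests; the
    # XPU platform hook in this PR forces eager mode, so nothing else is set.
    llm = LLM(model="facebook/opt-125m", block_size=64)
    outputs = llm.generate(["Hello"], SamplingParams(max_tokens=8))
    assert len(outputs) == 1
```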