[XPU] Use spawn with XPU multiprocessing #20649
Conversation
Summary of Changes
Hello @dvrogozh, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
This pull request implements necessary changes to enable the use of the XPU backend with multiprocessing in vLLM. The core issue addressed is the requirement to use the 'spawn' multiprocessing start method with XPU, as opposed to 'fork', to prevent re-initialization errors. By adjusting process creation logic in both testing utilities and core vLLM utilities, this PR unblocks further testing and development on XPU platforms.
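The failure mode is generic to accelerator runtimes: a `fork`ed child inherits the parent's already-initialized device state and cannot re-initialize it, while a `spawn`ed child starts from a fresh interpreter. Below is a minimal sketch of the pattern, assuming a PyTorch build with XPU support; the `worker` function is illustrative, not taken from this PR.

```python
import torch
import torch.multiprocessing as mp


def worker() -> None:
    # A spawned child starts a fresh interpreter, so initializing
    # XPU here is safe even though the parent already touched it.
    x = torch.ones(4, device="xpu")
    print(x.sum().item())


if __name__ == "__main__":
    # Initialize XPU in the parent, as vLLM does during engine setup.
    torch.ones(1, device="xpu")
    # With the default "fork" start method on Linux the child would fail:
    #   RuntimeError: Cannot re-initialize XPU in forked subprocess. ...
    # "spawn" avoids this by not inheriting the parent's device state.
    ctx = mp.get_context("spawn")
    p = ctx.Process(target=worker)
    p.start()
    p.join()
```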
Highlights
- XPU Multiprocessing Support: Modified the `create_new_process_for_each_test` decorator in `tests/utils.py` to explicitly include XPU platforms when determining whether to default to the 'spawn' multiprocessing method. This aligns XPU behavior with ROCm platforms, which also require 'spawn'.
- Test Decorator Update: Updated `tests/v1/e2e/test_cascade_attention.py` to use the `create_new_process_for_each_test()` decorator instead of the deprecated `fork_new_process_for_each_test`. This ensures that tests requiring a new process are correctly initialized with the appropriate 'spawn' method for XPU.
- XPU Initialization Check: Introduced a new utility function `xpu_is_initialized()` in `vllm/utils/__init__.py`. This function checks whether the `torch.xpu` backend is compiled and currently initialized, providing a robust way to detect XPU activity.
- Forced Spawn for XPU: Enhanced the `_maybe_force_spawn()` function in `vllm/utils/__init__.py` to force the 'spawn' multiprocessing start method if the XPU backend is detected as initialized. This directly addresses the `RuntimeError` encountered when XPU is re-initialized in forked subprocesses. (A sketch of what these helpers might look like follows this list.)
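The PR diff itself is not reproduced in this thread, so the following is only a hedged sketch of what the two helpers described above might look like, modeled on the existing CUDA equivalents in vLLM. `VLLM_WORKER_MULTIPROC_METHOD` is vLLM's real environment variable, but the exact function bodies here are assumptions, not the merged code.

```python
import os

import torch


def xpu_is_initialized() -> bool:
    # Assumption: torch.xpu mirrors torch.cuda's
    # _is_compiled() / is_initialized() interface.
    if not torch.xpu._is_compiled():
        return False
    return torch.xpu.is_initialized()


def _maybe_force_spawn():
    """Force the 'spawn' start method if a device backend is already live.

    Forking after device initialization makes child processes fail with
    "Cannot re-initialize ... in forked subprocess".
    """
    if os.environ.get("VLLM_WORKER_MULTIPROC_METHOD") == "spawn":
        return

    if xpu_is_initialized():
        # Sketch only: the real function also covers CUDA and
        # logs a warning explaining why the method was overridden.
        os.environ["VLLM_WORKER_MULTIPROC_METHOD"] = "spawn"
```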
Code Review
This pull request correctly enables the spawn multiprocessing start method for the XPU backend, which is a necessary fix to avoid runtime errors. The changes are well-targeted to vllm/utils and tests/utils, and the update to test_cascade_attention.py demonstrates the correct usage. The overall logic is sound. I've included one suggestion to improve code readability in tests/utils.py.
Thanks for fixing that. This one is very clear and reasonable.
Purpose
It's required to use the `spawn` start method when running the XPU backend with multiprocessing. There are 2 places in vLLM where this needs to be fixed:

- One in `vllm/utils`
- Another in `tests/utils`

The fix on the test side adjusts the `create_new_process_for_each_test` decorator, which then needs to be applied to the actual tests. Some tests are already marked with it thanks to earlier ROCm work; in other cases the marker is still missing, or the deprecated `fork_new_process_for_each_test` is used instead (see the sketch after this section).
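As referenced above, here is a hedged sketch of how the decorator might choose the start method. The real implementation in `tests/utils.py` is richer; `current_platform` is vLLM's platform abstraction, and adding the `is_xpu()` branch is the change this PR describes.

```python
import functools
import multiprocessing
from typing import Callable, Optional

from vllm.platforms import current_platform


def create_new_process_for_each_test(method: Optional[str] = None):
    """Run the wrapped test in a freshly created subprocess."""
    if method is None:
        # ROCm already required 'spawn'; this PR adds XPU to the check.
        needs_spawn = current_platform.is_rocm() or current_platform.is_xpu()
        method = "spawn" if needs_spawn else "fork"

    def decorator(func: Callable) -> Callable:
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            ctx = multiprocessing.get_context(method)
            p = ctx.Process(target=func, args=args, kwargs=kwargs)
            p.start()
            p.join()
            assert p.exitcode == 0, f"subprocess exited with {p.exitcode}"

        return wrapper

    return decorator
```

A test then opts in with `@create_new_process_for_each_test()` rather than the deprecated `fork_new_process_for_each_test`, which is exactly the change applied to `test_cascade_attention.py`.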
Test Plan
This commit unlocks running a number of tests on XPU and allows looking into actual runtime issues. The behavior of the commit can be tried on these tests:

- `tests/v1/engine/test_llm_engine.py::test_engine_metrics`
- `tests/v1/e2e/test_cascade_attention.py`

Test Result
Error happening before the fix:
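```
RuntimeError: Cannot re-initialize XPU in forked subprocess. To use XPU with multiprocessing, you must use the 'spawn' start method
```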
After the fix:
vLLM core gets initialized and the tests start to run. Some pass and some fail; the commit basically unlocks testing on XPU, unveiling the actual issues.
CC: @Liangliang-Ma