[CI/Build] Add tool to build vllm-tpu wheel #19165
Conversation
Signed-off-by: mgoin <michael@neuralmagic.com>
👋 Hi! Thank you for contributing to the vLLM project. 💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels. Just a reminder: PRs do not trigger a full CI run by default; only a limited subset of checks runs automatically. Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging. To run CI, PR reviewers can either: Add 🚀
Hello @mgoin, I'm Gemini Code Assist¹! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
Summary of Changes
Hello team, Gemini here, providing a summary for this pull request authored by @mgoin.
This PR introduces a new script and an accompanying patch file specifically designed to build a vLLM wheel targeted for TPU devices. The primary goal is to automate the process of creating a vllm-tpu Python package with a user-specified version, distinct from the standard vLLM package.
The script tools/vllm-tpu/build.sh takes the desired version as an argument, dynamically modifies the patch file to inject this version, applies the patch to the main vLLM source code (specifically pyproject.toml and setup.py), builds the wheel using python -m build with the VLLM_TARGET_DEVICE=tpu environment variable, and finally cleans up by reverting the patch and removing temporary files. The patch itself renames the package to vllm-tpu and adjusts the versioning logic in setup.py to accept the version from the patch rather than deriving it automatically, and also removes the automatic '+tpu' suffix from the version string.
The author has included test output demonstrating a successful build of vllm_tpu-0.9.0-py3-none-any.whl and verification of its dependencies using pkginfo, confirming the inclusion of torch_xla and other relevant packages.
Highlights
- TPU Wheel Build Script: Adds a new bash script (`tools/vllm-tpu/build.sh`) to automate the process of building a vLLM wheel specifically for TPU environments.
- Version Overriding: The build script allows specifying the desired version for the `vllm-tpu` wheel via a command-line argument, which is then injected into the build configuration using a temporary patch.
- Package Renaming and Versioning Patch: Includes a patch file (`tools/vllm-tpu/vllm-tpu.patch`) that modifies `pyproject.toml` to rename the package to `vllm-tpu` and adjusts `setup.py` to use a placeholder version string and prevent the automatic '+tpu' suffix.

Changelog
- `tools/vllm-tpu/build.sh`
  - Added a new executable bash script to build the vllm-tpu wheel.
  - Implements logic to take a version argument, modify a patch file with the version, apply the patch, run `python -m build` for the TPU target, and clean up.
  - Includes checks for script execution context, required arguments, patch file existence, and patch applicability.
  - Uses `trap` for robust cleanup even if the build fails.
- `tools/vllm-tpu/vllm-tpu.patch`
  - Added a new patch file used by `build.sh`.
  - Modifies `pyproject.toml` to change the project name from `vllm` to `vllm-tpu`.
  - Modifies `setup.py` to replace the dynamic version retrieval with a `"PUT_VERSION_HERE"` placeholder.
  - Removes the logic in `setup.py` that appends `+tpu` to the version string for TPU builds.
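The build flow described in the changelog above can be sketched roughly as follows. This is a hypothetical reconstruction, not the script itself: the function names (`render_patch`, `build_tpu_wheel`) and any details beyond those listed in the changelog are illustrative assumptions, while the file paths and the `PUT_VERSION_HERE` placeholder come from the PR description.

```shell
#!/bin/bash
# Hypothetical sketch of the build.sh flow; only the file names and the
# PUT_VERSION_HERE placeholder are taken from the PR, the rest is assumed.

PATCH_FILE="tools/vllm-tpu/vllm-tpu.patch"

# Render a temporary copy of the patch with the requested version injected
# in place of the PUT_VERSION_HERE placeholder.
render_patch() {
  local version="$1" temp_patch="$2"
  sed "s/PUT_VERSION_HERE/${version}/" "$PATCH_FILE" > "$temp_patch"
}

# Apply the patch, build the TPU wheel, and always clean up via trap,
# reverting the patch even if the build fails partway through.
build_tpu_wheel() {
  local version="$1"
  local temp_patch
  temp_patch="$(mktemp)"
  trap 'git apply -R "$temp_patch" 2>/dev/null; rm -f "$temp_patch"' EXIT
  render_patch "$version" "$temp_patch"
  git apply --check "$temp_patch"   # verify applicability before touching the tree
  git apply "$temp_patch"
  VLLM_TARGET_DEVICE=tpu python -m build --wheel
}
```

Defining the steps as functions keeps the revert-and-remove cleanup in one `trap`, so a failed `python -m build` cannot leave the source tree patched.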
Using Gemini Code Assist
The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.
Invoking Gemini
You can request assistance from Gemini at any point in your pull request via creating an issue comment (i.e. comment on the pull request page) using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands.
| Feature | Command | Description |
|---|---|---|
| Code Review | `/gemini review` | Performs a code review for the current pull request in its current state. |
| Pull Request Summary | `/gemini summary` | Provides a summary of the current pull request in its current state. |
| Comment | `@gemini-code-assist` | Responds in comments when explicitly tagged, both in issue comments and review comments. |
| Help | `/gemini help` | Displays a list of available commands. |
Customization
To customize the Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.
Limitations & Feedback
Gemini Code Assist is currently in preview and may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counterproductive. You can react with 👍 and 👎 on @gemini-code-assist comments to provide feedback.
You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.
A script to build with care,
A patch applied right there.
TPU wheel takes flight,
Dependencies just right,
Ready for the work to share.
Footnotes
¹ Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.
Code Review
This pull request introduces a bash script to build a vllm-tpu wheel, which is a valuable addition for TPU users. The script handles version overriding by patching setup.py and pyproject.toml temporarily. The overall approach is sound, with good use of temporary files, cleanup traps, and pre-checks before applying patches and building.
I've identified a couple of high-severity issues related to how the user-provided version string is handled in echo and sed commands, which could lead to command injection or incorrect patch application if the version string contains special characters. Addressing these will significantly improve the script's robustness and security.
Aside from these, the script is well-structured and includes helpful error messages.
Summary of Findings
- Command Injection Vulnerability in `echo`: User-provided `$USER_VERSION` is used unquoted in `echo` statements (lines 42, 55), potentially leading to command injection if the version string contains shell metacharacters. This was commented on with high severity.
- Incorrect `sed` Replacement: User-provided `$USER_VERSION` is not escaped for the `sed` replacement string (line 46), which can cause `sed` to misinterpret special characters like `&` and `\`, leading to a malformed patch. This was commented on with high severity.
- Missing Newline at EOF: The script `tools/vllm-tpu/build.sh` is missing a newline character at the end of the file. This is a minor stylistic issue and was not commented on due to review settings (severity: low).
- Redundant File Check in Cleanup: In `tools/vllm-tpu/build.sh` (line 64), the check `if [ -f "$PATCH_FILE_TEMP" ]` before `rm -f "$PATCH_FILE_TEMP"` is slightly redundant because `rm -f` handles non-existent files gracefully. This is a minor point and was not commented on due to review settings (severity: low).
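For the `sed` finding above, a minimal sketch of one possible fix is to escape `&` and `\` (the characters `sed` treats specially in a replacement string) before substituting the user-supplied version. The variable names here are illustrative; a version containing the `/` delimiter would still need separate handling.

```shell
# Escape '&' and '\' in the user-supplied version so sed takes them
# literally in the replacement string.
USER_VERSION='1.0&\x'
ESCAPED_VERSION=$(printf '%s' "$USER_VERSION" | sed -e 's/[&\\]/\\&/g')

# Substitute into the placeholder line; the '&' and '\' survive intact.
printf 'version = "PUT_VERSION_HERE"\n' | sed "s/PUT_VERSION_HERE/${ESCAPED_VERSION}/"
```

Quoting `$USER_VERSION` everywhere (including in the flagged `echo` statements) addresses the related word-splitting and globbing concern.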
Merge Readiness
The pull request is a good step towards providing TPU-specific builds. However, there are critical security and correctness concerns regarding the handling of the USER_VERSION variable in the build.sh script. These issues, flagged with high severity, should be addressed before this PR is merged to ensure the script is robust and safe to use. Once these are resolved, the PR should be in good shape. As a reviewer, I am not authorized to approve pull requests; please ensure further review and approval from authorized maintainers after addressing the feedback.
Thank you, Michael, for the fix! A file naming thing: since the code is already under the tools/vllm-tpu folder, is it better to rename
Signed-off-by: mgoin <mgoin64@gmail.com>
I've simplified the implementation by adding the `VLLM_VERSION_OVERRIDE` environment variable.
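The `VLLM_VERSION_OVERRIDE` environment variable described in the PR follows the common pattern of preferring an env var when set and falling back to a derived default otherwise. A minimal shell sketch of that pattern (the default value here is illustrative, not vLLM's actual versioning logic):

```shell
# Use the override when the environment variable is set and non-empty,
# otherwise fall back to a default version derived elsewhere.
DEFAULT_VERSION="0.9.0.dev0"
VERSION="${VLLM_VERSION_OVERRIDE:-$DEFAULT_VERSION}"
echo "version: $VERSION"
```

In `setup.py` itself the equivalent lookup would read the variable from the process environment at build time.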
yaochengji
left a comment
LGTM, thanks for adding this, Michael!
I tested it locally and it can work!
```shell
pip install dist/vllm_tpu-0.9.2.dev360+g104a8a327.d20250710.tpu.tpu-py3-none-any.whl --no-deps
python examples/offline_inference/tpu.py
```
simon-mo
left a comment
This is a good first step!
Hi @mgoin, thanks again for the PR! For places that are getting metadata from vllm (ex:
@jcyang43 would you be able to share a patch to fix this? I can merge this PR first and you can open the changes too.
Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: 1994 <1994@users.noreply.github.com>
Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: Dhruvil Bhatt <bhattdbh@amazon.com>
Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: bbartels <benjamin@bartels.dev>
Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com>
Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: xuebwang-amd <xuebwang@amd.com>
Signed-off-by: mgoin <michael@neuralmagic.com> Signed-off-by: mgoin <mgoin64@gmail.com> Signed-off-by: 0xrushi <6279035+0xrushi@users.noreply.github.com>
Purpose
We would like to make a `vllm-tpu` wheel that we can publish so eventually users can have an interface like `uv pip install vllm-tpu`. This PR achieves it by making a custom script that applies a git patch file to override the name field in the `pyproject.toml` and an environment variable `VLLM_VERSION_OVERRIDE` to set the version string in `setup.py`.

Test
Verified the vllm-tpu wheel has the right deps for a TPU build:
cc @bvrockwell @QiliangCui