
Conversation

mgoin (Member) commented Jun 4, 2025

Purpose

We would like to make a vllm-tpu wheel that we can publish, so that users can eventually install it with an interface like uv pip install vllm-tpu. This PR achieves that with a custom script that applies a git patch file to override the name field in pyproject.toml and uses an environment variable, VLLM_VERSION_OVERRIDE, to set the version string in setup.py.
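
For illustration only, a minimal sketch of how the override could be consumed at build time, assuming it is read in setup.py before the normal version derivation. This is not the exact diff in this PR; setuptools-scm is only a guess at the fallback path, though it is listed as a build dependency below:

import os

def get_vllm_version() -> str:
    # Hypothetical helper: if the build script exports VLLM_VERSION_OVERRIDE,
    # use it verbatim; otherwise fall back to the normally derived version.
    override = os.environ.get("VLLM_VERSION_OVERRIDE")
    if override:
        return override
    from setuptools_scm import get_version
    return get_version()

With something like this in place, an invocation along the lines of VLLM_VERSION_OVERRIDE=0.9.0 VLLM_TARGET_DEVICE=tpu python -m build would stamp the wheel as 0.9.0.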

Test

> bash tools/vllm-tpu/build.sh 0.9.0
User defined version: 0.9.0
Modified patch for version override '0.9.0' written to /tmp/tmp.RXBBgaWVzg
Ensuring working directory (/home/mgoin/code/vllm-tpu/) is suitable for applying patch...
Checking if patch (vllm-tpu.patch from /tmp/tmp.RXBBgaWVzg) can be applied cleanly...
Applying patch vllm-tpu.patch (from /tmp/tmp.RXBBgaWVzg)...
Building wheel for TPU...
* Creating isolated environment: venv+pip...
...
adding 'vllm_tpu-0.9.0.dist-info/top_level.txt'
adding 'vllm_tpu-0.9.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Successfully built vllm_tpu-0.9.0.tar.gz and vllm_tpu-0.9.0-py3-none-any.whl
Cleaning up...
Reverting applied patch from vllm-tpu.patch (using /tmp/tmp.ULcxryZk9K)...
Temporary patch file removed.

Verified the vllm-tpu wheel has the right deps for a TPU build:

> ll dist/vllm_tpu-0.9.0-py3-none-any.whl
-rw-rw-r-- 1 mgoin mgoin 3.0M Jun  4 22:47 dist/vllm_tpu-0.9.0-py3-none-any.whl

> pkginfo -f requires_dist dist/vllm_tpu-0.9.0-py3-none-any.whl
requires_dist: ['regex', 'cachetools', 'psutil', 'sentencepiece', 'numpy', 'requests>=2.26.0', 'tqdm', 'blake3', 'py-cpuinfo', 'transformers>=4.51.1', 'huggingface-hub[hf_xet]>=0.32.0', 'tokenizers>=0.21.1', 'protobuf', 'fastapi[standard]>=0.115.0', 'aiohttp', 'openai>=1.52.0', 'pydantic>=2.10', 'prometheus_client>=0.18.0', 'pillow', 'prometheus-fastapi-instrumentator>=7.0.0', 'tiktoken>=0.6.0', 'lm-format-enforcer<0.11,>=0.10.11', 'llguidance<0.8.0,>=0.7.11; platform_machine == "x86_64" or platform_machine == "arm64" or platform_machine == "aarch64"', 'outlines==0.1.11', 'lark==1.2.2', 'xgrammar==0.1.19; platform_machine == "x86_64" or platform_machine == "aarch64"', 'typing_extensions>=4.10', 'filelock>=3.16.1', 'partial-json-parser', 'pyzmq>=25.0.0', 'msgspec', 'gguf>=0.13.0', 'importlib_metadata; python_version < "3.10"', 'mistral_common[opencv]>=1.5.4', 'opencv-python-headless>=4.11.0', 'pyyaml', 'six>=1.16.0; python_version > "3.11"', 'setuptools<80,>=77.0.3; python_version > "3.11"', 'einops', 'compressed-tensors==0.9.4', 'depyf==0.18.0', 'cloudpickle', 'watchfiles', 'python-json-logger', 'scipy', 'ninja', 'opentelemetry-sdk>=1.26.0', 'opentelemetry-api>=1.26.0', 'opentelemetry-exporter-otlp>=1.26.0', 'opentelemetry-semantic-conventions-ai>=0.4.1', 'cmake>=3.26.1', 'packaging>=24.2', 'setuptools-scm>=8', 'wheel', 'jinja2>=3.1.6', 'ray[default]', 'ray[data]', 'setuptools==78.1.0', 'torch==2.8.0.dev20250529', 'torchvision==0.22.0.dev20250529', 'torch_xla[pallas,tpu]@ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250529-cp39-cp39-linux_x86_64.whl ; python_version == "3.9"', 'torch_xla[pallas,tpu]@ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250529-cp310-cp310-linux_x86_64.whl ; python_version == "3.10"', 'torch_xla[pallas,tpu]@ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250529-cp311-cp311-linux_x86_64.whl ; python_version == "3.11"', 'pandas; extra == "bench"', 'datasets; extra == "bench"', 'tensorizer>=2.9.0; extra == "tensorizer"', 'fastsafetensors>=0.1.10; extra == "fastsafetensors"', 'runai-model-streamer; extra == "runai"', 'runai-model-streamer-s3; extra == "runai"', 'boto3; extra == "runai"', 'librosa; extra == "audio"', 'soundfile; extra == "audio"']

cc @bvrockwell @QiliangCui

Signed-off-by: mgoin <michael@neuralmagic.com>
github-actions bot commented Jun 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist bot (Contributor) left a comment

Hello @mgoin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here, providing a summary for this pull request authored by @mgoin.

This PR introduces a new script and an accompanying patch file specifically designed to build a vLLM wheel targeted for TPU devices. The primary goal is to automate the process of creating a vllm-tpu Python package with a user-specified version, distinct from the standard vLLM package.

The script tools/vllm-tpu/build.sh takes the desired version as an argument, dynamically modifies the patch file to inject this version, applies the patch to the main vLLM source code (specifically pyproject.toml and setup.py), builds the wheel using python -m build with the VLLM_TARGET_DEVICE=tpu environment variable, and finally cleans up by reverting the patch and removing temporary files. The patch itself renames the package to vllm-tpu and adjusts the versioning logic in setup.py to accept the version from the patch rather than deriving it automatically, and also removes the automatic '+tpu' suffix from the version string.

The author has included test output demonstrating a successful build of vllm_tpu-0.9.0-py3-none-any.whl and verification of its dependencies using pkginfo, confirming the inclusion of torch_xla and other relevant packages.

Highlights

  • TPU Wheel Build Script: Adds a new bash script (tools/vllm-tpu/build.sh) to automate the process of building a vLLM wheel specifically for TPU environments.
  • Version Overriding: The build script allows specifying the desired version for the vllm-tpu wheel via a command-line argument, which is then injected into the build configuration using a temporary patch.
  • Package Renaming and Versioning Patch: Includes a patch file (tools/vllm-tpu/vllm-tpu.patch) that modifies pyproject.toml to rename the package to vllm-tpu and adjusts setup.py to use a placeholder version string and prevent the automatic '+tpu' suffix.

Changelog

  • tools/vllm-tpu/build.sh
    • Added a new executable bash script to build the vllm-tpu wheel.
    • Implements logic to take a version argument, modify a patch file with the version, apply the patch, run python -m build for the TPU target, and clean up.
    • Includes checks for script execution context, required arguments, patch file existence, and patch applicability.
    • Uses trap for robust cleanup even if the build fails.
  • tools/vllm-tpu/vllm-tpu.patch
    • Added a new patch file used by build.sh.
    • Modifies pyproject.toml to change the project name from vllm to vllm-tpu.
    • Modifies setup.py to replace the dynamic version retrieval with a "PUT_VERSION_HERE" placeholder.
    • Removes the logic in setup.py that appends +tpu to the version string for TPU builds.

A script to build with care,
A patch applied right there.
TPU wheel takes flight,
Dependencies just right,
Ready for the work to share.


gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a bash script to build a vllm-tpu wheel, which is a valuable addition for TPU users. The script handles version overriding by patching setup.py and pyproject.toml temporarily. The overall approach is sound, with good use of temporary files, cleanup traps, and pre-checks before applying patches and building.

I've identified a couple of high-severity issues related to how the user-provided version string is handled in echo and sed commands, which could lead to command injection or incorrect patch application if the version string contains special characters. Addressing these will significantly improve the script's robustness and security.

Aside from these, the script is well-structured and includes helpful error messages.

Summary of Findings

  • Command Injection Vulnerability in echo: User-provided $USER_VERSION is used unquoted in echo statements (lines 42, 55), potentially leading to command injection if the version string contains shell metacharacters. This was commented on with high severity.
  • Incorrect sed Replacement: User-provided $USER_VERSION is not escaped for the sed replacement string (line 46), which can cause sed to misinterpret special characters like & and \, leading to a malformed patch. This was commented on with high severity. (A sed-free alternative is sketched after this list.)
  • Missing Newline at EOF: The script tools/vllm-tpu/build.sh is missing a newline character at the end of the file. This is a minor stylistic issue and was not commented on due to review settings (severity: low).
  • Redundant File Check in Cleanup: In tools/vllm-tpu/build.sh (line 64), the check if [ -f "$PATCH_FILE_TEMP" ] before rm -f "$PATCH_FILE_TEMP" is slightly redundant because rm -f handles non-existent files gracefully. This is a minor point and was not commented on due to review settings (severity: low).
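
For illustration, the sed finding could also be sidestepped entirely by injecting the version with plain string replacement instead of sed. The helper below is hypothetical (only the PUT_VERSION_HERE placeholder comes from the patch described above); it treats the user-supplied version as data, so no shell quoting or sed escaping rules apply:

import sys
from pathlib import Path

def render_patch(template_path: str, out_path: str, version: str) -> None:
    # Read the patch template and substitute the placeholder literally;
    # characters like '&', '\' or '/' in the version need no escaping here.
    text = Path(template_path).read_text()
    Path(out_path).write_text(text.replace("PUT_VERSION_HERE", version))

if __name__ == "__main__":
    render_patch(sys.argv[1], sys.argv[2], sys.argv[3])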

Merge Readiness

The pull request is a good step towards providing TPU-specific builds. However, there are critical security and correctness concerns regarding the handling of the USER_VERSION variable in the build.sh script. These issues, flagged with high severity, should be addressed before this PR is merged to ensure the script is robust and safe to use. Once these are resolved, the PR should be in good shape. As a reviewer, I am not authorized to approve pull requests; please ensure further review and approval from authorized maintainers after addressing the feedback.

mgoin changed the title from "Add script to build vllm-tpu wheel" to "Add tool to build vllm-tpu wheel" Jun 4, 2025
QiliangCui (Contributor) commented:

Thank you Michael for the fix!

A file-naming question: since the code is already under the tools/vllm-tpu folder, would it be better to rename vllm-tpu.patch to build.patch, indicating the patch is for the build only?

Signed-off-by: mgoin <mgoin64@gmail.com>
mgoin changed the title from "Add tool to build vllm-tpu wheel" to "[CI/Build] Add tool to build vllm-tpu wheel" Jul 1, 2025
mgoin marked this pull request as ready for review Jul 1, 2025 22:07
mergify bot added the ci/build label Jul 1, 2025
mgoin (Member, Author) commented Jul 1, 2025

I've simplified the implementation by adding the VLLM_VERSION_OVERRIDE env var and inlining the vllm-tpu name change, PTAL

yaochengji (Collaborator) left a comment

LGTM, thanks for adding this, Michael!

I tested it locally and it works!

pip install dist/vllm_tpu-0.9.2.dev360+g104a8a327.d20250710.tpu.tpu-py3-none-any.whl  --no-deps
python examples/offline_inference/tpu.py

simon-mo (Collaborator) left a comment

This is a good first step!

mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and tpu (Related to Google TPUs) labels Jul 23, 2025
mergify bot removed the tpu (Related to Google TPUs) label Jul 23, 2025
jcyang43 (Contributor) commented:

Hi @mgoin, thanks again for the PR!

For places that get metadata from vllm (e.g. importlib.metadata.metadata("vllm") or importlib.metadata.version('vllm')), it looks like we also need to change them from "vllm" to "vllm-tpu". Otherwise we'll get errors like the following:

Traceback (most recent call last):
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 397, in from_name
    return next(cls.discover(name=name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/bin/vllm", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 39, in main
    version=importlib.metadata.version('vllm'),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 889, in version
    return distribution(distribution_name).version
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 862, in distribution
    return Distribution.from_name(distribution_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 399, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for vllm
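
For illustration, one workaround (not part of this PR) would be to make such lookups tolerant of either distribution name:

from importlib.metadata import PackageNotFoundError, version

def vllm_dist_version() -> str:
    # Try the TPU wheel's distribution name first, then fall back to vllm,
    # so the CLI entry point works regardless of which wheel is installed.
    for dist_name in ("vllm-tpu", "vllm"):
        try:
            return version(dist_name)
        except PackageNotFoundError:
            continue
    raise PackageNotFoundError("vllm")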

mgoin (Member, Author) commented Sep 26, 2025

@jcyang43 would you be able to share a patch to fix this? I can merge this PR first and you can open the changes as a follow-up.

jcyang43 (Contributor) commented Oct 2, 2025

@jcyang43 would you be able to share a patch to fix this? I can merge this PR first and you can open the changes as a follow-up.

Hi Michael, you can use this patch on your current version. It changes 'vllm' to 'vllm-tpu' for those metadata usages and then restores them at the end.

mgoin merged commit 7ef6052 into vllm-project:main Oct 12, 2025
70 checks passed
