
Conversation

mgoin (Member) commented Jun 4, 2025

Purpose

We would like to make a vllm-tpu wheel that we can publish, so that users can eventually install it with an interface like uv pip install vllm-tpu. This PR achieves that with a custom script that applies a git patch file to override the name field in pyproject.toml and uses an environment variable, VLLM_VERSION_OVERRIDE, to set the version string in setup.py.
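
For illustration only, a minimal sketch of how the override could be consumed at build time, assuming it is read in setup.py before the normal version derivation. This is not the exact diff in this PR; setuptools-scm is only a guess at the fallback path, though it is listed as a build dependency below:

import os

def get_vllm_version() -> str:
    # Hypothetical helper: if the build script exports VLLM_VERSION_OVERRIDE,
    # use it verbatim; otherwise fall back to the normally derived version.
    override = os.environ.get("VLLM_VERSION_OVERRIDE")
    if override:
        return override
    from setuptools_scm import get_version
    return get_version()

With something like this in place, an invocation along the lines of VLLM_VERSION_OVERRIDE=0.9.0 VLLM_TARGET_DEVICE=tpu python -m build would stamp the wheel as 0.9.0.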

Test

> bash tools/vllm-tpu/build.sh 0.9.0
User defined version: 0.9.0
Modified patch for version override '0.9.0' written to /tmp/tmp.RXBBgaWVzg
Ensuring working directory (/home/mgoin/code/vllm-tpu/) is suitable for applying patch...
Checking if patch (vllm-tpu.patch from /tmp/tmp.RXBBgaWVzg) can be applied cleanly...
Applying patch vllm-tpu.patch (from /tmp/tmp.RXBBgaWVzg)...
Building wheel for TPU...
* Creating isolated environment: venv+pip...
...
adding 'vllm_tpu-0.9.0.dist-info/top_level.txt'
adding 'vllm_tpu-0.9.0.dist-info/RECORD'
removing build/bdist.linux-x86_64/wheel
Successfully built vllm_tpu-0.9.0.tar.gz and vllm_tpu-0.9.0-py3-none-any.whl
Cleaning up...
Reverting applied patch from vllm-tpu.patch (using /tmp/tmp.ULcxryZk9K)...
Temporary patch file removed.

Verified the vllm-tpu wheel has the right deps for a TPU build:

> ll dist/vllm_tpu-0.9.0-py3-none-any.whl
-rw-rw-r-- 1 mgoin mgoin 3.0M Jun  4 22:47 dist/vllm_tpu-0.9.0-py3-none-any.whl

> pkginfo -f requires_dist dist/vllm_tpu-0.9.0-py3-none-any.whl
requires_dist: ['regex', 'cachetools', 'psutil', 'sentencepiece', 'numpy', 'requests>=2.26.0', 'tqdm', 'blake3', 'py-cpuinfo', 'transformers>=4.51.1', 'huggingface-hub[hf_xet]>=0.32.0', 'tokenizers>=0.21.1', 'protobuf', 'fastapi[standard]>=0.115.0', 'aiohttp', 'openai>=1.52.0', 'pydantic>=2.10', 'prometheus_client>=0.18.0', 'pillow', 'prometheus-fastapi-instrumentator>=7.0.0', 'tiktoken>=0.6.0', 'lm-format-enforcer<0.11,>=0.10.11', 'llguidance<0.8.0,>=0.7.11; platform_machine == "x86_64" or platform_machine == "arm64" or platform_machine == "aarch64"', 'outlines==0.1.11', 'lark==1.2.2', 'xgrammar==0.1.19; platform_machine == "x86_64" or platform_machine == "aarch64"', 'typing_extensions>=4.10', 'filelock>=3.16.1', 'partial-json-parser', 'pyzmq>=25.0.0', 'msgspec', 'gguf>=0.13.0', 'importlib_metadata; python_version < "3.10"', 'mistral_common[opencv]>=1.5.4', 'opencv-python-headless>=4.11.0', 'pyyaml', 'six>=1.16.0; python_version > "3.11"', 'setuptools<80,>=77.0.3; python_version > "3.11"', 'einops', 'compressed-tensors==0.9.4', 'depyf==0.18.0', 'cloudpickle', 'watchfiles', 'python-json-logger', 'scipy', 'ninja', 'opentelemetry-sdk>=1.26.0', 'opentelemetry-api>=1.26.0', 'opentelemetry-exporter-otlp>=1.26.0', 'opentelemetry-semantic-conventions-ai>=0.4.1', 'cmake>=3.26.1', 'packaging>=24.2', 'setuptools-scm>=8', 'wheel', 'jinja2>=3.1.6', 'ray[default]', 'ray[data]', 'setuptools==78.1.0', 'torch==2.8.0.dev20250529', 'torchvision==0.22.0.dev20250529', 'torch_xla[pallas,tpu]@ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250529-cp39-cp39-linux_x86_64.whl ; python_version == "3.9"', 'torch_xla[pallas,tpu]@ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250529-cp310-cp310-linux_x86_64.whl ; python_version == "3.10"', 'torch_xla[pallas,tpu]@ https://storage.googleapis.com/pytorch-xla-releases/wheels/tpuvm/torch_xla-2.8.0.dev20250529-cp311-cp311-linux_x86_64.whl ; python_version == "3.11"', 'pandas; extra == "bench"', 'datasets; extra == "bench"', 'tensorizer>=2.9.0; extra == "tensorizer"', 'fastsafetensors>=0.1.10; extra == "fastsafetensors"', 'runai-model-streamer; extra == "runai"', 'runai-model-streamer-s3; extra == "runai"', 'boto3; extra == "runai"', 'librosa; extra == "audio"', 'soundfile; extra == "audio"']

cc @bvrockwell @QiliangCui

Signed-off-by: mgoin <michael@neuralmagic.com>
github-actions bot commented Jun 4, 2025

👋 Hi! Thank you for contributing to the vLLM project.

💬 Join our developer Slack at https://slack.vllm.ai to discuss your PR in #pr-reviews, coordinate on features in #feat- channels, or join special interest groups in #sig- channels.

Just a reminder: PRs do not trigger a full CI run by default. Instead, they only run fastcheck CI, which runs a small and essential subset of CI tests to quickly catch errors. You can run other CI tests on top of those by going to your fastcheck build on the Buildkite UI (linked in the PR checks section) and unblocking them. If you do not have permission to unblock, ping simon-mo or khluu to add you to our Buildkite org.

Once the PR is approved and ready to go, your PR reviewer(s) can run CI to test the changes comprehensively before merging.

To run CI, PR reviewers can either add the ready label to the PR or enable auto-merge.

🚀

gemini-code-assist bot (Contributor) left a comment

Hello @mgoin, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

Summary of Changes

Hello team, Gemini here, providing a summary for this pull request authored by @mgoin.

This PR introduces a new script and an accompanying patch file specifically designed to build a vLLM wheel targeted for TPU devices. The primary goal is to automate the process of creating a vllm-tpu Python package with a user-specified version, distinct from the standard vLLM package.

The script tools/vllm-tpu/build.sh takes the desired version as an argument, dynamically modifies the patch file to inject this version, applies the patch to the main vLLM source code (specifically pyproject.toml and setup.py), builds the wheel using python -m build with the VLLM_TARGET_DEVICE=tpu environment variable, and finally cleans up by reverting the patch and removing temporary files. The patch itself renames the package to vllm-tpu and adjusts the versioning logic in setup.py to accept the version from the patch rather than deriving it automatically, and also removes the automatic '+tpu' suffix from the version string.

The author has included test output demonstrating a successful build of vllm_tpu-0.9.0-py3-none-any.whl and verification of its dependencies using pkginfo, confirming the inclusion of torch_xla and other relevant packages.

Highlights

  • TPU Wheel Build Script: Adds a new bash script (tools/vllm-tpu/build.sh) to automate the process of building a vLLM wheel specifically for TPU environments.
  • Version Overriding: The build script allows specifying the desired version for the vllm-tpu wheel via a command-line argument, which is then injected into the build configuration using a temporary patch.
  • Package Renaming and Versioning Patch: Includes a patch file (tools/vllm-tpu/vllm-tpu.patch) that modifies pyproject.toml to rename the package to vllm-tpu and adjusts setup.py to use a placeholder version string and prevent the automatic '+tpu' suffix.

Changelog

  • tools/vllm-tpu/build.sh
    • Added a new executable bash script to build the vllm-tpu wheel.
    • Implements logic to take a version argument, modify a patch file with the version, apply the patch, run python -m build for the TPU target, and clean up.
    • Includes checks for script execution context, required arguments, patch file existence, and patch applicability.
    • Uses trap for robust cleanup even if the build fails.
  • tools/vllm-tpu/vllm-tpu.patch
    • Added a new patch file used by build.sh.
    • Modifies pyproject.toml to change the project name from vllm to vllm-tpu.
    • Modifies setup.py to replace the dynamic version retrieval with a "PUT_VERSION_HERE" placeholder.
    • Removes the logic in setup.py that appends +tpu to the version string for TPU builds.

A script to build with care,
A patch applied right there.
TPU wheel takes flight,
Dependencies just right,
Ready for the work to share.


gemini-code-assist bot (Contributor) left a comment

Code Review

This pull request introduces a bash script to build a vllm-tpu wheel, which is a valuable addition for TPU users. The script handles version overriding by patching setup.py and pyproject.toml temporarily. The overall approach is sound, with good use of temporary files, cleanup traps, and pre-checks before applying patches and building.

I've identified a couple of high-severity issues related to how the user-provided version string is handled in echo and sed commands, which could lead to command injection or incorrect patch application if the version string contains special characters. Addressing these will significantly improve the script's robustness and security.

Aside from these, the script is well-structured and includes helpful error messages.

Summary of Findings

  • Command Injection Vulnerability in echo: User-provided $USER_VERSION is used unquoted in echo statements (lines 42, 55), potentially leading to command injection if the version string contains shell metacharacters. This was commented on with high severity.
  • Incorrect sed Replacement: User-provided $USER_VERSION is not escaped for the sed replacement string (line 46), which can cause sed to misinterpret special characters like & and \, leading to a malformed patch. This was commented on with high severity. (A sed-free alternative is sketched after this list.)
  • Missing Newline at EOF: The script tools/vllm-tpu/build.sh is missing a newline character at the end of the file. This is a minor stylistic issue and was not commented on due to review settings (severity: low).
  • Redundant File Check in Cleanup: In tools/vllm-tpu/build.sh (line 64), the check if [ -f "$PATCH_FILE_TEMP" ] before rm -f "$PATCH_FILE_TEMP" is slightly redundant because rm -f handles non-existent files gracefully. This is a minor point and was not commented on due to review settings (severity: low).
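
For illustration, the sed finding could also be sidestepped entirely by injecting the version with plain string replacement instead of sed. The helper below is hypothetical (only the PUT_VERSION_HERE placeholder comes from the patch described above); it treats the user-supplied version as data, so no shell quoting or sed escaping rules apply:

import sys
from pathlib import Path

def render_patch(template_path: str, out_path: str, version: str) -> None:
    # Read the patch template and substitute the placeholder literally;
    # characters like '&', '\' or '/' in the version need no escaping here.
    text = Path(template_path).read_text()
    Path(out_path).write_text(text.replace("PUT_VERSION_HERE", version))

if __name__ == "__main__":
    render_patch(sys.argv[1], sys.argv[2], sys.argv[3])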

Merge Readiness

The pull request is a good step towards providing TPU-specific builds. However, there are critical security and correctness concerns regarding the handling of the USER_VERSION variable in the build.sh script. These issues, flagged with high severity, should be addressed before this PR is merged to ensure the script is robust and safe to use. Once these are resolved, the PR should be in good shape. As a reviewer, I am not authorized to approve pull requests; please ensure further review and approval from authorized maintainers after addressing the feedback.

mgoin changed the title from "Add script to build vllm-tpu wheel" to "Add tool to build vllm-tpu wheel" Jun 4, 2025
QiliangCui (Contributor) commented:

Thank you Michael for the fix!

A file-naming question: since the code is already under the tools/vllm-tpu folder, would it be better to rename vllm-tpu.patch to build.patch, indicating the patch is for the build only?

Signed-off-by: mgoin <mgoin64@gmail.com>
mgoin changed the title from "Add tool to build vllm-tpu wheel" to "[CI/Build] Add tool to build vllm-tpu wheel" Jul 1, 2025
mgoin marked this pull request as ready for review Jul 1, 2025 22:07
mergify bot added the ci/build label Jul 1, 2025
mgoin (Member, Author) commented Jul 1, 2025

I've simplified the implementation by adding the VLLM_VERSION_OVERRIDE env var and inlining the vllm-tpu name change, PTAL

yaochengji (Collaborator) left a comment

LGTM, thanks for adding this, Michael!

I tested it locally and it works!

pip install dist/vllm_tpu-0.9.2.dev360+g104a8a327.d20250710.tpu.tpu-py3-none-any.whl  --no-deps
python examples/offline_inference/tpu.py

simon-mo (Collaborator) left a comment

This is a good first step!

mgoin added the ready (ONLY add when PR is ready to merge/full CI is needed) and tpu (Related to Google TPUs) labels Jul 23, 2025
mergify bot removed the tpu (Related to Google TPUs) label Jul 23, 2025
jcyang43 (Contributor) commented:

Hi @mgoin, thanks again for the PR!

For places that get metadata from vllm (e.g. importlib.metadata.metadata("vllm") or importlib.metadata.version('vllm')), it looks like we also need to change them from "vllm" to "vllm-tpu". Otherwise we'll get errors like the following:

Traceback (most recent call last):
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 397, in from_name
    return next(cls.discover(name=name))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
StopIteration

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/bin/vllm", line 7, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/site-packages/vllm/entrypoints/cli/main.py", line 39, in main
    version=importlib.metadata.version('vllm'),
            ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 889, in version
    return distribution(distribution_name).version
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 862, in distribution
    return Distribution.from_name(distribution_name)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/johnnyyang_google_com/miniconda3/envs/prod/lib/python3.12/importlib/metadata/__init__.py", line 399, in from_name
    raise PackageNotFoundError(name)
importlib.metadata.PackageNotFoundError: No package metadata was found for vllm
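
For illustration, one workaround (not part of this PR) would be to make such lookups tolerant of either distribution name:

from importlib.metadata import PackageNotFoundError, version

def vllm_dist_version() -> str:
    # Try the TPU wheel's distribution name first, then fall back to vllm,
    # so the CLI entry point works regardless of which wheel is installed.
    for dist_name in ("vllm-tpu", "vllm"):
        try:
            return version(dist_name)
        except PackageNotFoundError:
            continue
    raise PackageNotFoundError("vllm")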

mgoin (Member, Author) commented Sep 26, 2025

@jcyang43 would you be able to share a patch to fix this? I can merge this PR first and you can open the changes as a follow-up.

jcyang43 (Contributor) commented Oct 2, 2025

@jcyang43 would you be able to share a patch to fix this? I can merge this PR first and you can open the changes as a follow-up.

Hi Michael, you can use this patch on your current version. It changes 'vllm' to 'vllm-tpu' for those metadata usages and then restores them at the end.

mgoin merged commit 7ef6052 into vllm-project:main Oct 12, 2025
70 checks passed
