build: Pin torch versions and skip lmcache installation for ARM build #2430
Conversation
/ok to test 9e888dd
/ok to test adfdc46
/ok to test db11f0d
/ok to test e80197d
/ok to test 0b1951d
Walkthrough
The install_vllm.sh script now gates LMCache installation by architecture (enabled on amd64, skipped on arm64) and updates the arm64 PyTorch dependencies to stable pinned wheels from the standard index, removing nightly channel usage.
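A minimal sketch of the resulting behavior, assembled from the diffs discussed in the review below (simplified; argument parsing and surrounding setup elided):

```bash
#!/usr/bin/env bash
# Simplified sketch only; assumes ARCH has already been resolved to amd64 or arm64
# and that uv is available, as in the real install_vllm.sh.
set -euo pipefail

if [ "$ARCH" = "amd64" ]; then
    # LMCache remains enabled on x86_64.
    uv pip install lmcache==0.3.3
else
    echo "Skipping LMCache on arm64 until the ARM build is validated."
fi

if [ "$ARCH" = "arm64" ]; then
    # Stable pinned wheels replace the nightly channel; with set -e the script
    # fails immediately if they cannot be resolved, per the ARM64 learning below.
    uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 \
        --index-url https://download.pytorch.org/whl
fi
```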
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as build script
    participant Install as install_vllm.sh
    participant Arch as Arch Detector
    participant LM as LMCache Installer
    participant PT as PyTorch Wheels
    User->>Install: Run install_vllm.sh
    Install->>Arch: Detect ARCH
    alt ARCH == amd64
        Install->>LM: Install LMCache
    else ARCH == arm64
        Install--xLM: Skip LMCache
    end
    Install->>PT: Install PyTorch/Audio/Vision
    note over PT: arm64 uses stable pins from standard index
```

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 0
🔭 Outside diff range comments (1)
container/deps/vllm/install_vllm.sh (1)
133-139: Fix messaging and ensure PyTorch wheels resolve correctly for cu128 on arm64.
- Message says “nightly” but you’re installing stable pins. Please correct the log.
- Use the cu128-specific index to reliably resolve “+cu128” wheels; the root index can lead to resolution surprises. Keep immediate exit on failure as per our ARM64 learning.
Apply this diff:
```diff
-    # Try to install specific PyTorch version first, fallback to latest nightly
-    echo "Attempting to install pinned PyTorch nightly versions..."
-    if ! uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl; then
+    # Install pinned stable PyTorch wheels for cu128 on arm64; fail fast if unavailable.
+    echo "Installing pinned stable PyTorch wheels (cu128) for arm64..."
+    if ! uv pip install \
+        --index-url https://download.pytorch.org/whl/cu128 \
+        --extra-index-url https://pypi.org/simple \
+        torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1; then
         echo "Pinned versions failed"
         exit 1
-        # uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
     fi
```
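A quick post-install sanity check (a hypothetical addition, not part of the suggested diff) can confirm that the cu128 wheel actually resolved:

```bash
# Hypothetical check after the pinned install; torch.version.cuda reports the CUDA
# toolkit version the wheel was built against (expect "12.8" for +cu128 wheels).
python3 - <<'PY'
import torch
print("torch:", torch.__version__)   # e.g. 2.7.1+cu128
print("cuda :", torch.version.cuda)  # e.g. 12.8
PY
```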
♻️ Duplicate comments (1)
container/deps/vllm/install_vllm.sh (1)
181-188: Pin flashinfer on arm64 for reproducibility.
On arm64 you install the latest "flashinfer-python" from PyPI without a version pin, while amd64 is pinned via a git ref. This can introduce non-determinism and break builds unexpectedly. Prior feedback also asked to freeze versions.
Apply this diff to allow optional pinning without changing current behavior:
```diff
-if [ "$ARCH" = "arm64" ]; then
-    uv pip install flashinfer-python
+if [ "$ARCH" = "arm64" ]; then
+    # Allow optional pin: set FLASHINFER_PYPI_SPEC="flashinfer-python==<version>" upstream to freeze.
+    uv pip install "${FLASHINFER_PYPI_SPEC:-flashinfer-python}"
 else
```
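If an upstream caller wants to use the optional pin, the invocation might look like this (FLASHINFER_PYPI_SPEC comes from the suggested diff; the version shown is a placeholder, not a validated release):

```bash
# Hypothetical caller-side pin; substitute a flashinfer-python version validated on arm64.
export FLASHINFER_PYPI_SPEC="flashinfer-python==0.2.8"
./container/deps/vllm/install_vllm.sh --arch arm64
```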
🧹 Nitpick comments (3)
container/deps/vllm/install_vllm.sh (3)
115-121: Good arch-gating for LMCache; tighten condition and fix misleading comment.
- The rationale mentions CUDA_HOME not set, but CUDA_HOME is exported at Line 103. The real issue is lack of a usable CUDA toolchain on arm64 images. Update the comment to avoid confusion.
- Optionally guard the amd64 install with a CUDA presence check to avoid failures on CUDA-less x86 images.
Apply this diff:
```diff
-if [ "$ARCH" = "amd64" ]; then
-    # Build issues with LMCache installation on arm64:
-    # OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
-    # TODO: Add it back once we have a working arm64 build.
-    # Install LMCache
-    uv pip install lmcache==0.3.3
-fi
+if [ "$ARCH" = "amd64" ]; then
+    # Skip LMCache on arm64 due to unresolved build/runtime issues.
+    # Re-enable after validating ARM support.
+    # Install LMCache only when CUDA is present.
+    if [ -n "${CUDA_HOME:-}" ] && [ -d "${CUDA_HOME}" ]; then
+        uv pip install lmcache==0.3.3
+    else
+        echo "Skipping LMCache install: CUDA_HOME directory not found at ${CUDA_HOME:-unset}"
+    fi
+fi
```
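For context, the CUDA_HOME export referenced above (around line 103 of the script) presumably resembles the following; this is a reconstruction from the review comment, not the verified script source:

```bash
# Hypothetical reconstruction: CUDA_HOME may be set to a default path even on images
# without a CUDA toolkit installed, which is why the directory check above matters.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
[ -d "$CUDA_HOME" ] || echo "Warning: CUDA_HOME points at a missing path: $CUDA_HOME"
```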
86-93: Help text defaults don’t match actual defaults.
- VLLM_REF default in code is “ba81acbd…”, not the value shown.
- INSTALLATION_DIR default is “/tmp”, not “/tmp/vllm”.
- DEEPGEMM_REF default is “03d0be3”, not “1876566”.
Apply this diff:
```diff
-    echo "  --vllm-ref REF           Git reference to checkout (default: f4135232b9a8c4845f8961fb1cd17581c56ae2ce)"
+    echo "  --vllm-ref REF           Git reference to checkout (default: ba81acbdc1eec643ba815a76628ae3e4b2263b76)"
     echo "  --max-jobs NUM           Maximum number of parallel jobs (default: 16)"
     echo "  --arch ARCH              Architecture (amd64|arm64, default: auto-detect)"
-    echo "  --installation-dir DIR   Directory to install vllm (default: /tmp/vllm)"
-    echo "  --deepgemm-ref REF       Git reference for DeepGEMM (default: 1876566)"
+    echo "  --installation-dir DIR   Directory to clone/build vllm (default: /tmp)"
+    echo "  --deepgemm-ref REF       Git reference for DeepGEMM (default: 03d0be3)"
     echo "  --flashinf-ref REF       Git reference for Flash Infer (default: v0.2.8rc1)"
     echo "  --torch-backend BACKEND  Torch backend to use (default: cu128)"
```
30-30: Consider defaulting torch backend to auto.
Given that vLLM's installer handles "--torch-backend=auto", defaulting to auto reduces maintenance when CUDA versions change across images. You can still override via the flag.
Apply this diff:
```diff
-TORCH_BACKEND="cu128"
+TORCH_BACKEND="auto"
```
Note: This doesn't affect users who pass "--torch-backend".
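Per the learnings recorded below, the flag is consumed when vLLM itself is installed via uv pip install; a minimal sketch of that call, assuming an editable checkout at ./vllm (the path and install line are assumptions, not part of this review's diff hunks):

```bash
# Sketch only: shows how the TORCH_BACKEND default would typically be forwarded.
TORCH_BACKEND="${TORCH_BACKEND:-auto}"
uv pip install -e ./vllm --torch-backend="$TORCH_BACKEND"
```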
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Applied to files:
container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
container/deps/vllm/install_vllm.sh
Manually building ARM on this branch to ensure no further issues.
…#2430) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Overview:
We should add it back once the functionality is also tested on ARM.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Bug Fixes
Chores