
Conversation

@krishung5
Contributor

@krishung5 krishung5 commented Aug 13, 2025

Overview:

  • Pin the versions of torch, torchaudio, and torchvision to align with what the x86 build uses. The previously referenced binaries were removed from the index.
  • Skip LMCache installation for the ARM build for now to avoid the following error:
OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.

We should add it back once the functionality has also been tested on ARM.
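
Concretely, the change amounts to something like the following in container/deps/vllm/install_vllm.sh (a simplified sketch based on the diffs discussed in the review below, not the verbatim script):

# LMCache: amd64 only for now; the arm64 build hits the CUDA_HOME error above.
if [ "$ARCH" = "amd64" ]; then
    uv pip install lmcache==0.3.3
fi

# arm64: pinned stable PyTorch wheels instead of the removed nightly binaries.
if [ "$ARCH" = "arm64" ]; then
    uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 \
        --index-url https://download.pytorch.org/whl
fi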

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • Bug Fixes

    • Improved arm64 reliability by skipping LMCache where CUDA-related issues occur.
    • Replaced arm64 nightly PyTorch packages with stable releases to prevent installation breakages.
  • Chores

    • Made LMCache installation architecture-aware, keeping it enabled on amd64.
    • Standardized dependency source to the official PyTorch index on arm64.
    • Pinned stable versions on arm64 for consistency (torch 2.7.1+cu128, torchvision 0.22.1, torchaudio 2.7.1).

@krishung5
Contributor Author

/ok to test 9e888dd

@krishung5
Contributor Author

/ok to test adfdc46

@krishung5
Contributor Author

/ok to test db11f0d

@krishung5
Contributor Author

/ok to test e80197d

@krishung5 krishung5 changed the title from "build: Pin torch versions for arm" to "build: Pin torch versions and skip lmcache installation for ARM build" Aug 13, 2025
@krishung5
Contributor Author

/ok to test 0b1951d

@krishung5 krishung5 marked this pull request as ready for review August 14, 2025 01:45
@coderabbitai
Contributor

coderabbitai bot commented Aug 14, 2025

Walkthrough

The install_vllm.sh script now gates LMCache installation by architecture (enabled on amd64, skipped on arm64) and updates arm64 PyTorch dependencies to stable pinned wheels from the standard index, removing nightly channel usage.

Changes

Cohort / File(s): vLLM install script (container/deps/vllm/install_vllm.sh)
Summary of Changes:
  • Add ARCH check: install LMCache only on amd64; skip on arm64 with a TODO note
  • Replace arm64 nightly PyTorch/torchaudio/torchvision pins with stable versions from the standard wheel index

Sequence Diagram(s)

sequenceDiagram
    participant User as build script
    participant Install as install_vllm.sh
    participant Arch as Arch Detector
    participant LM as LMCache Installer
    participant PT as PyTorch Wheels

    User->>Install: Run install_vllm.sh
    Install->>Arch: Detect ARCH
    alt ARCH == amd64
        Install->>LM: Install LMCache
    else ARCH == arm64
        Install--xLM: Skip LMCache
    end
    Install->>PT: Install PyTorch/Audio/Vision
    note over PT: arm64 uses stable pins from standard index
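
For readers unfamiliar with the script, the gate above depends on the script's architecture detection. The exact mechanism is not shown in this PR, so the uname -m mapping below is an assumption, offered only as a minimal sketch:

# Assumed sketch: auto-detect architecture when --arch is not passed.
if [ -z "${ARCH:-}" ]; then
    case "$(uname -m)" in
        x86_64)  ARCH="amd64" ;;
        aarch64) ARCH="arm64" ;;
        *) echo "Unsupported architecture: $(uname -m)" >&2; exit 1 ;;
    esac
fi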

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~8 minutes

Possibly related PRs

Poem

A whisk of bash, a hop of care,
AMD64 gets caches fair;
ARM64, we pause today,
Stable wheels to light the way.
I twitch my nose, review is slight—
Dependencies pinned, all snug and right. 🐰


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 0

🔭 Outside diff range comments (1)
container/deps/vllm/install_vllm.sh (1)

133-139: Fix messaging and ensure PyTorch wheels resolve correctly for cu128 on arm64.

  • Message says “nightly” but you’re installing stable pins. Please correct the log.
  • Use the cu128-specific index to reliably resolve “+cu128” wheels; the root index can lead to resolution surprises. Keep immediate exit on failure as per our ARM64 learning.

Apply this diff:

-    # Try to install specific PyTorch version first, fallback to latest nightly
-    echo "Attempting to install pinned PyTorch nightly versions..."
-    if ! uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl; then
+    # Install pinned stable PyTorch wheels for cu128 on arm64; fail fast if unavailable.
+    echo "Installing pinned stable PyTorch wheels (cu128) for arm64..."
+    if ! uv pip install \
+        --index-url https://download.pytorch.org/whl/cu128 \
+        --extra-index-url https://pypi.org/simple \
+        torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1; then
         echo "Pinned versions failed"
         exit 1
-        # uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
     fi
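
A quick post-install sanity check can confirm that the cu128 wheel actually resolved; this one-liner is illustrative only and not part of the script:

python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
# Expected to print something along the lines of: 2.7.1+cu128 12.8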
♻️ Duplicate comments (1)
container/deps/vllm/install_vllm.sh (1)

181-188: Pin flashinfer on arm64 for reproducibility.

On arm64 you install the latest “flashinfer-python” from PyPI without a version pin, while amd64 is pinned via a git ref. This can introduce non-determinism and break builds unexpectedly. Prior feedback also asked to freeze versions.

Apply this diff to allow optional pinning without changing current behavior:

-if [ "$ARCH" = "arm64" ]; then
-    uv pip install flashinfer-python
+if [ "$ARCH" = "arm64" ]; then
+    # Allow optional pin: set FLASHINFER_PYPI_SPEC="flashinfer-python==<version>" upstream to freeze.
+    uv pip install "${FLASHINFER_PYPI_SPEC:-flashinfer-python}"
 else
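
If the optional pin is adopted, a caller could freeze the wheel by setting the variable when invoking the script (the version string below is purely illustrative):

FLASHINFER_PYPI_SPEC="flashinfer-python==0.2.8" ./install_vllm.sh --arch arm64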
🧹 Nitpick comments (3)
container/deps/vllm/install_vllm.sh (3)

115-121: Good arch-gating for LMCache; tighten condition and fix misleading comment.

  • The rationale mentions CUDA_HOME not set, but CUDA_HOME is exported at Line 103. The real issue is lack of a usable CUDA toolchain on arm64 images. Update the comment to avoid confusion.
  • Optionally guard the amd64 install with a CUDA presence check to avoid failures on CUDA-less x86 images.

Apply this diff:

-if [ "$ARCH" = "amd64" ]; then
-    # Build issues with LMCache installation on arm64:
-    # OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
-    # TODO: Add it back once we have a working arm64 build.
-    # Install LMCache
-    uv pip install lmcache==0.3.3
-fi
+if [ "$ARCH" = "amd64" ]; then
+    # Skip LMCache on arm64 due to unresolved build/runtime issues.
+    # Re-enable after validating ARM support.
+    # Install LMCache only when CUDA is present.
+    if [ -n "${CUDA_HOME:-}" ] && [ -d "${CUDA_HOME}" ]; then
+        uv pip install lmcache==0.3.3
+    else
+        echo "Skipping LMCache install: CUDA_HOME directory not found at ${CUDA_HOME:-unset}"
+    fi
+fi

86-93: Help text defaults don’t match actual defaults.

  • VLLM_REF default in code is “ba81acbd…”, not the value shown.
  • INSTALLATION_DIR default is “/tmp”, not “/tmp/vllm”.
  • DEEPGEMM_REF default is “03d0be3”, not “1876566”.

Apply this diff:

-            echo "  --vllm-ref REF    Git reference to checkout (default: f4135232b9a8c4845f8961fb1cd17581c56ae2ce)"
+            echo "  --vllm-ref REF    Git reference to checkout (default: ba81acbdc1eec643ba815a76628ae3e4b2263b76)"
             echo "  --max-jobs NUM    Maximum number of parallel jobs (default: 16)"
             echo "  --arch ARCH       Architecture (amd64|arm64, default: auto-detect)"
-            echo "  --installation-dir DIR  Directory to install vllm (default: /tmp/vllm)"
-            echo "  --deepgemm-ref REF  Git reference for DeepGEMM (default: 1876566)"
+            echo "  --installation-dir DIR  Directory to clone/build vllm (default: /tmp)"
+            echo "  --deepgemm-ref REF  Git reference for DeepGEMM (default: 03d0be3)"
             echo "  --flashinf-ref REF  Git reference for Flash Infer (default: v0.2.8rc1)"
             echo "  --torch-backend BACKEND  Torch backend to use (default: cu128)"

30-30: Consider defaulting torch backend to auto.

Given vLLM’s installer handles “--torch-backend=auto”, defaulting to auto reduces maintenance when CUDA versions change across images. You can still override via flag.

Apply this diff:

-TORCH_BACKEND="cu128"
+TORCH_BACKEND="auto"

Note: This doesn’t affect users who pass “--torch-backend”.
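
For completeness, overriding the backend explicitly remains a documented flag on the script, per its help text:

./install_vllm.sh --arch arm64 --torch-backend cu128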

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cebe921 and 0b1951d.

📒 Files selected for processing (1)
  • container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.

Applied to files:

  • container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.

Applied to files:

  • container/deps/vllm/install_vllm.sh
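
As a concrete illustration of that learning, an install that lets the backend be matched to the container's CUDA version automatically would look roughly like this (a sketch, assuming the flag is honored as described in the learning; not taken from the script itself):

uv pip install --torch-backend=auto vllm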

@zaristei
Contributor

Manually building ARM on this branch to ensure no further issues.

@krishung5 krishung5 merged commit 8f24c02 into main Aug 14, 2025
9 of 10 checks passed
@krishung5 krishung5 deleted the krish/fix-torch-version-for-arm-build branch August 14, 2025 15:56
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
