build: Pin torch versions and skip lmcache installation for ARM build #2430
Conversation
/ok to test 9e888dd
/ok to test adfdc46
/ok to test db11f0d
/ok to test e80197d
/ok to test 0b1951d
Walkthrough
The install_vllm.sh script now gates LMCache installation by architecture (enabled on amd64, skipped on arm64) and updates the arm64 PyTorch dependencies to stable pinned wheels from the standard index, removing nightly channel usage.
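A minimal sketch of the resulting behavior, assembled from the diffs discussed in the review below (simplified; argument parsing and surrounding setup elided):

```bash
#!/usr/bin/env bash
# Simplified sketch only; assumes ARCH has already been resolved to amd64 or arm64
# and that uv is available, as in the real install_vllm.sh.
set -euo pipefail

if [ "$ARCH" = "amd64" ]; then
    # LMCache remains enabled on x86_64.
    uv pip install lmcache==0.3.3
else
    echo "Skipping LMCache on arm64 until the ARM build is validated."
fi

if [ "$ARCH" = "arm64" ]; then
    # Stable pinned wheels replace the nightly channel; with set -e the script
    # fails immediately if they cannot be resolved, per the ARM64 learning below.
    uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 \
        --index-url https://download.pytorch.org/whl
fi
```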
Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User as build script
    participant Install as install_vllm.sh
    participant Arch as Arch Detector
    participant LM as LMCache Installer
    participant PT as PyTorch Wheels
    User->>Install: Run install_vllm.sh
    Install->>Arch: Detect ARCH
    alt ARCH == amd64
        Install->>LM: Install LMCache
    else ARCH == arm64
        Install--xLM: Skip LMCache
    end
    Install->>PT: Install PyTorch/Audio/Vision
    note over PT: arm64 uses stable pins from standard index
```

Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~8 minutes
Actionable comments posted: 0
🔭 Outside diff range comments (1)
container/deps/vllm/install_vllm.sh (1)
133-139: Fix messaging and ensure PyTorch wheels resolve correctly for cu128 on arm64.
- Message says “nightly” but you’re installing stable pins. Please correct the log.
- Use the cu128-specific index to reliably resolve “+cu128” wheels; the root index can lead to resolution surprises. Keep immediate exit on failure as per our ARM64 learning.
Apply this diff:
```diff
-    # Try to install specific PyTorch version first, fallback to latest nightly
-    echo "Attempting to install pinned PyTorch nightly versions..."
-    if ! uv pip install torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1 --index-url https://download.pytorch.org/whl; then
+    # Install pinned stable PyTorch wheels for cu128 on arm64; fail fast if unavailable.
+    echo "Installing pinned stable PyTorch wheels (cu128) for arm64..."
+    if ! uv pip install \
+        --index-url https://download.pytorch.org/whl/cu128 \
+        --extra-index-url https://pypi.org/simple \
+        torch==2.7.1+cu128 torchaudio==2.7.1 torchvision==0.22.1; then
         echo "Pinned versions failed"
         exit 1
-        # uv pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
     fi
```
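A quick post-install sanity check (a hypothetical addition, not part of the suggested diff) can confirm that the cu128 wheel actually resolved:

```bash
# Hypothetical check after the pinned install; torch.version.cuda reports the CUDA
# toolkit version the wheel was built against (expect "12.8" for +cu128 wheels).
python3 - <<'PY'
import torch
print("torch:", torch.__version__)   # e.g. 2.7.1+cu128
print("cuda :", torch.version.cuda)  # e.g. 12.8
PY
```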
♻️ Duplicate comments (1)
container/deps/vllm/install_vllm.sh (1)
181-188: Pin flashinfer on arm64 for reproducibility.
On arm64 you install the latest "flashinfer-python" from PyPI without a version pin, while amd64 is pinned via a git ref. This can introduce non-determinism and break builds unexpectedly. Prior feedback also asked to freeze versions.
Apply this diff to allow optional pinning without changing current behavior:
```diff
-if [ "$ARCH" = "arm64" ]; then
-    uv pip install flashinfer-python
+if [ "$ARCH" = "arm64" ]; then
+    # Allow optional pin: set FLASHINFER_PYPI_SPEC="flashinfer-python==<version>" upstream to freeze.
+    uv pip install "${FLASHINFER_PYPI_SPEC:-flashinfer-python}"
 else
```
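If an upstream caller wants to use the optional pin, the invocation might look like this (FLASHINFER_PYPI_SPEC comes from the suggested diff; the version shown is a placeholder, not a validated release):

```bash
# Hypothetical caller-side pin; substitute a flashinfer-python version validated on arm64.
export FLASHINFER_PYPI_SPEC="flashinfer-python==0.2.8"
./container/deps/vllm/install_vllm.sh --arch arm64
```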
🧹 Nitpick comments (3)
container/deps/vllm/install_vllm.sh (3)
115-121: Good arch-gating for LMCache; tighten condition and fix misleading comment.
- The rationale mentions CUDA_HOME not set, but CUDA_HOME is exported at Line 103. The real issue is lack of a usable CUDA toolchain on arm64 images. Update the comment to avoid confusion.
- Optionally guard the amd64 install with a CUDA presence check to avoid failures on CUDA-less x86 images.
Apply this diff:
```diff
-if [ "$ARCH" = "amd64" ]; then
-    # Build issues with LMCache installation on arm64:
-    # OSError: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
-    # TODO: Add it back once we have a working arm64 build.
-    # Install LMCache
-    uv pip install lmcache==0.3.3
-fi
+if [ "$ARCH" = "amd64" ]; then
+    # Skip LMCache on arm64 due to unresolved build/runtime issues.
+    # Re-enable after validating ARM support.
+    # Install LMCache only when CUDA is present.
+    if [ -n "${CUDA_HOME:-}" ] && [ -d "${CUDA_HOME}" ]; then
+        uv pip install lmcache==0.3.3
+    else
+        echo "Skipping LMCache install: CUDA_HOME directory not found at ${CUDA_HOME:-unset}"
+    fi
+fi
```
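For context, the CUDA_HOME export referenced above (around line 103 of the script) presumably resembles the following; this is a reconstruction from the review comment, not the verified script source:

```bash
# Hypothetical reconstruction: CUDA_HOME may be set to a default path even on images
# without a CUDA toolkit installed, which is why the directory check above matters.
export CUDA_HOME="${CUDA_HOME:-/usr/local/cuda}"
[ -d "$CUDA_HOME" ] || echo "Warning: CUDA_HOME points at a missing path: $CUDA_HOME"
```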
86-93: Help text defaults don’t match actual defaults.
- VLLM_REF default in code is “ba81acbd…”, not the value shown.
- INSTALLATION_DIR default is “/tmp”, not “/tmp/vllm”.
- DEEPGEMM_REF default is “03d0be3”, not “1876566”.
Apply this diff:
```diff
-    echo "  --vllm-ref REF           Git reference to checkout (default: f4135232b9a8c4845f8961fb1cd17581c56ae2ce)"
+    echo "  --vllm-ref REF           Git reference to checkout (default: ba81acbdc1eec643ba815a76628ae3e4b2263b76)"
     echo "  --max-jobs NUM           Maximum number of parallel jobs (default: 16)"
     echo "  --arch ARCH              Architecture (amd64|arm64, default: auto-detect)"
-    echo "  --installation-dir DIR   Directory to install vllm (default: /tmp/vllm)"
-    echo "  --deepgemm-ref REF       Git reference for DeepGEMM (default: 1876566)"
+    echo "  --installation-dir DIR   Directory to clone/build vllm (default: /tmp)"
+    echo "  --deepgemm-ref REF       Git reference for DeepGEMM (default: 03d0be3)"
     echo "  --flashinf-ref REF       Git reference for Flash Infer (default: v0.2.8rc1)"
     echo "  --torch-backend BACKEND  Torch backend to use (default: cu128)"
```
30-30: Consider defaulting torch backend to auto.
Given that vLLM's installer handles "--torch-backend=auto", defaulting to auto reduces maintenance when CUDA versions change across images. You can still override via the flag.
Apply this diff:
```diff
-TORCH_BACKEND="cu128"
+TORCH_BACKEND="auto"
```
Note: This doesn't affect users who pass "--torch-backend".
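Per the learnings recorded below, the flag is consumed when vLLM itself is installed via uv pip install; a minimal sketch of that call, assuming an editable checkout at ./vllm (the path and install line are assumptions, not part of this review's diff hunks):

```bash
# Sketch only: shows how the TORCH_BACKEND default would typically be forwarded.
TORCH_BACKEND="${TORCH_BACKEND:-auto}"
uv pip install -e ./vllm --torch-backend="$TORCH_BACKEND"
```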
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (1)
container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📓 Common learnings
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Applied to files:
container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-22T10:22:28.972Z
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
container/deps/vllm/install_vllm.sh
Manually building ARM on this branch to ensure no further issues.
…#2430) Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
Overview:
We should add it back once the functionality is also tested on ARM.
Details:
Where should the reviewer start?
Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)
Summary by CodeRabbit
Bug Fixes
Chores