Skip to content

Conversation

@nv-anants
Copy link
Contributor

@nv-anants nv-anants commented Sep 22, 2025

Overview:

Failure - https://github.com/ai-dynamo/dynamo/actions/runs/17907241473/job/50910654042?pr=2969#step:8:280

seems like an issue in vLLM repo scripts dynamo uses this script to install ep-kernels - https://github.com/vllm-project/vllm/blob/v0.10.1.1/tools/ep_kernels/install_python_libraries.sh#L100
and they are not pinning pplx-kernels repo to any commit or tag, so likely this change in upstream repo is causing failures - perplexityai/pplx-kernels@2bd6e30

vLLM fixed pinning issue in later versions but not in the one dynamo uses - vllm-project/vllm@906e461

closes: OPS-1190

Signed-off-by: Anant Sharma <anants@nvidia.com>
@github-actions github-actions bot added the fix label Sep 22, 2025
@nv-anants nv-anants marked this pull request as ready for review September 22, 2025 19:36
@nv-anants nv-anants requested review from a team as code owners September 22, 2025 19:36
@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 22, 2025

Walkthrough

Adds a nvshmem cherry-pick step to the VLLM install script. Introduces early VLLM_REF variable definition. Configures git committer identity, checks out the specified VLLM ref, cherry-picks commit 906e461ed6ddccd3cc7b68fa72048d2d3fcbd72c, and notes TODO to remove this when upgrading VLLM.

Changes

Cohort / File(s) Summary
VLLM install script updates
container/deps/vllm/install_vllm.sh
Define VLLM_REF earlier; add configured git committer; checkout VLLM ref; cherry-pick nvshmem-related commit 906e461...bd72c; add comments/TODO regarding temporary fix and removal on next VLLM upgrade.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    participant S as Install Script
    participant G as Git
    participant R as VLLM Repo
    participant B as Build/Install

    S->>S: Define VLLM_REF
    S->>G: Configure committer name/email
    S->>G: Clone/Fetch VLLM
    S->>G: Checkout VLLM_REF
    Note over S,G: Temporary step for nvshmem fix
    S->>G: Cherry-pick commit 906e461…bd72c
    alt Cherry-pick succeeds
        S->>B: Proceed with build/install
        B-->>S: Installation complete
    else Cherry-pick fails
        S-->>S: Exit with error (manual resolution)
    end
Loading

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Poem

A rabbit taps the shell with cheer,
“Cherry-pick this fix right here!”
VLLM set, the ref in line,
nvshmem patched—now all is fine.
I hop away, my job complete,
scripts compiled, installs neat. 🐇✨

Pre-merge checks

❌ Failed checks (2 warnings)
Check name Status Explanation Resolution
Title Check ⚠️ Warning The PR title claims to add a deepep and pplx pinning commit in the vllm checkout, but the changeset (container/deps/vllm/install_vllm.sh) and provided summary show a cherry-pick of an nvshmem-related commit and a VLLM_REF variable change with no evidence of deepep or pplx changes, so the title is misleading and does not reflect the actual primary change. Update the title to accurately and concisely reflect the main change (for example: "fix: cherry-pick nvshmem commit into vllm checkout (temporary)" and optionally include the commit hash), and remove references to deepep/pplx unless those packages are actually modified in this PR.
Description Check ⚠️ Warning The PR includes a clear Overview describing the CI failure and links to upstream commits, but it does not follow the repository's required description template: the "Details", "Where should the reviewer start?" and "Related Issues" sections are missing or left as placeholders and the description does not explicitly document the specific code changes made in this PR (files, commits, or lines to review). Because these required template sections and explicit change/testing details are absent, the pull request description does not meet the repository's standards. The Overview is useful but insufficient on its own to guide a reviewer or record the change for future reference. Please update the PR to use the repository template: add a "Details" section that lists the exact changes (for example, the nvshmem cherry-pick and VLLM_REF change in container/deps/vllm/install_vllm.sh and the cherry-picked commit SHA), a "Where should the reviewer start?" section calling out files and lines to inspect, and a properly formatted "Related Issues" entry (e.g., closes # or link to OPS-1190). Also include testing performed, CI results, and any notes about the temporary nature and removal timeline for the cherry-pick; once those sections are added the description can be re-checked.
✅ Passed checks (1 passed)
Check name Status Explanation
Docstring Coverage ✅ Passed No functions found in the changes. Docstring coverage check skipped.

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Upto 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).

Please share your feedback with us on this Discord post.


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (2)
container/deps/vllm/install_vllm.sh (2)

23-23: Fix misleading TODO label (“nvshmem”)

This cherry-pick is about pinning pplx (and deepep per PR title), not nvshmem. Update the comment to avoid confusion later.

Apply:

-# REMOVE nvshmem cherry-pick when moving to next version of vllm
+# REMOVE pplx pin cherry-pick when moving to next version of vllm

132-136: Make the cherry-pick idempotent and safer

Guard the cherry-pick so repeated builds don't fail if the commit is already present; set committer via git -c and keep -x for traceability. Verified: commit 906e461ed6ddccd3cc7b68fa72048d2d3fcbd72c is absent from REF 1da94e673c257373280026f75ceb4effac80e892.

File: container/deps/vllm/install_vllm.sh (lines 132-136)

-# nvshmem fix - cherry-pick commit pinning pplx version
-# https://github.com/ai-dynamo/dynamo/actions/runs/17907241473/job/50910654042?pr=2969#step:8:280
-# remove when moving to next version of vllm
-# Configure git user for cherry-pick operation
-GIT_COMMITTER_NAME="Container Build" GIT_COMMITTER_EMAIL="container@buildkitsandbox.local" git cherry-pick 906e461ed6ddccd3cc7b68fa72048d2d3fcbd72c
+# pplx pin: cherry-pick commit that pins pplx; remove on next vLLM bump
+# https://github.com/ai-dynamo/dynamo/actions/runs/17907241473/job/50910654042?pr=2969#step:8:280
+PPLX_PIN_COMMIT="906e461ed6ddccd3cc7b68fa72048d2d3fcbd72c"
+# Skip if already included in the checked-out ref
+if ! git merge-base --is-ancestor "${PPLX_PIN_COMMIT}" HEAD; then
+  git -c user.name="Container Build" -c user.email="container@buildkitsandbox.local" \
+      cherry-pick -x "${PPLX_PIN_COMMIT}"
+else
+  echo "pplx pin already present in ${VLLM_REF}; skipping cherry-pick"
+fi
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 31c78df and 7ea7927.

📒 Files selected for processing (1)
  • container/deps/vllm/install_vllm.sh (2 hunks)
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable, when exported, automatically triggers vLLM's build system to use the precompiled wheel instead of building from source, even when using standard `uv pip install .` commands in container/deps/vllm/install_vllm.sh.

Applied to files:

  • container/deps/vllm/install_vllm.sh
📚 Learning: 2025-08-18T16:52:15.659Z
Learnt from: nnshah1
PR: ai-dynamo/dynamo#2489
File: container/deps/vllm/install_vllm.sh:151-152
Timestamp: 2025-08-18T16:52:15.659Z
Learning: The VLLM_PRECOMPILED_WHEEL_LOCATION environment variable is an official vLLM environment variable that, when exported, automatically triggers vLLM's build system to use the specified precompiled wheel instead of building from source. This works even with standard `uv pip install .` commands without requiring explicit reference to the variable in the install command. The vLLM build system internally detects and uses this environment variable.

Applied to files:

  • container/deps/vllm/install_vllm.sh
📚 Learning: 2025-07-21T00:10:56.947Z
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.

Applied to files:

  • container/deps/vllm/install_vllm.sh

Copy link

@dmitrytokarev dmitrytokarev left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

@nv-anants nv-anants enabled auto-merge (squash) September 22, 2025 19:49
@nv-anants nv-anants merged commit 396755e into main Sep 22, 2025
14 of 15 checks passed
@nv-anants nv-anants deleted the anants/fix-vllm branch September 22, 2025 19:54
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
nv-anants added a commit that referenced this pull request Sep 24, 2025
Signed-off-by: Anant Sharma <anants@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
kylehh pushed a commit that referenced this pull request Sep 25, 2025
Signed-off-by: Anant Sharma <anants@nvidia.com>
Signed-off-by: Kyle H <kylhuang@nvidia.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants