-
Notifications
You must be signed in to change notification settings - Fork 680
chore: update nixl version to 0.4.1 #2221
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change updates the NIXL dependency version across several Dockerfiles, a build script, a Rust Cargo manifest, and the Python project configuration. All references to NIXL are moved from specific commit hashes or older versions to the new version tag 0.4.1, with some dependency constraints tightened and obsolete architecture-specific pinning logic removed. Changes
Sequence Diagram(s)sequenceDiagram
participant BuildScript
participant Dockerfile
participant NIXLRepo
BuildScript->>Dockerfile: Pass NIXL_REF=0.4.1 as build arg
Dockerfile->>NIXLRepo: Clone or install NIXL@0.4.1
Dockerfile-->>BuildScript: Build completes with NIXL 0.4.1
Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🧹 Nitpick comments (4)
pyproject.toml (1)
67-72: Re-think the “≤0.4.1” upper-bound – consider an explicit compatible range insteadPinning with only an upper bound (
nixl<=0.4.1) allows any lower version (including 0.1.x, 0.2.x …) to slip in if they satisfy the resolver, while also blocking future micro-patches such as 0.4.2 that might carry critical bug fixes.
Typical patterns are either an exact pin (==0.4.1) for full reproducibility or a compatible-release spec (~=0.4.1or>=0.4.1,<0.5.0) when you want to consume patch updates but stay on the 0.4 line.- "nixl<=0.4.1", + # Allow any 0.4.x patch but stop before 0.5 breaking changes + "nixl>=0.4.1,<0.5.0",Same applies to the
sglangextras section a few lines below.
Double-check whether a strict freeze is required; if not, the compatible-release form minimizes surprise regressions while preserving ABI stability.Also applies to: 73-77
container/Dockerfile.vllm (1)
83-85: Tag vs commit hash – trade reproducibility for convenienceSwitching
NIXL_REFfrom a commit SHA to the annotated tag0.4.1is fine, but note that a force-pushed/mutable tag would silently change the build.
If strict reproducibility is a goal, prefer an immutable commit SHA plus a comment with the corresponding tag for readability:-ARG NIXL_REF=0.4.1 +ARG NIXL_REF=3c47a48955e6f96bd5d4fb43a9d80bb64722f8e4 # tag: 0.4.1(or keep the tag and add
git fetch --depth 1 --tags && git checkout --detach "$NIXL_REF"to ensure a detached state).
Not blocking, just something to keep in mind for release builds.container/Dockerfile.tensorrt_llm (1)
47-50: Same reproducibility caveat as other DockerfilesThe move to
ARG NIXL_REF=0.4.1shares the reproducibility concern mentioned in the vLLM image. Consider pinning by SHA or detaching after checkout to avoid future tag drift.container/Dockerfile.sglang-wideep (1)
74-76: Use a shallow clone to speed up image builds and improve reproducibilityThe full
git clonefetches the entire history; for release tags you can drastically cut time and bandwidth:-RUN git clone https://github.com/ai-dynamo/nixl.git && cd nixl && git checkout ${NIXL_TAG} && \ +RUN git clone --depth 1 --branch ${NIXL_TAG} https://github.com/ai-dynamo/nixl.git && cd nixl && \
--depth 1also guards against history rewriting attacks on older commits.
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
Cargo.lockis excluded by!**/*.locklib/bindings/python/Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (7)
container/Dockerfile.sglang(1 hunks)container/Dockerfile.sglang-wideep(1 hunks)container/Dockerfile.tensorrt_llm(1 hunks)container/Dockerfile.vllm(1 hunks)container/build.sh(1 hunks)lib/llm/Cargo.toml(1 hunks)pyproject.toml(1 hunks)
🧰 Additional context used
🧠 Learnings (4)
📓 Common learnings
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
📚 Learning: in docs/support_matrix.md, the nixl version difference between runtime dependencies (0.5.0) and buil...
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.
Applied to files:
pyproject.tomllib/llm/Cargo.tomlcontainer/Dockerfile.tensorrt_llmcontainer/Dockerfile.sglangcontainer/Dockerfile.sglang-wideepcontainer/Dockerfile.vllmcontainer/build.sh
📚 Learning: the codebase uses async-nats version 0.40, not the older nats crate. error handling should use async...
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Applied to files:
lib/llm/Cargo.toml
📚 Learning: graceful fallback for pytorch wheel installation is broken on arm architecture, so immediate exit on...
Learnt from: zaristei
PR: ai-dynamo/dynamo#2020
File: container/deps/vllm/install_vllm.sh:115-118
Timestamp: 2025-07-21T00:10:56.947Z
Learning: Graceful fallback for PyTorch wheel installation is broken on ARM architecture, so immediate exit on pinned version failure is preferred over fallback mechanisms in container/deps/vllm/install_vllm.sh for ARM64.
Applied to files:
container/build.sh
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (5)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: Build and Test - vllm
- GitHub Check: Mirror Repository to GitLab
🔇 Additional comments (3)
lib/llm/Cargo.toml (1)
90-94: Update lockfile to keep CI reproducible
nixl-syswas bumped to 0.4.1, but the change is only effective ifCargo.lockis regenerated (or the workspace lockfiles committed).
Please runcargo update -p nixl-sys --precise 0.4.1and commit the resulting lock diff so every environment – including CI and downstream integrators that vendor the lockfile – resolves to the same crate revision.container/Dockerfile.sglang (1)
42-45: Consistency with support-matrix noteUpgrading the build-time
NIXL_REFto 0.4.1 keeps the build images aligned. Just verify the runtime env still intentionally pulls 0.5.x as documented indocs/support_matrix.md; the deliberate mismatch noted in prior PRs should remain unchanged.container/build.sh (1)
116-118: Tag0.4.1confirmed on both amd64 and arm64The GitHub API check shows the
0.4.1tag exists for both architectures. The remaining action is to validate that the image still builds cleanly on linux/arm64:• Run the ARM64 build (e.g. via your CI matrix or a local
docker buildxjob) to ensure there are no regressions.
• If you encounter issues, pin to a known-good commit or reintroduce an ARM-specific override.
Overview:
update nixl version to 0.4.1
closes: OPS-578
Summary by CodeRabbit