@biswapanda (Contributor) commented Jul 31, 2025

Overview:

Adds a TensorRT-LLM (trtllm) deployment example for Kubernetes.

Cherry-pick of #2133.

Details:

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

  • closes GitHub issue: #xxx

Summary by CodeRabbit

  • New Features

    • Added Kubernetes deployment templates for various TRTLLM configurations, including aggregated, disaggregated, and router-based setups.
    • Introduced a new port allocation utility for vLLM, supporting advanced port management and configuration.
  • Documentation

    • Updated framework support matrices and improved documentation structure across multiple README files.
    • Added detailed Kubernetes deployment instructions for TensorRT-LLM.
    • Clarified and reorganized installation steps and prerequisites for SGLang and examples.
  • Refactor

    • Modularized and improved port allocation logic for vLLM backend.
  • Style

    • Minor improvements to Dockerfile installation commands for SGLang images.

Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
copy-pr-bot (bot) commented Jul 31, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@biswapanda biswapanda requested a review from PeaBrane as a code owner July 31, 2025 01:08
@github-actions github-actions bot added the feat label Jul 31, 2025
@biswapanda biswapanda changed the base branch from main to release/0.4.0 July 31, 2025 01:08
coderabbitai (bot, Contributor) commented Jul 31, 2025

Caution

Review failed

Failed to post review comments.

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f8b0a5a and 6eb5ad1.

📒 Files selected for processing (13)
  • README.md (2 hunks)
  • components/backends/sglang/README.md (1 hunks)
  • components/backends/trtllm/README.md (2 hunks)
  • components/backends/trtllm/deploy/agg.yaml (1 hunks)
  • components/backends/trtllm/deploy/agg_router.yaml (1 hunks)
  • components/backends/trtllm/deploy/disagg.yaml (1 hunks)
  • components/backends/trtllm/deploy/disagg_router.yaml (1 hunks)
  • components/backends/vllm/README.md (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/args.py (5 hunks)
  • components/backends/vllm/src/dynamo/vllm/ports.py (1 hunks)
  • container/Dockerfile.sglang (1 hunks)
  • container/Dockerfile.sglang-wideep (1 hunks)
  • examples/README.md (2 hunks)
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
components/backends/sglang/README.md (1)

Learnt from: dmitry-tokarev-nv
PR: #2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.

components/backends/vllm/README.md (1)

Learnt from: dmitry-tokarev-nv
PR: #2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.

examples/README.md (1)

Learnt from: PeaBrane
PR: #1409
File: examples/router_standalone/worker.py:171-186
Timestamp: 2025-06-08T08:30:45.126Z
Learning: Example code in the examples/ directory may intentionally use hard-coded values or simplified implementations that wouldn't be appropriate for production code, but are acceptable for demonstration and testing purposes.

container/Dockerfile.sglang (1)

Learnt from: grahamking
PR: #1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.

components/backends/trtllm/README.md (1)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

README.md (2)

Learnt from: dmitry-tokarev-nv
PR: #2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.

Learnt from: biswapanda
PR: #1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.

components/backends/trtllm/deploy/disagg.yaml (4)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: nnshah1
PR: #2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.
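
To make the probe split concrete, here is a rough arithmetic sketch using only the values quoted in the learning above. The helper name `probe_budget_seconds` is hypothetical, and the calculation deliberately ignores `initialDelaySeconds` and `timeoutSeconds`, so treat the numbers as approximations rather than exact Kubernetes behavior.

```python
def probe_budget_seconds(period_seconds: int, failure_threshold: int) -> int:
    """Approximate time a probe tolerates consecutive failures before Kubernetes acts."""
    return period_seconds * failure_threshold


# Values quoted in the learning above.
startup_budget = probe_budget_seconds(period_seconds=10, failure_threshold=60)
liveness_budget = probe_budget_seconds(period_seconds=5, failure_threshold=1)

print(f"startup probe tolerates roughly {startup_budget}s of model loading")
print(f"liveness probe reacts within roughly {liveness_budget}s of a failure")
```

Under these assumptions the startup probe gives the worker about ten minutes to load, while a single failed liveness check is enough to trigger a restart once the worker is running.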

components/backends/trtllm/deploy/disagg_router.yaml (3)

Learnt from: biswapanda
PR: #2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

components/backends/trtllm/deploy/agg.yaml (4)

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

Learnt from: nnshah1
PR: #2124
File: components/backends/vllm/deploy/disagg.yaml:54-60
Timestamp: 2025-07-25T22:34:11.384Z
Learning: In vLLM worker deployments, startup probes (with longer periods and higher failure thresholds like periodSeconds: 10, failureThreshold: 60) are used to handle the slow model loading startup phase, while liveness probes are intentionally kept aggressive (periodSeconds: 5, failureThreshold: 1) for quick failure detection once the worker is operational. This pattern separates startup concerns from operational health monitoring in GPU-heavy workloads.

components/backends/trtllm/deploy/agg_router.yaml (3)

Learnt from: biswapanda
PR: #2137
File: components/backends/sglang/deploy/agg_router.yaml:0-0
Timestamp: 2025-07-28T17:00:07.968Z
Learning: In components/backends/sglang/deploy/agg_router.yaml, the clear_namespace command is intentionally designed to block the router from starting if it fails (using &&). This is a deliberate design decision where namespace clearing is a critical prerequisite and the router should not start with an uncleared namespace.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:1178-1180
Timestamp: 2025-07-18T16:05:05.534Z
Learning: The stopSignal field under lifecycle in DynamoComponentDeployment CRDs is autogenerated due to Kubernetes library upgrades (k8s.io/api and k8s.io/apimachinery from v0.32.3 to v0.33.1), not a manual design decision by the user.

Learnt from: julienmancuso
PR: #2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamographdeployments.yaml:1233-1235
Timestamp: 2025-07-18T16:04:47.465Z
Learning: The stopSignal field in Kubernetes CRDs like DynamoGraphDeployment and DynamoComponentDeployment is autogenerated by controller-gen when upgrading Kubernetes library versions, and represents expected upstream API changes rather than manual code that needs custom validation.

🧬 Code Graph Analysis (1)
components/backends/vllm/src/dynamo/vllm/args.py (1)
components/backends/vllm/src/dynamo/vllm/ports.py (7)
  • DynamoPortRange (26-40)
  • EtcdContext (44-53)
  • PortAllocationRequest (78-85)
  • PortMetadata (57-74)
  • allocate_and_reserve_port (229-260)
  • allocate_and_reserve_port_block (142-226)
  • get_host_ip (263-290)
🔇 Additional comments (15)
components/backends/vllm/README.md (1)

38-43: Relative-path fix looks correct
The updated ../../../ path now correctly resolves from components/backends/vllm/README.md to the docs folder. No other issues spotted.

components/backends/sglang/README.md (1)

37-42: Good catch on broken documentation links
The extra ../ level is needed from this README location. Links render correctly now.

components/backends/trtllm/deploy/agg.yaml (2)

78-84: Environment variable is JSON-encoded – confirm consumer parsing
value: "[\"generate\"]" is passed as a plain string. Double-check that dynamo.trtllm expects a JSON string rather than a literal list coming from os.environ["DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS"].
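
For illustration only, this is one way a consumer could decode such a JSON-encoded value. It is a sketch, not the actual `dynamo.trtllm` parsing code: the helper `read_endpoint_list` and its empty-list default are assumptions made for this example.

```python
import json
import os
from typing import Mapping

ENV_NAME = "DYN_SYSTEM_USE_ENDPOINT_HEALTH_STATUS"


def read_endpoint_list(env: Mapping[str, str] = os.environ) -> list[str]:
    """Decode the variable as a JSON list of strings (hypothetical helper)."""
    raw = env.get(ENV_NAME, "[]")  # assumption: an unset variable means "no endpoints"
    parsed = json.loads(raw)
    if not isinstance(parsed, list) or not all(isinstance(item, str) for item in parsed):
        raise ValueError(f"{ENV_NAME} must be a JSON list of strings, got: {raw!r}")
    return parsed


# The YAML value "[\"generate\"]" reaches the process as the string '["generate"]'.
print(read_endpoint_list({ENV_NAME: '["generate"]'}))
```

If the consumer instead treated the raw string as a single endpoint name, it would see `["generate"]` with brackets and quotes included, which is the mismatch this comment asks to rule out.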


85-104: Engine config file path may be missing inside the image
--extra-engine-args engine_configs/agg.yaml assumes engine_configs/agg.yaml exists under /workspace/components/backends/trtllm (the workingDir). If the file is omitted from the runtime image build context the worker will crash at start-up.

Confirm the file is copied during image build or mount it via a ConfigMap/volume.

examples/README.md (1)

25-33: Framework-support block LGTM

The new section is clear, links resolve correctly from examples/ (../components/backends/...) and fixes the earlier duplication.
No action needed.

components/backends/trtllm/README.md (1)

52-57: Path fix approved

Switching the links to ../../../docs/... makes them work when this README is viewed from inside components/backends/trtllm. Good catch.

components/backends/trtllm/deploy/agg_router.yaml (1)

69-76: Custom GPU resource key may not schedule on NVIDIA nodes

The CRD allows gpu: "1", but the underlying node-plugin usually advertises nvidia.com/gpu. Confirm that the controller mutates this field, otherwise the pods will be unschedulable.

components/backends/vllm/src/dynamo/vllm/ports.py (4)

43-54: LGTM!

The EtcdContext dataclass properly encapsulates ETCD operations with namespace isolation and host-specific port keys.


56-75: LGTM!

The PortMetadata dataclass provides comprehensive metadata tracking with PID and timestamp for debugging and potential cleanup operations.
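
As a rough picture of what PID-and-timestamp metadata can look like, the sketch below defines a stand-in dataclass. Apart from the PID and timestamp mentioned above, every field name here (`worker_id`, `reason`) is hypothetical and not taken from `ports.py`.

```python
import json
import os
import time
from dataclasses import asdict, dataclass, field


@dataclass
class PortMetadataSketch:
    """Illustrative stand-in for reservation metadata; not the real PortMetadata."""

    worker_id: str  # hypothetical field
    reason: str  # hypothetical field
    pid: int = field(default_factory=os.getpid)
    reserved_at: float = field(default_factory=time.time)

    def to_json(self) -> str:
        # A serialized form like this could be stored next to the reservation key.
        return json.dumps(asdict(self), sort_keys=True)


meta = PortMetadataSketch(worker_id="worker-0", reason="example-reservation")
record = json.loads(meta.to_json())
print(record)
```

Recording the PID and time makes it possible to tell, after the fact, which process made a reservation and how old it is.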


88-114: LGTM!

The hold_ports context manager correctly manages socket lifecycle with proper cleanup in the finally block, preventing resource leaks.
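
The pattern praised here can be sketched as follows. This is a minimal illustration of holding ports with bound sockets and releasing them in a `finally` block, not the code from `ports.py`; the names `hold_ports_sketch` and `is_free` are assumptions.

```python
import socket
from contextlib import contextmanager
from typing import Iterator, Sequence


@contextmanager
def hold_ports_sketch(ports: Sequence[int], host: str = "127.0.0.1") -> Iterator[None]:
    """Keep each port bound for the duration of the with-block."""
    sockets = []
    try:
        for port in ports:
            sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
            sockets.append(sock)  # track before bind so a failed bind is still closed
            sock.bind((host, port))
        yield
    finally:
        for sock in sockets:
            sock.close()


def is_free(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if the port can currently be bound on this host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
        try:
            probe.bind((host, port))
            return True
        except OSError:
            return False


# Ask the OS for a free port number, then hold it explicitly.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as tmp:
    tmp.bind(("127.0.0.1", 0))
    candidate = tmp.getsockname()[1]

with hold_ports_sketch([candidate]):
    held = not is_free(candidate)  # a second bind fails while the port is held
released = is_free(candidate)  # the port is bindable again after the block
print(held, released)
```

Appending the socket to the list before calling `bind` means that a bind failure part-way through still closes every socket that was opened.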


263-291: Excellent robust host IP detection with comprehensive fallback handling.

The implementation properly handles various failure scenarios (hostname resolution, binding tests) with clear logging and sensible fallback to localhost.
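
A minimal sketch of the fallback behavior described here, assuming a two-step check (hostname resolution, then a bind test) with localhost as the last resort. The function name `get_host_ip_sketch` and the log messages are illustrative, not copied from the PR.

```python
import logging
import socket

logger = logging.getLogger(__name__)


def get_host_ip_sketch() -> str:
    """Resolve this host's IPv4 address, falling back to localhost on failure."""
    fallback = "127.0.0.1"
    try:
        host_ip = socket.gethostbyname(socket.gethostname())
    except OSError as exc:  # socket.gaierror is a subclass of OSError
        logger.warning("Hostname resolution failed (%s); using %s", exc, fallback)
        return fallback

    try:
        # Confirm the resolved address is actually bindable on this machine.
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as probe:
            probe.bind((host_ip, 0))
    except OSError as exc:
        logger.warning("Cannot bind to %s (%s); using %s", host_ip, exc, fallback)
        return fallback
    return host_ip


print(get_host_ip_sketch())
```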

components/backends/vllm/src/dynamo/vllm/args.py (4)

15-26: LGTM!

Clean import organization bringing in the necessary port management utilities.


44-44: LGTM!

Appropriate addition of port_range field to store the configured port range.


75-86: LGTM!

Well-documented CLI arguments with appropriate defaults and clear help text about port range constraints.
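
For readers who have not opened `args.py`, a port-range pair of flags could be declared roughly like this. Only the flag names `--dynamo-port-min` and `--dynamo-port-max` come from this PR's sequence diagram; the defaults and help text below are placeholder assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Port range flags (illustrative sketch)")
    parser.add_argument(
        "--dynamo-port-min",
        type=int,
        default=20000,  # placeholder default, not taken from the PR
        help="Lowest port the allocator may hand out.",
    )
    parser.add_argument(
        "--dynamo-port-max",
        type=int,
        default=30000,  # placeholder default, not taken from the PR
        help="Highest port the allocator may hand out.",
    )
    return parser


args = build_parser().parse_args(["--dynamo-port-min", "21000", "--dynamo-port-max", "21010"])
print(args.dynamo_port_min, args.dynamo_port_max)
```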


133-136: LGTM!

Proper initialization of DynamoPortRange with validation handled by the dataclass.
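
Validation "handled by the dataclass" usually means a `__post_init__` check. The sketch below shows that general technique with a stand-in class; the field names and the 1024 to 65535 bounds are assumptions, not the rules enforced by the real `DynamoPortRange`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PortRangeSketch:
    """Illustrative stand-in for a validated port range."""

    low: int
    high: int

    def __post_init__(self) -> None:
        # Assumed bounds: stay out of the privileged range and within 16 bits.
        for name, value in (("low", self.low), ("high", self.high)):
            if not 1024 <= value <= 65535:
                raise ValueError(f"{name}={value} is outside 1024-65535")
        if self.low >= self.high:
            raise ValueError(f"low ({self.low}) must be less than high ({self.high})")


print(PortRangeSketch(low=20000, high=30000))
```

Constructing the object is then enough to reject a bad range early, before any port is requested.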

Walkthrough

This update introduces a modular port allocation system for the vLLM backend, adds a new ports.py utility, and refactors port management in args.py. It also provides new Kubernetes deployment YAMLs for TRTLLM, corrects documentation links, updates installation instructions, and reorganizes framework support sections in several READMEs and Dockerfiles.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| vLLM Port Allocation Refactor & Utilities: components/backends/vllm/src/dynamo/vllm/args.py, components/backends/vllm/src/dynamo/vllm/ports.py | Refactored port allocation logic to use a new modular API with ETCD-based reservation, introduced explicit port range configuration, and added a new ports.py utility module with atomic port/block reservation, context management, and IP utilities. |
| TRTLLM Kubernetes Deployments: components/backends/trtllm/deploy/agg.yaml, components/backends/trtllm/deploy/agg_router.yaml, components/backends/trtllm/deploy/disagg.yaml, components/backends/trtllm/deploy/disagg_router.yaml | Added new Kubernetes CRD YAMLs for various TRTLLM deployment topologies (aggregated, disaggregated, router modes) specifying services, resources, health probes, and startup commands. |
| TRTLLM Documentation and Deployment Guide: components/backends/trtllm/README.md | Updated documentation links, added a comprehensive Kubernetes deployment section, and provided usage and benchmarking instructions. |
| Framework Support Matrix & Documentation Links: README.md, examples/README.md, components/backends/vllm/README.md, components/backends/sglang/README.md | Reorganized and promoted framework support sections, corrected documentation links, and improved introductory and installation instructions across main and example READMEs. |
| SGLang and Wideep Dockerfile Adjustments: container/Dockerfile.sglang, container/Dockerfile.sglang-wideep | Updated SGLang Dockerfile to install a pre-release flashinfer-python version, removed PYTHONPATH settings, and switched wideep Dockerfile from editable to standard pip install. |

Sequence Diagram(s)

sequenceDiagram
    participant User
    participant vLLM Args Parser
    participant Ports Module
    participant ETCD

    User->>vLLM Args Parser: Launch with CLI args (including --dynamo-port-min/max)
    vLLM Args Parser->>Ports Module: Request port/block allocation (with range, metadata)
    Ports Module->>Ports Module: Hold candidate port(s) via sockets
    Ports Module->>ETCD: Atomically reserve port(s) with metadata
    ETCD-->>Ports Module: Confirmation of reservation
    Ports Module-->>vLLM Args Parser: Return allocated port(s)
    vLLM Args Parser-->>User: Set environment/config with reserved ports
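
The hold-then-reserve flow in the diagram can be approximated as below. This is a sketch under stated assumptions: `FakeKV` is an in-memory stand-in for ETCD, the key layout `ports/<host>/<port>` is invented for the example, and scanning the range sequentially is a simplification rather than the strategy used in `ports.py`.

```python
import json
import socket


class FakeKV:
    """In-memory stand-in for ETCD offering a create-if-absent operation."""

    def __init__(self) -> None:
        self._data = {}

    def create_if_absent(self, key: str, value: str) -> bool:
        if key in self._data:
            return False
        self._data[key] = value
        return True


def allocate_and_reserve_sketch(kv: FakeKV, low: int, high: int, metadata: dict,
                                host: str = "127.0.0.1") -> int:
    """Hold a candidate port with a socket, then record the reservation in the store."""
    for port in range(low, high + 1):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as held:
            try:
                held.bind((host, port))  # step 1: hold the candidate locally
            except OSError:
                continue  # already in use on this host
            key = f"ports/{host}/{port}"  # hypothetical key layout
            if kv.create_if_absent(key, json.dumps(metadata)):  # step 2: reserve
                return port  # leaving the with-block releases the socket for the caller
    raise RuntimeError(f"no free port in {low}-{high}")


kv = FakeKV()
first = allocate_and_reserve_sketch(kv, 42000, 42050, {"owner": "worker-0"})
second = allocate_and_reserve_sketch(kv, 42000, 42050, {"owner": "worker-1"})
print(first, second)
```

The second call skips the first port even though it is free at the socket level, because the store already holds a reservation for it. That shared record is what keeps two workers from racing for the same port.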

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related PRs

Poem

🐇✨
Ports now dance in tidy rows,
Reserved in blocks, no race to close!
YAMLs sprout for TRTLLM’s might,
Docs and installs are set just right.
Frameworks listed, links now true—
Hop, hop! More robust for you!

@biswapanda biswapanda enabled auto-merge (squash) July 31, 2025 05:03
@dmitry-tokarev-nv dmitry-tokarev-nv merged commit d0de1a0 into release/0.4.0 Jul 31, 2025
5 checks passed
@dmitry-tokarev-nv dmitry-tokarev-nv deleted the bis/deploy-trtllm branch July 31, 2025 15:32
nealvaidya added a commit that referenced this pull request Aug 1, 2025
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
(cherry picked from commit 65e89b3)
@coderabbitai coderabbitai bot mentioned this pull request Sep 5, 2025

4 participants