Conversation

Contributor

@nvda-mesharma nvda-mesharma commented Jun 30, 2025

#1542

This PR adds automation to deploy documentation to a private repository when pushing to release branches.

Walkthrough

A new deploy job was added to the GitHub Actions workflow for documentation generation. This job depends on the build-docs job, runs only on release branches, and deploys generated documentation to an external repository using a GitHub Pages action, with deployment details dynamically set from the branch and commit.

Changes

File(s) Change Summary
.github/workflows/generate-docs.yml Added a deploy job that downloads the docs artifact, extracts the branch name, and deploys documentation to an external repo using GitHub Pages.

Sequence Diagram(s)

sequenceDiagram
    participant GitHub Actions
    participant build-docs Job
    participant deploy Job
    participant ai-dynamo/dynamo.github.io Repo

    GitHub Actions->>build-docs Job: Trigger on push to release branch
    build-docs Job-->>GitHub Actions: Generate and upload docs artifact
    GitHub Actions->>deploy Job: Start if build-docs succeeds
    deploy Job->>deploy Job: Download docs artifact
    deploy Job->>deploy Job: Extract branch name from GITHUB_REF
    deploy Job->>ai-dynamo/dynamo.github.io Repo: Deploy docs via gh-pages action
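Expressed as a workflow fragment, the deploy job sketched in the diagram could look roughly like the following (the job layout, artifact name, secret name, and action versions here are illustrative assumptions, not the exact contents of generate-docs.yml):

```yaml
deploy:
  needs: build-docs
  if: startsWith(github.ref, 'refs/heads/release')
  runs-on: ubuntu-latest
  steps:
    - name: Download docs artifact
      uses: actions/download-artifact@v4
      with:
        name: docs               # hypothetical artifact name
        path: ./site
    - name: Extract branch name
      run: echo "BRANCH=${GITHUB_REF#refs/heads/}" >> "$GITHUB_ENV"
    - name: Deploy to external repo
      uses: peaceiris/actions-gh-pages@v4
      with:
        personal_token: ${{ secrets.DOCS_DEPLOY_TOKEN }}   # hypothetical secret
        external_repository: ai-dynamo/dynamo.github.io
        publish_dir: ./site
        destination_dir: ${{ env.BRANCH }}
```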
  1. Trying to resolve the 404 issue at:
    https://****.pages.github.io/release/0.3.1/

The current condition if: startsWith(github.ref, 'refs/heads/release') only matches direct pushes to a release branch, not PRs that target release branches.
For PRs, we need to check the target branch (github.base_ref) rather than the source ref (github.ref).
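A combined condition covering both cases might look like the following sketch (note that on pull_request events github.base_ref holds the bare branch name, without the refs/heads/ prefix, so the startsWith argument differs; the actual workflow may be structured differently):

```yaml
# Run on direct pushes to release branches, and on PRs targeting them.
if: >-
  startsWith(github.ref, 'refs/heads/release') ||
  (github.event_name == 'pull_request' &&
   startsWith(github.base_ref, 'release'))
```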

  2. Adding a PR comment with the docs link

Summary by CodeRabbit

  • New Features

    • Added comprehensive example manifests and documentation for deploying and testing the Inference Gateway with Dynamo on Kubernetes, including RBAC, services, routes, and model resources.
  • Bug Fixes

    • Fixed Sphinx documentation extension naming issue.
    • Updated documentation links and deployment guides for better navigation.
  • Refactor

    • Simplified port reservation logic and removed GPU temperature tracking from resource management.
    • Streamlined Dockerfile and build processes for documentation and various components.
  • Style

    • Adjusted logging levels for subprocess stderr output to reduce misleading error logs.
  • Tests

    • Enhanced test setup with model pre-download fixture and improved multimodal deployment handling.
  • Chores

    • Updated dependencies, configuration files, and container base images for improved reproducibility and compatibility.

nvda-mesharma and others added 30 commits June 16, 2025 09:49
…nches (#1527)

Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Contributor

coderabbitai bot commented Jun 30, 2025

Walkthrough

This update introduces documentation deployment previews for pull requests targeting release branches, refines and pins documentation build dependencies, and simplifies several Dockerfile and config setups. It also removes GPU temperature tracking from the SDK, adds model validation to multimodal frontends, updates test fixtures, and introduces comprehensive Kubernetes manifests and guides for the inference gateway.

Changes

Files / Group Change Summary
.github/workflows/generate-docs.yml Adds a deploy job for automated docs preview deployment and PR commenting on release branches.
container/Dockerfile.docs, container/deps/requirements.docs.txt Pins system/Python packages for docs, switches to requirements file for dependency management.
container/Dockerfile.sglang Changes sglang source to a fork and branch for regression workaround.
container/Dockerfile.tensorrt_llm Removes /usr/local/cuda/compat/lib.real from LD_LIBRARY_PATH.
container/Dockerfile.vllm Removes architecture-specific NIXL build logic; wheelhouse directory no longer deleted post-install.
container/build.sh, lib/llm/Cargo.toml Updates NIXL commit hash and switches to a Git dependency for nixl-sys.
deploy/cloud/api-store/Earthfile, deploy/cloud/operator/Earthfile Refactors image build stages and base images for API store and operator.
deploy/inference-gateway/example/README.md, .../resources/*.yaml Adds example manifests and a README for deploying/using the inference gateway with Dynamo.
deploy/sdk/src/dynamo/sdk/cli/allocator.py, .../lib/resource.py Removes GPU temperature logging and all related code from resource management.
deploy/sdk/src/dynamo/sdk/cli/circus.py, .../cli/utils.py Refactors port reservation logic, introduces PortReserver context manager.
docs/architecture/planner.md, docs/conf.py, docs/index.rst Updates doc links, corrects Sphinx extension name, revises overview and guides.
examples/multimodal/components/frontend.py, .../video_frontend.py Adds engine_args initialization and model validation in multimodal frontends.
examples/multimodal/configs/*.yaml Adds Frontend section referencing model in multimodal configs.
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml, .../router_configs/* Increases free_gpu_memory_fraction and removes cache_transceiver_config.
launch/dynamo-run/src/subprocess.rs Changes stderr log level from error to warn, adds explanatory comment.
launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml Adds disable_overlap_scheduler: true to config.
lib/llm/src/block_manager/block/transfer/nixl.rs Updates XferDescList::new calls to include a boolean argument.
tests/serve/conftest.py Adds predownload_models fixture and pytest hook to pre-download HuggingFace models for tests.
tests/serve/test_dynamo_serve.py Removes seed from payloads, focuses on chat completions, updates multimodal deployment handling.

Sequence Diagram(s)

sequenceDiagram
    participant PR as Pull Request
    participant GitHub Actions
    participant Docs Build
    participant Docs Deploy
    participant PR Comment

    PR->>GitHub Actions: Open PR to release branch
    GitHub Actions->>Docs Build: Run build-docs job
    Docs Build-->>GitHub Actions: Upload docs artifact
    GitHub Actions->>Docs Deploy: Run deploy job (on release PR)
    Docs Deploy->>Docs Deploy: Download artifact, clean metadata
    Docs Deploy->>Docs Deploy: Deploy docs to preview repo/branch
    Docs Deploy->>PR Comment: Post or update preview link on PR

Possibly related PRs

  • ai-dynamo/dynamo#1563: Also modifies subprocess stderr logging level, directly related to changes in launch/dynamo-run/src/subprocess.rs.
  • ai-dynamo/dynamo#1633: Adds the same predownload_models fixture to tests/serve/conftest.py, directly related to test improvements here.
  • ai-dynamo/dynamo#1544: Modifies container/Dockerfile.vllm to remove architecture-specific build logic, matching changes in this update.

Poem

A rabbit hops through docs and code,
Pinning packages, lightening the load.
No more GPU temps to fret,
Multimodal models never forget.
K8s manifests spring up anew—
With preview links and fixtures too!
🐇✨




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 8

🧹 Nitpick comments (15)
examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml (2)

32-36: Raising free_gpu_memory_fraction to 0.75 may starve other allocations

Jumping from 0.40 → 0.75 leaves only 25 % of device memory for weights, activations, communication buffers, and any transient CUDA allocations. On consumer-grade 24 GB cards that’s ~6 GB—not much headroom for a 7-billion-parameter model plus NCCL/EFA buffers. Please benchmark typical and worst-case sequence lengths under expected batch sizes to ensure you do not trigger CUDA OOM or kernel fallback to host staging.


50-54: Same memory reservation change for generation servers

The generation path often holds the model’s full weight copy and larger activation footprints than the context path. Confirm the 75 % reservation still allows the model to load and run without spilling. If load failures appear intermittent, consider separate fractions per server role or exposing this as an env-configurable knob.
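To make the headroom arithmetic above concrete, here is the calculation as a tiny Python sketch (the 24 GB figure follows the comment above, and the fraction is treated as a share of total device memory for illustration; TensorRT-LLM's actual free_gpu_memory_fraction semantics may differ):

```python
def non_kv_headroom_gb(total_gb: float, kv_fraction: float) -> float:
    """GPU memory left over after reserving kv_fraction for the KV cache."""
    return total_gb * (1.0 - kv_fraction)

# On a 24 GB card, 0.75 leaves 6 GB for weights, activations, and buffers,
# versus roughly 14.4 GB with the previous 0.40 setting.
print(non_kv_headroom_gb(24, 0.75))  # 6.0
print(non_kv_headroom_gb(24, 0.40))
```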

examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (1)

47-51: Generation server memory headroom

Generation paths tend to explode in memory when sampling long continuations. If you observe latency spikes or watchdog resets, consider reverting to 0.5 – 0.6 or making this tunable per deployment.

deploy/sdk/src/dynamo/sdk/cli/utils.py (1)

61-89: Well-structured refactoring with minor cleanup needed.

The PortReserver class is a good simplification of the previous port reservation logic. The context manager protocol is correctly implemented with proper error handling and logging.

However, address the inconsistent return statement in the close_socket method:

 def close_socket(self):
     try:
         if self.socket:
             self.socket.close()
     except socket.error as e:
         logger.warning(f"Error while closing socket: {str(e)}")
-        # Don't re-raise the exception as this is cleanup code
-        return True
+        # Don't re-raise the exception as this is cleanup code
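For readers unfamiliar with the pattern, a minimal context-manager port reserver looks roughly like this (a sketch only, not the SDK's actual PortReserver implementation):

```python
import socket

class PortReserver:
    """Bind an ephemeral port, hold it open, release it on exit.

    Cleanup errors are swallowed rather than re-raised, matching the
    'cleanup code should not raise' guidance in the review comment.
    """

    def __init__(self, host="127.0.0.1"):
        self.host = host
        self.socket = None
        self.port = None

    def __enter__(self):
        self.socket = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self.socket.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        self.socket.bind((self.host, 0))  # port 0: let the OS pick one
        self.port = self.socket.getsockname()[1]
        return self.port

    def __exit__(self, exc_type, exc, tb):
        try:
            if self.socket:
                self.socket.close()
        except OSError:
            pass  # cleanup path: log-and-continue, never re-raise
        return False  # do not suppress exceptions from the with-body

with PortReserver() as port:
    print(0 < port < 65536)  # True
```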
launch/dynamo-run/src/subprocess.rs (1)

101-105: Good pragmatic solution for stderr logging confusion.

The change from error to warn level addresses the user confusion caused by libraries outputting non-error messages to stderr. The explanatory comment properly documents the reasoning and acknowledges this as a temporary middle ground.

Consider implementing stderr parsing logic in the future to distinguish between actual errors and informational messages, possibly by analyzing message patterns or severity indicators.
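As a rough illustration of the suggested triage, a pattern-based classifier could map stderr lines to log levels (a heuristic sketch in Python for clarity; the launcher itself is Rust, and real-world patterns would need tuning against actual library output):

```python
import re

# Substrings that usually indicate a genuine failure on stderr.
ERROR_PATTERNS = re.compile(r"(?i)(error|fatal|panic|traceback|exception)")

def stderr_level(line: str) -> str:
    """Heuristically map a raw stderr line to a log level."""
    return "error" if ERROR_PATTERNS.search(line) else "warn"

print(stderr_level("RuntimeError: CUDA out of memory"))  # error
print(stderr_level("Loading checkpoint shards: 100%"))   # warn
```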

lib/llm/Cargo.toml (1)

80-86: Pinned git-rev improves reproducibility but consider using a version tag

Direct git dependencies:

  • disable crates-io caching → longer CI times,
  • bypass cargo audit CVE tracking,
  • can be GC-ed by repo owners.

If the commit represents a release, ask upstream to tag it (e.g. v0.3.0) and depend with
nixl-sys = { git = "...", tag = "v0.3.0", optional = true }.
This keeps the same code while future-proofing lockfile refreshes.

Not blocking, but worth tracking.

deploy/inference-gateway/example/resources/http-router.yaml (1)

34-34: Add missing newline at end of file.

YAMLlint detected a missing newline character at the end of the file, which is required by YAML standards.

     timeouts:
       request: 300s
+
deploy/cloud/api-store/Earthfile (1)

47-47: Consider making image push conditional.

The --push flag makes this build push by default, which might not be desired during development or testing.

Consider making the push conditional:

-    SAVE IMAGE --push $DOCKER_SERVER/$IMAGE:$IMAGE_TAG
+    ARG PUSH=false
+    IF [ "$PUSH" = "true" ]
+        SAVE IMAGE --push $DOCKER_SERVER/$IMAGE:$IMAGE_TAG
+    ELSE
+        SAVE IMAGE $DOCKER_SERVER/$IMAGE:$IMAGE_TAG
+    END
deploy/inference-gateway/example/resources/cluster-role.yaml (1)

19-40: Well-structured RBAC permissions for inference gateway.

The permissions are appropriately scoped for the inference gateway's operations:

  • Read access to inference resources and pods
  • Authentication and authorization checks for security

However, fix the missing newline at the end of the file.

   verbs:
   - create
+
deploy/inference-gateway/example/resources/service.yaml (1)

15-28: Appropriate service configuration for internal communication.

The ClusterIP service with http2 protocol is well-configured for internal cluster communication with the inference backend.

However, fix the missing newline at the end of the file.

   type: ClusterIP
+
deploy/inference-gateway/example/resources/inference-model.yaml (1)

15-26: Missing trailing newline + minor metadata hardening

YAMLLint flags the absence of a newline at EOF. Add one to avoid tooling noise.

Also consider adding labels to the CR for easier selection & lifecycle ops:

     name: dynamo-deepseek
+
container/Dockerfile.vllm (1)

506-507: Blindly symlinking every venv executable can shadow system binaries

ln -sf $VIRTUAL_ENV/bin/* /usr/local/bin/ will overwrite names like python, pip, etc., that might already exist in /usr/local/bin.
Safer: only link the project-specific entry points or prepend the venv bin dir to PATH.

-    ln -sf $VIRTUAL_ENV/bin/* /usr/local/bin/
+    find "$VIRTUAL_ENV/bin" -maxdepth 1 -type f -executable -name 'dynamo-*' -exec ln -sf {} /usr/local/bin/ \;
container/Dockerfile.sglang (1)

137-144: Clone depth optimisation and one-step checkout

Fetching full history then git fetch && git checkout is slow and inflates image size.

-    git clone https://github.com/ishandhanani/sglang.git && \
-    cd sglang && \
-    git fetch && \
-    git checkout dynamo-release-nixl-sort  && \
+    git clone --branch dynamo-release-nixl-sort --depth 1 \
+        https://github.com/ishandhanani/sglang.git && \
+    cd sglang && \

This keeps the layer minimal and deterministic.

deploy/inference-gateway/example/resources/inference-pool.yaml (1)

28-28: Add missing trailing newline to satisfy linters.

YAML‐lint flags the absence of a line-feed at EOF. Add one to avoid needless CI noise.

-    name: dynamo-deepseek-epp
\ No newline at end of file
+    name: dynamo-deepseek-epp
+
deploy/inference-gateway/example/README.md (1)

130-134: Fix invalid double -o flags in the kubectl get svc command.

kubectl accepts only a single -o flag; the first one (yaml) is silently ignored and may confuse readers.

-GATEWAY_URL=$(kubectl get svc inference-gateway -o yaml -o jsonpath='{.spec.clusterIP}')
+GATEWAY_URL=$(kubectl get svc inference-gateway -o jsonpath='{.spec.clusterIP}')
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 8f485b1 and 896c274.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (40)
  • .github/workflows/generate-docs.yml (1 hunks)
  • container/Dockerfile.docs (1 hunks)
  • container/Dockerfile.sglang (1 hunks)
  • container/Dockerfile.tensorrt_llm (1 hunks)
  • container/Dockerfile.vllm (3 hunks)
  • container/build.sh (1 hunks)
  • container/deps/requirements.docs.txt (1 hunks)
  • deploy/cloud/api-store/Earthfile (1 hunks)
  • deploy/cloud/operator/Earthfile (1 hunks)
  • deploy/inference-gateway/example/README.md (1 hunks)
  • deploy/inference-gateway/example/resources/cluster-role-binding.yaml (1 hunks)
  • deploy/inference-gateway/example/resources/cluster-role.yaml (1 hunks)
  • deploy/inference-gateway/example/resources/dynamo-epp.yaml (1 hunks)
  • deploy/inference-gateway/example/resources/http-router.yaml (1 hunks)
  • deploy/inference-gateway/example/resources/inference-model.yaml (1 hunks)
  • deploy/inference-gateway/example/resources/inference-pool.yaml (1 hunks)
  • deploy/inference-gateway/example/resources/service.yaml (1 hunks)
  • deploy/sdk/src/dynamo/sdk/cli/allocator.py (1 hunks)
  • deploy/sdk/src/dynamo/sdk/cli/circus.py (1 hunks)
  • deploy/sdk/src/dynamo/sdk/cli/utils.py (2 hunks)
  • deploy/sdk/src/dynamo/sdk/lib/resource.py (1 hunks)
  • docs/architecture/planner.md (1 hunks)
  • docs/conf.py (1 hunks)
  • docs/index.rst (1 hunks)
  • examples/multimodal/components/frontend.py (2 hunks)
  • examples/multimodal/components/video_frontend.py (2 hunks)
  • examples/multimodal/configs/agg-llava.yaml (1 hunks)
  • examples/multimodal/configs/agg-phi3v.yaml (1 hunks)
  • examples/multimodal/configs/agg-qwen.yaml (1 hunks)
  • examples/multimodal/configs/agg_video.yaml (1 hunks)
  • examples/multimodal/configs/disagg.yaml (1 hunks)
  • examples/multimodal/configs/disagg_video.yaml (1 hunks)
  • examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (2 hunks)
  • examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml (2 hunks)
  • launch/dynamo-run/src/subprocess.rs (1 hunks)
  • launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1 hunks)
  • lib/llm/Cargo.toml (1 hunks)
  • lib/llm/src/block_manager/block/transfer/nixl.rs (1 hunks)
  • tests/serve/conftest.py (1 hunks)
  • tests/serve/test_dynamo_serve.py (8 hunks)
🧰 Additional context used
🧠 Learnings (5)
container/Dockerfile.tensorrt_llm (1)
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
lib/llm/src/block_manager/block/transfer/nixl.rs (3)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
docs/index.rst (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
deploy/sdk/src/dynamo/sdk/cli/allocator.py (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
container/Dockerfile.vllm (2)
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
🧬 Code Graph Analysis (1)
deploy/sdk/src/dynamo/sdk/cli/circus.py (1)
deploy/sdk/src/dynamo/sdk/cli/utils.py (1)
  • reserve_free_port (92-100)
🪛 YAMLlint (1.37.1)
deploy/inference-gateway/example/resources/http-router.yaml

[error] 34-34: no new line character at the end of file

(new-line-at-end-of-file)

deploy/inference-gateway/example/resources/service.yaml

[error] 28-28: no new line character at the end of file

(new-line-at-end-of-file)

deploy/inference-gateway/example/resources/inference-model.yaml

[error] 26-26: no new line character at the end of file

(new-line-at-end-of-file)

deploy/inference-gateway/example/resources/cluster-role.yaml

[error] 40-40: no new line character at the end of file

(new-line-at-end-of-file)

deploy/inference-gateway/example/resources/inference-pool.yaml

[error] 28-28: no new line character at the end of file

(new-line-at-end-of-file)

🪛 Checkov (3.2.334)
deploy/inference-gateway/example/resources/dynamo-epp.yaml

[MEDIUM] 15-67: Containers should not run with allowPrivilegeEscalation

(CKV_K8S_20)


[MEDIUM] 15-67: Minimize the admission of root containers

(CKV_K8S_23)

🪛 LanguageTool
deploy/inference-gateway/example/README.md

[grammar] ~9-~9: The singular proper name ‘Kubernetes’ must be used with a third-person or a past tense verb.
Context: ...quests. ## Prerequisites - Kubernetes cluster with kubectl configured - NVIDIA GPU dr...

(HE_VERB_AGR)

🪛 actionlint (1.7.7)
.github/workflows/generate-docs.yml

98-98: shellcheck reported issue in this script: SC2129:style:3:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects

(shellcheck)


98-98: shellcheck reported issue in this script: SC2086:info:3:73: Double quote to prevent globbing and word splitting

(shellcheck)


98-98: shellcheck reported issue in this script: SC2086:info:4:40: Double quote to prevent globbing and word splitting

(shellcheck)


98-98: shellcheck reported issue in this script: SC2086:info:5:34: Double quote to prevent globbing and word splitting

(shellcheck)


106-106: the runner of "peaceiris/actions-gh-pages@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue

(action)

🪛 Pylint (3.3.7)
deploy/sdk/src/dynamo/sdk/cli/utils.py

[refactor] 81-81: Either all return statements in a function should return an expression, or none of them should.

(R1710)

🔇 Additional comments (36)
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (1)

32-36: High KV-cache reservation—double-check aggregate memory plan

Mirroring the router config, allocating 75 % for KV-cache significantly tightens the global pool. Validate with nvidia-smi --query-compute-apps=used_memory under a soak test; adjust if fragmentation/OOMs occur.

deploy/sdk/src/dynamo/sdk/cli/utils.py (1)

92-100: Excellent simplification of the port reservation logic.

The refactored function is much cleaner and maintains the same interface while delegating the complexity to the PortReserver class. This improves maintainability and readability.

examples/multimodal/configs/disagg.yaml (1)

23-24: LGTM! Frontend configuration section added consistently.

The new Frontend section properly references the model configuration from the Common section, maintaining consistency with the established pattern across multimodal configuration files.

examples/multimodal/configs/agg-llava.yaml (1)

20-21: Frontend configuration properly structured.

The Frontend section correctly references the model configuration from the Common section, maintaining consistency with the multimodal configuration pattern.

examples/multimodal/configs/agg-qwen.yaml (1)

20-21: Consistent frontend configuration addition.

The Frontend section follows the established pattern and correctly references the model configuration from the Common section.

examples/multimodal/configs/disagg_video.yaml (1)

28-29: Excellent consistency across multimodal configurations.

The Frontend section maintains the established pattern across all multimodal configuration files, ensuring consistent behavior and configuration structure.

tests/serve/conftest.py (3)

22-25: LGTM: Well-defined model list for test fixtures.

The model list is clearly defined and appropriately scoped for serve tests. The models chosen (DeepSeek and LLaVA) appear to align with the multimodal testing requirements mentioned in the AI summary.


30-68: Excellent error handling and graceful degradation.

The fixture implementation demonstrates several best practices:

  • Graceful handling of missing HF_TOKEN with informative warnings
  • Proper exception handling that doesn't fail the fixture
  • Clear logging for debugging and monitoring
  • Handles missing huggingface_hub dependency appropriately

The approach of not failing the fixture when downloads fail is correct, as it allows individual tests to handle missing models as needed.


72-85: Smart automatic fixture injection with proper filtering.

The pytest hook implementation is well-designed:

  • Correctly handles test items without fixturenames attribute
  • Properly filters for serve directory tests
  • Respects existing fixture usage to avoid duplicates
  • Honors the skip_model_download marker for opt-out scenarios

This automation reduces test maintenance overhead while providing flexibility.
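The injection decision described above can be sketched as a small predicate (names such as predownload_models and skip_model_download follow the review's description; the real conftest.py hook may differ in detail):

```python
def should_inject_fixture(item_path, fixturenames, markers):
    """Decide whether the predownload fixture should be added to a test item."""
    if "serve" not in item_path:
        return False                      # only tests under tests/serve
    if "predownload_models" in fixturenames:
        return False                      # fixture already requested explicitly
    if "skip_model_download" in markers:
        return False                      # explicit opt-out marker
    return True

# In conftest.py this would be wired up roughly as:
# def pytest_collection_modifyitems(items):
#     for item in items:
#         names = getattr(item, "fixturenames", None)
#         if names is not None and should_inject_fixture(
#             str(item.fspath), names, item.keywords
#         ):
#             names.append("predownload_models")

print(should_inject_fixture("tests/serve/test_x.py", [], []))  # True
```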

examples/multimodal/configs/agg-phi3v.yaml (1)

21-22: LGTM: Consistent frontend configuration standardization.

The addition of the Frontend section with common-configs: [model] aligns with the broader pattern of standardizing frontend configuration across multimodal examples. This supports the model validation logic implemented in the frontend components.

deploy/sdk/src/dynamo/sdk/cli/allocator.py (1)

236-236: Verify impact of removing GPU temperature from logs.

The removal of GPU temperature logging aligns with the broader GPU temperature tracking removal mentioned in the AI summary. However, temperature information can be valuable for debugging thermal throttling issues.

Consider whether this information should be available through alternative means (e.g., separate monitoring tools) or if this removal might impact troubleshooting capabilities.

examples/multimodal/configs/agg_video.yaml (1)

27-28: LGTM: Consistent frontend configuration pattern.

The Frontend section addition follows the same pattern as other multimodal configuration files, ensuring consistency across the examples. This standardization supports the frontend model validation functionality.

examples/multimodal/components/frontend.py (3)

21-21: LGTM: Appropriate imports for new functionality.

The additional imports for JSONResponse and parse_vllm_args are correctly added to support the model validation functionality and proper HTTP error responses.

Also applies to: 23-23


42-44: LGTM: Clean initialization of engine arguments.

The __init__ method properly initializes the engine_args attribute using the class name, which will be used for model validation. The implementation is straightforward and follows the expected pattern.


48-52: LGTM: Proper model validation with appropriate HTTP response.

The model validation logic correctly:

  • Compares the requested model against the configured model
  • Returns a proper 404 JSON error response for model mismatches
  • Uses appropriate HTTP status code and error message format
  • Prevents processing invalid model requests early in the flow

This enhances the robustness of the frontend by validating requests before processing.
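As a rough sketch, the guard boils down to the following (a minimal stand-in with illustrative names; the real code returns a `JSONResponse`, replaced here with a plain dict so the snippet stays self-contained):

```python
# Minimal sketch of the model-validation guard; function and field names
# are illustrative, not the actual frontend signatures.
def validate_model(requested_model: str, served_model: str):
    """Return None if the request may proceed, else a 404-style error body."""
    if requested_model != served_model:
        # Mirrors the OpenAI-style error shape commonly used for
        # "model not found" responses.
        return {
            "status_code": 404,
            "error": {
                "message": f"Model '{requested_model}' not found",
                "type": "invalid_request_error",
            },
        }
    return None


if __name__ == "__main__":
    assert validate_model("llava", "llava") is None
    assert validate_model("gpt-x", "llava")["status_code"] == 404
```

Rejecting the mismatch before any engine work is scheduled is what keeps the failure cheap.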

examples/multimodal/components/video_frontend.py (3)

21-23: LGTM!

The imports are correctly added to support model validation functionality.


42-44: LGTM!

The initialization correctly parses VLLM arguments to obtain the configured model, following the same pattern as the regular frontend implementation.


48-52: LGTM!

The model validation correctly ensures that clients can only request the configured model, returning an appropriate 404 error for mismatched models. This prevents potential security issues and incorrect model usage.

deploy/sdk/src/dynamo/sdk/lib/resource.py (1)

167-273: LGTM!

The removal of GPU temperature tracking is complete and consistent. The updated comment accurately reflects that only utilization and memory statistics are now tracked.

tests/serve/test_dynamo_serve.py (2)

70-77: Why was the seed parameter commented out?

Removing the seed parameter could make test outputs less deterministic. While the low temperature (0.1) helps with consistency, having a fixed seed would ensure fully reproducible test results.

Was this change intentional? If so, please add a comment explaining why seed-based determinism is no longer needed for these tests.
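For reference, reinstating determinism is a one-line change to the request body; a hypothetical payload (field names besides `seed` and `temperature` are placeholders, not the actual test payload):

```python
# Hypothetical request payload: a fixed seed plus low temperature makes
# sampling reproducible across runs, when the backend honors the seed.
payload = {
    "model": "llava",    # placeholder model name
    "temperature": 0.1,  # low temperature narrows the sampling distribution
    "seed": 0,           # fixed seed pins the sampling RNG
}

assert "seed" in payload
```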


162-162: LGTM!

The config file change to agg-llava.yaml correctly aligns with the llava model being tested in the multimodal payload.

deploy/cloud/operator/Earthfile (1)

Switching to an nvcr.io distroless image may break unattended builds

nvcr.io images usually sit behind NGC authentication & EULA acceptance.
If CI runners (or downstream users) don’t have an NGC API key configured, docker build will fail at the very first layer pull.

  1. Confirm that every build environment (GH-Actions, on-prem, air-gapped) has DOCKER_CONFIG / ~/.docker/config.json populated with NGC credentials.
  2. Alternatively publish a public mirror (e.g. ghcr.io/ai-dynamo/distroless-go:v3.1.9-dev) or keep the previous gcr.io/distroless/static-debian11 fallback via a --build-arg BASE_IMAGE.

No code change needed if infra is ready, but without it the new workflow will be red.
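
Option 2 can be implemented cheaply with a build-arg fallback; an illustrative Earthfile-style fragment (the target layout and image tags are assumptions, not the repo's actual targets):

```dockerfile
# Hypothetical fallback: default to the public distroless base and let
# environments with NGC credentials override BASE_IMAGE at build time.
ARG BASE_IMAGE=gcr.io/distroless/static-debian11
FROM $BASE_IMAGE
```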

docs/conf.py (1)

70-75: Correct module name – LGTM

Changing to "sphinx_prompt" matches the package’s entry-point, fixing the build error.
Make sure the corresponding package (sphinx-prompt) is still listed in requirements.docs.txt.

docs/architecture/planner.md (1)

68-72: Verify that the new relative link resolves

../guides/planner_benchmark/README.md looks plausible, but broken links silently pass in Markdown-only checks.
Please run Sphinx's linkcheck builder (e.g. `sphinx-build -b linkcheck docs _build/linkcheck`) or open the generated docs to confirm the file exists under that path.

launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml (1)

22-25: Behavioural flag added – confirm engine supports it

disable_overlap_scheduler: true is harmless if recognised, but TRT-LLM before 0.9.0 ignores unknown keys and silently starts with defaults, defeating the intent.

Double-check the exact engine version shipped in the container image; if < 0.9, gate the field behind a version check or document the minimum required version.
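
One way to gate the field, sketched in Python (the helper name and the exact 0.9 cutoff are assumptions drawn from the comment above):

```python
# Hypothetical guard: only emit the flag when the engine is new enough
# to honor it, since older TRT-LLM silently drops unknown keys.
def supports_overlap_scheduler_flag(engine_version: str) -> bool:
    major, minor, *_ = (int(p) for p in engine_version.split("."))
    return (major, minor) >= (0, 9)


config = {"max_batch_size": 8}  # placeholder base config
if supports_overlap_scheduler_flag("0.9.0"):
    config["disable_overlap_scheduler"] = True
```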

lib/llm/src/block_manager/block/transfer/nixl.rs (1)

120-121: Verify boolean flag for XferDescList::new

We only found two calls to XferDescList::new—both hard-coded with true—so be sure this always matches your transfer semantics (e.g., enabling async notification). If some transfers should disable notifications, consider making this flag configurable or document the API change.

• lib/llm/src/block_manager/block/transfer/nixl.rs:120
let mut src_dl = XferDescList::new(src_mem_type, true)?;
• lib/llm/src/block_manager/block/transfer/nixl.rs:121
let mut dst_dl = XferDescList::new(dst_mem_type, true)?;
• No other XferDescList::new usages found.

Please confirm with the NIXL API docs or commit notes that true is the correct default here.
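
If some transfers should disable notifications, the flag could be threaded through the call sites rather than hard-coded; an illustrative diff (the `notify` parameter and its plumbing through the enclosing function are assumptions):

```diff
-let mut src_dl = XferDescList::new(src_mem_type, true)?;
-let mut dst_dl = XferDescList::new(dst_mem_type, true)?;
+let mut src_dl = XferDescList::new(src_mem_type, notify)?;
+let mut dst_dl = XferDescList::new(dst_mem_type, notify)?;
```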

container/build.sh (1)

112-112: NIXL library update coordination looks good.

The NIXL commit hash update aligns with the API changes in the Rust code, ensuring consistency across the build environment and source code.

container/Dockerfile.tensorrt_llm (1)

95-95: No references to /usr/local/cuda/compat/lib.real found—please verify CUDA runtime

I ran a repo-wide search and didn’t find any remaining references to the removed compatibility path. However, deleting that entry from LD_LIBRARY_PATH can still lead to runtime failures if the base image or any CUDA binaries expect those libraries.

• Confirm by building the tensorrt_llm container and running a representative CUDA workload (e.g., nvidia-smi, a simple CUDA sample, or your application tests).
• If any “library not found” errors occur, consider re-adding the compat path or ensuring the needed .so files are present under /usr/lib.
• Update documentation or CI to include a smoke-test for CUDA functionality.

deploy/inference-gateway/example/resources/http-router.yaml (1)

29-32: Consider the broad path matching pattern.

The current configuration routes all traffic with path prefix / to the backend. This is appropriate for a dedicated inference service but ensure this aligns with your intended routing strategy.
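
If the route should carry only inference traffic, a narrower match is a small change; an illustrative fragment (the `/v1` prefix is an assumption based on typical OpenAI-compatible endpoints, not the repo's confirmed paths):

```yaml
# Hypothetical narrower match: only API-prefixed paths reach the backend.
rules:
  - matches:
      - path:
          type: PathPrefix
          value: /v1
```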

docs/index.rst (2)

110-110: Documentation enhancement approved.

Adding the GKE Setup Guide enhances the deployment documentation coverage and aligns with the new Kubernetes manifests added in this PR.


118-118: Link correction looks good.

Updating the Planner Benchmark link to point to README.md ensures accurate documentation references.

deploy/cloud/api-store/Earthfile (2)

31-35: LGTM! Efficient build optimization.

The approach of installing packages in the uv-base stage and copying the entire environment is more efficient than rebuilding in the final image.


41-45: Good security improvement with distroless image.

Using the NVIDIA distroless Python image reduces the attack surface by eliminating unnecessary packages and shell access.

container/Dockerfile.docs (1)

31-34: uv build mount relies on BuildKit – confirm CI runners

RUN --mount=type=bind,… requires Docker BuildKit; the legacy builder does not support the flag and the build fails.
Double-check that all CI/CD and local developer environments enable BuildKit (DOCKER_BUILDKIT=1).
If not guaranteed, replace with a plain COPY requirements.docs.txt followed by RUN uv pip install -r ….
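
The BuildKit-free fallback mentioned above would look roughly like this (the destination path is an assumption):

```dockerfile
# Fallback that works on the legacy builder: copy the requirements file
# into the image instead of bind-mounting it at build time.
COPY requirements.docs.txt /tmp/requirements.docs.txt
RUN uv pip install -r /tmp/requirements.docs.txt
```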

container/Dockerfile.vllm (1)

123-129: uv build may fail – ensure output directory exists

uv build . --out-dir /workspace/wheels/nixl assumes the parent path exists; on a fresh layer it doesn’t.

-RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
+RUN mkdir -p /workspace/wheels/nixl && \
+    cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl

Pre-creating avoids sporadic “No such file or directory” errors during parallel builds.

Likely an incorrect or invalid review comment.

.github/workflows/generate-docs.yml (1)

66-68: deploy job never runs on direct pushes to release/* branches. Intentional?

build-docs still runs for push events, but deploy is gated to PRs only.
If you also want docs deployed after merging into a release branch, you need an additional condition, e.g.:

if: |
  (github.event_name == 'pull_request' && startsWith(github.base_ref, 'release')) ||
  (github.event_name == 'push' && startsWith(github.ref, 'refs/heads/release/'))

Otherwise the published docs will stop updating once the PR is merged.

Comment on lines +32 to +57
      # Conservatively, this timeout should mirror the longest grace period of the pods within the pool
      terminationGracePeriodSeconds: 130
      containers:
      - name: epp
        image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
        imagePullPolicy: Always
        args:
        - -poolName
        - "dynamo-deepseek"
        - "-poolNamespace"
        - "default"
        - -v
        - "4"
        - --zap-encoder
        - "json"
        - -grpcPort
        - "9002"
        - -grpcHealthPort
        - "9003"
        ports:
        - containerPort: 9002
        - containerPort: 9003
        - name: metrics
          containerPort: 9090
        livenessProbe:
          grpc:
⚠️ Potential issue

Harden container security context

Static analysis flags root / privilege escalation. Add a securityContext to follow best-practice:

      containers:
      - name: epp
+        securityContext:
+          runAsNonRoot: true
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop: [ "ALL" ]

If the image truly requires root, document why and silence the linter with an explicit justification.

🤖 Prompt for AI Agents
In deploy/inference-gateway/example/resources/dynamo-epp.yaml around lines 32 to
57, the container runs as root or allows privilege escalation, which is flagged
by static analysis. Add a securityContext section under the container spec to
run the container as a non-root user and disable privilege escalation. If
running as root is necessary, add a comment explaining why and explicitly set
the securityContext to acknowledge this and silence the linter.
