docs: Add GitHub Pages deployment to dynamo.github.io for release branches #1704
…nches (#1527) Signed-off-by: Meenakshi Sharma <163925564+nvda-mesharma@users.noreply.github.com>
Walkthrough
This update introduces documentation deployment previews for pull requests targeting release branches, refines and pins documentation build dependencies, and simplifies several Dockerfile and config setups. It also removes GPU temperature tracking from the SDK, adds model validation to multimodal frontends, updates test fixtures, and introduces comprehensive Kubernetes manifests and guides for the inference gateway.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant PR as Pull Request
    participant GitHub Actions
    participant Docs Build
    participant Docs Deploy
    participant PR Comment
    PR->>GitHub Actions: Open PR to release branch
    GitHub Actions->>Docs Build: Run build-docs job
    Docs Build-->>GitHub Actions: Upload docs artifact
    GitHub Actions->>Docs Deploy: Run deploy job (on release PR)
    Docs Deploy->>Docs Deploy: Download artifact, clean metadata
    Docs Deploy->>Docs Deploy: Deploy docs to preview repo/branch
    Docs Deploy->>PR Comment: Post or update preview link on PR
```
Actionable comments posted: 8
🧹 Nitpick comments (15)
examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml (2)
32-36: Raising `free_gpu_memory_fraction` to 0.75 may starve other allocations
Jumping from 0.40 → 0.75 leaves only 25 % of device memory for weights, activations, communication buffers, and any transient CUDA allocations. On consumer-grade 24 GB cards that's roughly 6 GB, not much headroom for a 7-billion-parameter model plus NCCL/EFA buffers. Please benchmark typical and worst-case sequence lengths under expected batch sizes to ensure you do not trigger CUDA OOM or kernel fallback to host staging.
50-54: Same memory reservation change for generation servers
The generation path often holds the model's full weight copy and larger activation footprints than the context path. Confirm the 75 % reservation still allows the model to load and run without spilling. If load failures appear intermittent, consider separate fractions per server role or exposing this as an env-configurable knob; a sketch follows.
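For illustration, one way per-role fractions could look; the exact key nesting in this example config is an assumption based on the sections referenced above, not the actual file contents:
```yaml
# Hypothetical per-role KV-cache reservation
context_servers:
  kv_cache_config:
    free_gpu_memory_fraction: 0.75   # prefill path: shorter-lived activations
generation_servers:
  kv_cache_config:
    free_gpu_memory_fraction: 0.60   # decode path: extra headroom for long continuations
```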
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (1)
47-51: Generation server memory headroom
Generation paths tend to explode in memory when sampling long continuations. If you observe latency spikes or watchdog resets, consider reverting to 0.5 – 0.6 or making this tunable per deployment.
deploy/sdk/src/dynamo/sdk/cli/utils.py (1)
61-89: Well-structured refactoring with minor cleanup needed.
The `PortReserver` class is a good simplification of the previous port reservation logic. The context manager protocol is correctly implemented with proper error handling and logging.
However, address the inconsistent return statement in the `close_socket` method:
```diff
 def close_socket(self):
     try:
         if self.socket:
             self.socket.close()
     except socket.error as e:
         logger.warning(f"Error while closing socket: {str(e)}")
-        # Don't re-raise the exception as this is cleanup code
-        return True
+        # Don't re-raise the exception as this is cleanup code
```
launch/dynamo-run/src/subprocess.rs (1)
101-105: Good pragmatic solution for stderr logging confusion.
The change from `error` to `warn` level addresses the user confusion caused by libraries outputting non-error messages to stderr. The explanatory comment properly documents the reasoning and acknowledges this as a temporary middle ground.
Consider implementing stderr parsing logic in the future to distinguish between actual errors and informational messages, possibly by analyzing message patterns or severity indicators.
lib/llm/Cargo.toml (1)
80-86: Pinned git-rev improves reproducibility but consider using a version tag
Direct `git` dependencies:
- disable crates-io caching → longer CI times,
- bypass `cargo audit` CVE tracking,
- can be GC-ed by repo owners.
If the commit represents a release, ask upstream to tag it (e.g. `v0.3.0`) and depend with
```toml
nixl-sys = { git = "...", tag = "v0.3.0", optional = true }
```
This keeps the same code while future-proofing lockfile refreshes. Not blocking, but worth tracking.
deploy/inference-gateway/example/resources/http-router.yaml (1)
34-34: Add missing newline at end of file.
YAMLlint detected a missing newline character at the end of the file, which is required by YAML standards.
```diff
   timeouts:
     request: 300s
+
```
deploy/cloud/api-store/Earthfile (1)
47-47: Consider making image push conditional.
The `--push` flag makes this build push by default, which might not be desired during development or testing.
Consider making the push conditional:
```diff
-SAVE IMAGE --push $DOCKER_SERVER/$IMAGE:$IMAGE_TAG
+ARG PUSH=false
+IF [ "$PUSH" = "true" ]
+    SAVE IMAGE --push $DOCKER_SERVER/$IMAGE:$IMAGE_TAG
+ELSE
+    SAVE IMAGE $DOCKER_SERVER/$IMAGE:$IMAGE_TAG
+END
```
deploy/inference-gateway/example/resources/cluster-role.yaml (1)
19-40: Well-structured RBAC permissions for inference gateway.
The permissions are appropriately scoped for the inference gateway's operations:
- Read access to inference resources and pods
- Authentication and authorization checks for security
However, fix the missing newline at the end of the file.
```diff
   verbs:
     - create
+
```
deploy/inference-gateway/example/resources/service.yaml (1)
15-28: Appropriate service configuration for internal communication.
The ClusterIP service with http2 protocol is well-configured for internal cluster communication with the inference backend.
However, fix the missing newline at the end of the file.
```diff
   type: ClusterIP
+
```
deploy/inference-gateway/example/resources/inference-model.yaml (1)
15-26: Missing trailing newline + minor metadata hardening
YAMLLint flags the absence of a newline at EOF. Add one to avoid tooling noise.
Also consider adding `labels` to the CR for easier selection & lifecycle ops:
```diff
 26   name: dynamo-deepseek
+---
+# (newline)
```
container/Dockerfile.vllm (1)
506-507: Blindly symlinking every venv executable can shadow system binaries
`ln -sf $VIRTUAL_ENV/bin/* /usr/local/bin/` will overwrite names like `python`, `pip`, etc., that might already exist in `/usr/local/bin`.
Safer: only link the project-specific entry points or prepend the venv bin dir to `PATH`.
```diff
-    ln -sf $VIRTUAL_ENV/bin/* /usr/local/bin/
+    find "$VIRTUAL_ENV/bin" -maxdepth 1 -type f -executable -name 'dynamo-*' -exec ln -sf {} /usr/local/bin/ \;
```
container/Dockerfile.sglang (1)
137-144: Clone depth optimisation and one-step checkout
Fetching full history then `git fetch && git checkout` is slow and inflates image size.
```diff
-    git clone https://github.com/ishandhanani/sglang.git && \
-    cd sglang && \
-    git fetch && \
-    git checkout dynamo-release-nixl-sort && \
+    git clone --branch dynamo-release-nixl-sort --depth 1 \
+        https://github.com/ishandhanani/sglang.git && \
+    cd sglang && \
```
This keeps the layer minimal and deterministic.
deploy/inference-gateway/example/resources/inference-pool.yaml (1)
28-28: Add missing trailing newline to satisfy linters.
YAML-lint flags the absence of a line-feed at EOF. Add one to avoid needless CI noise.
```diff
-  name: dynamo-deepseek-epp
\ No newline at end of file
+  name: dynamo-deepseek-epp
+
```
deploy/inference-gateway/example/README.md (1)
130-134: Fix invalid double `-o` flags in the `kubectl get svc` command.
`kubectl` accepts only a single `-o` flag; the first one (`yaml`) is silently ignored and may confuse readers.
```diff
-GATEWAY_URL=$(kubectl get svc inference-gateway -o yaml -o jsonpath='{.spec.clusterIP}')
+GATEWAY_URL=$(kubectl get svc inference-gateway -o jsonpath='{.spec.clusterIP}')
```
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- `Cargo.lock` is excluded by `!**/*.lock`
- `lib/bindings/python/Cargo.lock` is excluded by `!**/*.lock`
📒 Files selected for processing (40)
- `.github/workflows/generate-docs.yml` (1 hunks)
- `container/Dockerfile.docs` (1 hunks)
- `container/Dockerfile.sglang` (1 hunks)
- `container/Dockerfile.tensorrt_llm` (1 hunks)
- `container/Dockerfile.vllm` (3 hunks)
- `container/build.sh` (1 hunks)
- `container/deps/requirements.docs.txt` (1 hunks)
- `deploy/cloud/api-store/Earthfile` (1 hunks)
- `deploy/cloud/operator/Earthfile` (1 hunks)
- `deploy/inference-gateway/example/README.md` (1 hunks)
- `deploy/inference-gateway/example/resources/cluster-role-binding.yaml` (1 hunks)
- `deploy/inference-gateway/example/resources/cluster-role.yaml` (1 hunks)
- `deploy/inference-gateway/example/resources/dynamo-epp.yaml` (1 hunks)
- `deploy/inference-gateway/example/resources/http-router.yaml` (1 hunks)
- `deploy/inference-gateway/example/resources/inference-model.yaml` (1 hunks)
- `deploy/inference-gateway/example/resources/inference-pool.yaml` (1 hunks)
- `deploy/inference-gateway/example/resources/service.yaml` (1 hunks)
- `deploy/sdk/src/dynamo/sdk/cli/allocator.py` (1 hunks)
- `deploy/sdk/src/dynamo/sdk/cli/circus.py` (1 hunks)
- `deploy/sdk/src/dynamo/sdk/cli/utils.py` (2 hunks)
- `deploy/sdk/src/dynamo/sdk/lib/resource.py` (1 hunks)
- `docs/architecture/planner.md` (1 hunks)
- `docs/conf.py` (1 hunks)
- `docs/index.rst` (1 hunks)
- `examples/multimodal/components/frontend.py` (2 hunks)
- `examples/multimodal/components/video_frontend.py` (2 hunks)
- `examples/multimodal/configs/agg-llava.yaml` (1 hunks)
- `examples/multimodal/configs/agg-phi3v.yaml` (1 hunks)
- `examples/multimodal/configs/agg-qwen.yaml` (1 hunks)
- `examples/multimodal/configs/agg_video.yaml` (1 hunks)
- `examples/multimodal/configs/disagg.yaml` (1 hunks)
- `examples/multimodal/configs/disagg_video.yaml` (1 hunks)
- `examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml` (2 hunks)
- `examples/tensorrt_llm/configs/llmapi_disagg_router_configs/single_node_config.yaml` (2 hunks)
- `launch/dynamo-run/src/subprocess.rs` (1 hunks)
- `launch/dynamo-run/src/subprocess/trtllm_config/sample.yaml` (1 hunks)
- `lib/llm/Cargo.toml` (1 hunks)
- `lib/llm/src/block_manager/block/transfer/nixl.rs` (1 hunks)
- `tests/serve/conftest.py` (1 hunks)
- `tests/serve/test_dynamo_serve.py` (8 hunks)
🧰 Additional context used
🧠 Learnings (5)
container/Dockerfile.tensorrt_llm (1)
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
lib/llm/src/block_manager/block/transfer/nixl.rs (3)
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1363
File: lib/llm/src/block_manager/block/transfer.rs:206-216
Timestamp: 2025-06-04T18:43:04.566Z
Learning: For NIXL transfers in the KVBM system, the future returned by `nixl::write_blocks_to` is independent of the underlying transfer execution. The transfer begins immediately when `nixl::write_blocks_to` is called, and the returned future is only used for notification/completion tracking. Therefore, it's safe to drop the future when notification is not needed (`notify == false`).
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
docs/index.rst (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
deploy/sdk/src/dynamo/sdk/cli/allocator.py (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
container/Dockerfile.vllm (2)
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Learnt from: grahamking
PR: ai-dynamo/dynamo#1177
File: container/Dockerfile.vllm:102-105
Timestamp: 2025-05-28T22:54:46.875Z
Learning: In Dockerfiles, when appending to environment variables that may not exist in the base image, Docker validation will fail if you reference undefined variables with ${VARIABLE} syntax. In such cases, setting the environment variable directly (e.g., ENV CPATH=/usr/include) rather than appending is the appropriate approach.
🧬 Code Graph Analysis (1)
deploy/sdk/src/dynamo/sdk/cli/circus.py (1)
deploy/sdk/src/dynamo/sdk/cli/utils.py (1)
reserve_free_port(92-100)
🪛 YAMLlint (1.37.1)
deploy/inference-gateway/example/resources/http-router.yaml
[error] 34-34: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/service.yaml
[error] 28-28: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-model.yaml
[error] 26-26: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/cluster-role.yaml
[error] 40-40: no new line character at the end of file
(new-line-at-end-of-file)
deploy/inference-gateway/example/resources/inference-pool.yaml
[error] 28-28: no new line character at the end of file
(new-line-at-end-of-file)
🪛 Checkov (3.2.334)
deploy/inference-gateway/example/resources/dynamo-epp.yaml
[MEDIUM] 15-67: Containers should not run with allowPrivilegeEscalation
(CKV_K8S_20)
[MEDIUM] 15-67: Minimize the admission of root containers
(CKV_K8S_23)
🪛 LanguageTool
deploy/inference-gateway/example/README.md
[grammar] ~9-~9: The singular proper name ‘Kubernetes’ must be used with a third-person or a past tense verb.
Context: ...quests. ## Prerequisites - Kubernetes cluster with kubectl configured - NVIDIA GPU dr...
(HE_VERB_AGR)
🪛 actionlint (1.7.7)
.github/workflows/generate-docs.yml
98-98: shellcheck reported issue in this script: SC2129:style:3:1: Consider using { cmd1; cmd2; } >> file instead of individual redirects
(shellcheck)
98-98: shellcheck reported issue in this script: SC2086:info:3:73: Double quote to prevent globbing and word splitting
(shellcheck)
98-98: shellcheck reported issue in this script: SC2086:info:4:40: Double quote to prevent globbing and word splitting
(shellcheck)
98-98: shellcheck reported issue in this script: SC2086:info:5:34: Double quote to prevent globbing and word splitting
(shellcheck)
106-106: the runner of "peaceiris/actions-gh-pages@v3" action is too old to run on GitHub Actions. update the action's version to fix this issue
(action)
🪛 Pylint (3.3.7)
deploy/sdk/src/dynamo/sdk/cli/utils.py
[refactor] 81-81: Either all return statements in a function should return an expression, or none of them should.
(R1710)
🔇 Additional comments (36)
examples/tensorrt_llm/configs/llmapi_disagg_configs/single_node_config.yaml (1)
32-36: High KV-cache reservation: double-check aggregate memory plan
Mirroring the router config, allocating 75 % for KV-cache significantly tightens the global pool. Validate with `nvidia-smi --query-compute-apps=used_memory` under a soak test; adjust if fragmentation/OOMs occur.
deploy/sdk/src/dynamo/sdk/cli/utils.py (1)
92-100: Excellent simplification of the port reservation logic.
The refactored function is much cleaner and maintains the same interface while delegating the complexity to the `PortReserver` class. This improves maintainability and readability.
examples/multimodal/configs/disagg.yaml (1)
23-24: LGTM! Frontend configuration section added consistently.
The new Frontend section properly references the model configuration from the Common section, maintaining consistency with the established pattern across multimodal configuration files.
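For context, a rough sketch of the pattern being standardized here, as described in these comments; the model id and any keys beyond `common-configs: [model]` are placeholders rather than the actual file contents:
```yaml
Common:
  model: llava-hf/llava-1.5-7b-hf   # placeholder model id for illustration
Frontend:
  common-configs: [model]           # Frontend reuses the model setting from Common
```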
examples/multimodal/configs/agg-llava.yaml (1)
20-21: Frontend configuration properly structured.
The Frontend section correctly references the model configuration from the Common section, maintaining consistency with the multimodal configuration pattern.
examples/multimodal/configs/agg-qwen.yaml (1)
20-21: Consistent frontend configuration addition.
The Frontend section follows the established pattern and correctly references the model configuration from the Common section.
examples/multimodal/configs/disagg_video.yaml (1)
28-29: Excellent consistency across multimodal configurations.
The Frontend section maintains the established pattern across all multimodal configuration files, ensuring consistent behavior and configuration structure.
tests/serve/conftest.py (3)
22-25: LGTM: Well-defined model list for test fixtures.
The model list is clearly defined and appropriately scoped for serve tests. The models chosen (DeepSeek and LLaVA) appear to align with the multimodal testing requirements mentioned in the AI summary.
30-68: Excellent error handling and graceful degradation.
The fixture implementation demonstrates several best practices:
- Graceful handling of missing HF_TOKEN with informative warnings
- Proper exception handling that doesn't fail the fixture
- Clear logging for debugging and monitoring
- Handles missing `huggingface_hub` dependency appropriately
The approach of not failing the fixture when downloads fail is correct, as it allows individual tests to handle missing models as needed.
72-85: Smart automatic fixture injection with proper filtering.
The pytest hook implementation is well-designed:
- Correctly handles test items without fixturenames attribute
- Properly filters for serve directory tests
- Respects existing fixture usage to avoid duplicates
- Honors the `skip_model_download` marker for opt-out scenarios
This automation reduces test maintenance overhead while providing flexibility.
examples/multimodal/configs/agg-phi3v.yaml (1)
21-22: LGTM: Consistent frontend configuration standardization.
The addition of the Frontend section with `common-configs: [model]` aligns with the broader pattern of standardizing frontend configuration across multimodal examples. This supports the model validation logic implemented in the frontend components.
deploy/sdk/src/dynamo/sdk/cli/allocator.py (1)
236-236: Verify impact of removing GPU temperature from logs.
The removal of GPU temperature logging aligns with the broader GPU temperature tracking removal mentioned in the AI summary. However, temperature information can be valuable for debugging thermal throttling issues.
Consider whether this information should be available through alternative means (e.g., separate monitoring tools) or if this removal might impact troubleshooting capabilities.
examples/multimodal/configs/agg_video.yaml (1)
27-28: LGTM: Consistent frontend configuration pattern.
The Frontend section addition follows the same pattern as other multimodal configuration files, ensuring consistency across the examples. This standardization supports the frontend model validation functionality.
examples/multimodal/components/frontend.py (3)
21-21: LGTM: Appropriate imports for new functionality.
The additional imports for `JSONResponse` and `parse_vllm_args` are correctly added to support the model validation functionality and proper HTTP error responses.
Also applies to: 23-23
42-44: LGTM: Clean initialization of engine arguments.
The `__init__` method properly initializes the `engine_args` attribute using the class name, which will be used for model validation. The implementation is straightforward and follows the expected pattern.
48-52: LGTM: Proper model validation with appropriate HTTP response.
The model validation logic correctly:
- Compares the requested model against the configured model
- Returns a proper 404 JSON error response for model mismatches
- Uses appropriate HTTP status code and error message format
- Prevents processing invalid model requests early in the flow
This enhances the robustness of the frontend by validating requests before processing.
examples/multimodal/components/video_frontend.py (3)
21-23: LGTM!
The imports are correctly added to support model validation functionality.
42-44: LGTM!
The initialization correctly parses VLLM arguments to obtain the configured model, following the same pattern as the regular frontend implementation.
48-52: LGTM!
The model validation correctly ensures that clients can only request the configured model, returning an appropriate 404 error for mismatched models. This prevents potential security issues and incorrect model usage.
deploy/sdk/src/dynamo/sdk/lib/resource.py (1)
167-273: LGTM!
The removal of GPU temperature tracking is complete and consistent. The updated comment accurately reflects that only utilization and memory statistics are now tracked.
tests/serve/test_dynamo_serve.py (2)
70-77: Why was the seed parameter commented out?
Removing the seed parameter could make test outputs less deterministic. While the low temperature (0.1) helps with consistency, having a fixed seed would ensure fully reproducible test results.
Was this change intentional? If so, please add a comment explaining why seed-based determinism is no longer needed for these tests.
162-162: LGTM!
The config file change to `agg-llava.yaml` correctly aligns with the llava model being tested in the multimodal payload.
deploy/cloud/operator/Earthfile (1)
39-45: Switching to nvcr.io distroless image may break unattended builds
`nvcr.io` images usually sit behind NGC authentication & EULA acceptance.
If CI runners (or downstream users) don't have an NGC API key configured, `docker build` will fail at the very first layer pull.
- Confirm that every build environment (GH-Actions, on-prem, air-gapped) has `DOCKER_CONFIG`/`~/.docker/config.json` populated with NGC credentials.
- Alternatively publish a public mirror (e.g. `ghcr.io/ai-dynamo/distroless-go:v3.1.9-dev`) or keep the previous `gcr.io/distroless/static-debian11` fallback via a `--build-arg BASE_IMAGE`.
No code change needed if infra is ready, but without it the new workflow will be red.
docs/conf.py (1)
70-75: Correct module name – LGTM
Changing to `"sphinx_prompt"` matches the package's entry-point, fixing the build error.
Make sure the new name exists in `requirements.docs.txt`.
Make sure the new name exists inrequirements.docs.txt.docs/architecture/planner.md (1)
68-72: Verify that the new relative link resolves
`../guides/planner_benchmark/README.md` looks plausible, but broken links silently pass in Markdown-only checks.
Please run `linkcheck` or open the generated docs to confirm the file exists under that path.
22-25: Behavioural flag added – confirm engine supports it
`disable_overlap_scheduler: true` is harmless if recognised, but TRT-LLM before 0.9.0 ignores unknown keys and silently starts with defaults, defeating the intent.
Double-check the exact engine version shipped in the container image; if < 0.9, gate the field behind a version check or document the minimum required version.
lib/llm/src/block_manager/block/transfer/nixl.rs (1)
120-121: Verify boolean flag for `XferDescList::new`
We only found two calls to `XferDescList::new`, both hard-coded with `true`, so be sure this always matches your transfer semantics (e.g., enabling async notification). If some transfers should disable notifications, consider making this flag configurable or document the API change.
• lib/llm/src/block_manager/block/transfer/nixl.rs:120
let mut src_dl = XferDescList::new(src_mem_type, true)?;
• lib/llm/src/block_manager/block/transfer/nixl.rs:121
let mut dst_dl = XferDescList::new(dst_mem_type, true)?;
• No other `XferDescList::new` usages found.
Please confirm with the NIXL API docs or commit notes that `true` is the correct default here.
container/build.sh (1)
112-112: NIXL library update coordination looks good.
The NIXL commit hash update aligns with the API changes in the Rust code, ensuring consistency across the build environment and source code.
container/Dockerfile.tensorrt_llm (1)
95-95: No references to `/usr/local/cuda/compat/lib.real` found; please verify CUDA runtime
I ran a repo-wide search and didn't find any remaining references to the removed compatibility path. However, deleting that entry from `LD_LIBRARY_PATH` can still lead to runtime failures if the base image or any CUDA binaries expect those libraries.
• Confirm by building the `tensorrt_llm` container and running a representative CUDA workload (e.g., `nvidia-smi`, a simple CUDA sample, or your application tests).
• If any "library not found" errors occur, consider re-adding the compat path or ensuring the needed `.so` files are present under `/usr/lib`.
• Update documentation or CI to include a smoke-test for CUDA functionality.
deploy/inference-gateway/example/resources/http-router.yaml (1)
29-32: Consider the broad path matching pattern.
The current configuration routes all traffic with path prefix `/` to the backend. This is appropriate for a dedicated inference service but ensure this aligns with your intended routing strategy; a narrower variant is sketched below.
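If scoping ever becomes necessary, the HTTPRoute match can be limited to specific prefixes; a hypothetical variant (the `/v1` prefix is illustrative, not taken from the example manifests):
```yaml
# Hypothetical narrower match: only route API-style paths to the inference backend
rules:
  - matches:
      - path:
          type: PathPrefix
          value: /v1
```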
docs/index.rst (2)
110-110: Documentation enhancement approved.
Adding the GKE Setup Guide enhances the deployment documentation coverage and aligns with the new Kubernetes manifests added in this PR.
118-118: Link correction looks good.
Updating the Planner Benchmark link to point to `README.md` ensures accurate documentation references.
deploy/cloud/api-store/Earthfile (2)
31-35: LGTM! Efficient build optimization.
The approach of installing packages in the `uv-base` stage and copying the entire environment is more efficient than rebuilding in the final image.
41-45: Good security improvement with distroless image.
Using the NVIDIA distroless Python image reduces the attack surface by eliminating unnecessary packages and shell access.
container/Dockerfile.docs (1)
31-34: `uv build` mount relies on BuildKit – confirm CI runners
`RUN --mount=type=bind,…` requires Docker BuildKit; legacy builders will ignore the flag and fail.
Double-check that all CI/CD and local developer environments enable BuildKit (`DOCKER_BUILDKIT=1`).
If not guaranteed, replace with a plain `COPY requirements.docs.txt` followed by `RUN uv pip install -r …`.
container/Dockerfile.vllm (1)
123-129: `uv build` may fail – ensure output directory exists
`uv build . --out-dir /workspace/wheels/nixl` assumes the parent path exists; on a fresh layer it doesn't.
```diff
-RUN cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
+RUN mkdir -p /workspace/wheels/nixl && \
+    cd /opt/nixl && uv build . --out-dir /workspace/wheels/nixl
```
Pre-creating avoids sporadic "No such file or directory" errors during parallel builds.
Likely an incorrect or invalid review comment.
.github/workflows/generate-docs.yml (1)
66-68: `deploy` job never runs on direct pushes to `release/*` branches. Intentional?
`build-docs` still runs for `push` events, but `deploy` is gated to PRs only.
If you also want docs deployed after merging into a release branch you need an additional condition, e.g.:
```yaml
if: |
  (github.event_name == 'pull_request' && startsWith(github.base_ref, 'release')) ||
  (github.event_name == 'push' && startsWith(github.ref, 'refs/heads/release/'))
```
Otherwise the published docs will stop updating once the PR is merged.
Excerpt from deploy/inference-gateway/example/resources/dynamo-epp.yaml under review:
```yaml
      # Conservatively, this timeout should mirror the longest grace period of the pods within the pool
      terminationGracePeriodSeconds: 130
      containers:
        - name: epp
          image: us-central1-docker.pkg.dev/k8s-staging-images/gateway-api-inference-extension/epp:main
          imagePullPolicy: Always
          args:
            - -poolName
            - "dynamo-deepseek"
            - "-poolNamespace"
            - "default"
            - -v
            - "4"
            - --zap-encoder
            - "json"
            - -grpcPort
            - "9002"
            - -grpcHealthPort
            - "9003"
          ports:
            - containerPort: 9002
            - containerPort: 9003
            - name: metrics
              containerPort: 9090
          livenessProbe:
            grpc:
```
Harden container security context
Static analysis flags root / privilege escalation. Add a securityContext to follow best-practice:
```diff
       containers:
       - name: epp
+        securityContext:
+          runAsNonRoot: true
+          allowPrivilegeEscalation: false
+          capabilities:
+            drop: [ "ALL" ]
```
If the image truly requires root, document why and silence the linter with an explicit justification.
🤖 Prompt for AI Agents
In deploy/inference-gateway/example/resources/dynamo-epp.yaml around lines 32 to
57, the container runs as root or allows privilege escalation, which is flagged
by static analysis. Add a securityContext section under the container spec to
run the container as a non-root user and disable privilege escalation. If
running as root is necessary, add a comment explaining why and explicitly set
the securityContext to acknowledge this and silence the linter.
#1542
This PR adds deployment automation to deploy documentation to a private repository when pushing to release branches.
Walkthrough
A new `deploy` job was added to the GitHub Actions workflow for documentation generation. This job depends on the `build-docs` job, runs only on `release` branches, and deploys generated documentation to an external repository using a GitHub Pages action, with deployment details dynamically set from the branch and commit.
Changes
- `deploy` job that downloads the docs artifact, extracts the branch name, and deploys documentation to an external repo using GitHub Pages.
Sequence Diagram(s)
```mermaid
sequenceDiagram
    participant GitHub Actions
    participant build-docs Job
    participant deploy Job
    participant ai-dynamo/dynamo.github.io Repo
    GitHub Actions->>build-docs Job: Trigger on push to release branch
    build-docs Job-->>GitHub Actions: Generate and upload docs artifact
    GitHub Actions->>deploy Job: Start if build-docs succeeds
    deploy Job->>deploy Job: Download docs artifact
    deploy Job->>deploy Job: Extract branch name from GITHUB_REF
    deploy Job->>ai-dynamo/dynamo.github.io Repo: Deploy docs via gh-pages action
```
https://****.pages.github.io/release/0.3.1/
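For orientation, a minimal sketch of what such a `deploy` job could look like, assuming the `peaceiris/actions-gh-pages` action flagged by actionlint above and the `ai-dynamo/dynamo.github.io` target repository; the step names, artifact name, and secret name are assumptions, not the actual workflow contents:
```yaml
  deploy:
    needs: build-docs
    if: startsWith(github.ref, 'refs/heads/release')
    runs-on: ubuntu-latest
    steps:
      - name: Download docs artifact
        uses: actions/download-artifact@v4
        with:
          name: docs                    # hypothetical artifact name
          path: ./site
      - name: Extract branch name
        run: echo "BRANCH=${GITHUB_REF#refs/heads/}" >> "$GITHUB_ENV"
      - name: Publish to external repo
        uses: peaceiris/actions-gh-pages@v4    # actionlint suggests moving past v3
        with:
          deploy_key: ${{ secrets.DOCS_DEPLOY_KEY }}   # hypothetical secret name
          external_repository: ai-dynamo/dynamo.github.io
          publish_dir: ./site
          destination_dir: ${{ env.BRANCH }}
```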
The current condition if: startsWith(github.ref, 'refs/heads/release') only runs when pushing directly to a release branch, not on PRs targeting release branches.
For PRs, we need to check the target branch (github.base_ref) rather than the source branch (github.ref).
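A combined condition along these lines (a sketch; confirm the event names against the workflow's `on:` triggers) would cover both direct pushes and PRs targeting release branches:
```yaml
# Run deploy for pushes to release branches and for PRs that target them
if: |
  (github.event_name == 'push' && startsWith(github.ref, 'refs/heads/release')) ||
  (github.event_name == 'pull_request' && startsWith(github.base_ref, 'release'))
```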
Summary by CodeRabbit
- New Features
- Bug Fixes
- Refactor
- Style
- Tests
- Chores