Conversation

@DFatadeNVIDIA commented Sep 10, 2025

Overview:

This document covers the process of deploying Dynamo Cloud and running inference in a vLLM distributed runtime within a Kubernetes environment. The Dynamo Cloud Platform provides a managed deployment experience.

Details:

  • Contains the infrastructure components required for the Dynamo Cloud platform.
  • Leverages the Dynamo Operator and its exposed CRDs to deploy Dynamo inference graphs.

This overview covers the setup process on a Minikube instance, including:

  • Deploying the Dynamo Operator and creating Dynamo CRDs.
  • Deploying an inference graph built with the vLLM Dynamo Runtime.
  • Setting up ingress and running inference.

Where should the reviewer start?

  • Start with examples/deployments/minikube/README.md for instructions on setting up the Dynamo Cloud platform and deploying a vLLM inference graph.

Summary by CodeRabbit

  • New Features
    • Backend endpoints now respect the DYN_NAMESPACE environment variable, enabling namespace-configurable deployments.
  • Documentation
    • Added a comprehensive Minikube/Kubernetes deployment guide, including prerequisites, setup, secrets configuration, Helm-based installation, exposing services via Ingress, verification, and cleanup steps.
  • Chores
    • Updated development container image tag to a local vLLM-focused variant for improved dev environment parity.

copy-pr-bot bot commented Sep 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

DFatadeNVIDIA and others added 4 commits September 10, 2025 10:01
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
@DFatadeNVIDIA force-pushed the dafatade/dynamo_cloud_minikube_example branch from 3eac656 to 4d3c838 on September 10, 2025 15:02
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
@DFatadeNVIDIA DFatadeNVIDIA marked this pull request as ready for review September 10, 2025 16:16
coderabbitai bot commented Sep 10, 2025

Walkthrough

Updates devcontainer image tag, parameterizes TRT-LLM default endpoints by an environment-driven namespace, and adds a Minikube deployment README for Dynamo Cloud with vLLM.

Changes

Cohort / File(s) Summary of edits
Devcontainer config
\.devcontainer/devcontainer.json
Changed image from "dynamo:latest-vllm-dev" to "dynamo:latest-vllm-local-dev".
TRT-LLM utils namespace endpoints
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py
Added os import and DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo"). Updated DEFAULT_ENDPOINT, DEFAULT_NEXT_ENDPOINT, and DEFAULT_ENCODE_ENDPOINT to use f"dyn://{DYN_NAMESPACE}...." instead of hard-coded "dyn://dynamo....".
Minikube deployment docs
examples/deployments/minikube/README.md
Added new README detailing end-to-end Minikube/Kubernetes deployment steps for Dynamo Cloud with vLLM, including prerequisites, secrets, Helm installs, graph deployment, ingress, and cleanup.
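The namespace change summarized above can be sketched as a minimal standalone snippet (names taken from the walkthrough; this is an illustration, not the full trtllm_utils.py module):

```python
import os

# Namespace for default endpoints; falls back to "dynamo" when DYN_NAMESPACE is unset.
DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")

# Defaults derived from the environment-driven namespace instead of a hard-coded one.
DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm.generate"
DEFAULT_NEXT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_next.generate"
DEFAULT_ENCODE_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_encode.generate"

print(DEFAULT_ENDPOINT)
```

With DYN_NAMESPACE unset this resolves to the dyn://dynamo.… defaults; exporting DYN_NAMESPACE before the process starts switches every default to the custom prefix.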

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant App as Client App
  participant Env as Environment
  participant Utils as trtllm_utils.py
  participant Svc as TRT-LLM Services

  User->>App: Trigger generation
  App->>Env: Read DYN_NAMESPACE
  Env-->>App: "dynamo" (default) or custom
  App->>Utils: Request default endpoints
  Utils-->>App: dyn://{DYN_NAMESPACE}.tensorrt_llm[...]
  App->>Svc: Call generate/next/encode via dyn://{ns}...
  Svc-->>App: Response

  note over Utils,Svc: Endpoints now derived from DYN_NAMESPACE (env)
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)
  • Title Check — ⚠️ Warning: The current title focuses solely on the documentation aspect, omitting the devcontainer image tag update and the introduction of DYN_NAMESPACE in the TRTLLM utilities, so it does not fully reflect all significant changes in this pull request. Resolution: revise the title to concisely summarize all key changes, for example including the Minikube example addition, the devcontainer image tag update, and the DYN_NAMESPACE support in the TRTLLM utilities.
  • Description Check — ⚠️ Warning: The description covers the documentation changes and reviewer guidance but omits mention of the devcontainer configuration update and the new environment-driven endpoints in the TRTLLM utilities, and it also lacks the required Related Issues section from the repository template. Resolution: update the description to include details about the .devcontainer image tag change and the DYN_NAMESPACE implementation in trtllm_utils.py, and add a Related Issues section according to the repository template.

✅ Passed checks (1 passed)
  • Docstring Coverage — ✅ Passed: No functions found in the changes. Docstring coverage check skipped.

Poem

In a cluster of clouds where pods softly hum,
I twitch my ears—namespaces drum.
devcontainer brews a local blend,
Endpoints follow the names we send.
Minikube meadows, ingress skies—
With vLLM winds, the tokens rise.
Hop, deploy, request—surprise!

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

```yaml
reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).
```


coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (1)

112-121: Enforce or extend parse_endpoint to handle dotted namespaces

  • Currently parse_endpoint splits on '.', requiring exactly three parts; any dots within the namespace (e.g. ns.foo.bar.comp) will trigger a misleading ValueError.
  • Either validate upstream that namespaces contain no dots or update parse_endpoint to treat all but the last two segments as the namespace.
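A sketch of the second option (a hypothetical parse_endpoint; the real helper's signature and error text may differ) that treats all but the last two dot-separated segments as the namespace:

```python
def parse_endpoint(endpoint: str) -> tuple[str, str, str]:
    """Split a dyn:// endpoint into (namespace, component, endpoint name).

    All but the last two dot-separated segments form the namespace,
    so dotted namespaces such as "ns.foo.bar" parse cleanly.
    """
    path = endpoint.removeprefix("dyn://")
    parts = path.split(".")
    if len(parts) < 3:
        raise ValueError(
            f"Invalid endpoint '{endpoint}': expected "
            "dyn://<namespace>.<component>.<endpoint>"
        )
    return ".".join(parts[:-2]), parts[-2], parts[-1]

# A dotted namespace no longer triggers a misleading error:
print(parse_endpoint("dyn://ns.foo.bar.tensorrt_llm.generate"))
```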
🧹 Nitpick comments (7)
examples/deployments/minikube/README.md (7)

11-13: Tighten wording (“built with”).

“Deploying an inference graph built in vLLM Dynamo Runtime” → “built with the vLLM Dynamo Runtime.”

```diff
- - Deploying an inference graph built in vLLM Dynamo Runtime
+ - Deploying an inference graph built with the vLLM Dynamo Runtime
```

66-79: Tighten device plugin guidance and prefer GPU Operator for robustness.

Recommend either:

  • Enable NVIDIA GPU Operator (preferred) OR
  • Manually deploy device plugin, but link to its supported CUDA/driver matrix.

Also show a selector-based check (pods and DaemonSet) to reduce confusion.

```diff
- kubectl get pods -n kube-system
+ kubectl get daemonset -n kube-system nvidia-device-plugin-daemonset
+ kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
```

86-98: Typo: “addon”, not “add on”.

Minor wording tweak on verifying ingress readiness.

```diff
- # enable ingress add on
+ # enable ingress addon
```

270-286: Add --wait --atomic to platform install for consistency.

Ensures the controller, NATS, and etcd are ready before proceeding.

```diff
-helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
+helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
   --set "dynamo-operator.imagePullSecrets[0].name=nvcrimagepullsecret" \
-  --namespace ${NAMESPACE}
+  --namespace ${NAMESPACE} \
+  --wait \
+  --atomic
```

311-318: Clarify which fields to edit in the example manifest.

Spell the path precisely and show a concrete snippet for extraPodSpec.mainContainer.image and dynamoNamespace to reduce mis-edits.

```diff
- update the `extraPodSpec.mainContainer.image` path ... configure the `dynamoNamespace` field
+ update:
+ - `spec.VllmDecodeWorker.extraPodSpec.mainContainer.image: ${DYNAMO_IMAGE}`
+ - `spec.dynamoNamespace: ${NAMESPACE}`
```

350-371: Ingress: set an explicit backend path and tighten YAML.

Good as-is; optionally add ingressClassName, and confirm service port matches container. Also consider adding nginx.ingress.kubernetes.io/proxy-body-size: "0" if large payloads are expected.

```diff
 metadata:
   name: vllm-agg-router-ingress
   namespace: $NAMESPACE
+  annotations:
+    nginx.ingress.kubernetes.io/proxy-body-size: "0"
 spec:
   ingressClassName: nginx
```

380-383: Hosts entry: small grammar fix.

“along with it's address” → “along with its address”.

```diff
-Once the ingress resource has been created, make sure to add the entry along with it's address
+Once the ingress resource has been created, add the entry along with its address
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 905c920 and e1b5d25.

📒 Files selected for processing (3)
  • .devcontainer/devcontainer.json (1 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (2 hunks)
  • examples/deployments/minikube/README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .devcontainer/devcontainer.json
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .devcontainer/devcontainer.json
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • .devcontainer/devcontainer.json
📚 Learning: 2025-09-03T01:10:12.599Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2822
File: container/Dockerfile.vllm:343-352
Timestamp: 2025-09-03T01:10:12.599Z
Learning: In the dynamo project's local-dev Docker targets, USER_UID and USER_GID build args are intentionally left without default values to force explicit UID/GID mapping during build time, preventing file permission issues in local development environments where container users need to match host user permissions for mounted volumes.

Applied to files:

  • .devcontainer/devcontainer.json
🔇 Additional comments (3)
.devcontainer/devcontainer.json (1)

10-10: ```shell
#!/bin/bash
echo "Searching for exact dev image tag across the repository"
rg -n 'latest-vllm-local-dev' -C2 .

echo -e "\nSearching for any 'dynamo:' image references"
rg -n 'dynamo:' -C2 .

echo -e "\nSearching for VLLM mentions to locate related build targets"
rg -n 'vllm' -C2 .

echo -e "\nChecking Earthfile aliases and targets for image builds"
rg -n 'alias' -C2 Earthfile*

echo -e "\nLooking for any Dockerfile or Containerfile in .devcontainer"
find .devcontainer -type f \( -iname 'dockerfile*' -o -iname 'containerfile*' \)
```

examples/deployments/minikube/README.md (2)

`54-60`: **Caution: unmounting /proc/driver/nvidia can be risky.**

Provide context for when this is necessary and a rollback step. Many setups work with the GPU Operator or device plugin without manual unmounting.

Consider adding a guarded path:

```bash
# Only if kubelet/device-plugin complains about preinstalled drivers conflicting
ssh -o "StrictHostKeyChecking no" -i "$(minikube ssh-key)" docker@"$(minikube ip)" \
  "mount | grep -q '/proc/driver/nvidia' && sudo umount -R /proc/driver/nvidia || true"
```
421-426: Add note recommending manual CRD cleanup
Helm won’t delete cluster-scoped CRDs on uninstall; after helm uninstall dynamo-crds, run:

```bash
kubectl get crd | grep dynamo
kubectl delete crd $(kubectl get crd | grep dynamo | awk '{print $1}')
```

to remove any leftover CRDs.

Comment on lines +17 to +23
```diff
+DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")
+
 # Default endpoint for the next worker.
-DEFAULT_ENDPOINT = "dyn://dynamo.tensorrt_llm.generate"
+DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm.generate"
 DEFAULT_MODEL_PATH = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-DEFAULT_NEXT_ENDPOINT = "dyn://dynamo.tensorrt_llm_next.generate"
-DEFAULT_ENCODE_ENDPOINT = "dyn://dynamo.tensorrt_llm_encode.generate"
+DEFAULT_NEXT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_next.generate"
+DEFAULT_ENCODE_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_encode.generate"
```
🛠️ Refactor suggestion

Harden env handling and namespace validity for endpoints.

If DYN_NAMESPACE is set but empty, defaults won’t kick in, producing invalid endpoints like dyn://.tensorrt_llm.generate. Also prevent dots/whitespace in the namespace, since parse_endpoint expects exactly three dot-separated parts.

Apply:

```diff
- DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")
+ # Allow override but fall back when empty; restrict to safe characters
+ DYN_NAMESPACE = (os.environ.get("DYN_NAMESPACE") or "dynamo").strip()
+ if not DYN_NAMESPACE or any(c in DYN_NAMESPACE for c in " .\t\n"):
+     raise ValueError(
+         f"Invalid DYN_NAMESPACE='{DYN_NAMESPACE}'. "
+         "Use a DNS-like token without spaces or dots."
+     )
```

Optional: compute defaults at config parse time to reflect the env at process start, and make unit-testing easier:

```diff
- DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm.generate"
- DEFAULT_NEXT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_next.generate"
- DEFAULT_ENCODE_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_encode.generate"
+ def _default_endpoints(ns: str):
+     return (
+         f"dyn://{ns}.tensorrt_llm.generate",
+         f"dyn://{ns}.tensorrt_llm_next.generate",
+         f"dyn://{ns}.tensorrt_llm_encode.generate",
+     )
+ DEFAULT_ENDPOINT, DEFAULT_NEXT_ENDPOINT, DEFAULT_ENCODE_ENDPOINT = _default_endpoints(DYN_NAMESPACE)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
# Allow override but fall back when empty; restrict to safe characters
DYN_NAMESPACE = (os.environ.get("DYN_NAMESPACE") or "dynamo").strip()
if not DYN_NAMESPACE or any(c in DYN_NAMESPACE for c in " .\t\n"):
    raise ValueError(
        f"Invalid DYN_NAMESPACE='{DYN_NAMESPACE}'. "
        "Use a DNS-like token without spaces or dots."
    )

# Default endpoint for the next worker.
DEFAULT_MODEL_PATH = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

def _default_endpoints(ns: str):
    return (
        f"dyn://{ns}.tensorrt_llm.generate",
        f"dyn://{ns}.tensorrt_llm_next.generate",
        f"dyn://{ns}.tensorrt_llm_encode.generate",
    )

DEFAULT_ENDPOINT, DEFAULT_NEXT_ENDPOINT, DEFAULT_ENCODE_ENDPOINT = _default_endpoints(DYN_NAMESPACE)
```

Comment on lines +5 to +8
- Contains the infrastructure components required for the Dynamo cloud platform
- Leverage the Dynamo Operator and it's exposed CRD's to deploy Dynamo inference graphs
- Provides a managed deployment experience

⚠️ Potential issue

Fix grammar: “its” (possessive) and plural “CRDs”.

  • “Leverage the Dynamo Operator and it's exposed CRD's” → “Leverage the Dynamo Operator and its exposed CRDs”.
  • Consider removing the redundant “Provides a managed deployment experience.” bullet or merge it into the intro sentence.
```diff
- - Leverage the Dynamo Operator and it's exposed CRD's to deploy Dynamo inference graphs
- - Provides a managed deployment experience
+ - Leverage the Dynamo Operator and its exposed CRDs to deploy Dynamo inference graphs
```

Comment on lines +104 to +122
Dynamo Cloud requires Istio for service mesh capabilities. You can set up Istio via the minikube addons feature. Install Istio and verify pods are running:

```bash
# Enable required addons
minikube addons enable istio-provisioner
minikube addons enable istio

# verify pods are running
kubectl get pods -n istio-operator
kubectl get pods -n istio-system

# Output should be similar
NAME READY STATUS RESTARTS AGE
istio-operator-b88fb5f65-9tj8d 1/1 Running 0 34s

NAME READY STATUS RESTARTS AGE
istio-ingressgateway-64887df48f-98l2n 1/1 Running 0 19s
istiod-65c5bcc875-ktcnc 1/1 Running 0 26s
```
🛠️ Refactor suggestion

Istio: add namespace label for sidecar injection.

Most Istio installs require istio-injection=enabled on the target namespace so Dynamo pods get sidecars.

```diff
 # Enable required addons
 minikube addons enable istio-provisioner
 minikube addons enable istio
+
+# Enable automatic sidecar injection in the Dynamo namespace
+kubectl label namespace ${NAMESPACE} istio-injection=enabled --overwrite
```

Comment on lines +151 to +167
### Leveraging Dynamo Container Runtimes In Dynamo Cloud

Dynamo, is a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Dynamo specializes in taking a given runtime (TRT-LLM, vLLM, SGLang, etc) and creating a highly scalable distributed runtime. It's important to verify which Dynamo runtime you'll want to leverage in your deployments, as each runtime will have slightly different implementations for optimizing a given workload with Dynamo.

To simplify this tutorial, we'll leverage prebuilt Dynamo containers targeting a vLLM runtime - these containers are available today as published artifacts in NGC catalog. Dynamo also supports buidling container runtimes from source and uploading them to a private registry. For tutorials on building images from source, please reference this [documentation](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/vllm#pull-or-build-container). We'll make sure the container image we use is tied to the same release version as the Dynamo Cloud helm charts that will be in the next section:

```bash
# set release version
export RELEASE_VERSION=0.4.1

# configure dynamo image and corresponding tag
export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:${RELEASE_VERSION}

# print dynamo image
echo ${DYNAMO_IMAGE}
```

⚠️ Potential issue

Typos and clarity in runtime section.

  • “Dynamo, is” → “Dynamo is”
  • “buidling” → “building”
  • Clarify that the image tag should match the chart release.
```diff
- Dynamo, is a high-throughput,
+ Dynamo is a high-throughput,
- Dynamo also supports buidling container runtimes
+ Dynamo also supports building container runtimes
```

Comment on lines +221 to +234
```bash
# Fetch the CRDs helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-0.4.0.tgz

# Fetch the platform helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

# verify charts were fetched
ls -l *.tgz

# output should be similar
-rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-0.4.0.tgz
-rw-r--r-- 1 ubuntu ubuntu 89682 Jul 31 20:13 dynamo-platform-0.4.1.tgz
```
⚠️ Potential issue

Version mismatch: CRDs fetched at 0.4.0 but RELEASE_VERSION=0.4.1 is used later.

Either fetch CRDs at ${RELEASE_VERSION} or introduce CRDS_VERSION. The current commands will fail at install time.

```diff
-# Fetch the CRDs helm chart
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-0.4.0.tgz
+# Fetch the CRDs helm chart
+# If CRDs lag platform release, pin separately
+export CRDS_VERSION=0.4.0
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${CRDS_VERSION}.tgz
@@
-# verify charts were fetched
+# verify charts were fetched
 ls -l *.tgz
@@
--rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-0.4.0.tgz
+-rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-${CRDS_VERSION}.tgz
 -rw-r--r-- 1 ubuntu ubuntu 89682 Jul 31 20:13 dynamo-platform-0.4.1.tgz
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```bash
# Fetch the CRDs helm chart
# If CRDs lag platform release, pin separately
export CRDS_VERSION=0.4.0
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${CRDS_VERSION}.tgz

# Fetch the platform helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

# verify charts were fetched
ls -l *.tgz

# output should be similar
-rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-${CRDS_VERSION}.tgz
-rw-r--r-- 1 ubuntu ubuntu 89682 Jul 31 20:13 dynamo-platform-0.4.1.tgz
```
🤖 Prompt for AI Agents
In examples/deployments/minikube/README.md around lines 221 to 234, the CRDs
helm chart is hardcoded to 0.4.0 while the platform chart uses
${RELEASE_VERSION} (0.4.1), causing a version mismatch; update the CRD fetch to
use the same ${RELEASE_VERSION} variable (helm fetch
.../dynamo-crds-${RELEASE_VERSION}.tgz) or introduce and document a separate
CRDS_VERSION variable and use it consistently for the CRD fetch so both charts
align at install time.
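Beyond aligning the variables, this class of drift can also be caught mechanically: assert that every chart a version variable points at actually exists on disk before any `helm install` runs. The sketch below is illustrative and not part of the README; the guard function name is invented, and the variable names simply follow the `CRDS_VERSION` / `RELEASE_VERSION` convention suggested above.

```shell
# Sketch of a pre-install guard: fail fast when a chart referenced by a
# version variable was never fetched.
CRDS_VERSION=0.4.0
RELEASE_VERSION=0.4.1

require_chart() {
  # Return non-zero (instead of exiting) so callers decide how to react.
  if [ ! -f "$1" ]; then
    echo "missing chart: $1 (run 'helm fetch' first)" >&2
    return 1
  fi
  echo "found chart: $1"
}

# Simulate one fetched chart so the sketch demonstrates both outcomes:
touch "dynamo-crds-${CRDS_VERSION}.tgz"

require_chart "dynamo-crds-${CRDS_VERSION}.tgz"
require_chart "dynamo-platform-${RELEASE_VERSION}.tgz" || echo "fetch the platform chart before installing"
```

With a guard like this in front of the install steps, `helm install dynamo-crds dynamo-crds-${CRDS_VERSION}.tgz` can only run against a file that was actually fetched.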

Comment on lines +243 to +248

```bash
# install dynamo crd's chart in default namespace (CRD's exposed from this chart aren't namespace scoped)
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
  --namespace default \
  --wait \
  --atomic
```

⚠️ Potential issue

Install command must match the fetched CRDs file.

Currently uses ${RELEASE_VERSION} but the file on disk is dynamo-crds-0.4.0.tgz.

-helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
+helm install dynamo-crds dynamo-crds-${CRDS_VERSION}.tgz \
   --namespace default \
   --wait \
   --atomic
📝 Committable suggestion


Suggested change

```bash
# install dynamo crd's chart in default namespace (CRD's exposed from this chart aren't namespace scoped)
helm install dynamo-crds dynamo-crds-${CRDS_VERSION}.tgz \
  --namespace default \
  --wait \
  --atomic
```
🤖 Prompt for AI Agents
In examples/deployments/minikube/README.md around lines 243 to 248, the helm
install command references dynamo-crds-${RELEASE_VERSION}.tgz but the actual
file on disk is dynamo-crds-0.4.0.tgz; update the README so the install command
matches the fetched file by either replacing ${RELEASE_VERSION} with 0.4.0
(i.e., use dynamo-crds-0.4.0.tgz) or add a prior line to export/define
RELEASE_VERSION=0.4.0 so the variable expands correctly.

Comment on lines +387 to +397

```bash
curl http://dynamo-vllm-agg-router.test/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 100
}'
```

⚠️ Potential issue

Fix typos in the sample prompt JSON.

“ests” → “texts” (or remove), “familt” → “family”. Keep example concise to avoid 413s through ingress.

-        "content": "... hinting at ests that Aeloria holds ... lost familt clue is hidden."
+        "content": "... hinting that Aeloria holds ... lost family clue is hidden."
📝 Committable suggestion


Suggested change

```bash
curl http://dynamo-vllm-agg-router.test/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{
"role": "user",
"content": "... hinting that Aeloria holds ... lost family clue is hidden."
}
],
"stream":false,
"max_tokens": 100
}'
```
🤖 Prompt for AI Agents
In examples/deployments/minikube/README.md around lines 387 to 397, the sample
prompt JSON contains typos and is overly long; fix the typos ("ests" → "texts"
or remove, "familt" → "family") and shorten the user content to a concise
character-background prompt to avoid large payloads (keep it to a few sentences,
reduce descriptive prose, and keep max_tokens/stream as-is) so the curl example
is correct and won't trigger 413 errors.
