Conversation

@DFatadeNVIDIA commented Sep 10, 2025

Overview:

This document covers the process of deploying Dynamo Cloud and running inference in a vLLM distributed runtime within a Kubernetes environment. The Dynamo Cloud Platform provides a managed deployment experience.

Details:

  • Contains the infrastructure components required for the Dynamo Cloud platform.
  • Leverages the Dynamo Operator and its exposed CRDs to deploy Dynamo inference graphs.

This overview covers the setup process on a Minikube instance, including:

  • Deploying the Dynamo Operator and creating Dynamo CRDs.
  • Deploying an inference graph built with the vLLM Dynamo Runtime.
  • Setting up ingress and running inference.

Where should the reviewer start?

  • Start with examples/deployments/minikube/README.md for instructions on setting up the Dynamo Cloud platform and deploying a vLLM inference graph.

Summary by CodeRabbit

  • New Features
    • Backend endpoints now respect the DYN_NAMESPACE environment variable, enabling namespace-configurable deployments.
  • Documentation
    • Added a comprehensive Minikube/Kubernetes deployment guide, including prerequisites, setup, secrets configuration, Helm-based installation, exposing services via Ingress, verification, and cleanup steps.
  • Chores
    • Updated development container image tag to a local vLLM-focused variant for improved dev environment parity.

copy-pr-bot bot commented Sep 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

DFatadeNVIDIA and others added 4 commits September 10, 2025 10:01
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
@DFatadeNVIDIA force-pushed the dafatade/dynamo_cloud_minikube_example branch from 3eac656 to 4d3c838 on September 10, 2025 15:02
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>
@DFatadeNVIDIA DFatadeNVIDIA marked this pull request as ready for review September 10, 2025 16:16
coderabbitai bot commented Sep 10, 2025

Walkthrough

Updates devcontainer image tag, parameterizes TRT-LLM default endpoints by an environment-driven namespace, and adds a Minikube deployment README for Dynamo Cloud with vLLM.

Changes

Cohort / File(s) Summary of edits
Devcontainer config
\.devcontainer/devcontainer.json
Changed image from "dynamo:latest-vllm-dev" to "dynamo:latest-vllm-local-dev".
TRT-LLM utils namespace endpoints
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py
Added os import and DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo"). Updated DEFAULT_ENDPOINT, DEFAULT_NEXT_ENDPOINT, and DEFAULT_ENCODE_ENDPOINT to use f"dyn://{DYN_NAMESPACE}...." instead of hard-coded "dyn://dynamo....".
Minikube deployment docs
examples/deployments/minikube/README.md
Added new README detailing end-to-end Minikube/Kubernetes deployment steps for Dynamo Cloud with vLLM, including prerequisites, secrets, Helm installs, graph deployment, ingress, and cleanup.
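The namespace change summarized above can be sketched as a minimal standalone snippet (names taken from the walkthrough; this is an illustration, not the full trtllm_utils.py module):

```python
import os

# Namespace for default endpoints; falls back to "dynamo" when DYN_NAMESPACE is unset.
DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")

# Defaults derived from the environment-driven namespace instead of a hard-coded one.
DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm.generate"
DEFAULT_NEXT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_next.generate"
DEFAULT_ENCODE_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_encode.generate"

print(DEFAULT_ENDPOINT)
```

With DYN_NAMESPACE unset this resolves to the dyn://dynamo.… defaults; exporting DYN_NAMESPACE before the process starts switches every default to the custom prefix.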

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor User
  participant App as Client App
  participant Env as Environment
  participant Utils as trtllm_utils.py
  participant Svc as TRT-LLM Services

  User->>App: Trigger generation
  App->>Env: Read DYN_NAMESPACE
  Env-->>App: "dynamo" (default) or custom
  App->>Utils: Request default endpoints
  Utils-->>App: dyn://{DYN_NAMESPACE}.tensorrt_llm[...]
  App->>Svc: Call generate/next/encode via dyn://{ns}...
  Svc-->>App: Response

  note over Utils,Svc: Endpoints now derived from DYN_NAMESPACE (env)
```

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related PRs

Pre-merge checks (1 passed, 2 warnings)

❌ Failed checks (2 warnings)
  • Title Check — ⚠️ Warning: The current title focuses solely on the documentation aspect, omitting the devcontainer image tag update and the introduction of DYN_NAMESPACE in the TRTLLM utilities, so it does not fully reflect all significant changes in this pull request. Resolution: revise the title to concisely summarize all key changes, for example including the Minikube example addition, the devcontainer image tag update, and the DYN_NAMESPACE support in the TRTLLM utilities.
  • Description Check — ⚠️ Warning: The description covers the documentation changes and reviewer guidance but omits mention of the devcontainer configuration update and the new environment-driven endpoints in the TRTLLM utilities, and it also lacks the required Related Issues section from the repository template. Resolution: update the description to include details about the .devcontainer image tag change and the DYN_NAMESPACE implementation in trtllm_utils.py, and add a Related Issues section according to the repository template.

✅ Passed checks (1 passed)
  • Docstring Coverage — ✅ Passed: No functions found in the changes. Docstring coverage check skipped.

Poem

In a cluster of clouds where pods softly hum,
I twitch my ears—namespaces drum.
devcontainer brews a local blend,
Endpoints follow the names we send.
Minikube meadows, ingress skies—
With vLLM winds, the tokens rise.
Hop, deploy, request—surprise!

Tip

👮 Agentic pre-merge checks are now available in preview!

Pro plan users can now enable pre-merge checks in their settings to enforce checklists before merging PRs.

  • Built-in checks – Quickly apply ready-made checks to enforce title conventions, require pull request descriptions that follow templates, validate linked issues for compliance, and more.
  • Custom agentic checks – Define your own rules using CodeRabbit’s advanced agentic capabilities to enforce organization-specific policies and workflows. For example, you can instruct CodeRabbit’s agent to verify that API documentation is updated whenever API schema files are modified in a PR. Note: Up to 5 custom checks are currently allowed during the preview period. Pricing for this feature will be announced in a few weeks.

Please see the documentation for more information.

Example:

```yaml
reviews:
  pre_merge_checks:
    custom_checks:
      - name: "Undocumented Breaking Changes"
        mode: "warning"
        instructions: |
          Pass/fail criteria: All breaking changes to public APIs, CLI flags, environment variables, configuration keys, database schemas, or HTTP/GraphQL endpoints must be documented in the "Breaking Change" section of the PR description and in CHANGELOG.md. Exclude purely internal or private changes (e.g., code not exported from package entry points or explicitly marked as internal).
```


coderabbitai bot left a comment

Actionable comments posted: 7

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (1)

112-121: Enforce or extend parse_endpoint to handle dotted namespaces

  • Currently parse_endpoint splits on '.', requiring exactly three parts; any dots within the namespace (e.g. ns.foo.bar.comp) will trigger a misleading ValueError.
  • Either validate upstream that namespaces contain no dots or update parse_endpoint to treat all but the last two segments as the namespace.
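A sketch of the second option (a hypothetical parse_endpoint; the real helper's signature and error text may differ) that treats all but the last two dot-separated segments as the namespace:

```python
def parse_endpoint(endpoint: str) -> tuple[str, str, str]:
    """Split a dyn:// endpoint into (namespace, component, endpoint name).

    All but the last two dot-separated segments form the namespace,
    so dotted namespaces such as "ns.foo.bar" parse cleanly.
    """
    path = endpoint.removeprefix("dyn://")
    parts = path.split(".")
    if len(parts) < 3:
        raise ValueError(
            f"Invalid endpoint '{endpoint}': expected "
            "dyn://<namespace>.<component>.<endpoint>"
        )
    return ".".join(parts[:-2]), parts[-2], parts[-1]

# A dotted namespace no longer triggers a misleading error:
print(parse_endpoint("dyn://ns.foo.bar.tensorrt_llm.generate"))
```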
🧹 Nitpick comments (7)
examples/deployments/minikube/README.md (7)

11-13: Tighten wording (“built with”).

“Deploying an inference graph built in vLLM Dynamo Runtime” → “built with the vLLM Dynamo Runtime.”

```diff
- - Deploying an inference graph built in vLLM Dynamo Runtime
+ - Deploying an inference graph built with the vLLM Dynamo Runtime
```

66-79: Tighten device plugin guidance and prefer GPU Operator for robustness.

Recommend either:

  • Enable NVIDIA GPU Operator (preferred) OR
  • Manually deploy device plugin, but link to its supported CUDA/driver matrix.

Also show a selector-based check (pods and DaemonSet) to reduce confusion.

```diff
- kubectl get pods -n kube-system
+ kubectl get daemonset -n kube-system nvidia-device-plugin-daemonset
+ kubectl get pods -n kube-system -l name=nvidia-device-plugin-ds
```

86-98: Typo: “addon”, not “add on”.

Minor wording tweak on verifying ingress readiness.

```diff
- # enable ingress add on
+ # enable ingress addon
```

270-286: Add --wait --atomic to platform install for consistency.

Ensures the controller, NATS, and etcd are ready before proceeding.

```diff
-helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
+helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
   --set "dynamo-operator.imagePullSecrets[0].name=nvcrimagepullsecret" \
-  --namespace ${NAMESPACE}
+  --namespace ${NAMESPACE} \
+  --wait \
+  --atomic
```

311-318: Clarify which fields to edit in the example manifest.

Spell the path precisely and show a concrete snippet for extraPodSpec.mainContainer.image and dynamoNamespace to reduce mis-edits.

```diff
- update the `extraPodSpec.mainContainer.image` path ... configure the `dynamoNamespace` field
+ update:
+ - `spec.VllmDecodeWorker.extraPodSpec.mainContainer.image: ${DYNAMO_IMAGE}`
+ - `spec.dynamoNamespace: ${NAMESPACE}`
```

350-371: Ingress: set an explicit backend path and tighten YAML.

Good as-is; optionally add ingressClassName, and confirm service port matches container. Also consider adding nginx.ingress.kubernetes.io/proxy-body-size: "0" if large payloads are expected.

```diff
 metadata:
   name: vllm-agg-router-ingress
   namespace: $NAMESPACE
+  annotations:
+    nginx.ingress.kubernetes.io/proxy-body-size: "0"
 spec:
   ingressClassName: nginx
```

380-383: Hosts entry: small grammar fix.

“along with it's address” → “along with its address”.

```diff
-Once the ingress resource has been created, make sure to add the entry along with it's address
+Once the ingress resource has been created, add the entry along with its address
```
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 905c920 and e1b5d25.

📒 Files selected for processing (3)
  • .devcontainer/devcontainer.json (1 hunks)
  • components/backends/trtllm/src/dynamo/trtllm/utils/trtllm_utils.py (2 hunks)
  • examples/deployments/minikube/README.md (1 hunks)
🧰 Additional context used
🧠 Learnings (5)
📓 Common learnings
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .devcontainer/devcontainer.json
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .devcontainer/devcontainer.json
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • .devcontainer/devcontainer.json
📚 Learning: 2025-09-03T01:10:12.599Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2822
File: container/Dockerfile.vllm:343-352
Timestamp: 2025-09-03T01:10:12.599Z
Learning: In the dynamo project's local-dev Docker targets, USER_UID and USER_GID build args are intentionally left without default values to force explicit UID/GID mapping during build time, preventing file permission issues in local development environments where container users need to match host user permissions for mounted volumes.

Applied to files:

  • .devcontainer/devcontainer.json
🔇 Additional comments (3)
.devcontainer/devcontainer.json (1)

10-10: ```shell
#!/bin/bash
echo "Searching for exact dev image tag across the repository"
rg -n 'latest-vllm-local-dev' -C2 .

echo -e "\nSearching for any 'dynamo:' image references"
rg -n 'dynamo:' -C2 .

echo -e "\nSearching for VLLM mentions to locate related build targets"
rg -n 'vllm' -C2 .

echo -e "\nChecking Earthfile aliases and targets for image builds"
rg -n 'alias' -C2 Earthfile*

echo -e "\nLooking for any Dockerfile or Containerfile in .devcontainer"
find .devcontainer -type f \( -iname 'dockerfile*' -o -iname 'containerfile*' \)
```

examples/deployments/minikube/README.md (2)

`54-60`: **Caution: unmounting /proc/driver/nvidia can be risky.**

Provide context for when this is necessary and a rollback step. Many setups work with the GPU Operator or device plugin without manual unmounting.

Consider adding a guarded path:

```bash
# Only if kubelet/device-plugin complains about preinstalled drivers conflicting
ssh -o "StrictHostKeyChecking no" -i "$(minikube ssh-key)" docker@"$(minikube ip)" \
  "mount | grep -q '/proc/driver/nvidia' && sudo umount -R /proc/driver/nvidia || true"
```
421-426: Add note recommending manual CRD cleanup
Helm won’t delete cluster-scoped CRDs on uninstall; after helm uninstall dynamo-crds, run:

```bash
kubectl get crd | grep dynamo
kubectl delete crd $(kubectl get crd | grep dynamo | awk '{print $1}')
```

to remove any leftover CRDs.

Comment on lines +17 to +23
```diff
+DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")
+
 # Default endpoint for the next worker.
-DEFAULT_ENDPOINT = "dyn://dynamo.tensorrt_llm.generate"
+DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm.generate"
 DEFAULT_MODEL_PATH = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"
-DEFAULT_NEXT_ENDPOINT = "dyn://dynamo.tensorrt_llm_next.generate"
-DEFAULT_ENCODE_ENDPOINT = "dyn://dynamo.tensorrt_llm_encode.generate"
+DEFAULT_NEXT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_next.generate"
+DEFAULT_ENCODE_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_encode.generate"
```
🛠️ Refactor suggestion

Harden env handling and namespace validity for endpoints.

If DYN_NAMESPACE is set but empty, defaults won’t kick in, producing invalid endpoints like dyn://.tensorrt_llm.generate. Also prevent dots/whitespace in the namespace, since parse_endpoint expects exactly three dot-separated parts.

Apply:

```diff
- DYN_NAMESPACE = os.environ.get("DYN_NAMESPACE", "dynamo")
+ # Allow override but fall back when empty; restrict to safe characters
+ DYN_NAMESPACE = (os.environ.get("DYN_NAMESPACE") or "dynamo").strip()
+ if not DYN_NAMESPACE or any(c in DYN_NAMESPACE for c in " .\t\n"):
+     raise ValueError(
+         f"Invalid DYN_NAMESPACE='{DYN_NAMESPACE}'. "
+         "Use a DNS-like token without spaces or dots."
+     )
```

Optional: compute defaults at config parse time to reflect the env at process start, and make unit-testing easier:

```diff
- DEFAULT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm.generate"
- DEFAULT_NEXT_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_next.generate"
- DEFAULT_ENCODE_ENDPOINT = f"dyn://{DYN_NAMESPACE}.tensorrt_llm_encode.generate"
+ def _default_endpoints(ns: str):
+     return (
+         f"dyn://{ns}.tensorrt_llm.generate",
+         f"dyn://{ns}.tensorrt_llm_next.generate",
+         f"dyn://{ns}.tensorrt_llm_encode.generate",
+     )
+ DEFAULT_ENDPOINT, DEFAULT_NEXT_ENDPOINT, DEFAULT_ENCODE_ENDPOINT = _default_endpoints(DYN_NAMESPACE)
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```python
# Allow override but fall back when empty; restrict to safe characters
DYN_NAMESPACE = (os.environ.get("DYN_NAMESPACE") or "dynamo").strip()
if not DYN_NAMESPACE or any(c in DYN_NAMESPACE for c in " .\t\n"):
    raise ValueError(
        f"Invalid DYN_NAMESPACE='{DYN_NAMESPACE}'. "
        "Use a DNS-like token without spaces or dots."
    )

# Default endpoint for the next worker.
DEFAULT_MODEL_PATH = "TinyLlama/TinyLlama-1.1B-Chat-v1.0"

def _default_endpoints(ns: str):
    return (
        f"dyn://{ns}.tensorrt_llm.generate",
        f"dyn://{ns}.tensorrt_llm_next.generate",
        f"dyn://{ns}.tensorrt_llm_encode.generate",
    )

DEFAULT_ENDPOINT, DEFAULT_NEXT_ENDPOINT, DEFAULT_ENCODE_ENDPOINT = _default_endpoints(DYN_NAMESPACE)
```

Comment on lines +5 to +8
- Contains the infrastructure components required for the Dynamo cloud platform
- Leverage the Dynamo Operator and it's exposed CRD's to deploy Dynamo inference graphs
- Provides a managed deployment experience

⚠️ Potential issue

Fix grammar: “its” (possessive) and plural “CRDs”.

  • “Leverage the Dynamo Operator and it's exposed CRD's” → “Leverage the Dynamo Operator and its exposed CRDs”.
  • Consider removing the redundant “Provides a managed deployment experience.” bullet or merge it into the intro sentence.
```diff
- - Leverage the Dynamo Operator and it's exposed CRD's to deploy Dynamo inference graphs
- - Provides a managed deployment experience
+ - Leverage the Dynamo Operator and its exposed CRDs to deploy Dynamo inference graphs
```

Comment on lines +104 to +122
Dynamo Cloud requires Istio for service mesh capabilities. You can set up Istio via the minikube addons feature. Install Istio and verify pods are running:

```bash
# Enable required addons
minikube addons enable istio-provisioner
minikube addons enable istio

# verify pods are running
kubectl get pods -n istio-operator
kubectl get pods -n istio-system

# Output should be similar
NAME READY STATUS RESTARTS AGE
istio-operator-b88fb5f65-9tj8d 1/1 Running 0 34s

NAME READY STATUS RESTARTS AGE
istio-ingressgateway-64887df48f-98l2n 1/1 Running 0 19s
istiod-65c5bcc875-ktcnc 1/1 Running 0 26s
```
🛠️ Refactor suggestion

Istio: add namespace label for sidecar injection.

Most Istio installs require istio-injection=enabled on the target namespace so Dynamo pods get sidecars.

```diff
 # Enable required addons
 minikube addons enable istio-provisioner
 minikube addons enable istio
+
+# Enable automatic sidecar injection in the Dynamo namespace
+kubectl label namespace ${NAMESPACE} istio-injection=enabled --overwrite
```

Comment on lines +151 to +167
### Leveraging Dynamo Container Runtimes In Dynamo Cloud

Dynamo, is a high-throughput, low-latency inference framework designed for serving generative AI and reasoning models in multi-node distributed environments. Dynamo specializes in taking a given runtime (TRT-LLM, vLLM, SGLang, etc) and creating a highly scalable distributed runtime. It's important to verify which Dynamo runtime you'll want to leverage in your deployments, as each runtime will have slightly different implementations for optimizing a given workload with Dynamo.

To simplify this tutorial, we'll leverage prebuilt Dynamo containers targeting a vLLM runtime - these containers are available today as published artifacts in NGC catalog. Dynamo also supports buidling container runtimes from source and uploading them to a private registry. For tutorials on building images from source, please reference this [documentation](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/vllm#pull-or-build-container). We'll make sure the container image we use is tied to the same release version as the Dynamo Cloud helm charts that will be in the next section:

```bash
# set release version
export RELEASE_VERSION=0.4.1

# configure dynamo image and corresponding tag
export DYNAMO_IMAGE=nvcr.io/nvidia/ai-dynamo/vllm-runtime:${RELEASE_VERSION}

# print dynamo image
echo ${DYNAMO_IMAGE}
```

⚠️ Potential issue

Typos and clarity in runtime section.

  • “Dynamo, is” → “Dynamo is”
  • “buidling” → “building”
  • Clarify that the image tag should match the chart release.
```diff
- Dynamo, is a high-throughput,
+ Dynamo is a high-throughput,
- Dynamo also supports buidling container runtimes
+ Dynamo also supports building container runtimes
```

Comment on lines +221 to +234
```bash
# Fetch the CRDs helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-0.4.0.tgz

# Fetch the platform helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

# verify charts were fetched
ls -l *.tgz

# output should be similar
-rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-0.4.0.tgz
-rw-r--r-- 1 ubuntu ubuntu 89682 Jul 31 20:13 dynamo-platform-0.4.1.tgz
```
⚠️ Potential issue

Version mismatch: CRDs fetched at 0.4.0 but RELEASE_VERSION=0.4.1 is used later.

Either fetch CRDs at ${RELEASE_VERSION} or introduce CRDS_VERSION. The current commands will fail at install time.

```diff
-# Fetch the CRDs helm chart
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-0.4.0.tgz
+# Fetch the CRDs helm chart
+# If CRDs lag platform release, pin separately
+export CRDS_VERSION=0.4.0
+helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${CRDS_VERSION}.tgz
@@
-# verify charts were fetched
+# verify charts were fetched
 ls -l *.tgz
@@
--rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-0.4.0.tgz
+-rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-${CRDS_VERSION}.tgz
 -rw-r--r-- 1 ubuntu ubuntu 89682 Jul 31 20:13 dynamo-platform-0.4.1.tgz
```
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

```bash
# Fetch the CRDs helm chart
# If CRDs lag platform release, pin separately
export CRDS_VERSION=0.4.0
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${CRDS_VERSION}.tgz

# Fetch the platform helm chart
helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz

# verify charts were fetched
ls -l *.tgz

# output should be similar
-rw-r--r-- 1 ubuntu ubuntu 13342 Jul 31 20:13 dynamo-crds-${CRDS_VERSION}.tgz
-rw-r--r-- 1 ubuntu ubuntu 89682 Jul 31 20:13 dynamo-platform-0.4.1.tgz
```
🤖 Prompt for AI Agents
In examples/deployments/minikube/README.md around lines 221 to 234, the CRDs
helm chart is hardcoded to 0.4.0 while the platform chart uses
${RELEASE_VERSION} (0.4.1), causing a version mismatch; update the CRD fetch to
use the same ${RELEASE_VERSION} variable (helm fetch
.../dynamo-crds-${RELEASE_VERSION}.tgz) or introduce and document a separate
CRDS_VERSION variable and use it consistently for the CRD fetch so both charts
align at install time.
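Beyond aligning the variables, this class of drift can also be caught mechanically: assert that every chart a version variable points at actually exists on disk before any `helm install` runs. The sketch below is illustrative and not part of the README; the guard function name is invented, and the variable names simply follow the `CRDS_VERSION` / `RELEASE_VERSION` convention suggested above.

```shell
# Sketch of a pre-install guard: fail fast when a chart referenced by a
# version variable was never fetched.
CRDS_VERSION=0.4.0
RELEASE_VERSION=0.4.1

require_chart() {
  # Return non-zero (instead of exiting) so callers decide how to react.
  if [ ! -f "$1" ]; then
    echo "missing chart: $1 (run 'helm fetch' first)" >&2
    return 1
  fi
  echo "found chart: $1"
}

# Simulate one fetched chart so the sketch demonstrates both outcomes:
touch "dynamo-crds-${CRDS_VERSION}.tgz"

require_chart "dynamo-crds-${CRDS_VERSION}.tgz"
require_chart "dynamo-platform-${RELEASE_VERSION}.tgz" || echo "fetch the platform chart before installing"
```

With a guard like this in front of the install steps, `helm install dynamo-crds dynamo-crds-${CRDS_VERSION}.tgz` can only run against a file that was actually fetched.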

Comment on lines +243 to +248

```bash
# install dynamo crd's chart in default namespace (CRD's exposed from this chart aren't namespace scoped)
helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
  --namespace default \
  --wait \
  --atomic
```

⚠️ Potential issue

Install command must match the fetched CRDs file.

Currently uses ${RELEASE_VERSION} but the file on disk is dynamo-crds-0.4.0.tgz.

-helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz \
+helm install dynamo-crds dynamo-crds-${CRDS_VERSION}.tgz \
   --namespace default \
   --wait \
   --atomic
📝 Committable suggestion


Suggested change

```bash
# install dynamo crd's chart in default namespace (CRD's exposed from this chart aren't namespace scoped)
helm install dynamo-crds dynamo-crds-${CRDS_VERSION}.tgz \
  --namespace default \
  --wait \
  --atomic
```
🤖 Prompt for AI Agents
In examples/deployments/minikube/README.md around lines 243 to 248, the helm
install command references dynamo-crds-${RELEASE_VERSION}.tgz but the actual
file on disk is dynamo-crds-0.4.0.tgz; update the README so the install command
matches the fetched file by either replacing ${RELEASE_VERSION} with 0.4.0
(i.e., use dynamo-crds-0.4.0.tgz) or add a prior line to export/define
RELEASE_VERSION=0.4.0 so the variable expands correctly.

Comment on lines +387 to +397

```bash
curl http://dynamo-vllm-agg-router.test/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{
"role": "user",
"content": "In the heart of Eldoria, an ancient land of boundless magic and mysterious creatures, lies the long-forgotten city of Aeloria. Once a beacon of knowledge and power, Aeloria was buried beneath the shifting sands of time, lost to the world for centuries. You are an intrepid explorer, known for your unparalleled curiosity and courage, who has stumbled upon an ancient map hinting at ests that Aeloria holds a secret so profound that it has the potential to reshape the very fabric of reality. Your journey will take you through treacherous deserts, enchanted forests, and across perilous mountain ranges. Your Task: Character Background: Develop a detailed background for your character. Describe their motivations for seeking out Aeloria, their skills and weaknesses, and any personal connections to the ancient city or its legends. Are they driven by a quest for knowledge, a search for lost familt clue is hidden."
}
],
"stream":false,
"max_tokens": 100
}'
```

⚠️ Potential issue

Fix typos in the sample prompt JSON.

“ests” → “texts” (or remove), “familt” → “family”. Keep example concise to avoid 413s through ingress.

-        "content": "... hinting at ests that Aeloria holds ... lost familt clue is hidden."
+        "content": "... hinting that Aeloria holds ... lost family clue is hidden."
📝 Committable suggestion


Suggested change

```bash
curl http://dynamo-vllm-agg-router.test/v1/chat/completions -H "Content-Type: application/json" -d '{
"model": "Qwen/Qwen3-0.6B",
"messages": [
{
"role": "user",
"content": "... hinting that Aeloria holds ... lost family clue is hidden."
}
],
"stream":false,
"max_tokens": 100
}'
```
🤖 Prompt for AI Agents
In examples/deployments/minikube/README.md around lines 387 to 397, the sample
prompt JSON contains typos and is overly long; fix the typos ("ests" → "texts"
or remove, "familt" → "family") and shorten the user content to a concise
character-background prompt to avoid large payloads (keep it to a few sentences,
reduce descriptive prose, and keep max_tokens/stream as-is) so the curl example
is correct and won't trigger 413 errors.
