Conversation


@DFatadeNVIDIA DFatadeNVIDIA commented Sep 10, 2025

Overview:

This document covers the process of deploying Dynamo Cloud and running inference in a vLLM distributed runtime within a Kubernetes environment. The Dynamo Cloud Platform provides a managed deployment experience.

Details:

  • Contains the infrastructure components required for the Dynamo Cloud platform.
  • Leverages the Dynamo Operator and its exposed CRDs to deploy Dynamo inference graphs.

This overview covers the setup process on a Minikube instance, including:

  • Deploying the Dynamo Operator and creating Dynamo CRDs.
  • Deploying an inference graph built on the vLLM Dynamo runtime.
  • Setting up ingress and running inference.

Where should the reviewer start?

  • Start with examples/deployments/minikube/README.md, which contains the instructions for setting up the Dynamo Cloud platform and deploying a vLLM inference graph.

Summary by CodeRabbit

  • Documentation
    • Added a comprehensive guide for deploying the platform on Kubernetes with Minikube.
    • Covers prerequisites, optional GPU setup, Ingress/Istio provisioning, and storage verification.
    • Provides step-by-step Helm installation using environment variables and required secrets.
    • Details deploying inference graphs, selecting models, and validating resources/pods.
    • Explains exposing the frontend via Ingress, updating hosts, and testing with a sample request.
    • Includes cleanup instructions to remove deployments, ingress, and uninstall components.
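For reference, the optional GPU setup, Ingress provisioning, and storage verification steps summarized above could look roughly like this minimal sketch. It assumes the Docker driver, an NVIDIA Container Toolkit already configured on the host when `GPU=1`, and illustrative CPU/memory sizes; none of these values come from the README itself.

```shell
#!/usr/bin/env sh
# Sketch only: bootstrap a single-node Minikube cluster for the guide.
# Assumptions: Docker driver; NVIDIA Container Toolkit configured on the host
# when GPU=1; --cpus/--memory values are illustrative, not tuned.
start_cluster() {
  if [ "${GPU:-0}" = 1 ]; then
    # GPU passthrough requires the docker driver and docker container runtime
    minikube start --driver docker --container-runtime docker --gpus all \
      --cpus 8 --memory 16g
  else
    minikube start --driver docker --cpus 8 --memory 16g
  fi
  # NGINX Ingress controller, used later to expose the frontend
  minikube addons enable ingress
  # Storage verification: a default StorageClass (usually "standard") should be listed
  kubectl get storageclass
}
```

Run `GPU=1 start_cluster` on a GPU host, or plain `start_cluster` otherwise.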

Signed-off-by: DFatadeNVIDIA <dfatade@nvidia.com>

copy-pr-bot bot commented Sep 10, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@DFatadeNVIDIA DFatadeNVIDIA marked this pull request as ready for review September 10, 2025 16:35
Contributor

@athreesh athreesh left a comment


LGTM, request a double check on Istio/NGINX re: slack message

would be sick if you were able to create a Brev Launchable for this 👀

@julienmancuso @nealvaidya mind taking a look at this PR as well?

Contributor

coderabbitai bot commented Sep 10, 2025

Walkthrough

Adds a new README detailing end-to-end steps to deploy Dynamo Cloud on Minikube via Helm: prerequisites, optional GPU setup, Ingress/Istio, CRD and platform chart installs from NGC, secrets creation, deploying a DynamoGraphDeployment, exposing via Ingress, sample request, and cleanup.

Changes

Cohort / File(s) Summary
Minikube deployment docs
examples/deployments/minikube/README.md
New README with step-by-step Minikube Kubernetes deployment instructions: prerequisites, optional GPU enablement, ingress/istio setup, Helm-based CRD/platform installs from NGC, secret creation, DynamoGraphDeployment application, ingress exposure, validation, and cleanup.

Sequence Diagram(s)

sequenceDiagram
    autonumber
    actor Dev as Developer
    participant MK as Minikube Cluster
    participant Helm as Helm (NGC repo)
    participant K8s as Kubernetes API
    participant Dyn as Dynamo Platform
    participant GW as Ingress/Istio GW
    participant Client as Client (curl)

    Dev->>MK: Start Minikube (+ optional GPU config)
    Dev->>K8s: Configure StorageClass, verify
    Dev->>K8s: Install Ingress / Istio components (optional)
    Dev->>Helm: Add/fetch CRD & platform charts (NGC)
    Dev->>K8s: Create secrets (NGC pull, HF token)
    Dev->>K8s: helm install dynamo-crds
    Dev->>K8s: helm install dynamo-platform
    Note over K8s,Dyn: Dynamo controllers/operators and services become Ready
    Dev->>K8s: kubectl apply DynamoGraphDeployment (model args)
    K8s->>Dyn: Reconcile graph deployment
    Dyn->>K8s: Create Pods/Services for inference
    Dev->>K8s: Apply Ingress for frontend
    K8s->>GW: Route external traffic
    Client->>GW: HTTP request (/v1/chat/completions)
    GW->>Dyn: Forward to inference service
    Dyn-->>Client: Response

    rect rgba(230,245,255,0.6)
    Note over Dev,K8s: Cleanup: delete graph, remove ingress, uninstall charts
    end
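The "Create secrets (NGC pull, HF token)" step in the diagram could be sketched as below. The pull-secret name `nvcrimagepullsecret` matches the `--set` flag used for the platform chart in this review; `hf-token-secret` is a hypothetical name that your DynamoGraphDeployment manifest would need to reference, and `NGC_API_KEY` and `HF_TOKEN` are assumed to be exported already.

```shell
# Sketch only: create the namespace and the two secrets named in the diagram.
# Assumptions: NAMESPACE, NGC_API_KEY and HF_TOKEN are exported;
# "hf-token-secret" is a hypothetical secret name.
create_secrets() {
  ns="${NAMESPACE:?set NAMESPACE first}"
  # Idempotent namespace creation
  kubectl create namespace "$ns" --dry-run=client -o yaml | kubectl apply -f -
  # NGC image pull secret; NGC expects the literal username $oauthtoken
  kubectl create secret docker-registry nvcrimagepullsecret \
    --docker-server=nvcr.io \
    --docker-username='$oauthtoken' \
    --docker-password="${NGC_API_KEY:?set NGC_API_KEY}" \
    --namespace "$ns"
  # Hugging Face token used for model downloads
  kubectl create secret generic hf-token-secret \
    --from-literal=HF_TOKEN="${HF_TOKEN:?set HF_TOKEN}" \
    --namespace "$ns"
}
```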

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related PRs

Pre-merge checks (2 passed, 1 warning)

❌ Failed checks (1 warning)
  • Description Check (⚠️ Warning)
    Explanation: The description includes the required Overview, Details, and Where should the reviewer start sections from the repository’s template but omits the mandatory Related Issues section with issue references and action keywords, so it does not fully adhere to the prescribed PR description structure.
    Resolution: Please add a “Related Issues” section at the end of the description with appropriate GitHub issue references and an action keyword (e.g., “Closes #xxx”) to satisfy the repository’s PR template requirements.
✅ Passed checks (2 passed)
  • Title Check (✅ Passed): The title clearly indicates that this PR adds documentation for deploying Dynamo Cloud version 0.4.1 on Kubernetes Minikube, directly reflecting the main change of introducing a Minikube deployment guide for the platform without including unnecessary detail or noise.
  • Docstring Coverage (✅ Passed): No functions found in the changes. Docstring coverage check skipped.

Poem

A rabbit boots a tiny kube,
With helm charts lined in tidy lube.
CRDs hop in, pods align,
Ingress opens, calls divine.
Curl goes boop—response in flight,
Dynamo dreams through Minikube night. 🐇✨

Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 5

🧹 Nitpick comments (12)
examples/deployments/minikube/README.md (12)

5-8: Fix grammar and remove duplicated bullet.

Use “its” (not “it’s”), pluralize “CRDs” (no apostrophe), capitalize consistently, and drop the repeated “managed deployment” bullet.

- - Contains the infrastructure components required for the Dynamo cloud platform
- - Leverage the Dynamo Operator and it's exposed CRD's to deploy Dynamo inference graphs
- - Provides a managed deployment experience
+ - Contains the infrastructure components required for the Dynamo Cloud platform.
+ - Leverages the Dynamo Operator and its exposed CRDs to deploy Dynamo inference graphs.

19-24: Link to the “general prerequisites”.

Please add a concrete link to the canonical prerequisites doc so readers don’t guess.


55-60: Caution: unmounting /proc/driver/nvidia is risky and unexplained.

Explain why this is needed, when to use it, and how to revert. Otherwise remove to avoid breaking host GPU visibility.
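One hedged way to make that risk visible is to check that the node still advertises GPUs before and after touching /proc/driver/nvidia. This sketch assumes the NVIDIA device plugin (or GPU Operator) registers the `nvidia.com/gpu` extended resource; an empty second column means Kubernetes currently sees no GPUs on that node.

```shell
# Sketch only: print each node name with its allocatable nvidia.com/gpu count.
# Assumes the NVIDIA device plugin advertises the nvidia.com/gpu resource.
gpu_allocatable() {
  kubectl get nodes \
    -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.allocatable.nvidia\.com/gpu}{"\n"}{end}'
}
```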


151-166: Typos and clarity.

  • “buidling” → “building”.
  • Tighten wording about image/tag parity with charts.
-Dynamo also supports buidling container runtimes from source and uploading them to a private registry.
+Dynamo also supports building container runtimes from source and uploading them to a private registry.

258-266: Sample output should reflect chosen CRDS_VERSION.

Update the example to avoid confusion (shows 0.4.0 right now).
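A sketch of parameterizing the CRD chart so the printed version follows `CRDS_VERSION` instead of a hard-coded 0.4.0. The NGC chart URL pattern is an assumption modelled on the `dynamo-platform-${RELEASE_VERSION}.tgz` tarball naming in this guide; check it against the NGC catalog before relying on it.

```shell
# Sketch only: versions default to 0.4.1 as in the guide; CRDs may be overridden.
export RELEASE_VERSION="${RELEASE_VERSION:-0.4.1}"
export CRDS_VERSION="${CRDS_VERSION:-$RELEASE_VERSION}"
# Assumed NGC chart location; verify before use
NGC_CHARTS="${NGC_CHARTS:-https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts}"

install_crds() {
  helm fetch "${NGC_CHARTS}/dynamo-crds-${CRDS_VERSION}.tgz"
  helm install dynamo-crds "dynamo-crds-${CRDS_VERSION}.tgz" --namespace default
  # The CHART column of this output reflects the version actually installed
  helm list --namespace default --filter '^dynamo-crds$'
}
```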


270-296: Add --wait/--atomic to platform install; consider imagePullSecrets for all subcharts.

Add wait flags; if nats/etcd images are private, document how to pass imagePullSecrets via values.

 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz \
   --set "dynamo-operator.imagePullSecrets[0].name=nvcrimagepullsecret" \
-  --namespace ${NAMESPACE}
+  --namespace ${NAMESPACE} \
+  --wait --atomic
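Wrapped as a function, the suggested install plus a quick pod check might look like this sketch. It assumes the platform tarball is already fetched and the pull secret exists in `${NAMESPACE}`. For `helm install`, `--atomic` already implies `--wait` and uninstalls the release if it is not ready within `--timeout` (10m here is an arbitrary choice).

```shell
# Sketch only: install the platform chart and block until it is healthy.
# Assumes dynamo-platform-${RELEASE_VERSION}.tgz is present locally and
# the nvcrimagepullsecret secret exists in ${NAMESPACE}.
install_platform() {
  helm install dynamo-platform "dynamo-platform-${RELEASE_VERSION:?}.tgz" \
    --set "dynamo-operator.imagePullSecrets[0].name=nvcrimagepullsecret" \
    --namespace "${NAMESPACE:?}" \
    --wait --atomic --timeout 10m
  # Validate: operator pods (and subchart pods such as NATS/etcd) should be Running
  kubectl get pods --namespace "$NAMESPACE"
}
```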

311-314: Typos and path clarity.

  • “args commmand” → “args command”.
  • Consider showing the exact YAML snippet to edit (extraPodSpec.mainContainer.args) to minimize user error.

350-371: Optional: add ingress annotations for larger bodies/timeouts.

If users send bigger prompts, consider adding NGINX annotations (proxy-body-size, proxy-read-timeout).
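A sketch of such an Ingress. The annotation keys are standard ingress-nginx annotations; the values (16m body, 300 s timeouts), the Ingress name `dynamo-frontend`, and the backend Service name and port (`vllm-agg-router-frontend`, 8000) are hypothetical placeholders to replace with the frontend Service your DynamoGraphDeployment actually creates.

```shell
# Sketch only: frontend Ingress with larger body size and longer timeouts.
# Hypothetical: Ingress name, FRONTEND_SVC default, FRONTEND_PORT default.
apply_frontend_ingress() {
  kubectl apply --namespace "${NAMESPACE:?}" -f - <<EOF
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: dynamo-frontend
  annotations:
    nginx.ingress.kubernetes.io/proxy-body-size: "16m"
    nginx.ingress.kubernetes.io/proxy-read-timeout: "300"
    nginx.ingress.kubernetes.io/proxy-send-timeout: "300"
spec:
  ingressClassName: nginx
  rules:
    - host: ${INGRESS_HOST:-dynamo-vllm-agg-router.test}
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: ${FRONTEND_SVC:-vllm-agg-router-frontend}
                port:
                  number: ${FRONTEND_PORT:-8000}
EOF
}
```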


380-383: Grammar fix (“its” not “it’s”).

-Once the ingress resource has been created, make sure to add the entry along with it's address
+Once the ingress resource has been created, make sure to add the entry along with its address
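The hosts-file step could be scripted idempotently as below, so rerunning the guide does not append duplicate lines. `HOSTS_FILE` is a hypothetical override (default /etc/hosts, which needs sudo), and the IP defaults to `minikube ip`.

```shell
# Sketch only: map an Ingress host to the Minikube IP, skipping duplicates.
# Usage: add_hosts_entry HOST [IP]; HOSTS_FILE is a hypothetical override.
add_hosts_entry() {
  host="${1:?usage: add_hosts_entry HOST [IP]}"
  ip="${2:-$(minikube ip)}"
  hosts_file="${HOSTS_FILE:-/etc/hosts}"
  # Host already mapped (followed by whitespace or end of line)? Then do nothing.
  if grep -qE "[[:space:]]${host}([[:space:]]|\$)" "$hosts_file"; then
    echo "$host already present in $hosts_file"
    return 0
  fi
  printf '%s %s\n' "$ip" "$host" | sudo tee -a "$hosts_file" >/dev/null
}
```

With the Docker driver on macOS the Minikube IP is typically not reachable from the host; running `minikube tunnel` and mapping the host to 127.0.0.1 is the usual workaround.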

387-397: Prefer a minimal curl example.

Long payload makes copy/paste unwieldy. Suggest a short prompt; also format JSON with stream: false (space for readability).

-curl http://dynamo-vllm-agg-router.test/v1/chat/completions   -H "Content-Type: application/json"   -d '{
-    "model": "Qwen/Qwen3-0.6B",
-    "messages": [
-    {
-        "role": "user",
-        "content": "In the heart of Eldoria, ...
-    }
-    ],
-    "stream":false,
-    "max_tokens": 100
-  }'
+curl http://dynamo-vllm-agg-router.test/v1/chat/completions \
+  -H "Content-Type: application/json" \
+  -d '{
+    "model": "Qwen/Qwen3-0.6B",
+    "messages": [{"role": "user", "content": "Say hello from Dynamo on Minikube."}],
+    "stream": false,
+    "max_tokens": 64
+  }'
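Before sending that request, it can help to poll until the route answers, since model download and load can take a while after the pods start. This sketch assumes the frontend exposes the OpenAI-compatible `GET /v1/models` endpoint; `MAX_TRIES` and `POLL_SECONDS` are hypothetical knobs.

```shell
# Sketch only: poll the frontend until it responds, then report readiness.
# Assumes GET /v1/models exists on the Dynamo frontend.
wait_for_frontend() {
  url="${1:-http://dynamo-vllm-agg-router.test}/v1/models"
  tries=0
  until curl -sf "$url" >/dev/null; do
    tries=$((tries + 1))
    if [ "$tries" -ge "${MAX_TRIES:-60}" ]; then
      echo "frontend not ready after $tries attempts: $url" >&2
      return 1
    fi
    sleep "${POLL_SECONDS:-5}"
  done
  echo "frontend ready: $url"
}
```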

404-426: Add cleanup for namespace and secrets (optional).

Many users will want to fully tear down the environment.

 # uninstall dynamo CRD's
 helm uninstall dynamo-crds -n default
+
+# (Optional) remove namespace and secrets created in this guide
+kubectl delete namespace ${NAMESPACE}
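A fuller teardown in reverse install order might look like this sketch. `GRAPH_MANIFEST` and the Ingress name `dynamo-frontend` are hypothetical; substitute the manifest and Ingress you actually applied. Deleting the namespace last also removes the secrets created in it.

```shell
# Sketch only: tear down in reverse install order.
# Hypothetical: GRAPH_MANIFEST path and the "dynamo-frontend" Ingress name.
teardown() {
  ns="${NAMESPACE:?}"
  kubectl delete -f "${GRAPH_MANIFEST:?path to your DynamoGraphDeployment YAML}" \
    --namespace "$ns" --ignore-not-found
  kubectl delete ingress dynamo-frontend --namespace "$ns" --ignore-not-found
  helm uninstall dynamo-platform --namespace "$ns"
  helm uninstall dynamo-crds --namespace default
  # Optional: removes the namespace together with its secrets
  kubectl delete namespace "$ns" --ignore-not-found
}
```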

82-122: Potential ingress overlap: NGINX vs Istio IngressGateway.

You enable both NGINX Ingress and Istio. Clarify which gateway fronts traffic, or note potential port overlaps and how to disable one if needed.
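To check which gateway is actually exposing Services, and to drop Istio if only NGINX is needed, something like the following sketch could be used. It assumes the default namespaces used by the minikube `ingress` and `istio` addons (`ingress-nginx`, `istio-system`).

```shell
# Sketch only: list gateway Services, then optionally remove the Istio addons.
show_gateways() {
  kubectl get svc --namespace ingress-nginx
  kubectl get svc --namespace istio-system
}
disable_istio() {
  # Disable the Istio addon first, then its provisioner
  minikube addons disable istio
  minikube addons disable istio-provisioner
}
```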

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between e176a29 and aa538be.

📒 Files selected for processing (1)
  • examples/deployments/minikube/README.md (1 hunks)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (2)
examples/deployments/minikube/README.md (2)

86-98: Ingress addon section looks good.


210-216: Chart versioning: CRDs and platform versions must be consistent and parameterized.

You mix RELEASE_VERSION=0.4.1 with a hard-coded CRDs 0.4.0 later. Introduce a CRDS_VERSION (in case CRDs are versioned independently) and use helm “pull” consistently.

-# set release version
-export RELEASE_VERSION=0.4.1
+# Set versions (CRDs may differ from platform)
+export RELEASE_VERSION=0.4.1
+export CRDS_VERSION=0.4.1

Likely an incorrect or invalid review comment.

@DFatadeNVIDIA
Author

I'm all in for a Brev Launchable personally 👀

NGINX would be required here just for exposing the service. I don't think we need both NGINX and Istio; I'll do a quick run and double-check on my end.

@DFatadeNVIDIA DFatadeNVIDIA requested a review from a team as a code owner September 22, 2025 21:18
@tmonty12
Contributor

Thank you for the contribution @DFatadeNVIDIA

We already have:

I'd prefer if you added any missing information/documentation there and/or linked out to the pre-existing documentation.

@DFatadeNVIDIA
Author

Hey @tmonty12, I hope you're doing well, and thanks for taking the time to review and share feedback! That's good to know. I looked through all the files you linked and don't think this will add much value given what's already present in the repo. I'll get this MR closed out, but I'm happy to revisit if there are any other example needs in the future.
