Skip to content

Conversation

@keivenchang
Copy link
Contributor

@keivenchang keivenchang commented Sep 4, 2025

Overview:

Add --user parameter to container run.sh script to allow overriding the default user context when running Docker containers.

Details:

  • Added new optional --user parameter to run.sh script
  • Parameter accepts UID:GID format (e.g., 1000:1000) or username (e.g., ubuntu)
  • Integrated user parameter into Docker run command
  • Updated help documentation to include new parameter

Where should the reviewer start?

container/run.sh - specifically the parameter handling section (lines 128-134) and Docker run command integration (line 366)

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Relates to BUG-5501463 https://nvbugspro.nvidia.com/bug/5501463

Summary by CodeRabbit

  • New Features

    • Added --user option to container run script to override container user.
    • Hello World runtime example now streams per-word responses with cancellation support.
    • Helm values add etcd image repository and option to allow insecure images for easier deployment.
  • Bug Fixes

    • Improved backend prefill handling and error resilience, reducing failures and ensuring safe defaults.
  • Documentation

    • Major overhaul: consolidated installation guides, updated links, added profiling prerequisites/notes, and improved operator/Helm documentation and navigation.
  • Chores

    • Bumped Helm chart versions to 0.5.0.
    • Standardized container/devcontainer targets from “local-dev” to “dev”.

@copy-pr-bot
Copy link

copy-pr-bot bot commented Sep 4, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@keivenchang keivenchang self-assigned this Sep 4, 2025
@saturley-hall
Copy link
Member

This can't be merged to main as it bumps the version of some helm charts which are not publicly available until after 0.5.0 is released.
The only changes needed are in fac71cc. This is the only commit that needs to go to main. Then cherry-pick the commit from main into release/0.5.0

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 4, 2025

Walkthrough

Renames the dev build target from local-dev to dev across Dockerfile, devcontainer config, and scripts. Adds --user to run.sh. Updates vLLM handler to default prefill off when absent and adds a prefill path. Simplifies Helm deployment by removing scripts, bumping chart versions, and revising docs. Adds streaming/cancellation to hello_world example.

Changes

Cohort / File(s) Summary
Devcontainer and Docker target rename
./.devcontainer/README.md, ./.devcontainer/devcontainer.json, container/Dockerfile.vllm
Switch image/tag and build target from local-dev to dev; update docs accordingly; rename Dockerfile stage alias to dev.
Container scripts
container/build.sh, container/run.sh
build.sh now injects USER_UID/GID when TARGET=="dev". run.sh adds --user option, parses/validates it, and passes --user to docker run.
vLLM backend prefill and error handling
components/backends/vllm/src/dynamo/vllm/handlers.py
_prefill_check_loop sets can_prefill=0 if no client; adds generic exception logging. generate adds prefill request path: deep-copies sampling params, sets kv_transfer_params/do_remote_decode, uses prefill_worker_client.round_robin, updates kv transfer params, continues streaming; TODO noted.
Helm: versions, values, chart docs
deploy/cloud/helm/crds/Chart.yaml, deploy/helm/chart/Chart.yaml, deploy/cloud/helm/platform/values.yaml, deploy/cloud/helm/platform/README.md, deploy/cloud/helm/platform/README.md.gotmpl, deploy/cloud/helm/README.md
Bump chart/app versions to 0.5.0; add etcd.image.repository and allowInsecureImages; update platform doc links and top-level Helm README to describe platform/crds charts.
Removed Helm deployment scripts and values
deploy/cloud/helm/deploy.sh, deploy/cloud/helm/dynamo-platform-values.yaml, deploy/cloud/helm/network-config-wizard.sh
Delete deploy orchestrator, default values, and network configuration wizard scripts.
Operator build/docs wiring
deploy/cloud/operator/Makefile, deploy/cloud/operator/README.md, deploy/cloud/operator/docs/header.md
Makefile now composes header + generated API into docs/guides path and cleans intermediates; Operator README links to installation guide; header replaced with auto-generated notice.
Docs: links, profiling, installation flow
benchmarks/profiler/README.md, docs/benchmarks/pre_deployment_profiling.md, docs/architecture/sla_planner.md, docs/guides/dynamo_deploy/*, deploy/README.md, docs/index.rst
Update links/paths; add profiling prerequisite/note; rework installation guide to helm upgrade --install, introduce RELEASE_VERSION, remove quick deploy script references; adjust multiple guide links to installation_guide.md; add autogenerated notices to API reference.
Hello world example streaming
examples/runtime/hello_world/hello_world.py
Replace PyContext with Context; implement async streaming with per-word yield and cancellation checks; function signature updated.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  participant Client
  participant DecodeHandler as DecodeWorkerHandler
  participant PrefillClient as PrefillWorkerClient
  participant Engine as DecodeEngine

  Note over DecodeHandler: generate(request, sampling_params)

  DecodeHandler->>DecodeHandler: _prefill_check_loop()
  alt Prefill allowed
    DecodeHandler->>PrefillClient: round_robin(prefill_request)
    PrefillClient-->>DecodeHandler: prefill_response (kv_transfer_params)
    DecodeHandler->>DecodeHandler: sampling_params.extra_args.kv_transfer_params = response.kv_transfer_params
  else Prefill not allowed or client None
    Note over DecodeHandler: Proceed without prefill
  end

  loop Stream tokens
    DecodeHandler->>Engine: generate_tokens(...)
    Engine-->>DecodeHandler: token/update
    DecodeHandler-->>Client: stream token
  end

  rect rgba(255,230,230,0.5)
  Note over DecodeHandler,Engine: On EngineDeadError → shutdown path (unchanged)
  end
Loading
sequenceDiagram
  autonumber
  participant User
  participant Endpoint as content_generator()
  participant Ctx as Context

  User->>Endpoint: request (string)
  loop For each word
    Endpoint->>Endpoint: await sleep(1s)
    alt Cancelled?
      Endpoint->>Ctx: is_stopped()/is_killed()
      Ctx-->>Endpoint: true
      Endpoint-->>User: return (stop streaming)
    else Not cancelled
      Endpoint-->>User: yield "Hello {word}!"
    end
  end
Loading

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related PRs

Poem

A rabbit taps the helm in tidy rows,
Scripts hop away, a cleaner garden grows.
Prefill peeks first, then tokens flow,
Hello streams “Hi!” with a gentle glow.
Dev becomes dev—no burrow of old—
Ships at 0.5.0, quick and bold. 🐇✨


Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share
🪧 Tips

Chat

There are 3 ways to chat with CodeRabbit:

  • Review comments: Directly reply to a review comment made by CodeRabbit. Example:
    • I pushed a fix in commit <commit_id>, please review it.
    • Open a follow-up GitHub issue for this discussion.
  • Files and specific lines of code (under the "Files changed" tab): Tag @coderabbitai in a new review comment at the desired location with your query.
  • PR comments: Tag @coderabbitai in a new PR comment to ask questions about the PR branch. For the best results, please provide a very specific query, as very limited context is provided in this mode. Examples:
    • @coderabbitai gather interesting stats about this repository and render them as a table. Additionally, render a pie chart showing the language distribution in the codebase.
    • @coderabbitai read the files in the src/scheduler package and generate a class diagram using mermaid and a README in the markdown format.

Support

Need help? Create a ticket on our support page for assistance with any issues or questions.

CodeRabbit Commands (Invoked using PR/Issue comments)

Type @coderabbitai help to get the list of available commands.

Other keywords and placeholders

  • Add @coderabbitai ignore or @coderabbit ignore anywhere in the PR description to prevent this PR from being reviewed.
  • Add @coderabbitai summary to generate the high-level summary at a specific location in the PR description.
  • Add @coderabbitai anywhere in the PR title to generate the title automatically.

Status, Documentation and Community

  • Visit our Status Page to check the current availability of CodeRabbit.
  • Visit our Documentation for detailed information on how to use CodeRabbit.
  • Join our Discord Community to get help, request features, and share feedback.
  • Follow us on X/Twitter for updates and announcements.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 8

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (4)
.devcontainer/README.md (1)

381-382: Fix stale troubleshooting command.

Replace --target local-dev with --target dev.

-   ./container/build.sh --target local-dev
+   ./container/build.sh --target dev
docs/benchmarks/pre_deployment_profiling.md (1)

61-61: Fix path typo: stray closing brace in filename.

The trailing } makes the example path invalid.

-* `${benchmark_result_dir}/selected_prefill_interpolation/raw_data.npz}`
+* `${benchmark_result_dir}/selected_prefill_interpolation/raw_data.npz`
benchmarks/profiler/README.md (1)

61-61: Same path typo here: remove the extra brace.

Mirror the fix from the main doc.

-* `${benchmark_result_dir}/selected_prefill_interpolation/raw_data.npz}`
+* `${benchmark_result_dir}/selected_prefill_interpolation/raw_data.npz`
docs/guides/dynamo_deploy/installation_guide.md (1)

56-69: Use ‘helm pull’ instead of ‘helm fetch’.

Align with Helm v3 terminology.

-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-crds-${RELEASE_VERSION}.tgz
 helm install dynamo-crds dynamo-crds-${RELEASE_VERSION}.tgz --namespace default
@@
-helm fetch https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
+helm pull https://helm.ngc.nvidia.com/nvidia/ai-dynamo/charts/dynamo-platform-${RELEASE_VERSION}.tgz
 helm install dynamo-platform dynamo-platform-${RELEASE_VERSION}.tgz --namespace ${NAMESPACE}
🧹 Nitpick comments (28)
.devcontainer/devcontainer.json (1)

10-10: Fix comment to match new image naming (no longer "local dev").

The comment says "latest VLLM local dev image" but the image is now "dynamo:latest-vllm". Update the comment for clarity.

-    "image": "dynamo:latest-vllm", // Use the latest VLLM local dev image
+    "image": "dynamo:latest-vllm", // Use the latest VLLM dev image
.devcontainer/README.md (1)

44-46: Update diagram to reflect new image tag.

Change dynamo:latest-vllm-local-devdynamo:latest-vllm.

-        IMAGE["Docker Image<br/>dynamo:latest-vllm-local-dev"]
+        IMAGE["Docker Image<br/>dynamo:latest-vllm"]
container/run.sh (5)

46-46: Avoid shadowing shell’s $USER; use a distinct variable name.

Using USER can conflict with the standard environment variable. Rename to CONTAINER_USER (or RUN_USER) to prevent surprises.

-USER=
+CONTAINER_USER=

128-135: Rename option storage and keep quoting to avoid word-splitting.

Store into CONTAINER_USER and preserve quoting.

-        --user)
-            if [ "$2" ]; then
-                USER="$2"
+        --user)
+            if [ "$2" ]; then
+                CONTAINER_USER="$2"
                 shift
             else
                 missing_requirement "$1"
             fi
             ;;

286-291: Build user flag safely and clearly.

Use the renamed var and quote it.

-    if [[ ${USER} == "" ]]; then
+    if [[ -z "${CONTAINER_USER}" ]]; then
         USER_STRING=""
     else
-        USER_STRING="--user ${USER}"
+        USER_STRING="--user ${CONTAINER_USER}"
     fi

328-329: Help text: clarify accepted formats.

Mention UID:GID and username to match PR description.

-    echo "  [--user override the user for running the container]"
+    echo "  [--user override the user (username or UID[:GID]) for running the container]"

239-265: HF cache mount may mismatch when running as non-root.

When --user is set to a non-root user, mounting to /root/.cache/huggingface can cause permission or path mismatches. Consider honoring HF_HOME (and setting it) or mounting to the effective user’s cache dir when --user is provided.

-        VOLUME_MOUNTS+=" -v $HF_CACHE:/root/.cache/huggingface"
+        # If a custom container user is requested, prefer HF_HOME under that user's home
+        if [ -n "${CONTAINER_USER}" ]; then
+            ENVIRONMENT_VARIABLES+=" -e HF_HOME=/home/ubuntu/.cache/huggingface"
+            VOLUME_MOUNTS+=" -v $HF_CACHE:/home/ubuntu/.cache/huggingface"
+        else
+            VOLUME_MOUNTS+=" -v $HF_CACHE:/root/.cache/huggingface"
+        fi
docs/benchmarks/pre_deployment_profiling.md (1)

7-9: Good clarity on time expectations.

Helpful note. Consider adding “duration varies by model size, GPU type, and TP search space” to set expectations.

 > **Time Investment**: This profiling process is comprehensive and typically takes **a few hours** to complete. The script systematically tests multiple tensor parallelism configurations and load conditions to find optimal performance settings. This upfront investment ensures your deployment meets SLA requirements and operates efficiently.
+> Actual duration varies by model size, GPU type, and the TP search space you enable.
components/backends/vllm/src/dynamo/vllm/handlers.py (6)

152-164: Prime can_prefill at init and tighten the guard to avoid race on first requests.

First requests within the initial 5s window may skip prefill even when workers exist. Also, be explicit about prefill_worker_client presence.

Apply:

@@ def __init__(...):
-        if self.prefill_worker_client is not None:
-            self._prefill_check_task = asyncio.create_task(self._prefill_check_loop())
+        if self.prefill_worker_client is not None:
+            try:
+                self.can_prefill = len(self.prefill_worker_client.instance_ids())
+            except Exception:
+                self.can_prefill = 0
+            self._prefill_check_task = asyncio.create_task(self._prefill_check_loop())
@@ async def generate(...):
-        if self.can_prefill:
+        if self.prefill_worker_client is not None and self.can_prefill > 0:
             # Create a copy for prefill with specific modifications

171-176: Avoid double-await; add a timeout to prevent hangs.

The current await anext(await round_robin(...)) is hard to read and lacks a timeout.

-            try:
-                prefill_response = await anext(
-                    await self.prefill_worker_client.round_robin(
-                        prefill_request, context=context
-                    )
-                )
+            try:
+                async with asyncio.timeout(5):
+                    rr_gen = await self.prefill_worker_client.round_robin(
+                        prefill_request, context=context
+                    )
+                    prefill_response = await anext(rr_gen)

177-184: Handle cancellation separately and avoid re-raising opaque Exception.

Treat asyncio.CancelledError explicitly, and consider mapping other exceptions to a structured error for upstreams.

-            except Exception as e:
-                # TODO: Cancellation does not propagate until the first token is received
-                if context.is_stopped() or context.is_killed():
-                    logger.debug(f"Aborted Remote Prefill Request ID: {request_id}")
-                    # TODO: Raise asyncio.CancelledError into bindings
-                    return
-                raise e
+            except asyncio.CancelledError:
+                logger.debug(f"Cancelled Remote Prefill Request ID: {request_id}")
+                raise
+            except Exception as e:
+                if context.is_stopped() or context.is_killed():
+                    logger.debug(f"Aborted Remote Prefill Request ID: {request_id}")
+                    return
+                logger.error(f"Remote prefill failed: {e}")
+                # Fallback to local decode without KV handoff

185-195: Guard against missing/empty KV transfer params.

If kv_transfer_params is absent, decoding may later assume its presence.

-            prefill_response = MyRequestOutput.model_validate_json(
-                prefill_response.data()
-            )
+            prefill_response = MyRequestOutput.model_validate_json(prefill_response.data())
@@
-            sampling_params.extra_args[
-                "kv_transfer_params"
-            ] = prefill_response.kv_transfer_params
+            kvp = getattr(prefill_response, "kv_transfer_params", None) or {}
+            if not kvp:
+                logger.warning(f"No kv_transfer_params from prefill; continuing without KV handoff. request_id={request_id}")
+            sampling_params.extra_args["kv_transfer_params"] = kvp

129-133: Cancel-and-drain background task to avoid task warnings on shutdown.

If the loop raises after cancel, the task may log unhandled exceptions. Drain it when possible.

     def cleanup(self):
         """Cancel background tasks."""
-        if self._prefill_check_task is not None:
-            self._prefill_check_task.cancel()
+        if self._prefill_check_task is not None:
+            self._prefill_check_task.cancel()
+            # Best-effort: schedule a drain without making cleanup async
+            async def _drain(t):
+                try:
+                    await t
+                except asyncio.CancelledError:
+                    pass
+                except Exception as e:
+                    logger.debug(f"Prefill check task closed with error: {e}")
+            try:
+                asyncio.get_running_loop().create_task(_drain(self._prefill_check_task))
+            except RuntimeError:
+                # No running loop; ignore
+                pass
         super().cleanup()

206-211: Deduplicate EngineDeadError handling (minor).

generate_tokens already catches EngineDeadError and exits the process. Consider centralizing handling to one place.

docs/guides/dynamo_deploy/metrics.md (1)

10-10: Link fix LGTM; minor typos in this doc.

  • Line 23: “podmnonitors” → “PodMonitors”.
  • Lines 75–76: sample text has typos (“ests”, “familt”); consider trimming or correcting to avoid distracting copy.
-# Values allow podmnonitors to be picked up that are outside of the kube-prometheus-stack helm release
+# Values allow PodMonitors to be picked up outside the kube-prometheus-stack Helm release
docs/guides/dynamo_deploy/dynamo_operator.md (1)

41-41: Tighten wording and fix minor grammar/link text in this page.

Also clean up Line 55 phrasing.

-- A Kubernetes cluster with [Dynamo Cloud](installation_guide.md) installed
+- A Kubernetes cluster with the [Dynamo Kubernetes Platform](installation_guide.md) installed
-First, follow to [See Install Dynamo Cloud](README.md).
+First, see [Install Dynamo Cloud](README.md).
deploy/cloud/helm/platform/README.md.gotmpl (1)

60-60: Prefer absolute URL so this link works on Artifact Hub and other renderers

Relative paths often break outside the repo context. Consider linking to GitHub (or docs site) directly.

- - [Dynamo Cloud Deployment Installation Guide](../../../../docs/guides/dynamo_deploy/installation_guide.md)
+ - [Dynamo Cloud Deployment Installation Guide](https://github.com/ai-dynamo/dynamo/blob/main/docs/guides/dynamo_deploy/installation_guide.md)
docs/guides/dynamo_deploy/gke_setup.md (1)

148-152: Clarify chart path/working directory for the helm command

Make the path explicit to reduce confusion about where to run the command from.

-You can use it during helm installation:
+From the repository root, you can use it during helm installation:
@@
-helm upgrade --install ${RELEASE} platform/ -f values.yaml --namespace ${NAMESPACE}
+helm upgrade --install ${RELEASE} deploy/cloud/helm/platform/ -f values.yaml --namespace ${NAMESPACE}
deploy/cloud/helm/platform/values.yaml (1)

135-137: Pin image tag or document chart-managed tag source.

You switched to repository: bitnamilegacy/etcd. If you intend to rely on the chart’s default tag, add a brief comment so operators know where the tag is controlled; otherwise explicitly pin to a version to prevent surprise upgrades.

 etdc:
   image:
-    repository: bitnamilegacy/etcd
+    repository: bitnamilegacy/etcd
+    # tag: 3.5.15-debian-12-rXX  # Optional: pin for reproducibility
deploy/cloud/helm/crds/README.md (1)

18-20: Link/path LGTM; add trailing newline for linters.

Content and relative link look correct. Add a newline at EOF to satisfy common markdown/style linters.

docs/guides/dynamo_deploy/README.md (2)

55-59: Namespace inconsistency: reuse the same NAMESPACE set above.

Earlier you set NAMESPACE=dynamo-kubernetes. Here it switches to dynamo-cloud, which can confuse readers.

-# Set same namespace from platform install
-export NAMESPACE=dynamo-cloud
+# Set same namespace from platform install
+export NAMESPACE=dynamo-kubernetes

107-134: Align dynamoNamespace values to a single namespace.

Mixing my-llm and dynamo-dev can cause misplacement of resources. Use one namespace placeholder consistently.

   services:
     Frontend:
-      dynamoNamespace: my-llm
+      dynamoNamespace: my-llm
@@
     VllmDecodeWorker:  # or SGLangDecodeWorker, TrtllmDecodeWorker
-      dynamoNamespace: dynamo-dev
+      dynamoNamespace: my-llm
deploy/cloud/helm/README.md (1)

18-23: README looks good; add newline at EOF.

Minor formatting nit to appease linters.

docs/guides/dynamo_deploy/installation_guide.md (1)

24-25: Minor grammar tweak.

“Platform is installed using … helm chart” → “The platform is installed using the … Helm chart.”

-Platform is installed using Dynamo Kubernetes Platform [helm chart](../../../deploy/cloud/helm/platform/README.md).
+The platform is installed using the Dynamo Kubernetes Platform [Helm chart](../../../deploy/cloud/helm/platform/README.md).
docs/guides/dynamo_deploy/api_reference.md (1)

1-20: De-duplicate the “auto-generated” warning (header.md already adds it)

Since deploy/cloud/operator/docs/header.md is prepended during the Makefile step, this file ends up with two consecutive auto-generation warnings. Keep it in header.md only to avoid redundancy.

Apply this diff to drop the duplicate notice here (the header will still render it):

->
-> **⚠️ Important**: This documentation is automatically generated from source code.
->
-> Do not edit this file directly.
-
deploy/cloud/operator/Makefile (1)

288-294: Harden pathing: ensure target docs directory exists and avoid duplicate warnings

  • The relative path ../../../docs/guides/dynamo_deploy/ looks correct from deploy/cloud/operator, but mkdir -p the target dir for robustness on clean clones.
  • Given header.md already provides an auto-gen warning, consider removing the duplicate lines from the final api_reference.md (see separate comment). The Makefile step is fine; just avoid committing the duplicate text in the generated file.

Apply this diff to make the target path creation explicit:

 	@echo "✅ Generated API reference at ./docs/api_reference.md"
 	# concatenate header.md and api_reference.md
+	@mkdir -p ../../../docs/guides/dynamo_deploy
 	cat docs/header.md ./docs/api_reference.md > ../../../docs/guides/dynamo_deploy/api_reference.md
 	rm ./docs/api_reference.md
 	@echo "✅ Concatenated header.md and api_reference.md"
examples/runtime/hello_world/hello_world.py (2)

31-36: Reduce cancel latency; avoid print; trim tokens.

Check cancel before and after sleep, use logger, and skip empty/whitespace tokens.

Apply:

-    for word in request.split(","):
-        await asyncio.sleep(1)
-        if context.is_stopped() or context.is_killed():
-            print("request got cancelled.")
-            return
-        yield f"Hello {word}!"
+    for raw in request.split(","):
+        # Early check to avoid ~1s latency on cancel
+        if context.is_stopped() or context.is_killed():
+            logger.info("Request cancelled: id=%s", context.id())
+            return
+        word = raw.strip()
+        if not word:
+            continue
+        await asyncio.sleep(1)
+        # Late check in case cancel occurred during sleep
+        if context.is_stopped() or context.is_killed():
+            logger.info("Request cancelled after sleep: id=%s", context.id())
+            return
+        yield f"Hello {word}!"

44-45: Harden against missing etcd client (static workers).

Avoid potential AttributeError if etcd_client() returns None.

-    lease_id = runtime.etcd_client().primary_lease_id()
+    etcd = runtime.etcd_client()
+    lease_id = etcd.primary_lease_id() if etcd else "N/A"
📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

💡 Knowledge Base configuration:

  • MCP integration is disabled by default for public repositories
  • Jira integration is disabled by default for public repositories
  • Linear integration is disabled by default for public repositories

You can enable these sources in your CodeRabbit configuration.

📥 Commits

Reviewing files that changed from the base of the PR and between c403d18 and fac71cc.

📒 Files selected for processing (35)
  • .devcontainer/README.md (1 hunks)
  • .devcontainer/devcontainer.json (1 hunks)
  • benchmarks/profiler/README.md (1 hunks)
  • components/backends/vllm/src/dynamo/vllm/handlers.py (2 hunks)
  • container/Dockerfile.vllm (1 hunks)
  • container/build.sh (1 hunks)
  • container/run.sh (5 hunks)
  • deploy/README.md (0 hunks)
  • deploy/README.md (1 hunks)
  • deploy/cloud/helm/README.md (1 hunks)
  • deploy/cloud/helm/crds/Chart.yaml (1 hunks)
  • deploy/cloud/helm/crds/README.md (1 hunks)
  • deploy/cloud/helm/deploy.sh (0 hunks)
  • deploy/cloud/helm/dynamo-platform-values.yaml (0 hunks)
  • deploy/cloud/helm/network-config-wizard.sh (0 hunks)
  • deploy/cloud/helm/platform/README.md (1 hunks)
  • deploy/cloud/helm/platform/README.md.gotmpl (1 hunks)
  • deploy/cloud/helm/platform/values.yaml (1 hunks)
  • deploy/cloud/operator/Makefile (1 hunks)
  • deploy/cloud/operator/README.md (1 hunks)
  • deploy/cloud/operator/docs/header.md (1 hunks)
  • deploy/helm/chart/Chart.yaml (1 hunks)
  • docs/architecture/sla_planner.md (1 hunks)
  • docs/benchmarks/pre_deployment_profiling.md (1 hunks)
  • docs/guides/dynamo_deploy/README.md (3 hunks)
  • docs/guides/dynamo_deploy/api_reference.md (1 hunks)
  • docs/guides/dynamo_deploy/dynamo_operator.md (2 hunks)
  • docs/guides/dynamo_deploy/gke_setup.md (1 hunks)
  • docs/guides/dynamo_deploy/grove.md (1 hunks)
  • docs/guides/dynamo_deploy/installation_guide.md (4 hunks)
  • docs/guides/dynamo_deploy/metrics.md (1 hunks)
  • docs/guides/dynamo_deploy/minikube.md (1 hunks)
  • docs/guides/dynamo_deploy/sla_planner_deployment.md (1 hunks)
  • docs/index.rst (1 hunks)
  • examples/runtime/hello_world/hello_world.py (1 hunks)
💤 Files with no reviewable changes (3)
  • deploy/cloud/helm/deploy.sh
  • deploy/cloud/helm/dynamo-platform-values.yaml
  • deploy/cloud/helm/network-config-wizard.sh
🧰 Additional context used
🧠 Learnings (8)
📓 Common learnings
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2822
File: container/Dockerfile.vllm:343-352
Timestamp: 2025-09-03T01:10:12.599Z
Learning: In the dynamo project's local-dev Docker targets, USER_UID and USER_GID build args are intentionally left without default values to force explicit UID/GID mapping during build time, preventing file permission issues in local development environments where container users need to match host user permissions for mounted volumes.
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#1474
File: deploy/cloud/operator/internal/controller/dynamocomponent_controller.go:1302-1306
Timestamp: 2025-06-11T21:18:00.425Z
Learning: In the Dynamo operator, the project’s preferred security posture is to set a Pod-level `PodSecurityContext` with `runAsUser`, `runAsGroup`, and `fsGroup` all set to `1000`, and then selectively override the user at the individual container level (e.g., `RunAsUser: 0` for Kaniko) when root is required.
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project, devcontainer.json files use templated container names (like "dynamo-vllm-devcontainer") that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .devcontainer/devcontainer.json
  • .devcontainer/README.md
📚 Learning: 2025-08-30T20:43:10.091Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: .devcontainer/devcontainer.json:12-12
Timestamp: 2025-08-30T20:43:10.091Z
Learning: In the dynamo project's devcontainer setup, hard-coded container names in devcontainer.json files serve as templates that are automatically processed by the copy_devcontainer.sh script to generate framework-specific configurations with unique names, preventing container name collisions.

Applied to files:

  • .devcontainer/devcontainer.json
  • .devcontainer/README.md
📚 Learning: 2025-08-30T20:43:49.632Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2797
File: container/Dockerfile:437-449
Timestamp: 2025-08-30T20:43:49.632Z
Learning: In the dynamo project's devcontainer setup, the team prioritizes consistency across framework-specific Dockerfiles (like container/Dockerfile, container/Dockerfile.vllm, etc.) by mirroring their structure, even when individual optimizations might be possible, to maintain uniformity in the development environment setup.

Applied to files:

  • .devcontainer/devcontainer.json
  • .devcontainer/README.md
  • container/Dockerfile.vllm
📚 Learning: 2025-09-03T01:10:12.599Z
Learnt from: keivenchang
PR: ai-dynamo/dynamo#2822
File: container/Dockerfile.vllm:343-352
Timestamp: 2025-09-03T01:10:12.599Z
Learning: In the dynamo project's local-dev Docker targets, USER_UID and USER_GID build args are intentionally left without default values to force explicit UID/GID mapping during build time, preventing file permission issues in local development environments where container users need to match host user permissions for mounted volumes.

Applied to files:

  • .devcontainer/devcontainer.json
  • container/build.sh
  • .devcontainer/README.md
📚 Learning: 2025-07-18T16:04:31.771Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2012
File: deploy/cloud/helm/crds/templates/nvidia.com_dynamocomponentdeployments.yaml:92-98
Timestamp: 2025-07-18T16:04:31.771Z
Learning: CRD schemas in files like deploy/cloud/helm/crds/templates/*.yaml are auto-generated from Kubernetes library upgrades and should not be manually modified as changes would be overwritten during regeneration.

Applied to files:

  • deploy/cloud/helm/crds/README.md
  • deploy/cloud/helm/crds/Chart.yaml
📚 Learning: 2025-09-03T21:02:02.170Z
Learnt from: julienmancuso
PR: ai-dynamo/dynamo#2843
File: deploy/cloud/helm/platform/values.yaml:0-0
Timestamp: 2025-09-03T21:02:02.170Z
Learning: With etcd chart version 12.0.18, the global.security.allowInsecureImages setting is not required when using the bitnamilegacy repository, unlike older chart versions.

Applied to files:

  • deploy/cloud/helm/platform/values.yaml
📚 Learning: 2025-09-04T19:03:06.595Z
Learnt from: biswapanda
PR: ai-dynamo/dynamo#2872
File: examples/multimodal/deploy/agg_qwen.yaml:53-60
Timestamp: 2025-09-04T19:03:06.595Z
Learning: In the dynamo repository, Kubernetes Custom Resources use `gpu: "1"` format for GPU resource limits and requests, not the standard Kubernetes `nvidia.com/gpu: 1` format. This applies to DynamoGraphDeployment resources and other dynamo CRs.

Applied to files:

  • docs/guides/dynamo_deploy/README.md
🧬 Code graph analysis (2)
examples/runtime/hello_world/hello_world.py (2)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • DistributedRuntime (31-54)
lib/bindings/python/src/dynamo/runtime/__init__.py (2)
  • dynamo_endpoint (65-97)
  • dynamo_worker (36-62)
container/run.sh (1)
container/build.sh (1)
  • missing_requirement (435-437)
🪛 LanguageTool
docs/guides/dynamo_deploy/dynamo_operator.md

[grammar] ~41-~41: There might be a mistake here.
Context: ... Cloud](installation_guide.md) installed - [FluxCD](https://fluxcd.io/flux/installat...

(QB_NEW_EN)

deploy/cloud/helm/platform/README.md

[grammar] ~104-~104: There might be a mistake here.
Context: ...namo Cloud Deployment Installation Guide](../../../../docs/guides/dynamo_deploy/installation_guide.md) - [NATS Documentation](https://docs.nats.io...

(QB_NEW_EN)

docs/guides/dynamo_deploy/README.md

[grammar] ~45-~45: There might be a mistake here.
Context: ... | Backend | Available Configurations | |---------|--------------------------| |...

(QB_NEW_EN)


[grammar] ~46-~46: There might be a mistake here.
Context: ...| |---------|--------------------------| | **[vLLM](/components/backends/vllm/dep...

(QB_NEW_EN)


[grammar] ~47-~47: There might be a mistake here.
Context: ...ated + Router, Disaggregated + Planner | | **[SGLang](/components/backends/sglang...

(QB_NEW_EN)


[grammar] ~48-~48: There might be a mistake here.
Context: ...ed + Planner, Disaggregated Multi-node | | **[TensorRT-LLM](/components/backends/...

(QB_NEW_EN)


[grammar] ~82-~82: There might be a mistake here.
Context: ...cations for DynamoGraphDeployment and DynamoComponentDeployment - **[Operator Guide](/docs/guides/dynamo_depl...

(QB_NEW_EN)


[grammar] ~83-~83: There might be a mistake here.
Context: ...mo operator configuration and management - **[Create Deployment](/docs/guides/dynamo_d...

(QB_NEW_EN)

docs/guides/dynamo_deploy/installation_guide.md

[grammar] ~24-~24: There might be a mistake here.
Context: ...uick Start Paths Platform is installed using Dynamo Kubernetes Platform [helm chart]...

(QB_NEW_EN)

🔇 Additional comments (19)
container/Dockerfile.vllm (1)

297-297: Stage rename LGTM.

Switching local-devdev aligns with the rest of the PR and keeps target naming consistent.

.devcontainer/README.md (1)

94-98: Docs update LGTM, but ensure consistency across the page.

Step 1 now references --target dev, which is correct.

container/run.sh (1)

366-366: Docker run integration LGTM.

${USER_STRING} is correctly injected and remains empty when not set.

benchmarks/profiler/README.md (1)

1-1: Documentation link verified
Confirmed that docs/benchmarks/pre_deployment_profiling.md exists in-tree.

docs/architecture/sla_planner.md (1)

31-34: Clear prerequisite statement and cross-link.

LGTM. This aligns with operational reality of SLA planner depending on profiled artifacts.

docs/guides/dynamo_deploy/sla_planner_deployment.md (1)

26-26: Prereq phrasing is precise.

Good to specify that results live in dynamo-pvc.

components/backends/vllm/src/dynamo/vllm/handlers.py (2)

119-121: LGTM: explicit reset when no prefill client.

Setting can_prefill = 0 in the else branch avoids stale values. No further action.


1-10: Remove scope-mismatch comment for handlers.py
File components/backends/vllm/src/dynamo/vllm/handlers.py isn’t modified in this PR; only container/run.sh was changed.

Likely an incorrect or invalid review comment.

deploy/cloud/helm/crds/Chart.yaml (1)

19-19: CRDs chart version bump looks good.

No issues spotted. Ensure release notes reference 0.5.0 for CRD changes.

deploy/helm/chart/Chart.yaml (1)

20-21: Update all lingering 0.4.1 references in documentation to 0.5.0

  • docs/support_matrix.md (lines 63, 71): bump NIXL entries from 0.4.1 to 0.5.0
  • docs/benchmarks/pre_deployment_profiling.md (line 109): update DOCKER_IMAGE tag to vllm-runtime:0.5.0
  • docs/_includes/quick_start_local.rst (lines 13, 20): change ai-dynamo[sglang]==0.4.1 and GitHub release URL from release/0.4.1 to 0.5.0
  • docs/_includes/install.rst (lines 13, 44): update pip install extras and container image tags from 0.4.1 to 0.5.0
⛔ Skipped due to learnings
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.
docs/guides/dynamo_deploy/dynamo_operator.md (1)

32-32: Updated installation link reads well.

No issues.

deploy/cloud/helm/platform/README.md (1)

104-104: Readme sync with template looks good

Generated README matches the gotmpl change. No action needed.

docs/guides/dynamo_deploy/installation_guide.md (1)

107-114: Use NVCR as the registry server
Replace the existing --docker-server=${DOCKER_SERVER} line in the “create secret” block with:

  --docker-server=nvcr.io \

(or ensure you export DOCKER_SERVER=nvcr.io before running) so the pull secret explicitly targets NVCR.

Likely an incorrect or invalid review comment.

docs/guides/dynamo_deploy/grove.md (1)

100-100: LGTM: link retargeted correctly to installation_guide.md

The relative path resolves within the same directory; anchor text is clear.

docs/guides/dynamo_deploy/minikube.md (1)

61-61: LGTM: Next Steps link now points to the installation guide

Good consistency with other doc updates.

deploy/cloud/operator/docs/header.md (1)

18-19: LGTM: concise auto-generation notice

Appropriate place to host the warning that will prefix the generated docs.

deploy/cloud/operator/Makefile (1)

280-294: Core PR objective appears missing: container/run.sh (--user) not in diff

The PR description/reviewer guidance call out container/run.sh changes (lines 128–134, 366), but the provided changes only cover docs and Makefile. Please include the run.sh modifications so we can review argument parsing and docker run integration.

examples/runtime/hello_world/hello_world.py (2)

1-60: Confirm scope: example changes unrelated to container/run.sh --user.

The hello_world streaming updates seem orthogonal to the Docker run permission fix. Confirm they’re intended in this PR; otherwise consider a follow-up PR to keep scope tight.


21-21: Confirm Context bindings. The _core.pyi stub doesn’t list a Context class or its methods—please verify that the Python dynamo.runtime.Context (from dynamo._core) actually implements id(), is_stopped(), and is_killed().

Comment on lines 96 to 101
export NAMESPACE=dynamo-cloud
export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/ # or your registry
export DOCKER_USERNAME='$oauthtoken'
export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
export IMAGE_TAG=0.5.0
export IMAGE_TAG=${RELEASE_VERSION}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue

Fix registry variables to avoid auth mismatch and double slashes.

Docker registry secrets expect the server host (e.g., nvcr.io), not a path. Also, a trailing slash in DOCKER_SERVER causes a double slash in the image repository.

- export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/  # or your registry
+ export DOCKER_SERVER=nvcr.io                    # registry server host only
+ export DOCKER_NAMESPACE=nvidia/ai-dynamo        # image namespace/org
@@
-export IMAGE_TAG=${RELEASE_VERSION}
+export IMAGE_TAG=${RELEASE_VERSION}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
export NAMESPACE=dynamo-cloud
export DOCKER_SERVER=nvcr.io/nvidia/ai-dynamo/ # or your registry
export DOCKER_USERNAME='$oauthtoken'
export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
export IMAGE_TAG=0.5.0
export IMAGE_TAG=${RELEASE_VERSION}
export NAMESPACE=dynamo-cloud
export DOCKER_SERVER=nvcr.io # registry server host only
export DOCKER_NAMESPACE=nvidia/ai-dynamo # image namespace/org
export DOCKER_USERNAME='$oauthtoken'
export DOCKER_PASSWORD=<YOUR_NGC_CLI_API_KEY>
export IMAGE_TAG=${RELEASE_VERSION}
🤖 Prompt for AI Agents
In docs/guides/dynamo_deploy/installation_guide.md around lines 96-101, the
DOCKER_SERVER variable is set to a path with a trailing slash which is incorrect
for Docker registry auth and can produce double slashes in image names; update
DOCKER_SERVER to the registry host only (e.g., nvcr.io) and move the repository
path into a separate variable (e.g., DOCKER_REPOSITORY=nvcr.io/nvidia/ai-dynamo
or DOCKER_PATH=nvidia/ai-dynamo) or ensure image references are constructed
without adding an extra slash so image push/auth use the host for credentials
and the full repository path for image names.

@keivenchang keivenchang force-pushed the keivenchang/v0.5.0__BUG-5501463__fix-Dockerfile-run-permission-problems branch from fac71cc to 020b891 Compare September 4, 2025 19:49
@keivenchang keivenchang changed the title fix bug 5501463 Dockerfile run permission problems fix: bug 5501463 Dockerfile run permission problems Sep 4, 2025
@github-actions github-actions bot added the fix label Sep 4, 2025
@keivenchang keivenchang changed the base branch from main to release/0.5.0 September 4, 2025 19:52
@pull-request-size pull-request-size bot added size/S and removed size/XL labels Sep 4, 2025
@keivenchang
Copy link
Contributor Author

This can't be merged to main as it bumps the version of some helm charts which are not publicly available until after 0.5.0 is released. The only changes needed are in fac71cc. This is the only commit that needs to go to main. Then cherry-pick the commit from main into release/0.5.0

That was a mistake, I've set it to merge to release/0.5.0, is that ok?

@nv-tusharma
Copy link
Contributor

@saturley-hall Can you please clarify your comment from above? I don't see helm charts updated in this PR

@saturley-hall
Copy link
Member

@saturley-hall Can you please clarify your comment from above? I don't see helm charts updated in this PR

This was originally targeted at main he has since retargeted it at release/0.5.0 so it is no longer showing it as a diff.

@saturley-hall
Copy link
Member

This can't be merged to main as it bumps the version of some helm charts which are not publicly available until after 0.5.0 is released. The only changes needed are in fac71cc. This is the only commit that needs to go to main. Then cherry-pick the commit from main into release/0.5.0

That was a mistake, I've set it to merge to release/0.5.0, is that ok?

Out of a desire for expedience yes. Please ensure that is also in main

saturley-hall and others added 2 commits September 4, 2025 21:00
Signed-off-by: Harrison King Saturley-Hall <hsaturleyhal@nvidia.com>
Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
@keivenchang keivenchang changed the base branch from release/0.5.0 to main September 4, 2025 21:01
@pull-request-size pull-request-size bot added size/XL and removed size/S labels Sep 4, 2025
@keivenchang keivenchang force-pushed the keivenchang/v0.5.0__BUG-5501463__fix-Dockerfile-run-permission-problems branch from 020b891 to ae5e0be Compare September 4, 2025 21:02
@pull-request-size pull-request-size bot added size/S and removed size/XL labels Sep 4, 2025
@keivenchang keivenchang closed this Sep 4, 2025
@keivenchang
Copy link
Contributor Author

Messy PR, removing.

@keivenchang keivenchang deleted the keivenchang/v0.5.0__BUG-5501463__fix-Dockerfile-run-permission-problems branch September 26, 2025 22:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants