Conversation
@biswapanda biswapanda commented Aug 22, 2025

Overview:

Cherry-pick: #2625

Prevent crashloops in helloworld k8s example

nvbug: https://nvbugspro.nvidia.com/bug/5471412
closes: DYN-926

Summary by CodeRabbit

  • New Features
    • Added TensorRT-LLM Kubernetes deployment templates (agg, router, disagg, disagg-router).
    • Operator now supports Grove termination delay and auto-detects Grove availability.
    • New multi-stage Dockerfile and build flag to enable KVBM; updated images and runtime tooling.
  • Documentation
    • Major docs revamp: new backend guides (vLLM, SGLang, TRT-LLM), Kubernetes deploy guides, multinode examples, Quickstart/Helm updates, and Support Matrix.
  • Bug Fixes
    • More robust config handling and numerous link/path corrections across docs and scripts.
  • Chores
    • Version bump to 0.4.0+post0.
    • Expanded third-party license attributions.

alec-flowers and others added 30 commits July 30, 2025 22:07
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Ishan Dhanani <idhanani@nvidia.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
…link (#2239)

Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Anish <80174047+athreesh@users.noreply.github.com>
…2260)

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
…#2319)

Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>

copy-pr-bot bot commented Aug 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Aug 22, 2025

Walkthrough

This PR adds extensive documentation, deployment manifests, and container build updates across backends (vLLM, SGLang, TensorRT-LLM), introduces ETCD-backed port allocation for vLLM, refactors Grove settings in the operator/Helm, updates TRT-LLM engine configs, and revises numerous docs/links. Minor code changes include safer dict access in TRT-LLM and operator wiring for Grove detection/termination delay.
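The "safer dict access" change mentioned above can be illustrated with a short sketch. This is not the actual `trtllm/main.py` code (the function and key names here are illustrative); it only shows why a key-presence check is preferred over a truthiness check when explicit falsy values like `0` must be preserved:

```python
def resolve_event_buffer(kv_cache_config: dict, default: int = 1024) -> int:
    """Illustrative sketch: pick an event buffer size from a config dict.

    A truthiness check (`if kv_cache_config.get("event_buffer_max_size"):`)
    would silently replace an explicit 0 with the default. Checking key
    presence respects any explicitly set value, including falsy ones.
    """
    if "event_buffer_max_size" in kv_cache_config:
        return kv_cache_config["event_buffer_max_size"]
    return default
```

For example, `resolve_event_buffer({"event_buffer_max_size": 0})` returns `0` (events disabled by the user), while `resolve_event_buffer({})` falls back to the default.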

Changes

Cohort / File(s) Summary
Licenses
ATTRIBUTIONS-Go.md
Appends many third‑party license attributions (MIT/Apache/BSD/ISC); documentation only.
Versioning
Cargo.toml
Bumps workspace version to 0.4.0+post0.
Top-level README and Docs Restructure
README.md, docs/index.rst, docs/hidden_toctree.rst, docs/guides/dynamo_deploy/*, docs/support_matrix.md, docs/examples/*, docs/runtime/README.md, docs/architecture/*, deploy/helm/README.md, deploy/cloud/README.md
Reorganizes navigation, adds support matrix link, updates guides/paths, quickstart version bump, new Helm guide, link fixes, adds Hello World example README.
SGLang Docs & Manifests
components/backends/sglang/README.md, .../deploy/README.md, .../deploy/disagg.yaml, .../docs/*, .../launch/agg_router.sh, .../slurm_jobs/*
Path/link fixes, Quick Start/containers, multi-node/WideEP docs, decode worker entrypoint update, Docker-compose/run examples, SLURM docs and template/env tweaks.
SGLang Containers
container/Dockerfile.sglang, container/Dockerfile.sglang-wideep
Update NIXL refs, add jq/curl, adjust PYTHONPATH/removals, add flashinfer pre-release and requirements installs; non-editable install in wideep.
TRT-LLM K8s Deploy & Docs
components/backends/trtllm/README.md, .../deploy/*.yaml, docs/components/backends/trtllm/*
Adds agg/agg_router/disagg/disagg_router manifests, new README and deploy guide, KV transfer doc link tweak.
TRT-LLM Engine Configs
components/backends/trtllm/engine_configs/**/*
Normalize CUDA graph config under cuda_graph_config, move kv_cache_dtype to kv_cache_config.dtype, add cache_transceiver_config, moe_config restructure, value tweaks (e.g., free_gpu_memory_fraction).
TRT-LLM Runtime Code
components/backends/trtllm/src/dynamo/trtllm/main.py
Change kv_cache_config event buffer check to dict-key presence; preserves falsy explicit values.
TRT-LLM Container
container/Dockerfile.tensorrt_llm
Update NIXL ref, runtime env (ARCH_ALT, LD_LIBRARY_PATH/PATH), add jq, wheelhouse path changes, pin Triton 3.3.1 on amd64, add env script and CI artifacts.
vLLM Runtime – Ports
components/backends/vllm/src/dynamo/vllm/ports.py, .../args.py, docs/components/backends/vllm/*, components/backends/vllm/README.md
Introduces ETCD-backed port allocation (single/block), host IP util, CLI port range, removes local alloc funcs; updates docs and README.
vLLM Container & Install
container/Dockerfile.vllm, container/deps/vllm/install_vllm.sh
Add ENABLE_KVBM build-arg, pin DeepGEMM/NIXL, switch install modes, add jq; vLLM/FlashInfer install flow updates and Torch pin for amd64.
Generic Container Infra
container/Dockerfile (new), container/Dockerfile.none (removed), container/build.sh
New multi-stage base/wheel/dev Dockerfile; drop old NONE Dockerfile; add ENABLE_KVBM flag; update defaults (TRT-LLM, NIXL, base images), paths, and build args.
Operator & Helm – Grove Refactor
deploy/cloud/operator/cmd/main.go, .../internal/controller_common/*, .../internal/dynamo/*, .../helm/*
Replace enableGrove with GroveConfig{Enabled, TerminationDelay}, auto-detect Grove via discovery, propagate terminationDelay to PodGangSet, adjust Helm values/templates/RBAC and deploy script.
Inference Gateway Docs
deploy/inference-gateway/README.md
Adds explicit model deploy step and GAIE chart install instructions; quickstart link fix.
Minor Doc Fixes/Removals
benchmarks/llm/README.md, components/README.md, components/backends/llama_cpp/README.md, docs/API/nixl_connect/*, docs/components/backends/llm/README.md, docs/guides/dynamo_deploy/operator_deployment.md, docs/architecture/kv_cache_routing.md, examples/basics/disaggregated_serving/README.md
Link/path corrections, small wording changes, remove obsolete pages/entries.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Worker as vLLM Worker
  participant Args as CLI Args Parser
  participant Ports as Ports Module
  participant ETCD as Etcd KV
  participant OS as Host OS

  Args->>Worker: parse(--port-min/--port-max)
  Worker->>Ports: configure_ports_with_etcd(port_range, etcd_ctx, tp/dp)
  alt KV events enabled
    Ports->>ETCD: reserve kv_port (metadata)
  end
  Ports->>OS: check_port_available()
  loop allocate block (tp or 1)
    Ports->>ETCD: reserve contiguous block (metadata with block_info)
    ETCD-->>Ports: success/fail
  end
  Ports-->>Worker: kv_port, side_channel_port(s)
  Worker->>OS: set env (host_ip, ports)
```
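The port-allocation flow in the diagram above can be sketched in Python. This is a simplified stand-in, not the actual `ports.py` implementation: `FakeKV` imitates etcd's atomic create-if-absent transaction with an in-memory dict, and all names (`reserve_block`, the `ports/` key prefix) are illustrative:

```python
import socket

class FakeKV:
    """In-memory stand-in for an etcd client; create() succeeds only if the
    key does not already exist, imitating an atomic etcd transaction."""
    def __init__(self):
        self._store = {}

    def create(self, key: str, value: str) -> bool:
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

def port_is_free(port: int) -> bool:
    # Best-effort local check: try to bind the port briefly.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False

def reserve_block(kv: FakeKV, worker: str, start: int, end: int, size: int):
    """Reserve `size` contiguous ports in [start, end) via the KV store."""
    for base in range(start, end - size + 1):
        block = list(range(base, base + size))
        if not all(port_is_free(p) for p in block):
            continue
        claimed = []
        for p in block:
            if kv.create(f"ports/{p}", worker):
                claimed.append(p)
            else:
                # Another worker won the race; roll back and try the next base.
                for q in claimed:
                    kv.delete(f"ports/{q}")
                break
        else:
            return block
    raise RuntimeError("no contiguous port block available")
```

Two workers sharing the same range get disjoint blocks because the KV reservation, not the local bind check, is the source of truth.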
```mermaid
sequenceDiagram
  autonumber
  participant Main as Operator Main
  participant Disc as K8s Discovery
  participant Cfg as GroveConfig
  participant Reconciler as Controller
  participant Grove as Grove PodGangSet

  Main->>Cfg: set TerminationDelay (flag/default)
  Main->>Disc: DetectGroveAvailability()
  Disc-->>Main: true/false
  Main->>Cfg: Enabled = result
  alt Grove Enabled
    Reconciler->>Grove: Reconcile with TerminationDelay
  else
    Reconciler->>Reconciler: Reconcile Dynamo components only
  end
```
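The `GroveConfig` refactor shown above (a struct bundling enablement with a termination delay, where enablement comes from cluster discovery rather than a hand-set flag) can be sketched as follows. The real type is Go in the operator; this Python sketch only mirrors the shape, and the `grove.io` API group name is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class GroveConfig:
    """Illustrative mirror of the operator's GroveConfig: replaces a lone
    enableGrove boolean with enablement plus a termination delay."""
    enabled: bool = False
    termination_delay_seconds: int = 0

def detect_grove(available_api_groups: set) -> bool:
    # Stand-in for discovery-based detection: Grove is considered available
    # when its API group is served by the cluster. The group name here is
    # an assumption, not taken from the PR.
    return "grove.io" in available_api_groups

def build_config(available_api_groups: set, delay: int) -> GroveConfig:
    return GroveConfig(
        enabled=detect_grove(available_api_groups),
        termination_delay_seconds=delay,
    )
```

The benefit over a bare flag: the reconciler receives one value object, so a Grove-enabled path always has a termination delay to propagate to the PodGangSet, and clusters without Grove degrade gracefully.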

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I tap my paw on ports that glow,
ETCD whispers, “Here you go.”
Grove trees sway, a timed goodnight,
WideEP moons sail through the night.
Wheels are baked, configs sing,
Frontends hop—request on wing.
Ship it, ship it—thump-thump-spring! 🐇🚀

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 33

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
components/backends/sglang/launch/agg_router.sh (1)

14-20: Fail-fast on namespace clear and interpreter consistency for frontend.

  • The retrieved learning explicitly states clear_namespace must block router startup if it fails. Currently, the script proceeds even if the clear step exits non‑zero (no set -e and no &&). Also, the frontend is launched with python while the rest uses python3. Aligning interpreters prevents PATH/env drift issues.

Apply this diff to enforce fail-fast and interpreter consistency:

@@
-# run clear_namespace
-python3 -m dynamo.sglang.utils.clear_namespace --namespace dynamo
+# run clear_namespace (must succeed before starting the frontend)
+python3 -m dynamo.sglang.utils.clear_namespace --namespace dynamo || {
+  echo "[ERROR] clear_namespace failed; aborting startup." >&2
+  exit 1
+}
@@
-# run ingress
-python -m dynamo.frontend --router-mode kv --http-port=8000 &
+# run ingress
+python3 -m dynamo.frontend --router-mode kv --http-port=8000 &
container/Dockerfile.sglang-wideep (1)

108-114: Supply-chain risk: rustup-init downloaded without integrity verification.

An executable is fetched and run without checksum/signature validation. This is a critical supply-chain risk.

Apply this diff to add SHA256 verification (Rust provides .sha256 files per artifact):

 ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
@@
 RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    # TODO: Add SHA check back based on RUSTARCH
+    wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init.sha256" && \
+    # Verify checksum before executing
+    echo "$(cat rustup-init.sha256)  rustup-init" | sha256sum -c - && \
     chmod +x rustup-init && \
     ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
+    rm rustup-init rustup-init.sha256 && \
     chmod -R a+w $RUSTUP_HOME $CARGO_HOME
deploy/inference-gateway/README.md (3)

41-45: Typo: “Inferenece” → “Inference”.

User-facing typo in the CRD description.

-b. Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs)
+b. Install the Inference Extension CRDs (InferenceModel and InferencePool CRDs)

84-91: Inconsistency: referenced config file vs. helm -f value.

Text mentions inference-gateway-resources.yaml but the helm command uses vllm_agg_qwen.yaml. Align the filename or clarify roles (values vs. manifest).

-The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
+The Inference Gateway is configured via a values file (e.g., `vllm_agg_qwen.yaml`) passed to the Helm chart.
@@
-helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml
+helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml

If there is indeed a standalone manifest named inference-gateway-resources.yaml, either rename this reference or add a note explaining how it differs from the values file.


137-146: Fix typos and broken kubectl command when populating GATEWAY_URL.

  • “User” → “Use”; “alternateive” → “alternative”.
  • kubectl uses two -o flags; only the last one applies. This breaks the command.
- a. User minikube tunnel to expose the gateway to the host
-    This requires `sudo` access to the host machine. alternatively, you can use port-forward to expose the gateway to the host as shown in alternateive (b).
+ a. Use minikube tunnel to expose the gateway to the host
+    This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host as shown in alternative (b).
@@
-GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o yaml -o jsonpath='{.spec.clusterIP}')
+GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')
ATTRIBUTIONS-Go.md (1)

10381-10826: Update SPDX identifiers and add code fence languages

I ran the provided verification script and found the following issues:

  • Non-SPDX identifiers (must be updated to “Apache-2.0”):
    • Line 10411 (sigs.k8s.io/randfill)
    • Line 10620 (github.com/NVIDIA/grove/operator/api)

  • Fenced code blocks without a language (MD040):
There are numerous occurrences of bare fences throughout ATTRIBUTIONS-Go.md. All license blocks should use a language tag (e.g. `text`) for consistency and to satisfy markdown linters.

The license bodies themselves (MIT for blang/semver/v4 and Apache 2.0 for the other two) match the standard upstream texts and require no content changes, but please manually confirm they exactly match the source repositories.

Next steps:

  • Replace
    License Identifier: Apache 2.0
    with
    License Identifier: Apache-2.0
    in both newly-added Apache blocks.

  • Update all fences around license texts to use the `text` language tag.

  • (Optional) Enhance the generator to:

    • Emit SPDX-normalized identifiers (MIT, Apache-2.0, etc.).
    • Always include a language on code fences (e.g. ```text).
    • Fail the generation step if any license text retrieval fails.
components/backends/sglang/README.md (1)

119-121: Typo: “conjuction” → “conjunction”.

-Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead, we first route to a random prefill worker, select a random decode worker, and then send the request to both. Internally, SGLang's bootstrap server (which is a part of the `tokenizer_manager`) is used in conjuction with NIXL to handle the kv transfer.
+Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead, we first route to a random prefill worker, select a random decode worker, and then send the request to both. Internally, SGLang's bootstrap server (which is a part of the `tokenizer_manager`) is used in conjunction with NIXL to handle the KV transfer.
container/Dockerfile.tensorrt_llm (2)

129-139: Supply-chain hardening: add checksum/signature verification for downloaded artifacts.

wget/curl downloads for NATS, etcd, and rustup-init occur without integrity checks.

  • Fetch and verify SHA256 (or GPG signature) for each artifact.
  • Example for etcd:
-RUN wget https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+RUN set -euo pipefail; \
+    wget -q https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+    wget -q https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/sha256sum.txt -O /tmp/etcd.sha256 && \
+    (cd /tmp && grep "etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz" etcd.sha256 | sha256sum -c -) && \
     mkdir -p /usr/local/bin/etcd && \
     tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
     rm /tmp/etcd.tar.gz

Do similarly for NATS and rustup.

Also applies to: 205-213, 263-268


145-175: Ensure all wheel references point to /workspace/wheelhouse/ instead of legacy paths

The repository still contains numerous references to /workspace/dist and /workspace/wheels/nixl that must be updated now that artifacts are staged under /workspace/wheelhouse/. Please update downstream CI/job scripts, Dockerfiles, and the Earthfile to use the new location:

• container/Dockerfile.vllm
– Lines 178–185: uv build . --out-dir /workspace/wheels/nixl and uv pip install /workspace/wheels/nixl/*.whl
– Lines 370–378: uv build --wheel --out-dir /workspace/dist …
– Lines 409–410: uv pip install /workspace/dist/…
– Line 474: COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/ (should use /workspace/wheelhouse/)

• container/Dockerfile.tensorrt_llm
– Lines 234–241: uv build . --out-dir /workspace/wheels/nixl and uv pip install /workspace/wheels/nixl/*.whl
– Lines 301–307: uv build --wheel --out-dir /workspace/dist …
– Lines 332–333: uv pip install /workspace/dist/…
– Lines 497–498: COPY --from=dev /workspace/wheels/nixl/*.whl /workspace/wheelhouse/ and COPY --from=wheel_builder /workspace/dist/*.whl /workspace/wheelhouse/

• container/Dockerfile.sglang
– Lines 161–168: uv build . --out-dir /workspace/wheels/nixl and uv pip install /workspace/wheels/nixl/*.whl
– Lines 338–343: uv build --wheel --out-dir /workspace/dist …
– Lines 373–374: uv pip install /workspace/dist/…
– Lines 449–450: COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/ and COPY --from=base /workspace/wheels/nixl/*.whl wheelhouse/

• Earthfile
– Lines 119–121: uv build --wheel --out-dir /workspace/dist …
– Lines 124–126: SAVE ARTIFACT /workspace/dist/*.whl

Please refactor these paths to /workspace/wheelhouse/ so that all wheel-producing and consuming steps are consistent.

docs/components/backends/trtllm/kv-cache-tranfer.md (1)

1-1: Fix filename typo to avoid broken links: “kv-cache-tranfer.md” ➜ “kv-cache-transfer.md”

“tranfer” is misspelled. This is user-facing and likely referenced elsewhere; keeping the misspelling risks 404s and SEO issues.

Rename and update inbound references:

#!/bin/bash
set -euo pipefail

# Rename the file
git mv docs/components/backends/trtllm/kv-cache-tranfer.md docs/components/backends/trtllm/kv-cache-transfer.md

# Update references across the repo
rg -nP 'kv-cache-tranfer\.md' -S -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**'
sed -i.bak 's/kv-cache-tranfer\.md/kv-cache-transfer.md/g' $(rg -lP 'kv-cache-tranfer\.md' -S -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**')
find . -name "*.bak" -delete

If your docs tooling supports redirects (e.g., Sphinx with redirects or mkdocs-redirects), consider adding one from the old path to the new path to keep external links working. Do you want me to add a redirect config in this PR?

deploy/cloud/helm/deploy.sh (1)

167-173: Do not print generated-values.yaml; it may leak secrets.

Immediately after envsubst, the script cats generated-values.yaml. If the values file contains credentials (docker or pipelines), this prints them to stdout and potentially CI logs. Remove the dump or print a redacted version.

Apply:

-echo "generated file contents:"
-cat generated-values.yaml
+echo "Generated values file created at generated-values.yaml"
+echo "(contents withheld to avoid leaking credentials; open locally if needed)"
container/Dockerfile.vllm (1)

363-377: Gate the cargo build features with ENABLE_KVBM to avoid feature skew

You correctly gate maturin with ENABLE_KVBM, but the prior cargo build always enables dynamo-llm/block-manager. This can produce mismatched artifacts when ENABLE_KVBM=false.

-RUN cargo build \
-	--release \
-	--locked \
-	--features dynamo-llm/block-manager \
-	--workspace
+RUN if [ "$ENABLE_KVBM" = "true" ]; then \
+      cargo build --release --locked --features dynamo-llm/block-manager --workspace; \
+    else \
+      cargo build --release --locked --workspace; \
+    fi


[workspace.package]
version = "0.4.0"
version = "0.4.0+post0"

💡 Verification agent

🧩 Analysis chain

Workspace version bump may desync local crate versions

Bumping [workspace.package].version to 0.4.0+post0 while workspace.dependencies pin local crates at 0.4.0 can cause version mismatches if member crates inherit the workspace version or if any member’s package.version is updated without updating these dependency constraints.

Choose one consistent approach:

  • Option A (quick): keep workspace at 0.4.0 for now.
  • Option B (preferred): bump local crate versions and dependency pins to 0.4.0+post0.

If going with Option B, update the local workspace dependency versions:

 [workspace.dependencies]
 # Local crates
-dynamo-runtime = { path = "lib/runtime", version = "0.4.0" }
-dynamo-llm = { path = "lib/llm", version = "0.4.0" }
-dynamo-tokens = { path = "lib/tokens", version = "0.4.0" }
+dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
+dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
+dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }

I can also generate a script to scan member crates for version.workspace = true and align everything if helpful.


🏁 Script executed:

#!/bin/bash
set -eux

# Inspect root Cargo.toml for workspace.package version
echo "===== [workspace.package] block in root Cargo.toml ====="
sed -n -n '/^\[workspace.package\]/,/^\[/{p}' Cargo.toml || true

# Inspect root Cargo.toml for workspace.dependencies block
echo -e "\n===== [workspace.dependencies] block in root Cargo.toml ====="
sed -n -n '/^\[workspace.dependencies\]/,/^\[/{p}' Cargo.toml || true

# Inspect member crate Cargo.toml files
for crate in lib/runtime lib/llm lib/tokens; do
  echo -e "\n===== $crate/Cargo.toml ====="
  if [ -f "$crate/Cargo.toml" ]; then
    # Show [package] block
    echo "---- [package] block ----"
    sed -n -n '/^\[package\]/,/^\[/{p}' "$crate/Cargo.toml"
    # Show [dependencies] block
    echo "---- [dependencies] block ----"
    sed -n -n '/^\[dependencies\]/,/^\[/{p}' "$crate/Cargo.toml"
  else
    echo "$crate/Cargo.toml not found"
  fi
done

Length of output: 8765


Action Required: Align local crate dependency versions with the bumped workspace version

The root Cargo.toml now sets

[workspace.package]
version = "0.4.0+post0"

but under

[workspace.dependencies]

the local crates are still pinned at 0.4.0. Since each member crate uses version.workspace = true, their actual version becomes 0.4.0+post0, which will conflict with the pinned 0.4.0 in workspace.dependencies. You have two choices:

  • Option A: Revert the workspace version bump back to "0.4.0".
  • Option B (preferred): Update the local crate entries in [workspace.dependencies] to match 0.4.0+post0.

If you proceed with Option B, apply this diff in root Cargo.toml:

 [workspace.dependencies]
 # Local crates
-dynamo-runtime = { path = "lib/runtime", version = "0.4.0" }
-dynamo-llm      = { path = "lib/llm",     version = "0.4.0" }
-dynamo-tokens   = { path = "lib/tokens",  version = "0.4.0" }
+dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
+dynamo-llm      = { path = "lib/llm",     version = "0.4.0+post0" }
+dynamo-tokens   = { path = "lib/tokens",  version = "0.4.0+post0" }

• Root manifest: Cargo.toml
• Lines: update the three dynamo-* entries under [workspace.dependencies]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
version = "0.4.0+post0"
[workspace.dependencies]
# Local crates
dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }
🤖 Prompt for AI Agents
In Cargo.toml around line 18, the workspace package version was bumped to
"0.4.0+post0" but the local crate entries under [workspace.dependencies] remain
pinned to "0.4.0", causing a version mismatch; update the three dynamo-* entries
under [workspace.dependencies] to use version "0.4.0+post0" (or alternatively
revert the workspace.package.version to "0.4.0" if you prefer Option A).

Comment on lines +109 to +116
First, create a secret for the HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```


⚠️ Potential issue

Define $NAMESPACE before use.

Add export to prevent kubectl errors when copying commands.

 First, create a secret for the HuggingFace token.
 ```bash
+export NAMESPACE=<your-k8s-namespace>
 export HF_TOKEN=your_hf_token
 kubectl create secret generic hf-token-secret \
   --from-literal=HF_TOKEN=${HF_TOKEN} \
   -n ${NAMESPACE}
 ```

🤖 Prompt for AI Agents

In components/backends/sglang/deploy/README.md around lines 109 to 116, the
snippet uses ${NAMESPACE} but never defines or exports it and could cause
kubectl errors when copied; update the docs to instruct users to define and
export the NAMESPACE variable first (e.g., export NAMESPACE=<your-k8s-namespace>)
before exporting HF_TOKEN and running kubectl create secret so the commands
work when copied into a shell.

# Running DeepSeek-R1 Disaggregated with WideEP on H100s

Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).

⚠️ Potential issue

Inconsistent Dockerfile name: “sglang-deepep” vs build command using “sglang-wideep”.

The text references container/Dockerfile.sglang-deepep, but the build command uses Dockerfile.sglang-wideep. Align the filename to avoid user confusion.

-... We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` ...
+... We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` ...
📝 Committable suggestion


Suggested change
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
🤖 Prompt for AI Agents
In components/backends/sglang/docs/dsr1-wideep-h100.md around line 8 the
referenced Dockerfile name is inconsistent: the prose mentions
container/Dockerfile.sglang-deepep while the build command uses
Dockerfile.sglang-wideep; update one to match the other so they are identical
(choose the canonical filename used in the repo, e.g., rename references to
container/Dockerfile.sglang-wideep if that file exists or change the build
command to use container/Dockerfile.sglang-deepep) and ensure all occurrences in
this document use the same filename to avoid confusion.

Comment on lines +46 to 51
| Feature | SGLang | Notes |
|---------------------|--------|--------------------------------------------------------------|
| **WideEP** | ✅ | Full support on H100s/GB200 |
| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker |
| **GB200 Support** | ✅ | |


⚠️ Potential issue

Fix grammar: “router” → “route” to DP worker.

Small but user‑visible in the Feature Matrix.

-| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
+| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |
📝 Committable suggestion


Suggested change
| Feature | SGLang | Notes |
|---------------------|--------|--------------------------------------------------------------|
| **WideEP** || Full support on H100s/GB200 |
| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker |
| **GB200 Support** || |
| Feature | SGLang | Notes |
|---------------------|--------|--------------------------------------------------------------|
| **WideEP** || Full support on H100s/GB200 |
| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not route to DP worker |
| **GB200 Support** || |
🧰 Tools
🪛 LanguageTool

[grammar] ~49-~49: There might be a mistake here.
Context: ...KV router does not router to DP worker | | GB200 Support | ✅ | ...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In components/backends/sglang/README.md around lines 46 to 51, the
feature-matrix note uses the incorrect verb "router" in "Dynamo KV router does
not router to DP worker"; update the phrase to use the correct verb "route"
(e.g., "Dynamo KV router does not route to DP worker") and ensure surrounding
punctuation/capitalization remains consistent with the table style.

Comment on lines +55 to 56
Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
### Start NATS and ETCD in the background

⚠️ Potential issue

Fix wording: extra “the” in Quick Start intro.

-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.
🧰 Tools
🪛 LanguageTool

[grammar] ~55-~55: There might be a mistake here.
Context: ...on deployment patterns on a single node. ### Start NATS and ETCD in the background S...

(QB_NEW_EN)

🤖 Prompt for AI Agents
components/backends/sglang/README.md around lines 55 to 56: The Quick Start
intro contains an extra definite article ("our the") — update the sentence
"Below we provide a guide that lets you run all of our the common deployment
patterns on a single node." to remove the stray "the" so it reads correctly
(e.g., "Below we provide a guide that lets you run all of our common deployment
patterns on a single node.").

Comment on lines +240 to +241
For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).


⚠️ Potential issue

Broken link: kv-cache-tranfer.md → kv-cache-transfer.md

Typo in the filename “tranfer” breaks the link.

Apply:

-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
🤖 Prompt for AI Agents
In docs/components/backends/trtllm/deploy/README.md around lines 240 to 241, the
link references "../kv-cache-tranfer.md" which contains a typo in the filename;
update the link target to "../kv-cache-transfer.md" so it points to the correct
file name and verify the linked file exists and the relative path is correct.
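Filename typos like this are easy to catch automatically with a small relative-link checker; a sketch (the regex covers plain `](...)`-style Markdown links to `.md` files only, and is not part of any existing tooling):

```python
import os
import re

def broken_md_links(md_text, base_dir):
    """Return relative .md link targets that don't resolve under base_dir."""
    targets = re.findall(r"\]\(([^)\s#]+\.md)(?:#[^)]*)?\)", md_text)
    missing = []
    for t in targets:
        if t.startswith(("http://", "https://")):
            continue  # only check relative links
        if not os.path.exists(os.path.normpath(os.path.join(base_dir, t))):
            missing.append(t)
    return missing
```

Run over the docs tree, this would flag `../kv-cache-tranfer.md` when only `kv-cache-transfer.md` exists.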

Comment on lines +167 to +168
kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
```

⚠️ Potential issue

Fix port mismatch between port-forward and curl example.

You port-forward 8000:8000 but later curl localhost:8080. Recommend forwarding 8080:8000 to keep curl unchanged.

-kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
+kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000
🤖 Prompt for AI Agents
In docs/components/backends/vllm/deploy/README.md around lines 167 to 168, the
kubectl port-forward command forwards 8000:8000 while the curl example targets
localhost:8080; update the port-forward to kubectl port-forward
deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000 so local port 8080
maps to container port 8000 (or alternatively change the curl to use :8000) to
make the example ports consistent.

Comment on lines +72 to +96
### Multi-node Disaggregated Serving

Deploy prefill and decode workers on separate nodes for optimized resource utilization:

**Node 1**: Run ingress and prefill workers
```bash
# Start ingress
python -m dynamo.frontend --router-mode kv &

# Start prefill worker
python -m dynamo.vllm \
--model meta-llama/Llama-3.3-70B-Instruct
--tensor-parallel-size 8 \
--enforce-eager
```

**Node 2**: Run decode workers
```bash
# Start decode worker
python -m dynamo.vllm \
--model meta-llama/Llama-3.3-70B-Instruct
--tensor-parallel-size 8 \
--enforce-eager \
--is-prefill-worker
```

⚠️ Potential issue

Decode worker is started with --is-prefill-worker (wrong flag)

Node 2 is labeled “decode worker” but passes --is-prefill-worker. That will misroute roles.

Apply this diff to fix roles and make flags explicit on both ends:

-# Start prefill worker
-python -m dynamo.vllm \
-  --model meta-llama/Llama-3.3-70B-Instruct
-  --tensor-parallel-size 8 \
-  --enforce-eager
+# Start prefill worker
+python -m dynamo.vllm \
+  --model meta-llama/Llama-3.3-70B-Instruct \
+  --tensor-parallel-size 8 \
+  --enforce-eager \
+  --is-prefill-worker
@@
-# Start decode worker
-python -m dynamo.vllm \
-  --model meta-llama/Llama-3.3-70B-Instruct
-  --tensor-parallel-size 8 \
-  --enforce-eager \
-  --is-prefill-worker
+# Start decode worker
+python -m dynamo.vllm \
+  --model meta-llama/Llama-3.3-70B-Instruct \
+  --tensor-parallel-size 8 \
+  --enforce-eager
🤖 Prompt for AI Agents
In docs/components/backends/vllm/multi-node.md around lines 72–96, the Node 2
"decode worker" example incorrectly uses the --is-prefill-worker flag, which
will misroute roles; update the examples so that --is-prefill-worker is passed
to the Node 1 prefill worker invocation and removed from the Node 2 invocation,
so Node 1 is explicitly a prefill worker and Node 2 runs as a decode worker.
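The role split above comes down to a single boolean flag. A minimal argparse sketch (flag names follow the docs snippet, not necessarily the real `dynamo.vllm` CLI):

```python
import argparse

def worker_role(argv):
    """Derive the worker role from CLI flags, mirroring the example above
    (flag names follow the docs snippet, not necessarily the real CLI)."""
    p = argparse.ArgumentParser(prog="dynamo.vllm-sketch")
    p.add_argument("--model", required=True)
    p.add_argument("--tensor-parallel-size", type=int, default=1)
    p.add_argument("--enforce-eager", action="store_true")
    # Presence of the flag marks a prefill worker; absence means decode.
    p.add_argument("--is-prefill-worker", action="store_true")
    args = p.parse_args(argv)
    return "prefill" if args.is_prefill_worker else "decode"

# Node 1 (prefill) vs. Node 2 (decode) from the corrected example:
print(worker_role(["--model", "m", "--tensor-parallel-size", "8",
                   "--enforce-eager", "--is-prefill-worker"]))  # prefill
print(worker_role(["--model", "m", "--tensor-parallel-size", "8",
                   "--enforce-eager"]))                         # decode
```

This makes the failure mode concrete: passing `--is-prefill-worker` on the node meant to decode silently flips its role.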

Comment on lines +1 to +2
:orphan:


🛠️ Refactor suggestion

⚠️ Potential issue

MyST/Sphinx: replace :orphan: with supported front matter in Markdown

In Markdown (MyST), :orphan: may not be interpreted and can leak into the rendered page. Use YAML front matter instead:

Apply:

-:orphan:
-
+---
+orphan: true
+---

Also ensure this page is reachable via your toctree or intentionally “hidden” via nav; otherwise Sphinx may still warn depending on your build config.

🤖 Prompt for AI Agents
In docs/guides/dynamo_deploy/gke_setup.md lines 1-2, remove the standalone
":orphan:" directive and replace it with YAML front matter at the top of the
file declaring the page as orphaned (add a YAML block containing an orphan: true
key), then ensure this page is either linked from your toctree or intentionally
excluded/hidden in your site nav so Sphinx doesn't emit warnings during the
build.
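A small check like the following (a sketch, not part of any existing build tooling) can verify a MyST page declares the front matter correctly:

```python
def has_orphan_front_matter(text):
    """True if the page opens with YAML front matter containing `orphan: true`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    try:
        end = lines.index("---", 1)  # closing fence of the front matter
    except ValueError:
        return False
    return any(line.strip() == "orphan: true" for line in lines[1:end])

page = "---\norphan: true\n---\n# GKE setup\n"
print(has_orphan_front_matter(page))  # True
```

A bare `:orphan:` line fails this check, which matches the concern above: in MyST Markdown the field-list form may render as literal text instead of being consumed as metadata.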

Comment on lines +485 to 490
See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#build-container) to build the dynamo container with TensorRT-LLM.

##### Step 2: Run the environment

-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#run-container) to run the built environment.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#run-container) to run the built environment.


💡 Verification agent

🧩 Analysis chain

Updated TRT-LLM links — verify anchors exist in README.

Ensure #build-container and #run-container headings exist in components/backends/trtllm/README.md so these links don’t 404.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="components/backends/trtllm/README.md"
echo "Checking anchors in $file"
rg -nP '^\s*#{2,}\s*Build the environment\b' "$file" || echo "Missing 'Build the environment' heading"
rg -nP '^\s*#{2,}\s*Run the environment\b' "$file"   || echo "Missing 'Run the environment' heading"

Length of output: 538


Add missing anchors in components/backends/trtllm/README.md

The links in docs/guides/dynamo_run.md (lines 485–490) point to #build-container and #run-container, but those anchors don’t exist in components/backends/trtllm/README.md, so they will 404. Please update the README to include matching headings or explicit anchors. For example:

• In components/backends/trtllm/README.md, add:

### Build Container
<!-- or: <a name="build-container"></a> -->
Steps to build the Dynamo container with TensorRT-LLM…

• And:

### Run Container
<!-- or: <a name="run-container"></a> -->
Instructions to run the built environment…

Alternatively, adjust the links in dynamo_run.md to match whatever headings are already present.

🤖 Prompt for AI Agents
In docs/guides/dynamo_run.md around lines 485–490 the links target anchors
#build-container and #run-container that don't exist in
components/backends/trtllm/README.md; fix by either adding matching
anchors/heading blocks to components/backends/trtllm/README.md (e.g., add "###
Build Container" and "### Run Container" headings or explicit <a
name="build-container"></a> / <a name="run-container"></a> anchors with the
corresponding build/run instructions) or by updating the links in
docs/guides/dynamo_run.md to point to the actual existing headings in
components/backends/trtllm/README.md so they resolve correctly.
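GitHub derives anchors from headings by lowercasing, dropping punctuation, and replacing spaces with hyphens, so a `### Build Container` heading yields `#build-container`. An approximate sketch of that slug rule (GitHub's full algorithm additionally deduplicates repeated headings by appending `-1`, `-2`, …):

```python
import re

def github_slug(heading):
    """Approximate GitHub's heading-to-anchor slug."""
    s = heading.strip().lower()
    s = re.sub(r"[^\w\- ]", "", s)   # drop punctuation; keep word chars, hyphens, spaces
    return s.replace(" ", "-")

print(github_slug("Build Container"))  # build-container
print(github_slug("Run Container"))    # run-container
```

This is why the proposed `### Build Container` / `### Run Container` headings would make the `#build-container` / `#run-container` links resolve.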

@biswapanda biswapanda closed this Aug 23, 2025