fix: prevent crash looping hello world #2625 #2670
Co-authored-by: Anant Sharma <anants@nvidia.com> Co-authored-by: Ishan Dhanani <idhanani@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
…link (#2239) Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com> Co-authored-by: Anish <80174047+athreesh@users.noreply.github.com>
…#2319) Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com> Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com> Co-authored-by: Biswa Panda <biswa.panda@gmail.com> Co-authored-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
**Walkthrough**

This PR adds extensive documentation, deployment manifests, and container build updates across backends (vLLM, SGLang, TensorRT-LLM); introduces ETCD-backed port allocation for vLLM; refactors Grove settings in the operator/Helm; updates TRT-LLM engine configs; and revises numerous docs/links. Minor code changes include safer dict access in TRT-LLM and operator wiring for Grove detection/termination delay.
**Sequence Diagram(s)**

```mermaid
sequenceDiagram
    autonumber
    participant Worker as vLLM Worker
    participant Args as CLI Args Parser
    participant Ports as Ports Module
    participant ETCD as Etcd KV
    participant OS as Host OS
    Args->>Worker: parse(--port-min/--port-max)
    Worker->>Ports: configure_ports_with_etcd(port_range, etcd_ctx, tp/dp)
    alt KV events enabled
        Ports->>ETCD: reserve kv_port (metadata)
    end
    Ports->>OS: check_port_available()
    loop allocate block (tp or 1)
        Ports->>ETCD: reserve contiguous block (metadata with block_info)
        ETCD-->>Ports: success/fail
    end
    Ports-->>Worker: kv_port, side_channel_port(s)
    Worker->>OS: set env (host_ip, ports)
```
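Read together, the steps above amount to: probe local availability, then atomically reserve a contiguous block in etcd, retrying when another worker wins the race. A minimal Python sketch of that pattern (illustrative only — `check_port_available` and the `reserve` callback standing in for an etcd transaction are hypothetical names, not the actual vLLM `ports` module):

```python
import socket

def check_port_available(port: int, host: str = "127.0.0.1") -> bool:
    """Best-effort local probe: can we bind this port right now?"""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind((host, port))
            return True
        except OSError:
            return False

def allocate_contiguous_block(port_min, port_max, block_size, reserve):
    """Scan [port_min, port_max] for `block_size` consecutive free ports.

    `reserve(block)` models the etcd reservation step: it should atomically
    record the block's metadata and return False if another worker won the
    race, in which case we keep scanning.
    """
    for start in range(port_min, port_max - block_size + 2):
        block = list(range(start, start + block_size))
        if all(check_port_available(p) for p in block) and reserve(block):
            return block
    return None  # range exhausted; caller should fail loudly
```

A worker needing one side-channel port per TP rank would call `allocate_contiguous_block(port_min, port_max, tp_size, reserve)` and export the result via environment variables, mirroring the diagram's final step.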
```mermaid
sequenceDiagram
    autonumber
    participant Main as Operator Main
    participant Disc as K8s Discovery
    participant Cfg as GroveConfig
    participant Reconciler as Controller
    participant Grove as Grove PodGangSet
    Main->>Cfg: set TerminationDelay (flag/default)
    Main->>Disc: DetectGroveAvailability()
    Disc-->>Main: true/false
    Main->>Cfg: Enabled = result
    alt Grove Enabled
        Reconciler->>Grove: Reconcile with TerminationDelay
    else
        Reconciler->>Reconciler: Reconcile Dynamo components only
    end
```
**Estimated code review effort**: 🎯 5 (Critical) | ⏱️ ~120 minutes
**Warning**

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2): Error: can't load config: unsupported version of the configuration: "". See https://golangci-lint.run/product/migration-guide for migration instructions.
Actionable comments posted: 33
Caution
Some comments are outside the diff and can’t be posted inline due to platform limitations.
⚠️ Outside diff range comments (12)
components/backends/sglang/launch/agg_router.sh (1)
14-20: Fail-fast on namespace clear and interpreter consistency for frontend.
- The retrieved learning explicitly states clear_namespace must block router startup if it fails. Currently, the script proceeds even if the clear step exits non‑zero (no set -e and no &&). Also, the frontend is launched with python while the rest uses python3. Aligning interpreters prevents PATH/env drift issues.
Apply this diff to enforce fail-fast and interpreter consistency:

```diff
-# run clear_namespace
-python3 -m dynamo.sglang.utils.clear_namespace --namespace dynamo
+# run clear_namespace (must succeed before starting the frontend)
+python3 -m dynamo.sglang.utils.clear_namespace --namespace dynamo || {
+  echo "[ERROR] clear_namespace failed; aborting startup." >&2
+  exit 1
+}
@@
-# run ingress
-python -m dynamo.frontend --router-mode kv --http-port=8000 &
+# run ingress
+python3 -m dynamo.frontend --router-mode kv --http-port=8000 &
```

container/Dockerfile.sglang-wideep (1)
108-114: Supply-chain risk: rustup-init downloaded without integrity verification.

An executable is fetched and run without checksum/signature validation. This is a critical supply-chain risk.

Apply this diff to add SHA256 verification (Rust provides .sha256 files per artifact):

```diff
 ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
@@
 RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    # TODO: Add SHA check back based on RUSTARCH
+    wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init.sha256" && \
+    # Verify checksum before executing
+    echo "$(cat rustup-init.sha256)  rustup-init" | sha256sum -c - && \
     chmod +x rustup-init && \
     ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
+    rm rustup-init rustup-init.sha256 && \
     chmod -R a+w $RUSTUP_HOME $CARGO_HOME
```

deploy/inference-gateway/README.md (3)
41-45: Typo: “Inferenece” → “Inference”.User-facing typo in the CRD description.
```diff
-b. Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs)
+b. Install the Inference Extension CRDs (InferenceModel and InferencePool CRDs)
```
84-91: Inconsistency: referenced config file vs. helm -f value.Text mentions inference-gateway-resources.yaml but the helm command uses vllm_agg_qwen.yaml. Align the filename or clarify roles (values vs. manifest).
```diff
-The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
+The Inference Gateway is configured via a values file (e.g., `vllm_agg_qwen.yaml`) passed to the Helm chart.
```

If there is indeed a standalone manifest named inference-gateway-resources.yaml, either rename this reference or add a note explaining how it differs from the values file.
137-146: Fix typos and broken kubectl command when populating GATEWAY_URL.
- “User” → “Use”; “alternateive” → “alternative”.
- kubectl uses two -o flags; only the last one applies. This breaks the command.
```diff
-    a. User minikube tunnel to expose the gateway to the host
-    This requires `sudo` access to the host machine. alternatively, you can use port-forward to expose the gateway to the host as shown in alternateive (b).
+    a. Use minikube tunnel to expose the gateway to the host
+    This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host as shown in alternative (b).
@@
-GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o yaml -o jsonpath='{.spec.clusterIP}')
+GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')
```

ATTRIBUTIONS-Go.md (1)
10381-10826: Update SPDX identifiers and add code fence languages

I ran the provided verification script and found the following issues:

Non-SPDX identifiers (must be updated to "Apache-2.0"):
- Line 10411 (sigs.k8s.io/randfill)
- Line 10620 (github.com/NVIDIA/grove/operator/api)

Fenced code blocks without a language (MD040): there are numerous bare fences (no language tag) throughout ATTRIBUTIONS-Go.md. All license blocks should carry a language tag (e.g., `text`) for consistency and to satisfy markdown linters.

The license bodies themselves (MIT for blang/semver/v4 and Apache 2.0 for the other two) match the standard upstream texts and require no content changes, but please manually confirm they exactly match the source repositories.

Next steps:
- Replace `License Identifier: Apache 2.0` with `License Identifier: Apache-2.0` in both newly added Apache blocks.
- Update all bare fences around license texts to use the `text` language tag.
- (Optional) Enhance the generator to:
  - Emit SPDX-normalized identifiers (MIT, Apache-2.0, etc.).
  - Always include a language on code fences.
  - Fail the generation step if any license text retrieval fails.
components/backends/sglang/README.md (1)
119-121: Typo: “conjuction” → “conjunction”.

```diff
-Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead, we first route to a random prefill worker, select a random decode worker, and then send the request to both. Internally, SGLang's bootstrap server (which is a part of the `tokenizer_manager`) is used in conjuction with NIXL to handle the kv transfer.
+Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead, we first route to a random prefill worker, select a random decode worker, and then send the request to both. Internally, SGLang's bootstrap server (which is a part of the `tokenizer_manager`) is used in conjunction with NIXL to handle the KV transfer.
```

container/Dockerfile.tensorrt_llm (2)
129-139: Supply-chain hardening: add checksum/signature verification for downloaded artifacts.
`wget`/`curl` downloads for NATS, etcd, and `rustup-init` occur without integrity checks.
- Fetch and verify SHA256 (or GPG signature) for each artifact.
- Example for etcd:
```diff
-RUN wget https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+RUN set -euo pipefail; \
+    wget -q https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+    wget -q https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/sha256sum.txt -O /tmp/etcd.sha256 && \
+    (cd /tmp && grep "etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz" etcd.sha256 | sha256sum -c -) && \
     mkdir -p /usr/local/bin/etcd && \
     tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
     rm /tmp/etcd.tar.gz
```

Do similarly for NATS and rustup.
Also applies to: 205-213, 263-268
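The verify-before-use pattern these diffs apply can be illustrated outside Dockerfiles as well. A generic Python sketch (not project code), assuming a `.sha256` sidecar in the usual `"<hex>  <filename>"` format:

```python
import hashlib
import hmac

def verify_sha256(data: bytes, sidecar_line: str) -> bool:
    """Check `data` against the hex digest from a .sha256 sidecar line
    of the form "<hex>  <filename>"; constant-time comparison."""
    expected = sidecar_line.strip().split()[0]
    actual = hashlib.sha256(data).hexdigest()
    return hmac.compare_digest(actual, expected)

# Refuse to proceed when the downloaded artifact was tampered with
payload = b"some downloaded artifact"
sidecar = hashlib.sha256(payload).hexdigest() + "  artifact.tar.gz"
assert verify_sha256(payload, sidecar)
assert not verify_sha256(b"tampered bytes", sidecar)
```

The key design point mirrors the diffs: the checksum is fetched and checked before the artifact is executed or unpacked, so a compromised mirror fails the build instead of poisoning the image.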
145-175: Ensure all wheel references point to `/workspace/wheelhouse/` instead of legacy paths

The repository still contains numerous references to `/workspace/dist` and `/workspace/wheels/nixl` that must be updated now that artifacts are staged under `/workspace/wheelhouse/`. Please update downstream CI/job scripts, Dockerfiles, and the Earthfile to use the new location:

- container/Dockerfile.vllm
  - Lines 178–185: `uv build . --out-dir /workspace/wheels/nixl` and `uv pip install /workspace/wheels/nixl/*.whl`
  - Lines 370–378: `uv build --wheel --out-dir /workspace/dist …`
  - Lines 409–410: `uv pip install /workspace/dist/…`
  - Line 474: `COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/` (should use `/workspace/wheelhouse/`)
- container/Dockerfile.tensorrt_llm
  - Lines 234–241: `uv build . --out-dir /workspace/wheels/nixl` and `uv pip install /workspace/wheels/nixl/*.whl`
  - Lines 301–307: `uv build --wheel --out-dir /workspace/dist …`
  - Lines 332–333: `uv pip install /workspace/dist/…`
  - Lines 497–498: `COPY --from=dev /workspace/wheels/nixl/*.whl /workspace/wheelhouse/` and `COPY --from=wheel_builder /workspace/dist/*.whl /workspace/wheelhouse/`
- container/Dockerfile.sglang
  - Lines 161–168: `uv build . --out-dir /workspace/wheels/nixl` and `uv pip install /workspace/wheels/nixl/*.whl`
  - Lines 338–343: `uv build --wheel --out-dir /workspace/dist …`
  - Lines 373–374: `uv pip install /workspace/dist/…`
  - Lines 449–450: `COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/` and `COPY --from=base /workspace/wheels/nixl/*.whl wheelhouse/`
- Earthfile
  - Lines 119–121: `uv build --wheel --out-dir /workspace/dist …`
  - Lines 124–126: `SAVE ARTIFACT /workspace/dist/*.whl`

Please refactor these paths to `/workspace/wheelhouse/` so that all wheel-producing and consuming steps are consistent.
1-1: Fix filename typo to avoid broken links: “kv-cache-tranfer.md” ➜ “kv-cache-transfer.md”

“tranfer” is misspelled. This is user-facing and likely referenced elsewhere; keeping the misspelling risks 404s and SEO issues.

Rename and update inbound references:

```bash
#!/bin/bash
set -euo pipefail

# Rename the file
git mv docs/components/backends/trtllm/kv-cache-tranfer.md docs/components/backends/trtllm/kv-cache-transfer.md

# Update references across the repo
rg -nP 'kv-cache-tranfer\.md' -S -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**'
sed -i.bak 's/kv-cache-tranfer\.md/kv-cache-transfer.md/g' $(rg -lP 'kv-cache-tranfer\.md' -S -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**')
find . -name "*.bak" -delete
```

If your docs tooling supports redirects (e.g., Sphinx with redirects or mkdocs-redirects), consider adding one from the old path to the new path to keep external links working. Do you want me to add a redirect config in this PR?
deploy/cloud/helm/deploy.sh (1)
167-173: Do not print generated-values.yaml; it may leak secrets.Immediately after envsubst, the script cats generated-values.yaml. If the values file contains credentials (docker or pipelines), this prints them to stdout and potentially CI logs. Remove the dump or print a redacted version.
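The "print a redacted version" idea can be sketched generically in Python; this is an illustrative pattern, not the deploy script's actual behavior — the key-name patterns and the `<redacted>` placeholder are assumptions:

```python
import re

# Lines like "dockerToken: abc123" or "apiSecret: s3cr3t" get masked.
SENSITIVE = re.compile(
    r"^(\s*[\w.-]*(?:password|token|secret|credential)[\w.-]*\s*:\s*).+$",
    re.IGNORECASE,
)

def redact(yaml_text: str) -> str:
    """Mask values of secret-looking keys before echoing a values file."""
    return "\n".join(
        SENSITIVE.sub(r"\1<redacted>", line) for line in yaml_text.splitlines()
    )
```

A script would then log `redact(open("generated-values.yaml").read())` instead of `cat`-ing the raw file, keeping non-sensitive settings visible for debugging while credentials never reach CI logs.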
Apply:

```diff
-echo "generated file contents:"
-cat generated-values.yaml
+echo "Generated values file created at generated-values.yaml"
+echo "(contents withheld to avoid leaking credentials; open locally if needed)"
```

container/Dockerfile.vllm (1)
363-377: Gate the cargo build features with ENABLE_KVBM to avoid feature skew

You correctly gate maturin with ENABLE_KVBM, but the prior cargo build always enables dynamo-llm/block-manager. This can produce mismatched artifacts when ENABLE_KVBM=false.

```diff
-RUN cargo build \
-    --release \
-    --locked \
-    --features dynamo-llm/block-manager \
-    --workspace
+RUN if [ "$ENABLE_KVBM" = "true" ]; then \
+      cargo build --release --locked --features dynamo-llm/block-manager --workspace; \
+    else \
+      cargo build --release --locked --workspace; \
+    fi
```
```diff
 [workspace.package]
-version = "0.4.0"
+version = "0.4.0+post0"
```
Workspace version bump may desync local crate versions
Bumping [workspace.package].version to 0.4.0+post0 while workspace.dependencies pin local crates at 0.4.0 can cause version mismatches if member crates inherit the workspace version or if any member’s package.version is updated without updating these dependency constraints.
Choose one consistent approach:
- Option A (quick): keep workspace at `0.4.0` for now.
- Option B (preferred): bump local crate versions and dependency pins to `0.4.0+post0`.
If going with Option B, update the local workspace dependency versions:
```diff
 [workspace.dependencies]
 # Local crates
-dynamo-runtime = { path = "lib/runtime", version = "0.4.0" }
-dynamo-llm = { path = "lib/llm", version = "0.4.0" }
-dynamo-tokens = { path = "lib/tokens", version = "0.4.0" }
+dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
+dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
+dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }
```

I can also generate a script to scan member crates for `version.workspace = true` and align everything if helpful.
🏁 Script executed:
```bash
#!/bin/bash
set -eux

# Inspect root Cargo.toml for workspace.package version
echo "===== [workspace.package] block in root Cargo.toml ====="
sed -n '/^\[workspace.package\]/,/^\[/{p}' Cargo.toml || true

# Inspect root Cargo.toml for workspace.dependencies block
echo -e "\n===== [workspace.dependencies] block in root Cargo.toml ====="
sed -n '/^\[workspace.dependencies\]/,/^\[/{p}' Cargo.toml || true

# Inspect member crate Cargo.toml files
for crate in lib/runtime lib/llm lib/tokens; do
  echo -e "\n===== $crate/Cargo.toml ====="
  if [ -f "$crate/Cargo.toml" ]; then
    # Show [package] block
    echo "---- [package] block ----"
    sed -n '/^\[package\]/,/^\[/{p}' "$crate/Cargo.toml"
    # Show [dependencies] block
    echo "---- [dependencies] block ----"
    sed -n '/^\[dependencies\]/,/^\[/{p}' "$crate/Cargo.toml"
  else
    echo "$crate/Cargo.toml not found"
  fi
done
```

Length of output: 8765
Action Required: Align local crate dependency versions with the bumped workspace version
The root Cargo.toml now sets
```toml
[workspace.package]
version = "0.4.0+post0"
```

but under `[workspace.dependencies]` the local crates are still pinned at 0.4.0. Since each member crate uses `version.workspace = true`, their actual version becomes 0.4.0+post0, which will conflict with the pinned 0.4.0 in `workspace.dependencies`. You have two choices:

- Option A: Revert the workspace version bump back to `"0.4.0"`.
- Option B (preferred): Update the local crate entries in `[workspace.dependencies]` to match `0.4.0+post0`.
If you proceed with Option B, apply this diff in root Cargo.toml:
```diff
 [workspace.dependencies]
 # Local crates
-dynamo-runtime = { path = "lib/runtime", version = "0.4.0" }
-dynamo-llm = { path = "lib/llm", version = "0.4.0" }
-dynamo-tokens = { path = "lib/tokens", version = "0.4.0" }
+dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
+dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
+dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }
```

- Root manifest: Cargo.toml
- Lines: update the three dynamo-* entries under `[workspace.dependencies]`
📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```toml
version = "0.4.0+post0"

[workspace.dependencies]
# Local crates
dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }
```
🤖 Prompt for AI Agents
In Cargo.toml around line 18, the workspace package version was bumped to
"0.4.0+post0" but the local crate entries under [workspace.dependencies] remain
pinned to "0.4.0", causing a version mismatch; update the three dynamo-* entries
under [workspace.dependencies] to use version "0.4.0+post0" (or alternatively
revert the workspace.package.version to "0.4.0" if you prefer Option A).
> First, create a secret for the HuggingFace token.
>
> ```bash
> export HF_TOKEN=your_hf_token
> kubectl create secret generic hf-token-secret \
>   --from-literal=HF_TOKEN=${HF_TOKEN} \
>   -n ${NAMESPACE}
> ```
Define $NAMESPACE before use.
Add export to prevent kubectl errors when copying commands.
First, create a secret for the HuggingFace token.

```bash
export NAMESPACE=<your-k8s-namespace>
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
  --from-literal=HF_TOKEN=${HF_TOKEN} \
  -n ${NAMESPACE}
```
<details>
<summary>🤖 Prompt for AI Agents</summary>
In components/backends/sglang/deploy/README.md around lines 109 to 116, the
snippet uses ${NAMESPACE} but never defines or exports it and could cause
kubectl errors when copied; update the docs to instruct users to define and
export the NAMESPACE variable first (e.g., `export NAMESPACE=<your-k8s-namespace>`)
before exporting HF_TOKEN and running kubectl create secret so the commands
work when copied into a shell.
</details>
> # Running DeepSeek-R1 Disaggregated with WideEP on H100s

```diff
-Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
+Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
```
Inconsistent Dockerfile name: “sglang-deepep” vs build command using “sglang-wideep”.
The text references container/Dockerfile.sglang-deepep, but the build command uses Dockerfile.sglang-wideep. Align the filename to avoid user confusion.
```diff
-... We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` ...
+... We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` ...
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
```
🤖 Prompt for AI Agents
In components/backends/sglang/docs/dsr1-wideep-h100.md around line 8 the
referenced Dockerfile name is inconsistent: the prose mentions
container/Dockerfile.sglang-deepep while the build command uses
Dockerfile.sglang-wideep; update one to match the other so they are identical
(choose the canonical filename used in the repo, e.g., rename references to
container/Dockerfile.sglang-wideep if that file exists or change the build
command to use container/Dockerfile.sglang-deepep) and ensure all occurrences in
this document use the same filename to avoid confusion.
> | Feature             | SGLang | Notes                                                                   |
> |---------------------|--------|-------------------------------------------------------------------------|
> | **WideEP**          | ✅     | Full support on H100s/GB200                                             |
> | **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
> | **GB200 Support**   | ✅     |                                                                         |
Fix grammar: “router” → “route” to DP worker.
Small but user‑visible in the Feature Matrix.
```diff
-| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker |
+| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not route to DP worker |
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
| Feature             | SGLang | Notes                                                                  |
|---------------------|--------|------------------------------------------------------------------------|
| **WideEP**          | ✅     | Full support on H100s/GB200                                            |
| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |
| **GB200 Support**   | ✅     |                                                                        |
```
🤖 Prompt for AI Agents
In components/backends/sglang/README.md around lines 46 to 51, the
feature-matrix note uses the incorrect verb "router" in "Dynamo KV router does
not router to DP worker"; update the phrase to use the correct verb "route"
(e.g., "Dynamo KV router does not route to DP worker") and ensure surrounding
punctuation/capitalization remains consistent with the table style.
> Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
>
> ### Start NATS and ETCD in the background
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fix wording: extra “the” in Quick Start intro.
```diff
-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.
```

📝 Committable suggestion
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
Below we provide a guide that lets you run all of our common deployment patterns on a single node.

### Start NATS and ETCD in the background
```
🤖 Prompt for AI Agents
components/backends/sglang/README.md around lines 55 to 56: The Quick Start
intro contains an extra definite article ("the the") — update the sentence
"Below we provide a guide that lets you run all of our the common deployment
patterns on a single node." to remove the duplicated "the" so it reads correctly
(e.g., "Below we provide a guide that lets you run all of our common deployment
patterns on a single node.").
> For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).
Broken link: kv-cache-tranfer.md → kv-cache-transfer.md
Typo in the filename “tranfer” breaks the link.
Apply:
```diff
-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
```

📝 Committable suggestion
+For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```markdown
For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
```
🤖 Prompt for AI Agents
In docs/components/backends/trtllm/deploy/README.md around lines 240 to 241, the
link references "../kv-cache-tranfer.md" which contains a typo in the filename;
update the link target to "../kv-cache-transfer.md" so it points to the correct
file name and verify the linked file exists and the relative path is correct.
> ```bash
> kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
> ```
Fix port mismatch between port-forward and curl example.
You port-forward 8000:8000 but later curl localhost:8080. Recommend forwarding 8080:8000 to keep curl unchanged.
```diff
-kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
+kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000
```

📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
```bash
kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000
```
🤖 Prompt for AI Agents
In docs/components/backends/vllm/deploy/README.md around lines 167 to 168, the
kubectl port-forward command forwards 8000:8000 while the curl example targets
localhost:8080; update the port-forward to kubectl port-forward
deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000 so local port 8080
maps to container port 8000 (or alternatively change the curl to use :8000) to
make the example ports consistent.
> ### Multi-node Disaggregated Serving
>
> Deploy prefill and decode workers on separate nodes for optimized resource utilization:
>
> **Node 1**: Run ingress and prefill workers
>
> ```bash
> # Start ingress
> python -m dynamo.frontend --router-mode kv &
>
> # Start prefill worker
> python -m dynamo.vllm \
>   --model meta-llama/Llama-3.3-70B-Instruct
>   --tensor-parallel-size 8 \
>   --enforce-eager
> ```
>
> **Node 2**: Run decode workers
>
> ```bash
> # Start decode worker
> python -m dynamo.vllm \
>   --model meta-llama/Llama-3.3-70B-Instruct
>   --tensor-parallel-size 8 \
>   --enforce-eager \
>   --is-prefill-worker
> ```
Decode worker is started with --is-prefill-worker (wrong flag)
Node 2 is labeled “decode worker” but passes --is-prefill-worker. That will misroute roles.
Apply this diff to fix roles and make flags explicit on both ends:
```diff
-# Start prefill worker
-python -m dynamo.vllm \
-  --model meta-llama/Llama-3.3-70B-Instruct
-  --tensor-parallel-size 8 \
-  --enforce-eager
+# Start prefill worker
+python -m dynamo.vllm \
+  --model meta-llama/Llama-3.3-70B-Instruct \
+  --tensor-parallel-size 8 \
+  --enforce-eager \
+  --is-prefill-worker
@@
-# Start decode worker
-python -m dynamo.vllm \
-  --model meta-llama/Llama-3.3-70B-Instruct
-  --tensor-parallel-size 8 \
-  --enforce-eager \
-  --is-prefill-worker
+# Start decode worker
+python -m dynamo.vllm \
+  --model meta-llama/Llama-3.3-70B-Instruct \
+  --tensor-parallel-size 8 \
+  --enforce-eager
```

📝 Committable suggestion
🤖 Prompt for AI Agents
In docs/components/backends/vllm/multi-node.md around lines 72–96, the Node 2
"decode worker" example incorrectly uses the --is-prefill-worker flag which will
misroute roles; update the examples to make flags explicit by adding
--is-prefill-worker to the Node 1 prefill worker invocation and replacing the
incorrect --is-prefill-worker on Node 2 with the correct --is-decode-worker (or
simply remove the prefill flag there) so Node 1 is explicitly a prefill worker
and Node 2 is explicitly a decode worker.
```
:orphan:
```
🛠️ Refactor suggestion
MyST/Sphinx: replace :orphan: with supported front matter in Markdown
In Markdown (MyST), :orphan: may not be interpreted and can leak into the rendered page. Use YAML front matter instead:
Apply:
```diff
-:orphan:
-
+---
+orphan: true
+---
```

Also ensure this page is reachable via your toctree or intentionally “hidden” via nav; otherwise Sphinx may still warn depending on your build config.
🤖 Prompt for AI Agents
In docs/guides/dynamo_deploy/gke_setup.md lines 1-2, remove the standalone
":orphan:" directive and replace it with YAML front matter at the top of the
file declaring the page as orphaned (add a YAML block containing an orphan: true
key), then ensure this page is either linked from your toctree or intentionally
excluded/hidden in your site nav so Sphinx doesn't emit warnings during the
build.
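To illustrate the difference between the two markings, here is a minimal hypothetical checker (not part of the repo) that classifies whether a Markdown page uses the rST-style `:orphan:` directive — which MyST may render as literal text — or YAML front matter with `orphan: true`:

```python
def orphan_style(markdown: str) -> str:
    """Classify how a Markdown page is marked as orphaned:
    'directive' for a bare :orphan: first line,
    'front-matter' for YAML front matter containing orphan: true,
    'none' otherwise."""
    lines = markdown.splitlines()
    if lines and lines[0].strip() == ":orphan:":
        return "directive"
    if lines and lines[0].strip() == "---":
        # Scan the front-matter block up to the closing --- fence
        for line in lines[1:]:
            if line.strip() == "---":
                break
            if line.strip() == "orphan: true":
                return "front-matter"
    return "none"

print(orphan_style(":orphan:\n\n# GKE setup"))              # directive
print(orphan_style("---\norphan: true\n---\n# GKE setup"))  # front-matter
```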
See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#build-container) to build the dynamo container with TensorRT-LLM.

##### Step 2: Run the environment

```diff
-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#run-container) to run the built environment.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#run-container) to run the built environment.
```
💡 Verification agent
🧩 Analysis chain
Updated TRT-LLM links — verify anchors exist in README.
Ensure #build-container and #run-container headings exist in components/backends/trtllm/README.md so these links don’t 404.
🏁 Script executed:

```bash
#!/bin/bash
set -euo pipefail
file="components/backends/trtllm/README.md"
echo "Checking anchors in $file"
rg -nP '^\s*#{2,}\s*Build the environment\b' "$file" || echo "Missing 'Build the environment' heading"
rg -nP '^\s*#{2,}\s*Run the environment\b' "$file" || echo "Missing 'Run the environment' heading"
```

Length of output: 538
Add missing anchors in components/backends/trtllm/README.md
The links in docs/guides/dynamo_run.md (lines 485–490) point to #build-container and #run-container, but those anchors don’t exist in components/backends/trtllm/README.md, so they will 404. Please update the README to include matching headings or explicit anchors. For example:
• In components/backends/trtllm/README.md, add:
```md
### Build Container
<!-- or: <a name="build-container"></a> -->
Steps to build the Dynamo container with TensorRT-LLM…
```

• And:

```md
### Run Container
<!-- or: <a name="run-container"></a> -->
Instructions to run the built environment…
```

Alternatively, adjust the links in dynamo_run.md to match whatever headings are already present.
🤖 Prompt for AI Agents
In docs/guides/dynamo_run.md around lines 485–490 the links target anchors
#build-container and #run-container that don't exist in
components/backends/trtllm/README.md; fix by either adding matching
anchors/heading blocks to components/backends/trtllm/README.md (e.g., add "###
Build Container" and "### Run Container" headings or explicit <a
name="build-container"></a> / <a name="run-container"></a> anchors with the
corresponding build/run instructions) or by updating the links in
docs/guides/dynamo_run.md to point to the actual existing headings in
components/backends/trtllm/README.md so they resolve correctly.
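The link targets work because GitHub auto-generates an anchor for each heading by lowercasing it, dropping punctuation, and replacing spaces with hyphens — so a `### Build Container` heading yields `#build-container`. A rough sketch of that mapping (an approximation, not GitHub's exact implementation):

```python
import re

def github_slug(heading: str) -> str:
    """Approximate GitHub's heading-to-anchor rule:
    lowercase, strip punctuation, spaces become hyphens."""
    slug = heading.strip().lower()
    slug = re.sub(r"[^\w\- ]", "", slug)  # drop punctuation
    return slug.replace(" ", "-")

print(github_slug("Build Container"))       # build-container
print(github_slug("Run the environment"))   # run-the-environment
```

This is why the links resolve only if a heading slugging to exactly `build-container` / `run-container` (or an explicit `<a name=...>` anchor) exists in the target README.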
Overview:
Cherry-pick: #2625
Prevent crashloops in helloworld k8s example
nvbug: https://nvbugspro.nvidia.com/bug/5471412
closes: DYN-926
Summary by CodeRabbit