Conversation
@biswapanda biswapanda commented Aug 22, 2025

Overview:

Cherry-pick: #2625

Prevent crashloops in helloworld k8s example

nvbug: https://nvbugspro.nvidia.com/bug/5471412
closes: DYN-926

Summary by CodeRabbit

  • New Features
    • Added TensorRT-LLM Kubernetes deployment templates (agg, router, disagg, disagg-router).
    • Operator now supports Grove termination delay and auto-detects Grove availability.
    • New multi-stage Dockerfile and build flag to enable KVBM; updated images and runtime tooling.
  • Documentation
    • Major docs revamp: new backend guides (vLLM, SGLang, TRT-LLM), Kubernetes deploy guides, multinode examples, Quickstart/Helm updates, and Support Matrix.
  • Bug Fixes
    • More robust config handling and numerous link/path corrections across docs and scripts.
  • Chores
    • Version bump to 0.4.0+post0.
    • Expanded third-party license attributions.

alec-flowers and others added 30 commits July 30, 2025 22:07
Co-authored-by: Anant Sharma <anants@nvidia.com>
Co-authored-by: Ishan Dhanani <idhanani@nvidia.com>
Co-authored-by: Tanmay Verma <tanmayv@nvidia.com>
Co-authored-by: Dmitry Tokarev <dtokarev@nvidia.com>
…link (#2239)

Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Anish <80174047+athreesh@users.noreply.github.com>
…2260)

Signed-off-by: Biswa Panda <biswa.panda@gmail.com>
…#2319)

Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>
Co-authored-by: Kristen Kelleher <kkelleher@nvidia.com>
Co-authored-by: Biswa Panda <biswa.panda@gmail.com>
Co-authored-by: Neal Vaidya <nealv@nvidia.com>
Signed-off-by: Anish <80174047+athreesh@users.noreply.github.com>

copy-pr-bot bot commented Aug 22, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.


coderabbitai bot commented Aug 22, 2025

Walkthrough

This PR adds extensive documentation, deployment manifests, and container build updates across backends (vLLM, SGLang, TensorRT-LLM), introduces ETCD-backed port allocation for vLLM, refactors Grove settings in the operator/Helm, updates TRT-LLM engine configs, and revises numerous docs/links. Minor code changes include safer dict access in TRT-LLM and operator wiring for Grove detection/termination delay.
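The "safer dict access" change mentioned above can be illustrated with a short sketch. This is not the actual `trtllm/main.py` code (the function and key names here are illustrative); it only shows why a key-presence check is preferred over a truthiness check when explicit falsy values like `0` must be preserved:

```python
def resolve_event_buffer(kv_cache_config: dict, default: int = 1024) -> int:
    """Illustrative sketch: pick an event buffer size from a config dict.

    A truthiness check (`if kv_cache_config.get("event_buffer_max_size"):`)
    would silently replace an explicit 0 with the default. Checking key
    presence respects any explicitly set value, including falsy ones.
    """
    if "event_buffer_max_size" in kv_cache_config:
        return kv_cache_config["event_buffer_max_size"]
    return default
```

For example, `resolve_event_buffer({"event_buffer_max_size": 0})` returns `0` (events disabled by the user), while `resolve_event_buffer({})` falls back to the default.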

Changes

Cohort / File(s) Summary
Licenses
ATTRIBUTIONS-Go.md
Appends many third‑party license attributions (MIT/Apache/BSD/ISC); documentation only.
Versioning
Cargo.toml
Bumps workspace version to 0.4.0+post0.
Top-level README and Docs Restructure
README.md, docs/index.rst, docs/hidden_toctree.rst, docs/guides/dynamo_deploy/*, docs/support_matrix.md, docs/examples/*, docs/runtime/README.md, docs/architecture/*, deploy/helm/README.md, deploy/cloud/README.md
Reorganizes navigation, adds support matrix link, updates guides/paths, quickstart version bump, new Helm guide, link fixes, adds Hello World example README.
SGLang Docs & Manifests
components/backends/sglang/README.md, .../deploy/README.md, .../deploy/disagg.yaml, .../docs/*, .../launch/agg_router.sh, .../slurm_jobs/*
Path/link fixes, Quick Start/containers, multi-node/WideEP docs, decode worker entrypoint update, Docker-compose/run examples, SLURM docs and template/env tweaks.
SGLang Containers
container/Dockerfile.sglang, container/Dockerfile.sglang-wideep
Update NIXL refs, add jq/curl, adjust PYTHONPATH/removals, add flashinfer pre-release and requirements installs; non-editable install in wideep.
TRT-LLM K8s Deploy & Docs
components/backends/trtllm/README.md, .../deploy/*.yaml, docs/components/backends/trtllm/*
Adds agg/agg_router/disagg/disagg_router manifests, new README and deploy guide, KV transfer doc link tweak.
TRT-LLM Engine Configs
components/backends/trtllm/engine_configs/**/*
Normalize CUDA graph config under cuda_graph_config, move kv_cache_dtype to kv_cache_config.dtype, add cache_transceiver_config, moe_config restructure, value tweaks (e.g., free_gpu_memory_fraction).
TRT-LLM Runtime Code
components/backends/trtllm/src/dynamo/trtllm/main.py
Change kv_cache_config event buffer check to dict-key presence; preserves falsy explicit values.
TRT-LLM Container
container/Dockerfile.tensorrt_llm
Update NIXL ref, runtime env (ARCH_ALT, LD_LIBRARY_PATH/PATH), add jq, wheelhouse path changes, pin Triton 3.3.1 on amd64, add env script and CI artifacts.
vLLM Runtime – Ports
components/backends/vllm/src/dynamo/vllm/ports.py, .../args.py, docs/components/backends/vllm/*, components/backends/vllm/README.md
Introduces ETCD-backed port allocation (single/block), host IP util, CLI port range, removes local alloc funcs; updates docs and README.
vLLM Container & Install
container/Dockerfile.vllm, container/deps/vllm/install_vllm.sh
Add ENABLE_KVBM build-arg, pin DeepGEMM/NIXL, switch install modes, add jq; vLLM/FlashInfer install flow updates and Torch pin for amd64.
Generic Container Infra
container/Dockerfile (new), container/Dockerfile.none (removed), container/build.sh
New multi-stage base/wheel/dev Dockerfile; drop old NONE Dockerfile; add ENABLE_KVBM flag; update defaults (TRT-LLM, NIXL, base images), paths, and build args.
Operator & Helm – Grove Refactor
deploy/cloud/operator/cmd/main.go, .../internal/controller_common/*, .../internal/dynamo/*, .../helm/*
Replace enableGrove with GroveConfig{Enabled, TerminationDelay}, auto-detect Grove via discovery, propagate terminationDelay to PodGangSet, adjust Helm values/templates/RBAC and deploy script.
Inference Gateway Docs
deploy/inference-gateway/README.md
Adds explicit model deploy step and GAIE chart install instructions; quickstart link fix.
Minor Doc Fixes/Removals
benchmarks/llm/README.md, components/README.md, components/backends/llama_cpp/README.md, docs/API/nixl_connect/*, docs/components/backends/llm/README.md, docs/guides/dynamo_deploy/operator_deployment.md, docs/architecture/kv_cache_routing.md, examples/basics/disaggregated_serving/README.md
Link/path corrections, small wording changes, remove obsolete pages/entries.

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Worker as vLLM Worker
  participant Args as CLI Args Parser
  participant Ports as Ports Module
  participant ETCD as Etcd KV
  participant OS as Host OS

  Args->>Worker: parse(--port-min/--port-max)
  Worker->>Ports: configure_ports_with_etcd(port_range, etcd_ctx, tp/dp)
  alt KV events enabled
    Ports->>ETCD: reserve kv_port (metadata)
  end
  Ports->>OS: check_port_available()
  loop allocate block (tp or 1)
    Ports->>ETCD: reserve contiguous block (metadata with block_info)
    ETCD-->>Ports: success/fail
  end
  Ports-->>Worker: kv_port, side_channel_port(s)
  Worker->>OS: set env (host_ip, ports)
```
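The port-allocation flow in the diagram above can be sketched in Python. This is a simplified stand-in, not the actual `ports.py` implementation: `FakeKV` imitates etcd's atomic create-if-absent transaction with an in-memory dict, and all names (`reserve_block`, the `ports/` key prefix) are illustrative:

```python
import socket

class FakeKV:
    """In-memory stand-in for an etcd client; create() succeeds only if the
    key does not already exist, imitating an atomic etcd transaction."""
    def __init__(self):
        self._store = {}

    def create(self, key: str, value: str) -> bool:
        if key in self._store:
            return False
        self._store[key] = value
        return True

    def delete(self, key: str) -> None:
        self._store.pop(key, None)

def port_is_free(port: int) -> bool:
    # Best-effort local check: try to bind the port briefly.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        try:
            s.bind(("127.0.0.1", port))
            return True
        except OSError:
            return False

def reserve_block(kv: FakeKV, worker: str, start: int, end: int, size: int):
    """Reserve `size` contiguous ports in [start, end) via the KV store."""
    for base in range(start, end - size + 1):
        block = list(range(base, base + size))
        if not all(port_is_free(p) for p in block):
            continue
        claimed = []
        for p in block:
            if kv.create(f"ports/{p}", worker):
                claimed.append(p)
            else:
                # Another worker won the race; roll back and try the next base.
                for q in claimed:
                    kv.delete(f"ports/{q}")
                break
        else:
            return block
    raise RuntimeError("no contiguous port block available")
```

Two workers sharing the same range get disjoint blocks because the KV reservation, not the local bind check, is the source of truth.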
```mermaid
sequenceDiagram
  autonumber
  participant Main as Operator Main
  participant Disc as K8s Discovery
  participant Cfg as GroveConfig
  participant Reconciler as Controller
  participant Grove as Grove PodGangSet

  Main->>Cfg: set TerminationDelay (flag/default)
  Main->>Disc: DetectGroveAvailability()
  Disc-->>Main: true/false
  Main->>Cfg: Enabled = result
  alt Grove Enabled
    Reconciler->>Grove: Reconcile with TerminationDelay
  else
    Reconciler->>Reconciler: Reconcile Dynamo components only
  end
```
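The `GroveConfig` refactor shown above (a struct bundling enablement with a termination delay, where enablement comes from cluster discovery rather than a hand-set flag) can be sketched as follows. The real type is Go in the operator; this Python sketch only mirrors the shape, and the `grove.io` API group name is an assumption for illustration:

```python
from dataclasses import dataclass

@dataclass
class GroveConfig:
    """Illustrative mirror of the operator's GroveConfig: replaces a lone
    enableGrove boolean with enablement plus a termination delay."""
    enabled: bool = False
    termination_delay_seconds: int = 0

def detect_grove(available_api_groups: set) -> bool:
    # Stand-in for discovery-based detection: Grove is considered available
    # when its API group is served by the cluster. The group name here is
    # an assumption, not taken from the PR.
    return "grove.io" in available_api_groups

def build_config(available_api_groups: set, delay: int) -> GroveConfig:
    return GroveConfig(
        enabled=detect_grove(available_api_groups),
        termination_delay_seconds=delay,
    )
```

The benefit over a bare flag: the reconciler receives one value object, so a Grove-enabled path always has a termination delay to propagate to the PodGangSet, and clusters without Grove degrade gracefully.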

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

Poem

I tap my paw on ports that glow,
ETCD whispers, “Here you go.”
Grove trees sway, a timed goodnight,
WideEP moons sail through the night.
Wheels are baked, configs sing,
Frontends hop—request on wing.
Ship it, ship it—thump-thump-spring! 🐇🚀

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 33

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (12)
components/backends/sglang/launch/agg_router.sh (1)

14-20: Fail-fast on namespace clear and interpreter consistency for frontend.

  • The retrieved learning explicitly states clear_namespace must block router startup if it fails. Currently, the script proceeds even if the clear step exits non‑zero (no set -e and no &&). Also, the frontend is launched with python while the rest uses python3. Aligning interpreters prevents PATH/env drift issues.

Apply this diff to enforce fail-fast and interpreter consistency:

@@
-# run clear_namespace
-python3 -m dynamo.sglang.utils.clear_namespace --namespace dynamo
+# run clear_namespace (must succeed before starting the frontend)
+python3 -m dynamo.sglang.utils.clear_namespace --namespace dynamo || {
+  echo "[ERROR] clear_namespace failed; aborting startup." >&2
+  exit 1
+}
@@
-# run ingress
-python -m dynamo.frontend --router-mode kv --http-port=8000 &
+# run ingress
+python3 -m dynamo.frontend --router-mode kv --http-port=8000 &
container/Dockerfile.sglang-wideep (1)

108-114: Supply-chain risk: rustup-init downloaded without integrity verification.

An executable is fetched and run without checksum/signature validation. This is a critical supply-chain risk.

Apply this diff to add SHA256 verification (Rust provides .sha256 files per artifact):

 ARG RUSTARCH=${ARCH_ALT}-unknown-linux-gnu
@@
 RUN wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init" && \
-    # TODO: Add SHA check back based on RUSTARCH
+    wget --tries=3 --waitretry=5 "https://static.rust-lang.org/rustup/archive/1.28.1/${RUSTARCH}/rustup-init.sha256" && \
+    # Verify checksum before executing
+    echo "$(cat rustup-init.sha256)  rustup-init" | sha256sum -c - && \
     chmod +x rustup-init && \
     ./rustup-init -y --no-modify-path --profile minimal --default-toolchain $RUST_VERSION --default-host ${RUSTARCH} && \
-    rm rustup-init && \
+    rm rustup-init rustup-init.sha256 && \
     chmod -R a+w $RUSTUP_HOME $CARGO_HOME
deploy/inference-gateway/README.md (3)

41-45: Typo: “Inferenece” → “Inference”.

User-facing typo in the CRD description.

-b. Install the Inference Extension CRDs (Inferenece Model and Inference Pool CRDs)
+b. Install the Inference Extension CRDs (InferenceModel and InferencePool CRDs)

84-91: Inconsistency: referenced config file vs. helm -f value.

Text mentions inference-gateway-resources.yaml but the helm command uses vllm_agg_qwen.yaml. Align the filename or clarify roles (values vs. manifest).

-The Inference Gateway is configured through the `inference-gateway-resources.yaml` file.
+The Inference Gateway is configured via a values file (e.g., `vllm_agg_qwen.yaml`) passed to the Helm chart.
@@
-helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml
+helm install dynamo-gaie ./helm/dynamo-gaie -n my-model -f ./vllm_agg_qwen.yaml

If there is indeed a standalone manifest named inference-gateway-resources.yaml, either rename this reference or add a note explaining how it differs from the values file.


137-146: Fix typos and broken kubectl command when populating GATEWAY_URL.

  • “User” → “Use”; “alternateive” → “alternative”.
  • kubectl uses two -o flags; only the last one applies. This breaks the command.
- a. User minikube tunnel to expose the gateway to the host
-    This requires `sudo` access to the host machine. alternatively, you can use port-forward to expose the gateway to the host as shown in alternateive (b).
+ a. Use minikube tunnel to expose the gateway to the host
+    This requires `sudo` access to the host machine. Alternatively, you can use port-forward to expose the gateway to the host as shown in alternative (b).
@@
-GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o yaml -o jsonpath='{.spec.clusterIP}')
+GATEWAY_URL=$(kubectl get svc inference-gateway -n my-model -o jsonpath='{.spec.clusterIP}')
ATTRIBUTIONS-Go.md (1)

10381-10826: Update SPDX identifiers and add code fence languages

I ran the provided verification script and found the following issues:

  • Non-SPDX identifiers (must be updated to “Apache-2.0”):
    • Line 10411 (sigs.k8s.io/randfill)
    • Line 10620 (github.com/NVIDIA/grove/operator/api)

  • Fenced code blocks without a language (MD040):
There are numerous occurrences of bare fences throughout ATTRIBUTIONS-Go.md. All license blocks should use a language tag (e.g. `text`) for consistency and to satisfy markdown linters.

The license bodies themselves (MIT for blang/semver/v4 and Apache 2.0 for the other two) match the standard upstream texts and require no content changes, but please manually confirm they exactly match the source repositories.

Next steps:

  • Replace
    License Identifier: Apache 2.0
    with
    License Identifier: Apache-2.0
    in both newly-added Apache blocks.

  • Update all fences around license texts to use the `text` language tag.

  • (Optional) Enhance the generator to:

    • Emit SPDX-normalized identifiers (MIT, Apache-2.0, etc.).
    • Always include a language on code fences (e.g. ```text).
    • Fail the generation step if any license text retrieval fails.
components/backends/sglang/README.md (1)

119-121: Typo: “conjuction” → “conjunction”.

-Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead, we first route to a random prefill worker, select a random decode worker, and then send the request to both. Internally, SGLang's bootstrap server (which is a part of the `tokenizer_manager`) is used in conjuction with NIXL to handle the kv transfer.
+Because Dynamo has a discovery mechanism, we do not use a load balancer. Instead, we first route to a random prefill worker, select a random decode worker, and then send the request to both. Internally, SGLang's bootstrap server (which is a part of the `tokenizer_manager`) is used in conjunction with NIXL to handle the KV transfer.
container/Dockerfile.tensorrt_llm (2)

129-139: Supply-chain hardening: add checksum/signature verification for downloaded artifacts.

wget/curl downloads for NATS, etcd, and rustup-init occur without integrity checks.

  • Fetch and verify SHA256 (or GPG signature) for each artifact.
  • Example for etcd:
-RUN wget https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+RUN set -euo pipefail; \
+    wget -q https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz -O /tmp/etcd.tar.gz && \
+    wget -q https://github.com/etcd-io/etcd/releases/download/$ETCD_VERSION/sha256sum.txt -O /tmp/etcd.sha256 && \
+    (cd /tmp && grep "etcd-$ETCD_VERSION-linux-${ARCH}.tar.gz" etcd.sha256 | sha256sum -c -) && \
     mkdir -p /usr/local/bin/etcd && \
     tar -xvf /tmp/etcd.tar.gz -C /usr/local/bin/etcd --strip-components=1 && \
     rm /tmp/etcd.tar.gz

Do similarly for NATS and rustup.

Also applies to: 205-213, 263-268


145-175: Ensure all wheel references point to /workspace/wheelhouse/ instead of legacy paths

The repository still contains numerous references to /workspace/dist and /workspace/wheels/nixl that must be updated now that artifacts are staged under /workspace/wheelhouse/. Please update downstream CI/job scripts, Dockerfiles, and the Earthfile to use the new location:

• container/Dockerfile.vllm
– Lines 178–185: uv build . --out-dir /workspace/wheels/nixl and uv pip install /workspace/wheels/nixl/*.whl
– Lines 370–378: uv build --wheel --out-dir /workspace/dist …
– Lines 409–410: uv pip install /workspace/dist/…
– Line 474: COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/ (should use /workspace/wheelhouse/)

• container/Dockerfile.tensorrt_llm
– Lines 234–241: uv build . --out-dir /workspace/wheels/nixl and uv pip install /workspace/wheels/nixl/*.whl
– Lines 301–307: uv build --wheel --out-dir /workspace/dist …
– Lines 332–333: uv pip install /workspace/dist/…
– Lines 497–498: COPY --from=dev /workspace/wheels/nixl/*.whl /workspace/wheelhouse/ and COPY --from=wheel_builder /workspace/dist/*.whl /workspace/wheelhouse/

• container/Dockerfile.sglang
– Lines 161–168: uv build . --out-dir /workspace/wheels/nixl and uv pip install /workspace/wheels/nixl/*.whl
– Lines 338–343: uv build --wheel --out-dir /workspace/dist …
– Lines 373–374: uv pip install /workspace/dist/…
– Lines 449–450: COPY --from=wheel_builder /workspace/dist/*.whl wheelhouse/ and COPY --from=base /workspace/wheels/nixl/*.whl wheelhouse/

• Earthfile
– Lines 119–121: uv build --wheel --out-dir /workspace/dist …
– Lines 124–126: SAVE ARTIFACT /workspace/dist/*.whl

Please refactor these paths to /workspace/wheelhouse/ so that all wheel-producing and consuming steps are consistent.

docs/components/backends/trtllm/kv-cache-tranfer.md (1)

1-1: Fix filename typo to avoid broken links: “kv-cache-tranfer.md” ➜ “kv-cache-transfer.md”

“tranfer” is misspelled. This is user-facing and likely referenced elsewhere; keeping the misspelling risks 404s and SEO issues.

Rename and update inbound references:

#!/bin/bash
set -euo pipefail

# Rename the file
git mv docs/components/backends/trtllm/kv-cache-tranfer.md docs/components/backends/trtllm/kv-cache-transfer.md

# Update references across the repo
rg -nP 'kv-cache-tranfer\.md' -S -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**'
sed -i.bak 's/kv-cache-tranfer\.md/kv-cache-transfer.md/g' $(rg -lP 'kv-cache-tranfer\.md' -S -g '!**/node_modules/**' -g '!**/dist/**' -g '!**/build/**')
find . -name "*.bak" -delete

If your docs tooling supports redirects (e.g., Sphinx with redirects or mkdocs-redirects), consider adding one from the old path to the new path to keep external links working. Do you want me to add a redirect config in this PR?

deploy/cloud/helm/deploy.sh (1)

167-173: Do not print generated-values.yaml; it may leak secrets.

Immediately after envsubst, the script cats generated-values.yaml. If the values file contains credentials (docker or pipelines), this prints them to stdout and potentially CI logs. Remove the dump or print a redacted version.

Apply:

-echo "generated file contents:"
-cat generated-values.yaml
+echo "Generated values file created at generated-values.yaml"
+echo "(contents withheld to avoid leaking credentials; open locally if needed)"
container/Dockerfile.vllm (1)

363-377: Gate the cargo build features with ENABLE_KVBM to avoid feature skew

You correctly gate maturin with ENABLE_KVBM, but the prior cargo build always enables dynamo-llm/block-manager. This can produce mismatched artifacts when ENABLE_KVBM=false.

-RUN cargo build \
-	--release \
-	--locked \
-	--features dynamo-llm/block-manager \
-	--workspace
+RUN if [ "$ENABLE_KVBM" = "true" ]; then \
+      cargo build --release --locked --features dynamo-llm/block-manager --workspace; \
+    else \
+      cargo build --release --locked --workspace; \
+    fi


[workspace.package]
version = "0.4.0"
version = "0.4.0+post0"

💡 Verification agent

🧩 Analysis chain

Workspace version bump may desync local crate versions

Bumping [workspace.package].version to 0.4.0+post0 while workspace.dependencies pin local crates at 0.4.0 can cause version mismatches if member crates inherit the workspace version or if any member’s package.version is updated without updating these dependency constraints.

Choose one consistent approach:

  • Option A (quick): keep workspace at 0.4.0 for now.
  • Option B (preferred): bump local crate versions and dependency pins to 0.4.0+post0.

If going with Option B, update the local workspace dependency versions:

 [workspace.dependencies]
 # Local crates
-dynamo-runtime = { path = "lib/runtime", version = "0.4.0" }
-dynamo-llm = { path = "lib/llm", version = "0.4.0" }
-dynamo-tokens = { path = "lib/tokens", version = "0.4.0" }
+dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
+dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
+dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }

I can also generate a script to scan member crates for version.workspace = true and align everything if helpful.


🏁 Script executed:

#!/bin/bash
set -eux

# Inspect root Cargo.toml for workspace.package version
echo "===== [workspace.package] block in root Cargo.toml ====="
sed -n -n '/^\[workspace.package\]/,/^\[/{p}' Cargo.toml || true

# Inspect root Cargo.toml for workspace.dependencies block
echo -e "\n===== [workspace.dependencies] block in root Cargo.toml ====="
sed -n -n '/^\[workspace.dependencies\]/,/^\[/{p}' Cargo.toml || true

# Inspect member crate Cargo.toml files
for crate in lib/runtime lib/llm lib/tokens; do
  echo -e "\n===== $crate/Cargo.toml ====="
  if [ -f "$crate/Cargo.toml" ]; then
    # Show [package] block
    echo "---- [package] block ----"
    sed -n -n '/^\[package\]/,/^\[/{p}' "$crate/Cargo.toml"
    # Show [dependencies] block
    echo "---- [dependencies] block ----"
    sed -n -n '/^\[dependencies\]/,/^\[/{p}' "$crate/Cargo.toml"
  else
    echo "$crate/Cargo.toml not found"
  fi
done

Length of output: 8765


Action Required: Align local crate dependency versions with the bumped workspace version

The root Cargo.toml now sets

[workspace.package]
version = "0.4.0+post0"

but under

[workspace.dependencies]

the local crates are still pinned at 0.4.0. Since each member crate uses version.workspace = true, their actual version becomes 0.4.0+post0, which will conflict with the pinned 0.4.0 in workspace.dependencies. You have two choices:

  • Option A: Revert the workspace version bump back to "0.4.0".
  • Option B (preferred): Update the local crate entries in [workspace.dependencies] to match 0.4.0+post0.

If you proceed with Option B, apply this diff in root Cargo.toml:

 [workspace.dependencies]
 # Local crates
-dynamo-runtime = { path = "lib/runtime", version = "0.4.0" }
-dynamo-llm      = { path = "lib/llm",     version = "0.4.0" }
-dynamo-tokens   = { path = "lib/tokens",  version = "0.4.0" }
+dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
+dynamo-llm      = { path = "lib/llm",     version = "0.4.0+post0" }
+dynamo-tokens   = { path = "lib/tokens",  version = "0.4.0+post0" }

• Root manifest: Cargo.toml
• Lines: update the three dynamo-* entries under [workspace.dependencies]

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
version = "0.4.0+post0"
[workspace.dependencies]
# Local crates
dynamo-runtime = { path = "lib/runtime", version = "0.4.0+post0" }
dynamo-llm = { path = "lib/llm", version = "0.4.0+post0" }
dynamo-tokens = { path = "lib/tokens", version = "0.4.0+post0" }
🤖 Prompt for AI Agents
In Cargo.toml around line 18, the workspace package version was bumped to
"0.4.0+post0" but the local crate entries under [workspace.dependencies] remain
pinned to "0.4.0", causing a version mismatch; update the three dynamo-* entries
under [workspace.dependencies] to use version "0.4.0+post0" (or alternatively
revert the workspace.package.version to "0.4.0" if you prefer Option A).

Comment on lines +109 to +116
First, create a secret for the HuggingFace token.
```bash
export HF_TOKEN=your_hf_token
kubectl create secret generic hf-token-secret \
--from-literal=HF_TOKEN=${HF_TOKEN} \
-n ${NAMESPACE}
```


⚠️ Potential issue

Define $NAMESPACE before use.

Add export to prevent kubectl errors when copying commands.

 First, create a secret for the HuggingFace token.
 ```bash
+export NAMESPACE=<your-k8s-namespace>
 export HF_TOKEN=your_hf_token
 kubectl create secret generic hf-token-secret \
   --from-literal=HF_TOKEN=${HF_TOKEN} \
   -n ${NAMESPACE}
 ```

🤖 Prompt for AI Agents

In components/backends/sglang/deploy/README.md around lines 109 to 116, the
snippet uses ${NAMESPACE} but never defines or exports it and could cause
kubectl errors when copied; update the docs to instruct users to define and
export the NAMESPACE variable first (e.g., export NAMESPACE=<your-k8s-namespace>)
before exporting HF_TOKEN and running kubectl create secret so the commands
work when copied into a shell.

# Running DeepSeek-R1 Disaggregated with WideEP on H100s

Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://www.nvidia.com/en-us/technologies/ai/deepseek-r1-large-scale-p-d-with-wide-expert-parallelism/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).

⚠️ Potential issue

Inconsistent Dockerfile name: “sglang-deepep” vs build command using “sglang-wideep”.

The text references container/Dockerfile.sglang-deepep, but the build command uses Dockerfile.sglang-wideep. Align the filename to avoid user confusion.

-... We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` ...
+... We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` ...
📝 Committable suggestion


Suggested change
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-deepep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
Dynamo supports SGLang's implementation of wide expert parallelism and large scale P/D for DeepSeek-R1! You can read their blog post [here](https://lmsys.org/blog/2025-05-05-large-scale-ep/) for more details. We provide a Dockerfile for this in `container/Dockerfile.sglang-wideep` and configurations to deploy this at scale. In this example, we will run 1 prefill worker on 4 H100 nodes and 1 decode worker on 9 H100 nodes (104 total GPUs).
🤖 Prompt for AI Agents
In components/backends/sglang/docs/dsr1-wideep-h100.md around line 8 the
referenced Dockerfile name is inconsistent: the prose mentions
container/Dockerfile.sglang-deepep while the build command uses
Dockerfile.sglang-wideep; update one to match the other so they are identical
(choose the canonical filename used in the repo, e.g., rename references to
container/Dockerfile.sglang-wideep if that file exists or change the build
command to use container/Dockerfile.sglang-deepep) and ensure all occurrences in
this document use the same filename to avoid confusion.

Comment on lines +46 to 51
| Feature | SGLang | Notes |
|---------------------|--------|--------------------------------------------------------------|
| **WideEP** | ✅ | Full support on H100s/GB200 |
| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker |
| **GB200 Support** | ✅ | |


⚠️ Potential issue

Fix grammar: “router” → “route” to DP worker.

Small but user‑visible in the Feature Matrix.

-| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not router to DP worker |
+| **DP Rank Routing** | 🚧     | Direct routing supported. Dynamo KV router does not route to DP worker |
📝 Committable suggestion


Suggested change
| Feature | SGLang | Notes |
|---------------------|--------|--------------------------------------------------------------|
| **WideEP** || Full support on H100s/GB200 |
| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not router to DP worker |
| **GB200 Support** || |
| Feature | SGLang | Notes |
|---------------------|--------|--------------------------------------------------------------|
| **WideEP** || Full support on H100s/GB200 |
| **DP Rank Routing** | 🚧 | Direct routing supported. Dynamo KV router does not route to DP worker |
| **GB200 Support** || |
🧰 Tools
🪛 LanguageTool

[grammar] ~49-~49: There might be a mistake here.
Context: ...KV router does not router to DP worker | | GB200 Support | ✅ | ...

(QB_NEW_EN)

🤖 Prompt for AI Agents
In components/backends/sglang/README.md around lines 46 to 51, the
feature-matrix note uses the incorrect verb "router" in "Dynamo KV router does
not router to DP worker"; update the phrase to use the correct verb "route"
(e.g., "Dynamo KV router does not route to DP worker") and ensure surrounding
punctuation/capitalization remains consistent with the table style.

Comment on lines +55 to 56
Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
### Start NATS and ETCD in the background

⚠️ Potential issue

Fix wording: extra “the” in Quick Start intro.

-Below we provide a guide that lets you run all of our the common deployment patterns on a single node.
+Below we provide a guide that lets you run all of our common deployment patterns on a single node.
🧰 Tools
🪛 LanguageTool

[grammar] ~55-~55: There might be a mistake here.
Context: ...on deployment patterns on a single node. ### Start NATS and ETCD in the background S...

(QB_NEW_EN)

🤖 Prompt for AI Agents
components/backends/sglang/README.md around lines 55 to 56: The Quick Start
intro contains an extra definite article ("our the") — update the sentence
"Below we provide a guide that lets you run all of our the common deployment
patterns on a single node." to remove the stray "the" so it reads correctly
(e.g., "Below we provide a guide that lets you run all of our common deployment
patterns on a single node.").

Comment on lines +240 to +241
For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).


⚠️ Potential issue

Broken link: kv-cache-tranfer.md → kv-cache-transfer.md

Typo in the filename “tranfer” breaks the link.

Apply:

-For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-tranfer.md).
+For detailed configuration instructions, see the [KV cache transfer guide](../kv-cache-transfer.md).
🤖 Prompt for AI Agents
In docs/components/backends/trtllm/deploy/README.md around lines 240 to 241, the
link references "../kv-cache-tranfer.md" which contains a typo in the filename;
update the link target to "../kv-cache-transfer.md" so it points to the correct
file name and verify the linked file exists and the relative path is correct.
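Filename typos like this are easy to catch automatically with a small relative-link checker; a sketch (the regex covers plain `](...)`-style Markdown links to `.md` files only, and is not part of any existing tooling):

```python
import os
import re

def broken_md_links(md_text, base_dir):
    """Return relative .md link targets that don't resolve under base_dir."""
    targets = re.findall(r"\]\(([^)\s#]+\.md)(?:#[^)]*)?\)", md_text)
    missing = []
    for t in targets:
        if t.startswith(("http://", "https://")):
            continue  # only check relative links
        if not os.path.exists(os.path.normpath(os.path.join(base_dir, t))):
            missing.append(t)
    return missing
```

Run over the docs tree, this would flag `../kv-cache-tranfer.md` when only `kv-cache-transfer.md` exists.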

Comment on lines +167 to +168
kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
```

⚠️ Potential issue

Fix port mismatch between port-forward and curl example.

You port-forward 8000:8000 but later curl localhost:8080. Recommend forwarding 8080:8000 to keep curl unchanged.

-kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8000:8000
+kubectl port-forward deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000
🤖 Prompt for AI Agents
In docs/components/backends/vllm/deploy/README.md around lines 167 to 168, the
kubectl port-forward command forwards 8000:8000 while the curl example targets
localhost:8080; update the port-forward to kubectl port-forward
deployment/vllm-v1-disagg-frontend-<pod-uuid-info> 8080:8000 so local port 8080
maps to container port 8000 (or alternatively change the curl to use :8000) to
make the example ports consistent.

Comment on lines +72 to +96
### Multi-node Disaggregated Serving

Deploy prefill and decode workers on separate nodes for optimized resource utilization:

**Node 1**: Run ingress and prefill workers
```bash
# Start ingress
python -m dynamo.frontend --router-mode kv &

# Start prefill worker
python -m dynamo.vllm \
--model meta-llama/Llama-3.3-70B-Instruct
--tensor-parallel-size 8 \
--enforce-eager
```

**Node 2**: Run decode workers
```bash
# Start decode worker
python -m dynamo.vllm \
--model meta-llama/Llama-3.3-70B-Instruct
--tensor-parallel-size 8 \
--enforce-eager \
--is-prefill-worker
```

⚠️ Potential issue

Decode worker is started with --is-prefill-worker (wrong flag)

Node 2 is labeled “decode worker” but passes --is-prefill-worker. That will misroute roles.

Apply this diff to fix roles and make flags explicit on both ends:

-# Start prefill worker
-python -m dynamo.vllm \
-  --model meta-llama/Llama-3.3-70B-Instruct
-  --tensor-parallel-size 8 \
-  --enforce-eager
+# Start prefill worker
+python -m dynamo.vllm \
+  --model meta-llama/Llama-3.3-70B-Instruct \
+  --tensor-parallel-size 8 \
+  --enforce-eager \
+  --is-prefill-worker
@@
-# Start decode worker
-python -m dynamo.vllm \
-  --model meta-llama/Llama-3.3-70B-Instruct
-  --tensor-parallel-size 8 \
-  --enforce-eager \
-  --is-prefill-worker
+# Start decode worker
+python -m dynamo.vllm \
+  --model meta-llama/Llama-3.3-70B-Instruct \
+  --tensor-parallel-size 8 \
+  --enforce-eager
🤖 Prompt for AI Agents
In docs/components/backends/vllm/multi-node.md around lines 72–96, the Node 2
"decode worker" example incorrectly uses the --is-prefill-worker flag, which
will misroute roles; update the examples so that --is-prefill-worker is passed
to the Node 1 prefill worker invocation and removed from the Node 2 invocation,
so Node 1 is explicitly a prefill worker and Node 2 runs as a decode worker.
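The role split above comes down to a single boolean flag. A minimal argparse sketch (flag names follow the docs snippet, not necessarily the real `dynamo.vllm` CLI):

```python
import argparse

def worker_role(argv):
    """Derive the worker role from CLI flags, mirroring the example above
    (flag names follow the docs snippet, not necessarily the real CLI)."""
    p = argparse.ArgumentParser(prog="dynamo.vllm-sketch")
    p.add_argument("--model", required=True)
    p.add_argument("--tensor-parallel-size", type=int, default=1)
    p.add_argument("--enforce-eager", action="store_true")
    # Presence of the flag marks a prefill worker; absence means decode.
    p.add_argument("--is-prefill-worker", action="store_true")
    args = p.parse_args(argv)
    return "prefill" if args.is_prefill_worker else "decode"

# Node 1 (prefill) vs. Node 2 (decode) from the corrected example:
print(worker_role(["--model", "m", "--tensor-parallel-size", "8",
                   "--enforce-eager", "--is-prefill-worker"]))  # prefill
print(worker_role(["--model", "m", "--tensor-parallel-size", "8",
                   "--enforce-eager"]))                         # decode
```

This makes the failure mode concrete: passing `--is-prefill-worker` on the node meant to decode silently flips its role.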

Comment on lines +1 to +2
:orphan:


🛠️ Refactor suggestion

⚠️ Potential issue

MyST/Sphinx: replace :orphan: with supported front matter in Markdown

In Markdown (MyST), :orphan: may not be interpreted and can leak into the rendered page. Use YAML front matter instead:

Apply:

-:orphan:
-
+---
+orphan: true
+---

Also ensure this page is reachable via your toctree or intentionally “hidden” via nav; otherwise Sphinx may still warn depending on your build config.

🤖 Prompt for AI Agents
In docs/guides/dynamo_deploy/gke_setup.md lines 1-2, remove the standalone
":orphan:" directive and replace it with YAML front matter at the top of the
file declaring the page as orphaned (add a YAML block containing an orphan: true
key), then ensure this page is either linked from your toctree or intentionally
excluded/hidden in your site nav so Sphinx doesn't emit warnings during the
build.
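A small check like the following (a sketch, not part of any existing build tooling) can verify a MyST page declares the front matter correctly:

```python
def has_orphan_front_matter(text):
    """True if the page opens with YAML front matter containing `orphan: true`."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return False
    try:
        end = lines.index("---", 1)  # closing fence of the front matter
    except ValueError:
        return False
    return any(line.strip() == "orphan: true" for line in lines[1:end])

page = "---\norphan: true\n---\n# GKE setup\n"
print(has_orphan_front_matter(page))  # True
```

A bare `:orphan:` line fails this check, which matches the concern above: in MyST Markdown the field-list form may render as literal text instead of being consumed as metadata.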

Comment on lines +485 to 490
See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#build-container) to build the dynamo container with TensorRT-LLM.

##### Step 2: Run the environment

-See instructions [here](https://github.com/ai-dynamo/dynamo/blob/main/examples/tensorrt_llm/README.md#run-container) to run the built environment.
+See instructions [here](https://github.com/ai-dynamo/dynamo/tree/main/components/backends/trtllm#run-container) to run the built environment.


💡 Verification agent

🧩 Analysis chain

Updated TRT-LLM links — verify anchors exist in README.

Ensure #build-container and #run-container headings exist in components/backends/trtllm/README.md so these links don’t 404.


🏁 Script executed:

#!/bin/bash
set -euo pipefail
file="components/backends/trtllm/README.md"
echo "Checking anchors in $file"
rg -nP '^\s*#{2,}\s*Build the environment\b' "$file" || echo "Missing 'Build the environment' heading"
rg -nP '^\s*#{2,}\s*Run the environment\b' "$file"   || echo "Missing 'Run the environment' heading"

Length of output: 538


Add missing anchors in components/backends/trtllm/README.md

The links in docs/guides/dynamo_run.md (lines 485–490) point to #build-container and #run-container, but those anchors don’t exist in components/backends/trtllm/README.md, so they will 404. Please update the README to include matching headings or explicit anchors. For example:

• In components/backends/trtllm/README.md, add:

### Build Container
<!-- or: <a name="build-container"></a> -->
Steps to build the Dynamo container with TensorRT-LLM…

• And:

### Run Container
<!-- or: <a name="run-container"></a> -->
Instructions to run the built environment…

Alternatively, adjust the links in dynamo_run.md to match whatever headings are already present.

🤖 Prompt for AI Agents
In docs/guides/dynamo_run.md around lines 485–490 the links target anchors
#build-container and #run-container that don't exist in
components/backends/trtllm/README.md; fix by either adding matching
anchors/heading blocks to components/backends/trtllm/README.md (e.g., add "###
Build Container" and "### Run Container" headings or explicit <a
name="build-container"></a> / <a name="run-container"></a> anchors with the
corresponding build/run instructions) or by updating the links in
docs/guides/dynamo_run.md to point to the actual existing headings in
components/backends/trtllm/README.md so they resolve correctly.
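GitHub derives anchors from headings by lowercasing, dropping punctuation, and replacing spaces with hyphens, so a `### Build Container` heading yields `#build-container`. An approximate sketch of that slug rule (GitHub's full algorithm additionally deduplicates repeated headings by appending `-1`, `-2`, …):

```python
import re

def github_slug(heading):
    """Approximate GitHub's heading-to-anchor slug."""
    s = heading.strip().lower()
    s = re.sub(r"[^\w\- ]", "", s)   # drop punctuation; keep word chars, hyphens, spaces
    return s.replace(" ", "-")

print(github_slug("Build Container"))  # build-container
print(github_slug("Run Container"))    # run-container
```

This is why the proposed `### Build Container` / `### Run Container` headings would make the `#build-container` / `#run-container` links resolve.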

@biswapanda biswapanda closed this Aug 23, 2025