@mohammedabdulwahhab mohammedabdulwahhab commented Aug 26, 2025

Overview:

Cherry-pick of #2727.

Summary by CodeRabbit

  • New Features

    • Kubernetes CRDs add sharedMemory config for /dev/shm (enable/size).
    • Helm charts become componentType-aware (frontend/worker env, ports, health checks); add terminationDelay.
    • New multimodal LLAVA aggregated deployment example.
  • Improvements

    • Default model switched to Qwen/Qwen3-0.6B across samples and launch scripts.
    • Readiness gating prevents requests before model registration; tokenizer init auto-skipped.
    • Container updates: TensorRT-LLM 1.0.0rc6, vLLM 0.10.1.1, base images/UCX pinned; Prometheus included in runtimes.
  • Bug Fixes

    • Clear error when output tokens are absent; example script imports fixed; GPU limits moved to pod level.
  • Documentation

    • New Quickstart (local), Installation, Architecture, Examples; links refreshed; metrics guide updated; support matrix revised.

copy-pr-bot bot commented Aug 26, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@mohammedabdulwahhab mohammedabdulwahhab changed the base branch from main to release/0.4.1 August 26, 2025 22:59
@nv-nmailhot nv-nmailhot merged commit ec66a42 into release/0.4.1 Aug 26, 2025
4 of 5 checks passed
@nv-nmailhot nv-nmailhot deleted the mabdulwahhab/cp-hello-world-fix branch August 26, 2025 23:10
coderabbitai bot commented Aug 26, 2025

Caution

Review failed

Failed to post review comments.

Walkthrough

Broad updates across docs, configs, and code: switch default demo model to Qwen/Qwen3-0.6B; add SGLang readiness gate and tokenizer-init enforcement; refine error handling; add CRD/operator “sharedMemory” support; rework Helm templates for componentType; bump TRT‑LLM/vLLM/UCX versions; remove local async-openai-macros crate; add examples and docs reorg.

Changes

| Cohort / File(s) | Summary |
|---|---|
| **SGLang runtime flow**<br>`components/backends/sglang/src/dynamo/sglang/args.py`, `.../main.py`, `.../register.py`, `.../request_handlers/decode_handler.py` | Auto-enable `skip_tokenizer_init` with a warning; add a readiness gate that queues requests until model registration completes; make `register_llm_with_runtime_config` return `bool`; add defensive handling when `output_ids` is missing. |
| **SGLang deploy/launch**<br>`components/backends/sglang/deploy/*.yaml`, `components/backends/sglang/launch/*.sh`, `components/backends/sglang/slurm_jobs/scripts/*`, `components/backends/sglang/README.md`, `.../deploy/README.md`, `.../docs/*` | Switch the model to Qwen/Qwen3-0.6B across examples; add flags such as `--skip-tokenizer-init`, kv-events config, and disaggregation options; minor namespace/type/link fixes; update the hicache flag to `--hicache-ratio`. |
| **TRT-LLM configs and docs**<br>`components/backends/trtllm/deploy/*.yaml`, `components/backends/trtllm/engine_configs/llama4/eagle/*`, `components/backends/trtllm/README.md`, `.../gpt-oss.md`, `.../gemma3_sliding_window_attention.md`, `.../launch/*.sh` | Point deployments to Qwen/Qwen3-0.6B; adjust Eagle configs (delete some, tweak others including `cuda_graph_config` and token limits); consolidate multimodal docs into an external guide; add readiness/health docs. |
| **vLLM updates**<br>`components/backends/vllm/deploy/agg_router.yaml`, `container/Dockerfile.vllm`, `container/deps/vllm/install_vllm.sh` | Move the GPU limit to pod level; bump the vLLM ref to 0.10.1.1 and copy Prometheus/UCX into the runtime. |
| **Containers and build pins**<br>`container/Dockerfile*`, `container/build.sh`, `pyproject.toml`, `README.md` | Update UCX to v1.19.0; bump TRT-LLM base/runtime tags and deps to rc6; update Torch pins; copy Prometheus into runtime images; switch some install commands to `uv`. |
| **Operator/CRDs**<br>`deploy/cloud/helm/crds/templates/nvidia.com_*`, `deploy/cloud/operator/api/v1alpha1/*`, `.../internal/consts/consts.go`, `.../internal/dynamo/graph.go`, `.../internal/controller/*_test.go`, `deploy/helm/chart/templates/*` | Add a `sharedMemory` spec (`disabled`/`size`) across CRDs, API types, and deepcopy, with defaults (`/dev/shm`, 8Gi). Add `BackendFrameworkNoop` and componentType-aware Helm charts (env/ports/probes/commands). |
| **Docs reorganization**<br>`docs/conf.py`, `docs/index.rst`, `docs/_sections/*`, `docs/_includes/*`, `docs/hidden_toctree.rst`, `docs/support_matrix.md`, multiple moved/removed links | Rebrand docs config and simplify MyST extensions; restructure the index and sections; add install/quick-start snippets; update the support matrix to TRT-LLM rc6 and add an AL2023 footnote; replace/move various links. |
| **Examples**<br>`examples/runtime/hello_world/*`, `examples/basics/multinode/README.md`, `examples/multimodal/deploy/agg_llava.yaml` | Add a retry loop to the client; adjust probes/args and add `backendFramework`; fix a missing import; add a LLAVA aggregated deployment manifest. |
| **Rust workspace/macros**<br>`Cargo.toml`, `lib/async-openai-macros/*`, `lib/async-openai/Cargo.toml` | Remove the local `async-openai-macros` crate from the workspace; switch to published crate version 0.1.0. |
| **Tests**<br>`tests/serve/test_sglang.py`, `tests/serve/test_vllm.py`, `tests/kvbm/test_determinism.py` | Update the model to Qwen; increase the vLLM timeout; comment out certain pytest markers pending CI support. |
| **Attributions**<br>`ATTRIBUTIONS-Go.md` | Add two third-party license blocks (entries duplicated). |
| **Misc**<br>`deploy/inference-gateway/README.md`, `docs/components/backends/*` | Link/path fixes and small doc additions/removals. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  participant Client
  participant Frontend
  participant SGLang as SGLang Runtime
  participant Registrar as Model Registrar

  Note over Frontend: Readiness gate
  Frontend->>Registrar: register_llm_with_runtime_config(...)
  par Start endpoint immediately
    Frontend-->>Client: /v1/... generate endpoint available
  and Register model concurrently
    Registrar-->>Frontend: success(bool=true) or failure
  end

  alt Registration succeeds
    Frontend->>Frontend: ready_event.set()
    Client->>Frontend: generate(request)
    Frontend->>Frontend: wait until ready_event
    Frontend->>SGLang: handler.generate(request)
    SGLang-->>Frontend: stream chunks
    Frontend-->>Client: stream chunks
  else Registration fails
    Registrar->>Frontend: error
    Frontend->>SGLang: shutdown()
    Frontend-->>Client: error response
  end
```
```mermaid
sequenceDiagram
  autonumber
  participant Operator as Operator Graph Builder
  participant CRD as CRD Spec (sharedMemory)
  participant K8s as Kubernetes

  Operator->>CRD: read spec.sharedMemory {disabled,size}
  alt disabled == true
    Operator->>K8s: do not mount /dev/shm tmpfs
  else not set or false
    Operator->>K8s: create EmptyDir medium=Memory sizeLimit=(size or 8Gi)
    Operator->>K8s: mount at /dev/shm (default path)
  end
```
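The defaulting logic in the diagram above can be expressed compactly. This is a Python sketch of the decision flow only: the `disabled`/`size` field names and the `/dev/shm`, `Memory`, 8Gi defaults come from the walkthrough, while the returned dict shape is a simplified stand-in for the operator's actual Go types and the Kubernetes EmptyDir volume spec.

```python
DEFAULT_SHM_SIZE = "8Gi"
DEFAULT_SHM_PATH = "/dev/shm"

def shm_volume(shared_memory=None):
    """Return an EmptyDir-style volume description, or None when disabled."""
    shared_memory = shared_memory or {}
    if shared_memory.get("disabled"):
        return None  # disabled == true: do not mount /dev/shm tmpfs
    return {
        "emptyDir": {
            "medium": "Memory",  # tmpfs-backed volume
            "sizeLimit": shared_memory.get("size") or DEFAULT_SHM_SIZE,
        },
        "mountPath": DEFAULT_SHM_PATH,
    }

print(shm_volume(None))                # unset spec falls back to 8Gi at /dev/shm
print(shm_volume({"disabled": True}))  # prints None
print(shm_volume({"size": "2Gi"}))     # explicit size overrides the default
```

Keeping the default at 8Gi matters because container runtimes otherwise cap `/dev/shm` at 64Mi, which is too small for frameworks that use shared memory for inter-process tensor transfer.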

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~75 minutes


Poem

Bun to the metal, I tweak and I tune,
Gates hold requests till models commune.
Qwen is the default, the routes are anew,
Shared mem grows comfy, with pods in a queue.
Docs shed their clutter, containers align—
Hippity-hop, ship it, all fine! 🐇✨

Warning

There were issues while running some tools. Please review the errors and either fix the tool's configuration or disable the tool if it's a critical failure.

🔧 golangci-lint (2.2.2)

```
Error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
The command is terminated due to an error: can't load config: unsupported version of the configuration: "" See https://golangci-lint.run/product/migration-guide for migration instructions
```
