-
Notifications
You must be signed in to change notification settings - Fork 676
docs(dynamo-run): Remove vllm/sglang/trtllm engines from dynamo-run docs #2332
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
UX v2 has a smaller role for dynamo-run. Clarify that in the guide. - Remove sglang, vllm and trtllm engines. - Remove old KV-routing (vllm patch pre merge) instructions. - Adjust the Markdown header levels to reflect document organisation. Follow-up to #1920
WalkthroughThis update revises the Changes
Sequence Diagram(s)sequenceDiagram
participant User
participant dynamo-run CLI
participant Engine (mistralrs/llamacpp/mocker/echo)
User->>dynamo-run CLI: Run with selected engine and flags
dynamo-run CLI->>Engine: Initialize engine (default: mistralrs, or mocker/echo)
Engine-->>dynamo-run CLI: Ready for inference or mock
User->>dynamo-run CLI: Submit prompt
dynamo-run CLI->>Engine: Process prompt (real or simulated)
Engine-->>dynamo-run CLI: Return output
dynamo-run CLI-->>User: Output result
Estimated code review effort🎯 2 (Simple) | ⏱️ ~7 minutes Possibly related PRs
Poem
Note ⚡️ Unit Test Generation is now available in beta!Learn more here, or try it out under "Finishing Touches" below. Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 1
🔭 Outside diff range comments (1)
launch/dynamo-run/src/flags.rs (1)
110-115: Validate that--extra-engine-argsis only accepted for themockerengineThe new comment clarifies intent but nothing enforces it. A typo or copy-paste will silently pass the JSON file to a non-mocker engine and be ignored.
@@ Output::Mocker => { // nothing to check here } + _ => { + if self.extra_engine_args.is_some() { + anyhow::bail!( + "`--extra-engine-args` is supported exclusively by the mocker engine" + ); + } + } }
🧹 Nitpick comments (2)
launch/dynamo-run/src/flags.rs (1)
106-108: Out-of-date docstring still says “vllm only”Line 106 advertises
kv_cache_block_sizeas “vllm only”, yet this PR intentionally purges vLLM references from the user-facing surface. Either drop the qualifier or add an explanatory note similar to the new “Mocker engine only” line to stay consistent.docs/guides/dynamo_run.md (1)
60-70: Markdown-lint: add language identifiers to fenced blocksSeveral new code fences lack a language tag (
```bash,```json…). The repo already enforces MD040; adding tags prevents future CI failures.Examples:
-``` -dynamo-run in=http out=mistralrs <model> -v -``` +```bash +dynamo-run in=http out=mistralrs <model> -v +```Also applies to: 94-102, 211-218, 322-324
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (2)
docs/guides/dynamo_run.md(15 hunks)launch/dynamo-run/src/flags.rs(1 hunks)
🧰 Additional context used
🧠 Learnings (9)
📓 Common learnings
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.
📚 Learning: the asyncenginecontextprovider trait in lib/runtime/src/engine.rs was intentionally changed from `se...
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1919
File: lib/runtime/src/engine.rs:168-168
Timestamp: 2025-07-14T21:25:56.930Z
Learning: The AsyncEngineContextProvider trait in lib/runtime/src/engine.rs was intentionally changed from `Send + Sync + Debug` to `Send + Debug` because the Sync bound was overly constraining. The trait should only require Send + Debug as designed.
Applied to files:
launch/dynamo-run/src/flags.rs
📚 Learning: in docs/support_matrix.md, the nixl version difference between runtime dependencies (0.5.0) and buil...
Learnt from: dmitry-tokarev-nv
PR: ai-dynamo/dynamo#2179
File: docs/support_matrix.md:61-63
Timestamp: 2025-07-30T00:34:35.810Z
Learning: In docs/support_matrix.md, the NIXL version difference between runtime dependencies (0.5.0) and build dependencies (0.4.0) is intentional and expected, not an error that needs to be corrected.
Applied to files:
docs/guides/dynamo_run.md
📚 Learning: in the dynamo codebase, bentoml has been completely removed from all executable code, with only docu...
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
Applied to files:
docs/guides/dynamo_run.md
📚 Learning: in multi-node setups with head/worker architecture, the head node typically doesn't need environment...
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#1371
File: examples/llm/benchmarks/vllm_multinode_setup.sh:18-25
Timestamp: 2025-06-05T01:46:15.509Z
Learning: In multi-node setups with head/worker architecture, the head node typically doesn't need environment variables pointing to its own services (like NATS_SERVER, ETCD_ENDPOINTS) because local processes can access them via localhost. Only worker nodes need these environment variables to connect to the head node's external IP address.
Applied to files:
docs/guides/dynamo_run.md
📚 Learning: in examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd)...
Learnt from: fsaady
PR: ai-dynamo/dynamo#1730
File: examples/sglang/slurm_jobs/scripts/worker_setup.py:230-244
Timestamp: 2025-07-03T10:14:30.570Z
Learning: In examples/sglang/slurm_jobs/scripts/worker_setup.py, background processes (like nats-server, etcd) are intentionally left running even if later processes fail. This design choice allows users to manually connect to nodes and debug issues without having to restart the entire SLURM job from scratch, providing operational flexibility for troubleshooting in cluster environments.
Applied to files:
docs/guides/dynamo_run.md
📚 Learning: in the kv router scheduler code, peabrane prefers fail-fast behavior over silent failure handling. w...
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scheduler.rs:260-266
Timestamp: 2025-05-30T06:34:12.785Z
Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.
Applied to files:
docs/guides/dynamo_run.md
📚 Learning: the `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects...
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
Applied to files:
docs/guides/dynamo_run.md
📚 Learning: the `--torch-backend=auto` flag works with vllm installations via uv pip install, even though it's n...
Learnt from: ptarasiewiczNV
PR: ai-dynamo/dynamo#2027
File: container/deps/vllm/install_vllm.sh:0-0
Timestamp: 2025-07-22T10:22:28.972Z
Learning: The `--torch-backend=auto` flag works with vLLM installations via uv pip install, even though it's not a standard pip option. This flag is processed by vLLM's build system during installation to automatically match PyTorch distribution with container CUDA versions.
Applied to files:
docs/guides/dynamo_run.md
🪛 markdownlint-cli2 (0.17.2)
docs/guides/dynamo_run.md
10-10: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
28-28: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
60-60: Emphasis used instead of a heading
(MD036, no-emphasis-as-heading)
61-61: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
66-66: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
94-94: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
100-100: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
130-130: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
211-211: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
322-322: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 LanguageTool
docs/guides/dynamo_run.md
[style] ~381-~381: Consider removing “of” to be more concise
Context: ...our own engine and attach it to Dynamo. All of the main backend components in `components/...
(ALL_OF_THE)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (3)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - vllm
- GitHub Check: pre-merge-rust (.)
🔇 Additional comments (1)
docs/guides/dynamo_run.md (1)
160-163: Residual vLLM reference contradicts PR goalThis paragraph still cites “vllm announces when a KV block is created…”. Given the engines were intentionally removed from the guide, rename the sentence (e.g. “the backend engine”) or delete it.
UX v2 has a smaller role for dynamo-run. Clarify that in the guide.
Follow-up to #1920
Summary by CodeRabbit
dynamo-runto enhance clarity and update usage instructions.Introduced documentation for the new mocker engine as a lightweight testing tool.That was already there