-
Notifications
You must be signed in to change notification settings - Fork 686
chore(dynamo-run): Refactor to library #1687
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
WalkthroughThis change refactors the Dynamo runner and input handling architecture. It introduces a builder pattern for local model construction, centralizes engine configuration, and migrates input mode management and routing into the core library. Several modules and methods are removed or relocated, and subprocess management is streamlined for clarity and modularity. Changes
Sequence Diagram(s)sequenceDiagram
participant CLI
participant LocalModelBuilder
participant EngineConfig
participant InputRouter
participant Runtime
CLI->>LocalModelBuilder: Configure and build LocalModel
LocalModelBuilder-->>CLI: Returns LocalModel
CLI->>EngineConfig: Create engine using LocalModel & flags
EngineConfig-->>CLI: Returns EngineConfig
CLI->>InputRouter: Run input handling (with Input, EngineConfig, Runtime)
InputRouter->>Runtime: Dispatches to specific input handler (http, text, batch, endpoint)
Runtime-->>InputRouter: Handles requests/responses
InputRouter-->>CLI: Completes input handling
Possibly related PRs
Poem
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. 🪧 TipsChatThere are 3 ways to chat with CodeRabbit:
SupportNeed help? Create a ticket on our support page for assistance with any issues or questions. Note: Be mindful of the bot's finite context window. It's strongly recommended to break down tasks such as reading entire modules into smaller chunks. For a focused discussion, use review comments to chat about specific files and their changes, instead of using the PR comments. CodeRabbit Commands (Invoked using PR comments)
Other keywords and placeholders
Documentation and Community
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actionable comments posted: 5
🧹 Nitpick comments (1)
launch/dynamo-run/src/lib.rs (1)
48-51: Improve error handling for endpoint parsing.The endpoint parsing on line 50 uses
parse()?which could fail with a generic error. Consider providing a more specific error message.- builder.endpoint_id(path.parse()?); + builder.endpoint_id(path.parse() + .map_err(|e| anyhow::anyhow!("Failed to parse endpoint path '{}': {}", path, e))?);
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (1)
Cargo.lockis excluded by!**/*.lock
📒 Files selected for processing (18)
launch/dynamo-run/Cargo.toml(0 hunks)launch/dynamo-run/src/flags.rs(2 hunks)launch/dynamo-run/src/input.rs(0 hunks)launch/dynamo-run/src/lib.rs(3 hunks)launch/dynamo-run/src/main.rs(2 hunks)launch/dynamo-run/src/opt.rs(1 hunks)launch/dynamo-run/src/subprocess.rs(1 hunks)launch/llmctl/src/main.rs(2 hunks)lib/llm/Cargo.toml(2 hunks)lib/llm/src/entrypoint.rs(1 hunks)lib/llm/src/entrypoint/input.rs(1 hunks)lib/llm/src/entrypoint/input/batch.rs(5 hunks)lib/llm/src/entrypoint/input/common.rs(7 hunks)lib/llm/src/entrypoint/input/endpoint.rs(3 hunks)lib/llm/src/entrypoint/input/http.rs(2 hunks)lib/llm/src/entrypoint/input/text.rs(1 hunks)lib/llm/src/lib.rs(1 hunks)lib/llm/src/local_model.rs(5 hunks)
💤 Files with no reviewable changes (2)
- launch/dynamo-run/Cargo.toml
- launch/dynamo-run/src/input.rs
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/llm/src/entrypoint/input/endpoint.rs (1)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/entrypoint/input/text.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
launch/dynamo-run/src/opt.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
lib/llm/src/entrypoint/input/batch.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
lib/llm/src/entrypoint/input/common.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
launch/dynamo-run/src/subprocess.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
lib/llm/src/entrypoint/input.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
launch/dynamo-run/src/flags.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
launch/dynamo-run/src/lib.rs (4)
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
lib/llm/src/local_model.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.
🧬 Code Graph Analysis (5)
launch/llmctl/src/main.rs (1)
lib/llm/src/local_model.rs (2)
default(52-64)model_name(73-76)
lib/llm/src/entrypoint/input/text.rs (2)
lib/llm/src/local_model.rs (2)
request_template(115-120)request_template(244-246)lib/llm/src/entrypoint/input/common.rs (1)
prepare_engine(49-123)
lib/llm/src/entrypoint/input/batch.rs (1)
lib/llm/src/entrypoint/input/common.rs (1)
prepare_engine(49-123)
lib/llm/src/entrypoint/input/common.rs (4)
lib/llm/src/entrypoint/input/http.rs (2)
common(66-69)common(73-76)lib/llm/src/local_model.rs (4)
request_template(115-120)request_template(244-246)card(228-230)service_name(240-242)lib/llm/src/entrypoint.rs (2)
local_model(52-59)new(26-31)lib/llm/src/engines.rs (1)
make_engine_core(84-86)
launch/dynamo-run/src/subprocess.rs (2)
lib/llm/src/local_model.rs (2)
endpoint_id(83-86)endpoint_id(263-265)lib/runtime/src/protocols.rs (1)
as_url(178-183)
⏰ Context from checks skipped due to timeout of 90000ms (2)
- GitHub Check: Mirror Repository to GitLab
- GitHub Check: Build and Test - vllm
🔇 Additional comments (30)
lib/llm/src/lib.rs (1)
18-18: LGTM: Clean module declarationThe addition of the
entrypointmodule declaration properly exposes the new functionality as part of the public API and maintains alphabetical ordering.lib/llm/src/entrypoint/input/endpoint.rs (3)
6-17: LGTM: Proper relative importsThe change from
dynamo_llmtocrateimports is correct now that this code resides within thedynamo-llmcrate.
24-24: LGTM: Consistent relative importThe update to use
crate::entrypoint::EngineConfigmaintains consistency with the other relative imports.
82-82: LGTM: Updated pattern matching for new enum structureThe change from
EngineConfig::DynamictoEngineConfig::Dynamic(_)correctly handles the updated enum variant that now contains data.launch/llmctl/src/main.rs (2)
9-9: LGTM: Updated import for builder patternThe import change from
LocalModeltoLocalModelBuilderaligns with the new builder pattern approach.
230-233: LGTM: Proper builder pattern implementationThe refactoring from
LocalModel::with_name_only(&model_name)to the builder pattern is correctly implemented:
- Uses
LocalModelBuilder::default()for initialization- Properly sets the model name with
model_name(Some(model_name))- Calls
.build().await?for asynchronous construction with proper error handlinglib/llm/Cargo.toml (2)
56-56: LGTM: Added workspace dependency for batch functionalityThe addition of
humantimeas a workspace dependency with clear commenting supports the new input/batch processing features.
84-85: LGTM: Well-configured dependency for interactive inputThe
dialoguerdependency is properly configured with:
- Disabled default features to minimize bloat
- Enabled "editor" and "history" features for rich interactive text input
- Clear section commenting indicating its purpose
launch/dynamo-run/src/subprocess.rs (2)
21-30: LGTM: Simplified function signature with better encapsulationThe removal of the
endpoint: &EndpointIdparameter simplifies the function interface by leveraging the endpoint ID now encapsulated within theLocalModel.
40-40: LGTM: Consistent use of encapsulated endpointThe change from
endpoint.as_url()tolocal_model.endpoint_id().as_url()correctly uses the endpoint ID from the LocalModel, maintaining the same functionality while improving encapsulation.launch/dynamo-run/src/opt.rs (1)
1-113: Clean refactoring: Input enum successfully migrated to library crate.The removal of the
Inputenum and its associated implementations is correctly executed as part of the migration todynamo_llm::entrypoint::input. The remainingOutputenum and its implementations are intact and functional.launch/dynamo-run/src/main.rs (3)
20-21: Import update correctly reflects the refactoring.The import change from local
dynamo_run::{Input, Output}todynamo_llm::entrypoint::input::Inputanddynamo_run::Outputproperly reflects the migration ofInputto the library crate.
131-133: Validation logic prevents incompatible input/output combinations.The validation correctly prevents using endpoint input (
Input::Endpoint) with dynamic output (Output::Dynamic) simultaneously, which could lead to conflicting configurations.
138-144: Helper functions are correctly implemented.Both
is_in_dynamicandis_out_dynamicfunctions use appropriate pattern matching to identify the respective dynamic configurations. The logic is sound and the functions are well-named.lib/llm/src/entrypoint/input/text.rs (2)
4-14: Import reorganization improves code organization.The imports have been correctly updated to use crate-relative paths (
crate::) and are well-organized with related imports grouped together.
20-37: Function refactoring simplifies parameter passing.The removal of unused
_flagsandtemplateparameters and the direct use ofprepared_engine.request_templateeliminates redundant parameter passing. The TODO comment appropriately identifies a potential future improvement to pass the entireprepared_enginetomain_loop.lib/llm/src/entrypoint/input/batch.rs (3)
4-22: Import reorganization follows consistent pattern.The imports are properly reorganized with crate-local imports grouped before external dependencies, consistent with the pattern used in other refactored files.
67-73: Correct ownership handling for card consumption.Using
prepared_engine.card.take()is the correct approach here since theOpenAIPreprocessor::new()method likely takes ownership of the card. Thehas_tokenizer()method provides a clean way to check if preprocessing is available.
97-97: Template handling simplified through prepared engine.Converting
prepared_engine.request_templatetoOption<Arc<RequestTemplate>>correctly handles the optional template from the prepared engine and makes it shareable across async tasks.lib/llm/src/entrypoint/input/http.rs (3)
6-21: Import reorganization and grouping improves readability.The imports are well-organized with related crate imports grouped together and external dependencies clearly separated.
24-31: Configuration access centralized through engine_config.The HTTP service configuration now correctly accesses port and request template directly from
engine_config.local_model(), eliminating the need for separate parameters and centralizing configuration access.
33-46: Router configuration properly extracted from engine config.The match arm correctly handles the
EngineConfig::Dynamic(_)variant and extracts router configuration fromengine_config.local_model().router_config(). The router mode and KV router config are properly passed torun_watcher.lib/llm/src/entrypoint.rs (1)
52-52: Consider making local_model() public if needed by external modules.The
local_model()method is currently private. If this method needs to be accessed from outside the module (which seems likely given its utility), consider making it public.#!/bin/bash # Check if local_model() is used outside the entrypoint module rg -A 2 "\.local_model\(\)" --type rustlaunch/dynamo-run/src/lib.rs (1)
21-82: Well-structured refactoring of the run function.The refactoring significantly improves code organization by:
- Using the builder pattern for LocalModel construction
- Extracting engine creation logic into a dedicated function
- Centralizing input handling through the dynamo_llm library
- Properly handling subprocess cleanup with the extra future
This makes the code more maintainable and follows the single responsibility principle.
lib/llm/src/local_model.rs (6)
8-10: LGTM!The new imports are appropriate for the builder pattern implementation and new model configuration features.
Also applies to: 18-18, 21-21
33-38: LGTM!Good choice of default values with clear explanatory comments.
39-65: LGTM!Well-structured builder pattern implementation with appropriate defaults.
217-265: LGTM!Clean struct definition with appropriate accessor methods following Rust conventions.
335-343: LGTM!Good approach using UUID for generating unique internal endpoints. The comment clearly explains the rationale.
109-112: Router config should not be optional if always requiredThe
router_configfield is anOptionin the builder, but thebuild()method expects it to always be present. This design is inconsistent and could lead to runtime panics.Either:
- Make
router_configa required parameter in the builder constructor, or- Provide a sensible default in the
build()method instead of panicking- router_config: self - .router_config - .take() - .expect("unreachable, RouterConfig missing"), + router_config: self + .router_config + .take() + .unwrap_or_default(),Also applies to: 150-155, 209-213
⛔ Skipped due to learnings
Learnt from: PeaBrane PR: ai-dynamo/dynamo#1285 File: lib/llm/src/kv_router/scoring.rs:58-63 Timestamp: 2025-05-30T06:38:09.630Z Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.Learnt from: PeaBrane PR: ai-dynamo/dynamo#1392 File: lib/llm/src/kv_router/scoring.rs:35-46 Timestamp: 2025-06-05T01:02:15.318Z Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.Learnt from: PeaBrane PR: ai-dynamo/dynamo#1236 File: lib/llm/src/mocker/protocols.rs:85-112 Timestamp: 2025-06-16T20:02:54.935Z Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.Learnt from: PeaBrane PR: ai-dynamo/dynamo#1285 File: lib/llm/src/kv_router/scheduler.rs:260-266 Timestamp: 2025-05-30T06:34:12.785Z Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.Learnt from: alec-flowers PR: ai-dynamo/dynamo#1181 File: lib/llm/src/kv_router/publisher.rs:379-425 Timestamp: 2025-05-29T00:02:35.018Z Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
9b7df16 to
03eb709
Compare
paulhendricks
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work! Left a few little comments.
479c38a to
b69fd0d
Compare
b69fd0d to
3a1ef41
Compare
Move much of what was in the `dynamo-run` crate into `dynamo-llm` so
that everyone can use it.
Example usage:
1. Create a `LocalModel`:
```
let local_model = LocalModelBuilder::default()
.model_path("Qwen/Qwen3-0.6B")
.http_port(8080)
.build().await?;
```
2. Make an engine:
```
let engine_config = EngineConfig::StaticFull {
engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
model: Box::new(local_model),
};
```
3. Connect it to an input and run it
```
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```
For #1647
And fix bindings, update lock files.
They were usize because that's easiest in Rust, but naturally they are u32, so put in the extra work to do it right. The nice part is at the boundary (mistralrs, gguf, etc) they were already u32, suggesting this is indeed the correct type.
3a1ef41 to
6f7fd3e
Compare
6f7fd3e to
ae407e1
Compare
ae407e1 to
0373980
Compare
Move much of what was in the `dynamo-run` crate into `dynamo-llm` so that everyone can use it.
Example usage:
1. Create a `LocalModel`:
```
let local_model = LocalModelBuilder::default()
.model_path("Qwen/Qwen3-0.6B")
.http_port(8080)
.build().await?;
```
2. Make an engine:
```
let engine_config = EngineConfig::StaticFull {
engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
model: Box::new(local_model),
};
```
3. Connect it to an input and run it
```
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```
For #1647
Code Rabbit summary, thanks:
* Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
* Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
* Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
* Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
* Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
* Streamlined configuration and validation for flags and router settings.
* Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.
Move much of what was in the
dynamo-runcrate intodynamo-llmso that everyone can use it.Example usage:
LocalModel:For #1647
Summary by CodeRabbit
New Features
Refactor
Bug Fixes
Chores