Contributor

@grahamking grahamking commented Jun 27, 2025

Move much of what was in the dynamo-run crate into dynamo-llm so that everyone can use it.

Example usage:

  1. Create a LocalModel:
       let local_model = LocalModelBuilder::default()
           .model_path("Qwen/Qwen3-0.6B")
           .http_port(8080)
           .build().await?;
  2. Make an engine:
       let engine_config = EngineConfig::StaticFull {
           engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
           model: Box::new(local_model),
       };
  3. Connect it to an input and run it:
       dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;

For #1647

Summary by CodeRabbit

  • New Features

    • Introduced a flexible builder pattern for local model configuration, allowing advanced customization and easier initialization.
    • Added new input modes and unified input handling, supporting interactive chat, HTTP server, batch file, and distributed endpoint modes.
    • Centralized engine configuration and routing, enabling more extensible and maintainable engine management.
  • Refactor

    • Simplified and modularized the codebase by moving input and engine logic into dedicated modules.
    • Replaced direct model construction with an asynchronous builder for improved clarity and extensibility.
    • Streamlined configuration and validation for flags and router settings.
  • Bug Fixes

    • Added validation to prevent incompatible input and output combinations in endpoint and dynamic modes.
  • Chores

    • Updated and reorganized dependencies to reflect new input and configuration features.

@grahamking grahamking changed the title dynamo-run: Refactor to library chore(dynamo-run): Refactor to library Jun 27, 2025
@github-actions github-actions bot added the chore label Jun 27, 2025
@grahamking grahamking marked this pull request as draft June 27, 2025 21:08
@grahamking grahamking self-assigned this Jun 27, 2025
Contributor

coderabbitai bot commented Jun 27, 2025

Walkthrough

This change refactors the Dynamo runner and input handling architecture. It introduces a builder pattern for local model construction, centralizes engine configuration, and migrates input mode management and routing into the core library. Several modules and methods are removed or relocated, and subprocess management is streamlined for clarity and modularity.

Changes

| Files/Groups | Change Summary |
|---|---|
| launch/dynamo-run/Cargo.toml, lib/llm/Cargo.toml | Moved humantime and dialoguer dependencies from dynamo-run to llm crate. |
| launch/dynamo-run/src/flags.rs | Removed kv_router_config and as_vec methods; added validate and router_config methods; updated imports. |
| launch/dynamo-run/src/input.rs, launch/dynamo-run/src/opt.rs | Removed the entire input module and the Input enum/impls from CLI. |
| launch/dynamo-run/src/lib.rs | Refactored run to use new engine creation and input handling; added helper and async functions for engine/subprocess management. |
| launch/dynamo-run/src/main.rs | Changed Input import; added validation to prevent dynamic endpoint for both input and output. |
| launch/dynamo-run/src/subprocess.rs | Updated subprocess start function to remove endpoint parameter and use model endpoint. |
| launch/llmctl/src/main.rs | Switched to LocalModelBuilder for model creation. |
| lib/llm/src/lib.rs | Added new public entrypoint module. |
| lib/llm/src/entrypoint.rs, lib/llm/src/entrypoint/input.rs | Added new modules for engine configuration and input routing; defined EngineConfig, RouterConfig, and Input enum. |
| lib/llm/src/entrypoint/input/batch.rs, lib/llm/src/entrypoint/input/http.rs, lib/llm/src/entrypoint/input/text.rs | Refactored input handlers to use new configuration and prepared engine structures; simplified signatures. |
| lib/llm/src/entrypoint/input/common.rs | Added card and request_template to PreparedEngine; added has_tokenizer method. |
| lib/llm/src/entrypoint/input/endpoint.rs | Updated imports and pattern matching for new engine config structure. |
| lib/llm/src/local_model.rs | Introduced LocalModelBuilder for constructing LocalModel; added new fields and methods; refactored model preparation. |

Sequence Diagram(s)

sequenceDiagram
    participant CLI
    participant LocalModelBuilder
    participant EngineConfig
    participant InputRouter
    participant Runtime

    CLI->>LocalModelBuilder: Configure and build LocalModel
    LocalModelBuilder-->>CLI: Returns LocalModel
    CLI->>EngineConfig: Create engine using LocalModel & flags
    EngineConfig-->>CLI: Returns EngineConfig
    CLI->>InputRouter: Run input handling (with Input, EngineConfig, Runtime)
    InputRouter->>Runtime: Dispatches to specific input handler (http, text, batch, endpoint)
    Runtime-->>InputRouter: Handles requests/responses
    InputRouter-->>CLI: Completes input handling

Possibly related PRs

  • ai-dynamo/dynamo#1259: Adds and modifies router configuration methods in Flags, directly related to the router config changes in this PR.
  • ai-dynamo/dynamo#1335: Introduces random internal endpoint generation, which is further extended and refactored in this PR.
  • ai-dynamo/dynamo#1328: Adjusts default engine selection logic, which is refactored and expanded upon in this PR.

Poem

A builder appears with a hop and a bound,
Models and engines now neatly unbound.
Inputs rerouted, subprocesses clear,
The codebase is lighter—let’s all give a cheer!
With rabbits and runners and endpoints anew,
Dynamo’s more nimble, and faster too!
🐇✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 5

🧹 Nitpick comments (1)
launch/dynamo-run/src/lib.rs (1)

48-51: Improve error handling for endpoint parsing.

The endpoint parsing on line 50 uses parse()? which could fail with a generic error. Consider providing a more specific error message.

-        builder.endpoint_id(path.parse()?);
+        builder.endpoint_id(path.parse()
+            .map_err(|e| anyhow::anyhow!("Failed to parse endpoint path '{}': {}", path, e))?);
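The idea behind this nitpick can be sketched with the standard library alone. This is a minimal illustration, not the PR's code: the `EndpointId` type, its `namespace.component.endpoint` shape, and the `parse_endpoint` helper are all assumptions made up for the example, and a plain `String` error stands in for `anyhow`.

```rust
use std::str::FromStr;

/// Hypothetical stand-in for the real endpoint ID type. The three-part
/// dotted shape is an assumption for illustration only.
#[derive(Debug, PartialEq)]
pub struct EndpointId {
    pub namespace: String,
    pub component: String,
    pub name: String,
}

impl FromStr for EndpointId {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        let parts: Vec<&str> = s.split('.').collect();
        match parts.as_slice() {
            [ns, comp, name] if !ns.is_empty() && !comp.is_empty() && !name.is_empty() => {
                Ok(EndpointId {
                    namespace: ns.to_string(),
                    component: comp.to_string(),
                    name: name.to_string(),
                })
            }
            // Deliberately generic, like the bare `parse()?` case.
            _ => Err("expected three dot-separated parts".to_string()),
        }
    }
}

/// Hypothetical helper: parse `path` and attach the offending input to
/// any failure, so the caller sees which value was rejected.
pub fn parse_endpoint(path: &str) -> Result<EndpointId, String> {
    path.parse::<EndpointId>()
        .map_err(|e| format!("Failed to parse endpoint path '{path}': {e}"))
}
```

Without the `map_err`, the caller would only see the generic message and would have to guess which flag or path triggered it.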
📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 68d7461 and 8b2c336.

⛔ Files ignored due to path filters (1)
  • Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (18)
  • launch/dynamo-run/Cargo.toml (0 hunks)
  • launch/dynamo-run/src/flags.rs (2 hunks)
  • launch/dynamo-run/src/input.rs (0 hunks)
  • launch/dynamo-run/src/lib.rs (3 hunks)
  • launch/dynamo-run/src/main.rs (2 hunks)
  • launch/dynamo-run/src/opt.rs (1 hunks)
  • launch/dynamo-run/src/subprocess.rs (1 hunks)
  • launch/llmctl/src/main.rs (2 hunks)
  • lib/llm/Cargo.toml (2 hunks)
  • lib/llm/src/entrypoint.rs (1 hunks)
  • lib/llm/src/entrypoint/input.rs (1 hunks)
  • lib/llm/src/entrypoint/input/batch.rs (5 hunks)
  • lib/llm/src/entrypoint/input/common.rs (7 hunks)
  • lib/llm/src/entrypoint/input/endpoint.rs (3 hunks)
  • lib/llm/src/entrypoint/input/http.rs (2 hunks)
  • lib/llm/src/entrypoint/input/text.rs (1 hunks)
  • lib/llm/src/lib.rs (1 hunks)
  • lib/llm/src/local_model.rs (5 hunks)
💤 Files with no reviewable changes (2)
  • launch/dynamo-run/Cargo.toml
  • launch/dynamo-run/src/input.rs
🧰 Additional context used
🧠 Learnings (11)
📓 Common learnings
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/llm/src/entrypoint/input/endpoint.rs (1)
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
lib/llm/src/entrypoint/input/text.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
launch/dynamo-run/src/opt.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
lib/llm/src/entrypoint/input/batch.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
lib/llm/src/entrypoint/input/common.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
launch/dynamo-run/src/subprocess.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
lib/llm/src/entrypoint/input.rs (1)
Learnt from: ishandhanani
PR: ai-dynamo/dynamo#1626
File: lib/llm/src/preprocessor.rs:238-239
Timestamp: 2025-06-24T20:59:35.725Z
Learning: In lib/llm/src/preprocessor.rs, the `sampling_options` call in the `preprocess_request` method is placed in the common section after the match statement on `request.prompt_input_type()`, meaning it applies to both `PromptInput::Tokens` and `PromptInput::Text` request types.
launch/dynamo-run/src/flags.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
launch/dynamo-run/src/lib.rs (4)
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
Learnt from: ryanolson
PR: ai-dynamo/dynamo#1093
File: lib/llm/src/block_manager/block/registry.rs:98-122
Timestamp: 2025-05-29T06:20:12.901Z
Learning: In lib/llm/src/block_manager/block/registry.rs, the background task spawned for handling unregister notifications uses detached concurrency by design. The JoinHandle is intentionally not stored as this represents a reasonable architectural tradeoff for a long-running cleanup task.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: jthomson04
PR: ai-dynamo/dynamo#1429
File: lib/runtime/src/utils/leader_worker_barrier.rs:69-72
Timestamp: 2025-06-08T03:12:03.985Z
Learning: In the leader-worker barrier implementation in lib/runtime/src/utils/leader_worker_barrier.rs, the `wait_for_key_count` function correctly uses exact equality (`==`) instead of greater-than-or-equal (`>=`) because worker IDs must be unique (enforced by etcd create-only operations), ensuring exactly the expected number of workers can register.
lib/llm/src/local_model.rs (1)
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.
🧬 Code Graph Analysis (5)
launch/llmctl/src/main.rs (1)
lib/llm/src/local_model.rs (2)
  • default (52-64)
  • model_name (73-76)
lib/llm/src/entrypoint/input/text.rs (2)
lib/llm/src/local_model.rs (2)
  • request_template (115-120)
  • request_template (244-246)
lib/llm/src/entrypoint/input/common.rs (1)
  • prepare_engine (49-123)
lib/llm/src/entrypoint/input/batch.rs (1)
lib/llm/src/entrypoint/input/common.rs (1)
  • prepare_engine (49-123)
lib/llm/src/entrypoint/input/common.rs (4)
lib/llm/src/entrypoint/input/http.rs (2)
  • common (66-69)
  • common (73-76)
lib/llm/src/local_model.rs (4)
  • request_template (115-120)
  • request_template (244-246)
  • card (228-230)
  • service_name (240-242)
lib/llm/src/entrypoint.rs (2)
  • local_model (52-59)
  • new (26-31)
lib/llm/src/engines.rs (1)
  • make_engine_core (84-86)
launch/dynamo-run/src/subprocess.rs (2)
lib/llm/src/local_model.rs (2)
  • endpoint_id (83-86)
  • endpoint_id (263-265)
lib/runtime/src/protocols.rs (1)
  • as_url (178-183)
⏰ Context from checks skipped due to timeout of 90000ms (2)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (30)
lib/llm/src/lib.rs (1)

18-18: LGTM: Clean module declaration

The addition of the entrypoint module declaration properly exposes the new functionality as part of the public API and maintains alphabetical ordering.

lib/llm/src/entrypoint/input/endpoint.rs (3)

6-17: LGTM: Proper relative imports

The change from dynamo_llm to crate imports is correct now that this code resides within the dynamo-llm crate.


24-24: LGTM: Consistent relative import

The update to use crate::entrypoint::EngineConfig maintains consistency with the other relative imports.


82-82: LGTM: Updated pattern matching for new enum structure

The change from EngineConfig::Dynamic to EngineConfig::Dynamic(_) correctly handles the updated enum variant that now contains data.
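As a rough illustration of why the pattern had to change, here is a simplified sketch. The `LocalModel` and `EngineConfig` definitions below are hypothetical stand-ins (the real types carry more data, and the payload type of `Dynamic` is an assumption); only the `Dynamic` and `StaticFull` variant names come from the PR.

```rust
/// Hypothetical, pared-down model type for the example.
pub struct LocalModel {
    pub name: String,
}

/// Hypothetical, pared-down engine configuration.
pub enum EngineConfig {
    /// Tuple variant: once a variant carries data, a bare
    /// `EngineConfig::Dynamic` pattern no longer compiles.
    Dynamic(Box<LocalModel>),
    StaticFull { model: Box<LocalModel> },
}

/// `Dynamic(_)` matches the variant while ignoring its payload.
pub fn is_dynamic(cfg: &EngineConfig) -> bool {
    matches!(cfg, EngineConfig::Dynamic(_))
}

/// Binding the payload instead of ignoring it gives access to the model.
pub fn model_name(cfg: &EngineConfig) -> &str {
    match cfg {
        EngineConfig::Dynamic(model) => model.name.as_str(),
        EngineConfig::StaticFull { model } => model.name.as_str(),
    }
}
```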

launch/llmctl/src/main.rs (2)

9-9: LGTM: Updated import for builder pattern

The import change from LocalModel to LocalModelBuilder aligns with the new builder pattern approach.


230-233: LGTM: Proper builder pattern implementation

The refactoring from LocalModel::with_name_only(&model_name) to the builder pattern is correctly implemented:

  • Uses LocalModelBuilder::default() for initialization
  • Properly sets the model name with model_name(Some(model_name))
  • Calls .build().await? for asynchronous construction with proper error handling
lib/llm/Cargo.toml (2)

56-56: LGTM: Added workspace dependency for batch functionality

The addition of humantime as a workspace dependency with clear commenting supports the new input/batch processing features.


84-85: LGTM: Well-configured dependency for interactive input

The dialoguer dependency is properly configured with:

  • Disabled default features to minimize bloat
  • Enabled "editor" and "history" features for rich interactive text input
  • Clear section commenting indicating its purpose
launch/dynamo-run/src/subprocess.rs (2)

21-30: LGTM: Simplified function signature with better encapsulation

The removal of the endpoint: &EndpointId parameter simplifies the function interface by leveraging the endpoint ID now encapsulated within the LocalModel.


40-40: LGTM: Consistent use of encapsulated endpoint

The change from endpoint.as_url() to local_model.endpoint_id().as_url() correctly uses the endpoint ID from the LocalModel, maintaining the same functionality while improving encapsulation.

launch/dynamo-run/src/opt.rs (1)

1-113: Clean refactoring: Input enum successfully migrated to library crate.

The removal of the Input enum and its associated implementations is correctly executed as part of the migration to dynamo_llm::entrypoint::input. The remaining Output enum and its implementations are intact and functional.

launch/dynamo-run/src/main.rs (3)

20-21: Import update correctly reflects the refactoring.

The import change from local dynamo_run::{Input, Output} to dynamo_llm::entrypoint::input::Input and dynamo_run::Output properly reflects the migration of Input to the library crate.


131-133: Validation logic prevents incompatible input/output combinations.

The validation correctly prevents using endpoint input (Input::Endpoint) with dynamic output (Output::Dynamic) simultaneously, which could lead to conflicting configurations.


138-144: Helper functions are correctly implemented.

Both is_in_dynamic and is_out_dynamic functions use appropriate pattern matching to identify the respective dynamic configurations. The logic is sound and the functions are well-named.
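A sketch of how such a check could look is below. The helper names `is_in_dynamic` and `is_out_dynamic` come from the review comment; the enum variants, the `validate` function, and the error text are assumptions for illustration and may not match the actual code in main.rs.

```rust
/// Hypothetical subset of input modes.
pub enum Input {
    Http,
    Text,
    Endpoint(String),
}

/// Hypothetical subset of output modes.
pub enum Output {
    Echo,
    Dynamic,
}

fn is_in_dynamic(input: &Input) -> bool {
    matches!(input, Input::Endpoint(_))
}

fn is_out_dynamic(output: &Output) -> bool {
    matches!(output, Output::Dynamic)
}

/// Reject the one combination where both sides would be resolved
/// dynamically; every other pairing is accepted.
pub fn validate(input: &Input, output: &Output) -> Result<(), String> {
    if is_in_dynamic(input) && is_out_dynamic(output) {
        return Err("Cannot use an endpoint for both input and output".to_string());
    }
    Ok(())
}
```

Running the check once, before any engine is built, keeps the failure cheap and the message close to the flags that caused it.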

lib/llm/src/entrypoint/input/text.rs (2)

4-14: Import reorganization improves code organization.

The imports have been correctly updated to use crate-relative paths (crate::) and are well-organized with related imports grouped together.


20-37: Function refactoring simplifies parameter passing.

The removal of unused _flags and template parameters and the direct use of prepared_engine.request_template eliminates redundant parameter passing. The TODO comment appropriately identifies a potential future improvement to pass the entire prepared_engine to main_loop.

lib/llm/src/entrypoint/input/batch.rs (3)

4-22: Import reorganization follows consistent pattern.

The imports are properly reorganized with crate-local imports grouped before external dependencies, consistent with the pattern used in other refactored files.


67-73: Correct ownership handling for card consumption.

Using prepared_engine.card.take() is the correct approach here since the OpenAIPreprocessor::new() method likely takes ownership of the card. The has_tokenizer() method provides a clean way to check if preprocessing is available.
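The ownership point can be shown in isolation. Everything below is a hypothetical reduction: `Card`, `PreparedEngine`, and `Preprocessor` are simplified stand-ins, and `has_tokenizer` here only checks whether a card is present, which may not be what the real method does.

```rust
/// Hypothetical stand-in for the model card.
pub struct Card {
    pub name: String,
}

/// Hypothetical stand-in holding an optional card.
pub struct PreparedEngine {
    pub card: Option<Card>,
}

impl PreparedEngine {
    /// Assumption for the sketch: a tokenizer is available iff a card is.
    pub fn has_tokenizer(&self) -> bool {
        self.card.is_some()
    }
}

/// Hypothetical consumer that needs to own the card.
pub struct Preprocessor {
    card: Card,
}

impl Preprocessor {
    pub fn new(card: Card) -> Self {
        Preprocessor { card }
    }

    pub fn card_name(&self) -> &str {
        &self.card.name
    }
}

/// `Option::take` moves the card out through a `&mut` borrow and leaves
/// `None` behind, so no clone is needed and the engine stays usable.
pub fn build_preprocessor(engine: &mut PreparedEngine) -> Option<Preprocessor> {
    engine.card.take().map(Preprocessor::new)
}
```

One consequence worth keeping in mind: after the call, `has_tokenizer()` in this sketch reports `false`, so any check has to happen before the `take()`.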


97-97: Template handling simplified through prepared engine.

Converting prepared_engine.request_template to Option<Arc<RequestTemplate>> correctly handles the optional template from the prepared engine and makes it shareable across async tasks.
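To make the sharing pattern concrete, here is a small sketch that uses OS threads in place of async tasks so it needs only the standard library. The `RequestTemplate` fields and the `share_template` helper are hypothetical, not taken from the PR.

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical stand-in for the request template.
#[derive(Debug)]
pub struct RequestTemplate {
    pub model: String,
    pub temperature: f32,
}

/// Wrap the optional template in an `Arc` once, then hand each worker
/// its own cheap handle. Returns the model name each worker observed.
pub fn share_template(template: Option<RequestTemplate>, workers: usize) -> Vec<Option<String>> {
    let shared: Option<Arc<RequestTemplate>> = template.map(Arc::new);

    let handles: Vec<_> = (0..workers)
        .map(|_| {
            // Cloning Option<Arc<T>> bumps a reference count; the
            // template itself is not copied.
            let local = shared.clone();
            thread::spawn(move || local.map(|t| t.model.clone()))
        })
        .collect();

    handles
        .into_iter()
        .map(|h| h.join().expect("worker panicked"))
        .collect()
}
```

The same shape applies to spawned async tasks: each task moves its own `Option<Arc<_>>` clone into the future, and the `None` case costs nothing.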

lib/llm/src/entrypoint/input/http.rs (3)

6-21: Import reorganization and grouping improves readability.

The imports are well-organized with related crate imports grouped together and external dependencies clearly separated.


24-31: Configuration access centralized through engine_config.

The HTTP service configuration now correctly accesses port and request template directly from engine_config.local_model(), eliminating the need for separate parameters and centralizing configuration access.


33-46: Router configuration properly extracted from engine config.

The match arm correctly handles the EngineConfig::Dynamic(_) variant and extracts router configuration from engine_config.local_model().router_config(). The router mode and KV router config are properly passed to run_watcher.

lib/llm/src/entrypoint.rs (1)

52-52: Consider making local_model() public if needed by external modules.

The local_model() method is currently private. If this method needs to be accessed from outside the module (which seems likely given its utility), consider making it public.

#!/bin/bash
# Check if local_model() is used outside the entrypoint module
rg -A 2 "\.local_model\(\)" --type rust
launch/dynamo-run/src/lib.rs (1)

21-82: Well-structured refactoring of the run function.

The refactoring significantly improves code organization by:

  • Using the builder pattern for LocalModel construction
  • Extracting engine creation logic into a dedicated function
  • Centralizing input handling through the dynamo_llm library
  • Properly handling subprocess cleanup with the extra future

This makes the code more maintainable and follows the single responsibility principle.

lib/llm/src/local_model.rs (6)

8-10: LGTM!

The new imports are appropriate for the builder pattern implementation and new model configuration features.

Also applies to: 18-18, 21-21


33-38: LGTM!

Good choice of default values with clear explanatory comments.


39-65: LGTM!

Well-structured builder pattern implementation with appropriate defaults.


217-265: LGTM!

Clean struct definition with appropriate accessor methods following Rust conventions.


335-343: LGTM!

Good approach using UUID for generating unique internal endpoints. The comment clearly explains the rationale.


109-112: Router config should not be optional if always required

The router_config field is an Option in the builder, but the build() method expects it to always be present. This design is inconsistent and could lead to runtime panics.

Either:

  1. Make router_config a required parameter in the builder constructor, or
  2. Provide a sensible default in the build() method instead of panicking
-                router_config: self
-                    .router_config
-                    .take()
-                    .expect("unreachable, RouterConfig missing"),
+                router_config: self
+                    .router_config
+                    .take()
+                    .unwrap_or_default(),

Also applies to: 150-155, 209-213
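The second option can be sketched as follows. All names here (`RouterMode`, `RouterConfig`, `ModelBuilder`, `Model`) are simplified, hypothetical stand-ins rather than the real `LocalModelBuilder` API, and the default values are invented for the example.

```rust
/// Hypothetical router modes; the default variant is an assumption.
#[derive(Debug, Default, Clone, Copy, PartialEq)]
pub enum RouterMode {
    #[default]
    RoundRobin,
    Kv,
}

/// Hypothetical router configuration with a derived default.
#[derive(Debug, Default, Clone, PartialEq)]
pub struct RouterConfig {
    pub mode: RouterMode,
}

pub struct Model {
    pub name: String,
    pub router_config: RouterConfig,
}

/// Hypothetical builder whose fields are all optional until `build`.
#[derive(Default)]
pub struct ModelBuilder {
    name: Option<String>,
    router_config: Option<RouterConfig>,
}

impl ModelBuilder {
    pub fn name(&mut self, name: &str) -> &mut Self {
        self.name = Some(name.to_string());
        self
    }

    pub fn router_config(&mut self, cfg: RouterConfig) -> &mut Self {
        self.router_config = Some(cfg);
        self
    }

    /// Falls back to `RouterConfig::default()` instead of panicking
    /// when the caller never set a router config.
    pub fn build(&mut self) -> Model {
        Model {
            name: self.name.take().unwrap_or_else(|| "unnamed".to_string()),
            router_config: self.router_config.take().unwrap_or_default(),
        }
    }
}
```

The trade-off is that a silently applied default can hide a wiring mistake that `expect` would surface immediately, which fits the fail-fast preference recorded in the learnings listed below for this skipped comment.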

⛔ Skipped due to learnings
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scoring.rs:58-63
Timestamp: 2025-05-30T06:38:09.630Z
Learning: In lib/llm/src/kv_router/scoring.rs, the user prefers to keep the panic behavior when calculating load_avg and variance with empty endpoints rather than adding guards for division by zero. They want the code to fail fast on this error condition.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1392
File: lib/llm/src/kv_router/scoring.rs:35-46
Timestamp: 2025-06-05T01:02:15.318Z
Learning: In lib/llm/src/kv_router/scoring.rs, PeaBrane prefers panic-based early failure over Result-based error handling for the worker_id() method to catch invalid data early during development.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/protocols.rs:85-112
Timestamp: 2025-06-16T20:02:54.935Z
Learning: When using derive_builder::Builder macro, the macro generates the builder struct and its methods, but does NOT generate a `builder()` method on the original struct. A manual `impl StructName { pub fn builder() -> StructNameBuilder { StructNameBuilder::default() } }` is required to provide the convenient `StructName::builder()` API pattern.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1285
File: lib/llm/src/kv_router/scheduler.rs:260-266
Timestamp: 2025-05-30T06:34:12.785Z
Learning: In the KV router scheduler code, PeaBrane prefers fail-fast behavior over silent failure handling. When accessing worker metrics data that could be out-of-bounds (like dp_rank indexing), explicit panics are preferred over graceful degradation with continue statements to ensure data integrity issues are caught early.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.

Copy link
Member

@paulhendricks paulhendricks left a comment

Great work! Left a few little comments.

Move much of what was in the `dynamo-run` crate into `dynamo-llm` so
that everyone can use it.

Example usage:

1. Create a `LocalModel`:

```
let local_model = LocalModelBuilder::default()
    .model_path("Qwen/Qwen3-0.6B")
    .http_port(8080)
    .build().await?;
```

2. Make an engine:

```
let engine_config = EngineConfig::StaticFull {
    engine: dynamo_engine_mistralrs::make_engine(&local_model).await?,
    model: Box::new(local_model),
};
```

3. Connect it to an input and run it:

```
dynamo_llm::entrypoint::input::run_input(Input::Http, runtime, engine_config).await?;
```

For #1647
And fix bindings, update lock files.
They were usize because that's easiest in Rust, but naturally they are
u32, so put in the extra work to do it right.

The nice part is at the boundary (mistralrs, gguf, etc) they were already u32, suggesting
this is indeed the correct type.
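
The commit message above does not say which values were changed from `usize` to `u32`, so the following is only a generic sketch of a checked conversion at such a boundary. The `to_u32` helper and the `context_length` name in the usage note are hypothetical.

```rust
/// Hypothetical helper: convert a `usize` to `u32`, reporting values
/// that do not fit instead of silently truncating them as `as u32` would.
pub fn to_u32(value: usize) -> Result<u32, String> {
    u32::try_from(value).map_err(|_| format!("{value} does not fit in u32"))
}

/// Going the other way needs no check on 32-bit and 64-bit targets,
/// which is useful when a `u32` has to index a slice.
pub fn to_index(value: u32) -> usize {
    value as usize
}
```

A caller might write `let context_length: u32 = to_u32(tokens.len())?;` once at the edge and keep `u32` everywhere else.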
@grahamking grahamking merged commit 92f06b0 into main Jun 30, 2025
11 of 12 checks passed
@grahamking grahamking deleted the gk-dr-split-1 branch June 30, 2025 21:06
atchernych pushed a commit that referenced this pull request Jul 9, 2025
This was referenced Aug 18, 2025
3 participants