@grahamking
Contributor

@grahamking grahamking commented Jul 7, 2025

HTTP server with mistralrs running Qwen3:

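```python
import uvloop

from dynamo.llm import EngineType, EntrypointArgs, make_engine, run_input
from dynamo.runtime import DistributedRuntime, dynamo_worker


@dynamo_worker(static=False)
async def run(runtime: DistributedRuntime):
    # Build a mistral.rs engine for Qwen3-0.6B, then serve it over HTTP.
    e = EntrypointArgs(EngineType.MistralRs, model_path="Qwen/Qwen3-0.6B")
    engine = await make_engine(runtime, e)
    await run_input(runtime, "http", engine)


uvloop.run(run())
```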

A full dynamo-run style Python CLI is in `examples/cli/cli.py`.
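In the example above, `run()` is called with no arguments even though it declares a `runtime` parameter. The review's learnings note that `@dynamo_worker()` injects `runtime` before calling the wrapped function. The sketch below shows only that injection pattern, under stated assumptions: `FakeRuntime` and `inject_runtime` are hypothetical stand-ins, `asyncio.run` replaces `uvloop.run`, and the real decorator also sets up the distributed runtime, which is not modelled here.

```python
import asyncio
import functools


class FakeRuntime:
    """Hypothetical stand-in for dynamo's DistributedRuntime."""

    def __init__(self) -> None:
        self.name = "fake-runtime"


def inject_runtime(func):
    """Hypothetical decorator: builds a runtime and passes it as the first argument."""

    @functools.wraps(func)
    async def wrapper(*args, **kwargs):
        runtime = FakeRuntime()
        return await func(runtime, *args, **kwargs)

    return wrapper


@inject_runtime
async def run(runtime: FakeRuntime, label: str = "http") -> str:
    # The caller never passes `runtime`; the decorator supplies it.
    return f"{runtime.name} serving {label}"


result = asyncio.run(run())
print(result)  # fake-runtime serving http
```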

@coderabbitai
Contributor

coderabbitai bot commented Jul 7, 2025

Walkthrough

This update introduces a new CLI example for running LLM engines with configurable parameters, expands Python bindings and type hints for engine creation and execution, and adds support for new engine backends (MistralRs and LlamaCpp) with corresponding Rust and Python integration. Several Rust APIs are updated for optional parameters and runtime selection, and documentation is improved for experimental features.

Changes

| File(s) | Change Summary |
|---|---|
| examples/cli/cli.py | New CLI script for running LLM engines with argument parsing, async execution, and engine selection. |
| lib/bindings/python/rust/llm/entrypoint.rs | New Rust module: Python bindings for engine entrypoint. Defines EngineType, EntrypointArgs, EngineConfig, and async functions make_engine and run_input. |
| lib/bindings/python/rust/llm.rs | Removes license header; adds public entrypoint module. |
| lib/bindings/python/rust/lib.rs | Exposes make_engine, run_input, EntrypointArgs, EngineConfig, and EngineType to Python. |
| lib/bindings/python/src/dynamo/_core.pyi | Adds type hints and async function signatures for engine creation and execution types/functions. |
| lib/bindings/python/src/dynamo/llm/__init__.py | Imports and exposes EngineType, EntrypointArgs, make_engine, and run_input. |
| lib/bindings/python/Cargo.toml | Adds features and dependencies for mistralrs, llamacpp, and the either crate; updates tokio version. |
| lib/bindings/python/README.md | Documents experimental build for mistralrs/llamacpp features and runtime troubleshooting. |
| lib/engines/llamacpp/Cargo.toml | Removes license header; adds dynamic-link feature for shared library support. |
| lib/llm/src/entrypoint.rs | Derives Clone for EngineConfig. |
| lib/llm/src/entrypoint/input.rs | Implements FromStr for Input; updates run_input to accept Either<Runtime, DistributedRuntime> and clarifies requirements for endpoint input. |
| lib/llm/src/local_model.rs | Updates LocalModelBuilder setters: endpoint_id and http_port now accept Option types to allow resetting to defaults. |
| launch/dynamo-run/Cargo.toml | Adds either crate as a workspace dependency. |
| launch/dynamo-run/src/lib.rs | Updates runtime selection logic: uses Either to switch between standard and distributed runtime for input processing. |

Sequence Diagram(s)

```mermaid
sequenceDiagram
    participant User
    participant CLI (Python)
    participant Dynamo Python Bindings
    participant Rust Entrypoint
    participant Engine

    User->>CLI (Python): Run CLI with arguments
    CLI (Python)->>Dynamo Python Bindings: parse_args() & EntrypointArgs
    CLI (Python)->>Dynamo Python Bindings: await make_engine(args)
    Dynamo Python Bindings->>Rust Entrypoint: make_engine(args)
    Rust Entrypoint->>Engine: Build engine (Echo/MistralRs/LlamaCpp/Dynamic)
    Engine-->>Rust Entrypoint: EngineConfig
    Rust Entrypoint-->>Dynamo Python Bindings: EngineConfig
    Dynamo Python Bindings-->>CLI (Python): EngineConfig
    CLI (Python)->>Dynamo Python Bindings: await run_input(input, runtime, EngineConfig)
    Dynamo Python Bindings->>Rust Entrypoint: run_input(input, runtime, EngineConfig)
    Rust Entrypoint->>Engine: Process input
    Engine-->>Rust Entrypoint: Output/Result
    Rust Entrypoint-->>Dynamo Python Bindings: Result
    Dynamo Python Bindings-->>CLI (Python): Result
    CLI (Python)-->>User: Output
```

Possibly related PRs

Poem

🐇
A bunny with code so neat,
Brings engines new for you to greet—
With Mistral, Llama, Echo’s call,
Python and Rust now bridge them all!
Command lines hop, bindings grow,
Engines run with a single go.
🥕✨


Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Nitpick comments (4)
lib/bindings/python/README.md (1)

49-65: Excellent documentation with minor formatting improvements needed.

The experimental build instructions are comprehensive and include valuable troubleshooting information. However, there are a few formatting and grammar issues to address:

-5. Experimental: To allow using mistral.rs and llama.cpp via the bindings, build with feature flags
+5. Experimental: To allow using mistral.rs and llama.cpp via the bindings, build with feature flags:

-```
+```bash
 maturin develop --features mistralrs,llamacpp

-```
+```bash
patchelf --set-rpath '' _core.cpython-312-x86_64-linux-gnu.so


-If you include `llamacpp` feature flag, `libllama.so` and `libggml.so` (and family) will need to be available at runtime.
+If you include the `llamacpp` feature flag, `libllama.so` and `libggml.so` (and family) will need to be available at runtime.
examples/cli/cli.py (2)

82-92: Consider using a dictionary for engine type mapping.

While the current implementation works, a dictionary-based approach would be more maintainable and reduce code duplication.

-    if output == "echo":
-        engine_type = EngineType.Echo
-    elif output == "mistralrs":
-        engine_type = EngineType.MistralRs
-    elif output == "llamacpp":
-        engine_type = EngineType.LlamaCpp
-    elif output == "dyn":
-        engine_type = EngineType.Dynamic
-    else:
-        print(f"Unsupported output type: {output}")
-        sys.exit(1)
+    engine_type_map = {
+        "echo": EngineType.Echo,
+        "mistralrs": EngineType.MistralRs,
+        "llamacpp": EngineType.LlamaCpp,
+        "dyn": EngineType.Dynamic,
+    }
+    
+    engine_type = engine_type_map.get(output)
+    if engine_type is None:
+        print(f"Unsupported output type: {output}")
+        sys.exit(1)

94-94: Track the TODO for vllm, sglang, and trtllm engine types.

This TODO indicates missing engine type implementations that should be tracked for future development.

Would you like me to create a GitHub issue to track the implementation of vllm, sglang, and trtllm engine types that call Python directly?

lib/bindings/python/rust/llm/entrypoint.rs (1)

157-157: Consider exposing the Input enum to Python for better type safety.

The TODO correctly identifies that exposing the Input enum would provide better type safety and IDE support compared to parsing strings.

Would you like me to create a GitHub issue to track exposing the Input enum to Python bindings for improved type safety?

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b204456 and 3cca471.

⛔ Files ignored due to path filters (2)
  • Cargo.lock is excluded by !**/*.lock
  • lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
  • examples/cli/cli.py (1 hunks)
  • launch/dynamo-run/Cargo.toml (1 hunks)
  • launch/dynamo-run/src/lib.rs (3 hunks)
  • lib/bindings/python/Cargo.toml (1 hunks)
  • lib/bindings/python/README.md (1 hunks)
  • lib/bindings/python/rust/lib.rs (2 hunks)
  • lib/bindings/python/rust/llm.rs (1 hunks)
  • lib/bindings/python/rust/llm/entrypoint.rs (1 hunks)
  • lib/bindings/python/src/dynamo/_core.pyi (2 hunks)
  • lib/bindings/python/src/dynamo/llm/__init__.py (2 hunks)
  • lib/engines/llamacpp/Cargo.toml (1 hunks)
  • lib/llm/src/entrypoint.rs (1 hunks)
  • lib/llm/src/entrypoint/input.rs (4 hunks)
  • lib/llm/src/local_model.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
launch/dynamo-run/Cargo.toml (2)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
lib/bindings/python/rust/llm.rs (2)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/bindings/python/Cargo.toml (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/llm/src/entrypoint/input.rs (1)
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
examples/cli/cli.py (1)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
lib/bindings/python/src/dynamo/llm/__init__.py (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
🧬 Code Graph Analysis (3)
launch/dynamo-run/src/lib.rs (2)
lib/runtime/src/distributed.rs (3)
  • runtime (90-92)
  • from_settings (79-82)
  • from_settings (176-182)
lib/llm/src/entrypoint/input.rs (1)
  • run_input (102-134)
lib/llm/src/entrypoint/input.rs (5)
lib/bindings/python/rust/llm/entrypoint.rs (2)
  • run_input (160-177)
  • new (44-69)
launch/dynamo-run/src/lib.rs (1)
  • run (23-88)
lib/llm/src/entrypoint/input/endpoint.rs (1)
  • run (26-109)
lib/llm/src/entrypoint/input/http.rs (1)
  • run (24-92)
lib/llm/src/entrypoint/input/text.rs (1)
  • run (20-37)
lib/bindings/python/src/dynamo/llm/__init__.py (3)
lib/bindings/python/rust/lib.rs (2)
  • _core (61-112)
  • register_llm (131-171)
lib/bindings/python/rust/llm/entrypoint.rs (2)
  • make_engine (80-102)
  • run_input (160-177)
lib/llm/src/entrypoint/input.rs (1)
  • run_input (102-134)
🪛 LanguageTool
lib/bindings/python/README.md

[uncategorized] ~64-~64: You might be missing the article “the” here.
Context: ...x86_64-linux-gnu.so ``` If you include llamacpp feature flag, `libllama.so` an...

(AI_EN_LECTOR_MISSING_DETERMINER_THE)

🪛 markdownlint-cli2 (0.17.2)
lib/bindings/python/README.md

51-51: Fenced code blocks should have a language specified

(MD040, fenced-code-language)


60-60: Fenced code blocks should have a language specified

(MD040, fenced-code-language)

🪛 Ruff (0.11.9)
lib/bindings/python/src/dynamo/llm/__init__.py

27-27: dynamo._core.EngineType imported but unused

(F401)


45-45: dynamo._core.make_engine imported but unused

(F401)


47-47: dynamo._core.run_input imported but unused

(F401)

🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/1799/merge) by grahamking.
lib/bindings/python/src/dynamo/llm/__init__.py

[error] 27-27: Ruff: dynamo._core.EngineType imported but unused (F401)


[error] 45-45: Ruff: dynamo._core.make_engine imported but unused (F401)


[error] 47-47: Ruff: dynamo._core.run_input imported but unused (F401)

lib/bindings/python/src/dynamo/_core.pyi

[error] 802-802: Ruff: Undefined name Runtime (F821)

⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - vllm
🔇 Additional comments (23)
launch/dynamo-run/Cargo.toml (1)

37-37: LGTM: Clean dependency addition for runtime selection.

The addition of the either crate follows the established workspace dependency pattern and supports the dynamic runtime selection functionality mentioned in the PR.

lib/bindings/python/rust/llm.rs (1)

32-32: LGTM: Module addition follows established patterns.

The addition of the entrypoint module is clean and follows the existing module organization pattern in the file.

lib/engines/llamacpp/Cargo.toml (1)

21-22: LGTM: Appropriate feature addition for Python bindings.

The dynamic-link feature is correctly implemented with a clear explanation of why it's needed. This is essential for Python bindings integration where static linking into shared objects isn't possible.

lib/bindings/python/rust/lib.rs (2)

66-67: LGTM: Proper function exposure for Python bindings.

The new async functions make_engine and run_input are correctly wrapped and exposed to the Python module, enabling asynchronous engine creation and execution from Python.


79-81: LGTM: Proper class exposure for Python bindings.

The new classes EntrypointArgs, EngineConfig, and EngineType are correctly added to the Python module, providing the necessary types for engine configuration and management.

launch/dynamo-run/src/lib.rs (4)

14-14: LGTM: Appropriate dependency addition.

The either crate import is correctly added to support the new runtime selection logic.


45-45: LGTM: Updated API calls with optional parameters.

The calls to http_port() and endpoint_id() are correctly updated to use Some() wrappers, aligning with the new optional parameter signatures in the LocalModelBuilder.

Also applies to: 53-53


51-58: LGTM: Dynamic runtime selection implementation.

The logic correctly creates either a standard runtime or distributed runtime based on the input type. The Either::Left wraps the original runtime, while Either::Right wraps a newly created DistributedRuntime when the input is an endpoint. This aligns with the run_input function's requirement for distributed runtime when handling endpoint inputs.
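The runtime selection described above can be sketched in Python, with a `Union` playing the role of Rust's `Either<Runtime, DistributedRuntime>`. This is an illustration under assumptions, not the dynamo-run code: `LocalRuntime`, `DistributedRuntime` and `select_runtime` are hypothetical stand-ins, and treating inputs that start with `dyn://` as endpoint inputs is an assumption about the input syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class LocalRuntime:
    """Hypothetical stand-in for the standard (non-distributed) Runtime."""
    name: str = "local"


@dataclass
class DistributedRuntime:
    """Hypothetical stand-in that wraps a base runtime."""
    base: LocalRuntime

    @property
    def name(self) -> str:
        return f"distributed({self.base.name})"


# Union stands in for Either: Left = LocalRuntime, Right = DistributedRuntime.
AnyRuntime = Union[LocalRuntime, DistributedRuntime]


def select_runtime(input_spec: str, runtime: LocalRuntime) -> AnyRuntime:
    """Only endpoint inputs (assumed 'dyn://' prefix) get a distributed runtime."""
    if input_spec.startswith("dyn://"):
        return DistributedRuntime(base=runtime)
    return runtime
```

The design point mirrored here is that the cheaper local runtime is the default, and the distributed one is only constructed when the input actually needs it.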


79-79: LGTM: Updated function call with Either runtime.

The run_input call is correctly updated to pass the Either runtime type, supporting both standard and distributed runtime scenarios.

lib/bindings/python/Cargo.toml (4)

39-40: LGTM: Appropriate feature flag additions.

The new optional features mistralrs and llamacpp are correctly configured to enable the corresponding engine dependencies when needed.


45-46: LGTM: Well-configured optional dependencies.

The engine dependencies are properly configured with:

  • CUDA support enabled for both engines
  • The dynamic-link feature for llamacpp, which the bindings need because llama.cpp cannot be statically linked into the Python extension's shared object
  • Appropriate optional flags to avoid unnecessary compilation

52-52: LGTM: Appropriate dependency addition.

The either crate with serde support correctly supports the runtime selection functionality introduced in this PR.


58-58: LGTM: Tokio version specification.

The explicit tokio version 1.46.0 with full features provides stable async runtime support for the Python bindings.

lib/llm/src/local_model.rs (2)

83-86: LGTM: Enhanced flexibility with optional parameter.

The endpoint_id setter now accepts Option<EndpointId>, allowing callers to explicitly set the endpoint ID or clear it by passing None. This change aligns with the updated usage in launch/dynamo-run/src/lib.rs.


99-103: LGTM: Enhanced flexibility with optional parameter and clear documentation.

The http_port setter now accepts Option<u16> with proper default handling and clear documentation. The unwrap_or(DEFAULT_HTTP_PORT) correctly resets to the default when None is passed, providing flexible configuration options.
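The two Option-taking setters can be mirrored in Python to show the "None resets to the default" behaviour. This is a hypothetical sketch: `LocalModelBuilderSketch` is not a real class, and the value `8080` for `DEFAULT_HTTP_PORT` is an assumption for illustration only.

```python
from typing import Optional

# Assumed value for illustration; the real DEFAULT_HTTP_PORT in
# lib/llm/src/local_model.rs may differ.
DEFAULT_HTTP_PORT = 8080


class LocalModelBuilderSketch:
    """Hypothetical Python mirror of the Option-taking builder setters."""

    def __init__(self) -> None:
        self._http_port = DEFAULT_HTTP_PORT
        self._endpoint_id: Optional[str] = None

    def endpoint_id(self, endpoint_id: Optional[str]) -> "LocalModelBuilderSketch":
        # Passing None clears any previously set endpoint id.
        self._endpoint_id = endpoint_id
        return self

    def http_port(self, port: Optional[int]) -> "LocalModelBuilderSketch":
        # Same effect as port.unwrap_or(DEFAULT_HTTP_PORT): None resets to the default.
        self._http_port = port if port is not None else DEFAULT_HTTP_PORT
        return self
```

Returning `self` keeps the chained builder style, e.g. `LocalModelBuilderSketch().http_port(9000).endpoint_id(None)`.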

lib/llm/src/entrypoint/input.rs (4)

47-53: Good addition of the FromStr trait implementation.

This idiomatic implementation enables more ergonomic parsing of Input values using .parse() while reusing the existing TryFrom<&str> logic.
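The pattern of keeping one parsing routine and exposing it through a second, more ergonomic entry point can be sketched in Python: `try_from` plays the `TryFrom<&str>` role and `parse` the `FromStr` role. The variants shown are a reduced, hypothetical set, and the `dyn://` prefix for endpoint inputs is an assumption.

```python
from dataclasses import dataclass
from enum import Enum, auto
from typing import Optional


class InputKind(Enum):
    HTTP = auto()
    TEXT = auto()
    ENDPOINT = auto()


@dataclass(frozen=True)
class Input:
    kind: InputKind
    target: Optional[str] = None  # endpoint path, only set for ENDPOINT

    @classmethod
    def try_from(cls, s: str) -> "Input":
        """Single source of truth for parsing (the TryFrom<&str> role)."""
        if s.startswith("dyn://"):
            return cls(InputKind.ENDPOINT, s[len("dyn://"):])
        simple = {"http": InputKind.HTTP, "text": InputKind.TEXT}
        if s in simple:
            return cls(simple[s])
        raise ValueError(f"Invalid input: {s!r}")

    @classmethod
    def parse(cls, s: str) -> "Input":
        """The FromStr role: a thin wrapper that reuses try_from."""
        return cls.try_from(s)
```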


100-110: Well-designed runtime abstraction using Either.

The change elegantly handles both runtime types while maintaining a clear contract: Input::Endpoint requires a DistributedRuntime, while other inputs can use either runtime type. The runtime extraction logic correctly handles both cases.


113-125: Clean refactoring to eliminate redundant cloning.

Good improvement - the runtime is now cloned once during extraction rather than multiple times for each handler.


127-129: Clear enforcement of the DistributedRuntime requirement for endpoints.

The pattern matching with let-else is idiomatic, and the error message clearly communicates the requirement to users.
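The contract the review describes (extract the base runtime once, and reject endpoint inputs unless a distributed runtime was supplied) can be sketched in Python. `Runtime`, `DistributedRuntime` and `run_input_sketch` are hypothetical stand-ins; the `runtime()` accessor mirrors the `runtime` method listed for lib/runtime/src/distributed.rs, but its Python shape here is an assumption.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Runtime:
    """Hypothetical stand-in for the standard Runtime."""
    label: str = "runtime"


@dataclass
class DistributedRuntime:
    """Hypothetical stand-in that wraps a base Runtime."""
    inner: Runtime

    def runtime(self) -> Runtime:
        return self.inner


def run_input_sketch(input_kind: str, rt: Union[Runtime, DistributedRuntime]) -> str:
    # Extract the base runtime once, whichever variant was passed in.
    base = rt.runtime() if isinstance(rt, DistributedRuntime) else rt
    if input_kind == "endpoint":
        # Counterpart of the let-else check: endpoints need a DistributedRuntime.
        if not isinstance(rt, DistributedRuntime):
            raise ValueError("Input 'endpoint' requires a DistributedRuntime")
        return f"endpoint on {base.label}"
    return f"{input_kind} on {base.label}"
```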

examples/cli/cli.py (2)

19-72: Well-structured argument parser with clear documentation.

Good use of type hints and the ArgumentDefaultsHelpFormatter to provide a user-friendly CLI interface. The default model path provides a reasonable starting point for users.
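A minimal parser in the same style is sketched below. The flag names (`--in`, `--out`, `--model-path`) and their defaults are hypothetical and may not match `examples/cli/cli.py`; the engine names come from the mapping in the nitpick above. Using `choices` is one alternative to the manual "Unsupported output type" check, because argparse then rejects unknown engines at parse time.

```python
import argparse

# Engine names from the review's mapping; the flag names below are hypothetical.
ENGINE_CHOICES = ["echo", "mistralrs", "llamacpp", "dyn"]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Sketch of a dynamo-run style CLI",
        # Appends "(default: ...)" to every option that has help text.
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # "in" is a Python keyword, so store it under dest="input".
    parser.add_argument("--in", dest="input", default="text", help="Input type, e.g. http or text")
    parser.add_argument("--out", dest="output", default="echo", choices=ENGINE_CHOICES, help="Engine to run")
    parser.add_argument("--model-path", type=str, default="Qwen/Qwen3-0.6B", help="Model path or Hugging Face repo")
    return parser
```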


96-108: Verify the handling of optional model_path.

The model_path is always included in entrypoint_kwargs even when it might be None, while other optional arguments are conditionally included. Is this intentional behavior, or should model_path also be conditionally included?
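If the answer is that `model_path` should be treated like the other optional arguments, one option is to build the keyword arguments and drop every `None` value in one place. This is a sketch: the function and the `model_name`/`context_length` field names are hypothetical, and whether `EntrypointArgs` treats a missing key differently from `None` is not stated in the PR.

```python
from typing import Any, Dict, Optional


def build_entrypoint_kwargs(
    model_path: Optional[str],
    model_name: Optional[str] = None,
    context_length: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect only the options the user actually set (hypothetical field names)."""
    candidates = {
        "model_path": model_path,
        "model_name": model_name,
        "context_length": context_length,
    }
    # Treat model_path like every other optional field: omit it when None.
    return {key: value for key, value in candidates.items() if value is not None}
```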

lib/bindings/python/rust/llm/entrypoint.rs (2)

78-102: Clean implementation of the async engine creation.

Good use of the builder pattern and proper async/await handling with PyO3. The error propagation is consistent throughout.


118-151: Excellent handling of optional features with clear error messages.

The conditional compilation is well-implemented, and the error messages are particularly helpful by providing specific rebuild instructions when features are not enabled.

Example dynamo-run style Python CLI in `examples/cli/cli.py`.

It is currently slower than the pure-Rust binary; the cause still needs to be investigated.

Using mistralrs or llamacpp requires extra steps; see the README.
@grahamking grahamking merged commit 2bf2792 into main Jul 8, 2025
12 of 13 checks passed
@grahamking grahamking deleted the gk-dr-bind branch July 8, 2025 21:40
@coderabbitai coderabbitai bot mentioned this pull request Aug 18, 2025

4 participants