feat(python): Python bindings for the Dynamo CLI tools #1799
Walkthrough: This update introduces a new CLI example for running LLM engines with configurable parameters, expands the Python bindings and type hints for engine creation and execution, and adds support for two new engine backends (MistralRs and LlamaCpp) with corresponding Rust and Python integration. Several Rust APIs now accept optional parameters and support runtime selection, and the documentation for experimental features is improved.
Sequence Diagram(s):
participant User
participant CLI (Python)
participant Dynamo Python Bindings
participant Rust Entrypoint
participant Engine
User->>CLI (Python): Run CLI with arguments
CLI (Python)->>Dynamo Python Bindings: parse_args() & EntrypointArgs
CLI (Python)->>Dynamo Python Bindings: await make_engine(args)
Dynamo Python Bindings->>Rust Entrypoint: make_engine(args)
Rust Entrypoint->>Engine: Build engine (Echo/MistralRs/LlamaCpp/Dynamic)
Engine-->>Rust Entrypoint: EngineConfig
Rust Entrypoint-->>Dynamo Python Bindings: EngineConfig
Dynamo Python Bindings-->>CLI (Python): EngineConfig
CLI (Python)->>Dynamo Python Bindings: await run_input(input, runtime, EngineConfig)
Dynamo Python Bindings->>Rust Entrypoint: run_input(input, runtime, EngineConfig)
Rust Entrypoint->>Engine: Process input
Engine-->>Rust Entrypoint: Output/Result
Rust Entrypoint-->>Dynamo Python Bindings: Result
Dynamo Python Bindings-->>CLI (Python): Result
CLI (Python)-->>User: Output
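The call order in the diagram (build an engine config, then drive it with an input) can be sketched in plain Python with `asyncio`. The `EntrypointArgs`, `EngineConfig`, `make_engine` and `run_input` below are hypothetical local stand-ins that only mirror that order; they do not reproduce the real binding signatures, which are defined in `lib/bindings/python/rust/llm/entrypoint.rs`.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class EntrypointArgs:
    # Hypothetical stand-in for the EntrypointArgs class exposed by the bindings.
    engine_type: str
    model_path: Optional[str] = None


@dataclass
class EngineConfig:
    # Hypothetical stand-in for the EngineConfig returned by make_engine.
    engine_type: str
    model_path: Optional[str]


async def make_engine(args: EntrypointArgs) -> EngineConfig:
    # Stand-in for the Rust side building the engine (Echo, MistralRs, ...).
    return EngineConfig(args.engine_type, args.model_path)


async def run_input(input_spec: str, runtime: object, config: EngineConfig) -> str:
    # Stand-in for the Rust side serving `input_spec` with the configured engine.
    return f"{input_spec} served by {config.engine_type}"


async def main() -> str:
    args = EntrypointArgs(engine_type="echo")
    config = await make_engine(args)  # step 1: build the engine config
    return await run_input("text", None, config)  # step 2: drive it with an input
```

Running `asyncio.run(main())` returns the string produced by the stand-in `run_input`; in the real CLI both awaits cross into Rust.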
Actionable comments posted: 4
🧹 Nitpick comments (4)
lib/bindings/python/README.md (1)
49-65: Excellent documentation with minor formatting improvements needed. The experimental build instructions are comprehensive and include valuable troubleshooting information. However, there are a few formatting and grammar issues to address:
````diff
-5. Experimental: To allow using mistral.rs and llama.cpp via the bindings, build with feature flags
+5. Experimental: To allow using mistral.rs and llama.cpp via the bindings, build with feature flags:

-```
+```bash
 maturin develop --features mistralrs,llamacpp
-```
+```bash
 patchelf --set-rpath '' _core.cpython-312-x86_64-linux-gnu.so
-If you include `llamacpp` feature flag, `libllama.so` and `libggml.so` (and family) will need to be available at runtime.
+If you include the `llamacpp` feature flag, `libllama.so` and `libggml.so` (and family) will need to be available at runtime.
````
examples/cli/cli.py (2)
82-92: Consider using a dictionary for engine type mapping. While the current implementation works, a dictionary-based approach would be more maintainable and reduce code duplication.
```diff
-    if output == "echo":
-        engine_type = EngineType.Echo
-    elif output == "mistralrs":
-        engine_type = EngineType.MistralRs
-    elif output == "llamacpp":
-        engine_type = EngineType.LlamaCpp
-    elif output == "dyn":
-        engine_type = EngineType.Dynamic
-    else:
-        print(f"Unsupported output type: {output}")
-        sys.exit(1)
+    engine_type_map = {
+        "echo": EngineType.Echo,
+        "mistralrs": EngineType.MistralRs,
+        "llamacpp": EngineType.LlamaCpp,
+        "dyn": EngineType.Dynamic,
+    }
+
+    engine_type = engine_type_map.get(output)
+    if engine_type is None:
+        print(f"Unsupported output type: {output}")
+        sys.exit(1)
```
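A self-contained version of the suggested table lookup is below. `EngineType` is a local `Enum` stand-in for `dynamo.llm.EngineType` (an assumption made so the sketch runs without the bindings), and `engine_type_for` is a hypothetical helper name. Unlike the suggested diff, this sketch sends the error message to stderr.

```python
import sys
from enum import Enum, auto


class EngineType(Enum):
    # Local stand-in for dynamo.llm.EngineType so the sketch runs on its own.
    Echo = auto()
    MistralRs = auto()
    LlamaCpp = auto()
    Dynamic = auto()


# One table maps every accepted CLI value to its engine type.
ENGINE_TYPES = {
    "echo": EngineType.Echo,
    "mistralrs": EngineType.MistralRs,
    "llamacpp": EngineType.LlamaCpp,
    "dyn": EngineType.Dynamic,
}


def engine_type_for(output: str) -> EngineType:
    engine_type = ENGINE_TYPES.get(output)
    if engine_type is None:
        # Report the bad value on stderr and exit with a non-zero status.
        print(f"Unsupported output type: {output}", file=sys.stderr)
        sys.exit(1)
    return engine_type
```

A side benefit of the table is that its keys can also be passed as `choices=list(ENGINE_TYPES)` to `argparse`, so an invalid value is rejected during argument parsing, before any lookup happens.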
94-94: Track the TODO for vllm, sglang, and trtllm engine types. This TODO indicates missing engine type implementations that should be tracked for future development.
Would you like me to create a GitHub issue to track the implementation of vllm, sglang, and trtllm engine types that call Python directly?
lib/bindings/python/rust/llm/entrypoint.rs (1)
157-157: Consider exposing the Input enum to Python for better type safety. The TODO correctly identifies that exposing the Input enum would provide better type safety and IDE support compared to parsing strings.
Would you like me to create a GitHub issue to track exposing the Input enum to Python bindings for improved type safety?
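As a rough illustration of what an enum buys on the Python side, the sketch below uses a `str`-backed `Enum`. `InputKind`, its two members and `parse_input` are hypothetical; the real Rust `Input` enum has further variants (such as `Endpoint`) that are not modeled here.

```python
from enum import Enum


class InputKind(str, Enum):
    # Hypothetical subset of input kinds; the Rust Input enum also has
    # variants (such as Endpoint) that are not modeled here.
    HTTP = "http"
    TEXT = "text"


def parse_input(value: str) -> InputKind:
    # Turn a CLI string into an enum member, with a readable error on typos.
    try:
        return InputKind(value)
    except ValueError:
        valid = ", ".join(kind.value for kind in InputKind)
        raise ValueError(f"Invalid input {value!r}; expected one of: {valid}") from None
```

Callers can then write `InputKind.HTTP`, which IDEs and type checkers can verify, instead of a bare string that is only checked at runtime; because of the `str` mixin, members still compare equal to their string values.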
📜 Review details
Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
⛔ Files ignored due to path filters (2)
- Cargo.lock is excluded by !**/*.lock
- lib/bindings/python/Cargo.lock is excluded by !**/*.lock
📒 Files selected for processing (14)
- examples/cli/cli.py (1 hunk)
- launch/dynamo-run/Cargo.toml (1 hunk)
- launch/dynamo-run/src/lib.rs (3 hunks)
- lib/bindings/python/Cargo.toml (1 hunk)
- lib/bindings/python/README.md (1 hunk)
- lib/bindings/python/rust/lib.rs (2 hunks)
- lib/bindings/python/rust/llm.rs (1 hunk)
- lib/bindings/python/rust/llm/entrypoint.rs (1 hunk)
- lib/bindings/python/src/dynamo/_core.pyi (2 hunks)
- lib/bindings/python/src/dynamo/llm/__init__.py (2 hunks)
- lib/engines/llamacpp/Cargo.toml (1 hunk)
- lib/llm/src/entrypoint.rs (1 hunk)
- lib/llm/src/entrypoint/input.rs (4 hunks)
- lib/llm/src/local_model.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (6)
launch/dynamo-run/Cargo.toml (2)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: PeaBrane
PR: ai-dynamo/dynamo#1236
File: lib/llm/src/mocker/engine.rs:140-161
Timestamp: 2025-06-17T00:50:44.845Z
Learning: In Rust async code, when an Arc<Mutex<_>> is used solely to transfer ownership of a resource (like a channel receiver) into a spawned task rather than for sharing between multiple tasks, holding the mutex lock across an await is not problematic since there's no actual contention.
lib/bindings/python/rust/llm.rs (2)
Learnt from: kthui
PR: ai-dynamo/dynamo#1424
File: lib/runtime/src/pipeline/network/egress/push_router.rs:204-209
Timestamp: 2025-06-13T22:07:24.843Z
Learning: The codebase uses async-nats version 0.40, not the older nats crate. Error handling should use async_nats::error::Error variants, not nats::Error variants.
Learnt from: alec-flowers
PR: ai-dynamo/dynamo#1181
File: lib/llm/src/kv_router/publisher.rs:379-425
Timestamp: 2025-05-29T00:02:35.018Z
Learning: In lib/llm/src/kv_router/publisher.rs, the functions `create_stored_blocks` and `create_stored_block_from_parts` are correctly implemented and not problematic duplications of existing functionality elsewhere in the codebase.
lib/bindings/python/Cargo.toml (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
lib/llm/src/entrypoint/input.rs (1)
Learnt from: oandreeva-nv
PR: ai-dynamo/dynamo#1195
File: lib/llm/tests/block_manager.rs:150-152
Timestamp: 2025-06-02T19:37:27.666Z
Learning: In Rust/Tokio applications, when background tasks use channels for communication, dropping the sender automatically signals task termination when the receiver gets `None`. The `start_batching_publisher` function in `lib/llm/tests/block_manager.rs` demonstrates this pattern: when the `KVBMDynamoRuntimeComponent` is dropped, its `batch_tx` sender is dropped, causing `rx.recv()` to return `None`, which triggers cleanup and task termination.
examples/cli/cli.py (1)
Learnt from: nnshah1
PR: ai-dynamo/dynamo#1444
File: tests/fault_tolerance/utils/metrics.py:30-32
Timestamp: 2025-07-01T13:55:03.940Z
Learning: The `@dynamo_worker()` decorator in the dynamo codebase returns a wrapper that automatically injects the `runtime` parameter before calling the wrapped function. This means callers only need to provide the non-runtime parameters, while the decorator handles injecting the runtime argument automatically. For example, a function with signature `async def get_metrics(runtime, log_dir)` decorated with `@dynamo_worker()` can be called as `get_metrics(log_dir)` because the decorator wrapper injects the runtime parameter.
lib/bindings/python/src/dynamo/llm/__init__.py (1)
Learnt from: biswapanda
PR: ai-dynamo/dynamo#1412
File: lib/bindings/python/src/dynamo/runtime/logging.py:100-100
Timestamp: 2025-06-06T21:48:35.214Z
Learning: In the Dynamo codebase, BentoML has been completely removed from all executable code, with only documentation and attribution references remaining. The error_loggers configuration in lib/bindings/python/src/dynamo/runtime/logging.py should not include "bentoml" since those modules no longer exist.
🧬 Code Graph Analysis (3)
launch/dynamo-run/src/lib.rs (2)
lib/runtime/src/distributed.rs (3)
runtime(90-92), from_settings(79-82), from_settings(176-182)
lib/llm/src/entrypoint/input.rs (1)
run_input(102-134)
lib/llm/src/entrypoint/input.rs (5)
lib/bindings/python/rust/llm/entrypoint.rs (2)
run_input(160-177), new(44-69)
launch/dynamo-run/src/lib.rs (1)
run(23-88)
lib/llm/src/entrypoint/input/endpoint.rs (1)
run(26-109)
lib/llm/src/entrypoint/input/http.rs (1)
run(24-92)
lib/llm/src/entrypoint/input/text.rs (1)
run(20-37)
lib/bindings/python/src/dynamo/llm/__init__.py (3)
lib/bindings/python/rust/lib.rs (2)
_core(61-112), register_llm(131-171)
lib/bindings/python/rust/llm/entrypoint.rs (2)
make_engine(80-102), run_input(160-177)
lib/llm/src/entrypoint/input.rs (1)
run_input(102-134)
🪛 LanguageTool
lib/bindings/python/README.md
[uncategorized] ~64-~64: You might be missing the article “the” here.
Context: ...x86_64-linux-gnu.so ``` If you include llamacpp feature flag, `libllama.so` an...
(AI_EN_LECTOR_MISSING_DETERMINER_THE)
🪛 markdownlint-cli2 (0.17.2)
lib/bindings/python/README.md
51-51: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
60-60: Fenced code blocks should have a language specified
(MD040, fenced-code-language)
🪛 Ruff (0.11.9)
lib/bindings/python/src/dynamo/llm/__init__.py
27-27: dynamo._core.EngineType imported but unused
(F401)
45-45: dynamo._core.make_engine imported but unused
(F401)
47-47: dynamo._core.run_input imported but unused
(F401)
🪛 GitHub Actions: Pre Merge Validation of (ai-dynamo/dynamo/refs/pull/1799/merge) by grahamking.
lib/bindings/python/src/dynamo/llm/__init__.py
[error] 27-27: Ruff: dynamo._core.EngineType imported but unused (F401)
[error] 45-45: Ruff: dynamo._core.make_engine imported but unused (F401)
[error] 47-47: Ruff: dynamo._core.run_input imported but unused (F401)
lib/bindings/python/src/dynamo/_core.pyi
[error] 802-802: Ruff: Undefined name Runtime (F821)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
- GitHub Check: pre-merge-rust (lib/runtime/examples)
- GitHub Check: pre-merge-rust (.)
- GitHub Check: pre-merge-rust (lib/bindings/python)
- GitHub Check: Build and Test - vllm
🔇 Additional comments (23)
launch/dynamo-run/Cargo.toml (1)
37-37: LGTM: Clean dependency addition for runtime selection. The addition of the `either` crate follows the established workspace dependency pattern and supports the dynamic runtime selection functionality mentioned in the PR.
lib/bindings/python/rust/llm.rs (1)
32-32: LGTM: Module addition follows established patterns. The addition of the `entrypoint` module is clean and follows the existing module organization pattern in the file.
lib/engines/llamacpp/Cargo.toml (1)
21-22: LGTM: Appropriate feature addition for Python bindings. The `dynamic-link` feature is correctly implemented with a clear explanation of why it's needed. This is essential for Python bindings integration, where static linking into shared objects isn't possible.
lib/bindings/python/rust/lib.rs (2)
66-67: LGTM: Proper function exposure for Python bindings. The new async functions `make_engine` and `run_input` are correctly wrapped and exposed to the Python module, enabling asynchronous engine creation and execution from Python.
79-81: LGTM: Proper class exposure for Python bindings. The new classes `EntrypointArgs`, `EngineConfig`, and `EngineType` are correctly added to the Python module, providing the necessary types for engine configuration and management.
launch/dynamo-run/src/lib.rs (4)
14-14: LGTM: Appropriate dependency addition. The `either` crate import is correctly added to support the new runtime selection logic.
45-45: LGTM: Updated API calls with optional parameters. The calls to `http_port()` and `endpoint_id()` are correctly updated to use `Some()` wrappers, aligning with the new optional parameter signatures in the `LocalModelBuilder`. Also applies to: 53-53
51-58: LGTM: Dynamic runtime selection implementation. The logic correctly creates either a standard runtime or a distributed runtime based on the input type. `Either::Left` wraps the original runtime, while `Either::Right` wraps a newly created `DistributedRuntime` when the input is an endpoint. This aligns with the `run_input` function's requirement for a distributed runtime when handling endpoint inputs.
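The selection rule and the endpoint check enforced later in `run_input` can be modeled in a few lines of Python. This is only an analogue of the Rust logic: `Runtime` and `DistributedRuntime` here are hypothetical dataclasses, a `Union` plays the role of `Either`, and the `dyn://` prefix used to recognize endpoint inputs is an assumption about the input syntax.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Runtime:
    # Hypothetical stand-in for the plain local runtime (Either::Left).
    name: str = "local"


@dataclass
class DistributedRuntime:
    # Hypothetical stand-in for the distributed runtime (Either::Right).
    name: str = "distributed"


AnyRuntime = Union[Runtime, DistributedRuntime]


def is_endpoint(input_spec: str) -> bool:
    # Assumed endpoint syntax for this sketch.
    return input_spec.startswith("dyn://")


def select_runtime(input_spec: str, local: Runtime) -> AnyRuntime:
    # Endpoint inputs get a distributed runtime; everything else keeps the local one.
    return DistributedRuntime() if is_endpoint(input_spec) else local


def check_runtime(input_spec: str, runtime: AnyRuntime) -> None:
    # Mirrors the let-else guard: endpoint inputs must have a DistributedRuntime.
    if is_endpoint(input_spec) and not isinstance(runtime, DistributedRuntime):
        raise RuntimeError("Endpoint input requires a DistributedRuntime")
```

Keeping the choice in one function and the check in another mirrors the split in the PR: the binary decides which runtime to build, while `run_input` still rejects an inconsistent combination passed in by any other caller.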
79-79: LGTM: Updated function call with Either runtime. The `run_input` call is correctly updated to pass the `Either` runtime type, supporting both standard and distributed runtime scenarios.
lib/bindings/python/Cargo.toml (4)
39-40: LGTM: Appropriate feature flag additions. The new optional features `mistralrs` and `llamacpp` are correctly configured to enable the corresponding engine dependencies when needed.
45-46: LGTM: Well-configured optional dependencies. The engine dependencies are properly configured with:
- CUDA support enabled for both engines
- Dynamic linking feature for llamacpp (improving flexibility)
- Appropriate optional flags to avoid unnecessary compilation
52-52: LGTM: Appropriate dependency addition. The `either` crate with serde support correctly supports the runtime selection functionality introduced in this PR.
58-58: LGTM: Tokio version specification. The explicit tokio version `1.46.0` with full features provides stable async runtime support for the Python bindings.
lib/llm/src/local_model.rs (2)
83-86: LGTM: Enhanced flexibility with optional parameter. The `endpoint_id` setter now accepts `Option<EndpointId>`, allowing callers to explicitly set the endpoint ID or clear it by passing `None`. This change aligns with the updated usage in `launch/dynamo-run/src/lib.rs`.
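A Python analogue of the `Option`-taking setters may make the semantics concrete: passing a value sets it, while passing `None` clears `endpoint_id` or, for `http_port`, falls back to the default as the next comment describes. The class name and the `DEFAULT_HTTP_PORT` value of 8080 are placeholders for this sketch, not taken from `local_model.rs`.

```python
from typing import Optional

# Placeholder value for this sketch; the real constant is defined in Rust.
DEFAULT_HTTP_PORT = 8080


class LocalModelBuilderSketch:
    # Hypothetical Python analogue of the Option-taking builder setters.
    def __init__(self) -> None:
        self._endpoint_id: Optional[str] = None
        self._http_port: int = DEFAULT_HTTP_PORT

    def endpoint_id(self, endpoint_id: Optional[str]) -> "LocalModelBuilderSketch":
        # Some(id) sets the value; None clears it.
        self._endpoint_id = endpoint_id
        return self

    def http_port(self, port: Optional[int]) -> "LocalModelBuilderSketch":
        # Equivalent of port.unwrap_or(DEFAULT_HTTP_PORT): None resets to the default.
        self._http_port = port if port is not None else DEFAULT_HTTP_PORT
        return self
```

Returning `self` keeps the chained builder style, so a caller can forward an optional CLI value straight through without branching on whether it was set.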
99-103: LGTM: Enhanced flexibility with optional parameter and clear documentation. The `http_port` setter now accepts `Option<u16>` with proper default handling and clear documentation. The `unwrap_or(DEFAULT_HTTP_PORT)` call correctly resets to the default when `None` is passed, providing flexible configuration options.
lib/llm/src/entrypoint/input.rs (4)
47-53: Good addition of the `FromStr` trait implementation. This idiomatic implementation enables more ergonomic parsing of `Input` values using `.parse()` while reusing the existing `TryFrom<&str>` logic.
100-110: Well-designed runtime abstraction using `Either`. The change elegantly handles both runtime types while maintaining a clear contract: `Input::Endpoint` requires a `DistributedRuntime`, while other inputs can use either runtime type. The runtime extraction logic correctly handles both cases.
113-125: Clean refactoring to eliminate redundant cloning. Good improvement: the runtime is now cloned once during extraction rather than multiple times for each handler.
127-129: Clear enforcement of the `DistributedRuntime` requirement for endpoints. The pattern matching with `let-else` is idiomatic, and the error message clearly communicates the requirement to users.
examples/cli/cli.py (2)
19-72: Well-structured argument parser with clear documentation. Good use of type hints and the `ArgumentDefaultsHelpFormatter` to provide a user-friendly CLI interface. The default model path provides a reasonable starting point for users.
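A minimal sketch of this parser style follows. The flag names are hypothetical and may not match `examples/cli/cli.py`. The sketch also builds the keyword arguments by dropping every option left at `None`, which is one way to treat `model_path` consistently with the other optional arguments, the question raised in the next comment.

```python
import argparse
from typing import Any, Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    # ArgumentDefaultsHelpFormatter appends "(default: ...)" to each help string.
    parser = argparse.ArgumentParser(
        description="dynamo-run style CLI (sketch)",
        formatter_class=argparse.ArgumentDefaultsHelpFormatter,
    )
    # Hypothetical flags; the real example's options may differ.
    parser.add_argument("--output", default="echo", help="Engine to run")
    parser.add_argument("--model-path", default=None, help="Model directory or file")
    parser.add_argument("--context-length", type=int, default=None, help="Override context length")
    return parser


def entrypoint_kwargs(ns: argparse.Namespace) -> Dict[str, Any]:
    # Forward only options the user actually set, so the Rust side keeps its own defaults.
    candidates = {"model_path": ns.model_path, "context_length": ns.context_length}
    return {key: value for key, value in candidates.items() if value is not None}


def parse(argv: Optional[List[str]] = None) -> Dict[str, Any]:
    # argv=None makes argparse read sys.argv[1:].
    return entrypoint_kwargs(build_parser().parse_args(argv))
```

For example, `parse(["--model-path", "/models/qwen3"])` yields `{"model_path": "/models/qwen3"}`, while `parse([])` yields an empty dict, so an unset model path is never passed as `None`.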
96-108: Verify the handling of optional model_path. The `model_path` is always included in `entrypoint_kwargs` even when it might be `None`, while other optional arguments are conditionally included. Is this intentional behavior, or should `model_path` also be conditionally included?
lib/bindings/python/rust/llm/entrypoint.rs (2)
78-102: Clean implementation of the async engine creation. Good use of the builder pattern and proper async/await handling with PyO3. The error propagation is consistent throughout.
118-151: Excellent handling of optional features with clear error messages. The conditional compilation is well-implemented, and the error messages are particularly helpful by providing specific rebuild instructions when features are not enabled.
Example dynamo-run style Python CLI in `examples/cli/cli.py`. It is slower than the pure-Rust binary; we need to find out why. Using mistralrs or llamacpp requires extra steps; see the README.
HTTP server with mistralrs running Qwen3:
Full dynamo-run style Python CLI in `examples/cli/cli.py`.