feat: add register_model function for non-llms #4686
base: main
Conversation
Walkthrough
The changes introduce a new `register_model` function.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/bindings/python/rust/lib.rs (1)
360-428: `register_model` binding matches stub and intended behavior

The new `register_model`:
- Mirrors the async pattern used by `register_llm` (Tokio `future_into_py`; returns an awaitable that resolves to `None`).
- Applies sensible defaults (`TensorBased` + `Tensor`) consistent with the Python stub and docstring.
- Builds a minimal `ModelDeploymentCard` via `with_name_only`, fills `model_type`, `model_input`, `user_data`, and optionally `runtime_config`, then registers via `DiscoverySpec::from_model`, matching the "no downloads" contract.

One optional hardening you might consider (non-blocking):
- If this API is meant strictly for tensor-only deployments, you could defensively reject `model_type` values that do not support tensors (or at least log a warning) to catch accidental misuse where someone passes a pure Chat/Completions type but expects HF-style behavior.

Otherwise, this binding looks solid and cohesive with the rest of the module.
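The defaulting behavior described above (fall back to `TensorBased` + `Tensor` when the caller passes nothing) can be sketched in Python. The enum members below are illustrative assumptions based on the stub names, not the actual `dynamo._core` definitions:

```python
from enum import Enum
from typing import Optional, Tuple

class ModelType(Enum):
    CHAT = "chat"
    COMPLETIONS = "completions"
    TENSOR_BASED = "tensor_based"

class ModelInput(Enum):
    TEXT = "text"
    TOKENS = "tokens"
    TENSOR = "tensor"

def resolve_defaults(
    model_type: Optional[ModelType] = None,
    model_input: Optional[ModelInput] = None,
) -> Tuple[ModelType, ModelInput]:
    """Mirror the binding's behavior: unset values fall back to
    TensorBased + Tensor, per the stub and docstring."""
    return (
        model_type if model_type is not None else ModelType.TENSOR_BASED,
        model_input if model_input is not None else ModelInput.TENSOR,
    )
```

Explicitly passed values always win; only `None` triggers the tensor defaults.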
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `lib/bindings/python/rust/lib.rs` (2 hunks)
- `lib/bindings/python/src/dynamo/_core.pyi` (1 hunks)
- `lib/bindings/python/src/dynamo/llm/__init__.py` (1 hunks)
- `lib/llm/src/model_card.rs` (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.
Applied to files:
lib/bindings/python/rust/lib.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/bindings/python/rust/lib.rs
lib/llm/src/model_card.rs
🧬 Code graph analysis (4)
lib/bindings/python/src/dynamo/llm/__init__.py (2)
- lib/bindings/python/rust/lib.rs (2): `_core` (127-206), `register_model` (374-428)
- lib/bindings/python/src/dynamo/_core.pyi (1): `register_model` (1074-1099)
lib/bindings/python/rust/lib.rs (5)
- lib/bindings/python/rust/prometheus_metrics.rs (9): `m` (1004-1004), `m` (1005-1005), `m` (1006-1006), `m` (1007-1007), `m` (1008-1008), `m` (1009-1009), `m` (1010-1010), `m` (1011-1011), `m` (1012-1012)
- lib/bindings/python/src/dynamo/_core.pyi (7): `register_model` (1074-1099), `endpoint` (104-108), `Endpoint` (120-161), `ModelType` (1007-1014), `ModelInput` (1003-1005), `ModelRuntimeConfig` (426-447), `ModelDeploymentCard` (419-424)
- lib/llm/src/model_card.rs (3): `model_type` (570-570), `model_type` (753-755), `with_name_only` (243-249)
- lib/bindings/python/rust/llm/kv.rs (16): `py` (68-68), `py` (1270-1270), `new` (45-51), `new` (120-132), `new` (143-154), `new` (172-195), `new` (242-269), `new` (390-418), `new` (647-683), `new` (737-773), `new` (825-885), `new` (955-965), `new` (972-984), `new` (991-1003), `new` (1010-1024), `new` (1098-1135)
- lib/runtime/src/discovery/mod.rs (1): `from_model` (88-104)
lib/llm/src/model_card.rs (1)
- lib/llm/src/local_model.rs (1): `display_name` (360-362)
lib/bindings/python/src/dynamo/_core.pyi (2)
- lib/bindings/python/rust/lib.rs (2): `register_model` (374-428), `endpoint` (650-656)
- lib/llm/src/local_model.rs (4): `model_name` (97-100), `user_data` (178-181), `runtime_config` (173-176), `runtime_config` (394-396)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (4)
lib/bindings/python/src/dynamo/llm/__init__.py (1)
43-43: Expose `register_model` in `llm` namespace – looks correct

Importing `register_model` from `dynamo._core` is consistent with how `register_llm` and other bindings are surfaced; no issues spotted here.

lib/llm/src/model_card.rs (1)
381-395: Confirm breadth of `supports_tensor()` guard in `download_config`

The new early-return for `self.model_type.supports_tensor()` means any tensor-capable model will now skip HuggingFace config/tokenizer downloads, even if `model_type` might be a bitflag combination (e.g., including chat/completions capabilities alongside tensor-based). If such mixed modes are allowed, those paths might still rely on HF artifacts and could be affected by this change.

Can you confirm that `supports_tensor()` is only true for cases where skipping all config/tokenizer downloads is always safe (i.e., pure TensorBased deployments), or otherwise narrow this check (e.g., to a specific variant) if mixed modes exist?

lib/bindings/python/rust/lib.rs (1)
140-148: Module registration for `register_model` aligns with existing bindings

Adding `wrap_pyfunction!(register_model, m)?` next to `register_llm` keeps the Python surface consistent; no concerns here.

lib/bindings/python/src/dynamo/_core.pyi (1)

1074-1099: `register_model` stub correctly mirrors the Rust binding

The async signature and parameter ordering here match the PyO3 definition, and the docstring (TensorBased/Tensor defaults, no HuggingFace downloads) aligns with the Rust implementation. This should type-check cleanly for callers.
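The mixed-mode concern around `supports_tensor()` above can be illustrated with a toy Python model. The `IntFlag` layout is an assumption for illustration only; the real Rust `ModelType` may be structured differently:

```python
from enum import IntFlag, auto

class ModelType(IntFlag):
    CHAT = auto()
    COMPLETIONS = auto()
    TENSOR_BASED = auto()

    def supports_tensor(self) -> bool:
        # True whenever the tensor bit is set, even in combinations.
        return bool(self & ModelType.TENSOR_BASED)

def needs_hf_download(model_type: ModelType) -> bool:
    # Mirrors the early-return guard described in the review: any
    # tensor-capable type skips HuggingFace downloads, including mixed
    # flags such as CHAT | TENSOR_BASED that might still need HF artifacts.
    return not model_type.supports_tensor()
```

Under this toy model, `CHAT | TENSOR_BASED` also skips downloads, which is exactly the mixed-mode case the reviewer asks to confirm is safe.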
There is already support for tensor based models via register_llm, but maybe it would take some tweaking to skip tokenizer bits when given a tensor based model. Not sure if a whole new register_model function is needed, or if register_llm should just be renamed to register_model with some kind of LLM/tokenizer/HF related flag as an argument?
In general, the Python bindings are our public-facing APIs and will be more sticky once released.
Yeah, skipping all of the HuggingFace and config file download stuff is the main motivation here. Right now, to deploy a tensor-based model with register_llm you still have to pass a dummy Hugging Face model that dynamo will download and then do nothing with.
No strong opinion on supporting this via a new function vs. renaming the old one and adding an argument
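One purely hypothetical way the "rename plus flag" alternative could behave is a switch that drops the HF/tokenizer steps from the registration sequence. The function and step names below are invented for illustration and do not exist in dynamo:

```python
from typing import List

def registration_plan(skip_hf_artifacts: bool) -> List[str]:
    """Hypothetical sequence of steps a unified register_model might run;
    the flag removes the HuggingFace-specific work up front."""
    steps = ["build_model_card", "publish_discovery_entry"]
    if not skip_hf_artifacts:
        # Tokenizer-backed (LLM) registrations keep the HF download path.
        steps = ["download_hf_config", "load_tokenizer"] + steps
    return steps
```

With the flag set, the plan reduces to the card-plus-discovery work that the new `register_model` performs, and no dummy HF model is needed.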
Modified some of the test files to illustrate the change here
This commit introduces the `register_model` function, allowing users to register non-llm model endpoints without requiring local files or downloads from HuggingFace. The function is designed specifically for TensorBased models, where the frontend doesn't do any pre-processing. Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Force-pushed from 2c86069 to 1700a36
Overview:
This commit introduces the `register_model` function, allowing users to register non-LLM model endpoints without requiring local files or downloads from HuggingFace. The function is designed specifically for TensorBased models, where the frontend doesn't do any pre-processing.

Details:
- Updated the `download_config` MDC function so that TensorBased models don't download anything

Where should the reviewer start?
- `lib/bindings/python/rust/lib.rs`

Related Issues:
Summary by CodeRabbit
- Added a `register_model` function to register models to endpoints, with support for optional model type, input format, user data, and runtime configuration parameters.