feat: add register_model function for non-llms #4686
base: main
Conversation
Walkthrough
The changes introduce a new `register_model` function.
Estimated code review effort: 🎯 2 (Simple) | ⏱️ ~10 minutes
Pre-merge checks: ✅ Passed checks (3 passed)
Actionable comments posted: 0
🧹 Nitpick comments (1)
lib/bindings/python/rust/lib.rs (1)
360-428: `register_model` binding matches stub and intended behavior

The new `register_model`:
- Mirrors the async pattern used by `register_llm` (Tokio `future_into_py`; returns an awaitable that resolves to `None`).
- Applies sensible defaults (`TensorBased` + `Tensor`) consistent with the Python stub and docstring.
- Builds a minimal `ModelDeploymentCard` via `with_name_only`, fills `model_type`, `model_input`, `user_data`, and optionally `runtime_config`, then registers via `DiscoverySpec::from_model`, matching the "no downloads" contract.

One optional hardening you might consider (non-blocking):
- If this API is meant strictly for tensor-only deployments, you could defensively reject `model_type` values that do not support tensors (or at least log a warning) to catch accidental misuse where someone passes a pure Chat/Completions type but expects HF-style behavior.

Otherwise, this binding looks solid and cohesive with the rest of the module.
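The defaulting behavior described above (fall back to `TensorBased` + `Tensor` when the caller passes nothing) can be sketched in Python. The enum members below are illustrative assumptions based on the stub names, not the actual `dynamo._core` definitions:

```python
from enum import Enum
from typing import Optional, Tuple

class ModelType(Enum):
    CHAT = "chat"
    COMPLETIONS = "completions"
    TENSOR_BASED = "tensor_based"

class ModelInput(Enum):
    TEXT = "text"
    TOKENS = "tokens"
    TENSOR = "tensor"

def resolve_defaults(
    model_type: Optional[ModelType] = None,
    model_input: Optional[ModelInput] = None,
) -> Tuple[ModelType, ModelInput]:
    """Mirror the binding's behavior: unset values fall back to
    TensorBased + Tensor, per the stub and docstring."""
    return (
        model_type if model_type is not None else ModelType.TENSOR_BASED,
        model_input if model_input is not None else ModelInput.TENSOR,
    )
```

Explicitly passed values always win; only `None` triggers the tensor defaults.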
📜 Review details
Configuration used: Path: .coderabbit.yaml
Review profile: CHILL
Plan: Pro
📒 Files selected for processing (4)
- `lib/bindings/python/rust/lib.rs` (2 hunks)
- `lib/bindings/python/src/dynamo/_core.pyi` (1 hunks)
- `lib/bindings/python/src/dynamo/llm/__init__.py` (1 hunks)
- `lib/llm/src/model_card.rs` (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.
Applied to files:
lib/bindings/python/rust/lib.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.
Applied to files:
lib/bindings/python/rust/lib.rs
lib/llm/src/model_card.rs
🧬 Code graph analysis (4)
lib/bindings/python/src/dynamo/llm/__init__.py (2)
- lib/bindings/python/rust/lib.rs (2): `_core` (127-206), `register_model` (374-428)
- lib/bindings/python/src/dynamo/_core.pyi (1): `register_model` (1074-1099)
lib/bindings/python/rust/lib.rs (5)
- lib/bindings/python/rust/prometheus_metrics.rs (9): `m` (1004-1004), `m` (1005-1005), `m` (1006-1006), `m` (1007-1007), `m` (1008-1008), `m` (1009-1009), `m` (1010-1010), `m` (1011-1011), `m` (1012-1012)
- lib/bindings/python/src/dynamo/_core.pyi (7): `register_model` (1074-1099), `endpoint` (104-108), `Endpoint` (120-161), `ModelType` (1007-1014), `ModelInput` (1003-1005), `ModelRuntimeConfig` (426-447), `ModelDeploymentCard` (419-424)
- lib/llm/src/model_card.rs (3): `model_type` (570-570), `model_type` (753-755), `with_name_only` (243-249)
- lib/bindings/python/rust/llm/kv.rs (16): `py` (68-68), `py` (1270-1270), `new` (45-51), `new` (120-132), `new` (143-154), `new` (172-195), `new` (242-269), `new` (390-418), `new` (647-683), `new` (737-773), `new` (825-885), `new` (955-965), `new` (972-984), `new` (991-1003), `new` (1010-1024), `new` (1098-1135)
- lib/runtime/src/discovery/mod.rs (1): `from_model` (88-104)
lib/llm/src/model_card.rs (1)
- lib/llm/src/local_model.rs (1): `display_name` (360-362)
lib/bindings/python/src/dynamo/_core.pyi (2)
- lib/bindings/python/rust/lib.rs (2): `register_model` (374-428), `endpoint` (650-656)
- lib/llm/src/local_model.rs (4): `model_name` (97-100), `user_data` (178-181), `runtime_config` (173-176), `runtime_config` (394-396)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
- GitHub Check: trtllm (amd64)
- GitHub Check: trtllm (arm64)
- GitHub Check: operator (arm64)
- GitHub Check: sglang (arm64)
- GitHub Check: vllm (arm64)
- GitHub Check: sglang (amd64)
- GitHub Check: operator (amd64)
- GitHub Check: vllm (amd64)
- GitHub Check: tests (launch/dynamo-run)
- GitHub Check: Build and Test - dynamo
- GitHub Check: tests (.)
- GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (4)
lib/bindings/python/src/dynamo/llm/__init__.py (1)
43-43: Expose `register_model` in `llm` namespace – looks correct

Importing `register_model` from `dynamo._core` is consistent with how `register_llm` and other bindings are surfaced; no issues spotted here.

lib/llm/src/model_card.rs (1)
381-395: Confirm breadth of `supports_tensor()` guard in `download_config`

The new early-return for `self.model_type.supports_tensor()` means any tensor-capable model will now skip HuggingFace config/tokenizer downloads, even if `model_type` might be a bitflag combination (e.g., including chat/completions capabilities alongside tensor-based). If such mixed modes are allowed, those paths might still rely on HF artifacts and could be affected by this change.

Can you confirm that `supports_tensor()` is only true for cases where skipping all config/tokenizer downloads is always safe (i.e., pure TensorBased deployments), or otherwise narrow this check (e.g., to a specific variant) if mixed modes exist?

lib/bindings/python/rust/lib.rs (1)
140-148: Module registration for `register_model` aligns with existing bindings

Adding `wrap_pyfunction!(register_model, m)?` next to `register_llm` keeps the Python surface consistent; no concerns here.

lib/bindings/python/src/dynamo/_core.pyi (1)

1074-1099: `register_model` stub correctly mirrors the Rust binding

The async signature and parameter ordering here match the PyO3 definition, and the docstring (TensorBased/Tensor defaults, no HuggingFace downloads) aligns with the Rust implementation. This should type-check cleanly for callers.
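The mixed-mode concern around `supports_tensor()` above can be illustrated with a toy Python model. The `IntFlag` layout is an assumption for illustration only; the real Rust `ModelType` may be structured differently:

```python
from enum import IntFlag, auto

class ModelType(IntFlag):
    CHAT = auto()
    COMPLETIONS = auto()
    TENSOR_BASED = auto()

    def supports_tensor(self) -> bool:
        # True whenever the tensor bit is set, even in combinations.
        return bool(self & ModelType.TENSOR_BASED)

def needs_hf_download(model_type: ModelType) -> bool:
    # Mirrors the early-return guard described in the review: any
    # tensor-capable type skips HuggingFace downloads, including mixed
    # flags such as CHAT | TENSOR_BASED that might still need HF artifacts.
    return not model_type.supports_tensor()
```

Under this toy model, `CHAT | TENSOR_BASED` also skips downloads, which is exactly the mixed-mode case the reviewer asks to confirm is safe.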
There is already support for tensor based models via register_llm, but maybe it would take some tweaking to skip tokenizer bits when given a tensor based model. Not sure if a whole new register_model function is needed, or if register_llm should just be renamed to register_model with some kind of LLM/tokenizer/HF related flag as an argument?
In general, the Python bindings are our public-facing APIs and will be more sticky once released.
Yeah, skipping all of the HuggingFace and config file download stuff is the main motivation here. Right now, to deploy a tensor-based model with register_llm you still have to pass a dummy Hugging Face model that dynamo will download and then do nothing with.
No strong opinion on supporting this via a new function vs. renaming the old one and adding an argument
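One purely hypothetical way the "rename plus flag" alternative could behave is a switch that drops the HF/tokenizer steps from the registration sequence. The function and step names below are invented for illustration and do not exist in dynamo:

```python
from typing import List

def registration_plan(skip_hf_artifacts: bool) -> List[str]:
    """Hypothetical sequence of steps a unified register_model might run;
    the flag removes the HuggingFace-specific work up front."""
    steps = ["build_model_card", "publish_discovery_entry"]
    if not skip_hf_artifacts:
        # Tokenizer-backed (LLM) registrations keep the HF download path.
        steps = ["download_hf_config", "load_tokenizer"] + steps
    return steps
```

With the flag set, the plan reduces to the card-plus-discovery work that the new `register_model` performs, and no dummy HF model is needed.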
Modified some of the test files to illustrate the change here
This commit introduces the `register_model` function, allowing users to register non-llm model endpoints without requiring local files or downloads from HuggingFace. The function is designed specifically for TensorBased models, where the frontend doesn't do any pre-processing. Signed-off-by: Neal Vaidya <nealv@nvidia.com>
Force-pushed from 2c86069 to 1700a36
Overview:
This commit introduces the `register_model` function, allowing users to register non-LLM model endpoints without requiring local files or downloads from HuggingFace. The function is designed specifically for TensorBased models, where the frontend doesn't do any pre-processing.

Details:
- Updated the `download_config` MDC function so that TensorBased models don't download anything

Where should the reviewer start?
- `lib/bindings/python/rust/lib.rs`

Related Issues:
Summary by CodeRabbit
- Added a `register_model` function to register models to endpoints, with support for optional model type, input format, user data, and runtime configuration parameters.