Contributor

@nealvaidya nealvaidya commented Dec 2, 2025

Overview:

This commit introduces the register_model function, which lets users register non-LLM model endpoints without requiring local files or downloads from HuggingFace. The function is designed specifically for TensorBased models, where the frontend does no pre-processing.

Details:

  • Adds register_model function and bindings
  • Modifies the download_config MDC function so that TensorBased models don't download anything
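
A minimal sketch of how the call described above might look from Python. The parameter names (endpoint, model_name, model_type, model_input, user_data, runtime_config) follow the summary in this PR. Everything else is hypothetical so that the snippet runs without dynamo installed: the `FakeEndpoint` class, the body of the stand-in `register_model`, and the string values used for the model type and input are illustrative assumptions, not the real binding.

```python
import asyncio
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class FakeEndpoint:
    """Hypothetical stand-in for a dynamo endpoint; it only records registrations."""
    path: str
    registered: list = field(default_factory=list)


async def register_model(
    endpoint: FakeEndpoint,
    model_name: str,
    model_type: Optional[str] = None,
    model_input: Optional[str] = None,
    user_data: Optional[dict] = None,
    runtime_config: Optional[Any] = None,
) -> None:
    """Stand-in that mirrors the parameter list described in this PR.

    It builds a name-only card and records it on the endpoint.
    No local files are read and nothing is downloaded.
    """
    card = {
        "name": model_name,
        # Defaults assumed from the review text: TensorBased type, Tensor input.
        "model_type": model_type or "TensorBased",
        "model_input": model_input or "Tensor",
        "user_data": user_data,
        "runtime_config": runtime_config,
    }
    endpoint.registered.append(card)


async def main() -> dict:
    # The endpoint path is an arbitrary illustrative value.
    endpoint = FakeEndpoint("namespace.component.generate")
    await register_model(endpoint, "my-tensor-model", user_data={"version": 1})
    return endpoint.registered[0]


card = asyncio.run(main())
print(card["model_type"], card["model_input"])  # prints "TensorBased Tensor"
```

The point of the sketch is the shape of the call: only the endpoint and a model name are required, and no model path or HuggingFace repository appears anywhere.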

Where should the reviewer start?

  • lib/bindings/python/rust/lib.rs

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features
    • Added a new register_model function to register models to endpoints with support for optional model type, input format, user data, and runtime configuration parameters.

@nealvaidya nealvaidya requested review from a team as code owners December 2, 2025 01:27
@github-actions github-actions bot added the feat label Dec 2, 2025
Contributor

coderabbitai bot commented Dec 2, 2025

Walkthrough

The changes introduce a new register_model function across the Rust FFI layer, Python type stubs, and module exports, enabling model registration without local files or HuggingFace downloads. Additionally, TensorBased models are optimized to skip downloading configuration files during card initialization.

Changes

| Cohort / File(s) | Summary |
|---|---|
| Rust FFI Layer: lib/bindings/python/rust/lib.rs | Introduces new PyO3-exposed register_model function accepting endpoint, model_name, optional model_type, model_input, user_data, and runtime_config parameters. Constructs minimal ModelDeploymentCard and registers it asynchronously through endpoint's discovery system. |
| Python Type Definitions: lib/bindings/python/src/dynamo/_core.pyi | Adds public async function signature for register_model with endpoint, model_name, and optional parameters (model_type, model_input, user_data, runtime_config). Includes descriptive docstring documenting registration without local files. |
| Python Module Exports: lib/bindings/python/src/dynamo/llm/__init__.py | Imports and exposes register_model from dynamo._core in module namespace. |
| Model Deployment Optimization: lib/llm/src/model_card.rs | Adds early-return branch in ModelDeploymentCard::download_config to skip configuration file downloads for TensorBased models, with debug logging. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

  • Straightforward API introduction across coordinated layers
  • Minimal logic changes; primarily declarations and a simple early-return branch
  • Consistent pattern reduces per-file reasoning overhead
  • No complex state mutations or intricate control flow

Poem

🐰 A new register_model hops into sight,
Through Rust and Python, bindings held tight,
No config downloads for TensorBased friends,
Model registration that never descends! 🌱

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly and concisely summarizes the main change: adding a register_model function for non-LLM models, which aligns with all file modifications. |
| Description check | ✅ Passed | The description covers all required sections from the template: Overview, Details, Where should the reviewer start, and Related Issues with proper action keyword. |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |

Contributor

@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 0

🧹 Nitpick comments (1)
lib/bindings/python/rust/lib.rs (1)

360-428: register_model binding matches stub and intended behavior

The new register_model:

  • Mirrors the async pattern used by register_llm (Tokio future_into_py, returns an awaitable that resolves to None).
  • Applies sensible defaults (TensorBased + Tensor) consistent with the Python stub and docstring.
  • Builds a minimal ModelDeploymentCard via with_name_only, fills model_type, model_input, user_data, and optionally runtime_config, then registers via DiscoverySpec::from_model, matching the “no downloads” contract.

One optional hardening you might consider (non-blocking):

  • If this API is meant strictly for tensor-only deployments, you could defensively reject model_type values that do not support tensors (or at least log a warning) to catch accidental misuse where someone passes a pure Chat/Completions type but expects HF-style behavior.
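
The optional hardening above could be sketched roughly as follows. This is an assumption-laden illustration, not the binding's code: the `ModelType` flag set here is a hypothetical Python stand-in for the real type, and `check_tensor_model_type` is an invented helper name. It shows both options the comment mentions, warning by default and rejecting in a strict mode.

```python
import enum
import warnings


class ModelType(enum.Flag):
    """Hypothetical flag set used only for this sketch."""
    CHAT = enum.auto()
    COMPLETIONS = enum.auto()
    TENSOR_BASED = enum.auto()


def check_tensor_model_type(model_type: ModelType, strict: bool = False) -> bool:
    """Return True when model_type includes tensor support.

    Otherwise raise ValueError in strict mode, or emit a warning and
    return False in the default (lenient) mode.
    """
    if ModelType.TENSOR_BASED in model_type:
        return True
    message = (
        "register_model expects a tensor-capable model_type, "
        f"got {model_type!r}"
    )
    if strict:
        raise ValueError(message)
    warnings.warn(message, stacklevel=2)
    return False
```

Warning rather than raising keeps the call non-breaking for existing users, while the strict mode catches the accidental misuse the comment describes at registration time.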

Otherwise, this binding looks solid and cohesive with the rest of the module.

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5708b70 and 4917882.

📒 Files selected for processing (4)
  • lib/bindings/python/rust/lib.rs (2 hunks)
  • lib/bindings/python/src/dynamo/_core.pyi (1 hunks)
  • lib/bindings/python/src/dynamo/llm/__init__.py (1 hunks)
  • lib/llm/src/model_card.rs (1 hunks)
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-08-21T17:23:02.836Z
Learnt from: michaelfeil
Repo: ai-dynamo/dynamo PR: 2591
File: lib/bindings/python/rust/http.rs:0-0
Timestamp: 2025-08-21T17:23:02.836Z
Learning: In lib/bindings/python/rust/http.rs, the enable_endpoint method uses EndpointType::all() to dynamically support all available endpoint types with case-insensitive matching, which is more maintainable than hardcoded match statements for endpoint type mappings.

Applied to files:

  • lib/bindings/python/rust/lib.rs
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
Repo: ai-dynamo/dynamo PR: 2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/bindings/python/rust/lib.rs
  • lib/llm/src/model_card.rs
🧬 Code graph analysis (4)
lib/bindings/python/src/dynamo/llm/__init__.py (2)
lib/bindings/python/rust/lib.rs (2)
  • _core (127-206)
  • register_model (374-428)
lib/bindings/python/src/dynamo/_core.pyi (1)
  • register_model (1074-1099)
lib/bindings/python/rust/lib.rs (5)
lib/bindings/python/rust/prometheus_metrics.rs (9)
  • m (1004-1004)
  • m (1005-1005)
  • m (1006-1006)
  • m (1007-1007)
  • m (1008-1008)
  • m (1009-1009)
  • m (1010-1010)
  • m (1011-1011)
  • m (1012-1012)
lib/bindings/python/src/dynamo/_core.pyi (7)
  • register_model (1074-1099)
  • endpoint (104-108)
  • Endpoint (120-161)
  • ModelType (1007-1014)
  • ModelInput (1003-1005)
  • ModelRuntimeConfig (426-447)
  • ModelDeploymentCard (419-424)
lib/llm/src/model_card.rs (3)
  • model_type (570-570)
  • model_type (753-755)
  • with_name_only (243-249)
lib/bindings/python/rust/llm/kv.rs (16)
  • py (68-68)
  • py (1270-1270)
  • new (45-51)
  • new (120-132)
  • new (143-154)
  • new (172-195)
  • new (242-269)
  • new (390-418)
  • new (647-683)
  • new (737-773)
  • new (825-885)
  • new (955-965)
  • new (972-984)
  • new (991-1003)
  • new (1010-1024)
  • new (1098-1135)
lib/runtime/src/discovery/mod.rs (1)
  • from_model (88-104)
lib/llm/src/model_card.rs (1)
lib/llm/src/local_model.rs (1)
  • display_name (360-362)
lib/bindings/python/src/dynamo/_core.pyi (2)
lib/bindings/python/rust/lib.rs (2)
  • register_model (374-428)
  • endpoint (650-656)
lib/llm/src/local_model.rs (4)
  • model_name (97-100)
  • user_data (178-181)
  • runtime_config (173-176)
  • runtime_config (394-396)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (12)
  • GitHub Check: trtllm (amd64)
  • GitHub Check: trtllm (arm64)
  • GitHub Check: operator (arm64)
  • GitHub Check: sglang (arm64)
  • GitHub Check: vllm (arm64)
  • GitHub Check: sglang (amd64)
  • GitHub Check: operator (amd64)
  • GitHub Check: vllm (amd64)
  • GitHub Check: tests (launch/dynamo-run)
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: tests (.)
  • GitHub Check: tests (lib/bindings/python)
🔇 Additional comments (4)
lib/bindings/python/src/dynamo/llm/__init__.py (1)

43-43: Expose register_model in llm namespace – looks correct

Importing register_model from dynamo._core is consistent with how register_llm and other bindings are surfaced; no issues spotted here.

lib/llm/src/model_card.rs (1)

381-395: Confirm breadth of supports_tensor() guard in download_config

The new early-return for self.model_type.supports_tensor() means any tensor-capable model will now skip HuggingFace config/tokenizer downloads, even if model_type might be a bitflag combination (e.g., including chat/completions capabilities alongside tensor-based). If such mixed modes are allowed, those paths might still rely on HF artifacts and could be affected by this change.

Can you confirm that supports_tensor() is only true for cases where skipping all config/tokenizer downloads is always safe (i.e., pure TensorBased deployments), or otherwise narrow this check (e.g., to a specific variant) if mixed modes exist?
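
To make the concern concrete, here is a small sketch of the difference between a broad "contains the tensor flag" check and a narrow "is exactly tensor-based" check. It assumes, as the comment does, that model_type may be a bitflag combination. The `ModelType` flags and all three helper functions are hypothetical Python stand-ins, not the Rust implementation.

```python
import enum


class ModelType(enum.Flag):
    """Hypothetical flag set used only for this sketch."""
    CHAT = enum.auto()
    COMPLETIONS = enum.auto()
    TENSOR_BASED = enum.auto()


def supports_tensor(model_type: ModelType) -> bool:
    """Broad guard: true whenever the tensor flag is set, even in a mix."""
    return bool(model_type & ModelType.TENSOR_BASED)


def is_tensor_only(model_type: ModelType) -> bool:
    """Narrow guard: true only for a pure tensor-based model."""
    return model_type == ModelType.TENSOR_BASED


def should_skip_download(model_type: ModelType, narrow: bool) -> bool:
    """Decide whether config/tokenizer downloads are skipped."""
    return is_tensor_only(model_type) if narrow else supports_tensor(model_type)


mixed = ModelType.CHAT | ModelType.TENSOR_BASED
print(should_skip_download(mixed, narrow=False))  # prints "True"
print(should_skip_download(mixed, narrow=True))   # prints "False"
```

Under these assumptions, a mixed chat-plus-tensor model skips its downloads with the broad guard but not with the narrow one, which is exactly the case the question asks the author to rule out.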

lib/bindings/python/rust/lib.rs (1)

140-148: Module registration for register_model aligns with existing bindings

Adding wrap_pyfunction!(register_model, m)? next to register_llm keeps the Python surface consistent; no concerns here.

lib/bindings/python/src/dynamo/_core.pyi (1)

1074-1099: register_model stub correctly mirrors the Rust binding

The async signature and parameter ordering here match the PyO3 definition, and the docstring (TensorBased/Tensor defaults, no HuggingFace downloads) aligns with the Rust implementation. This should type-check cleanly for callers.

Contributor

There is already support for tensor based models via register_llm, but maybe it would take some tweaking to skip tokenizer bits when given a tensor based model. Not sure if a whole new register_model function is needed, or if register_llm should just be renamed to register_model with some kind of LLM/tokenizer/HF related flag as an argument?

In general the python bindings are like our public facing APIs and will be more sticky once released.

CC @GuanLuo @grahamking

Contributor Author

Yeah, skipping all of the HuggingFace and config file download stuff is the main motivation here. Right now, to deploy a tensor-based model with register_llm you still have to pass a dummy HuggingFace model that dynamo will download and then do nothing with.

No strong opinion on supporting this via a new function vs. renaming the old one and adding an argument.

Contributor Author

Modified some of the test files to illustrate the change here

This commit introduces the `register_model` function, allowing users to register non-llm model endpoints without requiring local files or downloads from HuggingFace. The function is designed specifically for TensorBased models, where the frontend doesn't do any pre-processing.

Signed-off-by: Neal Vaidya <nealv@nvidia.com>