feat: add RuntimeConfig to ModelEntry #2311

jorgeantonio21 · 2025-08-05T19:25:13Z

Overview:

This PR implements runtime configuration registration for vLLM and SGLang backends, adding a middle layer between static configs and dynamic metrics. Runtime configs are computed during engine initialization but remain constant during the session, providing accurate resource information for better routing and resource management.

Details:

New Features:

- Added a ModelRuntimeConfig structure to store runtime-computed values such as total_kv_blocks, max_num_seqs, and gpu_memory_utilization.
- ModelRuntimeConfig is added as a new field on ModelEntry, so it is registered to etcd by LocalModel.attach(), a shared routine.
- The KvRouter gets a new prefix watcher on models/ that listens for the runtime configs of all workers in a push-based manner and updates its view of each worker's total load capacity. (Note that this is not yet used for request rejection; that will be implemented in a future PR.)
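As a rough illustration of the shape described above, the following Python sketch mirrors the three named fields and nests them in a simplified entry. The real structure lives in Rust; the `ModelEntry` fields other than the runtime config, the `to_json` helper, and all example values are assumptions made for this sketch only.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ModelRuntimeConfig:
    """Hypothetical Python mirror of the runtime-computed values named in this PR."""

    total_kv_blocks: Optional[int] = None
    max_num_seqs: Optional[int] = None
    gpu_memory_utilization: Optional[float] = None


@dataclass
class ModelEntry:
    """Simplified stand-in: only `runtime_config` is taken from the PR text."""

    name: str  # assumed field, for illustration only
    runtime_config: Optional[ModelRuntimeConfig] = None

    def to_json(self) -> str:
        # asdict() recurses into the nested dataclass, so the runtime config
        # travels with the entry in a single serialized payload.
        return json.dumps(asdict(self), sort_keys=True)


entry = ModelEntry(
    name="example-model",
    runtime_config=ModelRuntimeConfig(
        total_kv_blocks=4096, max_num_seqs=256, gpu_memory_utilization=0.9
    ),
)
payload = entry.to_json()
print(payload)
```

Keeping the runtime config as an optional nested field means an entry registered before the engine reports its values can still be serialized.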

vLLM Implementation:

- Extracts the actual number of KV blocks from engine_client.engine.cache_config.num_gpu_blocks
- Gets the maximum number of sequences from engine_client.vllm_config.scheduler_config.max_num_seqs
- Reads GPU memory utilization from engine_client.engine.cache_config.gpu_memory_utilization
- Registers the runtime config after engine initialization, only for rank 0 workers
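The attribute paths listed above can be exercised without vLLM installed. This is a minimal sketch, not the code in main.py: `extract_runtime_config` is a hypothetical helper, and `fake_client` is a `SimpleNamespace` stand-in that only imitates the attribute layout named in the list.

```python
from types import SimpleNamespace
from typing import Any, Dict, Optional


def extract_runtime_config(engine_client: Any, rank: int) -> Optional[Dict[str, Any]]:
    """Collect runtime values from an engine-like object (hypothetical helper)."""
    if rank != 0:
        # Only rank 0 registers, so other ranks produce nothing.
        return None
    cache_config = engine_client.engine.cache_config
    scheduler_config = engine_client.vllm_config.scheduler_config
    return {
        "total_kv_blocks": cache_config.num_gpu_blocks,
        "max_num_seqs": scheduler_config.max_num_seqs,
        "gpu_memory_utilization": cache_config.gpu_memory_utilization,
    }


# Stand-in object with made-up values; a real engine client is not constructed here.
fake_client = SimpleNamespace(
    engine=SimpleNamespace(
        cache_config=SimpleNamespace(num_gpu_blocks=2048, gpu_memory_utilization=0.9)
    ),
    vllm_config=SimpleNamespace(
        scheduler_config=SimpleNamespace(max_num_seqs=128)
    ),
)

print(extract_runtime_config(fake_client, rank=0))
print(extract_runtime_config(fake_client, rank=1))
```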

SGLang Implementation:

- Uses engine.get_server_info() to get actual computed values from SGLang engine
- Implements retry logic with exponential backoff for engine readiness
- Runs runtime config registration asynchronously without blocking main initialization
- Handles both aggregated and disaggregated worker modes
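The retry and non-blocking behaviour described above might look roughly like the sketch below. Everything here is an assumption for illustration: `fetch_with_backoff`, `FakeEngine`, the `RuntimeError` readiness signal, the async form of `get_server_info`, the delays, and the `example_limit` key are not taken from the PR's code.

```python
import asyncio
from typing import Any, Awaitable, Callable, Dict


async def fetch_with_backoff(
    fetch: Callable[[], Awaitable[Dict[str, Any]]],
    max_attempts: int = 5,
    base_delay: float = 0.01,
) -> Dict[str, Any]:
    """Retry `fetch`, doubling the wait after each failed attempt."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return await fetch()
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff
    raise RuntimeError("unreachable: max_attempts must be >= 1")


class FakeEngine:
    """Stand-in engine that is 'not ready' for the first few calls."""

    def __init__(self, fail_times: int) -> None:
        self.fail_times = fail_times
        self.calls = 0

    async def get_server_info(self) -> Dict[str, Any]:
        self.calls += 1
        if self.calls <= self.fail_times:
            raise RuntimeError("engine not ready")
        return {"example_limit": 64}  # made-up payload


async def main():
    engine = FakeEngine(fail_times=2)
    # Schedule registration in the background so startup is not blocked.
    task = asyncio.create_task(fetch_with_backoff(engine.get_server_info))
    # ... the rest of worker initialization would continue here ...
    info = await task
    return info, engine.calls


result = asyncio.run(main())
print(result)
```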

Mocker Implementation:

- total_kv_blocks and max_num_seqs are passed in as extra_engine_args for the mocker engine, so they override the runtime config when LocalModelBuilder.build() runs.
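The override rule for the mocker can be pictured as a small merge step. This sketch assumes plain dictionaries on both sides; `apply_mocker_overrides` and the `unrelated_arg` key are hypothetical names, and the actual logic sits in the Rust builder.

```python
from typing import Any, Dict

# Keys the PR text says the mocker passes through extra_engine_args.
OVERRIDE_KEYS = ("total_kv_blocks", "max_num_seqs")


def apply_mocker_overrides(
    runtime_config: Dict[str, Any], extra_engine_args: Dict[str, Any]
) -> Dict[str, Any]:
    """Return a copy of runtime_config with mocker-supplied values applied."""
    merged = dict(runtime_config)
    for key in OVERRIDE_KEYS:
        value = extra_engine_args.get(key)
        if value is not None:
            merged[key] = value  # the mocker argument wins
    return merged


base = {"total_kv_blocks": None, "max_num_seqs": 8}
args = {"total_kv_blocks": 1024, "unrelated_arg": 2.0}  # made-up values
print(apply_mocker_overrides(base, args))
```

Arguments outside the two named keys are ignored here, so unrelated mocker settings do not leak into the runtime config.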

Where should the reviewer start?

Core Implementation:
- lib/llm/src/local_model/runtime_config.rs - New runtime config structure
- lib/llm/src/local_model.rs - register the runtime config as part of the ModelEntry on attach()
- lib/llm/src/kv_router/scheduler.rs - handles both instance and model watchers
vLLM Integration:
- components/backends/vllm/src/dynamo/vllm/main.py
SGLang Integration:
- components/backends/sglang/src/dynamo/sglang/worker/main.py

Related Issues:

closes GitHub issue: [FEATURE]: register runtime initialized parameters as metadata #2058

Summary by CodeRabbit

New Features
- Introduced support for registering and managing model runtime configuration, including parameters such as total key-value cache blocks, maximum concurrent sequences, and GPU memory utilization.
- Added the ability to view and update runtime configuration details for deployed models through the Python and Rust APIs.
- Enhanced model deployment cards to display runtime metrics and allow runtime configuration updates during the worker session.
Bug Fixes
- Ensured runtime configuration fields are initialized properly during model deployment card creation.

copy-pr-bot · 2025-08-05T19:25:16Z

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

github-actions · 2025-08-05T19:25:21Z

👋 Hi jorgeantonio21! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

coderabbitai · 2025-08-05T19:30:47Z

Walkthrough

This change introduces a new ModelRuntimeConfig struct and associated registration logic throughout the Rust backend, Python bindings, and engine worker code. It enables asynchronous registration and updating of runtime configuration parameters—such as memory usage and batching limits—within model deployment cards, with integration for both SGLang and vLLM backends. New Python bindings and Rust APIs are added to manage and expose these runtime settings.

Changes

| Cohort / File(s) | Change Summary |
| --- | --- |
| SGLang Backend Integration `components/backends/sglang/src/dynamo/sglang/worker/main.py` | Adds async function to gather runtime config from engine and register it with endpoint, including retry logic and integration into worker initialization. |
| vLLM Backend Integration `components/backends/vllm/src/dynamo/vllm/main.py` | Adds logic to extract runtime parameters from engine and register them with the endpoint at startup for rank 0 workers. |
| Python Bindings: Core API `lib/bindings/python/rust/lib.rs`, `lib/bindings/python/src/dynamo/_core.pyi`, `lib/bindings/python/src/dynamo/llm/__init__.py` | Exposes `ModelRuntimeConfig` class and `register_runtime_config` async function to Python, with Rust implementation for updating deployment cards via etcd. Adds relevant imports and type hints. |
| Python Bindings: Model Card Extensions `lib/bindings/python/rust/llm/model_card.rs` | Adds Python class for `ModelRuntimeConfig`, methods for engine-specific data, and extends `ModelDeploymentCard` with runtime config registration and accessors. |
| Rust Model Card: Runtime Config Support `lib/llm/src/model_card.rs`, `lib/llm/src/model_card/model.rs`, `lib/llm/src/model_card/runtime_config.rs`, `lib/llm/src/model_card/create.rs` | Introduces `ModelRuntimeConfig` struct and module, adds it as an optional field to `ModelDeploymentCard`, and implements registration, update, and getter methods. Ensures initialization in model card creation. |
| Rust LocalModel Integration `lib/llm/src/local_model.rs` | Adds runtime config to builder and model, with methods to register config with endpoint and update deployment card in etcd. |

Sequence Diagram(s)

sequenceDiagram
    participant Engine
    participant Worker
    participant Endpoint
    participant Etcd

    Worker->>Engine: Query runtime parameters
    Worker->>Worker: Build ModelRuntimeConfig
    Worker->>Endpoint: register_runtime_config(model_name, config)
    Endpoint->>Etcd: Update ModelDeploymentCard with runtime config
    Etcd-->>Endpoint: Ack
    Endpoint-->>Worker: Ack
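
The flow in the diagram, combined with the push-based prefix watcher on models/ from the PR description, can be imitated with an in-memory stand-in for etcd. This is a sketch under stated assumptions: `InMemoryStore`, `RouterCapacityView`, the key layout `models/<worker>`, and the JSON payload shape are all hypothetical and do not reflect the real etcd client or the scheduler code.

```python
import json
from typing import Callable, Dict, List, Optional, Tuple

Callback = Callable[[str, str], None]


class InMemoryStore:
    """Tiny stand-in for a key-value store with prefix watches."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}
        self._watchers: List[Tuple[str, Callback]] = []

    def watch_prefix(self, prefix: str, callback: Callback) -> None:
        self._watchers.append((prefix, callback))

    def put(self, key: str, value: str) -> None:
        self._data[key] = value
        # Push the update to every watcher whose prefix matches the key.
        for prefix, callback in self._watchers:
            if key.startswith(prefix):
                callback(key, value)


class RouterCapacityView:
    """Tracks each worker's total KV blocks as updates arrive."""

    def __init__(self) -> None:
        self.total_kv_blocks: Dict[str, Optional[int]] = {}

    def on_update(self, key: str, value: str) -> None:
        worker_id = key.rsplit("/", 1)[-1]
        config = json.loads(value).get("runtime_config") or {}
        self.total_kv_blocks[worker_id] = config.get("total_kv_blocks")


store = InMemoryStore()
router = RouterCapacityView()
store.watch_prefix("models/", router.on_update)

# A worker registers its entry; the router learns its capacity without polling.
store.put("models/worker-a", json.dumps({"runtime_config": {"total_kv_blocks": 4096}}))
# Keys outside the watched prefix do not reach the router.
store.put("instances/worker-a", "{}")
print(router.total_kv_blocks)
```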

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Possibly related issues

[FEATURE]: register runtime initialized parameters as metadata #2058: Implements the requested feature by introducing ModelRuntimeConfig, registration methods, and backend integration for runtime-initialized parameters.

Poem

In the warren of code, new configs appear,
Models declare what they need, loud and clear.
With engines and endpoints now chatting away,
Runtime secrets are shared, come what may.
Etcd keeps watch with a digital grin—
The bunnies rejoice: let the deployments begin! 🐇✨


jorgeantonio21 added 3 commits August 5, 2025 13:28

first commit

b1e6eb4

register runtime config after engine initialization

8ffe717

add sglang runtime config values retrieval

58d73d2

jorgeantonio21 requested review from GuanLuo, alec-flowers, biswapanda, grahamking, ishandhanani, jthomson04, kkranen, nnshah1, paulhendricks, piotrm-nvidia, ptarasiewiczNV, rmccorm4, ryanolson, tanmayv25, tedzhouhk and tmonty12 as code owners August 5, 2025 19:25

jorgeantonio21 requested review from a team and PeaBrane as code owners August 5, 2025 19:25

pull-request-size bot added the size/L label Aug 5, 2025

github-actions bot added external-contribution Pull request is from an external contributor feat labels Aug 5, 2025

jorgeantonio21 marked this pull request as draft August 5, 2025 19:25

merge main and resolve conflicts

dfc9154

PeaBrane changed the title ~~feat: add runtime config/metadata to etcd~~ feat: add RuntimeConfig to ModelEntry Aug 13, 2025

PeaBrane assigned jorgeantonio21 and PeaBrane Aug 13, 2025

PeaBrane linked an issue Aug 13, 2025 that may be closed by this pull request

[FEATURE]: register runtime initialized parameters as metadata #2058

Closed

PeaBrane requested a review from kthui August 13, 2025 03:28

tensorrtllm support (vibe coded)

8004bbd

pull-request-size bot added size/XL and removed size/L labels Aug 13, 2025

PeaBrane added 6 commits August 12, 2025 20:57

max_num_batched_tokens instead

ef3d419

fix sglang server_info args

6842494

direct access to server_Args

3b175cf

sglang: access total num tokens via scheduler info

10773c7

isort

69d5d80

trtllm: extract directly from config

e6de5a2

pull-request-size bot added size/L and removed size/XL labels Aug 13, 2025

trtllm: get total_kv_blocks from get_stats_async

09f1cb0

pull-request-size bot added size/XL and removed size/L labels Aug 13, 2025

PeaBrane approved these changes Aug 13, 2025


jorgeantonio21 commented Aug 13, 2025

components/backends/sglang/src/dynamo/sglang/worker/main.py (outdated, resolved)

jorgeantonio21 added 3 commits August 13, 2025 19:06

Merge branch 'feat/ja/runtime-configs-mdc' of https://github.com/jorg…

36a6fbb

…eantonio21/dynamo into feat/ja/runtime-configs-mdc

ceil division for sglang total_kv_blocks calculation

280e98a

hooks

3f8bcdd

PeaBrane merged commit d0a6363 into ai-dynamo:main Aug 14, 2025
10 checks passed

coderabbitai bot mentioned this pull request Aug 25, 2025

feat: enable --dyn-reasoning-parser flag to set reasoning parser for … #2700

Merged

hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025

feat: add RuntimeConfig to ModelEntry (#2311)

a96f631

Co-authored-by: Yan Ru Pei <yanrpei@gmail.com> Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

coderabbitai bot mentioned this pull request Sep 2, 2025

refactor: Split ModelType to ModelInput for request and response type; ModelType for the supported workloads #2714

Merged

This was referenced Sep 16, 2025

feat: Make part of discovery re-usable #3073

Merged

feat: add NIM FE (num_request_max) + runtime config metrics with periodic polling #3107

Merged
