@jorgeantonio21 (Contributor) commented Aug 5, 2025

Overview:

This PR implements runtime configuration registration for vLLM and SGLang backends, adding a middle layer between static configs and dynamic metrics. Runtime configs are computed during engine initialization but remain constant during the session, providing accurate resource information for better routing and resource management.

Details:

New Features:

  • Added ModelRuntimeConfig structure to store runtime-computed values like total_kv_blocks, max_num_seqs, and gpu_memory_utilization
  • The ModelRuntimeConfig is added as a new field to ModelEntry, so it is registered to etcd by LocalModel.attach(), a shared routine.
  • The KvRouter gains a new prefix watcher on models/ that listens, in a push-based manner, for the runtime configs of all workers and updates its view of each worker's total load capacity. (Note that this is not yet used for request rejection; that is to be implemented in a future PR.)
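
The structure and the router-side watcher described above can be sketched roughly as follows. This is a minimal Python illustration, not the PR's Rust code: the three field names come from the description, while the key layout models/<worker_id>, the JSON encoding, the nested runtime_config field, and the CapacityTracker class are assumptions made for the example.

```python
import json
from dataclasses import asdict, dataclass
from typing import Dict, Optional


@dataclass
class ModelRuntimeConfig:
    """Values computed once at engine start-up (field names from the PR text)."""
    total_kv_blocks: Optional[int] = None
    max_num_seqs: Optional[int] = None
    gpu_memory_utilization: Optional[float] = None


class CapacityTracker:
    """Hypothetical router-side view, fed by put/delete events under a prefix."""

    def __init__(self, prefix: str = "models/") -> None:
        self.prefix = prefix
        self.capacity: Dict[str, ModelRuntimeConfig] = {}

    def on_put(self, key: str, value: bytes) -> None:
        if not key.startswith(self.prefix):
            return  # not a model entry, ignore it
        worker_id = key[len(self.prefix):]
        entry = json.loads(value)
        # Assumed layout: the runtime config is a nested field of the entry.
        self.capacity[worker_id] = ModelRuntimeConfig(**(entry.get("runtime_config") or {}))

    def on_delete(self, key: str) -> None:
        if key.startswith(self.prefix):
            self.capacity.pop(key[len(self.prefix):], None)

    def total_kv_blocks(self) -> int:
        # Workers that did not report a value count as zero.
        return sum(c.total_kv_blocks or 0 for c in self.capacity.values())


tracker = CapacityTracker()
entry = {"name": "demo-model", "runtime_config": asdict(ModelRuntimeConfig(1024, 256, 0.9))}
tracker.on_put("models/worker-1", json.dumps(entry).encode())
tracker.on_put("models/worker-2", json.dumps(entry).encode())
tracker.on_delete("models/worker-2")  # worker-2 goes away
print(tracker.total_kv_blocks())
```

Because the values stay constant for the session, the tracker only has to react to put and delete events rather than poll each worker.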

vLLM Implementation:

  • Extracts actual KV blocks from engine_client.engine.cache_config.num_gpu_blocks
  • Gets max sequences from engine_client.vllm_config.scheduler_config.max_num_seqs
  • Computes GPU memory utilization from engine_client.engine.cache_config.gpu_memory_utilization
  • Registers runtime config after engine initialization, only for rank 0 workers
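
As a rough illustration of the extraction step, the sketch below reads the three values through the attribute paths listed above and skips non-zero ranks. The function name extract_runtime_config and the SimpleNamespace stand-in (with made-up numbers) are hypothetical; no real vLLM engine is involved.

```python
from types import SimpleNamespace
from typing import Optional


def extract_runtime_config(engine_client, rank: int) -> Optional[dict]:
    """Read runtime values via the attribute paths named in the PR description."""
    if rank != 0:
        return None  # only rank 0 workers register
    cache_config = engine_client.engine.cache_config
    scheduler_config = engine_client.vllm_config.scheduler_config
    return {
        "total_kv_blocks": cache_config.num_gpu_blocks,
        "max_num_seqs": scheduler_config.max_num_seqs,
        "gpu_memory_utilization": cache_config.gpu_memory_utilization,
    }


# Stand-in object with made-up numbers; a real engine client is not needed here.
fake_client = SimpleNamespace(
    engine=SimpleNamespace(
        cache_config=SimpleNamespace(num_gpu_blocks=2048, gpu_memory_utilization=0.9)
    ),
    vllm_config=SimpleNamespace(
        scheduler_config=SimpleNamespace(max_num_seqs=128)
    ),
)

print(extract_runtime_config(fake_client, rank=0))
print(extract_runtime_config(fake_client, rank=1))
```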

SGLang Implementation:

  • Uses engine.get_server_info() to get the actual computed values from the SGLang engine
  • Implements retry logic with exponential backoff for engine readiness
  • Runs runtime config registration asynchronously without blocking main initialization
  • Handles both aggregated and disaggregated worker modes
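
The retry and non-blocking behaviour listed above might look roughly like this. It is a sketch under assumptions: engine.get_server_info() is treated as a synchronous call, the FlakyEngine class and its placeholder payload are invented for the example, and the retry count and delays are arbitrary.

```python
import asyncio


async def fetch_server_info(engine, max_retries: int = 5, base_delay: float = 0.01) -> dict:
    """Call engine.get_server_info() until it succeeds, doubling the delay each time."""
    delay = base_delay
    for attempt in range(1, max_retries + 1):
        try:
            return engine.get_server_info()
        except Exception:
            if attempt == max_retries:
                raise  # give up after the last attempt
            await asyncio.sleep(delay)
            delay *= 2  # exponential backoff


class FlakyEngine:
    """Hypothetical engine that is not ready for its first two calls."""

    def __init__(self) -> None:
        self.calls = 0

    def get_server_info(self) -> dict:
        self.calls += 1
        if self.calls < 3:
            raise RuntimeError("engine not ready")
        return {"status": "ready"}  # placeholder payload, not SGLang's real schema


async def main():
    engine = FlakyEngine()
    # Start the fetch as a background task so initialization is not blocked.
    task = asyncio.create_task(fetch_server_info(engine))
    # ... other initialization work would continue here ...
    info = await task
    return engine.calls, info


calls, info = asyncio.run(main())
print(calls, info)
```

Running the fetch as a task keeps worker start-up independent of how long the engine takes to become ready.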

Mocker Implementation:

  • total_kv_blocks and max_num_seqs are passed in as extra_engine_args for the mocker engine, so they override the runtime config in LocalModelBuilder.build()
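
The override described above amounts to a merge in which mocker-supplied values win. A minimal sketch, assuming both sides are plain dictionaries; the helper name apply_overrides and the unrelated_arg key are hypothetical and only illustrate that other arguments are left alone.

```python
from typing import Any, Dict


def apply_overrides(runtime_config: Dict[str, Any], extra_engine_args: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of runtime_config with mocker-supplied values taking precedence."""
    overridable = ("total_kv_blocks", "max_num_seqs")
    merged = dict(runtime_config)
    for key in overridable:
        if extra_engine_args.get(key) is not None:
            merged[key] = extra_engine_args[key]
    return merged


defaults = {"total_kv_blocks": None, "max_num_seqs": None, "gpu_memory_utilization": None}
extra_engine_args = {"total_kv_blocks": 512, "max_num_seqs": 64, "unrelated_arg": 1}
merged = apply_overrides(defaults, extra_engine_args)
print(merged)
```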

Where should the reviewer start?

  1. Core Implementation:

    • lib/llm/src/local_model/runtime_config.rs - New runtime config structure
    • lib/llm/src/local_model.rs - Registers the runtime config as part of the ModelEntry on attach()
    • lib/llm/src/kv_router/scheduler.rs - Handles both instance and model watchers
  2. vLLM Integration:

    • components/backends/vllm/src/dynamo/vllm/main.py
  3. SGLang Integration:

    • components/backends/sglang/src/dynamo/sglang/worker/main.py

Related Issues:

Summary by CodeRabbit

  • New Features

    • Introduced support for registering and managing model runtime configuration, including parameters such as total key-value cache blocks, maximum concurrent sequences, and GPU memory utilization.
    • Added the ability to view and update runtime configuration details for deployed models through the Python and Rust APIs.
    • Enhanced model deployment cards to display runtime metrics and allow runtime configuration updates during the worker session.
  • Bug Fixes

    • Ensured runtime configuration fields are initialized properly during model deployment card creation.

copy-pr-bot bot commented Aug 5, 2025

This pull request requires additional validation before any workflows can run on NVIDIA's runners.

Pull request vetters can view their responsibilities here.

Contributors can view more details about this message here.

@jorgeantonio21 jorgeantonio21 requested review from a team and PeaBrane as code owners August 5, 2025 19:25
github-actions bot commented Aug 5, 2025

👋 Hi jorgeantonio21! Thank you for contributing to ai-dynamo/dynamo.

Just a reminder: The NVIDIA Test Github Validation CI runs an essential subset of the testing framework to quickly catch errors. Your PR reviewers may elect to test the changes comprehensively before approving your changes.

🚀

@github-actions github-actions bot added the external-contribution (Pull request is from an external contributor) and feat labels Aug 5, 2025
@jorgeantonio21 jorgeantonio21 marked this pull request as draft August 5, 2025 19:25
coderabbitai bot (Contributor) commented Aug 5, 2025

Walkthrough

This change introduces a new ModelRuntimeConfig struct and associated registration logic throughout the Rust backend, Python bindings, and engine worker code. It enables asynchronous registration and updating of runtime configuration parameters—such as memory usage and batching limits—within model deployment cards, with integration for both SGLang and vLLM backends. New Python bindings and Rust APIs are added to manage and expose these runtime settings.

Changes

| Cohort / File(s) | Change Summary |
|---|---|
| SGLang Backend Integration: components/backends/sglang/src/dynamo/sglang/worker/main.py | Adds async function to gather runtime config from engine and register it with endpoint, including retry logic and integration into worker initialization. |
| vLLM Backend Integration: components/backends/vllm/src/dynamo/vllm/main.py | Adds logic to extract runtime parameters from engine and register them with the endpoint at startup for rank 0 workers. |
| Python Bindings: Core API: lib/bindings/python/rust/lib.rs, lib/bindings/python/src/dynamo/_core.pyi, lib/bindings/python/src/dynamo/llm/__init__.py | Exposes ModelRuntimeConfig class and register_runtime_config async function to Python, with Rust implementation for updating deployment cards via etcd. Adds relevant imports and type hints. |
| Python Bindings: Model Card Extensions: lib/bindings/python/rust/llm/model_card.rs | Adds Python class for ModelRuntimeConfig, methods for engine-specific data, and extends ModelDeploymentCard with runtime config registration and accessors. |
| Rust Model Card: Runtime Config Support: lib/llm/src/model_card.rs, lib/llm/src/model_card/model.rs, lib/llm/src/model_card/runtime_config.rs, lib/llm/src/model_card/create.rs | Introduces ModelRuntimeConfig struct and module, adds it as an optional field to ModelDeploymentCard, and implements registration, update, and getter methods. Ensures initialization in model card creation. |
| Rust LocalModel Integration: lib/llm/src/local_model.rs | Adds runtime config to builder and model, with methods to register config with endpoint and update deployment card in etcd. |

Sequence Diagram(s)

sequenceDiagram
    participant Engine
    participant Worker
    participant Endpoint
    participant Etcd

    Worker->>Engine: Query runtime parameters
    Worker->>Worker: Build ModelRuntimeConfig
    Worker->>Endpoint: register_runtime_config(model_name, config)
    Endpoint->>Etcd: Update ModelDeploymentCard with runtime config
    Etcd-->>Endpoint: Ack
    Endpoint-->>Worker: Ack
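
The flow in the diagram can be mimicked with in-memory stand-ins, which may help when reasoning about the order of calls. Everything here is hypothetical: FakeEtcd is a dictionary rather than an etcd client, FakeEndpoint.register_runtime_config only borrows the name used in the walkthrough and is not the actual binding, and the mdc/<model_name> key layout is invented.

```python
import asyncio
import json


class FakeEtcd:
    """In-memory stand-in for a key-value store; not an etcd client."""

    def __init__(self) -> None:
        self.store = {}

    async def put(self, key: str, value: str) -> bool:
        self.store[key] = value
        return True  # ack


class FakeEndpoint:
    """Hypothetical endpoint that owns the read-modify-write of the card."""

    def __init__(self, etcd: FakeEtcd) -> None:
        self.etcd = etcd

    async def register_runtime_config(self, model_name: str, config: dict) -> bool:
        key = f"mdc/{model_name}"  # invented key layout
        card = json.loads(self.etcd.store.get(key, "{}"))
        card["runtime_config"] = config  # attach the config to the existing card
        return await self.etcd.put(key, json.dumps(card))


async def worker_startup():
    etcd = FakeEtcd()
    await etcd.put("mdc/demo-model", json.dumps({"display_name": "demo-model"}))
    endpoint = FakeEndpoint(etcd)
    # Worker queries the engine and builds the config (values made up here).
    config = {"total_kv_blocks": 1024, "max_num_seqs": 256}
    # Worker -> Endpoint -> Etcd, then the ack travels back.
    ack = await endpoint.register_runtime_config("demo-model", config)
    return ack, json.loads(etcd.store["mdc/demo-model"])


ack, card = asyncio.run(worker_startup())
print(ack, card)
```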

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~40 minutes

Poem

In the warren of code, new configs appear,
Models declare what they need, loud and clear.
With engines and endpoints now chatting away,
Runtime secrets are shared, come what may.
Etcd keeps watch with a digital grin—
The bunnies rejoice: let the deployments begin! 🐇✨

@PeaBrane PeaBrane changed the title from "feat: add runtime config/metadata to etcd" to "feat: add RuntimeConfig to ModelEntry" Aug 13, 2025
@PeaBrane PeaBrane linked an issue Aug 13, 2025 that may be closed by this pull request
@PeaBrane PeaBrane requested a review from kthui August 13, 2025 03:28
@PeaBrane PeaBrane merged commit d0a6363 into ai-dynamo:main Aug 14, 2025
10 checks passed
hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Co-authored-by: Yan Ru Pei <yanrpei@gmail.com>
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>

Labels

external-contribution (Pull request is from an external contributor), feat, size/XL

Development

Successfully merging this pull request may close these issues.

[FEATURE]: register runtime initialized parameters as metadata

4 participants