
Conversation

@keivenchang
Contributor

@keivenchang keivenchang commented Sep 18, 2025

Overview:

Add runtime configuration metrics for model deployment monitoring across all backend frameworks. Fix missing max_num_seqs metrics in SGLang and TensorRT-LLM backends while maintaining existing vLLM functionality.

Details:

  • Add periodic polling system for model runtime config metrics (max_num_seqs, total_kv_blocks, max_num_batched_tokens)
  • vLLM Backend: Maintain existing functionality where max_num_seqs is retrieved from engine.vllm_config.scheduler_config
  • TensorRT-LLM Backend: Add missing runtime config metric population by setting max_num_seqs from config.max_batch_size and max_num_batched_tokens from config.max_num_tokens
  • SGLang Backend: Fix max_num_seqs retrieval by reading server_args instead of scheduler_info, since SGLang keeps configuration parameters separate from runtime statistics
  • Rename MODEL_WORKER_COUNT to MODEL_WORKERS_TOTAL following Prometheus conventions
  • Add configuration safety guard against invalid polling intervals (see the interval-guard sketch after this list)
  • Update metrics documentation with new runtime configuration metrics
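
A minimal sketch, with a hypothetical helper name, of how the interval guard can be expressed: values that are missing, unparsable, or zero fall back to the 30-second default described in the walkthrough below.

```rust
use std::time::Duration;

// Hypothetical helper; the actual wiring lives in service_v2.rs.
fn runtime_config_poll_interval() -> Duration {
    std::env::var("DYN_RUNTIME_CONFIG_METRICS_POLL_INTERVAL_SECS")
        .ok()
        .and_then(|v| v.parse::<u64>().ok())
        // Safety guard: reject zero (negative values already fail the u64 parse).
        .filter(|secs| *secs > 0)
        .map(Duration::from_secs)
        .unwrap_or(Duration::from_secs(30))
}
```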

Where should the reviewer start?

  • lib/llm/src/http/service/metrics.rs - Core metrics implementation with periodic polling
  • components/backends/trtllm/src/dynamo/trtllm/main.py - TensorRT-LLM metric consistency improvements
  • components/backends/sglang/src/dynamo/sglang/register.py - SGLang architectural fix for config vs runtime separation

Related Issues:

DIS-607 DIS-641

/coderabbit profile chill

@keivenchang keivenchang requested a review from a team as a code owner September 18, 2025 03:52
@keivenchang keivenchang self-assigned this Sep 18, 2025
@github-actions github-actions bot added the feat label Sep 18, 2025
@coderabbitai
Contributor

coderabbitai bot commented Sep 18, 2025

Walkthrough

Adds per-model runtime-configuration and health metrics to HTTP metrics, registers new Prometheus gauges, and starts a background polling task from service initialization to periodically collect ModelManager and optional etcd (MDC) data, updating metrics and marking model health accordingly. Poll interval is set via DYN_RUNTIME_CONFIG_METRICS_POLL_INTERVAL_SECS.

Changes

| Cohort / File(s) | Summary |
| --- | --- |
| Per-model runtime metrics & health<br>lib/llm/src/http/service/metrics.rs | Adds IntGaugeVec fields for per-model runtime config and health; public APIs to update runtime config, MDC values, and health; helpers to sync from ModelEntry; an async helper to fetch MDC via etcd; launches a periodic polling task to refresh metrics and health states; registers the new metrics (a registration sketch follows below). |
| Service wiring & polling<br>lib/llm/src/http/service/service_v2.rs | Clones etcd_client in the builder; reads the poll interval from DYN_RUNTIME_CONFIG_METRICS_POLL_INTERVAL_SECS (default 30s); starts background runtime-config metrics polling via Metrics::start_runtime_config_polling_task; does not retain the JoinHandle. |
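
For context, a minimal sketch of what registering one of these per-model gauges looks like with the prometheus crate; the metric name and helper are assumptions, not the exact metrics.rs code.

```rust
use prometheus::{IntGaugeVec, Opts, Registry};

// Hypothetical registration helper illustrating the single "model" label.
fn register_total_kv_blocks(registry: &Registry) -> prometheus::Result<IntGaugeVec> {
    let gauge = IntGaugeVec::new(
        Opts::new(
            "model_total_kv_blocks", // assumed name; see the naming discussion below
            "Total KV blocks reported by the model's runtime config",
        ),
        &["model"], // a single label keeps series cardinality low
    )?;
    registry.register(Box::new(gauge.clone()))?;
    Ok(gauge)
}
```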

Sequence Diagram(s)

```mermaid
sequenceDiagram
  autonumber
  actor Env as Env
  participant Service as HttpService::build
  participant Metrics as Metrics
  participant Manager as ModelManager
  participant Etcd as etcd::Client
  participant Prom as Prometheus

  note over Service: Startup
  Env->>Service: Read DYN_RUNTIME_CONFIG_METRICS_POLL_INTERVAL_SECS
  Service->>Metrics: start_runtime_config_polling_task(metrics, manager, etcd?, interval)
  activate Metrics
  Metrics->>Metrics: Spawn background task (loop every interval)
  deactivate Metrics

  rect rgb(235, 245, 255)
  note right of Metrics: Polling cycle
  loop Every interval
    Metrics->>Manager: list/get ModelEntry items
    Manager-->>Metrics: ModelEntry[]
    alt etcd client present
      Metrics->>Etcd: Load MDC (context_length, kv_block_size, migration_limit)
      Etcd-->>Metrics: MDC values
      Metrics->>Metrics: update_mdc_metrics(model_name, ...)
    end
    Metrics->>Metrics: update_runtime_config_metrics(model_name, ...)
    Metrics->>Metrics: set_model_health_status(model_name, healthy=true)
    Metrics->>Prom: Update gauge vectors for models
    Metrics->>Metrics: mark inactive models healthy=false
    Metrics->>Prom: Update health gauges
  end
  end
```

Estimated code review effort

🎯 4 (Complex) | ⏱️ ~60 minutes

Possibly related PRs

Poem

In burrows of bytes where models dwell,
I twitch my nose at gauges that tell—
Blocks, lengths, health, a metric stew,
With polls that hop each interval anew.
Etcd whispers, Manager hums,
Prometheus hears the gentle drums.
Thump-thump—runtime truth becomes!

Pre-merge checks

✅ Passed checks (3 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%. |
| Title Check | ✅ Passed | The title accurately highlights the main change—adding runtime configuration metrics with periodic polling—which matches the new metrics and the background polling task added in lib/llm/src/http/service/metrics.rs and service_v2.rs; the "NIM FE (num_request_max)" fragment refers to branch-level work but is not clearly reflected in the provided file summaries, so it is mildly noisy. Overall the title is specific and relevant to the changeset and not generic or misleading. |
| Description Check | ✅ Passed | The pull request description largely follows the repository template: it contains an Overview, Details, Where should the reviewer start, and Related Issues, and it clearly summarizes the intent (new Prometheus metrics, periodic polling, MDC fields) and points reviewers to the primary files. It documents runtime behavior (historical-data preservation, DYN_RUNTIME_CONFIG_METRICS_POLL_INTERVAL_SECS) and backend-specific fixes so reviewers can assess scope and impact. The Related Issues entry lists internal issue IDs but does not use GitHub action keywords or explicit #issue references as suggested by the template. |


Contributor

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 3

🧹 Nitpick comments (5)
lib/llm/src/http/service/metrics.rs (4)

138-153: Doc nit: unknown health is supported but never emitted by the poller

You expose -1 = unknown and a setter, but the polling path never sets unknown. Consider when/if you want unknown surfaced (e.g., before first successful poll, or when the manager is temporarily unavailable).
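
A small sketch, assuming the -1/0/1 encoding described in this nit and hypothetical field names, of surfacing "unknown" before the first successful poll.

```rust
use prometheus::IntGaugeVec;

// Assumed encoding from the comment above.
const HEALTH_UNKNOWN: i64 = -1;
const HEALTH_UNHEALTHY: i64 = 0;
const HEALTH_HEALTHY: i64 = 1;

fn set_model_health(health_gauge: &IntGaugeVec, model: &str, state: i64) {
    health_gauge.with_label_values(&[model]).set(state);
}

// e.g. at startup, before the poller has run once (hypothetical call site):
// set_model_health(&metrics.model_health, "my-model", HEALTH_UNKNOWN);
```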


262-325: Metric suffix constants: align with repo naming registry for consistency

To keep naming consistent across the codebase (per prior standardization), define suffix constants in prometheus_names::frontend_service and use them here instead of string literals.

Would you like a follow‑up patch adding constants like FRONTEND_MODEL_TOTAL_KV_BLOCKS, etc., and wiring them here?
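
A sketch of that follow-up; the constant names are the reviewer's proposal and the module layout is an assumption about prometheus_names::frontend_service.

```rust
// Assumed shape of the suffix constants in prometheus_names::frontend_service.
pub mod frontend_service {
    pub const FRONTEND_MODEL_TOTAL_KV_BLOCKS: &str = "model_total_kv_blocks";
    pub const FRONTEND_MODEL_MAX_NUM_SEQS: &str = "model_max_num_seqs";
    pub const FRONTEND_MODEL_MAX_NUM_BATCHED_TOKENS: &str = "model_max_num_batched_tokens";
}

// Call sites in metrics.rs would then reference the constants instead of string
// literals, e.g. Opts::new(frontend_service::FRONTEND_MODEL_TOTAL_KV_BLOCKS, "...").
```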


470-491: Normalize model label for MDC metrics

Keep label casing consistent with other metrics to avoid duplicate series.

Apply:

```diff
 pub fn update_mdc_metrics(
     &self,
     model_name: &str,
     context_length: u32,
     kv_cache_block_size: u32,
     migration_limit: u32,
 ) {
-    self.model_context_length
-        .with_label_values(&[model_name])
+    let model = model_name.to_lowercase();
+    self.model_context_length
+        .with_label_values(&[&model])
         .set(context_length as i64);

-    self.model_kv_cache_block_size
-        .with_label_values(&[model_name])
+    self.model_kv_cache_block_size
+        .with_label_values(&[&model])
         .set(kv_cache_block_size as i64);

-    self.model_migration_limit
-        .with_label_values(&[model_name])
+    self.model_migration_limit
+        .with_label_values(&[&model])
         .set(migration_limit as i64);
 }
```

553-623: Poller robustness: missed ticks + initial run semantics

  • Set MissedTickBehavior::Delay to avoid burst catch‑up if the loop runs long.
  • Confirm whether the first tick is immediate in your Tokio version; if not, perform an initial poll before the loop.

Apply:

```diff
         tokio::spawn(async move {
-            let mut interval = tokio::time::interval(poll_interval);
+            let mut interval = tokio::time::interval(poll_interval);
+            interval.set_missed_tick_behavior(tokio::time::MissedTickBehavior::Delay);
```

Optional: run one poll before the loop to avoid waiting a full interval on startup.
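
A minimal sketch of the suggested tick behavior; poll_models is a hypothetical stand-in for the actual polling body. Note that current Tokio completes the first interval tick immediately, which already covers the initial-poll concern.

```rust
use std::time::Duration;
use tokio::time::MissedTickBehavior;

async fn run_poller(poll_interval: Duration) {
    let mut interval = tokio::time::interval(poll_interval);
    // Avoid burst catch-up if one polling cycle runs longer than the interval.
    interval.set_missed_tick_behavior(MissedTickBehavior::Delay);
    loop {
        // In Tokio 1.x the first tick completes immediately, so the first poll
        // runs at startup rather than after one full interval.
        interval.tick().await;
        poll_models().await;
    }
}

async fn poll_models() {
    // Refresh per-model gauges from the ModelManager / etcd here.
}
```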

lib/llm/src/http/service/service_v2.rs (1)

327-334: Background task lifecycle: consider graceful shutdown

The JoinHandle is dropped; the task runs past service shutdown and can’t be cancelled. Store the handle (e.g., on HttpService or State) and abort it when CancellationToken triggers.

If you want, I can draft a follow‑up patch that threads the handle into HttpService and aborts it in run() when observer is cancelled.
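
A minimal sketch of that lifecycle fix, with assumed field and variable names: keep the JoinHandle and abort the poller when the CancellationToken fires.

```rust
use tokio_util::sync::CancellationToken;

// Hypothetical holder for background tasks spawned by the HTTP service.
struct BackgroundTasks {
    runtime_config_poller: tokio::task::JoinHandle<()>,
}

async fn shutdown_on_cancel(tasks: BackgroundTasks, cancel: CancellationToken) {
    cancel.cancelled().await;            // service shutdown requested
    tasks.runtime_config_poller.abort(); // stop the metrics poller with the service
}
```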

📜 Review details

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 6dd3326 and 83cb3fe.

📒 Files selected for processing (2)
  • lib/llm/src/http/service/metrics.rs (6 hunks)
  • lib/llm/src/http/service/service_v2.rs (2 hunks)
🧰 Additional context used
🧠 Learnings (2)
📓 Common learnings
Learnt from: keivenchang
PR: ai-dynamo/dynamo#3035
File: lib/runtime/src/metrics/prometheus_names.rs:49-53
Timestamp: 2025-09-16T00:26:37.092Z
Learning: keivenchang prefers consistency in metric naming standardization over strict adherence to Prometheus conventions about gauge vs counter suffixes. When standardizing metrics naming, prioritize consistency across the codebase rather than technical pedantry about individual metric type conventions.
📚 Learning: 2025-09-02T16:46:54.015Z
Learnt from: GuanLuo
PR: ai-dynamo/dynamo#2714
File: lib/llm/src/discovery/model_entry.rs:38-42
Timestamp: 2025-09-02T16:46:54.015Z
Learning: In lib/llm/src/discovery/model_entry.rs, GuanLuo prefers not to add serde defaults for model_type and model_input fields to keep the specification explicit and avoid user errors, relying on atomic deployment strategy to avoid backward compatibility issues.

Applied to files:

  • lib/llm/src/http/service/metrics.rs
🧬 Code graph analysis (2)
lib/llm/src/http/service/service_v2.rs (3)
lib/bindings/python/rust/lib.rs (1)
  • etcd_client (368-373)
lib/runtime/src/transports/etcd.rs (1)
  • etcd_client (131-133)
lib/llm/src/http/service/metrics.rs (1)
  • start_runtime_config_polling_task (556-623)
lib/llm/src/http/service/metrics.rs (5)
lib/llm/src/local_model.rs (2)
  • runtime_config (179-182)
  • runtime_config (376-378)
lib/llm/src/http/service/service_v2.rs (3)
  • new (74-86)
  • etcd_client (114-116)
  • manager (106-108)
lib/llm/src/discovery/model_manager.rs (1)
  • new (52-60)
lib/bindings/python/rust/llm/local_model.rs (2)
  • total_kv_blocks (56-58)
  • max_num_seqs (61-63)
lib/runtime/src/transports/etcd.rs (1)
  • etcd_client (131-133)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (6)
  • GitHub Check: Build and Test - sglang
  • GitHub Check: Build and Test - vllm
  • GitHub Check: Build and Test - dynamo
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: pre-merge-rust (lib/runtime/examples)
  • GitHub Check: pre-merge-rust (.)
🔇 Additional comments (2)
lib/llm/src/http/service/metrics.rs (1)

39-47: Good addition: clear, per‑model runtime config and health gauges

Fields look consistent and minimal in label cardinality (only "model"). Nice.

lib/llm/src/http/service/service_v2.rs (1)

296-299: Cloning etcd client once is fine

Local clone for passing into the poller while also storing in State looks correct.

@keivenchang keivenchang requested review from a team as code owners September 20, 2025 00:02
@keivenchang keivenchang changed the title feat: add runtime config metrics with periodic polling feat: add NIM args + runtime config metrics with periodic polling Sep 20, 2025
@keivenchang keivenchang changed the title feat: add NIM args + runtime config metrics with periodic polling feat: add NIM FE args + runtime config metrics with periodic polling Sep 20, 2025
Contributor

@kthui kthui left a comment


I think it would be nice to have some extra test case(s) on the new metrics - to get a better understanding of how the new metrics would look to end users.
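
For illustration, a rough sketch of such a test using the prometheus crate's text encoder to render what end users would actually scrape; the function and the metric-name substrings are assumptions based on the PR description, not existing test code.

```rust
use prometheus::{Encoder, Registry, TextEncoder};

fn assert_runtime_config_metrics_exposed(registry: &Registry) {
    let mut buf = Vec::new();
    TextEncoder::new()
        .encode(&registry.gather(), &mut buf)
        .expect("failed to encode metrics");
    let body = String::from_utf8(buf).expect("metrics output was not UTF-8");

    // Assumed metric names; adjust to whatever metrics.rs actually registers.
    assert!(body.contains("total_kv_blocks"));
    assert!(body.contains("max_num_seqs"));
    assert!(body.contains("max_num_batched_tokens"));
}
```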

@keivenchang keivenchang changed the title feat: add NIM FE args + runtime config metrics with periodic polling feat: add NIM FE (num_request_max) + runtime config metrics with periodic polling Sep 22, 2025
Contributor

@ryan-lempka ryan-lempka left a comment


A few minor comments (non-blocking). Otherwise LGTM.

@KrishnanPrash
Contributor

KrishnanPrash commented Sep 23, 2025

Add periodic polling system for model runtime config metrics (max_num_seqs, total_kv_blocks, max_num_batched_tokens)

Is polling required here? Are there use-cases where the same MDC changes after initial publishing?

@keivenchang
Contributor Author

Add periodic polling system for model runtime config metrics (max_num_seqs, total_kv_blocks, max_num_batched_tokens)

Is polling required here? Are there use-cases where the same MDC changes after initial publishing?

Good point-- once MDC is registered it's always there. However, I do have one gauge called model_workers that counts the number of workers available serving a particular model. The value can be anywhere from 0 to n. I tested this with multiple workers serving the same model. Other than that, if we can assume that workers that start/stop will always have the exact same MDC, then we don't have to keep polling for MDC values.

Are these valid assumptions to make?

- Add new Prometheus metrics for model runtime configuration
- Include metrics for total_kv_blocks, max_num_seqs, max_num_batched_tokens
- Add MDC metrics for context_length, kv_cache_block_size, migration_limit
- Implement model health status tracking (healthy/unhealthy/unknown)
- Add background polling task to keep metrics current as backends change
- Preserve all historical data - metrics are never removed, only marked unhealthy
- Configurable poll interval via DYN_RUNTIME_CONFIG_METRICS_POLL_INTERVAL_SECS

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…Prometheus conventions

- Update constant name from MODEL_WORKER_COUNT to MODEL_WORKERS_TOTAL
- Change metric name from model_worker_count to model_workers_total
- Update all references in metrics service implementation
- Update documentation in README to reflect new metric name
- Improve metrics polling task lifecycle management in service_v2
- Follows Prometheus naming convention using _total suffix for counters

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…INTERVAL_SECS

- Add validation to ensure poll interval is greater than 0
- Use filter() to reject zero or negative values
- Update documentation to clarify validation requirement
- Prevents potential issues with zero-duration polling intervals

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…kends

SGLang: get max_num_seqs from server_args since SGLang separates config from runtime stats
TensorRT-LLM: populate max_num_seqs and max_num_batched_tokens from config for metrics consistency
- Fix metric name from model_workers_total to model_workers
- Document model name deduplication behavior in README.md
- Add comments explaining gauge vs counter usage for runtime config metrics
- Clarify that some metrics use gauges because they're synchronized from upstream

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
…g intervals

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
@keivenchang
Contributor Author

Add periodic polling system for model runtime config metrics (max_num_seqs, total_kv_blocks, max_num_batched_tokens)

Is polling required here? Are there use-cases where the same MDC changes after initial publishing?

Discussed this on Slack. Will try to get this in before code-freeze, and then work on migrating polling to the Watcher method.

Contributor

@KrishnanPrash KrishnanPrash left a comment


LGTM! Left a note to document future work.

Note: Under the assumption that MDCs are static, we could use the watcher pattern already used by the frontend [reference] instead of the polling approach currently in use.
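
For future reference, a rough sketch of that watcher approach using the etcd_client crate directly; the key prefix and the update hook are assumptions, and the project's own etcd wrapper may expose a different API.

```rust
use etcd_client::{Client, EventType, WatchOptions};

async fn watch_mdc_updates(mut client: Client) -> Result<(), etcd_client::Error> {
    // Hypothetical prefix under which MDC entries are stored.
    let (_watcher, mut stream) = client
        .watch("mdc/", Some(WatchOptions::new().with_prefix()))
        .await?;

    while let Some(resp) = stream.message().await? {
        for event in resp.events() {
            if event.event_type() == EventType::Put {
                // Update the per-model gauges from the new MDC value here,
                // instead of re-reading everything on a fixed interval.
            }
        }
    }
    Ok(())
}
```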

@keivenchang keivenchang merged commit 116b9b4 into main Sep 24, 2025
17 of 19 checks passed
@keivenchang keivenchang deleted the keivenchang/DIS_607__NIM_num_request_max_ branch September 24, 2025 17:19
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
…odic polling (#3107)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
jasonqinzhou pushed a commit that referenced this pull request Sep 24, 2025
…odic polling (#3107)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Jason Zhou <jasonzho@nvidia.com>
kylehh pushed a commit that referenced this pull request Sep 25, 2025
…odic polling (#3107)

Signed-off-by: Keiven Chang <keivenchang@users.noreply.github.com>
Signed-off-by: Kyle H <kylhuang@nvidia.com>
