
Conversation


@ryan-lempka ryan-lempka commented Aug 13, 2025

Overview:

Allow a custom prefix to be specified for the frontend metric names.

Details:

  • Make HTTP metrics prefix configurable (CLI flag and env var)
  • Default remains dynamo_frontend; set it to nv_llm_http_service for NIM LLM compatibility, or to any other preferred prefix (see the naming sketch after this list)
  • No behavior change unless the flag or env var is provided
  • Added integration test(s) to verify default and env override
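
In practice, the naming outcome under each configuration is as follows (a minimal sketch; request_duration_seconds is a hypothetical metric suffix used only for illustration):

fn prefixed(prefix: &str, suffix: &str) -> String {
    format!("{}_{}", prefix, suffix)
}

fn main() {
    // Default (flag and env var absent): metrics keep the dynamo_frontend prefix.
    assert_eq!(
        prefixed("dynamo_frontend", "request_duration_seconds"),
        "dynamo_frontend_request_duration_seconds"
    );
    // With --metrics-prefix nv_llm_http_service (or DYN_METRICS_PREFIX set to it),
    // the same metric is exposed under the NIM LLM-compatible name.
    assert_eq!(
        prefixed("nv_llm_http_service", "request_duration_seconds"),
        "nv_llm_http_service_request_duration_seconds"
    );
}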

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Added a configurable metrics prefix for the frontend and HTTP service.
    • New CLI flag: --metrics-prefix; if omitted, reads DYN_METRICS_PREFIX or defaults to dynamo_frontend.
    • All exposed metrics now include the chosen prefix for easier identification and multi-instance separation.
  • Tests

    • Added integration tests to verify default behavior and environment-based overrides for the metrics prefix.


coderabbitai bot commented Aug 13, 2025

Walkthrough

Adds a runtime-configurable metrics prefix for the Dynamo frontend and HTTP service metrics. Introduces a CLI flag to set the prefix and propagates it via the DYN_METRICS_PREFIX environment variable. Metrics initialization now derives names using this prefix. Includes an integration test validating default and overridden prefixes.
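
For orientation, that resolution logic amounts to roughly the following (a minimal sketch; the constant names match those called out later in this review, while metrics_prefix and metric_name are illustrative helpers, not necessarily the exact functions in metrics.rs):

use std::env;

// Constants as described for lib/llm/src/http/service/metrics.rs.
const FRONTEND_METRIC_PREFIX: &str = "dynamo_frontend";
const METRICS_PREFIX_ENV: &str = "DYN_METRICS_PREFIX";

// Resolve the prefix: the env var wins, otherwise fall back to the default.
fn metrics_prefix() -> String {
    env::var(METRICS_PREFIX_ENV).unwrap_or_else(|_| FRONTEND_METRIC_PREFIX.to_string())
}

// Build a fully qualified metric name, e.g. "dynamo_frontend_<suffix>".
fn metric_name(suffix: &str) -> String {
    format!("{}_{}", metrics_prefix(), suffix)
}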

Changes

  • Frontend CLI flag (components/frontend/src/dynamo/frontend/main.py): Adds the --metrics-prefix flag; if provided, sets DYN_METRICS_PREFIX before runtime initialization.
  • HTTP metrics prefixing (lib/llm/src/http/service/metrics.rs): Implements the env-driven metrics prefix (DYN_METRICS_PREFIX), adds the METRICS_PREFIX_ENV constant, and updates metric name construction to use the runtime-derived prefix.
  • Integration tests for metrics prefix (lib/llm/tests/http_metrics.rs): A new test verifies the default prefix and env override by starting services, emitting metrics, and asserting on /metrics output.

Sequence Diagram(s)

sequenceDiagram
  participant User as User/Operator
  participant CLI as Frontend CLI
  participant Env as Environment
  participant RT as Runtime Init
  participant Svc as HTTP Service Metrics
  participant Prom as Prometheus

  User->>CLI: Run with --metrics-prefix (optional)
  CLI->>Env: Set DYN_METRICS_PREFIX (if flag provided)
  CLI->>RT: Initialize runtime
  RT->>Svc: Start HTTP service
  Svc->>Env: Read DYN_METRICS_PREFIX or default
  Svc->>Svc: Construct metric names with prefix
  Prom->>Svc: GET /metrics
  Svc-->>Prom: Expose prefixed metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Poem

I twitch my ears at metrics’ tune,
A prefix set beneath the moon;
Env or flag, I hop with glee,
Names aligned as they should be.
Scrape the fields, count every hop—
custom_prefix makes carrots pop! 🥕✨



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c12c257 and d94e47c.

📒 Files selected for processing (3)
  • components/frontend/src/dynamo/frontend/main.py (2 hunks)
  • lib/llm/src/http/service/metrics.rs (2 hunks)
  • lib/llm/tests/http_metrics.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/llm/tests/http_metrics.rs (2)
lib/runtime/src/pipeline/network.rs (1)
  • metrics (338-340)
lib/llm/src/http/service/metrics.rs (3)
  • new (119-213)
  • new (315-336)
  • new (407-416)
lib/llm/src/http/service/metrics.rs (2)
lib/llm/src/http/service/service_v2.rs (1)
  • new (29-35)
lib/runtime/src/metrics.rs (1)
  • prefix (394-400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/frontend/src/dynamo/frontend/main.py (1)

136-141: Good addition: CLI flag and clear precedence semantics

Adding --metrics-prefix with explicit fallback to env var or default is clear and backward-compatible.

lib/llm/src/http/service/metrics.rs (2)

15-20: Public constants look good and align with intended defaults

FRONTEND_METRIC_PREFIX and METRICS_PREFIX_ENV are well-named and make downstream usage (e.g., tests) straightforward.


107-119: Docs are clear and comprehensive

The docstring enumerating the env var and metric names is helpful and accurate.

@rmccorm4 rmccorm4 requested a review from keivenchang August 13, 2025 19:54
@grahamking
Contributor

@ryan-lempka This looks good. Can you do all the Code Rabbit suggestions? Particularly the one about avoiding sleep in unit tests, but honestly they are all good. Code Rabbit's reviews are so helpful.

@ryan-lempka
Contributor Author

> @ryan-lempka This looks good. Can you do all the Code Rabbit suggestions? Particularly the one about avoiding sleep in unit tests, but honestly they are all good. Code Rabbit's reviews are so helpful.

@grahamking agreed, these suggestions are solid. I'll add them all now.

@ryan-lempka ryan-lempka force-pushed the rlempka/enable-custom-metrics-prefix branch from 82120ba to bfac89b on August 13, 2025 at 20:55
@ryan-lempka ryan-lempka force-pushed the rlempka/enable-custom-metrics-prefix branch from bfac89b to cd3b8e2 on August 13, 2025 at 20:58
@ryan-lempka ryan-lempka enabled auto-merge (squash) August 13, 2025 21:00
@ryan-lempka ryan-lempka merged commit 3411bda into main Aug 13, 2025
10 of 11 checks passed
@ryan-lempka ryan-lempka deleted the rlempka/enable-custom-metrics-prefix branch August 13, 2025 21:22
@keivenchang
Copy link
Contributor

Is it the case that NIM LLM will only use nv_llm_http_service*? We're trying to make the names consistent throughout Dynamo so that debugging is easier.

async fn metrics_prefix_default_then_env_override() {
    // Case 1: default prefix
    env::remove_var(metrics::METRICS_PREFIX_ENV);
    let svc1 = HttpService::builder().port(9101).build().unwrap();

I see this looks like it's following the examples in http-service.rs, and it's a bit sad that many tests are not using random available ports (port 0). For tests, it's best to use random available ports so that tests running in parallel don't collide with other existing services and/or tests. Something like this would be preferred:

async fn create_http_service_with_random_port() -> (HttpService, u16) {
    // Bind to port 0 to get a random available port
    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
    let actual_port = listener.local_addr().unwrap().port();
    
    // Drop the listener since HttpService will create its own
    drop(listener);
    
    // Create service with the actual port
    let service = HttpService::builder().port(actual_port).build().unwrap();
    
    (service, actual_port)
}
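
A sketch of how the test above could adopt that helper (assuming the helper is added to the same test file; only the setup changes, and the elided remainder of the test body stays as written in the PR):

async fn metrics_prefix_default_then_env_override() {
    // Case 1: default prefix, now bound to a random port instead of the fixed 9101
    env::remove_var(metrics::METRICS_PREFIX_ENV);
    let (svc1, port1) = create_http_service_with_random_port().await;
    // ...the remainder of the existing test is unchanged, except that requests
    // and /metrics scrapes target `port1` rather than a hard-coded port.
}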


I will use random ports in the future.


rlempka commented Aug 13, 2025

> Is it the case that NIM LLM will only use nv_llm_http_service*? We're trying to make the names consistent throughout Dynamo so that debugging is easier.

nv_llm_http_service* is what is used currently. Being able to set the prefix this way will let NIM LLM customers transition more easily to Dynamo-based NIM LLM.

hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
