
Conversation


@ryan-lempka ryan-lempka commented Aug 13, 2025

Overview:

Allow a custom prefix to be specified for the frontend metric names.

Details:

  • Make HTTP metrics prefix configurable (CLI flag and env var)
  • Default remains dynamo_frontend; set it to nv_llm_http_service for NIM LLM compatibility, or to any other preferred prefix (see the naming sketch after this list)
  • No behavior change unless the flag or env var is provided
  • Added integration test(s) to verify default and env override
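
In practice, the naming outcome under each configuration is as follows (a minimal sketch; request_duration_seconds is a hypothetical metric suffix used only for illustration):

fn prefixed(prefix: &str, suffix: &str) -> String {
    format!("{}_{}", prefix, suffix)
}

fn main() {
    // Default (flag and env var absent): metrics keep the dynamo_frontend prefix.
    assert_eq!(
        prefixed("dynamo_frontend", "request_duration_seconds"),
        "dynamo_frontend_request_duration_seconds"
    );
    // With --metrics-prefix nv_llm_http_service (or DYN_METRICS_PREFIX set to it),
    // the same metric is exposed under the NIM LLM-compatible name.
    assert_eq!(
        prefixed("nv_llm_http_service", "request_duration_seconds"),
        "nv_llm_http_service_request_duration_seconds"
    );
}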

Where should the reviewer start?

Related Issues: (use one of the action keywords Closes / Fixes / Resolves / Relates to)

Summary by CodeRabbit

  • New Features

    • Added a configurable metrics prefix for the frontend and HTTP service.
    • New CLI flag: --metrics-prefix; if omitted, reads DYN_METRICS_PREFIX or defaults to dynamo_frontend.
    • All exposed metrics now include the chosen prefix for easier identification and multi-instance separation.
  • Tests

    • Added integration tests to verify default behavior and environment-based overrides for the metrics prefix.


coderabbitai bot commented Aug 13, 2025

Walkthrough

Adds a runtime-configurable metrics prefix for the Dynamo frontend and HTTP service metrics. Introduces a CLI flag to set the prefix and propagates it via the DYN_METRICS_PREFIX environment variable. Metrics initialization now derives names using this prefix. Includes an integration test validating default and overridden prefixes.
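
For orientation, that resolution logic amounts to roughly the following (a minimal sketch; the constant names match those called out later in this review, while metrics_prefix and metric_name are illustrative helpers, not necessarily the exact functions in metrics.rs):

use std::env;

// Constants as described for lib/llm/src/http/service/metrics.rs.
const FRONTEND_METRIC_PREFIX: &str = "dynamo_frontend";
const METRICS_PREFIX_ENV: &str = "DYN_METRICS_PREFIX";

// Resolve the prefix: the env var wins, otherwise fall back to the default.
fn metrics_prefix() -> String {
    env::var(METRICS_PREFIX_ENV).unwrap_or_else(|_| FRONTEND_METRIC_PREFIX.to_string())
}

// Build a fully qualified metric name, e.g. "dynamo_frontend_<suffix>".
fn metric_name(suffix: &str) -> String {
    format!("{}_{}", metrics_prefix(), suffix)
}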

Changes

  • Frontend CLI flag (components/frontend/src/dynamo/frontend/main.py): Adds the --metrics-prefix flag; if provided, sets DYN_METRICS_PREFIX before runtime initialization.
  • HTTP metrics prefixing (lib/llm/src/http/service/metrics.rs): Implements the env-driven metrics prefix (DYN_METRICS_PREFIX), adds the METRICS_PREFIX_ENV constant, and updates metric name construction to use the runtime-derived prefix.
  • Integration tests for metrics prefix (lib/llm/tests/http_metrics.rs): A new test verifies the default prefix and env override by starting services, emitting metrics, and asserting on /metrics output.

Sequence Diagram(s)

sequenceDiagram
  participant User as User/Operator
  participant CLI as Frontend CLI
  participant Env as Environment
  participant RT as Runtime Init
  participant Svc as HTTP Service Metrics
  participant Prom as Prometheus

  User->>CLI: Run with --metrics-prefix (optional)
  CLI->>Env: Set DYN_METRICS_PREFIX (if flag provided)
  CLI->>RT: Initialize runtime
  RT->>Svc: Start HTTP service
  Svc->>Env: Read DYN_METRICS_PREFIX or default
  Svc->>Svc: Construct metric names with prefix
  Prom->>Svc: GET /metrics
  Svc-->>Prom: Expose prefixed metrics

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~15 minutes

Poem

I twitch my ears at metrics’ tune,
A prefix set beneath the moon;
Env or flag, I hop with glee,
Names aligned as they should be.
Scrape the fields, count every hop—
custom_prefix makes carrots pop! 🥕✨



@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 6

📜 Review details

Configuration used: .coderabbit.yaml
Review profile: CHILL
Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between c12c257 and d94e47c.

📒 Files selected for processing (3)
  • components/frontend/src/dynamo/frontend/main.py (2 hunks)
  • lib/llm/src/http/service/metrics.rs (2 hunks)
  • lib/llm/tests/http_metrics.rs (1 hunks)
🧰 Additional context used
🧬 Code Graph Analysis (2)
lib/llm/tests/http_metrics.rs (2)
lib/runtime/src/pipeline/network.rs (1)
  • metrics (338-340)
lib/llm/src/http/service/metrics.rs (3)
  • new (119-213)
  • new (315-336)
  • new (407-416)
lib/llm/src/http/service/metrics.rs (2)
lib/llm/src/http/service/service_v2.rs (1)
  • new (29-35)
lib/runtime/src/metrics.rs (1)
  • prefix (394-400)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (4)
  • GitHub Check: Mirror Repository to GitLab
  • GitHub Check: pre-merge-rust (.)
  • GitHub Check: pre-merge-rust (lib/bindings/python)
  • GitHub Check: Build and Test - dynamo
🔇 Additional comments (3)
components/frontend/src/dynamo/frontend/main.py (1)

136-141: Good addition: CLI flag and clear precedence semantics

Adding --metrics-prefix with explicit fallback to env var or default is clear and backward-compatible.

lib/llm/src/http/service/metrics.rs (2)

15-20: Public constants look good and align with intended defaults

FRONTEND_METRIC_PREFIX and METRICS_PREFIX_ENV are well-named and make downstream usage (e.g., tests) straightforward.


107-119: Docs are clear and comprehensive

The docstring enumerating the env var and metric names is helpful and accurate.

@rmccorm4 rmccorm4 requested a review from keivenchang August 13, 2025 19:54
@grahamking
Contributor

@ryan-lempka This looks good. Can you do all the Code Rabbit suggestions? Particularly the one about avoiding sleep in unit tests, but honestly they are all good. Code Rabbit's reviews are so helpful.

@ryan-lempka
Contributor Author

> @ryan-lempka This looks good. Can you do all the Code Rabbit suggestions? Particularly the one about avoiding sleep in unit tests, but honestly they are all good. Code Rabbit's reviews are so helpful.

@grahamking agreed, these suggestions are solid. I'll add them all now.

@ryan-lempka ryan-lempka force-pushed the rlempka/enable-custom-metrics-prefix branch from 82120ba to bfac89b on August 13, 2025 at 20:55
@ryan-lempka ryan-lempka force-pushed the rlempka/enable-custom-metrics-prefix branch from bfac89b to cd3b8e2 on August 13, 2025 at 20:58
@ryan-lempka ryan-lempka enabled auto-merge (squash) August 13, 2025 21:00
@ryan-lempka ryan-lempka merged commit 3411bda into main Aug 13, 2025
10 of 11 checks passed
@ryan-lempka ryan-lempka deleted the rlempka/enable-custom-metrics-prefix branch August 13, 2025 21:22
@keivenchang
Copy link
Contributor

Is it the case that NIM LLM will only use nv_llm_http_service*? We're trying to make the names consistent throughout Dynamo so that debugging is easier.

async fn metrics_prefix_default_then_env_override() {
    // Case 1: default prefix
    env::remove_var(metrics::METRICS_PREFIX_ENV);
    let svc1 = HttpService::builder().port(9101).build().unwrap();

I see this looks like it's following the examples in http-service.rs, and it's a bit sad that many tests are not using random available ports (port 0). For tests, it's best to use random available ports so that tests running in parallel don't collide with other existing services and/or tests. Something like this would be preferred:

async fn create_http_service_with_random_port() -> (HttpService, u16) {
    // Bind to port 0 to get a random available port
    let listener = tokio::net::TcpListener::bind("127.0.0.1:0").await.unwrap();
    let actual_port = listener.local_addr().unwrap().port();
    
    // Drop the listener since HttpService will create its own
    drop(listener);
    
    // Create service with the actual port
    let service = HttpService::builder().port(actual_port).build().unwrap();
    
    (service, actual_port)
}
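
A sketch of how the test above could adopt that helper (assuming the helper is added to the same test file; only the setup changes, and the elided remainder of the test body stays as written in the PR):

async fn metrics_prefix_default_then_env_override() {
    // Case 1: default prefix, now bound to a random port instead of the fixed 9101
    env::remove_var(metrics::METRICS_PREFIX_ENV);
    let (svc1, port1) = create_http_service_with_random_port().await;
    // ...the remainder of the existing test is unchanged, except that requests
    // and /metrics scrapes target `port1` rather than a hard-coded port.
}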


I will use random ports in the future.


rlempka commented Aug 13, 2025

> Is it the case that NIM LLM will only use nv_llm_http_service*? We're trying to make the names consistent throughout Dynamo so that debugging is easier.

nv_llm_http_service* is what is used currently. Being able to set the prefix this way will let NIM LLM customers transition more easily to Dynamo-based NIM LLM.

hhzhang16 pushed a commit that referenced this pull request Aug 27, 2025
Signed-off-by: Hannah Zhang <hannahz@nvidia.com>
