Commit 8c75ed7

fix: frontend metrics to be renamed from nv_llm_http_service_* => dynamo_frontend_* (#2176)
Co-authored-by: Keiven Chang <keivenchang@users.noreply.github.com>
1 parent 66231cf commit 8c75ed7

11 files changed

+115
-95
lines changed

components/metrics/Cargo.toml

Lines changed: 1 addition & 1 deletion
@@ -38,4 +38,4 @@ tracing = { workspace = true }
 # TODO: Update axum to 0.8
 axum = { version = "0.6" }
 clap = { version = "4.5", features = ["derive", "env"] }
-reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
+reqwest = { version = "0.12.22", default-features = false, features = ["json", "rustls-tls"] }

components/metrics/README.md

Lines changed: 27 additions & 17 deletions
@@ -1,13 +1,23 @@
 # Metrics

-The `metrics` component is a utility that can collect, aggregate, and publish
-metrics from a Dynamo deployment. After collecting and aggregating metrics from
-workers, it exposes them via an HTTP `/metrics` endpoint in Prometheus format
-that other applications or visualization tools like Prometheus server and Grafana can
-pull from.
-
-**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
-- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "nv_llm" (e.g., the HTTP `/metrics` endpoint will serve metrics with "nv_llm" prefixes)
+⚠️ **DEPRECATION NOTICE** ⚠️
+
+**This `metrics` component is unmaintained and being deprecated.**
+
+The deprecated `metrics` component is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The `MetricsRegistry` provides:
+
+**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of this component.**
+
+This component may be migrated to the MetricsRegistry in the future.
+
+**📖 See the [Dynamo MetricsRegistry Guide](../../docs/guides/metrics.md) for detailed information on using the new metrics system.**
+
+---
+
+The deprecated `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment, but it is unmaintained and being deprecated in favor of `MetricsRegistry`.
+
+**Note**: This is a demo implementation. The deprecated `metrics` component is no longer under active development.
+- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "dynamo" (e.g., the HTTP `/metrics` endpoint will serve metrics with "dynamo" prefixes)
 - This demo will only work when using examples/llm/configs/agg.yml-- other configurations will not work

 <div align="center">
@@ -16,7 +26,7 @@ pull from.

 ## Quickstart

-To start the `metrics` component, simply point it at the `namespace/component/endpoint`
+To start the deprecated `metrics` component, simply point it at the `namespace/component/endpoint`
 trio for the Dynamo workers that you're interested in monitoring metrics on.

 This will:
@@ -45,14 +55,14 @@ will get automatically discovered and the warnings will stop.

 ## Workers

-The `metrics` component needs running workers to gather metrics from,
+The deprecated `metrics` component needs running workers to gather metrics from,
 so below are some examples of workers and how they can be monitored.

 ### Mock Worker

-To try out how `metrics` works, there is a demo Rust-based
+To try out how the deprecated `metrics` component works, there is a demo Rust-based
 [mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
-1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
+1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from the deprecated `metrics` component) with randomly generated `ForwardPassMetrics` data
 2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics

 Step 1: Launch a mock workers via the following command (if already built):
@@ -99,11 +109,11 @@ docker compose -f deploy/docker-compose.yml --profile metrics up -d

 ## Metrics Collection Modes

-The metrics component supports two modes for exposing metrics in a Prometheus format:
+The deprecated `metrics` component supports two modes for exposing metrics in a Prometheus format:

 ### Pull Mode (Default)

-When running in pull mode (the default), the metrics component will expose a
+When running in pull mode (the default), the deprecated `metrics` component will expose a
 Prometheus metrics endpoint on the specified host and port that a
 Prometheus server or curl client can pull from:

@@ -136,7 +146,7 @@ curl localhost:9091/metrics
 ### Push Mode

 For ephemeral or batch jobs, or when metrics need to be pushed through a firewall,
-you can use Push mode. In this mode, the metrics component will periodically push
+you can use Push mode. In this mode, the deprecated `metrics` component will periodically push
 metrics to an externally hosted
 [Prometheus PushGateway](https://prometheus.io/docs/instrumenting/pushing/):

@@ -145,7 +155,7 @@ Start a prometheus push gateway service via docker:
 docker run --rm -d -p 9091:9091 --name pushgateway prom/pushgateway
 ```

-Start the metrics component in `--push` mode, specifying the host and port of your PushGateway:
+Start the deprecated `metrics` component in `--push` mode, specifying the host and port of your PushGateway:
 ```bash
 # Push metrics to a Prometheus PushGateway every --push-interval seconds
 metrics \
@@ -173,7 +183,7 @@ curl 127.0.0.1:9091/metrics
 ```
 ## Building/Running from Source

-For easy iteration while making edits to the metrics component, you can use `cargo run`
+For easy iteration while making edits to the deprecated `metrics` component, you can use `cargo run`
 to build and run with your local changes:

 ```bash

components/planner/src/dynamo/planner/utils/prometheus.py

Lines changed: 9 additions & 8 deletions
@@ -35,15 +35,16 @@ def _get_average_metric(
         increase(metric_sum[interval])/increase(metric_count[interval])

         Args:
-            metric_name: Base metric name (e.g., 'nv_llm_http_service_inter_token_latency_seconds')
+            metric_name: Base metric name (e.g., 'inter_token_latency_seconds')
             interval: Time interval for the query (e.g., '60s')
             operation_name: Human-readable name for error logging

         Returns:
             Average metric value or 0 if no data/error
         """
         try:
-            query = f"increase({metric_name}_sum[{interval}])/increase({metric_name}_count[{interval}])"
+            full_metric_name = f"dynamo_frontend_{metric_name}"
+            query = f"increase({full_metric_name}_sum[{interval}])/increase({full_metric_name}_count[{interval}])"
             result = self.prom.custom_query(query=query)
             if not result:
                 # No data available yet (no requests made) - return 0 silently
@@ -55,21 +56,21 @@ def _get_average_metric(

     def get_avg_inter_token_latency(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_inter_token_latency_seconds",
+            "inter_token_latency_seconds",
             interval,
             "avg inter token latency",
         )

     def get_avg_time_to_first_token(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_time_to_first_token_seconds",
+            "time_to_first_token_seconds",
             interval,
             "avg time to first token",
         )

     def get_avg_request_duration(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_request_duration_seconds",
+            "request_duration_seconds",
             interval,
             "avg request duration",
         )
@@ -78,7 +79,7 @@ def get_avg_request_count(self, interval: str):
         # This function follows a different query pattern than the other metrics
         try:
             raw_res = self.prom.custom_query(
-                query=f"increase(nv_llm_http_service_requests_total[{interval}])"
+                query=f"increase(dynamo_frontend_requests_total[{interval}])"
             )
             total_count = 0.0
             for res in raw_res:
@@ -91,14 +92,14 @@ def get_avg_request_count(self, interval: str):

     def get_avg_input_sequence_tokens(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_input_sequence_tokens",
+            "input_sequence_tokens",
             interval,
             "avg input sequence tokens",
         )

     def get_avg_output_sequence_tokens(self, interval: str):
         return self._get_average_metric(
-            "nv_llm_http_service_output_sequence_tokens",
+            "output_sequence_tokens",
             interval,
             "avg output sequence tokens",
         )
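
Reviewer note: the query string that `_get_average_metric` builds after this change can be reproduced outside the planner. The following is a minimal sketch rather than the planner's code. `FRONTEND_PREFIX` and `build_average_query` are hypothetical names introduced here for illustration; only the `dynamo_frontend_` prefix and the `increase(..._sum)/increase(..._count)` pattern are taken from the diff above.

```python
# Hypothetical constant; the value matches the prefix hard-coded in the diff.
FRONTEND_PREFIX = "dynamo_frontend_"


def build_average_query(metric_name: str, interval: str) -> str:
    """Build the PromQL average query for an unprefixed frontend metric.

    metric_name is the base name without prefix, e.g. 'inter_token_latency_seconds'.
    interval is a PromQL range such as '60s'.
    """
    full_metric_name = f"{FRONTEND_PREFIX}{metric_name}"
    return (
        f"increase({full_metric_name}_sum[{interval}])"
        f"/increase({full_metric_name}_count[{interval}])"
    )


if __name__ == "__main__":
    # Prints the query the planner would send for inter-token latency over 60s.
    print(build_average_query("inter_token_latency_seconds", "60s"))
```

Callers now pass the unprefixed name, so a caller that still passes a full `nv_llm_http_service_*` name would end up querying a doubly prefixed metric that does not exist.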

deploy/metrics/README.md

Lines changed: 10 additions & 7 deletions
@@ -60,7 +60,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container

 - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
 - Uncomment the appropriate lines in prometheus.yml to poll port 9091.
-- Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
+- Start worker(s) that publishes KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.


 ## Configuration
@@ -95,16 +95,19 @@ The following configuration files should be present in this directory:
 - [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
 - [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.

-## Running the example `metrics` component
+## Running the deprecated `metrics` component

-IMPORTANT: This section is being phased out, and some metrics may not function as expected. A new solution is under development.
+⚠️ **DEPRECATION NOTICE** ⚠️

-When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the followings (defined in [../../components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
-- `llm_requests_active_slots`: Number of currently active request slots per worker
+When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
+
+**⚠️ The following `llm_kv_*` metrics are deprecated:**
+
+- `llm_requests_active_slots`: Active request slots per worker
 - `llm_requests_total_slots`: Total available request slots per worker
-- `llm_kv_blocks_active`: Number of active KV blocks per worker
+- `llm_kv_blocks_active`: Active KV blocks per worker
 - `llm_kv_blocks_total`: Total KV blocks available per worker
-- `llm_kv_hit_rate_percent`: Cumulative KV Cache hit percent per worker
+- `llm_kv_hit_rate_percent`: KV Cache hit percent per worker
 - `llm_load_avg`: Average load across workers
 - `llm_load_std`: Load standard deviation across workers

deploy/metrics/grafana_dashboards/grafana-dynamo-dashboard.json

Lines changed: 11 additions & 11 deletions
@@ -27,7 +27,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_requests_total (1m)",
+"description": "dynamo_frontend_requests_total (1m)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -106,7 +106,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "rate(nv_llm_http_service_requests_total[30s])",
+"expr": "rate(dynamo_frontend_requests_total[30s])",
 "legendFormat": "{{request_type}}, {{status}},",
 "range": true,
 "refId": "A"
@@ -120,7 +120,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_time_to_first_token_seconds (sum/count)",
+"description": "dynamo_frontend_time_to_first_token_seconds (sum/count)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -199,7 +199,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)",
+"expr": "1000*(dynamo_frontend_time_to_first_token_seconds_sum/dynamo_frontend_time_to_first_token_seconds_count)",
 "legendFormat": "{{model}}",
 "range": true,
 "refId": "A"
@@ -213,7 +213,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_inter_token_latency_seconds (sum/count)",
+"description": "dynamo_frontend_inter_token_latency_seconds (sum/count)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -292,7 +292,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "1000*(nv_llm_http_service_inter_token_latency_seconds_sum/nv_llm_http_service_inter_token_latency_seconds_count)",
+"expr": "1000*(dynamo_frontend_inter_token_latency_seconds_sum/dynamo_frontend_inter_token_latency_seconds_count)",
 "legendFormat": "{{model}}",
 "range": true,
 "refId": "A"
@@ -306,7 +306,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "nv_llm_http_service_request_duration (sum/count)",
+"description": "dynamo_frontend_request_duration (sum/count)",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -385,7 +385,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "1000*(nv_llm_http_service_request_duration_seconds_sum / nv_llm_http_service_request_duration_seconds_count)",
+"expr": "1000*(dynamo_frontend_request_duration_seconds_sum / dynamo_frontend_request_duration_seconds_count)",
 "legendFormat": "{{model}}",
 "range": true,
 "refId": "A"
@@ -399,7 +399,7 @@
 "type": "prometheus",
 "uid": "P1809F7CD0C75ACF3"
 },
-"description": "The length is the number of tokens. nv_llm_http_service_input_sequence_tokens",
+"description": "The length is the number of tokens. dynamo_frontend_input_sequence_tokens",
 "fieldConfig": {
 "defaults": {
 "color": {
@@ -478,7 +478,7 @@
 "targets": [
 {
 "editorMode": "code",
-"expr": "nv_llm_http_service_input_sequence_tokens_sum / nv_llm_http_service_input_sequence_tokens_count",
+"expr": "dynamo_frontend_input_sequence_tokens_sum / dynamo_frontend_input_sequence_tokens_count",
 "legendFormat": "ISL",
 "range": true,
 "refId": "A"
@@ -489,7 +489,7 @@
 "uid": "P1809F7CD0C75ACF3"
 },
 "editorMode": "code",
-"expr": "nv_llm_http_service_output_sequence_tokens_sum / nv_llm_http_service_output_sequence_tokens_count",
+"expr": "dynamo_frontend_output_sequence_tokens_sum / dynamo_frontend_output_sequence_tokens_count",
 "hide": false,
 "instant": false,
 "legendFormat": "OSL",
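
Reviewer note: dashboards kept outside this repository will need the same rename. The following is a minimal sketch, assuming a plain prefix substitution is enough, as it is for every `expr` and `description` changed in this file. `rename_frontend_metrics` is a hypothetical helper introduced here for illustration and is not part of Dynamo or Grafana.

```python
import json

OLD_PREFIX = "nv_llm_http_service_"  # prefix removed by this commit
NEW_PREFIX = "dynamo_frontend_"      # prefix introduced by this commit


def rename_frontend_metrics(node):
    """Recursively replace the old metric prefix in every string value of a parsed JSON tree."""
    if isinstance(node, str):
        return node.replace(OLD_PREFIX, NEW_PREFIX)
    if isinstance(node, list):
        return [rename_frontend_metrics(item) for item in node]
    if isinstance(node, dict):
        # Keys are left untouched; only values can contain PromQL expressions.
        return {key: rename_frontend_metrics(value) for key, value in node.items()}
    return node  # numbers, booleans, and None pass through unchanged


if __name__ == "__main__":
    # A tiny stand-in for one dashboard target, modeled on the hunks above.
    panel = {
        "description": "nv_llm_http_service_requests_total (1m)",
        "targets": [{"expr": "rate(nv_llm_http_service_requests_total[30s])", "range": True}],
    }
    print(json.dumps(rename_frontend_metrics(panel), indent=2))
```

To apply it to a real dashboard, load the file with `json.load`, pass the result through the helper, and write it back with `json.dump`; review the resulting diff before committing, since the substitution is purely textual.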

deploy/metrics/grafana_dashboards/grafana-llm-metrics.json

Lines changed: 7 additions & 1 deletion
@@ -26,7 +26,13 @@
 "distributed under the License is distributed on an \"AS IS\" BASIS,",
 "WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
 "See the License for the specific language governing permissions and",
-"limitations under the License."
+"limitations under the License.",
+"",
+"DEPRECATION NOTICE:",
+"This dashboard uses deprecated llm_kv_* metrics (llm_kv_blocks_active, llm_kv_blocks_total, llm_kv_hit_rate_percent)",
+"that are part of the deprecated metrics aggregation service. These metrics will be removed in a future release.",
+"Please migrate to the new MetricsRegistry system which provides dynamo_* metrics instead.",
+"See docs/guides/metrics.md for migration guidance."
 ],
 "editable": true,
 "fiscalYearStartMonth": 0,

deploy/metrics/prometheus.yml

Lines changed: 2 additions & 0 deletions
@@ -47,6 +47,8 @@ scrape_configs:
     static_configs:
       - targets: ['host.docker.internal:8081']

+  # DEPRECATED: This metrics aggregation service is being deprecated in favor of MetricsRegistry
+  # The new system uses the 'dynamo-backend' job above instead of this separate service
   # This is another demo aggregator that needs to be launched manually. See components/metrics/README.md
   # Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 9091/tcp
   - job_name: 'metrics-aggregation-service'
