|
1 | 1 | # Metrics Visualization with Prometheus and Grafana |
2 | 2 |
|
3 | | -⚠️ **DEPRECATION NOTICE** ⚠️ |
4 | | - |
5 | | -**The `metrics-aggregation-service` (port 9091) is being deprecated and will be removed in a future release.** |
6 | | - |
7 | | -The metrics aggregation service is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The new system provides: |
8 | | - |
9 | | -- **Built-in Prometheus HTTP endpoint** accessible via `DYN_SYSTEM_ENABLED=true` and `DYN_SYSTEM_PORT=<port>` (default: 8081) |
10 | | -- **Automatic metric registration** when creating metrics via endpoint factory methods |
11 | | -- **Automatic labeling** with namespace, component, and endpoint information |
12 | | -- **Simplified deployment** - no separate metrics component required |
13 | | - |
14 | | -**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of the metrics aggregation service.** |
15 | | - |
16 | | -The Prometheus configuration in this directory has been updated to scrape from the new `dynamo-backend` job (port 8081) instead of the deprecated `metrics-aggregation-service` (port 9091). |
17 | | - |
18 | | - |
19 | | - |
20 | | ---- |
21 | | - |
22 | 3 | This directory contains configuration for visualizing metrics from the metrics aggregation service using Prometheus and Grafana. |
23 | 4 |
|
24 | 5 | ## Components |
@@ -79,7 +60,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container |
79 | 60 |
|
80 | 61 | - Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`. |
81 | 62 | - Uncomment the appropriate lines in prometheus.yml to poll port 9091. |
82 | | - - Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics. |
| 63 | + - Start worker(s) that publish KV Cache metrics: [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics. |
83 | 64 |
|
84 | 65 |
|
85 | 66 | ## Configuration |
@@ -114,25 +95,22 @@ The following configuration files should be present in this directory: |
114 | 95 | - [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics |
115 | 96 | - [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development. |
116 | 97 |
|
117 | | -## Running the example `metrics` component |
| 98 | +## Running the deprecated `metrics` component |
118 | 99 |
|
119 | | -IMPORTANT: This section is being phased out, and some metrics may not function as expected. A new solution is under development. |
| 100 | +⚠️ **DEPRECATION NOTICE** ⚠️ |
120 | 101 |
|
121 | | -⚠️ **DEPRECATED METRICS NOTICE** ⚠️ |
| 102 | +When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)): |
122 | 103 |
|
123 | | -**The following `llm_kv_*` metrics are deprecated and will be removed in a future release:** |
| 104 | +**⚠️ The following `llm_kv_*` metrics are deprecated:** |
124 | 105 |
|
125 | | -When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the followings (defined in [../../components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)): |
126 | | -- `llm_requests_active_slots`: Number of currently active request slots per worker |
| 106 | +- `llm_requests_active_slots`: Active request slots per worker |
127 | 107 | - `llm_requests_total_slots`: Total available request slots per worker |
128 | | -- `llm_kv_blocks_active`: Number of active KV blocks per worker ⚠️ **DEPRECATED** |
129 | | -- `llm_kv_blocks_total`: Total KV blocks available per worker ⚠️ **DEPRECATED** |
130 | | -- `llm_kv_hit_rate_percent`: Cumulative KV Cache hit percent per worker ⚠️ **DEPRECATED** |
| 108 | +- `llm_kv_blocks_active`: Active KV blocks per worker |
| 109 | +- `llm_kv_blocks_total`: Total KV blocks available per worker |
| 110 | +- `llm_kv_hit_rate_percent`: KV Cache hit percent per worker |
131 | 111 | - `llm_load_avg`: Average load across workers |
132 | 112 | - `llm_load_std`: Load standard deviation across workers |
133 | 113 |
|
134 | | -**These `llm_kv_*` metrics are being replaced by the new `dynamo_*` metrics from the MetricsRegistry system. Please migrate to the new system.** |
135 | | - |
136 | 114 | ## Troubleshooting |
137 | 115 |
|
138 | 116 | 1. Verify services are running: |
|
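Note on the deprecated `llm_kv_*` metrics in this change: one way to see which of them a running deployment still exposes is to read the Prometheus text output from the `/metrics` endpoint and filter metric names by prefix. The following is a minimal sketch, not part of the repository. The function names, the `worker_id` label, and the sample values are hypothetical; the URL is the `http://localhost:9091/metrics` endpoint mentioned in the README and may differ in your setup.

```python
import urllib.request

# Endpoint taken from the README text; adjust host and port for your deployment.
METRICS_URL = "http://localhost:9091/metrics"
# Prefix that the README marks as deprecated.
DEPRECATED_PREFIX = "llm_kv_"

# Hypothetical sample in Prometheus text exposition format (label and values invented).
SAMPLE = """\
# HELP llm_kv_blocks_active Active KV blocks per worker
# TYPE llm_kv_blocks_active gauge
llm_kv_blocks_active{worker_id="0"} 12
llm_kv_blocks_total{worker_id="0"} 100
llm_load_avg 0.5
"""


def metric_names(exposition_text):
    """Return the set of metric names found in Prometheus text-format output."""
    names = set()
    for line in exposition_text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        # A sample line is "name{labels} value" or "name value".
        names.add(line.split("{", 1)[0].split()[0])
    return names


def deprecated_metrics(names):
    """Return the sorted names that carry the deprecated prefix."""
    return sorted(n for n in names if n.startswith(DEPRECATED_PREFIX))


def fetch_metrics(url=METRICS_URL, timeout=5.0):
    """Fetch the raw exposition text from a live endpoint (requires a running service)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8")


if __name__ == "__main__":
    # Uses the built-in sample so the script runs without a live service;
    # swap SAMPLE for fetch_metrics() to check a real deployment.
    for name in deprecated_metrics(metric_names(SAMPLE)):
        print("deprecated metric still exposed:", name)
```

Filtering on the name prefix keeps the check independent of label sets, so it should keep working if workers attach different labels to the same metric.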