Merged
2 changes: 1 addition & 1 deletion components/metrics/Cargo.toml
@@ -38,4 +38,4 @@ tracing = { workspace = true }
# TODO: Update axum to 0.8
axum = { version = "0.6" }
clap = { version = "4.5", features = ["derive", "env"] }
reqwest = { version = "0.12", default-features = false, features = ["json", "rustls-tls"] }
reqwest = { version = "0.12.22", default-features = false, features = ["json", "rustls-tls"] }
44 changes: 27 additions & 17 deletions components/metrics/README.md
@@ -1,13 +1,23 @@
# Metrics

The `metrics` component is a utility that can collect, aggregate, and publish
metrics from a Dynamo deployment. After collecting and aggregating metrics from
workers, it exposes them via an HTTP `/metrics` endpoint in Prometheus format
that other applications or visualization tools like Prometheus server and Grafana can
pull from.

**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "nv_llm" (e.g., the HTTP `/metrics` endpoint will serve metrics with "nv_llm" prefixes)
⚠️ **DEPRECATION NOTICE** ⚠️

**This `metrics` component is unmaintained and being deprecated.**

The deprecated `metrics` component is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The `MetricsRegistry` provides:

**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of this component.**

This component may be migrated to the MetricsRegistry in the future.

**📖 See the [Dynamo MetricsRegistry Guide](../../docs/guides/metrics.md) for detailed information on using the new metrics system.**

---

The deprecated `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment, but it is unmaintained and being deprecated in favor of `MetricsRegistry`.

**Note**: This is a demo implementation. The deprecated `metrics` component is no longer under active development.
- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "dynamo" (e.g., the HTTP `/metrics` endpoint will serve metrics with "dynamo" prefixes)
- This demo will only work when using examples/llm/configs/agg.yml; other configurations will not work

<div align="center">
@@ -16,7 +26,7 @@ pull from.

## Quickstart

To start the `metrics` component, simply point it at the `namespace/component/endpoint`
To start the deprecated `metrics` component, simply point it at the `namespace/component/endpoint`
trio for the Dynamo workers that you're interested in monitoring metrics on.

This will:
@@ -45,14 +55,14 @@ will get automatically discovered and the warnings will stop.

## Workers

The `metrics` component needs running workers to gather metrics from,
The deprecated `metrics` component needs running workers to gather metrics from,
so below are some examples of workers and how they can be monitored.

### Mock Worker

To try out how `metrics` works, there is a demo Rust-based
To try out how the deprecated `metrics` component works, there is a demo Rust-based
[mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from the deprecated `metrics` component) with randomly generated `ForwardPassMetrics` data
2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics

Step 1: Launch a mock worker via the following command (if already built):
@@ -99,11 +109,11 @@ docker compose -f deploy/docker-compose.yml --profile metrics up -d

## Metrics Collection Modes

The metrics component supports two modes for exposing metrics in a Prometheus format:
The deprecated `metrics` component supports two modes for exposing metrics in a Prometheus format:

### Pull Mode (Default)

When running in pull mode (the default), the metrics component will expose a
When running in pull mode (the default), the deprecated `metrics` component will expose a
Prometheus metrics endpoint on the specified host and port that a
Prometheus server or curl client can pull from:

@@ -136,7 +146,7 @@ curl localhost:9091/metrics
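In pull mode the endpoint serves the plain-text Prometheus exposition format, so any client that can issue an HTTP GET can consume it, just like the `curl localhost:9091/metrics` call in the hunk context. The Python sketch below is illustrative only: it assumes the endpoint is at `http://localhost:9091/metrics`, the helper names `parse_exposition` and `fetch_metrics` are hypothetical, and the parser is deliberately simplified (it skips `# HELP`/`# TYPE` comments, ignores timestamps, and does not handle escaped quotes inside label values).

```python
import re
import urllib.request

# `name{labels} value` or `name value`; a trailing timestamp, if any, is ignored.
_SAMPLE_RE = re.compile(r"^([a-zA-Z_:][a-zA-Z0-9_:]*)(\{[^}]*\})?\s+(\S+)")
_LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="([^"]*)"')


def parse_exposition(text):
    """Parse Prometheus text format into (name, labels, value) tuples.

    Simplified sketch: comment lines are skipped and escaped quotes in
    label values are not supported.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE_RE.match(line)
        if match is None:
            continue
        name, raw_labels, raw_value = match.groups()
        labels = dict(_LABEL_RE.findall(raw_labels or ""))
        # float() also accepts the NaN / +Inf / -Inf spellings Prometheus uses.
        samples.append((name, labels, float(raw_value)))
    return samples


def fetch_metrics(url="http://localhost:9091/metrics", timeout=5.0):
    """GET the /metrics endpoint and return its parsed samples."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return parse_exposition(resp.read().decode("utf-8"))
```

For example, `parse_exposition('llm_kv_blocks_active{worker_id="1"} 42')` yields one tuple with the labels as a dict; the `worker_id` label name here is made up for illustration. A real deployment would normally let a Prometheus server do the scraping; a sketch like this is mainly useful for quick checks and tests.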
### Push Mode

For ephemeral or batch jobs, or when metrics need to be pushed through a firewall,
you can use Push mode. In this mode, the metrics component will periodically push
you can use Push mode. In this mode, the deprecated `metrics` component will periodically push
metrics to an externally hosted
[Prometheus PushGateway](https://prometheus.io/docs/instrumenting/pushing/):

@@ -145,7 +155,7 @@ Start a prometheus push gateway service via docker:
docker run --rm -d -p 9091:9091 --name pushgateway prom/pushgateway
```

Start the metrics component in `--push` mode, specifying the host and port of your PushGateway:
Start the deprecated `metrics` component in `--push` mode, specifying the host and port of your PushGateway:
```bash
# Push metrics to a Prometheus PushGateway every --push-interval seconds
metrics \
@@ -173,7 +183,7 @@ curl 127.0.0.1:9091/metrics
```
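Push mode relies on the PushGateway's HTTP API: a client sends text-format samples to `/metrics/job/<job>`, optionally followed by `/<label>/<value>` pairs that form the grouping key. The Python sketch below shows what a single push cycle amounts to; it is not how the deprecated `metrics` component is implemented, and the helper names and any job name you pass (e.g. `dynamo_metrics_demo`) are hypothetical. `push_every` only loosely mirrors the `--push-interval` behavior described above.

```python
import time
import urllib.parse
import urllib.request


def pushgateway_url(host, port, job, grouping=None):
    """Build /metrics/job/<job>{/<label>/<value>} for the PushGateway API."""
    path = "/metrics/job/" + urllib.parse.quote(job, safe="")
    for label, value in (grouping or {}).items():
        path += f"/{urllib.parse.quote(label, safe='')}/{urllib.parse.quote(value, safe='')}"
    return f"http://{host}:{port}{path}"


def render_gauges(gauges):
    """Render {name: value} as Prometheus text format; the body must end with a newline."""
    lines = []
    for name, value in sorted(gauges.items()):
        lines.append(f"# TYPE {name} gauge")
        lines.append(f"{name} {value}")
    return "\n".join(lines) + "\n"


def push(host, port, job, gauges, grouping=None, timeout=5.0):
    """POST replaces only the pushed metric names within the group (PUT would replace the whole group)."""
    request = urllib.request.Request(
        pushgateway_url(host, port, job, grouping),
        data=render_gauges(gauges).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "text/plain; version=0.0.4"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.status


def push_every(interval_s, collect, host, port, job):
    """Call collect() and push its {name: value} result every interval_s seconds, until interrupted."""
    while True:
        push(host, port, job, collect())
        time.sleep(interval_s)
```

After a push, the `curl 127.0.0.1:9091/metrics` command above should show the pushed series on the gateway, labeled with the job name.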
## Building/Running from Source

For easy iteration while making edits to the metrics component, you can use `cargo run`
For easy iteration while making edits to the deprecated `metrics` component, you can use `cargo run`
to build and run with your local changes:

```bash
17 changes: 9 additions & 8 deletions components/planner/src/dynamo/planner/utils/prometheus.py
@@ -35,15 +35,16 @@ def _get_average_metric(
increase(metric_sum[interval])/increase(metric_count[interval])

Args:
metric_name: Base metric name (e.g., 'nv_llm_http_service_inter_token_latency_seconds')
metric_name: Base metric name (e.g., 'inter_token_latency_seconds')
interval: Time interval for the query (e.g., '60s')
operation_name: Human-readable name for error logging

Returns:
Average metric value or 0 if no data/error
"""
try:
query = f"increase({metric_name}_sum[{interval}])/increase({metric_name}_count[{interval}])"
full_metric_name = f"dynamo_frontend_{metric_name}"
query = f"increase({full_metric_name}_sum[{interval}])/increase({full_metric_name}_count[{interval}])"
result = self.prom.custom_query(query=query)
if not result:
# No data available yet (no requests made) - return 0 silently
@@ -55,21 +56,21 @@ def _get_average_metric(

def get_avg_inter_token_latency(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_inter_token_latency_seconds",
"inter_token_latency_seconds",
interval,
"avg inter token latency",
)

def get_avg_time_to_first_token(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_time_to_first_token_seconds",
"time_to_first_token_seconds",
interval,
"avg time to first token",
)

def get_avg_request_duration(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_request_duration_seconds",
"request_duration_seconds",
interval,
"avg request duration",
)
@@ -78,7 +79,7 @@ def get_avg_request_count(self, interval: str):
# This function follows a different query pattern than the other metrics
try:
raw_res = self.prom.custom_query(
query=f"increase(nv_llm_http_service_requests_total[{interval}])"
query=f"increase(dynamo_frontend_requests_total[{interval}])"
)
total_count = 0.0
for res in raw_res:
@@ -91,14 +92,14 @@ def get_avg_request_count(self, interval: str):

def get_avg_input_sequence_tokens(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_input_sequence_tokens",
"input_sequence_tokens",
interval,
"avg input sequence tokens",
)

def get_avg_output_sequence_tokens(self, interval: str):
return self._get_average_metric(
"nv_llm_http_service_output_sequence_tokens",
"output_sequence_tokens",
interval,
"avg output sequence tokens",
)
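After this change the planner passes only the unprefixed base name and prepends `dynamo_frontend_` itself. The standalone Python sketch below reproduces the resulting PromQL strings so they can be pasted into the Prometheus UI for checking. `FRONTEND_PREFIX`, `average_query`, and `request_count_query` are hypothetical names that mirror the f-strings in the diff; they are not part of the planner's API.

```python
FRONTEND_PREFIX = "dynamo_frontend_"


def average_query(metric_name, interval, prefix=FRONTEND_PREFIX):
    """Windowed mean of a histogram/summary: increase(_sum) / increase(_count)."""
    full = f"{prefix}{metric_name}"
    return f"increase({full}_sum[{interval}])/increase({full}_count[{interval}])"


def request_count_query(interval, prefix=FRONTEND_PREFIX):
    """Request increase over the window; the planner sums the returned series."""
    return f"increase({prefix}requests_total[{interval}])"


print(average_query("inter_token_latency_seconds", "60s"))
print(request_count_query("60s"))
```

Because `_get_average_metric` returns 0 silently when a query yields no data, a query still built with the old `nv_llm_http_service_` names would not fail; it would just report 0. Checking the generated string against a live Prometheus is a cheap way to catch that.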
17 changes: 10 additions & 7 deletions deploy/metrics/README.md
@@ -60,7 +60,7 @@ As of Q2 2025, Dynamo HTTP Frontend metrics are exposed when you build container

- Start the [components/metrics](../../components/metrics/README.md) application to begin monitoring for metric events from dynamo workers and aggregating them on a Prometheus metrics endpoint: `http://localhost:9091/metrics`.
- Uncomment the appropriate lines in prometheus.yml to poll port 9091.
- Start worker(s) that publishes KV Cache metrics: [examples/rust/service_metrics/bin/server](../../lib/runtime/examples/service_metrics/README.md)` can populate dummy KV Cache metrics.
- Start worker(s) that publish KV Cache metrics: the example in [lib/runtime/examples/service_metrics/README.md](../../lib/runtime/examples/service_metrics/README.md) can populate dummy KV Cache metrics.


## Configuration
@@ -95,16 +95,19 @@ The following configuration files should be present in this directory:
- [grafana_dashboards/grafana-dcgm-metrics.json](./grafana_dashboards/grafana-dcgm-metrics.json): Contains Grafana dashboard configuration for DCGM GPU metrics
- [grafana_dashboards/grafana-llm-metrics.json](./grafana_dashboards/grafana-llm-metrics.json): This file, which is being phased out, contains the Grafana dashboard configuration for LLM-specific metrics. It requires an additional `metrics` component to operate concurrently. A new version is under development.

## Running the example `metrics` component
## Running the deprecated `metrics` component

IMPORTANT: This section is being phased out, and some metrics may not function as expected. A new solution is under development.
⚠️ **DEPRECATION NOTICE** ⚠️

When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the followings (defined in [../../components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):
- `llm_requests_active_slots`: Number of currently active request slots per worker
When you run the example [components/metrics](../../components/metrics/README.md) component, it exposes a Prometheus /metrics endpoint with the following metrics (defined in [components/metrics/src/lib.rs](../../components/metrics/src/lib.rs)):

**⚠️ The following `llm_kv_*` metrics are deprecated:**

- `llm_requests_active_slots`: Active request slots per worker
- `llm_requests_total_slots`: Total available request slots per worker
- `llm_kv_blocks_active`: Number of active KV blocks per worker
- `llm_kv_blocks_active`: Active KV blocks per worker
- `llm_kv_blocks_total`: Total KV blocks available per worker
- `llm_kv_hit_rate_percent`: Cumulative KV Cache hit percent per worker
- `llm_kv_hit_rate_percent`: KV Cache hit percent per worker
- `llm_load_avg`: Average load across workers
- `llm_load_std`: Load standard deviation across workers

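`llm_load_avg` and `llm_load_std` summarize a single per-worker load value across all workers. How that per-worker value is derived is defined in `components/metrics/src/lib.rs` and is not shown in this diff, so the Python sketch below simply assumes a list of per-worker load numbers. It uses the population standard deviation, which may differ from what the Rust code actually computes; `load_summary` is a hypothetical name.

```python
import statistics


def load_summary(per_worker_load):
    """Return (mean, population std dev) of per-worker load values.

    Assumption: each worker contributes one float; no workers yields (0.0, 0.0).
    """
    if not per_worker_load:
        return 0.0, 0.0
    return statistics.fmean(per_worker_load), statistics.pstdev(per_worker_load)
```

With two workers at loads 1.0 and 3.0, this gives a mean of 2.0 and a standard deviation of 1.0. A high standard deviation relative to the mean suggests requests are being spread unevenly across workers.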
22 changes: 11 additions & 11 deletions deploy/metrics/grafana_dashboards/grafana-dynamo-dashboard.json
@@ -27,7 +27,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_requests_total (1m)",
"description": "dynamo_frontend_requests_total (1m)",
"fieldConfig": {
"defaults": {
"color": {
@@ -106,7 +106,7 @@
"targets": [
{
"editorMode": "code",
"expr": "rate(nv_llm_http_service_requests_total[30s])",
"expr": "rate(dynamo_frontend_requests_total[30s])",
"legendFormat": "{{request_type}}, {{status}},",
"range": true,
"refId": "A"
@@ -120,7 +120,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_time_to_first_token_seconds (sum/count)",
"description": "dynamo_frontend_time_to_first_token_seconds (sum/count)",
"fieldConfig": {
"defaults": {
"color": {
@@ -199,7 +199,7 @@
"targets": [
{
"editorMode": "code",
"expr": "1000*(nv_llm_http_service_time_to_first_token_seconds_sum/nv_llm_http_service_time_to_first_token_seconds_count)",
"expr": "1000*(dynamo_frontend_time_to_first_token_seconds_sum/dynamo_frontend_time_to_first_token_seconds_count)",
"legendFormat": "{{model}}",
"range": true,
"refId": "A"
@@ -213,7 +213,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_inter_token_latency_seconds (sum/count)",
"description": "dynamo_frontend_inter_token_latency_seconds (sum/count)",
"fieldConfig": {
"defaults": {
"color": {
@@ -292,7 +292,7 @@
"targets": [
{
"editorMode": "code",
"expr": "1000*(nv_llm_http_service_inter_token_latency_seconds_sum/nv_llm_http_service_inter_token_latency_seconds_count)",
"expr": "1000*(dynamo_frontend_inter_token_latency_seconds_sum/dynamo_frontend_inter_token_latency_seconds_count)",
"legendFormat": "{{model}}",
"range": true,
"refId": "A"
@@ -306,7 +306,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "nv_llm_http_service_request_duration (sum/count)",
"description": "dynamo_frontend_request_duration (sum/count)",
"fieldConfig": {
"defaults": {
"color": {
@@ -385,7 +385,7 @@
"targets": [
{
"editorMode": "code",
"expr": "1000*(nv_llm_http_service_request_duration_seconds_sum / nv_llm_http_service_request_duration_seconds_count)",
"expr": "1000*(dynamo_frontend_request_duration_seconds_sum / dynamo_frontend_request_duration_seconds_count)",
"legendFormat": "{{model}}",
"range": true,
"refId": "A"
@@ -399,7 +399,7 @@
"type": "prometheus",
"uid": "P1809F7CD0C75ACF3"
},
"description": "The length is the number of tokens. nv_llm_http_service_input_sequence_tokens",
"description": "The length is the number of tokens. dynamo_frontend_input_sequence_tokens",
"fieldConfig": {
"defaults": {
"color": {
@@ -478,7 +478,7 @@
"targets": [
{
"editorMode": "code",
"expr": "nv_llm_http_service_input_sequence_tokens_sum / nv_llm_http_service_input_sequence_tokens_count",
"expr": "dynamo_frontend_input_sequence_tokens_sum / dynamo_frontend_input_sequence_tokens_count",
"legendFormat": "ISL",
"range": true,
"refId": "A"
@@ -489,7 +489,7 @@
"uid": "P1809F7CD0C75ACF3"
},
"editorMode": "code",
"expr": "nv_llm_http_service_output_sequence_tokens_sum / nv_llm_http_service_output_sequence_tokens_count",
"expr": "dynamo_frontend_output_sequence_tokens_sum / dynamo_frontend_output_sequence_tokens_count",
"hide": false,
"instant": false,
"legendFormat": "OSL",
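The dashboard panels divide the cumulative `_sum` by `_count` (times 1000 for milliseconds), which gives the average since the counters were last reset, typically at frontend start. The planner instead divides the `increase()` of each over a window. The worked Python example below uses made-up counter samples (illustrative numbers, not measurements) to show how far apart the two can be, plus the per-second figure that `rate(dynamo_frontend_requests_total[30s])` approximates. The function names are hypothetical.

```python
def lifetime_average_ms(sum_s, count):
    """Dashboard style: 1000 * (_sum / _count) on the current cumulative values."""
    return 1000.0 * sum_s / count if count else 0.0


def windowed_average_ms(sum_start, sum_end, count_start, count_end):
    """Planner style: increase(_sum) / increase(_count) over a window, in ms.

    Unlike Prometheus' increase(), this ignores counter resets.
    """
    d_count = count_end - count_start
    return 1000.0 * (sum_end - sum_start) / d_count if d_count else 0.0


def per_second_rate(count_start, count_end, window_s):
    """Roughly what rate(counter[window]) reports, ignoring extrapolation."""
    return (count_end - count_start) / window_s


# Illustrative TTFT counters: 1000 earlier requests at ~0.1 s each, then a
# slow 60 s window with 100 more requests at ~0.5 s each.
start_sum, start_count = 100.0, 1000
end_sum, end_count = 150.0, 1100

print(lifetime_average_ms(end_sum, end_count))  # ~136.4 ms, dominated by history
print(windowed_average_ms(start_sum, end_sum, start_count, end_count))  # 500.0 ms
print(per_second_rate(start_count, end_count, 60.0))  # ~1.67 requests/s
```

The lifetime average reacts slowly to a recent slowdown, which is why a windowed form is the better input for scaling decisions. The same `increase()` form could also be used in the Grafana panels if a "recent" view is wanted.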
@@ -26,7 +26,13 @@
"distributed under the License is distributed on an \"AS IS\" BASIS,",
"WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.",
"See the License for the specific language governing permissions and",
"limitations under the License."
"limitations under the License.",
"",
"DEPRECATION NOTICE:",
"This dashboard uses deprecated llm_kv_* metrics (llm_kv_blocks_active, llm_kv_blocks_total, llm_kv_hit_rate_percent)",
"that are part of the deprecated metrics aggregation service. These metrics will be removed in a future release.",
"Please migrate to the new MetricsRegistry system which provides dynamo_* metrics instead.",
"See docs/guides/metrics.md for migration guidance."
],
"editable": true,
"fiscalYearStartMonth": 0,
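Custom dashboards outside this repository can be checked the same way. The Python sketch below walks a Grafana dashboard JSON, applies the prefix swap used in `grafana-dynamo-dashboard.json` (`nv_llm_http_service_` to `dynamo_frontend_`) to every `expr` and `description` string, and reports any deprecated `llm_kv_*` names, which have no drop-in replacement in this diff. It assumes the common layout where queries sit under `panels[].targets[].expr`, possibly nested inside row panels; `migrate_dashboard` and `migrate_file` are hypothetical helpers.

```python
import json
import re

OLD_PREFIX = "nv_llm_http_service_"
NEW_PREFIX = "dynamo_frontend_"
DEPRECATED_KV = re.compile(r"\bllm_kv_[a-z_]+")


def migrate_dashboard(dashboard):
    """Rewrite old frontend metric names in place; return the sorted llm_kv_* names found."""
    flagged = set()

    def walk(node):
        if isinstance(node, dict):
            for key in ("expr", "description"):
                text = node.get(key)
                if isinstance(text, str):
                    node[key] = text.replace(OLD_PREFIX, NEW_PREFIX)
                    flagged.update(DEPRECATED_KV.findall(text))
            for value in node.values():
                walk(value)
        elif isinstance(node, list):
            for item in node:
                walk(item)

    walk(dashboard)
    return sorted(flagged)


def migrate_file(src, dst):
    """Load a dashboard JSON file, migrate it, write it to dst, return flagged names."""
    with open(src, encoding="utf-8") as f:
        board = json.load(f)
    flagged = migrate_dashboard(board)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(board, f, indent=2)
    return flagged
```

For example, `migrate_file("my-dashboard.json", "my-dashboard.migrated.json")` (hypothetical file names) writes the renamed dashboard and returns names such as `llm_kv_blocks_active` that still need a manual decision.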
2 changes: 2 additions & 0 deletions deploy/metrics/prometheus.yml
@@ -47,6 +47,8 @@ scrape_configs:
static_configs:
- targets: ['host.docker.internal:8081']

# DEPRECATED: This metrics aggregation service is being deprecated in favor of MetricsRegistry
# The new system uses the 'dynamo-backend' job above instead of this separate service
# This is another demo aggregator that needs to be launched manually. See components/metrics/README.md
# Note that you may need to disable the firewall on your host. On Ubuntu: sudo ufw allow 9091/tcp
- job_name: 'metrics-aggregation-service'