Commit c38b756

Update the dynamo_component_ prefix, and add deprecation message to component/metrics
1 parent 0cbcb99 commit c38b756

4 files changed

+179
-167
lines changed

components/metrics/README.md

Lines changed: 12 additions & 17 deletions
Original file line number | Diff line number | Diff line change
@@ -4,22 +4,17 @@
44

55
**This `metrics` component is being deprecated and will be removed in a future release.**
66

7-
The `metrics` component is being replaced by the **`MetricsRegistry`** built-in functionality that is now available directly in the `DistributedRuntime` framework. The `MetricsRegistry` provides:
8-
9-
- **Automatic metric registration** when creating metrics via endpoint factory methods
10-
- **Built-in Prometheus HTTP endpoint** accessible via `DYN_SYSTEM_ENABLED=true` and `DYN_SYSTEM_PORT=<port>`
11-
- **Automatic labeling** with namespace, component, and endpoint information
12-
- **Simplified API** that eliminates the need for manual Prometheus setup
7+
The deprecated `metrics` component is being replaced by **`MetricsRegistry`**, built-in functionality that is now available directly in the `DistributedRuntime` framework.
138

149
**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of this component.**
1510

1611
See the [Dynamo MetricsRegistry Guide](../../docs/guides/metrics.md) for detailed information on using the new metrics system.
1712

1813
---
1914

20-
The `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment, but it is being deprecated and replaced by `MetricsRegistry`.
15+
The deprecated `metrics` component is a utility for collecting, aggregating, and publishing metrics from a Dynamo deployment; it is being replaced by `MetricsRegistry`.
2116

22-
**Note**: This is a demo implementation. The metrics component is currently under active development and this documentation will change as the implementation evolves.
17+
**Note**: This is a demo implementation. The deprecated `metrics` component is currently under active development and this documentation will change as the implementation evolves.
2318
- In this demo the metrics names use the prefix "llm", but in production they will be prefixed with "dynamo" (e.g., the HTTP `/metrics` endpoint will serve metrics with "dynamo" prefixes)
2419
- This demo only works with `examples/llm/configs/agg.yml`; other configurations will not work
2520

@@ -29,7 +24,7 @@ The `metrics` component is a utility for collecting, aggregating, and publishing
2924

3025
## Quickstart
3126

32-
To start the `metrics` component, simply point it at the `namespace/component/endpoint`
27+
To start the deprecated `metrics` component, simply point it at the `namespace/component/endpoint`
3328
trio for the Dynamo workers that you're interested in monitoring metrics on.
3429

3530
This will:
@@ -58,14 +53,14 @@ will get automatically discovered and the warnings will stop.
5853

5954
## Workers
6055

61-
The `metrics` component needs running workers to gather metrics from,
56+
The deprecated `metrics` component needs running workers to gather metrics from,
6257
so below are some examples of workers and how they can be monitored.
6358

6459
### Mock Worker
6560

66-
To try out how `metrics` works, there is a demo Rust-based
61+
To try out how the deprecated `metrics` component works, there is a demo Rust-based
6762
[mock worker](src/bin/mock_worker.rs) that provides sample data through two mechanisms:
68-
1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from `metrics`) with randomly generated `ForwardPassMetrics` data
63+
1. Exposes a stats handler at `dynamo/MyComponent/my_endpoint` that responds to polling requests (from the deprecated `metrics` component) with randomly generated `ForwardPassMetrics` data
6964
2. Publishes mock `KVHitRateEvent` data every second to demonstrate event-based metrics
7065

7166
Step 1: Launch a mock worker via the following command (if already built):
@@ -112,11 +107,11 @@ docker compose -f deploy/docker-compose.yml --profile metrics up -d
112107

113108
## Metrics Collection Modes
114109

115-
The metrics component supports two modes for exposing metrics in a Prometheus format:
110+
The deprecated `metrics` component supports two modes for exposing metrics in a Prometheus format:
116111

117112
### Pull Mode (Default)
118113

119-
When running in pull mode (the default), the metrics component will expose a
114+
When running in pull mode (the default), the deprecated `metrics` component will expose a
120115
Prometheus metrics endpoint on the specified host and port that a
121116
Prometheus server or curl client can pull from:
122117

@@ -149,7 +144,7 @@ curl localhost:9091/metrics
149144
### Push Mode
150145

151146
For ephemeral or batch jobs, or when metrics need to be pushed through a firewall,
152-
you can use Push mode. In this mode, the metrics component will periodically push
147+
you can use Push mode. In this mode, the deprecated `metrics` component will periodically push
153148
metrics to an externally hosted
154149
[Prometheus PushGateway](https://prometheus.io/docs/instrumenting/pushing/):
155150

@@ -158,7 +153,7 @@ Start a prometheus push gateway service via docker:
158153
docker run --rm -d -p 9091:9091 --name pushgateway prom/pushgateway
159154
```
160155

161-
Start the metrics component in `--push` mode, specifying the host and port of your PushGateway:
156+
Start the deprecated `metrics` component in `--push` mode, specifying the host and port of your PushGateway:
162157
```bash
163158
# Push metrics to a Prometheus PushGateway every --push-interval seconds
164159
metrics \
@@ -186,7 +181,7 @@ curl 127.0.0.1:9091/metrics
186181
```
187182
## Building/Running from Source
188183

189-
For easy iteration while making edits to the metrics component, you can use `cargo run`
184+
For easy iteration while making edits to the deprecated `metrics` component, you can use `cargo run`
190185
to build and run with your local changes:
191186

192187
```bash

deploy/metrics/README.md

Lines changed: 1 addition & 1 deletion
Original file line number | Diff line number | Diff line change
@@ -8,7 +8,7 @@ The metrics aggregation service is being replaced by the **`MetricsRegistry`** b
88

99
- **Built-in Prometheus HTTP endpoint** accessible via `DYN_SYSTEM_ENABLED=true` and `DYN_SYSTEM_PORT=<port>` (default: 8081)
1010
- **Automatic metric registration** when creating metrics via endpoint factory methods
11-
- **Automatic labeling** with namespace, component, and endpoint information
11+
- **Automatic prefix and labeling**: metric names get the `dynamo_component_` prefix, and each metric carries the `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` labels. The label names are prefixed with `dynamo_` to avoid collisions with Kubernetes labels.
1212
- **Simplified deployment** - no separate metrics component required
1313

1414
**For new projects and existing deployments, please migrate to using `MetricsRegistry` instead of the metrics aggregation service.**

lib/runtime/examples/system_metrics/README.md

Lines changed: 53 additions & 53 deletions
Original file line number | Diff line number | Diff line change
@@ -29,7 +29,7 @@ let ingress = Ingress::for_engine(my_handler)?;
2929
ingress.add_metrics(&endpoint)?;
3030
```
3131

32-
The endpoint automatically provides proper labeling (namespace, component, endpoint) for all metrics.
32+
The endpoint automatically prefixes metric names with `dynamo_component_` and attaches the labels `dynamo_namespace`, `dynamo_component`, and `dynamo_endpoint` to all metrics. The label names are prefixed with `dynamo_` to avoid collisions with labels from Kubernetes and other monitoring systems.
3333

3434
## Available Methods
3535

@@ -44,13 +44,13 @@ The `Ingress` struct provides methods for metrics:
4444
The following Prometheus metrics are automatically created for all work handlers:
4545

4646
### Counters
47-
- `requests_total` - Total requests processed
48-
- `request_bytes_total` - Total bytes received in requests
49-
- `response_bytes_total` - Total bytes sent in responses
50-
- `errors_total` - Total errors encountered (with error_type labels)
47+
- `dynamo_component_requests_total` - Total requests processed
48+
- `dynamo_component_request_bytes_total` - Total bytes received in requests
49+
- `dynamo_component_response_bytes_total` - Total bytes sent in responses
50+
- `dynamo_component_errors_total` - Total errors encountered (with error_type labels)
5151

5252
### Error Types
53-
The `errors_total` metric includes the following error types:
53+
The `dynamo_component_errors_total` metric includes the following error types:
5454
- `deserialization` - Errors parsing request messages
5555
- `invalid_message` - Unexpected message format
5656
- `response_stream` - Errors creating response streams
@@ -59,65 +59,65 @@ The `errors_total` metric includes the following error types:
5959
- `publish_final` - Errors publishing final response
6060

6161
### Histograms
62-
- `request_duration_seconds` - Request processing time
62+
- `dynamo_component_request_duration_seconds` - Request processing time
6363

6464
### Gauges
65-
- `concurrent_requests` - Number of requests currently being processed
65+
- `dynamo_component_concurrent_requests` - Number of requests currently being processed
6666

6767
### Custom Metrics (Optional)
68-
- `my_custom_bytes_processed_total` - Total data bytes processed by system handler (example)
68+
- `dynamo_component_my_custom_bytes_processed_total` - Total data bytes processed by system handler (example)
6969

7070
### Labels
7171
All metrics automatically include these labels from the endpoint:
72-
- `namespace` - The namespace name
73-
- `component` - The component name
74-
- `endpoint` - The endpoint name
72+
- `dynamo_namespace` - The namespace name
73+
- `dynamo_component` - The component name
74+
- `dynamo_endpoint` - The endpoint name
7575

7676
## Example Metrics Output
7777

7878
When the system is running, you'll see metrics from the /metrics HTTP path like this:
7979

8080
```prometheus
81-
# HELP concurrent_requests Number of requests currently being processed by work handler
82-
# TYPE concurrent_requests gauge
83-
concurrent_requests{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 0
84-
85-
# HELP my_custom_bytes_processed_total Example of a custom metric. Total number of data bytes processed by system handler
86-
# TYPE my_custom_bytes_processed_total counter
87-
my_custom_bytes_processed_total{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 42
88-
89-
# HELP request_bytes_total Total number of bytes received in requests by work handler
90-
# TYPE request_bytes_total counter
91-
request_bytes_total{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 1098
92-
93-
# HELP request_duration_seconds Time spent processing requests by work handler
94-
# TYPE request_duration_seconds histogram
95-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.005"} 3
96-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.01"} 3
97-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.025"} 3
98-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.05"} 3
99-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.1"} 3
100-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.25"} 3
101-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="0.5"} 3
102-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="1"} 3
103-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="2.5"} 3
104-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="5"} 3
105-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="10"} 3
106-
request_duration_seconds_bucket{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace",le="+Inf"} 3
107-
request_duration_seconds_sum{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 0.00048793700000000003
108-
request_duration_seconds_count{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 3
109-
110-
# HELP requests_total Total number of requests processed by work handler
111-
# TYPE requests_total counter
112-
requests_total{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 3
113-
114-
# HELP response_bytes_total Total number of bytes sent in responses by work handler
115-
# TYPE response_bytes_total counter
116-
response_bytes_total{component="dyn_example_component",endpoint="dyn_example_endpoint9881",namespace="dyn_example_namespace"} 1917
117-
118-
# HELP uptime_seconds Total uptime of the DistributedRuntime in seconds
119-
# TYPE uptime_seconds gauge
120-
uptime_seconds{namespace="http_server"} 1.8226759879999999
81+
# HELP dynamo_component_concurrent_requests Number of requests currently being processed by work handler
82+
# TYPE dynamo_component_concurrent_requests gauge
83+
dynamo_component_concurrent_requests{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 0
84+
85+
# HELP dynamo_component_my_custom_bytes_processed_total Example of a custom metric. Total number of data bytes processed by system handler
86+
# TYPE dynamo_component_my_custom_bytes_processed_total counter
87+
dynamo_component_my_custom_bytes_processed_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 42
88+
89+
# HELP dynamo_component_request_bytes_total Total number of bytes received in requests by work handler
90+
# TYPE dynamo_component_request_bytes_total counter
91+
dynamo_component_request_bytes_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 1098
92+
93+
# HELP dynamo_component_request_duration_seconds Time spent processing requests by work handler
94+
# TYPE dynamo_component_request_duration_seconds histogram
95+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.005"} 3
96+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.01"} 3
97+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.025"} 3
98+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.05"} 3
99+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.1"} 3
100+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.25"} 3
101+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="0.5"} 3
102+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="1"} 3
103+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="2.5"} 3
104+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="5"} 3
105+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="10"} 3
106+
dynamo_component_request_duration_seconds_bucket{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace",le="+Inf"} 3
107+
dynamo_component_request_duration_seconds_sum{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 0.00048793700000000003
108+
dynamo_component_request_duration_seconds_count{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3
109+
110+
# HELP dynamo_component_requests_total Total number of requests processed by work handler
111+
# TYPE dynamo_component_requests_total counter
112+
dynamo_component_requests_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3
113+
114+
# HELP dynamo_component_response_bytes_total Total number of bytes sent in responses by work handler
115+
# TYPE dynamo_component_response_bytes_total counter
116+
dynamo_component_response_bytes_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 1917
117+
118+
# HELP dynamo_component_uptime_seconds Total uptime of the DistributedRuntime in seconds
119+
# TYPE dynamo_component_uptime_seconds gauge
120+
dynamo_component_uptime_seconds{dynamo_namespace="http_server"} 1.8226759879999999
121121
```
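
With the new names, a label filter has to use the `dynamo_`-prefixed label, and it cannot assume the label comes first inside the braces. A minimal shell sketch, assuming Prometheus text exposition output like the sample above arrives on stdin; `filter_metric` is a hypothetical helper for illustration, not part of Dynamo:

```bash
#!/usr/bin/env bash
# Hypothetical helper: filter_metric METRIC LABEL VALUE
# Reads Prometheus text exposition format on stdin and prints the samples of
# METRIC that carry LABEL="VALUE".
filter_metric() {
  local metric="$1" label="$2" value="$3"
  # First grep: sample lines of the metric only (skips "# HELP"/"# TYPE" lines).
  # Second grep: fixed-string match on the label pair, wherever it appears.
  grep -E "^${metric}(\{| )" | grep -F "${label}=\"${value}\""
}

# Sample input copied from the example output in this README.
sample='# HELP dynamo_component_requests_total Total number of requests processed by work handler
# TYPE dynamo_component_requests_total counter
dynamo_component_requests_total{dynamo_component="example_component",dynamo_endpoint="example_endpoint9881",dynamo_namespace="example_namespace"} 3'

printf '%s\n' "$sample" \
  | filter_metric dynamo_component_requests_total dynamo_endpoint example_endpoint9881
```

Against a running system, the same helper could be fed from `curl -s http://localhost:8081/metrics` instead of the inline sample.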
122122

123123
## Examples
@@ -211,7 +211,7 @@ Once running, you can query the metrics:
211211
curl http://localhost:8081/metrics | grep -E "(requests_total|request_bytes_total|response_bytes_total|errors_total|request_duration_seconds|concurrent_requests)"
212212

213213
# Get request count for specific endpoint
214-
curl http://localhost:8081/metrics | grep 'requests_total{endpoint="dyn_example_endpoint"}'
214+
curl http://localhost:8081/metrics | grep 'dynamo_component_requests_total{' | grep 'dynamo_endpoint="example_endpoint'
215215

216216
# Get request duration histogram
217217
curl http://localhost:8081/metrics | grep 'request_duration_seconds'
