6 changes: 3 additions & 3 deletions A78-grpc-metrics-wrr-pf-xds.md
@@ -4,9 +4,9 @@ A78: gRPC OTel Metrics for WRR, Pick First, and XdsClient
* Approver: @ejona86, @dfawley
* Status: {Draft, In Review, Ready for Implementation, Implemented}
* Implemented in: <language, ...>
-* Last updated: 2024-09-24
+* Last updated: 2025-07-01
* Discussion at: https://groups.google.com/g/grpc-io/c/A2Mqz8OMDys
-* Updated by: [A88: xDS Data Error Handling](A88-xds-data-error-handling.md)
+* Updated by: [A88: xDS Data Error Handling](A88-xds-data-error-handling.md), [A94: OTel metrics for Subchannels](A94-subchannel-otel-metrics.md)

## Abstract

@@ -103,7 +103,7 @@ The following metrics will be exported:
| grpc.lb.wrr.endpoint_weight_stale | Counter | {endpoint} | grpc.target, grpc.lb.locality | Number of endpoints from each scheduler update whose latest weight is older than the expiration period. |
| grpc.lb.wrr.endpoint_weights | Histogram | {weight} | grpc.target, grpc.lb.locality | Weight of each endpoint, recorded on every scheduler update. Endpoints without usable weights will be recorded as weight 0. |

-### Pick First LB Policy
+### [Outdated] Pick First LB Policy (Updated by [A94](A94-subchannel-otel-metrics.md))

The Pick First LB policy predates the gRFC process but was updated in
[A62]. We propose to add the following metrics to it.
151 changes: 151 additions & 0 deletions A94-subchannel-otel-metrics.md
@@ -0,0 +1,151 @@
## A94: OTel metrics for Subchannels

* Author(s): Yash Tibrewal (@yashykt)
* Approver: Mark Roth (@markdroth), Eric Anderson (@ejona86), Doug Fawley
(@dfawley)
* Status: Ready for Implementation
* Implemented in:
* Last updated: 2025-08-12
* Discussion at: https://groups.google.com/g/grpc-io/c/iMdK7r4E5tU

## Abstract

Introduce OpenTelemetry metrics for subchannels. These metrics will replace the
existing pick-first metrics.

## Background

In [A78], metrics for the PickFirst load-balancing policy were proposed that
provide observability into subchannel disconnections and the connection attempts
made for those subchannels. These metrics do not currently contain information
on the reason for disconnection, the xDS locality, or the cluster information.

[A89] is a proposal to introduce a new optional label `grpc.lb.backend_service`
to client-side per-attempt metrics. This label carries xDS cluster information.

### Related Proposals:

* [A8]: Client-side Keepalive
* [A18]: TCP User Timeout
* [A61]: IPv4 and IPv6 Dualstack Backend Support
* [A66]: OpenTelemetry Metrics
* [A74]: xDS Config Tears
* [A78]: gRPC OTel Metrics for WRR, Pick First, and XdsClient
* [A79]: Non-per-call Metrics Architecture
* [A89]: Backend Service Metric Label
* [L62]: gRPC security level negotiation between call credentials and channels

[A8]: A8-client-side-keepalive.md
[A18]: A18-tcp-user-timeout.md
[A61]: A61-IPv4-IPv6-dualstack-backends.md
[A66]: A66-otel-stats.md
[A74]: A74-xds-config-tears.md
[A78]: A78-grpc-metrics-wrr-pf-xds.md
[A79]: A79-non-per-call-metrics-architecture.md
[A89]: A89-backend-service-metric-label.md
[L62]: L62-core-call-credential-security-level.md

## Proposal

Move the existing pick-first metrics to subchannel metrics
(`grpc.lb.pick_first.*` to `grpc.subchannel.*`) with the addition of optional
labels, as shown below:

Metric Name | Type | Unit | Labels | Description
------------------------------------------------------------------------------------------------------ | -------------- | --------------- | -------------------------------------------------------------------------------------------------------------- | -----------
grpc.subchannel.disconnections (Old - grpc.lb.pick_first.disconnections) | Counter | {disconnection} | grpc.target, grpc.lb.backend_service (optional), grpc.lb.locality (optional), grpc.disconnect_error (optional) | Number of times the selected subchannel becomes disconnected.
grpc.subchannel.connection_attempts_succeeded (Old - grpc.lb.pick_first.connection_attempts_succeeded) | Counter | {attempt} | grpc.target, grpc.lb.backend_service (optional), grpc.lb.locality (optional) | Number of successful connection attempts.
grpc.subchannel.connection_attempts_failed (Old - grpc.lb.pick_first.connection_attempts_failed) | Counter | {attempt} | grpc.target, grpc.lb.backend_service (optional), grpc.lb.locality (optional) | Number of failed connection attempts.
grpc.subchannel.open_connections | UpDown Counter | {connection} | grpc.target, grpc.security_level (optional), grpc.lb.backend_service (optional), grpc.lb.locality (optional) | Number of open connections.

If we end up discarding connection attempts as we do with the “happy eyeballs”
algorithm (as per [A61]), we should not record the connection attempt or the
disconnection.

Implementations that have already implemented the pick-first metrics should give
enough time for users to transition to the new metrics. For example,
implementations should report both the old pick-first metrics and the new
subchannel metrics for 2 releases, and then remove the old pick-first metrics.
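The transition period described above could be sketched roughly as follows. This is a minimal illustration, not a gRPC API: the `counters` store, the `increment` helper, and the `report_legacy` flag are hypothetical names, and the sketch assumes the old pick-first metrics carry only the `grpc.target` label.

```python
from collections import Counter

# New subchannel metric names mapped to the old pick-first names they replace.
LEGACY_NAME = {
    "grpc.subchannel.disconnections":
        "grpc.lb.pick_first.disconnections",
    "grpc.subchannel.connection_attempts_succeeded":
        "grpc.lb.pick_first.connection_attempts_succeeded",
    "grpc.subchannel.connection_attempts_failed":
        "grpc.lb.pick_first.connection_attempts_failed",
}

# Hypothetical in-memory stand-in for a metrics recorder.
counters = Counter()


def increment(name, labels, report_legacy=True):
    """Record one event under the new name and, optionally, the old name."""
    counters[(name, tuple(sorted(labels.items())))] += 1
    legacy = LEGACY_NAME.get(name)
    if report_legacy and legacy is not None:
        # Assumption: the old metrics only carry the grpc.target label.
        counters[(legacy, (("grpc.target", labels["grpc.target"]),))] += 1
```

Once the transition window has passed, an implementation following this sketch would flip `report_legacy` to `False` and later delete the mapping.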

Label Name | Disposition | Description
----------------------- | ----------- | -----------
grpc.target             | Required    | Indicates the target of the gRPC channel (defined in [A66]).
grpc.lb.backend_service | Optional    | The backend service to which the RPC was routed (defined in [A89]).
grpc.lb.locality        | Optional    | The locality to which the traffic is being sent. This will be set to the resolver attribute passed down from the weighted_target policy, or the empty string if the resolver attribute is unset (defined in [A78]).
grpc.disconnect_error   | Optional    | Reason for disconnection.
grpc.security_level     | Optional    | Denotes the security level of the connection. Allowed values: "none", "integrity_only", and "privacy_and_integrity".

The subchannel needs to be passed attributes with the values for the
`grpc.lb.backend_service` and `grpc.lb.locality` labels (defined in [A89] and
[A78] respectively). This implies that the subchannel will be recreated when
these attributes change. Since only xDS currently uses these labels, the
attributes will be set for each endpoint or address by the cds (post-[A74]) or
xds_cluster_resolver (pre-[A74]) LB policies.
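One way to picture why a change in these attributes recreates the subchannel is to treat them as part of the subchannel's identity. The sketch below is an assumption about how an implementation might model this; `SubchannelKey` and `SubchannelPool` are hypothetical names, not gRPC types.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubchannelKey:
    """Hypothetical subchannel identity: address plus metric label attributes."""
    address: str
    backend_service: str = ""  # value for grpc.lb.backend_service
    locality: str = ""         # value for grpc.lb.locality


class SubchannelPool:
    """Hypothetical pool that reuses a subchannel only when the key matches."""

    def __init__(self):
        self._subchannels = {}
        self.created = 0

    def get_or_create(self, key):
        if key not in self._subchannels:
            # A different attribute value yields a different key, so a new
            # subchannel is created instead of reusing the existing one.
            self.created += 1
            self._subchannels[key] = object()
        return self._subchannels[key]
```

With this model, the same address reported under a new locality maps to a distinct key, so the pool hands back a fresh subchannel.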

List of allowed values for `grpc.disconnect_error`:

Error string | Description
-------------------- | -----------
GOAWAY <ERROR_CODE>  | HTTP/2 GOAWAY frame with the given error code, for example “GOAWAY NO_ERROR”, “GOAWAY PROTOCOL_ERROR”, or “GOAWAY ENHANCE_YOUR_CALM”. The list of error codes is available in [RFC 9113](https://www.rfc-editor.org/rfc/rfc9113.html#name-error-codes).
subchannel shutdown  | The subchannel was shut down. This can happen for reasons such as the parent channel shutting down, the channel becoming idle, the load balancing policy changing due to a resolver update, or a change in the list of endpoint addresses.
connection reset     | Connection was reset (e.g. ECONNRESET, WSAECONNRESET).
connection timed out | Connection timed out (e.g. ETIMEDOUT, WSAETIMEDOUT). This also includes connections closed due to gRPC keepalives ([A8]).
connection aborted   | Connection was aborted (e.g. ECONNABORTED, WSAECONNABORTED).
socket error         | Any socket error not covered by “connection reset”, “connection timed out”, or “connection aborted”. Implementations that cannot differentiate between socket error codes should also use this.
unknown | Catch-all for all other reasons.

For a given connection, there can be multiple reasons reported to the subchannel
for disconnection. For example, a connection could have seen a GOAWAY frame with
`ENHANCE_YOUR_CALM` and then a Broken Pipe socket error. In such cases, the
first reason seen should be chosen (`GOAWAY ENHANCE_YOUR_CALM` in this case).
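A minimal sketch of the classification and the first-reason rule, under stated assumptions: the helper names and the `DisconnectReason` class are hypothetical, and only the POSIX-style errno values from the table are mapped.

```python
import errno

# Socket error codes that have a dedicated grpc.disconnect_error value.
_ERRNO_TO_REASON = {
    errno.ECONNRESET: "connection reset",
    errno.ETIMEDOUT: "connection timed out",
    errno.ECONNABORTED: "connection aborted",
}


def reason_for_socket_error(err):
    """Map an errno value to a label value, falling back to 'socket error'."""
    return _ERRNO_TO_REASON.get(err, "socket error")


def reason_for_goaway(error_code_name):
    """Build the label value for an HTTP/2 GOAWAY frame."""
    return f"GOAWAY {error_code_name}"


def reason_for_keepalive_timeout():
    """Keepalive timeouts share the label value used for connection timeouts."""
    return "connection timed out"


class DisconnectReason:
    """Hypothetical per-connection holder that keeps the first reason seen."""

    def __init__(self):
        self._reason = None

    def report(self, reason):
        if self._reason is None:  # later reports do not overwrite the first
            self._reason = reason

    def label(self):
        return self._reason if self._reason is not None else "unknown"
```

Reporting a GOAWAY with `ENHANCE_YOUR_CALM` followed by `errno.EPIPE` leaves the label at the GOAWAY value, matching the example in the text.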

We might add more error cases to this in the future.

### Stability

As recommended by [A79], these metrics will start off as experimental, and hence
off-by-default. The decision on whether these metrics will be on-by-default or
off-by-default on de-experimentalization will be made at the same time as the
de-experimentalization.

## Rationale

### Renaming pick-first metrics

The existing pick-first metrics provide stats on subchannel disconnections and
connection attempts as viewed from the perspective of the pick-first LB policy.
[A61] made the pick-first LB policy the universal leaf policy. For users
unfamiliar with this, it will come as a surprise when metrics for the pick-first
LB policy are populated while, for example, the round_robin LB policy is
configured. Additionally, the pick-first metrics are defined from the
perspective of the channel. This means that if subchannels are shared between
multiple channels (as is the case for gRPC Core and its wrapped languages, such
as C++ and Python), we will double-count the disconnections and connection
attempts.

Renaming/moving the pick-first metrics to subchannel makes this more intuitive,
and fixes the double-counting problem.

### Metric for open connections

Moving the metrics down to the subchannel potentially allows us to calculate the
number of open connections by subtracting `grpc.subchannel.disconnections` from
`grpc.subchannel.connection_attempts_succeeded`. This method does not work for
exporters that record counters per period in a way that rules out a simple
subtraction of the two counters
(https://github.com/grpc/grpc/issues/34886).

Adding an explicit metric that records the number of open connections avoids
this.
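The difference between the two approaches can be illustrated with a small sketch. The `UpDownCounter` class here is a hypothetical stand-in for an up-down counter instrument, and the per-period samples are made-up `(succeeded, disconnected)` pairs used only for illustration.

```python
class UpDownCounter:
    """Minimal stand-in for an up-down counter instrument (hypothetical)."""

    def __init__(self):
        self.value = 0

    def add(self, delta):
        self.value += delta


def per_period_difference(periods):
    """What an exporter sees if it subtracts the two per-period counters."""
    return [succeeded - disconnected for succeeded, disconnected in periods]


def open_connections_per_period(periods):
    """What an explicit open-connections metric reports at each period end."""
    open_connections = UpDownCounter()
    samples = []
    for succeeded, disconnected in periods:
        open_connections.add(succeeded)      # one increment per new connection
        open_connections.add(-disconnected)  # one decrement per disconnection
        samples.append(open_connections.value)
    return samples
```

For periods `[(3, 0), (0, 1), (1, 2)]`, the per-period subtraction yields negative values in the later periods, while the up-down counter keeps tracking the actual number of open connections.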

### Combining connection timeouts and keepalives into a single disconnection error

We expect most implementations of [A8] to also set the `TCP_USER_TIMEOUT` socket
option with the same timeout value, as stated in [A18]. As such, in cases where
the connection is broken, the keepalive timeout will race with sockets being
closed due to `TCP_USER_TIMEOUT`. Since the motive of the two timers is
essentially the same, we choose to combine them into a single error instead of
trying to differentiate between them.

## Implementation

TBD