Skip to content

Conversation

yashykt
Copy link
Member

@yashykt yashykt commented Apr 3, 2025

No description provided.

grpc.client.call.retry_delay | Histogram | s | grpc.method (required), grpc.target (required) | Total time of delay while there is no active attempt during the client call.
Metric Name | Type | Unit | Labels | Description
------------------------------------ | --------- | ------------------- | ---------------------------------------------- | -----------
grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might need more than 5 as the upper bound here. When we implement estubs retries, there are modes where there is no limit on the number of retry attempts.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we end up changing the limit on retries, we can update the advice here.

Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.

Metric Name | Type | Unit | Labels | Description
------------------------------------ | --------- | ------------------- | ---------------------------------------------- | -----------
grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In principle, there is no upper bound on the number of transparent retry attempts if each attempt never actually goes out on the wire.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a 10 taking that in consideration. I think this should be enough. I'm not sure it makes sense to distinguish between cases where the number if larger than 10. (I'm willing to change the boundaries here if there are strong opinions. These are just recommendations anyway.)

------------------------------------ | --------- | ------------------- | ---------------------------------------------- | -----------
grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10].
grpc.client.call.hedges | Histogram | {hedge} | grpc.method (required), grpc.target (required) | Number of hedges during the client call. If there were no hedges, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In estubs, the upper bound for the number of hedged requests can be quite large, so I think we'll want to support larger numbers here too.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we end up changing the limit on retries, we can update the advice here.

Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.

Copy link
Member Author

@yashykt yashykt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we've talked about this before, but I don't understand how expontential histograms can possibly work for aggregation. And in any case, we don't support them in the non-per-call metrics APIs, so I don't see how we can recommend that here.

Exponential Histograms are just a type of histograms. Our APIs do not need anything more to support exponential histograms. The OpenTelemetry SDK user can simply provide the buckets to use when creating the histogram view.

expontential histograms can possibly work for aggregation

Aggregation for histograms, which in most cases is calculating percentiles (50% / 95% / 99 %) is always approximate since we no longer have the exact values.

grpc.client.call.retry_delay | Histogram | s | grpc.method (required), grpc.target (required) | Total time of delay while there is no active attempt during the client call.
Metric Name | Type | Unit | Labels | Description
------------------------------------ | --------- | ------------------- | ---------------------------------------------- | -----------
grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we end up changing the limit on retries, we can update the advice here.

Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.

Metric Name | Type | Unit | Labels | Description
------------------------------------ | --------- | ------------------- | ---------------------------------------------- | -----------
grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added a 10 taking that in consideration. I think this should be enough. I'm not sure it makes sense to distinguish between cases where the number if larger than 10. (I'm willing to change the boundaries here if there are strong opinions. These are just recommendations anyway.)

------------------------------------ | --------- | ------------------- | ---------------------------------------------- | -----------
grpc.client.call.retries | Histogram | {retry} | grpc.method (required), grpc.target (required) | Number of retries during the client call. If there were no retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
grpc.client.call.transparent_retries | Histogram | {transparent_retry} | grpc.method (required), grpc.target (required) | Number of transparent retries during the client call. If there were no transparent retries, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5,10].
grpc.client.call.hedges | Histogram | {hedge} | grpc.method (required), grpc.target (required) | Number of hedges during the client call. If there were no hedges, 0 is not reported. Recommended histogram bucket boundaries are [1,2,3,4,5].
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think if we end up changing the limit on retries, we can update the advice here.

Otherwise, if we've already made enough progress on the new changes, please let me know what the better boundaries would be.

@yashykt yashykt merged commit 67e6ea1 into grpc:master Jul 1, 2025
1 check passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants