-
Notifications
You must be signed in to change notification settings - Fork 238
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
A66: OpenTelemetry Metrics #380
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks like a great start!
Please let me know if you have any questions. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for getting started on this!
I was a little surprised that this proposal doesn't reference the existing OpenTelemetry specification for RPC metrics, especially since that doc includes a section just for gRPC. Fully acknowledging that the OTel doc is still experimental, could this proposal adopt/extend OTel's semantic conventions rather than starting from scratch? |
@akshayjshah There were a bunch of conversations about this with the OTel folks, and the understanding was that it would be hard for different RPC systems to export a single set of metrics given the different nuances that each protocol would have. For example, gRPC semantics make a difference between attempts and calls. There are also differences between the various metrics exporting compressed/uncompressed sizes. |
I'm poking through issues on
Every RPC (and REST) framework I'm aware of supports compression, so this doesn't strike me as gRPC-specific. Amending the OTel metrics schema to distinguish compressed from uncompressed sizes (and clarifying that protocol-specific framing shouldn't be counted) would be a nice upstream improvement. If gRPC's notion of "attempts" is unique, it seems reasonable to handle that as an additive set of metrics beyond what's in the OTel semantic conventions. Many of the metrics proposed here - e.g., call latency - do have direct analogs in other systems. If the gRPC team feels that the metrics schema designed by the OpenTelemetry committers isn't appropriate, it would be nice to explain why in the Background and Related Proposals sections of this document. As written, this proposal suggests (by omission) that there's no prior art for collecting OpenTelemetry metrics from gRPC. |
No, these were on video conferences unfortunately. I'll tag @jsuereth as the OTel Metrics Conventions spokesperson.
Thanks! Yeah, I'll update the rationale. I don't know why I skipped that part. |
We initially added authority as an attribute to servers, but the value add is not clear at the moment, and there is a potential of opening the servers to attack since the clients can send arbitrary authority values. (We could potentially build out an API where servers provide the ability to register which authorities should be accepted by the stats plugin, but we punt this for the future, for when we get a more clear idea as to the benefit here.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some comments.
A66-otel-stats.md
Outdated
* `grpc.target` : Canonicalized target URI used when creating gRPC Channel, | ||
e.g. "dns:///pubsub.googleapis.com:443", "xds:///helloworld-gke:8000". |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Did we decide this needs to have 3(+) slashes, or that it was fine to use the user's text if they leave out the authority, e.g. "dns:host:port"
, or that it doesn't matter?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see an authoritative document on this, but I think it'd be nice to use the same format across languages. Core uses the three slash format even without authority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The naming doc lists the DNS name format as dns:[//authority/]host[:port]
, which implies to me that the slashes should only be included if there is an authority.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, but it prefixes that with Most gRPC implementations support the following URI schemes
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is accepted by the library and what we consider "canonical" are two different things. Someone should make a gRFC for "canonical target URI strings" I guess
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I think that makes sense, or maybe this gRFC will be setting "precedence" for what canonical means
// provide an implementation of a MeterProvider. If the passed in Meter Provider | ||
// does not have the view configured for an individual metric turned on, the API | ||
// call in this component will create a default view for that metric. | ||
func DialOption(mo MetricsOptions) grpc.DialOption {} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DialOptionOptions
? 😆
Seriously though I think we'll want this to take a struct that contains MetricsOptions
instead so that we can include tracing options in the future. Or a name besides MetricsOptions
that includes tracing options as well. Probably just Options
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, I will scale this up to Options { metrics options, tracing options } (OpenCensus just had tracing options). Or should I change this and scale upward to not break users eventually (i.e. have an options struct with metrics options as a knob but no tracing options until tracing)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
to not break users eventually
Since we're not going to 1.0 it until after it's all done, it doesn't really matter, but it's probably a good idea to figure out how it's all going to look (what knobs we know we need for all features) before we start implementing.
Implements method attribute filtering as defined by the gRFC grpc/proposal#380
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great. Few minor comments / clarifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM modulo precedence nit
A66-otel-stats.md
Outdated
// Any implementation knobs (i.e. views, bounds) set in the passed in object | ||
// take precedence over the API calls from the interface in this component | ||
// (i.e. it will create default views for unset views). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: this precedence logic is only mentioned in this Go docstring. Can you please mention this somewhere in top level API documentation or a section earlier in the document?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This statement should actually be removed since we no longer API to override that
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that there is a statement up above that says - Users of the gRPC OpenTelemetry plugin will use the OTel SDK's MeterProvider to [control the views](https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view) and customize the metrics that will be exported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What? I still call out into API. I thought we needed to since these metrics are unconditionally recorded? That docstring doesn't mention API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, you use the OTel API, but there is no longer any knob to control which instrument gets created and recorded on. I think the statement "Any implementation knobs (i.e. views, bounds) set in the passed in object take precedence over the API calls from the interface in this component (i.e. it will create default views for unset views)." is just going to confuse people. The term "API" is pretty loaded in this context, and if we want to explain this we should probably mention the whole thing as stated above in the Language-Specific Details
section.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we want to mention it, instead of mentioning precedence, we should use the word "customize", same as the OTel documentation at https://opentelemetry.io/docs/specs/otel/metrics/sdk/#view. Let me know if you want to update the text here.
OpenTelemetry plugin. Overall, the APIs should have the following capabilities - | ||
|
||
* Allow installing multiple OpenTelemetry plugins. | ||
* Implementations must provide an option to set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why this mutate API? Would it be fine (e.g., Java) to just pass it to the constructor and then it's guaranteed to be set? I see that is the case in the Java section below, but that seems counter to this requirement.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that sounds fine. Whether it is a builder API or just passed through the constructor seems like detail. Do you prefer some different wording here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to keep in mind here is that when we later add tracing and logging support, it is conceivably possible that a user may want to enable tracing or logging without enabling metrics. In that case, the API should not force them to specify a MeterProvider
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Java takes in an OpenTelemetry
object on which you, I believe, you can choose whether to set MeterProvider or not, so yeah, both constructor and builder patterns work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, this is a single unified API for the various OTel components. Gotcha. I didn't piece that together.
I think the way earlier discussion of "don't stabilize the API until Metric and Tracing are both implemented" applies. But this does seem fair for now.
If targetFilter is not set, target is recorded as is. */ | ||
public OpenTelemetryBuilder targetFilter(Predicate<String> targetFilter); | ||
|
||
public OpenTelemetryModule build(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We won't want simple "build" to install globally (although it's not clear if that is the intention here). I'm fine with fixing up this API afterward, as long as others are. The sketch is fair, but the spellings will probably morph a bit.
(Similarly, I don't think we'll use the method name openTelemetry()
above.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds fine to me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I avoided using global in the build method, because buildAndRegisterGlobal
has different meaning wrt OpenTelemetry java. In OpenTelemetry it also sets the sdk instance invoking the instance as Global instance which I didn't want to inferred here as well.
I named method as openTelemetry()
since we are taking an openTelemetry instance. How does openTelemetrySdk()
or sdk()
sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great!
OpenTelemetry plugin. Overall, the APIs should have the following capabilities - | ||
|
||
* Allow installing multiple OpenTelemetry plugins. | ||
* Implementations must provide an option to set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to keep in mind here is that when we later add tracing and logging support, it is conceivably possible that a user may want to enable tracing or logging without enabling metrics. In that case, the API should not force them to specify a MeterProvider
.
Thanks for all the reviews! |
|
No description provided.