-
Notifications
You must be signed in to change notification settings - Fork 101
Lang metrics export #544
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Lang metrics export #544
Conversation
| /// Implementors of this trait are expected to be defined in each language's bridge. | ||
| /// The implementor is responsible for the allocation/instantiation of new metric meters which | ||
| /// Core has requested. | ||
| pub trait CoreMeter: Send + Sync + Debug { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Most public API stuff is in this file
core-api/src/telemetry/metrics.rs
Outdated
| /// Core has requested. | ||
| pub trait CoreMeter: Send + Sync + Debug { | ||
| fn new_attributes(&self, attribs: MetricsAttributesOptions) -> MetricAttributes; | ||
| fn counter(&self, name: &str) -> Arc<dyn Counter>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any perf concerns with the cost of dyn for metrics? Is this dynamically dispatched for each metrics call?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See other answer. Kinda need to to keep things sane. That's not the overhead I'm worried about so much as attribute passing.
core-api/src/telemetry/metrics.rs
Outdated
| /// Core has requested. | ||
| pub trait CoreMeter: Send + Sync + Debug { | ||
| fn new_attributes(&self, attribs: MetricsAttributesOptions) -> MetricAttributes; | ||
| fn counter(&self, name: &str) -> Arc<dyn Counter>; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you just return Counter here and make thread-safety the implementer's problem? I don't think we need to force Arc here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't. I actually had quite the struggle with the type system on trying to get this to be as little overhead as possible. Traits cannot return existential types (can't return impl Counter) yet. I would need to make the trait generic, or have an associated type, but that would infect the entire rest of the code with generics just for this purpose which would be no bueno.
Thus, these methods have to return some kind of pointer, so might as well be Arc here since Box is just going to get shoved in one anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only other option would make the instrument types also be enums, which is another spot where there can be a mis-match between the attribute type and the instrument type, which is sorta sad. But, given that can already happen, I'd probably be fine with enum-ing them too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Word, seems like a language limitation then. Definitely don't want generic "metric" enum with generic value setters or any of that. If you don't believe the cost of Arc + dyn would be more than, say, having counter_add(&self, name: &str, value: u64, attributes &MetricAttributes) on this trait directly (expecting lang to memoize counter based on name and maybe attributes), then no prob.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mean enum instrument, I mean enum counter (& histogram, gauge) which is either otel-in-core backed, or lang-backed.
The lang-backed version would essentially do what you suggest there, except it would store an internal u64 or something which would be used for a pre-hashed or indexed lookup lang side rather than hashing the name every time. (just like the attributes are)
| } | ||
|
|
||
| pub trait Counter: Send + Sync { | ||
| fn add(&self, value: u64, attributes: &MetricAttributes); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing attributes at metric record time can get a bit expensive for naive implementations that need to group by collective attributes. There is a cost associated with not pre-binding attributes. Can we consider an alteration in the API (even if the primary impl of that doesn't use it that way) that associates metric attributes with the metric instead of the individual value? And can you have with_new_attrs on each of these metrics instead? This way implementers can implement these metric impls having already created the counter with its attributes. And it also discourages users of these metrics from adding any attribute willy-nilly at record time.
The default OTel impl can just store these attrs with the counter and provide them at record time. I know that shifting Arc to be a caller problem and having you keep the attributes until record time is going to require you make your own struct wrapper for the default OTel impl of each metric instead of the easier impl Counter for metrics::Counter<u64>, but I think it makes for a cleaner general-purpose API.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Having thought about this, I now fear that the way the metrics are recorded may preclude this. While counter.with_new_attrs(attrs).add(123) is just as cheap as counter.add(123, attrs), if you are always calling with_new_attrs on every record time anyways then there's no value here. So this can be ignored.
Just know that many non-OTel implementations are required to group by these attributes before applying the value. But otherwise, this can be ignored.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this will work with the Otel JS metrics API: https://github.com/open-telemetry/opentelemetry-js/blob/b8b63083869fb5850fd4830ded0fc509a2cd0f0a/api/src/metrics/Metric.ts#L65
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'll lose the otel context but other than that everything else should work the way it is.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. This passing is actually the part I'm more concerned about perf wise. But, I forgot to write this in the description, you'll notice I make it so lang allocates attributes on it's side and then they can be re-used: https://github.com/temporalio/sdk-core/pull/544/files#diff-3d2660d932700d89b3484aac58d79baa90dc4ae978995144d0fa9dceb1e0ce1eR91
So new attributes will be created like that for things which change infrequently. Things which potentially change every call have to get passed anyway. The downside of course is lang has to maintain that cache, but it should be fine for it to be add-only, so pretty simple.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That lang controls creating this set (e.g. w/ pre-built hashing) does seem to help, so long as the Rust side doesn't commonly create this attribute set on each recording (here and there can make sense though).
Go Prometheus for instance takes the labels (what we call attributes here) and uses fnv1-a to hash them for grouping purposes, but is optimized to do that ahead of time if the caller does it ahead of time. And so they don't take labels at record time to discourage dynamic labelling.
I'm ok with this as is so long as core code tries its best within reason to reuse the attribute set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm missing how lang is supposed to maintain this cache, aren't most attributes defined core side?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When new attributes need to get defined, they are send over as strings -- and then the return value from that is an ID that is used to refer to them in the future. Attributes that aren't expected to be re-used may also be passed as strings on a per-call basis.
core-api/src/telemetry/metrics.rs
Outdated
| // TODO: Pass in? Hard to make work with trait | ||
| ctx: opentelemetry::Context::current(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just documenting that users can use thread-locals or thread contexts or whatever is enough I think.
bergundy
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this LGTM, I'll try to integrate this with TS when I get some non-versioning time.
Maybe let's leave this PR open until we have the first use case.
5710c0f to
ce24c9d
Compare
980edcd to
748343d
Compare
|
|
||
| /// Buffers [MetricEvent]s for periodic consumption by lang | ||
| #[derive(Debug)] | ||
| pub struct MetricsCallBuffer { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this visible outside the crate?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Indeed it is
|
Needs #544 to get merged first |
748343d to
e2773a0
Compare
| } | ||
|
|
||
| pub trait Gauge: Send + Sync { | ||
| // When referring to durations, this value is in millis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just crawling this PR again...is there a use of gauge for durations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think so at the moment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍 I vote we remove the comment and when we do have duration based gauges (I can't think of when) we can revisit how to expose.
| } | ||
|
|
||
| pub trait Histogram: Send + Sync { | ||
| // When referring to durations, this value is in millis |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there a case where we use histograms for anything but durations? In the Go SDK, we were able to get away with three metric types: counter, gauge, and timer (which is histogram, but we called it timer and clearly used durations). I wonder if a similar thing would make sense here.
I am worried that people will want to know whether you are recording a duration or a numeric value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They very easily could be non-durations, and either way it's a histogram. The language is clear with "when" - I don't think this is an issue. At the end of the day it's still a histogram regardless of what unit you've chosen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But I think it's helpful for our users to know whether they're recording a duration or not. We can say "may be milliseconds or just some other number" but then they're gonna have some extra conditionals checking string metric name against a known list to know it's a duration. I think we should have a separate concept for duration-based histograms and non-duration-based histograms (and if there are only duration-based histograms ever used, we can wait until we need the non-duration-based to add it).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree. There aren't a ton of other metrics APIs that do this. There are metric types, which generally correlate to how things are aggregated, and there are units. Confusing units with the method of aggregation is I think not an appropriate design. Or, perhaps more importantly, at the end of the day these are the aggregation methods we have available to us on the backend, so it makes sense to be clear about which one's getting used.
If we would like to accept units, sure, that's totally reasonable, but IMO it's a definite no-no to consider two metric types to be different simply because they are using different units.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we would like to accept units, sure, that's totally reasonable, but IMO it's a definite no-no to consider two metric types to be different simply because they are using different units.
👍 I will happily take units. I just figured separate traits was the best way to represent separate units instead of int + enum or something. I wasn't wanting different metric types on the downstream system (they both still become the same histograms). For example, with Prometheus in some languages people create either a "timer" or a "histogram", but the former is just sugar over the latter with the type as duration.
Any way you can let the user know the histogram is for a duration vs non-duration would be helpful to the potential implementer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYIW I feels like Prometheus encourage us to take the wrong direction, and we should not follow it's lead simply on the principle that "most engines out there" are doing it that way. Metric emitters should focus on what those values really means (ie to them), rather than how they will be stored or presented.
The fact is that some place in the code is either emitting time values, or something else, but in the end, the unit for that metric is unambiguous for the caller and should therefore also be clearly specified as part the "metric" contract. The data is a duration, expressed in ms. The fact that it will likely be presented as an histogram is of lesser significancy.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYIW I feels like Prometheus encourage us to take the wrong direction, and we should not follow it's lead simply on the principle that "most engines out there" are doing it that way.
It's not about a specific backend, that was just an example. It's not taking me in a certain direction. Very simply, we should type our datatypes as specifically as is useful. Knowing that something is a duration and not just an integer is useful. Being able to translate durations separately from integers is useful.
I gave an example of a backend where that translation is useful. Surely we agree that knowing something is a duration is useful to a user? And if we agree on that, surely we agree a duration type (or enum or different trait or whatever) is a better way of knowing that than checking a predefined string literal?
the unit for that metric is unambiguous for the caller
Not true if someone has to == "milliseconds" in their code. That's pretty ambiguous. Hope we don't add "seconds" next. Or is it "Milliseconds" or "ms", ug, who knows, hopefully the docs are less ambiguous than our data typing.
The data is a duration, expressed in ms.
More like the data is an integer, expressed with an arbitrary string you must know to assume ms.
Having said all of that, if the team believes people need to compare against a string literal if they need to do anything with the duration value specifically, ok. But please document what the literal is that we use in core for this value. I know we're gonna have to document that lang side.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYIW I feels like Prometheus encourage us to take the wrong direction, and we should not follow it's lead simply on the principle that "most engines out there" are doing it that way.
It's not about a specific backend, that was just an example. It's not taking me in a certain direction. Very simply, we should type our datatypes as specifically as is useful. Knowing that something is a duration and not just an integer is useful. Being able to translate durations separately from integers is useful.
Hey, easy! We're pretty much saying the same 😄. What we have today is what I see as wrong, notably because it treats metrics data as raw numbers to be aggregated into some buckets, instead of focussing on the actual meaning of these values from the emitter's POV. And I think that mainly come from our Prometheus heritage.
the unit for that metric is unambiguous for the caller
Not true if someone has to == "milliseconds" in their code. That's pretty ambiguous. Hope we don't add "seconds" next. Or is it "Milliseconds" or "ms", ug, who knows, hopefully the docs are less ambiguous than our data typing.
Again, I'm saying the same. At any specific code location that emit metrics, the unit in which values are provided are very likely non-ambiguous. For example, a specific code line emitting a metric entry knows that the value it is pushing is a duration, expressed in milliseconds. But ambiguity begins immediately after that point. Different call sites may be emitting entries for the same metric... Are all of these call sites emitting values in the same unit? Supposing that the metric is getting presented as an histogram, are bucket boundaries defined using the correct unit? Is that unit clear to the end user?
All of these assume some form of API contract, which is purely implicit at the moment. With pluggable, lang-side metric exporters, this implicitness will only become more penalizing than before, especially considering that the target system may have different ways of aggregating/presenting values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we at least document all the string units we set
Core doesn't even set any units, right now, but I can add that to this PR.
Anyway, my main point here is it doesn't really make much sense to expose the unit type safety stuff at the Core level, because lang can do that, and the translation to calls into core is quite easy to get right. I agree of course that if users want to specify unit & associated value types in lang that that makes sense.
The only thing I could do in Core would maybe be something like accepting std::time::Duration for some kinda time-specific histogram, but even then I'm just immediately going to toss that out the window when I make the real metric call, and lang should've already done that with it's type.
IE: I don't think any of this should result in any change to this PR, besides specifying units & descriptions in Core's own metrics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The only thing I could do in Core would maybe be something like accepting std::time::Duration for some kinda time-specific histogram, but even then I'm just immediately going to toss that out the window when I make the real metric call,
But I, the language dev, am not going to toss that out the window. I am going to give it to the user's metric impl and they're gonna be happy I differentiated.
But we've discussed enough, string is good enough I suppose. Let's just document the string core uses for its durations so I can have a hardcoded conditional for stronger typing lang side.
|
I realized the underling library update seems to have imposed some naming changes on gauges too. I want to make sure those are resolved before I merged this. |
| pub description: Cow<'static, str>, | ||
| /// Unit information that will appear in metadata if the backend supports it | ||
| #[builder(setter(into), default = "\"\".into()")] | ||
| pub unit: Cow<'static, str>, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we document the unit string core uses for its durations? We will have to document this lang side so that when users use durations or when users write a backend, they know to match (though lang side may be more strongly typed than Rust here I suspect for differing unit types, which is a weird statement).
88c1053 to
c4f88a3
Compare
This takes a crack at what it will look like to export metrics to lang.
DRAFT since it's still subject to change, though the existing implementation being backed by the new interface does work.