-
Notifications
You must be signed in to change notification settings - Fork 164
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add metrics view API #89
Conversation
I will do my best to make the metrics sig meeting on the 11th where this might be easier to discuss, but my first issue with this proposal is simply allowing Having an explicit Counter instrument type allows for optimizations that are lost if the Counter has to be recorded the same as a Measure, where within a collection interval all values are recorded and then aggregated. If the user wants this I'd expect them to use a Measure and if they want a Sum they'd use a Counter, something that can rely on high performance counters on some runtimes. |
The values should actually be continuously aggregated, so we update the aggregate sum with each new measurement. Other aggregations would work this way too, e.g. a histogram aggregation would update a bucket count for each new recorded value, and wouldn't keep a reference to the measurement. Only the "exact" aggregation would store measurements. It is odd that the view API lets you register a non-sum aggregation for a counter instrument though. What do you recommend here? |
I would just remove the non-sum aggregation from the view API for counter instrument. |
|
||
The potentially contentious changes this OTEP proposes include: | ||
|
||
- Require views to be registered: don't record measurements from API for metric instruments that don't appear in any view. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If views are registered by the operator/application developer, does this mean that that person will get no metric data if they don't add additional configuration? How can they know a priori what metrics they they need to register views for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The way I've done it is each library exports a function with predefined views. In the simplest case the application developer either enables them or not, while still having the ability to instead define their own, or take a subset of the default views a library exports.
So the operator/application developer has to allow metrics from each library, instead of them being automatic and having to have a way to disable what you don't want.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm now thinking of the auto-instrumentation projects, where there may be dozens of libraries being instrumented that the operator may or may not be familiar with. This is going to be a very surprising thing for "normal" auto-instrumentation agent users to have to do, as they are used to getting things OOTB without extra configuration. Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this mean that that person will get no metric data if they don't add additional configuration?
It does!
Would the auto-instrumentation agent then be expected to just register all the defaults provided with the instrumented libraries?
That's what I would expect, and IIRC it's what we did in OC. For example, the predefined default views for gRPC are here: https://github.com/census-instrumentation/opencensus-java/blob/8b1fd5bbf98b27d0ad27394891e0c64c1171cb2b/contrib/grpc_metrics/src/main/java/io/opencensus/contrib/grpc/metrics/RpcViewConstants.java.
Auto-instrumentation could enable all these views when it loads the gRPC integration.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@trask @prydin @tylerbenson thoughts on this? It definitely seems on the surface like it would make the auto-instrumentation job significantly more difficult. Especially since auto-instrumentation still has to support multiple backends with potentially different aggregations/metric types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@c24t This only seems viable to me if there's a way for an exporter author to say (for example) "for everything that's a Measure, create a view with a MinMaxSumCount aggregation.". Otherwise, I feel like this is a recipe for users thinking that nothing is working, causing a support nightmare.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like a good solution to me, and since measures already have unique names I don't see any reason in principle that we couldn't do this.
@bogdandrutu might be able to shed some light on the design of OC views.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I added a note about this in 46948ac.
Does this proposed Views API live at the same level as the other APIs, or is it a default-SDK-only thing? I asked, because this would be the first "API" that didn't target instrumentors, and instead targets operators/app developers. It feels like that will lead to confusion (it confuses me!) from the user base, especially the instrumentation-building user. |
Views would live in the API package. Developers (as I understand the persona) would use metric instruments in application code, operators (same caveat) would specify views in either static config or code. Aggregators are SDK objects, and a view might rely on some third party package to define a particular aggregator class. |
Yes I believe the comment was made before the discussion in the SIG meeting. You are correct. |
|
||
Each view describes a relationship between a metric instrument and an aggregation, and specifies the set of label keys to preserve in the aggregated data. | ||
|
||
Views are tracked in a registry, which exporters may use to get the current aggregate values for all registered aggregations at export time. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is Views a global thing, and hence must be registered with MeterProvider, so that all meters created the MeterProvider will get access to View.?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Additional thought around this: would it be valuable to enable specific aggregations to be tied to specific exporters?
I feel like this is another reason why aggregations should be tied to a MeterProvider. Then you can choose different meterprovider sets for different metrics, as necessary.
Something like:
MeterProvider
- exporters
- aggregators
- metrics (this may be unnecessary)
Because we don't require the user to specify the set of label keys up front, and because we don't prevent users from recording measurements with missing labels in the API, some label values may be undefined. | ||
Aggregators should preserve undefined label values, and exporters may convert them as required by the backend. | ||
|
||
For example, consider a Sum-aggregated Counter instrument that captures four consecutive measurements: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also add a note about boundinstruments and views? Boundinstruments are supposed to be the fastest way to record a value, as it avoids a lookup for the time series. With Views, bound instruments should still be the fastest, and should not be doing lookups.
This maybe implementation detail, but calling it out in spec as well.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @c24t for getting this started!
Have left some comments - mostly after looking at Python view prototype, and .NET prototype I am working on.
Most important things I'd like to see addressed:
-
Where is Views registered? MeterProvider? And Should we allow views to added/removed after MeterProvider is initialized?
-
Recommendation about View being configured though static files/json etc.
-
Default behavior if no Views registered. I believe the consensus is to do default aggregation and retain all labels, unless a user configured View overrides this.
-
Should Views be part of SDK only?
|
||
The aggregation may be configured with options specific to its type. | ||
For example, a _Histogram_ aggregation may be configured with bucket boundaries, e.g. `{0, 1, 10, 100, 200, 1000, inf}`. | ||
A _Sketch_ aggregation that estimates order statistics (i.e. quantiles), may be configured with a set of predetermined quantiles, e.g. `{.5, .95, .99, 1.00}`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could this be called "quantile" aggregation? A sketch doesn't immediately make me think of quantiles.
Implementations should refuse to register two views with the same name. | ||
|
||
Note that a view does not describe the type (e.g. `int`, `float`) or unit of measurement (e.g. "bytes", "milliseconds") of the metric to be exported. | ||
The unit is determined by the metric instrument, and the aggregation and exporter may preserve or change the unit. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It seems like there's only two operations which are appropriate for a unit modification on a metric instrument:
- preserve the unit
- change the unit entirely (e.g. to count)
I feel like it may be valuable to call out that there won't be a modification such as "seconds" to "milliseconds". But maybe that's just obvious.
- A **float**-valued _mean_ metric with "ms" units | ||
- An **int** valued, unitless (i.e. unit "1") _count_ metric | ||
|
||
This OTEP doesn't propose a particular API for Aggregators, just that the API is sufficient for exporters to get all this information, including: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should the otep recommend an API? reading the examples, it seems like something that may work would be (name, value, unit). I believe it seems it satisfies the examples below.
min([1, 2, 3, 4, 5, 6]) = min([min([1]), min([2, 3]), min([4, 5, 6])]) | ||
``` | ||
|
||
but quantile aggregations, e.g. _p95_ and _p99_ are not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this mean that quantiles are not supported as an aggregation? There's a strong need for quantiles, so I assume there's some way to ensure that it is measured and exported, but not clear how.
but quantile aggregations, e.g. _p95_ and _p99_ are not. | ||
Applications that export quantile metrics should use a mergeable aggregations such as [DDSketch](https://arxiv.org/abs/1908.10693), which estimates quantile values with bounded errors, or export raw measurements without aggregation and compute exact quantiles on the backend. | ||
|
||
We require aggregations to be mergeable so that they produce the same results regardless of the collection interval, or the number of collection events per export interval. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess again looking at quantiles as the edge case: what's the need for ensuring that the produced result are the same, regardless of collection interval? Is there some caveat where the collection interval cannot be guaranteed? I guess in current API design it's in the pushController side of things.
|
||
Every measurement is associated with a _LabelSet_, a set of key-value pairs that describes the environment in which the measurement was captured. | ||
Label keys and values may be extracted from the [correlation context](https://github.com/open-telemetry/opentelemetry-specification/blob/master/specification/api-correlationcontext.md) at the time of measurement. | ||
They may also be set by the user [at the time of measurement](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#direct-instrument-calling-convention) or at the time at which [the metric instrument is bound](https://github.com/open-telemetry/opentelemetry-specification/blob/f4aeb318a5b77c9c39132a8cbc5d995e222d124f/specification/api-metrics-user.md#bound-instrument-calling-convention). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How specifically are values from the correlation context piped in to the labelset? I figured that the user would still be responsible for that, but as this is calling out label extraction as a separate line, it sounds like that's not the case.
My opinions on this:
This OTEP seems to range far and wide outside of the scope for a SDK-only basic views API that I think we should shoot for for GA. |
Closing this as obsolete. |
Excuse me, it's unclear what's the current vision on metrics aggregation. Are there other OTEPs with the same target? |
This OTEP addresses the Future Work: Configurable Aggregations / View API section of api-metrics.md. It proposes a specification for a view API that allows users to configure aggregations for individual metric instruments.
It covers some of the same ground as open-telemetry/opentelemetry-specification#347, and summarizes a few weeks of conversation in metrics SIG meetings.
See also the google doc draft.