Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exemplar clarifications #2421

Closed
Tracked by #3756
jack-berg opened this issue Mar 17, 2022 · 13 comments · Fixed by #3760
Closed
Tracked by #3756

Exemplar clarifications #2421

jack-berg opened this issue Mar 17, 2022 · 13 comments · Fixed by #3760
Assignees
Labels
area:sdk Related to the SDK release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:metrics Related to the specification/metrics directory

Comments

@jack-berg
Copy link
Member

Doing a pass on the java exemplar implementation and found some places in the spec that need additional clarification IMO:

More clearly define built-in exemplar filters
The spec says to "See Defaults and Configuration for built-in filters", which eventually leads to a list of "none", "all", and "with_sampled_trace". These should be defined in the metric SDK document for improved clarify.

Define how an ExemplarFilter gets configured in the SDK
The Java SDK currently allows ExemplarFilter to be configured on a meter provider instance, which is presumably the right place, but that isn't explicitly stated anywhere.

Clarify whether views can configure exemplar reservoirs
There's a TODO that states "after we release the initial Stable version of Metrics SDK specification, we will explore how to allow configuring custom ExemplarReservoirs with the View API.", yet the view section specifies that exemplar reservoirs can optionally be specified.

Is the intention that views can be configured with variations of the built-in reservoirs, but custom reservoirs are disallowed for the time being?

Define the default size of SimpleFixedSizeExemplarReservoir
If exemplars are enabled, SimpleFixedSizeExemplarReservoir is used by all aggregations except explicit bucket histogram, yet the default size of the reservoir is left unspecified.

@jack-berg jack-berg added the spec:metrics Related to the specification/metrics directory label Mar 17, 2022
@reyang reyang added area:sdk Related to the SDK release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs labels Mar 18, 2022
@cijothomas
Copy link
Member

@jack-berg #2919 has addressed the 1st part, that can be cut now.

@MrAlias
Copy link
Contributor

MrAlias commented Mar 8, 2023

Regarding point 2 and 3, this statement from the specification needs to be clarified:

A Metric SDK MUST provide a mechanism to sample Exemplars from measurements via the ExemplarFilter and ExemplarReservoir hooks.

@MrAlias
Copy link
Contributor

MrAlias commented Mar 8, 2023

The Java SDK currently allows ExemplarFilter to be configured on a meter provider instance, which is presumably the right place, but that isn't explicitly stated anywhere.

That sounds appropriate to me. I could also see in addition to this having views set an override.

@MrAlias
Copy link
Contributor

MrAlias commented Mar 8, 2023

yet the view section specifies that exemplar reservoirs can optionally be specified.

That is indeed confusing given there are only two option and configuring them differently would lead to incorrect behavior.

Is the intention that views can be configured with variations of the built-in reservoirs, but custom reservoirs are disallowed for the time being?

If we can verify the reservoir definition is mostly stable and extensible in a backwards compatible manner I think allowing users to provide their own would be a big benefit. This space is open to a lot of complex and tailored algorithms that I expect users want to include.

@MrAlias
Copy link
Contributor

MrAlias commented Apr 23, 2023

The scope of the reservoir sampling needs to also be clarified.

It is not explicitly stated, but it seems safe to assume an exemplar reservoir is scoped to a single instrument and not the entire SDK. Right?

Within the scope of an instrument, is an exemplar reservoir sampling scoped by attributes? For example, if an instrument measures a value with attributes {"user" -> "Alice"} and another measurement with attributes {"user" -> "Bob", "admin" -> true} are both measurements sampled by the same reservoir, or are they each sampled by their own reservoir?

Based on the Java implementation, I think the reservoir scope is all attribute sets across an instrument. However, the OTLP data model makes me wonder if that is correct. Messages like the NumberDataPoint contains a set of exemplars that all apply to same attribute set that scope NumberDataPoint.

@jack-berg
Copy link
Member Author

Based on the Java implementation, I think the reservoir scope is all attribute sets across an instrument.

Its a bit hard to see in the source code, but the java SDK does actually create a reservoir for each unique set of attributes.

@MrAlias
Copy link
Contributor

MrAlias commented Apr 24, 2023

Its a bit hard to see in the source code, but the java SDK does actually create a reservoir for each unique set of attributes.

Ah, gotcha.

So if you have a fixed size exemplar reservoir that samples N exemplars, each unique set of attributes will have N exemplars (assuming more than N measurements are made for each)?

Is this the behavior we want? I like this strategy because it is easier to implement, but I worry it is going to generate a lot of exemplar data, right?

@jack-berg
Copy link
Member Author

So if you have a fixed size exemplar reservoir that samples N exemplars, each unique set of attributes will have N exemplars (assuming more than N measurements are made for each)?

Correct.

Is this the behavior we want? I like this strategy because it is easier to implement, but I worry it is going to generate a lot of exemplar data, right?

Depends on the size of N. In java, we choose N to be equal to the number of available processors. This decision isn't specified anywhere, but it was included (I believe by @jsuereth) in the initial exemplar implementation and has stuck.

This is typically smaller than the number of exemplars for histograms, which is one per bucket.

Not sure what exactly I would expect the default to be. I suppose I would expect each unique set of attributes to typically receive at least on the order of hundreds of measurements for collection period. (Given that the default interval for PMR is 30s, 100 measurements would be >= 3.3 / second.) I think something in the range of 1-10 example measurements seems appropriate.

@MrAlias
Copy link
Contributor

MrAlias commented Apr 24, 2023

I think something in the range of 1-10 example measurements seems appropriate.

Isn't this then multiplied by the number of unique attribute sets though? So if a users is using the random fixed size reservoir with a size of 10, they measure across N unique attribute sets, and they make more than 10 measurements per unique attribute set per collection cycle they will have 10*N exemplars.

I could see users asking for a random fixed size reservoir with a size of 10 to expect they will get at most 10 exemplars per collection cycle.

@jack-berg
Copy link
Member Author

Isn't this then multiplied by the number of unique attribute sets though?

Yes. 1 seems more appropriate if the number of measurements is around 100-1000. 10 seems more appropriate if the number of measurements is much larger, say 10_000+.

Might be good to specify that the default size of the fixed size reservoir is 1.

@MrAlias
Copy link
Contributor

MrAlias commented Apr 24, 2023

Yes. 1 seems more appropriate if the number of measurements is around 100-1000. 10 seems more appropriate if the number of measurements is much larger, say 10_000+.

Might be good to specify that the default size of the fixed size reservoir is 1.

Yeah, this kind of cardinality issues was what lead me to ask the original scope question. If we define the sampling scope to be the instrument (across all unique attributes) instead a user will have a better ability to set the output size they want.

@MrAlias
Copy link
Contributor

MrAlias commented Apr 24, 2023

If we want to stick with the sampling scope being "per unique attribute", my next question was if we want to keep passing the attributes to the Offer method? It seems like these will always be the same value and it won't be needed if the exemplar filter can continue to handle these.

MrAlias added a commit to MrAlias/opentelemetry-specification that referenced this issue Aug 21, 2023
This reservoir type is used for all aggregations other than a histogram
with more than one bucket. Each attribute set the aggregation records
will have reservoir. Therefore, limiting this to a small value by
default when enabled is preferable.

This does not address the way a user will configure this value. That is
left for a future PR/Issue.

Part of open-telemetry#2421
@jsuereth
Copy link
Contributor

jsuereth commented Nov 3, 2023

Just commenting on:

If we want to stick with the sampling scope being "per unique attribute", my next question was if we want to keep passing the attributes to the Offer method? It seems like these will always be the same value and it won't be needed if the exemplar filter can continue to handle these.

For Java at least (and I suspect it may be true in Go), passing the full set of attributes can lead to more optimal overall throughput.

Specifically:

  • You're not guaranteed to keep any particular exemplar. So if you delay diff-ing, you should diff O(m) attribute sets, where m=size of the reservoir vs O(n) where n=the number of measurements seen.
  • You're likely using a "shared reference" to the attribute set, so you're only paying the cost of a pointer.

jsuereth added a commit to jsuereth/opentelemetry-specification that referenced this issue Nov 10, 2023
@jmacd jmacd closed this as completed in bb3d0a0 Dec 1, 2023
carlosalberto pushed a commit to carlosalberto/opentelemetry-specification that referenced this issue Oct 31, 2024
…n-telemetry#3760)

Fixes open-telemetry#2205
Fixes open-telemetry#3674 
Fixes open-telemetry#3669
Partially fixes open-telemetry#2421

## Changes

- Update example exemplar algorithm to account for initial reservoir
fill
- Update fixed-size defaults to account for memory contention /
optimization in Java impl
- Set a default for exponential histogram aggregation
- Clarify that ExemplarFilter should be configured on MeterProvider
- Make it clear that ONE reservoir is create PER timeseries datapoint
(not one reservoir per view or metric name).
- Allow flexibility in Reservoir `offer` definition based on feedback
from Go impl.

* Related issues open-telemetry#3756

---------

Co-authored-by: David Ashpole <dashpole@google.com>
Co-authored-by: Joshua MacDonald <jmacd@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK release:allowed-for-ga Editorial changes that can still be added before GA since they don't require action by SIGs spec:metrics Related to the specification/metrics directory
Projects
None yet
6 participants