Add Exemplar support to Metrics proto #159
Closing in favor of #162 (which implements exemplars).
Updated to not make structural changes and to add raw_value_data_points. Still need to specify the type of RawValue in some way, which will be unblocked after #168.
@@ -104,6 +133,7 @@ message Metric {
  repeated DoubleDataPoint double_data_points = 3;
  repeated HistogramDataPoint histogram_data_points = 4;
  repeated SummaryDataPoint summary_data_points = 5;
  repeated RawValue raw_data_points = 6;
I think this should be removed for the moment. This is not an exemplar; it supports "RawMeasurements", which is out of scope for this PR.
removed
The generated code was removed, so please remove that as well. Please also rebase the PR.
message RawValue {
  // The set of labels that were dropped by the aggregator, but recorded
  // alongside the original measurement. Only labels that were dropped by the aggregator should be included
  repeated opentelemetry.proto.common.v1.StringKeyValue labels = 1;
What is the relationship between these labels and the labels in the DataPoint:
- Do we duplicate them?
- Do we extract these labels, so that the actual set of labels is the combination of these + datapoint.labels?
For exemplars, these labels are only the labels not included in the DataPoint's labels. I would change this field to be dropped_labels, but if RawValue is used as a data point itself, the labels would include all labels.
@jmacd I know you tried to share the messages, can you help define the behavior here?
I think you recommended calling these "dropped_labels". This sounds good to me.
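To illustrate the semantics discussed in this thread, here is a minimal Python sketch. It assumes plain dicts stand in for the proto label lists; the function name `full_label_set` and the sample labels are hypothetical and are not part of the proto. It shows how a consumer could rebuild the labels of the original measurement when an exemplar carries only the dropped labels.

```python
def full_label_set(point_labels, dropped_labels):
    """Rebuild the labels of the original measurement.

    Assumes the two sets are disjoint: the exemplar carries only the
    labels that the aggregator dropped from the data point.
    """
    overlap = point_labels.keys() & dropped_labels.keys()
    if overlap:
        raise ValueError(f"labels duplicated in exemplar: {sorted(overlap)}")
    # Kept labels come from the data point, the rest from the exemplar.
    return {**point_labels, **dropped_labels}


point_labels = {"method": "GET"}      # kept by the aggregator (hypothetical)
dropped_labels = {"user_id": "1234"}  # dropped, recorded on the exemplar
merged = full_label_set(point_labels, dropped_labels)
```

Under this reading nothing is duplicated: the kept labels appear once on the data point, and each exemplar adds only what was dropped.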
// (Optional) List of exemplars collected from
// measurements that were used to form the data point
repeated RawValue exemplars = 7;
Would this just be a list of random samples from the whole window? Open question in the OTEP:
We don’t have a strong grasp on how the sketch aggregator works in terms of implementation - so we don’t have enough information to design how exemplars should work properly.
The proto does not define how the exemplars were sampled. I'm not sure I understand your question?
Nvm, I see now
At the recent OTLP discussion meeting, we agreed to remove the sample_count field from the current proposal. We also agreed to move the Exemplars into the DataPoints so that they can refer to dropped labels, not include full label sets in each point.
@bogdandrutu does this sound right to you, for now?
> We also agreed to move the Exemplars into the DataPoints so that they can refer to dropped labels, not include full label sets in each point.

I think we agreed that you would evaluate what is better for performance/semantics:
- Having a repeated RawValue exemplars in the Metric that applies to all data points (user may need to do another remapping to every data point) vs. repeated RawValue exemplars in every point (if we go with every point, then "dropped_labels" is a better name for that).

My point is that I don't have a strong opinion between the two, and I was trying to get you to investigate and decide which way to go. Here are my thoughts:
- Having repeated RawValue exemplars in the Metric:
  - Pros:
    - Saves some memory in the internal representation (the per-point option has an extra 24 bytes per data point).
    - The same message may be reusable for raw measurements, because the labels don't represent dropped labels.
  - Cons:
    - Duplicate labels on the wire.
    - User needs to re-map every exemplar to the data point by doing the labels matching.

I feel the cons are more "significant" than the pros, so personally I would go with exemplars in every DataPoint as you suggested.
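The "re-map every exemplar" cost can be made concrete with a small Python sketch. It assumes dicts stand in for the proto messages and that all data points of a metric share the same label keys; `remap_exemplars` and the sample data are hypothetical names used only for illustration. With exemplars stored at the Metric level, each one carries its full label set and the consumer has to match it back to a data point:

```python
def remap_exemplars(data_points, exemplars):
    """Group metric-level exemplars by the data point they belong to.

    Assumes every data point of the metric uses the same label keys.
    Returns a list parallel to data_points.
    """
    keys = set(data_points[0]["labels"]) if data_points else set()
    # Index data points by their hashable label set.
    index = {frozenset(p["labels"].items()): i for i, p in enumerate(data_points)}
    grouped = [[] for _ in data_points]
    for ex in exemplars:
        # Project the exemplar's full labels onto the keys kept by the aggregator.
        projected = frozenset((k, v) for k, v in ex["labels"].items() if k in keys)
        i = index.get(projected)
        if i is not None:
            grouped[i].append(ex)
    return grouped


data_points = [
    {"labels": {"method": "GET"}, "value": 12},
    {"labels": {"method": "POST"}, "value": 3},
]
exemplars = [
    {"labels": {"method": "GET", "user_id": "1234"}, "value": 7},
    {"labels": {"method": "POST", "user_id": "99"}, "value": 3},
    {"labels": {"method": "GET", "user_id": "42"}, "value": 5},
]
grouped = remap_exemplars(data_points, exemplars)
```

With exemplars nested in every data point, this matching step is unnecessary, and the kept labels (here `method`) are not repeated on each exemplar.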
If we go with exemplars in every DataPoint I would say to rename the message to "Exemplar" :)
This discussion makes me want a way to intern label sets to avoid the "re-map every exemplar" problem.
I don't feel inclined to invest time in this now, so we should probably choose "repeated RawValue exemplars in every point".
> This discussion makes me want a way to intern label sets to avoid the "re-map every exemplar" problem.

Even with an "intern label" you still need to map every exemplar to a point (that mapping may be faster if we have an "intern label", but it still needs some work).
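As a rough illustration of this point, here is a Python sketch of what label-set interning might look like. Nothing here is proposed in the PR: the `LabelSetTable` class, the `label_set` field, and the sample data are all hypothetical. Points and exemplars refer to a shared label set by integer id, so the mapping becomes an integer lookup instead of label matching, but the mapping step itself remains.

```python
class LabelSetTable:
    """Hypothetical table that stores each distinct label set once."""

    def __init__(self):
        self._ids = {}
        self.sets = []

    def intern(self, labels):
        """Return a stable integer id for this label set."""
        key = frozenset(labels.items())
        if key not in self._ids:
            self._ids[key] = len(self.sets)
            self.sets.append(dict(labels))
        return self._ids[key]


table = LabelSetTable()
get_id = table.intern({"method": "GET"})
post_id = table.intern({"method": "POST"})
same_id = table.intern({"method": "GET"})  # already interned, same id

points = [
    {"label_set": get_id, "value": 12},
    {"label_set": post_id, "value": 3},
]
exemplars = [
    {"label_set": get_id, "dropped_labels": {"user_id": "1234"}, "value": 7},
    {"label_set": post_id, "dropped_labels": {"user_id": "99"}, "value": 3},
]

# The mapping is now an integer lookup, but it still has to be done.
by_label_set = {p["label_set"]: i for i, p in enumerate(points)}
values_per_point = [[] for _ in points]
for ex in exemplars:
    values_per_point[by_label_set[ex["label_set"]]].append(ex["value"])
```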
I left a lengthy remark on this topic here: I am worried that my request for @cnnradams to explore and implement statistical sampling for exemplars has led to some confusion, and (with my apologies) I am willing to omit it, but as noted in the comment, there are many related questions and even if we take the statistical question out of it, we're left with tough questions.
@cnnradams I know your internship is over, but let us know if you're willing to make these changes.
sure. from reading the discussion, it seems like the only things I need to change are
done all 3.
🚀
* Add exemplars to proto
* handle just exemplars, nit fixes
* comments
* rawvalue -> exemplar, remove sample_count
Adds support for OTEP#113. This should also handle duplicate labels in exemplars by bringing labels up a level and making exemplars only hold additional_labels.
Proto question (since I'm new to this): I defined a measurement_type enum for the RawValue type, but couldn't create a new enum with just INT64 and DOUBLE (because I can't share names with other enums). So right now I'm using Type, which means things other than INT64 and DOUBLE can be picked for measurement_type. What is the right solution for this?