Update SummaryDataPoint percentile comment #127

Merged
merged 4 commits into from
Mar 15, 2020
12 changes: 12 additions & 0 deletions gen/go/metrics/v1/metrics.pb.go

12 changes: 12 additions & 0 deletions opentelemetry/proto/metrics/v1/metrics.proto
@@ -340,6 +340,18 @@ message SummaryDataPoint {
double sum = 5;

// Represents the value at a given percentile of a distribution.
//
// To support the Min and Max values of a MinMaxSumCount aggregation the
// following conventions are used:
Member

I would not make a direct reference to MinMaxSumCount:

To record Min and Max values following conventions are used:
// - The 100th percentile is equivalent to the maximum value observed.
// - The 0th percentile is equivalent to the minimum value observed.

Also, I couldn't find a good source confirming that the 0th and 100th percentiles are mathematically incorrect (some sources say they may not be generally accepted), so I would not make such a strong statement here without good proof.
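
For illustration only, the suggested convention can be sketched in Python as follows. The `ValueAtPercentile` dataclass is a hypothetical stand-in for the generated protobuf message (the `value` field name is an assumption, since only the `percentile` comment appears in this diff), and `min_max_as_percentiles` is an invented helper, not part of any OpenTelemetry API.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class ValueAtPercentile:
    # Hypothetical stand-in for the protobuf message, not the generated class.
    percentile: float  # must be in the interval [0.0, 100.0]
    value: float       # assumed field name for the value at that percentile


def min_max_as_percentiles(observations: Sequence[float]) -> List[ValueAtPercentile]:
    """Encode Min and Max as the 0th and 100th percentile entries."""
    if not observations:
        # Nothing was observed, so there is no Min or Max to report.
        return []
    return [
        ValueAtPercentile(percentile=0.0, value=min(observations)),
        ValueAtPercentile(percentile=100.0, value=max(observations)),
    ]


points = min_max_as_percentiles([3.0, 1.5, 9.0, 4.0])
```

A receiver following the same convention would read the entry at percentile 0.0 as the minimum and the entry at percentile 100.0 as the maximum.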

Contributor Author

I'm not entirely sure why we wouldn't include a reference to the MinMaxSumCount aggregation as OpenTelemetry implementations will expect guidance from the OTLP as to what should be done with this native aggregation. It seems like I've missed the target audience of who will be using this OTLP. Who do you envision using this?

As for the mathematical correctness, from the linked ticket in the comment:

the 0th percentile is the value which 0% of events occurred, whereas the minimum is the minimal value where at least 1 event occurred

The incorrectness follows from the definition of a percentile.

Member

@bogdandrutu bogdandrutu Mar 14, 2020

Where is the source of the comment in the issue?

What I found so far:

So I am not sure where the sentence "0th percentile is mathematically incorrect" comes from.

Member

@MrAlias I don't care that you include a reference to that, but what about DDSketch? Do we include a reference to that as well? Or any other algorithm that can produce percentiles?

Contributor Author

I don't care that you include a reference to that, but what about DDSketch? Do we include a reference to that as well? Or any other algorithm that can produce percentiles?

Ah! Makes sense. I'll remove it.

With regard to the source of the comment in the issue, it was primarily based on my prior work with percentiles, but the Wikipedia article you linked also defines them the same way:

A percentile (or a centile) is a measure used in statistics indicating the value below which a given percentage of observations in a group of observations falls. For example, the 20th percentile is the value (or score) below which 20% of the observations may be found. Similarly, 80% of the observations are found above the 20th percentile.

From that it follows that the 0th percentile is the value below which 0% of the observations may be found, meaning zero events.

Again, I was pulling the definition of a minimum from past work, but as defined by Wikipedia:

In statistics, the sample maximum and sample minimum, also called the largest observation and smallest observation, are the values of the greatest and least elements of a sample.

From this definition it follows that at least one event had this value.
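
To make the two definitions concrete, here is a small Python sketch. It assumes the nearest-rank method as one possible percentile definition (other definitions exist and behave differently at the endpoints), and the helper names are invented for this illustration.

```python
import math
from typing import Sequence


def fraction_strictly_below(observations: Sequence[float], value: float) -> float:
    """Fraction of observations that are strictly less than `value`."""
    return sum(1 for x in observations if x < value) / len(observations)


def nearest_rank_percentile(observations: Sequence[float], p: float) -> float:
    """Nearest-rank percentile: the value at ordinal rank ceil(p/100 * n)."""
    if not 0.0 < p <= 100.0:
        # At p = 0 the rank would be 0, which is not a valid ordinal rank.
        raise ValueError("nearest-rank percentile requires 0 < p <= 100")
    ordered = sorted(observations)
    rank = math.ceil(p / 100.0 * len(ordered))
    return ordered[rank - 1]


data = [1.5, 3.0, 4.0, 9.0]
below_min = fraction_strictly_below(data, min(data))  # 0.0: nothing is below the minimum
below_max = fraction_strictly_below(data, max(data))  # 0.75: the maximum itself is excluded
p100 = nearest_rank_percentile(data, 100.0)           # 9.0: equals the maximum
```

Under this particular method the 100th percentile coincides with the maximum, while the 0th percentile has no defined rank, which is one way to see why treating it as the minimum is a convention rather than a consequence of the definition.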

Contributor Author

I also just realized it might not be so much the correctness of the statement as it is the strength of it which is the main reason you wanted me to take it out. Sorry, I think I missed that on the first read through. Updated to the suggested language.

// - The 100th percentile is equivalent to the maximum value observed
// so can be used as a stand in for the Max value.
// - Set the 0th percentile to the Min value. This is mathematically
// incorrect, but the most suitable way to transmit this data with the
// existing metric kinds.
//
// For more information about work underway to better support the
// MinMaxSumCount aggregation refer to:
// https://github.com/open-telemetry/opentelemetry-proto/issues/125
message ValueAtPercentile {
// The percentile of a distribution. Must be in the interval
// [0.0, 100.0].