Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Metric standard SDK behavior specification #259

Closed
jmacd opened this issue Sep 24, 2019 · 15 comments
Closed

Metric standard SDK behavior specification #259

jmacd opened this issue Sep 24, 2019 · 15 comments
Assignees
Labels
area:sdk Related to the SDK
Milestone

Comments

@jmacd
Copy link
Contributor

jmacd commented Sep 24, 2019

In #250 we've tried to steer away from specifying the behavior of the library (as there are so many possibilities) and instead focus on the semantics of the events.

It's become clear that people are confused about the behavior of the SDK because it's literally unspecified as part of the API. I believe we should prepare a companion specification to explain what the initial OpenTelemetry SDK is targeting for metrics export. This could go as far as describing an API to configure views and aggregations, or a YAML syntax to support standard configuration.

For a prometheus exporter, we need to decide on certain behaviors that are "not semantic" and therefore haven't been described in the API specification. For each of the 6 sub-kinds of metric, we need to choose an export type for Prometheus. For example, our monotonic counter becomes a prometheus counter, but our non-monotonic counter needs a firm mapping. For measure metrics, I believe we should choose a histogram metric export w/ default bucket ranges. I could imagine making the choice of Histogram vs. Summary metric be an option for the API or YAML syntax mentioned above, with support for configurable bucket ranges and configurable quantiles.

@bogdandrutu
Copy link
Member

I am confused "non-monotonic counter needs a firm mapping", this is the Prometheus Gauge and probably that's why people are also confused about our definition of Gauge.

@jmacd
Copy link
Contributor Author

jmacd commented Sep 25, 2019

  1. OTel monotonic Counter->Prometheus Counter
  2. OTel non-monotonic Counter->Prometheus Gauge using its Add() and Dec() methods
  3. OTel monotonic Gauge->Prometheus Gauge using its Set() method (?)
  4. OTel non-monotonic Gauge->Prometheus Gauge using its Set() method
  5. OTel non-negative Measure->Prometheus Histogram
  6. OTel signed Measure->(not sure)

@bogdandrutu what do you think about case 3?

@bogdandrutu
Copy link
Member

  1. Observer non-monotonic -> Prometheus Gauge
  2. Observer monotonic -> Prometheus Counter

For 3 that is a Prometheus Counter.

Everything that is monotonic is a Counter in Prometheus. We have two options to implement keep the last value and record delta if we want to use their main API or implement our own registry in Prometheus that gathers the right metricsfamilies.

@bogdandrutu
Copy link
Member

Also Prometheus is moving to a non-negative histogram/summary based on OpenMetrics.

@jmacd
Copy link
Contributor Author

jmacd commented Sep 25, 2019

"For 3 that is a Prometheus Counter". The problem is that Prometheus Counter doesn't support Set(), only Add(). To turn a monotonic Gauge into a Prometheus Counter we have to remember the last value.

@bogdandrutu
Copy link
Member

Correct, but we need to understand the difference between API and data model in Prometheus. From data model perspective that is a Counter and in order to achieve this as I mentioned in the previous post we can use a Registry instead of using directly the API. This is how they also do it for another counter cpu-usage.

@bogdandrutu
Copy link
Member

Sorry I used the wrong term "Registry". We will use Collector

Here is an example for the cpu-usage and few other metrics: https://github.com/prometheus/client_golang/blob/3525612fea19680dd3d49c768802ec082ca853b2/prometheus/process_collector.go

@jmacd
Copy link
Contributor Author

jmacd commented Sep 25, 2019

Cool, this makes sense.

@bogdandrutu
Copy link
Member

Here is a table that tries to summarize the discussion so far:

OpenTelemetry Prometheus
Monotonic Counter Counter
Non-Monotonic Counter Gauge
Monotonic Gauge Counter
Non-Monotonic Gauge Gauge
Monotonic Observer Counter
Non-Monotonic Observer Gauge
Signed Measure Histogram or Summary
Non-Negative Measure Histogram or Summary

@SergeyKanzhelev SergeyKanzhelev added this to the Alpha v0.2 milestone Sep 27, 2019
@c24t
Copy link
Member

c24t commented Oct 3, 2019

I believe we should prepare a companion specification to explain what the initial OpenTelemetry SDK is targeting for metrics export

It sounds to me like this should just be part of the spec, in the same way that sdk-tracing describes the span exporter interface now.

If the argument for keeping aggregation and export logic out of the API package is the same as for tracing then it seems like the spec would be incomplete without a description of the SDK. We've effectively got two APIs, one for library devs in the "API" package, and another for vendors in the "SDK" package.

@jmacd
Copy link
Contributor Author

jmacd commented Oct 4, 2019

I've posted a prototype metrics SDK to help with this issue.
open-telemetry/opentelemetry-go#172

@bogdandrutu bogdandrutu added the area:sdk Related to the SDK label Oct 10, 2019
@bogdandrutu bogdandrutu modified the milestones: Alpha v0.2, Alpha v0.3 Oct 10, 2019
@jmacd
Copy link
Contributor Author

jmacd commented Oct 31, 2019

I've posted a complete metric exporter:
open-telemetry/opentelemetry-go#265

@dyladan
Copy link
Member

dyladan commented Nov 12, 2019

JS now also has a complete metric exporter open-telemetry/opentelemetry-js#483

"For 3 that is a Prometheus Counter". The problem is that Prometheus Counter doesn't support Set(), only Add(). To turn a monotonic Gauge into a Prometheus Counter we have to remember the last value.

In the JS version, if it is a counter we just destroy and recreate it with the same name and set the new value. There is no internal ID so the prometheus backend has no way of knowing that we are resetting the counter each time it changes and it "just works".

@dyladan
Copy link
Member

dyladan commented Nov 21, 2019

@mayurkale22 asked that I post a summary of our discussions in the JS SIG here.

It is my opinion that we should be only mapping to a different data type in cases where it causes issues. Too many behind the scenes changes could be confusing for the users.

For gauges and counters I would suggest the following mapping, see open-telemetry/opentelemetry-js#553 (comment) for context:

OT Internal Type Export Type
Monotonic Counter Counter
Non-Monotonic Counter Gauge
Monotonic Gauge Gauge
Non-Monotonic Gauge Gauge

See open-telemetry/opentelemetry-js#561 for an example of this behavior.

@jmacd
Copy link
Contributor Author

jmacd commented Dec 10, 2019

This issue is dangerously ambiguous and covers many topics. For the time being, note that #347 contains the current draft of the metrics SDK specification that was requested in this issue. I will close this issue.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:sdk Related to the SDK
Projects
None yet
Development

No branches or pull requests

5 participants