Expand event-model description to more clearly delineate instrument v… #1614

Merged (12 commits) on Apr 22, 2021
48 changes: 26 additions & 22 deletions specification/metrics/datamodel.md
@@ -146,31 +146,35 @@ OpenTelemetry fragments metrics into three interacting models:

### Event Model

This specification uses as its foundation a
[Metrics API consisting of 6 model instruments](api.md), each having distinct
semantics, that were prototyped in several OpenTelemetry SDKs between July 2019
and June 2020. The model instruments and their specific use-cases are meant to
anchor our understanding of the OpenTelemetry data model and are divided into
three categories:

- Synchronous vs. Asynchronous. The act of calling a Metrics API in a
synchronous context means the application/library calls the SDK, typically having
associated trace context and baggage; an Asynchronous instrument is called at
collection time, through a callback, and lacks context.
- Adding vs. Grouping. Whereas adding instruments express a sum, grouping
instruments characterize a group of measurements. The numbers passed to adding
instruments define division, in the algebraic sense, while the numbers passed
to grouping instruments are generally not. Adding instrument values are always
parts of a sum, while grouping instrument values are individual measurements.
- Monotonic vs. Non-Monotonic. The adding instruments are categorized by whether
the derivative of the quantity they express is non-negative. Monotonic
instruments are primarily useful for monitoring a rate value, whereas
non-monotonic instruments are primarily useful for monitoring a total value.
The event model is where recording of data happens. Its foundation is made of
[Instruments](api.md), which are used to record data observations via events.
These raw events are then transformed in some fashion before being sent to some
other system. OpenTelemetry metrics are designed such that the same instrument
and events can be used in different ways to generate metric streams.

![Events → Streams](img/model-event-layer.png)
Member

Does a total_latency aggregation ever make sense? It may not be the best idea to include an example like this.

Contributor Author

I do want an example that shows how a single instrument could turn into all of Sum, Histogram + Gauge.

While I can synthesize harebrained scenarios where I think "total latency" might make sense (e.g. ridiculous statistics like "how many days have users waited for our website in aggregate"), you're right that it's not a super useful example. I'm going to take a day to brainstorm and look for other ideas on this example.

Contributor

If the example was a count (e.g., request_size), then max, sum, and histogram outputs all make sense.

However, one drawback with this example (sum alongside a histogram) is that a histogram data point already contains a sum, so exposing a separate metric with the sum is not very useful. The max function makes a better example: you could export the maximum over [1m], [10m], [1hr], and so on. (Related: open-telemetry/opentelemetry-proto#279)

The example could export one histogram as metric_by_a_and_b{attributeA,attributeB} and one histogram as metric_by_a_and_c{attributeA,attributeC}, maybe? I would emphasize that when doing this kind of output, separate metric names MUST be used to avoid metric data being recombined with itself.

Contributor Author

I updated the example to use request size. I agree that a histogram already has a sum. I've also added some caveats to the image to denote that it's meant to show the power, not a practical scenario.

PTAL at the new verbiage. Also, I don't want to throw the baby out with the bathwater here. Look at the whole context of the doc as an introduction to the space of metrics. We need to both:

  • Give people a mental model of why/how Instruments differ from Metric.
  • Give people a grounding in what mapping from Event => Metric Stream looks like and the decisions that need to be made.

We do not need to specify the Otel API or its behavior here. We only need to specify what concepts are allowed in OTel metric streams. Lots of these nuances and details belong in the API docs.

I see these docs serving two users primarily:

  • API authors looking to generate OTLP by mapping their Events/Instruments into streams.
  • Exporter authors looking to consume OTLP by mapping streams into their backend timeseries.

Instrumentation users who are generating metrics should be using the API specification.


Even though observation events could be reported directly to a backend, in
practice this would be infeasible due to the sheer volume of data used in
observability systems and the limited network/CPU resources available for
telemetry collection. The best example of this is the Histogram metric, where
raw events are recorded in a compressed format rather than as individual
timeseries.
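
As a rough illustration of that compression, the sketch below folds any number of raw observations into a fixed set of bucket counts plus a count and a sum. The `HistogramPoint` type, its field names, and the bucket boundaries are hypothetical and chosen only for this example; they are not the OpenTelemetry API or the OTLP wire format.

```python
from bisect import bisect_left
from dataclasses import dataclass, field
from typing import Iterable, List


@dataclass
class HistogramPoint:
    """Hypothetical compressed summary of many raw observations."""
    bounds: List[float]  # ascending upper bucket boundaries
    bucket_counts: List[int] = field(default_factory=list)
    count: int = 0
    total: float = 0.0

    def __post_init__(self) -> None:
        # One bucket per boundary, plus a final overflow bucket.
        self.bucket_counts = [0] * (len(self.bounds) + 1)

    def record(self, value: float) -> None:
        # bisect_left places a value equal to a boundary in that
        # boundary's bucket, so each bucket is upper-inclusive.
        self.bucket_counts[bisect_left(self.bounds, value)] += 1
        self.count += 1
        self.total += value


def summarize(events: Iterable[float], bounds: Iterable[float]) -> HistogramPoint:
    """Fold raw events into a single fixed-size histogram point."""
    point = HistogramPoint(bounds=list(bounds))
    for value in events:
        point.record(value)
    return point
```

However many events are recorded, the exported point keeps the same size (`len(bounds) + 1` counters, a count, and a sum), which is what makes it cheaper to ship than the raw events.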

> Note: The above picture shows how one instrument can transform events into
> more than one type of metric stream. There are caveats and nuances for when
> and how to do this. Instrument and metric configuration are outlined
> in the [metrics API specification](api.md).
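
To make the idea of one instrument feeding several metric streams concrete, here is a minimal sketch that aggregates the same raw events three different ways. The event shape, the `route` attribute, and the metric names are all assumptions invented for this illustration; it is meant to show what is possible, not to recommend a practical configuration.

```python
from collections import defaultdict

# Hypothetical raw events from a single "request_size" instrument:
# (attributes, observed size in bytes).
EVENTS = [
    ({"route": "/a"}, 120),
    ({"route": "/a"}, 300),
    ({"route": "/b"}, 50),
]


def aggregate(events, key, combine, initial):
    """Fold events into one value per distinct value of attribute `key`."""
    out = defaultdict(lambda: initial)
    for attrs, value in events:
        out[attrs[key]] = combine(out[attrs[key]], value)
    return dict(out)


def build_streams(events):
    # Each output uses its own (hypothetical) metric name so that data
    # aggregated in different ways is never recombined downstream.
    return {
        "request_size_sum": aggregate(events, "route", lambda a, b: a + b, 0),
        # An initial value of 0 assumes sizes are never negative.
        "request_size_max": aggregate(events, "route", max, 0),
        "request_count": aggregate(events, "route", lambda a, _: a + 1, 0),
    }
```

Calling `build_streams(EVENTS)` returns three independently named outputs derived from the same three events, one per aggregation.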

While OpenTelemetry provides flexibility in how instruments can be transformed
into metric streams, the instruments are defined such that a reasonable default
mapping can be provided. The exact
[OpenTelemetry instruments](api.md#metric-instruments) are more fully
detailed in the API specification.

In the Event model, the primary data are (instrument, number) points, originally
observed in real time or on demand (for the synchronous and asynchronous cases,
respectively). The instruments and model use-cases will be described in greater
detail as we link the event model with the other two.
respectively).
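
The difference between the two cases can be sketched as follows: a synchronous observation produces an (instrument, number) point at the moment the application calls in, while an asynchronous one is produced only when collection runs a registered callback. The `Event` and `Recorder` names and their methods are hypothetical stand-ins for this sketch, not the OpenTelemetry API.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass(frozen=True)
class Event:
    """A single (instrument, number) point."""
    instrument: str
    value: float


class Recorder:
    """Hypothetical recorder that buffers events until collection."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._callbacks: List[Tuple[str, Callable[[], float]]] = []

    def record(self, instrument: str, value: float) -> None:
        # Synchronous: observed in real time, when the application calls.
        self._events.append(Event(instrument, value))

    def register_callback(self, instrument: str, fn: Callable[[], float]) -> None:
        # Asynchronous: nothing is observed until collect() runs.
        self._callbacks.append((instrument, fn))

    def collect(self) -> List[Event]:
        # On demand: invoke each callback once, then drain the buffer.
        for instrument, fn in self._callbacks:
            self._events.append(Event(instrument, fn()))
        collected, self._events = self._events, []
        return collected
```

In this sketch a synchronous event appears in exactly one collection, whereas a callback contributes a fresh point to every collection.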

### Timeseries Model

Binary file added specification/metrics/img/model-event-layer.png
Binary file modified specification/metrics/img/model-layers.png