Cumulative data points should set StartTimeUnixNano per timeseries #4184
This is important for compatibility between Prometheus and OpenTelemetry going forward. @ArthurSens it might be helpful for folks here to describe what issues this causes when pushing OTLP to Prometheus today.
See also #2232
Is this really a specification issue? I read the spec as saying that implementations should set StartTime to match when the unbroken series of data points started. To me this is an implementation issue.
@bboreham I believe it is a spec issue (my initial analysis of the spec is here). The spec says:
Which just means that the start time is repeated, not that it matches when the series of data points started.
I am referring to this text in the spec:
EDIT: After discussion with @jesusvazquez, I wish to point to this part:
I agree the spec could be more explicit, but using a time hours or days earlier as the "known start time" seems perverse to me.
Current behavior
As mentioned by the Metrics Data Model, the goal of StartTimeUnixNano is to help measure the increase rate of unbroken data points and to identify resets and gaps.
The current implementation of most OTel SDKs sets the value of StartTimeUnixNano to the timestamp representing the start of the instrumented process. This works pretty well when all known data points start being pushed/exposed right at the beginning of the process lifetime, but not so well if previously unknown data points appear some time after the process starts.
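To make the current behavior concrete, here is a minimal sketch (hypothetical code, not an actual SDK API) of how a cumulative data point ends up carrying the process start time for every series:

```go
package main

import (
	"fmt"
	"time"
)

// processStart is captured once, when the process starts; today most OTel
// SDKs reuse this single timestamp for every cumulative series.
var processStart = time.Now()

// numberDataPoint mirrors the relevant fields of an OTLP cumulative point.
type numberDataPoint struct {
	startTimeUnixNano int64
	timeUnixNano      int64
	value             float64
}

// newCumulativePoint builds a point whose start time is always the process
// start, even if this series first appeared long after the process began.
func newCumulativePoint(value float64) numberDataPoint {
	return numberDataPoint{
		startTimeUnixNano: processStart.UnixNano(),
		timeUnixNano:      time.Now().UnixNano(),
		value:             value,
	}
}

func main() {
	p := newCumulativePoint(30) // e.g. 30 requests with status code 404
	fmt.Printf("start=%d now=%d value=%v\n", p.startTimeUnixNano, p.timeUnixNano, p.value)
}
```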
Problem statement
Let me try to explain with an example:
Let's say an application starts at a time T = 0, some HTTP requests are happening, and they are all successful (i.e. status code 200). After 30s (T+30), the first request with status code 404 happens, and it continues to happen every 1s (therefore 1 request per second is the increase rate).
The increase rate can be measured with a formula like the one worked out below. 1 minute after T, HTTP requests with status code 404 would have happened 30 times, but let's see how different the measurement would be if we use T or T+30 as the start time:
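A minimal sketch of the computation, assuming the standard rate formula for a cumulative counter that starts at zero:

rate = value / (TimeUnixNano - StartTimeUnixNano)

Using T as the start time: rate = 30 / 60s = 0.5 requests/second, half the real rate.
Using T+30 as the start time: rate = 30 / 30s = 1 request/second, the correct rate.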
As mentioned in Current behavior, the measurement works well when the series is initialized alongside the process, but not so well when the series is initialized after.
Requested change
The requested change is that StartTimeUnixNano is set separately per time series.
But of course, this comes with performance drawbacks. For that to be possible, SDKs will need to store in memory some sort of time series ID + StartTimeUnixNano mapping for all time series ever created (see the sketch below). This can be a huge drawback for processes wishing to expose high-cardinality metrics.
I believe the appropriate approach is to allow users to configure the SDK, where they can opt in to either the current behavior or the behavior I'm suggesting here.
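As a rough illustration, here is a minimal sketch (hypothetical, not a proposal for an actual SDK API) of the per-series bookkeeping this would require; the unbounded map is exactly the memory cost mentioned above:

```go
package main

import (
	"fmt"
	"sync"
	"time"
)

// startTimeTracker records the first time each series is observed and reuses
// that timestamp as the series' StartTimeUnixNano from then on.
type startTimeTracker struct {
	mu     sync.Mutex
	starts map[string]int64 // series ID -> StartTimeUnixNano; never evicted
}

func newStartTimeTracker() *startTimeTracker {
	return &startTimeTracker{starts: make(map[string]int64)}
}

// startTimeFor returns the recorded start time for seriesID, creating an
// entry on first sight. Memory grows with the number of series ever created,
// which is the high-cardinality drawback described above.
func (t *startTimeTracker) startTimeFor(seriesID string) int64 {
	t.mu.Lock()
	defer t.mu.Unlock()
	if ts, ok := t.starts[seriesID]; ok {
		return ts
	}
	ts := time.Now().UnixNano()
	t.starts[seriesID] = ts
	return ts
}

func main() {
	t := newStartTimeTracker()
	// The 404 series first appears at T+30; its start time is recorded then,
	// not at process start.
	fmt.Println(t.startTimeFor(`http_requests_total{code="404"}`))
}
```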
Additional context
The discussion comes from an issue in OpenTelemetry-Collector-Contrib, where we're implementing support for the Prometheus/OpenMetrics created timestamp: open-telemetry/opentelemetry-collector-contrib#32521