
[connector/spanmetrics] Delta span metric StartTimeUnixNano doesn't follow specification, causing unbounded memory usage with prometheusexporter #31671

Closed
swar8080 opened this issue Mar 9, 2024 · 5 comments · Fixed by #31780
Labels
bug Something isn't working connector/spanmetrics

Comments

@swar8080
Contributor

swar8080 commented Mar 9, 2024

Component(s)

connector/spanmetrics

What happened?

Expected Behaviour

The specification on metric temporality expects an uninterrupted series of delta data points to have timestamps that chain together, i.e. (StartTimeUnixNano, TimeUnixNano) pairs of (T1, T2), (T2, T3), (T3, T4), and so on:

(Figure from the specification illustrating the delta temporality timestamp pattern.)

Actual Behaviour

When configuring spanmetricsconnector to use delta temporality, the timestamps of successive data points follow the pattern (T1, T1), (T2, T2), (T3, T3). In other words, StartTimeUnixNano and TimeUnixNano are always both set to the current timestamp.

When configured to use cumulative temporality, the connector caches StartTimeUnixNano so that it can be reused in the next data point. When using delta temporality, however, that cache (resourceMetrics) is wiped after each round of exporting metrics:

func (p *connectorImp) resetState() {
	// If delta metrics, reset accumulated data
	if p.config.GetAggregationTemporality() == pmetric.AggregationTemporalityDelta {
		p.resourceMetrics.Purge()
		p.metricKeyToDimensions.Purge()
	} else {
		p.resourceMetrics.RemoveEvictedItems()
		p.metricKeyToDimensions.RemoveEvictedItems()
		// Exemplars are only relevant to this batch of traces, so must be cleared within the lock
		if p.config.Histogram.Disable {
			return
		}
		p.resourceMetrics.ForEach(func(_ resourceKey, m *resourceMetrics) {
			m.histograms.Reset(true)
		})
	}
}

Collector version

v0.94.0

Environment information

Testing done locally on macOS

OpenTelemetry Collector configuration

connectors:
  spanmetrics:
    dimensions_cache_size: 5000
    resource_metrics_key_attributes:
      - service.name
    aggregation_temporality: "AGGREGATION_TEMPORALITY_DELTA"
    metrics_flush_interval: 5s
    exemplars:
      enabled: true
    histogram:
      explicit:
        buckets: ...

exporters:
  prometheus/exporter:
    endpoint: "0.0.0.0:9465"
    enable_open_metrics: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: []
      exporters: [spanmetrics]
    metrics:
      receivers: [otlp, spanmetrics]
      processors: []
      exporters: [prometheus/exporter]

Log output

No response

Additional context

Why fix this?

This could be a different way of fixing #30688 and #30559. It could also significantly reduce memory usage when combining spanmetricsconnector with exporters like prometheusexporter.

I'm currently seeing high memory usage in the collectors at my work with this set-up.

Issue with Prometheus Exporter

prometheusexporter supports conversion from delta to cumulative temporality. However, it only works if the series follows the specification:

// Delta-to-Cumulative
if doubleSum.AggregationTemporality() == pmetric.AggregationTemporalityDelta && ip.StartTimestamp() == mv.value.Sum().DataPoints().At(0).Timestamp() {
	ip.SetStartTimestamp(mv.value.Sum().DataPoints().At(0).StartTimestamp())
	switch ip.ValueType() {
	case pmetric.NumberDataPointValueTypeInt:
		ip.SetIntValue(ip.IntValue() + mv.value.Sum().DataPoints().At(0).IntValue())
	case pmetric.NumberDataPointValueTypeDouble:
		ip.SetDoubleValue(ip.DoubleValue() + mv.value.Sum().DataPoints().At(0).DoubleValue())
	}
}

I think the author's rationale in the above code is that, per this part of the spec, a mismatch between a point's start timestamp and the previous point's timestamp means the delta series was reset. So with the current delta span metric behaviour, prometheusexporter treats each data point as if the counter had been reset.
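To make the impact concrete, here is a small, standalone sketch (not the exporter's actual code, and with made-up point values) that mimics the condition above: deltas of 5, 3, and 2 accumulate to 10 when the timestamps chain together, but the running value keeps getting replaced by the latest delta when every point has StartTimestamp equal to its own Timestamp.

package main

import "fmt"

// point is a stand-in for a number data point, keeping only the fields that
// matter for the delta-to-cumulative check.
type point struct {
	start, ts int64 // stand-ins for StartTimeUnixNano / TimeUnixNano
	value     float64
}

// accumulate mimics the exporter's condition: merge the previous cumulative
// value into the new delta only when the series is provably uninterrupted,
// i.e. the new point starts exactly where the previous one ended. Otherwise
// the new point is treated as a counter reset.
func accumulate(prev, next point) point {
	if next.start == prev.ts {
		next.value += prev.value
	}
	return next
}

func main() {
	// Spec-compliant pattern (T1, T2), (T2, T3), (T3, T4): 5 -> 8 -> 10.
	compliant := []point{{1, 2, 5}, {2, 3, 3}, {3, 4, 2}}
	// Current connector pattern (T1, T1), (T2, T2), (T3, T3): every point
	// looks like a reset, so the final value is just the last delta, 2.
	current := []point{{1, 1, 5}, {2, 2, 3}, {3, 3, 2}}

	for _, series := range [][]point{compliant, current} {
		cum := series[0]
		for _, p := range series[1:] {
			cum = accumulate(cum, p)
		}
		fmt.Println("final cumulative value:", cum.value)
	}
}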

A workaround is having the connector produce cumulative span metrics, but that requires the connector to cache all series in memory, which is redundant since prometheusexporter already does this. @matej-g is working on an improvement to expire infrequently updated series from the cache, which will definitely help reduce memory.

Solution Discussion

Having delta span metrics follow the spec could significantly reduce memory usage when used with prometheusexporter.

To do this, the connector probably has to cache the StartTimeUnixNano for each series when in delta mode. The memory used to cache the timestamps should be much less than what resourceMetrics caches, though, since that cache's values are nested structs holding data like metric attributes and histogram bucket sizes.
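As a rough sketch of that idea (hypothetical names, not actual connector code), the connector could keep a small per-series map of start timestamps that survives the delta-mode purge, using the real pdata timestamp helpers:

package spanmetricssketch

import (
	"time"

	"go.opentelemetry.io/collector/pdata/pcommon"
)

// deltaStartCache remembers, per series key, when the next delta data point
// should start. Unlike resourceMetrics, it holds a single timestamp per
// series, so it stays small even though resetState purges the heavier caches.
type deltaStartCache map[string]pcommon.Timestamp

// startTimestampFor returns the StartTimeUnixNano to use for the series'
// next data point and records the current flush time as the start of the
// following interval.
func (c deltaStartCache) startTimestampFor(seriesKey string, flushTime time.Time) pcommon.Timestamp {
	now := pcommon.NewTimestampFromTime(flushTime)
	start, ok := c[seriesKey]
	if !ok {
		// First point of the series: a zero-length interval signals an
		// unknown start time, which the spec allows.
		start = now
	}
	c[seriesKey] = now // the next point starts where this one ends
	return start
}

Each flush would then stamp a data point with (startTimestampFor(key, flushTime), flushTime), producing the (T1, T2), (T2, T3) chain the spec describes.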

@swar8080 swar8080 added bug Something isn't working needs triage New item requiring triage labels Mar 9, 2024
Contributor

github-actions bot commented Mar 9, 2024

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@swar8080
Contributor Author

I forked the collector in our internal GitLab to apply the patch I had in mind. It's making a noticeable difference to memory usage:

(Graph showing the drop in collector memory usage after applying the patch.)

I'll clean up the change and open a PR to see if it makes sense to include here too.

@swar8080 swar8080 changed the title [connector/spanmetrics] Delta span metric StartTimeUnixNano doesn't follow specification [connector/spanmetrics] Delta span metric StartTimeUnixNano doesn't follow specification, causing unbounded memory usage with prometheusexporter Mar 12, 2024
@mx-psi
Member

mx-psi commented Mar 13, 2024

I think we should change the behavior of the connector, but to be clear: having the start timestamp equal to the point timestamp is valid per the spec:

When StartTimeUnixNano equals TimeUnixNano, a new unbroken sequence of observations begins with a reset at an unknown start time. The initial observed value is recorded to indicate that an unbroken sequence of observations resumes. These points have zero duration, and indicate that nothing is known about previously-reported points and that data may have been lost.

@crobert-1
Member

Looks like it's agreed that this is a bug, and a PR is welcome. I'll remove needs triage.

@crobert-1 crobert-1 removed the needs triage New item requiring triage label Mar 13, 2024
swar8080 added a commit to swar8080/opentelemetry-collector-contrib that referenced this issue Apr 9, 2024
…mestamps representing an uninterrupted series. This can avoid significant memory usage compared to producing cumulative span metrics, as long as a downstream component can convert from delta back to cumulative, which can depend on the timestamps being uninterrupted.
Contributor

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions github-actions bot added the Stale label May 13, 2024
@mx-psi mx-psi removed the Stale label May 13, 2024
mx-psi pushed a commit that referenced this issue Jun 4, 2024
…imestamps representing an uninterrupted series (#31780)

Closes #31671

**Description:** 
Currently, delta temporality span metrics are produced with
(StartTimestamp, Timestamp) pairs of `(T1, T2), (T3, T4) ...`. However, the
[specification](https://opentelemetry.io/docs/specs/otel/metrics/data-model/#temporality)
says that the correct pattern for an uninterrupted delta series is `(T1,
T2), (T2, T3) ...`

This misalignment with the spec can confuse downstream components'
conversion from delta temporality to cumulative temporality, causing
each data point to be viewed as a cumulative counter "reset". An example
of this is in `prometheusexporter`.

The conversion issue forces you to generate cumulative span metrics,
which use significantly more memory to cache the cumulative counts.

At work, I applied this patch to our collectors and switched to
producing delta temporality metrics for `prometheusexporter` to then
convert to cumulative. That caused a significant drop in memory usage:


![image](https://github.com/open-telemetry/opentelemetry-collector-contrib/assets/17691679/804d0792-1085-400e-a4e3-d64fb865cd4f)

**Testing:** 
- Unit tests asserting the timestamps (see the sketch after these notes)
- Manual testing with `prometheusexporter` to make sure counter values
are cumulative and no longer being reset after receiving each delta data
point
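For reference, the kind of timestamp assertion mentioned in the testing notes could look roughly like this (a sketch using the pdata API, not the PR's actual test code):

package spanmetricssketch

import (
	"testing"

	"go.opentelemetry.io/collector/pdata/pmetric"
)

// assertUninterruptedDelta checks that consecutive delta data points chain
// together: each point's StartTimestamp equals the previous point's Timestamp.
func assertUninterruptedDelta(t *testing.T, dps pmetric.NumberDataPointSlice) {
	for i := 1; i < dps.Len(); i++ {
		prev, cur := dps.At(i-1), dps.At(i)
		if cur.StartTimestamp() != prev.Timestamp() {
			t.Errorf("data point %d: StartTimestamp %v does not equal previous Timestamp %v",
				i, cur.StartTimestamp(), prev.Timestamp())
		}
	}
}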