
[pkg/translator/prometheusremotewrite] inefficient memory use during conversion to native histograms #24405

Closed
krajorama opened this issue Jul 20, 2023 · 12 comments

@krajorama (Contributor)

Component(s)

pkg/translator/prometheusremotewrite

Describe the issue you're reporting

During the implementation of #17565 I noticed that much (~50%) of the conversion time is spent on reallocating slices for the resulting spans and deltas.

Background

From a conversation with @charleskorn:
I see your PR grafana/mimir#5531; there's a similar issue in translating exponential histograms to native histograms, where the spans and deltas are getting reallocated all the time. Except it's hard to give a good estimate there of the size needed (although the upper bound is obvious), due to the difference between the OTel and Prometheus representations, plus possible downscaling.

How bad is it if we take the upper bound as the estimate?

Haven't calculated yet, it's kind of on my backlog. For deltas the reduction is potentially 2^n, where n is the scale difference between the exponential and the native histogram (in my code it's the scaledown): https://github.com/open-telemetry/opentelemetry-collector-contrib/blob/65e6596c028[…]22439f45e788/pkg/translator/prometheusremotewrite/histograms.go. But there's a chance some buckets are not neatly mergeable, so you cannot say that the reduction is exactly 2^n.
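
To make the 2^n figure concrete, here is a minimal Go sketch of the merge, assuming a plain (offset, counts) bucket representation. The function name and signature are hypothetical, and it ignores the off-by-one between the OTel and Prometheus bucket boundaries that the real translator has to account for:

```go
// downscaleCounts merges bucket counts from scale s down to scale s-scaleDown.
// Bucket index i at the higher scale lands in index i>>scaleDown at the lower
// scale, so up to 2^scaleDown adjacent buckets collapse into one.
// Hypothetical helper for illustration only.
func downscaleCounts(offset int32, counts []uint64, scaleDown int32) (int32, []uint64) {
	if scaleDown <= 0 || len(counts) == 0 {
		return offset, counts
	}
	firstIdx := offset >> scaleDown // arithmetic shift == floor division by 2^scaleDown
	lastIdx := (offset + int32(len(counts)) - 1) >> scaleDown
	out := make([]uint64, lastIdx-firstIdx+1) // never larger than len(counts)
	for i, c := range counts {
		out[((offset+int32(i))>>scaleDown)-firstIdx] += c
	}
	return firstIdx, out
}
```

Whether the reduction is exactly 2^scaleDown depends on how the populated buckets line up with the coarser boundaries, which is why the upper bound is easy to state but the exact output size is not.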

For spans it might be that all buckets fit in a single span because they are consecutive. On the other hand, you might find that every bucket needs its own span. It might be worth calculating the number of deltas and spans in a first pass and just filling them in later. Or maybe a better strategy is to allocate in batches (+10%? double?), no idea.

It'd be interesting to compare doing a first pass to calculate the number of deltas and spans (so we can allocate accurately) against guessing or assuming the worst case.
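
As a rough sketch of the first-pass idea, a counting pass over the already-downscaled bucket counts could size both slices exactly before anything is filled in. The helper below is hypothetical, and the gap > 2 rule stands in for the usual trade-off between emitting a few zero deltas and starting a new span; it is not the translator's actual heuristic:

```go
// countSpansAndDeltas walks the bucket counts once and reports how many spans
// and deltas the native-histogram encoding would need.
func countSpansAndDeltas(counts []uint64) (spans, deltas int) {
	gap := 0
	inSpan := false
	for _, c := range counts {
		if c == 0 {
			gap++
			continue
		}
		if !inSpan || gap > 2 {
			// First populated bucket, or the run of empty buckets is too long
			// to bridge with explicit zero deltas: start a new span.
			spans++
			inSpan = true
		} else {
			deltas += gap // zero deltas bridging a small gap inside the current span
		}
		gap = 0
		deltas++ // one delta for this populated bucket
	}
	return spans, deltas
}
```

A second, identical walk can then write the spans and deltas into slices allocated with exactly these lengths, avoiding any growth-driven reallocation.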

For the batches, something that worked well for a similar problem when streaming chunks from [Mimir] ingesters to queriers was to use a linked list of batches of a fixed size, then create one final slice with the correct size and copy the elements from the batches. This saved a bunch of repetitive work copying elements into expanded slices, only for them to be copied again the next time the slice grew. These batches can also be pooled, so over time you only pay the cost of the final slice.

But given the difference in the amount of data involved here, it might not be worth it (in the ingesters, we were dealing with substantial structs, if I remember correctly).
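
For illustration, a minimal Go sketch of the batch-and-copy idea described above, using a slice of fixed-size batches (rather than a literal linked list) together with a sync.Pool; all names are hypothetical:

```go
package histconv // hypothetical package name

import "sync"

const batchSize = 256 // arbitrary fixed batch size

// batchPool recycles fixed-size batches across conversions.
var batchPool = sync.Pool{
	New: func() any { return new([batchSize]int64) },
}

// deltaBuffer accumulates values in fixed-size batches and produces one
// exactly-sized slice at the end, so elements are copied at most once.
type deltaBuffer struct {
	full []*[batchSize]int64 // completed batches, in order
	cur  *[batchSize]int64   // batch currently being filled
	n    int                 // number of values in cur
}

func (b *deltaBuffer) append(v int64) {
	if b.cur == nil || b.n == batchSize {
		if b.cur != nil {
			b.full = append(b.full, b.cur)
		}
		b.cur = batchPool.Get().(*[batchSize]int64)
		b.n = 0
	}
	b.cur[b.n] = v
	b.n++
}

// finish copies everything into a single slice of the exact size and returns
// the batches to the pool for reuse.
func (b *deltaBuffer) finish() []int64 {
	out := make([]int64, 0, len(b.full)*batchSize+b.n)
	for _, batch := range b.full {
		out = append(out, batch[:]...)
		batchPool.Put(batch)
	}
	if b.cur != nil {
		out = append(out, b.cur[:b.n]...)
		batchPool.Put(b.cur)
	}
	b.full, b.cur, b.n = nil, nil, 0
	return out
}
```

Values are copied exactly once into the final, exactly-sized slice, and the batch arrays are reused across conversions, at the cost of a little extra bookkeeping compared to a plain append.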

krajorama added the needs triage label Jul 20, 2023
@github-actions (bot)

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

@github-actions (bot)

Pinging code owners for exporter/prometheusremotewrite: @Aneurysm9 @rapphil. See Adding Labels via Comments if you do not have permissions to add labels yourself.

@frzifus (Member)

frzifus commented Jul 26, 2023

cc @sh0rez

@github-actions (bot)

This issue has been inactive for 60 days. It will be closed in 60 days if there is no activity. To ping code owners by adding a component label, see Adding Labels via Comments, or if you are unsure of which component this issue relates to, please ping @open-telemetry/collector-contrib-triagers. If this issue is still relevant, please ping the code owners or leave a comment explaining why it is still relevant. Otherwise, please close it.

Pinging code owners:

See Adding Labels via Comments if you do not have permissions to add labels yourself.

github-actions bot added the Stale label Sep 25, 2023
frzifus removed the Stale label Sep 25, 2023
github-actions bot added the Stale label Nov 27, 2023
@jmichalek132 (Contributor)

still relevant.

github-actions bot added the Stale label Jan 29, 2024
@jmichalek132 (Contributor)

still relevant.

github-actions bot removed the Stale label Jan 30, 2024
github-actions bot added the Stale label Apr 1, 2024
@aknuds1 (Contributor)

aknuds1 commented Apr 24, 2024

I can have a look at this.

github-actions bot removed the Stale label Apr 25, 2024
github-actions bot added the Stale label Jun 25, 2024
@github-actions (bot)

This issue has been closed as inactive because it has been stale for 120 days with no activity.

github-actions bot closed this as not planned Aug 24, 2024