
Conversation

@mnpw (Contributor) commented Feb 17, 2024

What

  • add purge_timeout option to PrometheusBuilder
  • run a purger that purges based on the purge_timeout

Implements the third approach prescribed here to purge old histogram data:

  • update the builder to generate a future which both drives the Hyper server future as well as a call to get_recent_metrics on an interval

Fixes #245
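
Roughly, the intended shape is a future that drives the server alongside an interval task that triggers the purge. The sketch below is only an illustration: `Inner`, `serve`, and `purge` are hypothetical stand-ins for the exporter's internals (a Tokio runtime with the `time` and `macros` features is assumed), not the crate's actual API.

```rust
use std::sync::Arc;
use std::time::Duration;

struct Inner; // stand-in for the exporter's shared state

impl Inner {
    fn purge(&self) {
        // in the real exporter this is where old histogram data would be drained
    }
}

async fn serve(_inner: Arc<Inner>) {
    // stand-in for the Hyper server future
}

// The builder would return a future like this: it drives the server and,
// on every `purge_timeout` tick, triggers a purge.
async fn exporter_future(inner: Arc<Inner>, purge_timeout: Duration) {
    let purger = {
        let inner = Arc::clone(&inner);
        async move {
            let mut interval = tokio::time::interval(purge_timeout);
            loop {
                interval.tick().await;
                inner.purge();
            }
        }
    };

    tokio::select! {
        _ = serve(Arc::clone(&inner)) => {}
        _ = purger => {}
    }
}
```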

tobz and others added 30 commits May 16, 2021 10:47
Rework static metric names and add new routing layer
…her_fix

metrics-exporter-prometheus: sanitize matchers the same as input keys
…enhancements

metrics-util: make generation tracking configurable in Registry
The Atomic::compare_exchange function only appears in 0.9.2.
t1ha causes a global-buffer-overflow when testing under ASAN. This
change switches the default hash implementation to aHash, which takes a
similar approach to performance.

Switching to aHash removes the global-buffer-overflow, and the tests
complete successfully.

In general, cargo bench shows performance improvements as high as 31% on
specific benches.

There are 3 regressions and 29 improvements in total.
@mnpw mnpw force-pushed the purge-old-histogram-data branch 2 times, most recently from dbbb01d to 139bdd0 on February 17, 2024 at 21:12
- add purge_timeout option to PrometheusBuilder
- run a purger that purges based on the purge_timeout
@mnpw mnpw force-pushed the purge-old-histogram-data branch from 139bdd0 to f4abc32 on February 18, 2024 at 18:32
@tobz (Member) commented Feb 20, 2024

As I read back through what I originally wrote... I think a major flaw with my proposal to do this by periodically calling render is that we would actually also handle things like expiration, which could lead to subsequent scrapes/pushes missing data.

I think what we'd actually have to do here is split the histogram-draining logic out of get_recent_metrics. Essentially, take this code and move it into a new method on the struct -- let's call it run_upkeep for this example -- and trim it down so it only iterates through each histogram, finds/creates the entry in self.distributions, and drains samples into that entry. After that, we'd call that method right before this block, and remove the histogram-draining logic from that block.
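
A rough sketch of what that split might look like, using simplified stand-in types (plain maps of samples) rather than the crate's real `Inner`/`Registry`/`Distribution` structures:

```rust
use std::collections::HashMap;
use std::sync::Mutex;

struct Inner {
    // pending histogram samples, keyed by metric name
    histograms: Mutex<HashMap<String, Vec<f64>>>,
    // accumulated distributions that rendering reads from
    distributions: Mutex<HashMap<String, Vec<f64>>>,
}

impl Inner {
    /// Drain pending histogram samples into `distributions`. This is the part
    /// that is safe to run on a timer: it never applies idle-timeout logic.
    fn run_upkeep(&self) {
        let mut histograms = self.histograms.lock().unwrap();
        let mut distributions = self.distributions.lock().unwrap();
        for (name, samples) in histograms.iter_mut() {
            let entry = distributions.entry(name.clone()).or_default();
            entry.extend(samples.drain(..));
        }
    }

    /// Rendering calls `run_upkeep` first and then reads `distributions`;
    /// the recency / idle-timeout checks would live only here.
    fn get_recent_metrics(&self) -> String {
        self.run_upkeep();
        let distributions = self.distributions.lock().unwrap();
        format!("{:?}", *distributions) // placeholder for real rendering
    }
}
```

The point is that run_upkeep only moves samples into distributions, so running it on a timer never expires anything; the idle-timeout handling stays tied to rendering.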

Does that make sense?

@mnpw (Contributor, Author) commented Feb 22, 2024

Thanks for the update!

> I think a major flaw with my proposal to do this by periodically calling render is that we would actually also handle things like expiration, which could lead to subsequent scrapes/pushes missing data.

Does this refer to the self.recency.should_store_* call where we check the validity of a metric? If so, then I think eagerly calling that method would be fine. It would be helpful to understand your concern with an example.

@tobz (Member) commented Feb 28, 2024

> Thanks for the update!
>
> > I think a major flaw with my proposal to do this by periodically calling render is that we would actually also handle things like expiration, which could lead to subsequent scrapes/pushes missing data.
>
> Does this refer to the self.recency.should_store_* call where we check the validity of a metric? If so, then I think eagerly calling that method would be fine. It would be helpful to understand your concern with an example.

(Sorry for the delayed response here.)

Yes, you're right that this would be the crux of the concern.

The example would be something like:

  • the exporter is configured with an idle timeout of 10s and a purger interval of 5s
  • the exporter is scraped at t=0, which includes metric A
  • metric A is updated at t=5 (meaning it will go idle at t=15 if not updated again by then)
  • the purger runs at t=5 and t=10; since nothing is idle yet, all it does is drain the histogram buckets
  • the purger runs again at t=15, but now metric A is considered idle and is removed from the registry
  • the exporter is scraped at t=30, which no longer includes metric A

Despite metric A being updated between the two scrapes at t=0 and t=30, we've actually missed the update to metric A; if we were only rendering during scrapes, it would have been included in the second scrape.
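
For intuition, the kind of check involved is roughly the following (a simplified illustration only; the real exporter tracks per-metric "generations" via metrics-util's Recency rather than raw timestamps):

```rust
use std::time::{Duration, Instant};

// Simplified stand-in for the idle check that rendering applies: a metric
// that hasn't been updated within the idle timeout is dropped.
fn should_store(last_update: Instant, idle_timeout: Duration) -> bool {
    last_update.elapsed() < idle_timeout
}
```

Running that kind of check from a timer-driven purger would remove metric A at t=15 even though no scrape ever observed it going idle, which is why the purger should only drain histograms.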

@mnpw (Contributor, Author) commented Mar 7, 2024

> Despite metric A being updated between the two scrapes at t=0 and t=30, we've actually missed the update to metric A; if we were only rendering during scrapes, it would have been included in the second scrape.

Ah, this makes sense now that I've checked the documentation for PrometheusBuilder::idle_timeout:

> This behavior is driven by requests to generate rendered output, and so metrics will not be removed unless a request has been made recently enough to prune the idle metrics.

I have now separated out the purge action so that it only drains the histograms.

@mnpw mnpw closed this Mar 8, 2024
@mnpw mnpw force-pushed the purge-old-histogram-data branch from b87237b to 568e0fb on March 8, 2024 at 07:38


Development

Successfully merging this pull request may close these issues.

Clearing/Expiration of old values in histogram/limit to histogram