Skip to content

Conversation

@mnpw
Copy link
Contributor

@mnpw mnpw commented Mar 8, 2024

Copy of #451


What

  • add purge_timeout option to PrometheusBuilder
  • run a purger that purges based on the purge_timeout

Implements third way as prescribed here to purge old histogram data:

  • update the builder to generate a future which both drives the Hyper server future as well as a call to get_recent_metrics on an interval

Fixes #245

mnpw added 3 commits March 8, 2024 13:21
- add purge_timeout option to PrometheusBuilder
- run a purger that purges based on the purge_timeout
Copy link
Member

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking good, but just a few notes/requests to try and make this a little more generic/understandable to folks.

Comment on lines 498 to 513
if let Ok(handle) = runtime::Handle::try_current() {
handle.spawn(purger);
} else {
let thread_name = "metrics-exporter-prometheus-purger";

let runtime = runtime::Builder::new_current_thread()
.enable_all()
.build()
.map_err(|e| BuildError::FailedToCreateRuntime(e.to_string()))?;

thread::Builder::new()
.name(thread_name.to_owned())
.spawn(move || runtime.block_on(purger))
.map_err(|e| BuildError::FailedToCreateRuntime(e.to_string()))?;
};
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No need for the conditional logic here: just spawn the future directly.

(We already document that this method must be called from within a Tokio runtime or else it will panic.)

let handle = recorder.handle();

// use the handle to recorder
// #[cfg(not(feature = "push-gateway"))]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Delete this line.


/// Purges registry's histogram data by draining it into the distribution. This should be
/// called periodically to prevent the accumulation of histogram samples.
pub fn purge(&self) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should just change the wording overall from "purge" to "upkeep", and make this run_upkeep.

Purging to me sounds like getting rid of, when realistically we're just doing periodic cleanup work.

Comment on lines 366 to 370
/// Sets the purge timeout for metrics.
///
/// If a purge timeout is set, the purger will call `.render()` on the registry, causing
/// the values from histograms to be drained out. This ensures that stale histogram values
/// do not persist indefinitely.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar note here about just changing the wording overall from "purge"/"purging" to "upkeep".

I would also remove the specific bit about calling .render(), since it doesn't actually do that anymore. Just be generic, something like:

/// Sets the upkeep interval.
///
/// The upkeep task handles periodic maintenance operations, such as draining histogram data,
/// to ensure that all recorded data is up-to-date and prevent unbounded memory growth. 

buckets: None,
bucket_overrides: None,
idle_timeout: None,
purge_timeout: None,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should actually enable this by default.

Thinking more about it, it's a quality of life improvement to avoid unbounded memory growth for people with a high rate of histogram metrics, or who scrape/push their Prometheus exporter infrequently.

Probably 5 seconds as the default?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense, 5 seconds is a good default. Also I can't think of a use case where somebody would not want to run upkeep so making this non-optional would be sensible

Copy link
Member

@tobz tobz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect. Nice and simple. 👍🏻

@tobz tobz merged commit 3c71988 into metrics-rs:main Mar 16, 2024
@tobz tobz added C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-enhancement Type: enhancement. S-awaiting-release Status: awaiting a release to be considered fixed/implemented. labels Mar 16, 2024
@tobz
Copy link
Member

tobz commented Mar 16, 2024

Released in metrics-exporter-prometheus@v0.14.0.

Thanks again for your contribution!

@tobz tobz removed the S-awaiting-release Status: awaiting a release to be considered fixed/implemented. label Mar 16, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

C-exporter Component: exporters such as Prometheus, TCP, etc. E-simple Effort: simple. T-enhancement Type: enhancement.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Clearing/Expiration of old values in histogram/limit to histogram

2 participants