Feature Request: Time-Averaged Threshold for Alerts #93

RedwindA · 2024-08-07T02:05:03Z

Description:
Currently, beszel only supports instant thresholds for server monitoring and alerting. This can lead to false alarms triggered by normal, short-term operations such as file compression that may temporarily spike resource usage.

Feature request:
Implement a new threshold type that triggers alerts based on the average value of a monitored metric over a specified time period, rather than instantaneous values.

Proposed functionality:

Allow users to set a time period (e.g., 5 minutes, 1 hour) for averaging the monitored value.
Calculate the average value of the metric over the specified time period.
Trigger an alert only if this average value exceeds the set threshold.

Example use case:

Metric: CPU usage
Time period: 15 minutes
Threshold: 80%

In this scenario, an alert would only be triggered if the average CPU usage over a 15-minute period exceeds 80%, reducing false alarms from short-term spikes.
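The scenario above can be sketched as a rolling average over a fixed number of samples. This is only an illustration of the requested behavior, not beszel's code: the class name `AveragedThreshold`, the one-reading-per-minute cadence, and the sample values are all assumptions made for the example.

```python
from collections import deque


class AveragedThreshold:
    """Alert only when the mean of the last `window` samples exceeds `threshold`."""

    def __init__(self, threshold: float, window: int) -> None:
        self.threshold = threshold
        # A bounded deque drops the oldest sample automatically.
        self.samples = deque(maxlen=window)

    def add(self, value: float) -> bool:
        """Record one sample and report whether the alert condition holds."""
        self.samples.append(value)
        if len(self.samples) < self.samples.maxlen:
            return False  # not enough history yet to cover the full window
        return sum(self.samples) / len(self.samples) > self.threshold


# Hypothetical data: one CPU reading per minute, 15-minute window, 80% threshold.
cpu_alert = AveragedThreshold(threshold=80.0, window=15)

# Fourteen quiet minutes followed by a single one-minute spike to 100%.
spike_results = [cpu_alert.add(v) for v in [30.0] * 14 + [100.0]]

# Fifteen further minutes of sustained 95% load.
sustained_results = [cpu_alert.add(95.0) for _ in range(15)]
```

With these made-up readings the lone spike never raises the alert, while the sustained load does once enough high readings have entered the window.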

Benefits:

Reduced false alarms from temporary spikes in resource usage
More accurate representation of sustained resource constraints
Improved ability to distinguish between normal operations and actual issues

This feature would significantly enhance beszel's monitoring capabilities and provide more meaningful alerts to users.

henrygd · 2024-08-07T16:47:21Z

I can add options at some point for 10m, 20m, and 2h time periods.

Shouldn't add much overhead since we're already calculating those averages for the 12h, 24h, and 1w charts.

Just a point of clarification: the threshold is not currently instant. It works as you outlined, time averaged, but only over one-minute intervals. So you can have spikes above the threshold lasting under a minute that won't trigger an alert.

That may be what you meant, but wanted to point that out in case anyone was wondering.

RedwindA · 2024-08-07T16:50:26Z

Thank you for your explanation! I hope it can be a customizable value instead of hardcoded options, as the acceptable use policy (AUP) varies across different data centers (IDCs), and the allowed duration for full load differs as well.

ghost · 2024-08-12T16:28:12Z

This would go a long way toward improving the alerting features. I would love to see this implemented.

Would it be possible to have multiple alert triggers for each metric? This would make it even more customisable.

henrygd · 2024-08-12T22:09:30Z

Maybe a better implementation would be to add another slider allowing you to choose any number of minutes from 1m to 60m?

This would be slightly more intensive, as we'd need to query, loop over, and decode JSON for the previous 1m records.

But we'd only need to do that if the alert hasn't been triggered and the current 1m record is above threshold, or the alert is triggered and the current record is below threshold.

Most of the time you'll be below threshold and without a triggered alert, so that operation wouldn't need to run.
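The short-circuit described above can be sketched roughly as follows. This is a guess at the shape of the logic, not the actual implementation: `needs_history_check`, `evaluate`, and the `load_history` callback are hypothetical names, and the callback is assumed to return the last N one-minute values including the current one.

```python
def needs_history_check(triggered: bool, current: float, threshold: float) -> bool:
    """True only when the newest 1m record disagrees with the alert's state."""
    above = current > threshold
    return above != triggered


def evaluate(triggered, current, threshold, minutes, load_history):
    """Return the new alert state, loading history only when it is needed."""
    if not needs_history_check(triggered, current, threshold):
        return triggered  # common case: nothing to query
    # Hypothetical callback standing in for the query/loop/decode step.
    records = load_history(minutes)
    if not records:
        return triggered  # no history available, keep the current state
    return sum(records) / len(records) > threshold


calls = []  # records each time the expensive lookup runs


def fake_history(minutes):
    """Stand-in data source for the example: a steady 90% reading."""
    calls.append(minutes)
    return [90.0] * minutes


# Below threshold with no active alert: the history lookup is skipped.
quiet = evaluate(False, 20.0, 80.0, 10, fake_history)
calls_after_quiet = len(calls)

# Above threshold with no active alert: the lookup runs and the average decides.
fired = evaluate(False, 95.0, 80.0, 10, fake_history)
```

The point of the design is that the expensive step only runs when the current record could change the alert's state.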

Seems like that may be the way to go.

henrygd · 2024-10-16T22:38:23Z

Added in 0.6.0.

Please update and let me know if you run into any issues with it.

Matthias-vdE · 2024-10-17T11:13:37Z

How do I dismiss an active alert? I currently have an alert for one of my servers:

I'm fine with the disk being 50% full, but even after raising the threshold to 80%, the alert stays:

I assume I have to wait for another 10 minutes to pass? Disabling the alert and re-enabling it made it go away.
(Great feature by the way! It seriously reduces the alerts from small CPU spikes when doing updates).

henrygd · 2024-10-17T14:36:51Z

@Matthias-vdE It should clear on the next system update, but I'll change it so the alert gets set to inactive if you update the time or threshold.
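A minimal sketch of that planned behavior, assuming a simple in-memory alert record. The `Alert` class and `update_settings` method are hypothetical and only illustrate the idea of resetting state when the time or threshold changes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Alert:
    threshold: float
    minutes: int
    triggered: bool = False

    def update_settings(self, threshold: Optional[float] = None,
                        minutes: Optional[int] = None) -> None:
        """Apply new settings and mark the alert inactive if anything changed."""
        changed = False
        if threshold is not None and threshold != self.threshold:
            self.threshold = threshold
            changed = True
        if minutes is not None and minutes != self.minutes:
            self.minutes = minutes
            changed = True
        if changed:
            # The old state was computed against the old settings, so drop it.
            self.triggered = False


# Hypothetical disk alert that is currently active at a 50% threshold.
disk_alert = Alert(threshold=50.0, minutes=10, triggered=True)
disk_alert.update_settings(threshold=80.0)
```

After the update, the alert stays inactive until a later evaluation against the new threshold triggers it again.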

henrygd added the enhancement label (New feature or request) Aug 7, 2024

henrygd mentioned this issue Aug 12, 2024

[Feature Request] Configurable time window before triggering an alert #115

Closed

henrygd mentioned this issue Sep 17, 2024

[Feature Request] Temps Notification/Alert #131

Closed

henrygd mentioned this issue Sep 29, 2024

CPU usage should have a minimum time interval (and maybe memory too) #191

Closed

henrygd closed this as completed Oct 30, 2024
