[feature request] Need way to alert on time in certain state #1211

Closed
phemmer opened this issue Feb 20, 2017 · 0 comments · Fixed by #1216

phemmer commented Feb 20, 2017

This is a feature request to solve an issue previously discussed on the google group: https://groups.google.com/d/topic/influxdb/1CdjyXrhbjY/discussion
I've been trying to solve this problem for months and have yet to find a solution.

The request is to be able to alert when something has been in a certain state for too long. That Google Group posting gives one example, but I have others that are likely more relevant to other people.
For example:

We use the Telegraf ntpq probe to monitor the ntp daemon on our systems, and we want to alert when a host has not had any active peer for a certain amount of time: warn at 5 minutes, critical at 1 hour.

The logic used to determine this is: alert when 0 points where state_prefix == '*' are seen within the interval.

Another way to look at this is: when the time duration since the last point where state_prefix == '*' is greater than the threshold.
However, this has one significant difference from the above logic: this approach must also handle the case where state_prefix == '*' was never seen, and therefore there is no starting point from which to calculate a duration. Still, this logic feels like it would have more potential use cases.
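To make the difference between the two formulations concrete, here is a minimal Python sketch. It is not Kapacitor code; the function names, the `(timestamp, state_prefix)` tuple layout, and the use of `None` for the "never seen" case are all assumptions made for illustration.

```python
from datetime import datetime, timedelta
from typing import Iterable, Optional, Tuple

# Hypothetical point layout: (timestamp, state_prefix).
Point = Tuple[datetime, str]


def has_active_peer_in_window(
    points: Iterable[Point], now: datetime, window: timedelta
) -> bool:
    """First formulation: is there at least one '*' point inside the window?"""
    return any(
        prefix == "*" and now - window <= ts <= now for ts, prefix in points
    )


def time_since_last_active(
    points: Iterable[Point], now: datetime
) -> Optional[timedelta]:
    """Second formulation: time elapsed since the most recent '*' point.

    Returns None when no '*' point was ever seen, which is the case the
    second formulation has to handle explicitly.
    """
    last = max(
        (ts for ts, prefix in points if prefix == "*" and ts <= now),
        default=None,
    )
    return None if last is None else now - last
```

With the first function the window must be at least as long as the largest threshold (1 hour here), while the second yields a single duration that can be compared against both thresholds, provided the `None` case is mapped to something the alert can evaluate.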


One proposal to solve this would be a new node which takes a lambda parameter, and emits a point containing the duration for which the lambda has evaluated as true.

Usage in my scenario would then be:

data
  | state_duration(lambda: "state_prefix" != '*').unit(1m)
  | default('state_duration',-1)
  | alert()
    .warn(lambda: "state_duration" >= 5)
    .crit(lambda: "state_duration" >= 60)
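
For clarity, the behaviour I have in mind for the proposed node can be sketched in Python as follows. This is only an illustration under assumptions, not an implementation: it assumes a single time-ordered series of points represented as dicts with a `time` key, it folds the `default('state_duration', -1)` step into the generator by emitting -1 whenever the predicate is false, and the names `state_duration` and `alert_level` are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Callable, Dict, Iterable, Iterator


def state_duration(
    points: Iterable[Dict],
    predicate: Callable[[Dict], bool],
    unit: timedelta = timedelta(minutes=1),
) -> Iterator[Dict]:
    """Yield a copy of each point with a 'state_duration' field added.

    The field holds how long the predicate has been continuously true,
    expressed in multiples of `unit`, or -1 while the predicate is false.
    """
    started = None  # time at which the current "true" run began
    for point in points:
        out = dict(point)
        if predicate(point):
            if started is None:
                started = point["time"]
            out["state_duration"] = (point["time"] - started) / unit
        else:
            started = None  # the run is over, so reset
            out["state_duration"] = -1
        yield out


def alert_level(duration: float, warn: float = 5, crit: float = 60) -> str:
    """Map a duration (in units) to a level, mirroring .warn() and .crit()."""
    if duration >= crit:
        return "CRITICAL"
    if duration >= warn:
        return "WARNING"
    return "OK"
```

Feeding it points whose state_prefix stays away from '*' for 5 minutes and then for 60 minutes would raise a warning and then a critical alert, and the first '*' point afterwards would reset the duration to -1.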