
Add support for considering connection / timeout related errors as UNKNOWN state #878

Open
atc0005 opened this issue Jun 17, 2024 · 1 comment
Labels
config documentation Improvements or additions to documentation enhancement New feature or request output/extended Long Service Output (aka, "extended" or "detailed") output/summary Service Output (aka, "one-line-summary") plugin/check_cert


@atc0005
Owner

atc0005 commented Jun 17, 2024

Overview

Because the stock check_http plugin treats connection refusals and timeouts as CRITICAL, this plugin adopted the same behavior when it was initially developed.

The check_cert plugin still treats connection refusals and timeouts as CRITICAL, but there are cases where it would be useful to (optionally) report those issues as UNKNOWN instead.

For example, this support would allow a sysadmin to treat only certificate expirations (approaching or exceeded) as actionable, effectively ignoring connection issues by excluding the UNKNOWN state from the states that trigger notifications. Service checks configured this way would presume that connection-related issues are temporary and rely on a separate service check (e.g., an HTTPS GET/POST/whatever check) to alert if normal client connections are unsuccessful.

With this setup, service checks intended to trigger only when certificates are approaching expiration (or failing some other enforced validation requirement) would generate less "noise".

References

@atc0005 atc0005 added documentation Improvements or additions to documentation enhancement New feature or request config plugin/check_cert output/summary Service Output (aka, "one-line-summary") output/extended Long Service Output (aka, "extended" or "detailed") labels Jun 17, 2024
@atc0005 atc0005 added this to the Future milestone Jun 17, 2024
@atc0005 atc0005 self-assigned this Jun 17, 2024
@atc0005 atc0005 changed the title Add support for considering connection / timeout related errors as UNKNOWN state Add support for considering connection / timeout related errors as UNKNOWN state Jun 17, 2024
@atc0005
Owner Author

atc0005 commented Jun 20, 2024

Tangent:

Might be worth doing the same for DNS-related errors.

Probably also worth reviewing the current handling and introducing retry tuning.
