ThanosSidecarUnhealthy doesn't fire if the sidecar is never healthy #3990
Comments
Based on how the That being said, I can see a couple of other solutions:
|
Hello 👋 Looks like there was no activity on this issue for the last two months. |
This issue is still valid. cc @arajkumar @slashpai can you please have a look once you have some time on your hands? |
@dgrisonnet ya I can take a look at this one /assign |
Wouldn't the following resolve this:
? |
@paulfantom Yes, it should fix the problem. I will test it. |
IMHO, the following query should ideally work:
I will test it before raising a PR. |
I tested with local instances using the following setup:
prometheus#0
prometheus#1
thanos sidecar#0
thanos sidecar#1
Even after I suspended prometheus#0 (kill -TSTP <pid>), Proposed expression
|
This issue can't be solved by adding Also, according to #3204, the If we were to continue with the same metric for this alert, may I suggest:
|
@dgrisonnet IMHO, without |
No, you will end up in a situation where the alert will fire immediately since the query will be evaluated as |
Okay, Do you think adding a In our case, cmo has 1hr duration for this alert. EDIT: |
Thanos, Prometheus and Golang version used:
Thanos mixins main.
What happened:
The ThanosSidecarUnhealthy alert never fires if the sidecar is never healthy.
What you expected to happen:
I would've expected the alert to fire 10 minutes after the sidecar starts.
How to reproduce it (as minimally and precisely as possible):
Create a sidecar with a wrong prometheus.url so that it will never be able to scrape Prometheus and thus never be healthy.
Anything else we need to know:
The problem lies in the Prometheus query used by the alert:
If the Thanos sidecar is unhealthy from startup and remains in this state, the thanos_sidecar_last_heartbeat_success_time_seconds metric will not be initialized, so calling timestamp on it will not return any value. As a result, the ThanosSidecarUnhealthy alert will not fire even though the sidecar is unhealthy.
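To make the failure mode concrete, here is a minimal Python sketch of how a threshold comparison behaves over an instant vector. It is not the alert's actual query. The function name `unhealthy_pods`, the pod name `sidecar-0`, and the timestamps are hypothetical. The 600-second threshold is taken from the 10 minutes mentioned above. The sketch assumes the alert compares the age of the last successful heartbeat against that threshold.

```python
from typing import Dict


def unhealthy_pods(
    last_heartbeat: Dict[str, float],
    now: float,
    threshold: float = 600.0,
) -> Dict[str, float]:
    """Model `now - metric >= threshold` over an instant vector.

    `last_heartbeat` maps a pod name to the value of its heartbeat
    series. A pod that never exported the series is simply missing
    from the mapping, so it can never show up in the result.
    """
    result = {}
    for pod, ts in last_heartbeat.items():
        age = now - ts
        if age >= threshold:
            result[pod] = age
    return result


now = 10_000.0  # hypothetical evaluation time, in seconds

# Sidecar that was healthy once and then stalled 900 seconds ago:
# the series exists, so the comparison can match and the alert fires.
stalled = unhealthy_pods({"sidecar-0": now - 900.0}, now)

# Sidecar that was never healthy: the series was never exported,
# so the input vector is empty and nothing can match.
never_healthy = unhealthy_pods({}, now)

print(stalled)
print(never_healthy)
```

In this model, an empty input always gives an empty result, whatever the threshold. This is why an expression built only on the heartbeat metric stays silent for a sidecar that has never been healthy.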