Skip to content

Commit

Permalink
mixin: Use sidecar's metric timestamp for healthcheck
Browse files Browse the repository at this point in the history
During prometheus updates the alert was firing because the metric was
initialized with a value of '0' before the first heartbeat was sent. As
such, the evaluation of the alert results into actually taking just the
value of time() into consideration which led to misleading information
about the health of the sidecar.

As the thanos_sidecar_last_heartbeat_success_time_seconds metric is
effectively just a timestamp that resets on new deployments, we can
simply wrap it around the timestamp() function which should return
almost the same value of the metric itself with the added benefit that
heartbeat resets will be ignored.

Signed-off-by: Markos Chandras <markos@chandras.me>
  • Loading branch information
hwoarang committed Sep 24, 2020
1 parent 7b5c426 commit 47cb8d8
Show file tree
Hide file tree
Showing 2 changed files with 3 additions and 1 deletion.
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -62,6 +62,7 @@ Thanos Rule's `/api/v1/rules` endpoint no longer returns the old, deprecated `pa
- [#3095](https://github.com/thanos-io/thanos/pull/3095) Ruler: Update the manager when all rule files are removed.
- [#3105](https://github.com/thanos-io/thanos/pull/3105) Querier: Fix overwriting `maxSourceResolution` when auto downsampling is enabled.
- [#3010](https://github.com/thanos-io/thanos/pull/3010) Querier: Added `--query.lookback-delta` flag to override the default lookback delta in PromQL. The flag should be lookback delta should be set to at least 2 times of the slowest scrape interval. If unset it will use the PromQL default of 5m.
- [#3204](https://github.com/thanos-io/thanos/pull/3204) Mixin: Use sidecar's metric timestamp for healthcheck.

### Added

Expand Down
3 changes: 2 additions & 1 deletion mixin/alerts/sidecar.libsonnet
Original file line number Diff line number Diff line change
Expand Up @@ -29,8 +29,9 @@
summary: 'Thanos Sidecar is unhealthy.',
},
expr: |||
time() - max(thanos_sidecar_last_heartbeat_success_time_seconds{%(selector)s}) by (job, pod) >= 600
time() - max(timestamp(thanos_sidecar_last_heartbeat_success_time_seconds{%(selector)s})) by (job,pod) >= 600
||| % thanos.sidecar,
'for': '2m'
labels: {
severity: 'critical',
},
Expand Down

0 comments on commit 47cb8d8

Please sign in to comment.