feat: mixin / add loki compaction not successfull alert #14239

Merged
12 changes: 12 additions & 0 deletions production/loki-mixin-compiled-ssd/alerts.yaml
@@ -43,3 +43,15 @@ groups:
        for: 5m
        labels:
          severity: warning
      - alert: LokiCompactorHasNotSuccessfullyRunCompaction
        annotations:
          description: |
            {{ $labels.cluster }} {{ $labels.namespace }} has not run compaction in the last 24 hours. This may indicate a problem with the compactor.
          summary: Loki compaction has not run in the last 24 hours.
        expr: |
          # The "last successful run" metric is updated even if the compactor owns no tenants,
          # so this alert correctly doesn't fire if compactor has nothing to do.
          (time() - loki_compactor_apply_retention_last_successful_run_timestamp_seconds > 60 * 60 * 24)
        for: 1h
        labels:
          severity: critical
12 changes: 12 additions & 0 deletions production/loki-mixin-compiled/alerts.yaml
@@ -43,3 +43,15 @@ groups:
        for: 5m
        labels:
          severity: warning
      - alert: LokiCompactorHasNotSuccessfullyRunCompaction
        annotations:
          description: |
            {{ $labels.cluster }} {{ $labels.namespace }} has not run compaction in the last 24 hours. This may indicate a problem with the compactor.
          summary: Loki compaction has not run in the last 24 hours.
        expr: |
          # The "last successful run" metric is updated even if the compactor owns no tenants,
          # so this alert correctly doesn't fire if compactor has nothing to do.
          (time() - loki_compactor_apply_retention_last_successful_run_timestamp_seconds > 60 * 60 * 24)
        for: 1h
        labels:
          severity: critical
19 changes: 19 additions & 0 deletions production/loki-mixin/alerts.libsonnet
@@ -70,6 +70,25 @@
              |||, 'cluster', $._config.per_cluster_label),
            },
          },
          {
            // Alert if the compactor has not successfully run compaction in the last 24h.
            alert: 'LokiCompactorHasNotSuccessfullyRunCompaction',
            expr: |||
              # The "last successful run" metric is updated even if the compactor owns no tenants,
              # so this alert correctly doesn't fire if compactor has nothing to do.
              (time() - loki_compactor_apply_retention_last_successful_run_timestamp_seconds > 60 * 60 * 24)
Contributor

@ashwanthgoli ashwanthgoli Sep 25, 2024

Suggested change
- (time() - loki_compactor_apply_retention_last_successful_run_timestamp_seconds > 60 * 60 * 24)
+ (time() - loki_boltdb_shipper_compact_tables_operation_last_successful_run_timestamp_seconds > 60 * 60 * 24)

It might be better to use the compaction metric here instead of the last successful retention run.
The metric name is misleading: it refers to boltdb, but it is updated for TSDB indexes as well.

Contributor Author

Thanks, I might give renaming the metrics a go if I have time.

ashwanthgoli marked this conversation as resolved.
            |||,
            'for': '1h',
            labels: {
              severity: 'critical',
            },
            annotations: {
              summary: 'Loki compaction has not run in the last 24 hours.',
              description: std.strReplace(|||
                {{ $labels.cluster }} {{ $labels.namespace }} has not run compaction in the last 24 hours. This may indicate a problem with the compactor.
              |||, 'cluster', $._config.per_cluster_label),
            },
          },
        ],
      },
    ],
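
One way to sanity-check the timing of the new rule (24h threshold plus `for: 1h`) is a promtool rule unit test. The sketch below is not part of this PR. It assumes the test file sits next to the compiled `alerts.yaml`, that the rule still uses the `loki_compactor_apply_retention_last_successful_run_timestamp_seconds` metric shown in the diff, and that the label values `c1` and `loki` are purely illustrative. The input series pins the "last successful run" timestamp at 0 (the start of the test clock), so the expression first becomes true just after 24h and the alert should be firing by 26h.

```yaml
# Hypothetical test file, e.g. alerts_test.yaml, placed next to the compiled alerts.yaml.
rule_files:
  - alerts.yaml

evaluation_interval: 1m

tests:
  # Sample every minute so the series never falls out of the 5m lookback window.
  - interval: 1m
    input_series:
      # Last successful run stays at timestamp 0 for 30 hours (1800 one-minute steps).
      - series: 'loki_compactor_apply_retention_last_successful_run_timestamp_seconds{cluster="c1", namespace="loki"}'
        values: '0+0x1800'
    alert_rule_test:
      # After 12h the 24h threshold is not yet exceeded, so nothing should fire.
      - eval_time: 12h
        alertname: LokiCompactorHasNotSuccessfullyRunCompaction
        exp_alerts: []
      # After 26h the threshold has been exceeded for more than the 1h "for" window.
      - eval_time: 26h
        alertname: LokiCompactorHasNotSuccessfullyRunCompaction
        exp_alerts:
          - exp_labels:
              severity: critical
              cluster: c1
              namespace: loki
            exp_annotations:
              summary: Loki compaction has not run in the last 24 hours.
              description: |
                c1 loki has not run compaction in the last 24 hours. This may indicate a problem with the compactor.
```

Such a file would be run with `promtool test rules alerts_test.yaml`. If the rule is switched to the metric proposed in the review comment, only the series name in `input_series` should need to change.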