Commit

Update alert history chart screenshots (#4301)
* update triage threshold breaches pages

* update triage slo burn rate breaches page

* attempt to address feedback from @maryam-saeidi
colleenmcginnis authored Sep 30, 2024
1 parent 2579aca commit 42101a4
Showing 7 changed files with 34 additions and 14 deletions.
Binary file not shown.
Binary file modified docs/en/observability/images/slo-burn-rate-breach.png
20 changes: 14 additions & 6 deletions docs/en/observability/triage-slo-burn-rate-breaches.asciidoc
@@ -18,15 +18,23 @@ You can follow the links to navigate to the source SLO or rule definition.

Explore charts on the page to learn more about the SLO breach:

* *Burn rate chart*. The first chart shows the burn rate during the time range when the alert was active.
The line indicates how close the SLO came to breaching the threshold.
+
[role="screenshot"]
image::images/slo-burn-rate-breach.png[Alert details for SLO burn rate breach]

* The first chart shows the burn rate during the time range when the alert was active.
The line indicates how close the SLO came to breaching the threshold.
* The next chart shows the alerts history over the last 30 days.
It shows the number of alerts that were triggered and the average time it took to recover after a breach.
* Both timelines are annotated to show when the threshold was breached.
+
[TIP]
====
The timeline is annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
====

* *Alerts history chart*. The next chart provides information about alerts for the same rule and group over the last 30 days.
It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days, and the average time it took to recover after a breach.
+
[role="screenshot"]
image::images/log-threshold-breach-alert-history-chart.png[Alert history chart in alert details for SLO burn rate breach]

The number, duration, and frequency of these breaches over time give you an indication of how severely the service is degrading so that you can focus on high severity issues first.

28 changes: 20 additions & 8 deletions docs/en/observability/triage-threshold-breaches.asciidoc
@@ -19,22 +19,34 @@ You can follow the links to navigate to the rule definition.

Explore charts on the page to learn more about the threshold breach:

* *Charts for each condition*. The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.
+
[role="screenshot"]
image::images/log-threshold-breach.png[Alert details for log threshold breach]
image::images/log-threshold-breach-condition-chart.png[Chart for a condition in alert details for log threshold breach]
+
[TIP]
====
The timeline is annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
====

* The page includes a chart for each condition specified in the rule.
These charts help you understand when the breach occurred and its severity.
* If your rule is intended to detect log threshold breaches
* *Log rate analysis chart*. If your rule is intended to detect log threshold breaches
(that is, it has a single condition that uses a count aggregation),
you can run a log rate analysis, assuming you have the required license.
Running a log rate analysis is useful for detecting significant dips or spikes in the number of logs.
Notice that you can adjust the baseline and deviation, and then run the analysis again.
For more information about using the log rate analysis feature,
refer to the {kibana-ref}/xpack-ml-aiops.html#log-rate-analysis[AIOps Labs] documentation.
* The page may also include an alerts history chart that shows the number of triggered alerts per day for the last 30 days.
This chart is currently only available for rules that specify a single condition.
* Timelines on the page are annotated to show when the threshold was breached.
You can hover over an alert icon to see the timestamp of the alert.
+
[role="screenshot"]
image::images/log-threshold-breach-log-rate-analysis.png[Log rate analysis chart in alert details for log threshold breach]

* *Alerts history chart*. The next chart provides information about alerts for the same rule and group over the last 30 days.
It shows the number of those alerts that were triggered per day, the total number of alerts triggered throughout the 30 days, and the average time it took to recover after a breach.
+
[role="screenshot"]
image::images/log-threshold-breach-alert-history-chart.png[Alert history chart in alert details for log threshold breach]

Analyze these charts to better understand when the breach started, its current
state, and how the issue is trending.
