[DOCS] Integrates changes from #101703
lcawl committed Jun 10, 2021
1 parent 52ed2af commit 687be06
Showing 3 changed files with 45 additions and 36 deletions.
81 changes: 45 additions & 36 deletions docs/user/monitoring/kibana-alerts.asciidoc
of potential issues in the {stack}. These rules are preconfigured based on the
best practices recommended by Elastic. However, you can tailor them to meet your
specific needs.

[role="screenshot"]
image::user/monitoring/images/monitoring-kibana-alerts.png["{kib} alerts in {stack-monitor-app}"]

When you open *{stack-monitor-app}*, the preconfigured rules are created
automatically. They are initially configured to detect and notify on various
conditions across your monitored clusters. You can view notifications for
*Cluster health*, *Resource utilization*, and *Errors and exceptions* for {es}
in real time.

NOTE: The default {watcher}-based "cluster alerts" for {stack-monitor-app} have
been recreated as rules in {kib} {alert-features}. For this reason, the existing
{watcher} email action
`monitoring.cluster_alerts.email_notifications.email_address` no longer works.
The default action for all {stack-monitor-app} rules is to write to {kib} logs
and display a notification in the UI.

[role="screenshot"]
image::user/monitoring/images/monitoring-kibana-alerting-notification.png["{kib} alerting notifications in {stack-monitor-app}"]

To review and modify all available rules, use *Enter setup mode* on the *Cluster overview* page in *{stack-monitor-app}*:

[role="screenshot"]
image::user/monitoring/images/monitoring-kibana-alerting-setup-mode.png["Modify {kib} alerting rules in {stack-monitor-app}"]
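
The same rules are also reachable through the {kib} alerting HTTP API. The
following sketch lists them from the command line; it assumes {kib} at
`localhost:5601`, an `elastic` superuser password in `$PASSWORD`, `jq` for
readability, and the snake_case response fields of the 7.13+ rules API:

[source,sh]
----
# List rules known to Kibana, including the preconfigured
# Stack Monitoring rules (names such as "CPU Usage").
curl -s -u "elastic:$PASSWORD" \
  "http://localhost:5601/api/alerting/rules/_find?per_page=50" \
  | jq '.data[] | {id, name, rule_type_id}'
----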

[discrete]
[[kibana-alerts-cpu-threshold]]
== CPU usage threshold

This rule checks for {es} nodes that run a consistently high CPU load. By
default, the condition is set at 85% or more averaged over the last 5 minutes.
The rule is grouped across all the nodes of the cluster; checks run on a
1-minute schedule with a re-notify interval of 1 day.
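
You can change the threshold or averaging window in setup mode, or over the
alerting HTTP API. A minimal sketch, assuming `<rule-id>` comes from the
`_find` call shown earlier; the `threshold` and `duration` parameter names
mirror the UI fields and are assumptions, and because the update call replaces
the rule's writable attributes, carry over any values you are not changing:

[source,sh]
----
# Raise the CPU usage condition to 90% averaged over 10 minutes.
curl -s -X PUT -u "elastic:$PASSWORD" \
  -H 'kbn-xsrf: true' -H 'Content-Type: application/json' \
  "http://localhost:5601/api/alerting/rule/<rule-id>" \
  -d '{
        "name": "CPU Usage",
        "tags": [],
        "schedule": { "interval": "1m" },
        "throttle": "1d",
        "notify_when": "onThrottleInterval",
        "params": { "threshold": 90, "duration": "10m" },
        "actions": []
      }'
----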

[discrete]
[[kibana-alerts-disk-usage-threshold]]
== Disk usage threshold

This rule checks for {es} nodes that are nearly at disk capacity. By default,
the condition is set at 80% or more averaged over the last 5 minutes. The rule
is grouped across all the nodes of the cluster; checks run on a 1-minute
schedule with a re-notify interval of 1 day.
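
During planned maintenance that temporarily fills disks, you might silence
this rule rather than lower its threshold. A sketch using the mute APIs, with
`<rule-id>` standing in for the disk usage rule's id:

[source,sh]
----
# Stop notifications without disabling the rule's checks.
curl -s -X POST -u "elastic:$PASSWORD" -H 'kbn-xsrf: true' \
  "http://localhost:5601/api/alerting/rule/<rule-id>/_mute_all"

# Resume notifications after the maintenance window.
curl -s -X POST -u "elastic:$PASSWORD" -H 'kbn-xsrf: true' \
  "http://localhost:5601/api/alerting/rule/<rule-id>/_unmute_all"
----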

[discrete]
[[kibana-alerts-jvm-memory-threshold]]
== JVM memory threshold

This rule checks for {es} nodes that use a high amount of JVM memory. By
default, the condition is set at 85% or more averaged over the last 5 minutes.
The rule is grouped across all the nodes of the cluster; checks run on a
1-minute schedule with a re-notify interval of 1 day.
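
To confirm what the rule currently checks, read it back and inspect its
parameters. A sketch; the exact shape of the `params` object in the response
is an assumption:

[source,sh]
----
# Show the JVM memory rule's condition, schedule, and re-notify interval.
curl -s -u "elastic:$PASSWORD" \
  "http://localhost:5601/api/alerting/rule/<rule-id>" \
  | jq '{name, schedule, throttle, params}'
----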

[discrete]
[[kibana-alerts-missing-monitoring-data]]
== Missing monitoring data

This rule checks for {es} nodes that stop sending monitoring data. By default,
the condition is met when data is missing for 15 minutes, looking back 1 day.
The rule is grouped across all the {es} nodes of the cluster; checks run on a
1-minute schedule with a re-notify interval of 6 hours.
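
To locate this rule programmatically, filter the `_find` call by rule type.
The type id `monitoring_alert_missing_monitoring_data` is an assumption based
on how the Stack Monitoring rules are registered:

[source,sh]
----
# Find the missing-monitoring-data rule by its rule type.
curl -s -G -u "elastic:$PASSWORD" \
  --data-urlencode 'filter=alert.attributes.alertTypeId:"monitoring_alert_missing_monitoring_data"' \
  "http://localhost:5601/api/alerting/rules/_find" \
  | jq '.data[] | {id, name}'
----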

[discrete]
[[kibana-alerts-thread-pool-rejections]]
== Thread pool rejections (search/write)

This rule checks for {es} nodes that experience thread pool rejections. By
default, the condition is set at 300 or more over the last 5 minutes. The rule
is grouped across all the nodes of the cluster; checks run on a 1-minute
schedule with a re-notify interval of 1 day. Thresholds can be set
independently for `search` and `write` type rejections.
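
Because the `search` and `write` thresholds are independent, they live in two
separate rules. A sketch that reads both back side by side, assuming you have
already looked up their ids:

[source,sh]
----
# Compare the search and write rejection thresholds.
for id in "<search-rule-id>" "<write-rule-id>"; do
  curl -s -u "elastic:$PASSWORD" \
    "http://localhost:5601/api/alerting/rule/$id" \
    | jq '{name, params}'
done
----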

[discrete]
[[kibana-alerts-ccr-read-exceptions]]
== CCR read exceptions

This rule checks for read exceptions on any of the replicated {es} clusters.
The condition is met if 1 or more read exceptions are detected in the last
hour. The rule is grouped across all replicated clusters; checks run on a
1-minute schedule with a re-notify interval of 6 hours.
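
If a deployment does not use {ccr}, you can switch the rule off entirely
rather than mute it. A sketch using the disable and enable APIs:

[source,sh]
----
# Turn the CCR read exceptions rule off, and back on later.
curl -s -X POST -u "elastic:$PASSWORD" -H 'kbn-xsrf: true' \
  "http://localhost:5601/api/alerting/rule/<rule-id>/_disable"

curl -s -X POST -u "elastic:$PASSWORD" -H 'kbn-xsrf: true' \
  "http://localhost:5601/api/alerting/rule/<rule-id>/_enable"
----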

[discrete]
[[kibana-alerts-large-shard-size]]
== Large shard size

This rule checks for a large average shard size (across associated primaries) on
any of the specified index patterns in an {es} cluster. The condition is met if
an index's average shard size is 55 GB or higher in the last 5 minutes. The
rule is grouped across all indices that match the default pattern of `-.*`;
checks run on a 1-minute schedule with a re-notify interval of 12 hours.
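
To check which index patterns the rule currently watches, search for it by
name and print its parameters. The rule name used in the search and the
parameter layout are assumptions:

[source,sh]
----
# Locate the shard size rule by name and show its parameters.
curl -s -G -u "elastic:$PASSWORD" \
  --data-urlencode 'search=shard size' \
  --data-urlencode 'search_fields=name' \
  "http://localhost:5601/api/alerting/rules/_find" \
  | jq '.data[] | {id, name, params}'
----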

[discrete]
[[kibana-alerts-cluster-alerts]]
== Cluster alerting

These rules check the current status of your {stack}. You can drill down into
the metrics to view more information about your cluster and specific nodes, instances, and indices.

An action is triggered if any of the following conditions are met within the
last minute:

* {es} cluster health status is yellow (missing at least one replica)
or red (missing at least one primary).
