Commit

Add Uptime monitoring content (#162) (#173)
* Add uptime content

* Add shared set up cloud content

* Add anomaly alert

* Minor edits

* Edits following review

* Edits following review
EamonnTP authored Oct 13, 2020
1 parent bd5f0b5 commit 35443cd
Showing 44 changed files with 545 additions and 20 deletions.
2 changes: 1 addition & 1 deletion docs/en/observability/analyze-metrics.asciidoc
@@ -16,7 +16,7 @@ Using {metricbeat} modules, you can ingest and analyze
metrics from servers, Docker containers, and Kubernetes orchestrations, explore and
analyze Prometheus-style metrics or application telemetries, and more.

To view the {metrics-app}, in the side navigation, expand *Observability*, and then click *Metrics*.
To view the {metrics-app}, go to *Observability > Metrics*.

[role="screenshot"]
image::images/metrics-app.png[Metrics app in Kibana]
60 changes: 60 additions & 0 deletions docs/en/observability/analyze-monitors.asciidoc
@@ -0,0 +1,60 @@
[[analyze-monitors]]
= Analyze monitors

To access this page, go to *Observability > Uptime*. From the *Overview* page,
click a listed monitor to view more details and analyze the data further.

The monitor detail screen displays several panels of information.

[[uptime-status-panel]]
== Status panel

The *Status* panel displays a summary of the latest information regarding your monitor.
You can view its availability, click a link to visit the targeted URL, view when the
TLS certificate expires, and determine the amount of time that has elapsed since the last check.

[role="screenshot"]
image::images/uptime-status-panel.png[Uptime status panel]

The *Monitoring from* list displays service availability per monitoring location,
along with the amount of time elapsed since data was received from that location.
The availability percentage is the percentage of successful checks made during
the selected time period.
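For example, if a location ran 200 checks during the selected time period and 196 of them
succeeded, that location's availability displays as 98%.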

To display a map with each location as a pinpoint, you can toggle the availability view from list
view to map view.

[[uptime-monitor-duration]]
== Monitor duration

The *Monitor duration* chart displays the timing for each check that was performed. The visualization
helps you gain insight into how quickly requests are resolved by the targeted endpoint and gives you a
sense of how frequently a host or endpoint was down in your selected timespan.

Included on this chart is the anomaly detection ({ml}) integration. For more information, see
<<inspect-uptime-duration-anomalies,Inspect Uptime duration anomalies>>.

[role="screenshot"]
image::images/monitor-duration-chart.png[Monitor duration chart]

[[uptime-pings-chart]]
== Pings over time

The *Pings over time* chart is a graphical representation of the check statuses over time.
Hover over the charts to display crosshairs with specific numeric data.

[role="screenshot"]
image::images/pings-over-time.png[Pings over time chart]

[[uptime-history-panel]]
== Check history

The *History* table lists the total count of this monitor’s checks for the selected date range.
To help find recent problems on a per-check basis, you can filter by `status`
and `location`.
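For example, filtering on a `status` of `down` narrows the table to failed checks only,
which makes it easy to locate the first failure in a sequence of checks.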

This table can help you gain insights into more granular details
about recent individual data points that {heartbeat} is logging about your host or endpoint.

[role="screenshot"]
image::images/uptime-history.png[Monitor history list]
4 changes: 2 additions & 2 deletions docs/en/observability/configure-logs-sources.asciidoc
@@ -15,7 +15,7 @@ default configuration settings.
[[edit-config-settings]]
== Edit configuration settings

. In the side navigation, expand *Observability*, and then click *Logs*.
. To access this page, go to *Observability > Logs*.
+
. Click *Settings*.
+
@@ -58,7 +58,7 @@ base field, `message`, is used.

|===

1. To add a new column to the logs stream, in the *Settings* tab, click *Add column*.
1. To add a new column to the logs stream, select *Settings > Add column*.
2. In the list of available fields, select the field you want to add.
To filter the field list by that name, you can start typing a field name in the search box.
3. To remove an existing column, click the *Remove this column* icon.
2 changes: 1 addition & 1 deletion docs/en/observability/configure-metrics-sources.asciidoc
@@ -9,7 +9,7 @@ and container names.
[[metrics-config-settings]]
== Override configuration settings

. In the side navigation, expand *Observability*, and then click *Metrics*.
. To access this page, go to *Observability > Metrics*.
+
. Click *Settings*.
+
75 changes: 75 additions & 0 deletions docs/en/observability/configure-uptime-settings.asciidoc
@@ -0,0 +1,75 @@
[[configure-uptime-settings]]
= Configure settings

The *Settings* page enables you to change which {heartbeat} indices are displayed
by the {uptime-app}, configure alert connectors, and set expiration/age thresholds
for TLS certificates.

Uptime settings apply to the current space only. To segment
different uptime use cases and domains, use different settings in other spaces.

. To access this page, go to *Observability > Uptime*.
. Click *Settings*.
+
[IMPORTANT]
=====
To modify items on this page, you must have the {kibana-ref}/space-rbac-tutorial.html[`all`]
privilege granted to your role. The `all` privilege grants the ability to perform cluster administration
operations, such as snapshotting, node shutdown/restart, settings updates, rerouting, and managing users and roles.
=====

[[configure-uptime-indices]]
== Configure indices

Specify a comma-separated list of index patterns to match indices in {es} that contain {heartbeat} data.

[NOTE]
=====
The pattern set here only restricts what the {uptime-app} displays. You can still query {es} for
data outside of this pattern.
=====
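
For example, `heartbeat-*` matches the default {heartbeat} indices, while a list such as
`heartbeat-*,custom-uptime-*` (the second pattern is a hypothetical custom index) also picks up
data that you route to indices of your own.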

[role="screenshot"]
image::images/heartbeat-indices.png[Heartbeat indices]

[[configure-uptime-alert-connectors]]
== Configure alert connectors

*Alerts* work by running checks on a schedule to detect conditions. When a condition is met, the alert tracks
it as an *alert instance* and responds by triggering one or more *actions*.
Actions typically involve interaction with {kib} services or third-party integrations. *Connectors* allow actions
to talk to these services and integrations.

Click *Create connector* and follow the prompts to select a connector type and configure its properties.
After you create a connector, it's available to you anytime you set up an alert action in the current space.

For more information about each connector, see {kibana-ref}/action-types.html[action types and connectors].

[role="screenshot"]
image::images/alert-connector.png[Alert connector]

[[configure-cert-thresholds]]
== Configure certificate thresholds

You can modify certificate thresholds to control how Uptime displays your TLS values on
the <<view-certificate-status,Certificates>> page. These settings also determine which certificates are
selected by any TLS alert you create.

|===

| *Expiration threshold* | The `expiration` threshold specifies when you are notified
about certificates that are approaching their expiration dates. When the number of valid days remaining for a certificate falls
below the `Expiration threshold`, the certificate is considered to be in a warning state. When you define a
<<tls-certificate-alert,TLS alert>>, you receive a notification about the certificate.

| *Age limit* | The `age` threshold specifies when you are notified about certificates
that have been valid for too long.

|===

A standard security requirement is to make sure that your TLS certificates have not been
valid for longer than a year. To help you keep track of which certificates you may want to refresh,
set the *Age limit* value to `365` days.

[role="screenshot"]
image::images/cert-expiry-settings.png[Certificate expiry settings]
6 changes: 6 additions & 0 deletions docs/en/observability/create-alerts.asciidoc
@@ -15,3 +15,9 @@ include::logs-threshold-alert.asciidoc[leveloffset=+1]
include::infrastructure-threshold-alert.asciidoc[leveloffset=+1]

include::metrics-threshold-alert.asciidoc[leveloffset=+1]

include::monitor-status-alert.asciidoc[leveloffset=+1]

include::uptime-tls-alert.asciidoc[leveloffset=+1]

include::uptime-duration-anomaly-alert.asciidoc[leveloffset=+1]
2 changes: 1 addition & 1 deletion docs/en/observability/explore-metrics.asciidoc
@@ -9,7 +9,7 @@ for one or more resources that you are monitoring.
Additionally, for detailed analyses of your metrics, you can annotate and save visualizations for
your custom dashboards by using the {kibana-ref}/dashboard.html#tsvb[Time Series Visual Builder (TSVB)] within {kib}.

In the side navigation, expand *Observability*, click *Metrics*, and then click *Metrics Explorer*.
To access this page, go to *Observability > Metrics*, and then click *Metrics Explorer*.

By default, the Metrics Explorer page displays the CPU usage for hosts, Kubernetes pods, and Docker containers.
The initial configuration has the *Average* aggregation selected, the *of* field is populated with the default metrics,
Binary file added docs/en/observability/images/alert-connector.png
Binary file added docs/en/observability/images/monitors-chart.png
Binary file added docs/en/observability/images/monitors-list.png
Binary file added docs/en/observability/images/pings-over-time.png
Binary file added docs/en/observability/images/tls-alert.png
Binary file added docs/en/observability/images/uptime-app.png
Binary file added docs/en/observability/images/uptime-history.png
10 changes: 10 additions & 0 deletions docs/en/observability/index.asciidoc
@@ -60,6 +60,16 @@ include::explore-metrics.asciidoc[leveloffset=+2]

include::configure-metrics-sources.asciidoc[leveloffset=+2]

include::monitor-uptime.asciidoc[leveloffset=+1]

include::view-monitor-status.asciidoc[leveloffset=+2]

include::analyze-monitors.asciidoc[leveloffset=+2]

include::inspect-uptime-duration-anomalies.asciidoc[leveloffset=+2]

include::configure-uptime-settings.asciidoc[leveloffset=+2]

include::create-alerts.asciidoc[leveloffset=+1]

include::fields-reference.asciidoc[leveloffset=+1]
4 changes: 2 additions & 2 deletions docs/en/observability/infrastructure-threshold-alert.asciidoc
@@ -8,8 +8,8 @@ resource or for a group of resources within your infrastructure.
Additionally, each alert can be defined using multiple
conditions that combine metrics and thresholds to create precise notifications and reduce false positives.

. In the side navigation, expand *Observability*, and then click *Metrics*.
. On the *Inventory* page, click *Alerts*, and then select *Create alert*.
. To access this page, go to *Observability > Metrics*.
. On the *Inventory* page, click *Alerts > Create alert*.

[role="screenshot"]
image::images/inventory-create-alert.png[Closeup of the open Alerts menu on the Inventory page]
8 changes: 7 additions & 1 deletion docs/en/observability/ingest-logs.asciidoc
@@ -18,6 +18,12 @@ If you haven't already, you need to install {es} for storing and searching your
managing it. For more information, see <<install-observability,Get started>>.
=====

Install and configure {filebeat} on your servers to collect log events. {filebeat} allows you to ship log data from sources that come
in the form of files. It monitors the log files or locations that you specify,
collects log events, and forwards them to {es}. To ease the collection and parsing of
log formats for common applications such as Apache, MySQL, and Kafka, a number of
{filebeat-ref}/filebeat-modules.html[modules] are available.
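
As a quick orientation, here is a minimal sketch of a `filebeat.yml` configuration
(the log path and the {es} host are placeholders; the steps below cover the real setup):

[source,yaml]
----
filebeat.inputs:
- type: log                     # tail plain log files
  enabled: true
  paths:
    - /var/log/*.log            # placeholder; point this at your own logs

output.elasticsearch:
  hosts: ["localhost:9200"]     # placeholder; your {es} endpoint
----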

[[install-filebeat]]
== Step 1: Install {beatname_uc}

@@ -174,7 +180,7 @@ Let's confirm your data is correctly streaming to your cloud instance.
include::{beats-repo-dir}/tab-widgets/open-kibana-widget.asciidoc[]
--

. In the side navigation, expand *{kib}*, and then click *Discover*.
. In the side navigation, click *{kib} > Discover*.
+
. Select `filebeat-*` as your index pattern.
+
12 changes: 11 additions & 1 deletion docs/en/observability/ingest-metrics.asciidoc
@@ -17,6 +17,16 @@ If you haven't already, you need to install {es} for storing and searching your
managing it. For more information, see <<install-observability,Get started>>.
=====

Install and configure {metricbeat} on your servers to collect and preprocess system
and service metrics, such as information about running processes, as well as CPU, memory,
disk, and network utilization numbers.

{metricbeat} comes with predefined assets for parsing, indexing, and
visualizing your data. To load these assets, {metricbeat} uses
{metricbeat-ref}/metricbeat-modules.html[modules]. Each
module defines the basic logic for collecting data from a specific service, such as
Redis or MySQL, and consists of metricsets that fetch and structure the data before
sending it to {es}.
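
For example, a minimal sketch of enabling the system module directly in `metricbeat.yml`
(the collection interval and the {es} host are illustrative placeholders):

[source,yaml]
----
metricbeat.modules:
- module: system                  # host-level metrics
  metricsets: ["cpu", "memory", "network", "process"]
  period: 10s                     # illustrative collection interval

output.elasticsearch:
  hosts: ["localhost:9200"]       # placeholder; your {es} endpoint
----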

[[install-metricbeat]]
== Step 1: Install {metricbeat}

@@ -135,7 +145,7 @@ Let's confirm your data is correctly ingested to your cluster.
include::{beats-repo-dir}/tab-widgets/open-kibana-widget.asciidoc[]
--

. In the side navigation, expand *{kib}*, and then click *Discover*.
. In the side navigation, click *{kib} > Discover*.
+
. Select `metricbeat-*` as your index pattern.
+
47 changes: 45 additions & 2 deletions docs/en/observability/ingest-uptime.asciidoc
@@ -15,6 +15,47 @@ If you haven't already, you need to install {es} for storing and searching your
managing it. For more information, see <<install-observability,Get started>>.
=====

Install and configure {heartbeat} on your servers to periodically check the status of your
services. {heartbeat} uses probing to monitor the availability of services and helps
verify that you’re meeting your service level agreements for service uptime.
You typically install {heartbeat} as part of a monitoring service that runs on a separate machine
and possibly even outside of the network where the services that you want to monitor are running.
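
A minimal `heartbeat.yml` monitor definition looks something like this sketch (the URL,
schedule, and {es} host are placeholders; the steps below cover the real setup):

[source,yaml]
----
heartbeat.monitors:
- type: http                    # probe the endpoint over HTTP(S)
  id: my-service-http           # placeholder monitor ID
  name: My service
  urls: ["https://example.com"] # placeholder; the endpoint to check
  schedule: '@every 10s'        # run the check every 10 seconds

output.elasticsearch:
  hosts: ["localhost:9200"]     # placeholder; your {es} endpoint
----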

[[deployment-considerations]]
== Deployment considerations

There are multiple ways to deploy Uptime and Heartbeat. A guiding principle is that when
an outage takes down the service being monitored, it should not take down {heartbeat}.

{heartbeat} is commonly run as a centralized service within a data center.
While it's possible to run it as a separate "sidecar" process paired with each process/container,
we recommend against it. Running {heartbeat} centrally ensures you will still be able to see
monitoring data in the event of an overloaded, disconnected, or otherwise malfunctioning server.

For further redundancy, you may want to deploy multiple instances of {heartbeat} across geographic and network boundaries
to provide more data.

For example:

* A site served from a content delivery network (CDN) with points of presence (POPs) around the globe.
+
To check if your site is reachable via CDN POPs, deploy multiple {heartbeat} instances at
different data centers around the world.
+
* A service within a single data center that is accessed across multiple VPNs.
+
Set up one {heartbeat} instance within the VPN the service operates from, and another within an additional
VPN that users access the service from. In the event of an outage, having both instances helps pinpoint
the network errors.
+
* A single service running primarily in a US east coast data center, with a hot failover located in
a US west coast data center.
+
In each data center, run a {heartbeat} instance that checks both the local
copy of the service and its counterpart across the country. Set up two monitors in each region, one for
the local service, and one for the remote service (see the sketch after this list). In the event of a data center failure, it is
immediately apparent whether the service has a connectivity issue to the outside world or the failure is only internal.
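
To make the last scenario concrete, the monitors on the east coast instance might look like
the following sketch (the hostnames are hypothetical):

[source,yaml]
----
heartbeat.monitors:
- type: http
  id: service-us-east           # the local copy of the service
  name: Service (us-east)
  urls: ["https://us-east.example.com/health"]   # hypothetical hostname
  schedule: '@every 10s'
- type: http
  id: service-us-west           # the remote counterpart
  name: Service (us-west)
  urls: ["https://us-west.example.com/health"]   # hypothetical hostname
  schedule: '@every 10s'
----

The west coast instance runs the mirror-image configuration, so each data center watches both endpoints.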

[[install-heartbeat]]
== Step 1: Install {beatname_uc}

@@ -150,15 +191,17 @@ include::{beats-repo-dir}/tab-widgets/start-widget.asciidoc[]
[[view-uptime-kibana]]
== Step 6: View your data in {kib}

To view the <<observability-ui,Observability Overview>> page:
Let's confirm your data is correctly ingested to your cluster.

. Launch {kib}:
+
--
include::{beats-repo-dir}/tab-widgets/open-kibana-widget.asciidoc[]
--

. In the side navigation, expand *Observability*, and then click *Overview*.
. In the side navigation, click *Observability > Uptime*.

Now let's have a look at the <<monitor-uptime,Uptime app>>.

// Add Javascript and CSS for tabbed panels
include::{beats-repo-dir}/tab-widgets/code.asciidoc[]
31 changes: 31 additions & 0 deletions docs/en/observability/inspect-uptime-duration-anomalies.asciidoc
@@ -0,0 +1,31 @@
[[inspect-uptime-duration-anomalies]]
= Inspect uptime duration anomalies

Each monitor location is modeled, and when a monitor runs
for an unusual amount of time at a particular time, an anomaly is recorded and highlighted
on the *Monitor duration* chart.

[[uptime-anomaly-detection]]
== Enable uptime duration anomaly detection

Create a machine learning job to detect anomalous monitor duration rates automatically.

1. To access this page, go to *Observability > Uptime*, and then click a monitor to view its details.
2. In the *Monitor duration* panel, click *Enable anomaly detection*.
+
[NOTE]
=====
If anomaly detection is already enabled, click *Anomaly detection* and select whether to view duration anomalies directly in the
{ml-docs}/ml-gs-results.html[Machine Learning app], enable an <<duration-anomaly-alert,anomaly alert>>,
or disable anomaly detection.
=====
+
3. You are prompted to create a <<duration-anomaly-alert,response duration anomaly alert>> for the machine learning job that carries
out the analysis, and you can configure the severity level for which the alert is created.

When an anomaly is detected, it is displayed on the *Monitor duration*
chart, along with the duration times. The colors represent the criticality of the anomaly: red
(critical) and yellow (minor).

[role="screenshot"]
image::images/inspect-uptime-duration-anomalies.png[]
2 changes: 1 addition & 1 deletion docs/en/observability/install-observability.asciidoc
@@ -9,7 +9,7 @@ data, and {kib} for visualizing and managing it.
[[set-up-on-cloud]]
== Set up on Cloud

include::{docs-root}/shared/cloud/ess-getting-started.asciidoc[]
include::{docs-root}/shared/cloud/ess-getting-started-obs.asciidoc[]

[float]
[[self-manage]]