[docs] 7.10 APM docs updates (#80605) (#81248)
bmorelli25 authored Oct 21, 2020
1 parent 9a4e2af commit 633dc4f
Showing 31 changed files with 97 additions and 47 deletions.
26 changes: 18 additions & 8 deletions docs/apm/apm-alerts.asciidoc
@@ -18,12 +18,22 @@ image::apm/images/apm-alert.png[Create an alert in the APM app]
For a walkthrough of the alert flyout panel, including detailed information on each configurable property,
see Kibana's <<defining-alerts,defining alerts>>.

The APM app supports two different types of threshold alerts: transaction duration, and error rate.
Below, we'll create one of each.
The APM app supports four different types of alerts:

* Transaction duration anomaly:
alerts when the service's transaction duration reaches a certain anomaly score
* Transaction duration threshold:
alerts when the service's transaction duration exceeds a given time limit over a given time frame
* Transaction error rate threshold:
alerts when the service's transaction error rate is above the selected rate over a given time frame
* Error count threshold:
alerts when service exceeds a selected number of errors over a given time frame
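
The threshold alert types above compare a per-service measurement against a limit over a trailing time window. The TypeScript below is a minimal sketch of the two threshold checks that count or average raw events, assuming in-memory arrays of events. The `ApmErrorEvent` and `ApmTransactionEvent` shapes and all function names are hypothetical; Kibana's alerting framework evaluates these conditions against Elasticsearch on a schedule, not like this.

```typescript
// Hypothetical event shapes for this sketch; they are not APM document schemas.
interface ApmErrorEvent {
  serviceName: string;
  timestampMs: number;
}

interface ApmTransactionEvent {
  serviceName: string;
  timestampMs: number;
  durationMs: number;
}

// Keep the events for one service that fall inside the trailing window
// [nowMs - windowMs, nowMs].
function inWindow<T extends { serviceName: string; timestampMs: number }>(
  events: T[],
  serviceName: string,
  nowMs: number,
  windowMs: number,
): T[] {
  return events.filter(
    (e) =>
      e.serviceName === serviceName &&
      e.timestampMs >= nowMs - windowMs &&
      e.timestampMs <= nowMs,
  );
}

// Error count threshold: fire when the service logged more than `maxErrors`
// errors inside the window.
function errorCountAlert(
  errors: ApmErrorEvent[],
  serviceName: string,
  maxErrors: number,
  nowMs: number,
  windowMs: number,
): boolean {
  return inWindow(errors, serviceName, nowMs, windowMs).length > maxErrors;
}

// Transaction duration threshold: fire when the average duration inside the
// window exceeds `limitMs`. Averaging is an assumption of this sketch.
function durationThresholdAlert(
  transactions: ApmTransactionEvent[],
  serviceName: string,
  limitMs: number,
  nowMs: number,
  windowMs: number,
): boolean {
  const recent = inWindow(transactions, serviceName, nowMs, windowMs);
  if (recent.length === 0) return false; // no data, nothing to alert on
  const avgMs = recent.reduce((sum, t) => sum + t.durationMs, 0) / recent.length;
  return avgMs > limitMs;
}
```

The window and threshold map onto the "over a given time frame" and "selected number" settings described in the list; anomaly alerts are omitted because they depend on machine learning scores rather than raw counts.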

Below, we'll walk through the creation of two of these alerts.

[float]
[[apm-create-transaction-alert]]
=== Create a transaction duration alert
=== Example: create a transaction duration alert

Transaction duration alerts trigger when the duration of a specific transaction type in a service exceeds a defined threshold.
This guide will create an alert for the `opbeans-java` service based on the following criteria:
@@ -57,17 +67,17 @@ Enter a name for the connector,
and paste the webhook URL.
See Slack's webhook documentation if you need to create one.

Add a message body in markdown format.
A default message is provided as a starting point for your alert.
You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}`
to pass alert values at the time a condition is detected to an action.
to pass additional alert values at the time a condition is detected to an action.
A list of available variables can be accessed by selecting the
**add variable** button image:apm/images/add-variable.png[add variable button].

Select **Save**. The alert has been created and is now active!

[float]
[[apm-create-error-alert]]
=== Create an error rate alert
=== Example: create an error rate alert

Error rate alerts trigger when the number of errors in a service exceeds a defined threshold.
This guide creates an alert for the `opbeans-python` service based on the following criteria:
@@ -94,9 +104,9 @@ Based on the alert criteria, define the following alert details:
Select the **Email** action type and click **Create a connector**.
Fill out the required details: sender, host, port, etc., and click **save**.

Add a message body in markdown format.
A default message is provided as a starting point for your alert.
You can use the https://mustache.github.io/[Mustache] template syntax, i.e., `{{variable}}`
to pass alert values at the time a condition is detected to an action.
to pass additional alert values at the time a condition is detected to an action.
A list of available variables can be accessed by selecting the
**add variable** button image:apm/images/add-variable.png[add variable button].
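
The message bodies above pass alert values through Mustache `{{variable}}` placeholders. As a rough illustration of how such placeholders resolve, here is a minimal TypeScript sketch of plain variable interpolation with dotted paths. It is not a full Mustache implementation: it has no sections, partials, or HTML escaping (real Mustache escapes `{{name}}` output by default). The variable names in the usage note (`alertName`, `context.serviceName`) are hypothetical; the **add variable** button lists the real ones.

```typescript
// Values available to the template, possibly nested.
type TemplateContext = { [key: string]: unknown };

// Resolve a dotted path such as "context.serviceName" against the context.
function lookupPath(ctx: TemplateContext, path: string): unknown {
  let current: unknown = ctx;
  for (const part of path.split(".")) {
    if (current === null || typeof current !== "object") return undefined;
    current = (current as TemplateContext)[part];
  }
  return current;
}

// Replace every {{ path }} placeholder with its value. Missing values render
// as an empty string, matching Mustache's behavior for undefined variables.
function renderTemplate(template: string, ctx: TemplateContext): string {
  return template.replace(/\{\{\s*([\w.]+)\s*\}\}/g, (_match, path: string) => {
    const value = lookupPath(ctx, path);
    return value === undefined || value === null ? "" : String(value);
  });
}
```

For example, `renderTemplate("Alert {{alertName}} fired for {{context.serviceName}}", { alertName: "slow-java", context: { serviceName: "opbeans-java" } })` yields `Alert slow-java fired for opbeans-java`.
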

2 changes: 1 addition & 1 deletion docs/apm/filters.asciidoc
@@ -69,7 +69,7 @@ the host filter will still be applied.

These filters are very useful for quickly and easily removing noise from your data.
With just a click, you can filter your transactions by the transaction result,
host, container ID, and more.
host, container ID, Kubernetes pod, and more.

[role="screenshot"]
image::apm/images/local-filter.png[Local filters available in the APM app in Kibana]
Binary file modified docs/apm/images/apm-alert.png
Binary file modified docs/apm/images/apm-distributed-tracing.png
Binary file modified docs/apm/images/apm-error-group.png
Binary file modified docs/apm/images/apm-errors-overview.png
Binary file modified docs/apm/images/apm-geo-ui.png
Binary file modified docs/apm/images/apm-metrics.png
Binary file modified docs/apm/images/apm-query-bar.png
Binary file modified docs/apm/images/apm-service-map-anomaly.png
Binary file modified docs/apm/images/apm-services-overview.png
Binary file modified docs/apm/images/apm-settings.png
Binary file modified docs/apm/images/apm-traces.png
Binary file modified docs/apm/images/apm-transaction-response-dist.png
Binary file modified docs/apm/images/apm-transaction-sample.png
Binary file modified docs/apm/images/apm-transactions-overview.png
Binary file modified docs/apm/images/example-metadata.png
Binary file modified docs/apm/images/jvm-metrics-overview.png
Binary file modified docs/apm/images/jvm-metrics.png
Binary file modified docs/apm/images/local-filter.png
Binary file modified docs/apm/images/service-maps-java.png
Binary file modified docs/apm/images/service-maps.png
Binary file added docs/apm/images/service-quick-health.png
Binary file modified docs/apm/images/specific-transaction.png
7 changes: 6 additions & 1 deletion docs/apm/machine-learning.asciidoc
@@ -14,7 +14,12 @@ Machine learning jobs are created per environment, and are based on a service's
Because jobs are created at the environment level,
you can add new services to your existing environments without the need for additional machine learning jobs.

After a machine learning job is created, results are shown in two places:
Results from machine learning jobs are shown in multiple places throughout the APM app:

* The **Services overview** provides a quick-glance view of the general health of all of your services.
+
[role="screenshot"]
image::apm/images/service-quick-health.png[Example view of anomaly scores on response times in the APM app]

* The transaction duration chart will show the expected bounds and add an annotation when the anomaly score is 75 or above.
+
2 changes: 1 addition & 1 deletion docs/apm/service-maps.asciidoc
@@ -33,7 +33,7 @@ distributed tracing will not work, and the connection will not be drawn on the m
Select the **Service Map** tab to get started.
By default, all instrumented services and connections are shown.
Whether you're onboarding a new engineer, or just trying to grasp the big picture,
click around, zoom in and out, and begin to visualize how your services are connected.
drag things around, zoom in and out, and begin to visualize how your services are connected.

If there's a specific service that interests you, select that service to highlight its connections.
Clicking **Focus map** will refocus the map on that specific service and lock the connection highlighting.
11 changes: 8 additions & 3 deletions docs/apm/services.asciidoc
@@ -2,8 +2,13 @@
[[services]]
=== Services overview

The *Services* overview gives you quick insights into the health and general performance of all of your instrumented services.
Services are sorted by the `service.name` configured in each of the {apm-agents-ref}[APM agents] you’ve installed.
The *Services* overview page provides a quick, high-level overview of the health and general
performance of all instrumented services.

To help surface potential issues, services are sorted by their health status:
**critical** > **warning** > **healthy** > **unknown**.
Health status is powered by machine learning and requires anomaly detection to be enabled.
Learn more in <<machine-learning-integration,machine learning>>.

[role="screenshot"]
image::apm/images/apm-services-overview.png[Example view of services table the APM app in Kibana]
image::apm/images/apm-services-overview.png[Example view of services table the APM app in Kibana]
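
The services table described above is sorted by health status: critical, then warning, then healthy, then unknown. A minimal TypeScript sketch of that ordering, assuming each row already carries a status computed elsewhere (in the APM app it comes from machine learning anomaly detection). The `ServiceRow` shape and the alphabetical tie-breaker are assumptions made for this example.

```typescript
// Health statuses, ranked in the order the services table uses.
type HealthStatus = "critical" | "warning" | "healthy" | "unknown";

const healthRank: Record<HealthStatus, number> = {
  critical: 0,
  warning: 1,
  healthy: 2,
  unknown: 3,
};

// Hypothetical row shape; the status is assumed to be computed elsewhere.
interface ServiceRow {
  name: string;
  health: HealthStatus;
}

// Return a sorted copy: most severe status first, then alphabetical by name.
function sortByHealth(services: ServiceRow[]): ServiceRow[] {
  return [...services].sort(
    (a, b) =>
      healthRank[a.health] - healthRank[b.health] ||
      a.name.localeCompare(b.name),
  );
}
```

Ranking through a lookup table keeps the comparator trivial and makes it obvious where a new status would slot in.
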
9 changes: 6 additions & 3 deletions docs/apm/spans.asciidoc
@@ -3,7 +3,7 @@
=== Trace sample timeline

The trace sample timeline visualization is a bird's-eye view of what your application was doing while it was trying to respond to a request.
This makes it useful for visualizing where the selected transaction spent most of its time.
This makes it useful for visualizing where a selected transaction spent most of its time.

[role="screenshot"]
image::apm/images/apm-transaction-sample.png[Example of distributed trace colors in the APM app in Kibana]
@@ -43,9 +43,12 @@ this makes finding possible bottlenecks throughout your application much easier
image::apm/images/apm-distributed-tracing.png[Example view of the distributed tracing in APM app in Kibana]

Don't forget; by definition, a distributed trace includes more than one transaction.
When viewing these distributed traces in the timeline waterfall, you'll see this image:apm/images/transaction-icon.png[APM icon] icon,
When viewing distributed traces in the timeline waterfall,
you'll see this icon: image:apm/images/transaction-icon.png[APM icon],
which indicates the next transaction in the trace.
These transactions can be expanded and viewed in detail by clicking on them.
For easier problem isolation, transactions can be collapsed in the waterfall by clicking
the icon to the left of the transactions.
Transactions can also be expanded and viewed in detail by clicking on them.

After exploring these traces,
you can return to the full trace by clicking *View full trace*.
3 changes: 2 additions & 1 deletion docs/apm/traces.asciidoc
@@ -7,7 +7,8 @@ and which services were part of it.
In addition to the Traces overview, you can view your application traces in the <<spans,trace sample timeline waterfall>>.

The *Traces* overview displays the entry transaction for all traces in your application.
If you're using <<distributed-tracing>>, this view is key to finding the critical paths within your application.
If you're using <<distributed-tracing,distributed tracing>>,
this view is key to finding the critical paths within your application.
Transactions with the same name are grouped together and only shown once in this table.

By default, transactions are sorted by _Impact_.
67 changes: 38 additions & 29 deletions docs/apm/transactions.asciidoc
@@ -10,29 +10,35 @@ Selecting a <<services,*service*>> brings you to the *transactions* overview.
[role="screenshot"]
image::apm/images/apm-transactions-overview.png[Example view of transactions table in the APM app in Kibana]

The *time spent by span type*, *transaction duration*, and *requests per minute* chart display information on all transactions associated with the selected service:

*Time spent by span type*::
Visualize where your application is spending most of its time.
For example, is your app spending time in external calls, database processing, or application code execution?
+
The time a transaction took to complete is also recorded and displayed on the chart under the "app" label.
"app" indicates that something was happening within the application, but we're not sure exactly what.
This could be a sign that the agent does not have auto-instrumentation for whatever was happening during that time.
+
It's important to note that if you have asynchronous spans, the sum of all span times may exceed the duration of the transaction.
The *transaction duration*, *transactions per minute*, *transaction error rate*, and *time spent by span type*
charts display information on all transactions associated with the selected service:

*Transaction duration*::
Response times for this service, broken down into average, 95th, and 99th percentile.
If there's a weird spike that you'd like to investigate,
you can simply zoom in on the graph - this will adjust the specific time range,
and all of the data on the page will update accordingly.

*Requests per minute*::
*Transactions per minute*::
Visualize response codes: `2xx`, `3xx`, `4xx`, etc.,
and is useful for determining if you're serving more of one code than you typically do.
Like in the Transaction duration graph, you can zoom in on anomalies to further investigate them.

*Transaction error rate*::
Visualize the total number of transactions with errors divided by the total number of transactions.
Any unexpected increases, decreases, or irregular patterns can be investigated further
with the <<errors,errors overview>>.

*Time spent by span type*::
Visualize where your application is spending most of its time.
For example, is your app spending time in external calls, database processing, or application code execution?
+
The time a transaction took to complete is also recorded and displayed on the chart under the "app" label.
"app" indicates that something was happening within the application, but we're not sure exactly what.
This could be a sign that the agent does not have auto-instrumentation for whatever was happening during that time.
+
It's important to note that if you have asynchronous spans, the sum of all span times may exceed the duration of the transaction.
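
To see why asynchronous spans can add up to more than the transaction duration, the TypeScript sketch below compares the naive sum of span durations with the wall-clock time the spans actually cover (the union of their intervals). The `SpanInterval` shape, with start and end offsets in milliseconds, is a hypothetical simplification of real span documents.

```typescript
// A span as a [startMs, endMs) interval relative to the transaction start.
interface SpanInterval {
  startMs: number;
  endMs: number;
}

// Naive total: add up every span's duration, counting overlaps repeatedly.
function sumOfDurations(spans: SpanInterval[]): number {
  return spans.reduce((total, s) => total + (s.endMs - s.startMs), 0);
}

// Wall-clock time covered by at least one span (merged overlapping intervals).
function coveredTime(spans: SpanInterval[]): number {
  const sorted = [...spans].sort((a, b) => a.startMs - b.startMs);
  let covered = 0;
  let curStart = -Infinity;
  let curEnd = -Infinity;
  for (const s of sorted) {
    if (s.startMs > curEnd) {
      // Disjoint from the current run: close it and start a new one.
      if (curEnd > curStart) covered += curEnd - curStart;
      curStart = s.startMs;
      curEnd = s.endMs;
    } else {
      // Overlapping or touching: extend the current run.
      curEnd = Math.max(curEnd, s.endMs);
    }
  }
  if (curEnd > curStart) covered += curEnd - curStart;
  return covered;
}
```

For a 100 ms transaction with three concurrent spans at 0 to 80, 10 to 90, and 20 to 60 ms, `sumOfDurations` returns 200 ms while `coveredTime` returns 90 ms, so the summed span time exceeds the transaction even though every span fits inside it.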

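The *transaction duration* and *transaction error rate* charts reduce to simple arithmetic over a set of transactions. The sketch below computes the average, an exact nearest-rank percentile, and the error rate as defined above (transactions with errors divided by all transactions). The `ChartTransaction` shape is hypothetical, and the APM app computes these values with Elasticsearch aggregations, whose percentiles are approximate, so real chart values can differ slightly from an exact calculation.

```typescript
// Hypothetical shape: a duration plus a flag for "this transaction had an error".
interface ChartTransaction {
  durationMs: number;
  hasError: boolean;
}

function average(values: number[]): number {
  if (values.length === 0) throw new Error("no values");
  return values.reduce((sum, v) => sum + v, 0) / values.length;
}

// Exact nearest-rank percentile: the smallest value with at least p% of the
// values at or below it. p must be in (0, 100].
function percentile(values: number[], p: number): number {
  if (values.length === 0) throw new Error("no values");
  if (p <= 0 || p > 100) throw new Error("p must be in (0, 100]");
  const sorted = [...values].sort((a, b) => a - b);
  const rank = Math.ceil((p * sorted.length) / 100);
  return sorted[rank - 1];
}

// Error rate: transactions with errors divided by all transactions.
function errorRate(transactions: ChartTransaction[]): number {
  if (transactions.length === 0) return 0;
  return transactions.filter((t) => t.hasError).length / transactions.length;
}
```

With durations of 1 to 100 ms, the average is 50.5 ms, the 95th percentile is 95 ms, and the 99th percentile is 99 ms.
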
[[transactions-table]]
==== Transactions table

@@ -61,42 +67,45 @@ refer to the documentation for each {apm-agents-ref}[APM Agent] you've implement
==== RUM Transaction overview

The transaction overview page is customized for the JavaScript RUM Agent.
This page highlights things like *page load times*, *transactions per minute*, and even the *average page load duration distribution by country*.
Specifically, the page highlights *page load times* for your service:

[role="screenshot"]
image::apm/images/apm-geo-ui.png[average page load duration distribution]

This data is available due to the geo-ip and user agent pipelines being enabled by default,
which allows for the capture of geo-location and user agent data.
These visualizations make it easy for you to visualize performance information about your
end-users' experience based on their location.
Additional RUM goodies, like core vitals, and visitor breakdown by browser, location, and device,
are available in the Observability User Experience tab.
// To do
// Add link to the Observability UE docs when complete

[[transaction-details]]
==== Transaction details

Selecting a transaction group will bring you to the *transaction* details.
Transaction details include a high-level overview of the time spent by span type,
transaction group duration, requests per minute, and transaction group duration distribution.
It's important to note that all of these graphs show data from every transaction within the selected transaction group.
This page is visually similar to the transaction overview, but it shows data from all transactions within
the selected transaction group.

[role="screenshot"]
image::apm/images/apm-transaction-response-dist.png[Example view of response time distribution]

Up to ten sampled transactions are also displayed.
These sampled transactions are based on your selection in the *Transactions duration distribution*.
You can update the sampled transactions by selecting a new _bucket_ in the transactions duration distribution graph.
The number of requests per bucket is displayed when hovering over the graph, and the selected bucket is highlighted to stand out.
These sampled transactions are based on the _bucket_ selection in the *Transactions duration distribution* chart.
You can update the sampled transactions by selecting a new _bucket_.
The number of requests per bucket is displayed when hovering over the graph,
and the selected bucket is highlighted to stand out.
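
The duration distribution chart groups transactions into buckets by duration, and selecting a bucket shows up to ten sampled transactions from it. A minimal TypeScript sketch of that grouping, assuming fixed-width buckets keyed by their lower bound. The bucket width, the `DistTransaction` shape, and the take-the-first-ten sampling rule are assumptions, not how the APM app picks its samples.

```typescript
// Hypothetical transaction shape for the distribution chart.
interface DistTransaction {
  id: string;
  durationMs: number;
}

// Group transactions into fixed-width duration buckets keyed by lower bound.
function bucketize(txs: DistTransaction[], bucketSizeMs: number): Map<number, DistTransaction[]> {
  const buckets = new Map<number, DistTransaction[]>();
  for (const tx of txs) {
    const key = Math.floor(tx.durationMs / bucketSizeMs) * bucketSizeMs;
    const bucket = buckets.get(key) ?? [];
    bucket.push(tx);
    buckets.set(key, bucket);
  }
  return buckets;
}

// Return up to `limit` samples from the selected bucket (empty if none).
function samplesForBucket(
  buckets: Map<number, DistTransaction[]>,
  bucketKey: number,
  limit = 10,
): DistTransaction[] {
  return (buckets.get(bucketKey) ?? []).slice(0, limit);
}
```

The size of each bucket's array corresponds to the per-bucket request count shown on hover, while `samplesForBucket` caps what is listed for inspection.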

The screenshot below shows a typical distribution, and indicates most of our requests were served quickly--awesome!
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.

[role="screenshot"]
image::apm/images/apm-transaction-duration-dist.png[Example view of transactions duration distribution graph]

This graph shows a typical distribution, and indicates most of our requests were served quickly--awesome!
It's the requests on the right, the ones taking longer than average, that we probably want to focus on.

When you select one of these buckets,
When you select a bucket,
you're presented with up to ten trace samples.
Each sample has a trace timeline waterfall that shows what a typical request in that bucket was doing.
By investigating this timeline waterfall, we can hopefully determine _why_ this request was slow and then implement a fix.
Each sample has a trace timeline waterfall that shows how a typical request in that bucket executed.
This waterfall is useful for understanding the parent/child hierarchy of transactions and spans,
and ultimately determining _why_ a request was slow.
For large waterfalls, expand problematic transactions and collapse well-performing ones
for easier problem isolation and troubleshooting.

[role="screenshot"]
image::apm/images/apm-transaction-sample.png[Example view of transactions sample]
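
The timeline waterfall described above reflects the parent/child hierarchy of transactions and spans. The TypeScript below is a minimal sketch of one way to lay that out, assuming each item carries an `id`, an optional `parentId`, and a start offset: a depth-first walk that orders siblings by start time and skips the descendants of any collapsed item, mirroring the expand/collapse behavior. All names are hypothetical, and items whose parent is missing from the input are dropped.

```typescript
// Hypothetical trace item: a transaction or span in one trace.
interface TraceItem {
  id: string;
  parentId?: string; // undefined for the root transaction
  name: string;
  startMs: number;
}

interface WaterfallRow {
  name: string;
  depth: number;
}

// Flatten a trace into waterfall rows: depth-first, siblings ordered by start.
function toWaterfall(items: TraceItem[], collapsed: Set<string> = new Set()): WaterfallRow[] {
  // Index children by parent id; the root lives under the `undefined` key.
  const children = new Map<string | undefined, TraceItem[]>();
  for (const item of items) {
    const list = children.get(item.parentId) ?? [];
    list.push(item);
    children.set(item.parentId, list);
  }
  children.forEach((list) => list.sort((a, b) => a.startMs - b.startMs));

  const rows: WaterfallRow[] = [];
  const visit = (item: TraceItem, depth: number): void => {
    rows.push({ name: item.name, depth });
    // A collapsed item keeps its own row but hides its descendants.
    if (collapsed.has(item.id)) return;
    for (const child of children.get(item.id) ?? []) visit(child, depth + 1);
  };
  for (const root of children.get(undefined) ?? []) visit(root, 0);
  return rows;
}
```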
17 changes: 17 additions & 0 deletions docs/apm/troubleshooting.asciidoc
@@ -14,6 +14,7 @@ Also, check out the https://discuss.elastic.co/c/apm[APM discussion forum].
* <<troubleshooting-too-many-transactions>>
* <<troubleshooting-unknown-route>>
* <<troubleshooting-fields-unsearchable>>
* <<service-map-rum-connections>>

[float]
[[no-apm-data-found]]
@@ -180,3 +181,19 @@ setup.template.append_fields:
type: object
dynamic: true
----

[float]
[[service-map-rum-connections]]
=== Service maps: no connection between client and server

If the service map is not showing an expected connection between the client and server,
it's likely because you haven't configured
{apm-agent-rum}/configuration.html#distributed-tracing-origins[`distributedTracingOrigins`].


This setting is necessary, for example, for cross-origin requests.
If you have a basic web application that provides data via an API on `localhost:4000`,
and serves HTML from `localhost:4001`, you'd need to set `distributedTracingOrigins: ['https://localhost:4000']`
to ensure the origin is monitored as a part of distributed tracing.
In other words, `distributedTracingOrigins` is consulted prior to the agent adding the
distributed tracing `traceparent` header to each request.
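
A minimal TypeScript sketch of the decision described above, assuming origins are compared as plain strings. Same-origin requests always get the header, and cross-origin requests get it only when their origin appears in `distributedTracingOrigins`. The function names are hypothetical and this is not the RUM agent's code. The `traceparent` helper follows the W3C Trace Context format (`version-traceid-parentid-flags`). An origin includes the scheme, so a configured entry only matches if it uses the scheme (`http` or `https`) the API is actually served on.

```typescript
// Decide whether a request should carry a traceparent header.
// Relative URLs are resolved against the page origin first.
function shouldPropagate(
  requestUrl: string,
  pageOrigin: string,
  distributedTracingOrigins: string[],
): boolean {
  const origin = new URL(requestUrl, pageOrigin).origin;
  return origin === pageOrigin || distributedTracingOrigins.includes(origin);
}

// Build a W3C Trace Context traceparent value (version 00).
function traceparent(traceId: string, parentId: string, sampled: boolean): string {
  if (!/^[0-9a-f]{32}$/.test(traceId)) throw new Error("traceId must be 32 lowercase hex chars");
  if (!/^[0-9a-f]{16}$/.test(parentId)) throw new Error("parentId must be 16 lowercase hex chars");
  return `00-${traceId}-${parentId}-${sampled ? "01" : "00"}`;
}
```

With the page served from `http://localhost:4001`, a call to `http://localhost:4000/api/products` is only traced when `http://localhost:4000` is listed; an `https://localhost:4000` entry would not match it.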
