[DOCS] Adds analysis step
lcawl committed Jun 1, 2021
1 parent 47c76c3 commit 74cc91d
Showing 6 changed files with 50 additions and 28 deletions.
78 changes: 50 additions & 28 deletions docs/en/stack/ml/anomaly-detection/geographic-anomalies.asciidoc
...
= Detecting anomalous locations in geographic data

If your data includes geographic fields, you can use {ml-features} to detect
anomalous behavior, such as a credit card transaction that occurs in an unusual
location or a web request that has an unusual source location.

[discrete]
[[geographic-anomalies-prereqs]]
== Prerequisites

To run this type of {anomaly-job}, you must have <<setup,{ml-features} set up>>.
You must also have time series data that contains spatial data types. In
particular, you must have:

* two comma-separated numbers of the form `latitude,longitude`,
* a {ref}/geo-point.html[`geo_point`] field,
...
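
For the `latitude,longitude` string format in the list above, a quick check before indexing can catch malformed or out-of-range values. The following is a minimal sketch that assumes you preprocess documents in Python. `parse_lat_lon` is a hypothetical helper and is not part of any Elastic client. It enforces the usual ranges of -90 to 90 for latitude and -180 to 180 for longitude:

```python
from __future__ import annotations


def parse_lat_lon(value: str) -> tuple[float, float]:
    """Parse a "latitude,longitude" string into a (lat, lon) tuple.

    Hypothetical helper for illustration. Raises ValueError if the string does
    not contain exactly two numbers or if either number is out of range.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude,longitude', got {value!r}")
    # float() raises ValueError for non-numeric text such as "abc".
    lat, lon = (float(part.strip()) for part in parts)
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon


# Latitude comes first, then longitude (approximate coordinates of New York).
print(parse_lat_lon("40.71,-74.01"))  # (40.71, -74.01)
```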
[[geographic-anomalies-visualize]]
== Explore your geographic data

To get the best results from {ml} analytics, you must understand your data. You
can use the **{data-viz}** in the **{ml-app}** app for this purpose. Search for
specific fields or field types, such as geo-point fields in the sample data sets.
You can see how many documents contain those fields within a specific time
period and sample size. You can also see the number of distinct values, a list
of example values, and a preview of the values on a map. For example:

[role="screenshot"]
image::images/weblogs-data-visualizer-geopoint.jpg[A screenshot of a geo_point field in {data-viz}]
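
The {data-viz} calculates these statistics for you. To show what they mean, the following minimal sketch computes a similar summary over a small in-memory sample: the number of documents that contain a field, the number of distinct values, and some example values. The `get_path` and `summarize_field` helpers and the toy documents are hypothetical. Only the `geo.coordinates` field name comes from the web logs sample data used on this page:

```python
from __future__ import annotations

from typing import Any


def get_path(doc: dict[str, Any], path: str) -> Any:
    """Follow a dotted field name such as "geo.coordinates" through nested dicts."""
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # the document does not contain this field
        current = current[key]
    return current


def summarize_field(docs: list[dict[str, Any]], path: str, max_examples: int = 3) -> dict[str, Any]:
    """Summarize one field over a sample of documents."""
    values = [v for v in (get_path(doc, path) for doc in docs) if v is not None]
    distinct: list[Any] = []
    for v in values:
        # A list (not a set) because geo values may be unhashable dicts.
        if v not in distinct:
            distinct.append(v)
    return {
        "sample_size": len(docs),
        "documents_with_field": len(values),
        "distinct_values": len(distinct),
        "examples": distinct[:max_examples],
    }


# Toy sample: three documents with coordinates (two distinct points), one without.
sample = [
    {"geo": {"coordinates": {"lat": 40.71, "lon": -74.01}}},
    {"geo": {"coordinates": {"lat": 34.05, "lon": -118.24}}},
    {"geo": {"coordinates": {"lat": 40.71, "lon": -74.01}}},
    {"geo": {}},
]
print(summarize_field(sample, "geo.coordinates"))
```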
...the advanced job wizard. Alternatively, use the
{ref}/ml-put-job.html[create {anomaly-jobs} API].

For example, create a job that analyzes the sample eCommerce orders data set to
find orders with unusual coordinates (`geoip.location` values) relative to the
past behavior of each customer (`user` ID):

[role="screenshot"]
image::images/ecommerce-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the eCommerce data in {kib}]
...
PUT _ml/anomaly_detectors/ecommerce-geo <1>
{
  "analysis_config" : {
    "bucket_span":"15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates by user",
        "function": "lat_long",
        "field_name": "geoip.location",
        "by_field_name": "user"
...
    ]
  },
  "data_description" : {
    "time_field": "order_date"
  }
}
...
POST _ml/anomaly_detectors/ecommerce-geo/_open <3>

POST _ml/datafeeds/datafeed-ecommerce-geo/_start <4>
{
  "end": "2021-06-19T24:00:00Z"
}
--------------------------------------------------
<1> Create the {anomaly-job}.
...
include::../shared/influencers.asciidoc[]
****
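
Setting `by_field_name` to `user` means that each customer's coordinates are analyzed relative to that customer's own history, not relative to all orders. The `lat_long` function does this with its own modeling, and the sketch below does not reproduce it. It is a deliberately simple stand-in that assumes great-circle (haversine) distance and a fixed `threshold_km` cut-off. An order is flagged when it is farther than the threshold from every earlier order by the same user. The helper names, the threshold, the second user, and the approximate city coordinates are illustrative assumptions:

```python
from __future__ import annotations

import math
from collections import defaultdict

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in km between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def flag_unusual_locations(orders, threshold_km: float = 1000.0):
    """Toy per-user check, not the lat_long algorithm.

    orders: iterable of (user, (lat, lon)) pairs in time order.
    Returns (user, point, distance_km) for each order that is farther than
    threshold_km from every earlier order by the same user.
    """
    history = defaultdict(list)  # user -> that user's earlier points only
    flagged = []
    for user, point in orders:
        past = history[user]
        if past:  # a user's first order has no baseline, so it is never flagged
            nearest_km = min(haversine_km(point, p) for p in past)
            if nearest_km > threshold_km:
                flagged.append((user, point, nearest_km))
        past.append(point)
    return flagged


LOS_ANGELES = (34.05, -118.24)  # approximate coordinates
NEW_YORK = (40.71, -74.01)

orders = [
    ("jackson", LOS_ANGELES),
    ("jackson", LOS_ANGELES),
    ("user_b", NEW_YORK),  # hypothetical second customer
    ("jackson", NEW_YORK),
]
for user, point, km in flag_unusual_locations(orders):
    print(f"{user}: {point} is about {km:.0f} km from earlier orders")
```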

Alternatively, create a job that analyzes the sample web logs data set to detect
events with unusual coordinates (`geo.coordinates` values) or unusually high
sums of transferred data (`bytes` values):

[role="screenshot"]
image::images/weblogs-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the web logs data in {kib}]
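
The web logs job uses a 15-minute `bucket_span` and applies the `high_sum` function to `bytes`. As a rough illustration only, the sketch below assumes timestamps in epoch milliseconds, as the job's `"time_format": "epoch_ms"` setting describes. It groups events into 15-minute buckets, sums `bytes` per bucket, and flags only buckets whose sum is above a simple mean-plus-`k`-standard-deviations cut-off. That one-sided check mirrors the "high" in `high_sum`, which looks for unusually high sums but not unusually low ones. The fixed cut-off, the helper names, and the toy events are assumptions. The approach is much simpler than the models that {anomaly-jobs} build:

```python
from __future__ import annotations

import statistics
from collections import defaultdict

BUCKET_SPAN_MS = 15 * 60 * 1000  # "bucket_span": "15m" expressed in milliseconds


def bucket_start(timestamp_ms: int) -> int:
    """Start time (epoch ms) of the 15-minute bucket that contains timestamp_ms."""
    return timestamp_ms - timestamp_ms % BUCKET_SPAN_MS


def sum_bytes_per_bucket(events: list[tuple[int, int]]) -> dict[int, int]:
    """Total the bytes of (timestamp_ms, bytes) events per bucket start time."""
    totals: dict[int, int] = defaultdict(int)
    for timestamp_ms, nbytes in events:
        totals[bucket_start(timestamp_ms)] += nbytes
    return dict(totals)


def high_buckets(totals: dict[int, int], k: float = 2.0) -> list[int]:
    """Bucket starts whose total exceeds mean + k * stdev (one-sided, like high_sum)."""
    values = list(totals.values())
    if len(values) < 2:
        return []  # statistics.stdev needs at least two data points
    cutoff = statistics.mean(values) + k * statistics.stdev(values)
    return sorted(start for start, total in totals.items() if total > cutoff)


base = 1_623_974_400_000  # hypothetical start time on a 15-minute boundary
events = []
for i in range(10):  # ten ordinary buckets, each with 2 x 500 bytes
    start = base + i * BUCKET_SPAN_MS
    events += [(start + 1_000, 500), (start + 60_000, 500)]
busy = base + 10 * BUCKET_SPAN_MS
events.append((busy + 5_000, 50_000))  # unusually high total in bucket 10
events.append((base + 11 * BUCKET_SPAN_MS + 2_000, 0))  # low total in bucket 11: not flagged

totals = sum_bytes_per_bucket(events)
print([(start - base) // BUCKET_SPAN_MS for start in high_buckets(totals)])  # [10]
```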
...
PUT _ml/anomaly_detectors/weblogs-geo <1>
{
  "analysis_config" : {
    "bucket_span":"15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates",
        "function": "lat_long",
        "field_name": "geo.coordinates"
      },
      {
        "detector_description": "Sum of bytes",
        "function": "high_sum",
        "field_name": "bytes"
      }
    ],
    "influencers": [
      "geo.src",
      "extension.keyword",
      "geo.dest"
    ]
  },
  "data_description" : {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}
...
POST _ml/anomaly_detectors/weblogs-geo/_open <3>

POST _ml/datafeeds/datafeed-weblogs-geo/_start <4>
{
  "end": "2021-07-15T22:00:00Z"
}
--------------------------------------------------
<1> Create the {anomaly-job}.
...
include::../shared/multi-metric-jobs.asciidoc[]
[[geographic-anomalies-results]]
== Analyze the results

After the {anomaly-jobs} have processed some data, you can view the results in
{kib}.

TIP: If you used APIs to create the jobs and {dfeeds}, you cannot see them
in {kib} until you follow the prompts to synchronize the necessary saved objects.

When you select a period that contains an anomaly in the swim lane results, you
can see a map of the typical and actual coordinates. For example, the `jackson`
user ID typically shops in Los Angeles, so their purchase in New York is
anomalous in the eCommerce sample data:
//TBD: Is it working as designed that the map only appears after you click the swim lane?

[role="screenshot"]
image::images/ecommerce-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the eCommerce data in Anomaly Explorer]

Likewise, the web logs sample data contains time periods with both unusually
high sums of transferred data and unusual geographical coordinates:

[role="screenshot"]
image::images/weblogs-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the web logs data in Anomaly Explorer]

You can use the top influencer values to further filter your results and
identify possible contributing factors or patterns of behavior.
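
If you retrieve influencer results through the APIs instead of {kib}, you can build a similar ranking yourself. The following is a minimal sketch. It assumes each result is a dictionary with `influencer_field_name`, `influencer_field_value`, and `influencer_score` keys, as in {anomaly-job} influencer results. It keeps the highest score seen for each value. The `top_influencers` helper and the toy scores are hypothetical. Only the field names `geo.src` and `extension.keyword` come from the web logs job:

```python
from __future__ import annotations

from collections import defaultdict


def top_influencers(results: list[dict], per_field: int = 3) -> dict[str, list[tuple[str, float]]]:
    """Group influencer results by field and keep each value's highest score.

    Returns {field_name: [(value, max_score), ...]} sorted by score, highest first.
    """
    best: dict[str, dict[str, float]] = defaultdict(dict)
    for r in results:
        field = r["influencer_field_name"]
        value = r["influencer_field_value"]
        score = r["influencer_score"]
        if score > best[field].get(value, float("-inf")):
            best[field][value] = score
    return {
        field: sorted(values.items(), key=lambda kv: kv[1], reverse=True)[:per_field]
        for field, values in best.items()
    }


# Toy results; the scores are made up for illustration.
toy_results = [
    {"influencer_field_name": "geo.src", "influencer_field_value": "US", "influencer_score": 12.5},
    {"influencer_field_name": "geo.src", "influencer_field_value": "IN", "influencer_score": 40.0},
    {"influencer_field_name": "geo.src", "influencer_field_value": "US", "influencer_score": 55.0},
    {"influencer_field_name": "extension.keyword", "influencer_field_value": "zip", "influencer_score": 30.0},
]
for field, ranked in top_influencers(toy_results).items():
    print(field, ranked)
```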

When you try this type of {anomaly-job} with your own data, it might take
some experimentation to find the best combination of bucket span, detectors, and
influencers to detect the type of behavior you're seeking.

For more information about {anomaly-detect} concepts, see <<ml-concepts>>.
For the full list of functions that you can use in {anomaly-jobs}, see
<<ml-functions>>. For more {anomaly-detect} examples, see <<anomaly-examples>>.

[discrete]
[[geographic-anomalies-next]]
== What's next
...