diff --git a/docs/en/stack/ml/anomaly-detection/geographic-anomalies.asciidoc b/docs/en/stack/ml/anomaly-detection/geographic-anomalies.asciidoc index 512ab221d..ca3b520bb 100644 --- a/docs/en/stack/ml/anomaly-detection/geographic-anomalies.asciidoc +++ b/docs/en/stack/ml/anomaly-detection/geographic-anomalies.asciidoc @@ -4,16 +4,16 @@ = Detecting anomalous locations in geographic data If your data includes geographic fields, you can use {ml-features} to detect -anomalous behavior, such as a credit card transaction that occurs in an -unusual location or a web request that has an unusual source location. +anomalous behavior, such as a credit card transaction that occurs in an unusual +location or a web request that has an unusual source location. [discrete] [geographic-anomalies-prereqs] == Prerequisites To run this type of {anomaly-job}, you must have <>. -You must also have data that contains spatial data types. In particular, you -must have: +You must also have time series data that contains spatial data types. In +particular, you must have: * two comma-separated numbers of the form `latitude,longitude`, * a {ref}/geo-point.html[`geo_point`] field, @@ -31,11 +31,12 @@ more information, see [geographic-anomalies-visualize] == Explore your geographic data -If you want to see more information about your geographic data, you can use the -**{data-viz}** in the **{ml-app}** app. You can search for specific fields or -field types then see how many documents contain those field within a specific -sample size and time period. You can also see the number of distinct values and -preview them on a map. For example: +To get the best results from {ml} analytics, you must understand your data. You +can use the **{data-viz}** in the **{ml-app}** app for this purpose. Search for +specific fields or field types, such as geo-point fields in the sample data sets. +You can see how many documents contain those fields within a specific time +period and sample size. 
You can also see the number of distinct values, a list +of example values, and preview them on a map. For example: [role="screenshot"] image::images/weblogs-data-visualizer-geopoint.jpg[A screenshot of a geo_point field in {data-viz}] @@ -60,8 +61,8 @@ the advanced job wizard. Alternatively, use the {ref}/ml-put-job.html[create {anomaly-jobs} API]. For example, create a job that analyzes the sample eCommerce orders data set to -find orders with unusual `geoip.location` values relative to the past behavior -of each `user` ID: +find orders with unusual coordinates (`geoip.location` values) relative to the +past behavior of each customer (`user` ID): [role="screenshot"] image::images/ecommerce-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the eCommerce data in {kib}] @@ -77,7 +78,7 @@ PUT _ml/anomaly_detectors/ecommerce-geo <1> "bucket_span":"15m", "detectors": [ { - "detector_description": "Unusual coordinates", + "detector_description": "Unusual coordinates by user", "function": "lat_long", "field_name": "geoip.location", "by_field_name": "user" @@ -90,8 +91,7 @@ PUT _ml/anomaly_detectors/ecommerce-geo <1> ] }, "data_description" : { - "time_field": "order_date", - "time_format": "epoch_ms" + "time_field": "order_date" } } @@ -116,7 +116,7 @@ POST _ml/anomaly_detectors/ecommerce-geo/_open <3> POST _ml/datafeeds/datafeed-ecommerce-geo/_start <4> { - "end": "2021-04-18T18:00:00Z" + "end": "2021-06-19T24:00:00Z" } -------------------------------------------------- <1> Create the {anomaly-job}. 
@@ -134,8 +134,8 @@ include::../shared/influencers.asciidoc[] **** Alternatively, create a job that analyzes the sample web logs data set to detect -unusual `geo.coordinates` values for each host or anomalous behavior in the sum -of the `bytes` field: +events with unusual coordinates (`geo.coordinates` values) or unusually high +sums of transferred data (`bytes` values): [role="screenshot"] image::images/weblogs-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the web logs data in {kib}] @@ -151,26 +151,24 @@ PUT _ml/anomaly_detectors/weblogs-geo <1> "bucket_span":"15m", "detectors": [ { - "detector_description": "Unusual coordinates partitioned by host", + "detector_description": "Unusual coordinates", "function": "lat_long", - "field_name": "geo.coordinates", - "partition_field_name": "host.keyword" + "field_name": "geo.coordinates" }, { - "detector_description": "Sum of bytes", - "function": "sum", + "function": "high_sum", "field_name": "bytes" } - ], + ], "influencers": [ - "geo.src", - "agent.keyword", - "geo.dest" + "geo.src", + "extension.keyword", + "geo.dest" ] }, "data_description" : { "time_field": "timestamp", - "time_format": "epoch_ms" + "time_format": "epoch_ms" } } @@ -195,7 +193,7 @@ POST _ml/anomaly_detectors/weblogs-geo/_open <3> POST _ml/datafeeds/datafeed-weblogs-geo/_start <4> { - "end": "2021-05-21T22:00:00Z" + "end": "2021-07-15T22:00:00Z" } -------------------------------------------------- <1> Create the {anomaly-job}. @@ -215,15 +213,39 @@ include::../shared/multi-metric-jobs.asciidoc[] [geographic-anomalies-results] == Analyze the results +After the {anomaly-jobs} have processed some data, you can view the results in +{kib}. + TIP: If you used APIs to create the jobs and {dfeeds}, you cannot see them in {kib} until you follow the prompts to synchronize the necessary saved objects. 
+When you select a period that contains an anomaly in the swim lane results, you +can see a map of the typical and actual coordinates. For example, the `jackson` +user ID typically shops in Los Angeles so their purchase in New York is +anomalous in the eCommerce sample data: +//TBD: Is it working as designed that the map only appears after you click the swim lane? + [role="screenshot"] image::images/ecommerce-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the eCommerce data in Anomaly Explorer] +Likewise, there are time periods in the web logs sample data where there are +both unusually high sums of data transferred and unusual geographical +coordinates: + [role="screenshot"] image::images/weblogs-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the web logs data in Anomaly Explorer] +You can use the top influencer values to further filter your results and +identify possible contributing factors or patterns of behavior. + +When you try this type of {anomaly-job} with your own data, it might take +some experimentation to find the best combination of buckets, detectors, and +influencers to detect the type of behavior you're seeking. + +For more information about {anomaly-detect} concepts, see <>. +For the full list of functions that you can use in {anomaly-jobs}, see +<>. For more {anomaly-detect} examples, see <>. 
+ [discrete] [geographic-anomalies-next] == What's next diff --git a/docs/en/stack/ml/anomaly-detection/images/ecommerce-advanced-wizard-geopoint.jpg b/docs/en/stack/ml/anomaly-detection/images/ecommerce-advanced-wizard-geopoint.jpg index cbad0ac6d..7ec721bda 100644 Binary files a/docs/en/stack/ml/anomaly-detection/images/ecommerce-advanced-wizard-geopoint.jpg and b/docs/en/stack/ml/anomaly-detection/images/ecommerce-advanced-wizard-geopoint.jpg differ diff --git a/docs/en/stack/ml/anomaly-detection/images/ecommerce-anomaly-explorer-geopoint.jpg b/docs/en/stack/ml/anomaly-detection/images/ecommerce-anomaly-explorer-geopoint.jpg index 9ac998b3a..08bdeecf9 100644 Binary files a/docs/en/stack/ml/anomaly-detection/images/ecommerce-anomaly-explorer-geopoint.jpg and b/docs/en/stack/ml/anomaly-detection/images/ecommerce-anomaly-explorer-geopoint.jpg differ diff --git a/docs/en/stack/ml/anomaly-detection/images/weblogs-advanced-wizard-geopoint.jpg b/docs/en/stack/ml/anomaly-detection/images/weblogs-advanced-wizard-geopoint.jpg index 840df0fb8..020d73501 100644 Binary files a/docs/en/stack/ml/anomaly-detection/images/weblogs-advanced-wizard-geopoint.jpg and b/docs/en/stack/ml/anomaly-detection/images/weblogs-advanced-wizard-geopoint.jpg differ diff --git a/docs/en/stack/ml/anomaly-detection/images/weblogs-anomaly-explorer-geopoint.jpg b/docs/en/stack/ml/anomaly-detection/images/weblogs-anomaly-explorer-geopoint.jpg index 51af797b8..a0765c3d2 100644 Binary files a/docs/en/stack/ml/anomaly-detection/images/weblogs-anomaly-explorer-geopoint.jpg and b/docs/en/stack/ml/anomaly-detection/images/weblogs-anomaly-explorer-geopoint.jpg differ diff --git a/docs/en/stack/ml/anomaly-detection/images/weblogs-data-visualizer-geopoint.jpg b/docs/en/stack/ml/anomaly-detection/images/weblogs-data-visualizer-geopoint.jpg index d78bb9ace..386cc9775 100644 Binary files a/docs/en/stack/ml/anomaly-detection/images/weblogs-data-visualizer-geopoint.jpg and 
b/docs/en/stack/ml/anomaly-detection/images/weblogs-data-visualizer-geopoint.jpg differ