[role="xpack"]
[testenv="platinum"]
[[geographic-anomalies]]
= Detecting anomalous locations in geographic data

If your data includes geographic fields, you can use {ml-features} to detect
anomalous behavior, such as a credit card transaction that occurs in an unusual
location or a web request that has an unusual source location.

[discrete]
[[geographic-anomalies-prereqs]]
== Prerequisites

To run this type of {anomaly-job}, you must have <<setup,{ml-features} set up>>.
You must also have time series data that contains spatial data types. In
particular, you must have:

* two comma-separated numbers of the form `latitude,longitude`,
* a {ref}/geo-point.html[`geo_point`] field,
* a {ref}/geo-shape.html[`geo_shape`] field that contains point values, or
* a {ref}/search-aggregations-metrics-geocentroid-aggregation.html[`geo_centroid`] aggregation

The latitude and longitude must be in the range -180 to 180 and represent a
point on the surface of the Earth.
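
For example, a minimal sketch of an index that stores coordinates in a
{ref}/geo-point.html[`geo_point`] field might look like the following. The
`my-transactions` index and its field names are illustrative only; they are not
part of the sample data sets used below:

[source,console]
--------------------------------------------------
PUT my-transactions <1>
{
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "transaction_coordinates": { "type": "geo_point" }
    }
  }
}

POST my-transactions/_doc <2>
{
  "@timestamp": "2021-06-10T14:00:00Z",
  "transaction_coordinates": "40.12,-71.34"
}
--------------------------------------------------
<1> A time series index with a `geo_point` field.
<2> `geo_point` fields accept coordinates expressed as a comma-separated
`latitude,longitude` string, among other formats.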

This example uses the sample eCommerce orders and sample web logs data sets. For
more information, see
{kibana-ref}/get-started.html#gs-get-data-into-kibana[Add the sample data].

[discrete]
[[geographic-anomalies-visualize]]
== Explore your geographic data

To get the best results from {ml} analytics, you must understand your data. You
can use the **{data-viz}** in the **{ml-app}** app for this purpose. Search for
specific fields or field types, such as geo-point fields in the sample data sets.
You can see how many documents contain those fields within a specific time
period and sample size. You can also see the number of distinct values and a
list of example values, and you can preview them on a map. For example:

[role="screenshot"]
image::images/weblogs-data-visualizer-geopoint.jpg[A screenshot of a geo_point field in {data-viz}]
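
If you prefer to work with the APIs, you can also confirm which fields use
spatial types with the {ref}/search-field-caps.html[field capabilities API]. The
following request is a minimal sketch that checks the geo-point fields in the
two sample data sets; the response lists the mapping type of each field:

[source,console]
--------------------------------------------------
GET kibana_sample_data_ecommerce,kibana_sample_data_logs/_field_caps?fields=geoip.location,geo.coordinates
--------------------------------------------------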

[discrete]
[[geographic-anomalies-jobs]]
== Create an {anomaly-job}

There are a few limitations to consider before you create this type of job:

. You cannot create forecasts for {anomaly-jobs} that contain geographic
functions.
. You cannot add <<ml-rules,custom rules with conditions>> to detectors that use
geographic functions.

If those limitations are acceptable, try creating an {anomaly-job} that uses
the <<ml-lat-long,`lat_long` function>> to analyze your own data or the sample
data sets.

To create an {anomaly-job} that uses the `lat_long` function, in {kib} you must
click **Create job** on the **{ml-cap} > {anomaly-detect-cap}** page and select
the advanced job wizard. Alternatively, use the
{ref}/ml-put-job.html[create {anomaly-jobs} API].

For example, create a job that analyzes the sample eCommerce orders data set to
find orders with unusual coordinates (`geoip.location` values) relative to the
past behavior of each customer (`user` ID):

[role="screenshot"]
image::images/ecommerce-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the eCommerce data in {kib}]

.API example
[%collapsible]
====
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/ecommerce-geo <1>
{
  "analysis_config" : {
    "bucket_span":"15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates by user",
        "function": "lat_long",
        "field_name": "geoip.location",
        "by_field_name": "user"
      }
    ],
    "influencers": [
      "geoip.country_iso_code",
      "day_of_week",
      "category.keyword"
    ]
  },
  "data_description" : {
    "time_field": "order_date"
  }
}

PUT _ml/datafeeds/datafeed-ecommerce-geo <2>
{
  "job_id": "ecommerce-geo",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "indices": [
    "kibana_sample_data_ecommerce"
  ]
}

POST _ml/anomaly_detectors/ecommerce-geo/_open <3>

POST _ml/datafeeds/datafeed-ecommerce-geo/_start <4>
{
  "end": "2021-06-19T24:00:00Z"
}
--------------------------------------------------
<1> Create the {anomaly-job}.
<2> Create the {dfeed}.
<3> Open the job.
<4> Start the {dfeed}. Since the sample data sets often contain timestamps that
are later than the current date, it is a good idea to specify the appropriate
end date for the {dfeed}.
====

Alternatively, create a job that analyzes the sample web logs data set to detect
events with unusual coordinates (`geo.coordinates` values) or unusually high
sums of transferred data (`bytes` values):

[role="screenshot"]
image::images/weblogs-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the web logs data in {kib}]

.API example
[%collapsible]
====
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/weblogs-geo <1>
{
  "analysis_config" : {
    "bucket_span":"15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates",
        "function": "lat_long",
        "field_name": "geo.coordinates"
      },
      {
        "function": "high_sum",
        "field_name": "bytes"
      }
    ],
    "influencers": [
      "geo.src",
      "extension.keyword",
      "geo.dest"
    ]
  },
  "data_description" : {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}

PUT _ml/datafeeds/datafeed-weblogs-geo <2>
{
  "job_id": "weblogs-geo",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "indices": [
    "kibana_sample_data_logs"
  ]
}

POST _ml/anomaly_detectors/weblogs-geo/_open <3>

POST _ml/datafeeds/datafeed-weblogs-geo/_start <4>
{
  "end": "2021-07-15T22:00:00Z"
}
--------------------------------------------------
<1> Create the {anomaly-job}.
<2> Create the {dfeed}.
<3> Open the job.
<4> Start the {dfeed}. Since the sample data sets often contain timestamps that
are later than the current date, it is a good idea to specify the appropriate
end date for the {dfeed}.
====

[discrete]
[[geographic-anomalies-results]]
== Analyze the results

After the {anomaly-jobs} have processed some data, you can view the results in
{kib}.
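
If you used the APIs to create the jobs, you can check that they have processed
data before you explore the results, for example with the
{ref}/ml-get-job-stats.html[get {anomaly-job} statistics API]. The following
request is a sketch that assumes the `ecommerce-geo` and `weblogs-geo` jobs from
the previous examples; the `data_counts.processed_record_count` value in the
response shows how many documents each job has analyzed:

[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/ecommerce-geo,weblogs-geo/_stats
--------------------------------------------------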

TIP: If you used APIs to create the jobs and {dfeeds}, you cannot see them
in {kib} until you follow the prompts to synchronize the necessary saved objects.

When you select a period that contains an anomaly in the swim lane results, you
can see a map of the typical and actual coordinates. For example, the `jackson`
user ID typically shops in Los Angeles, so their purchase in New York is
anomalous in the eCommerce sample data:

[role="screenshot"]
image::images/ecommerce-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the eCommerce data in Anomaly Explorer]
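
You can also retrieve these anomaly records with the
{ref}/ml-get-record.html[get records API]. The following request is a sketch
that assumes the `ecommerce-geo` job from the previous example; adjust the
`record_score` threshold to suit your data. For the `lat_long` detector, the
`typical` and `actual` values in each record correspond to the coordinates
shown on the map:

[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/ecommerce-geo/results/records
{
  "sort": "record_score",
  "desc": true,
  "record_score": 75
}
--------------------------------------------------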

Likewise, there are time periods in the web logs sample data where there are
both unusually high sums of data transferred and unusual geographical
coordinates:

[role="screenshot"]
image::images/weblogs-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the web logs data in Anomaly Explorer]

You can use the top influencer values to further filter your results and
identify possible contributing factors or patterns of behavior.
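
To retrieve the top influencers programmatically, you can use the
{ref}/ml-get-influencer.html[get influencers API]. The following request is a
sketch that assumes the `weblogs-geo` job from the previous example:

[source,console]
--------------------------------------------------
GET _ml/anomaly_detectors/weblogs-geo/results/influencers
{
  "sort": "influencer_score",
  "desc": true
}
--------------------------------------------------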

When you try this type of {anomaly-job} with your own data, it might take
some experimentation to find the best combination of buckets, detectors, and
influencers to detect the type of behavior you're seeking.

For more information about {anomaly-detect} concepts, see <<ml-concepts>>.
For the full list of functions that you can use in {anomaly-jobs}, see
<<ml-functions>>. For more {anomaly-detect} examples, see <<anomaly-examples>>.

[discrete]
[[geographic-anomalies-next]]
== What's next

* {kibana-ref}/maps.html[Learn more about **Maps**]
* <<ml-configuring-alerts,Generate alerts for your {anomaly-jobs}>>