
Commit 4266cee

lcawl authored and szabosteve committed

[DOCS] Add anomaly detection example for geographic data (elastic#1631)
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
1 parent 33d083c commit 4266cee

8 files changed: +245 -0 lines changed

docs/en/stack/ml/anomaly-detection/anomaly-examples.asciidoc

+1
@@ -19,6 +19,7 @@ The scenarios in this section describe some best practices for generating useful
 * <<ml-configuring-transform>>
 * <<ml-configuring-url>>
 * <<ml-delayed-data-detection>>
+* <<geographic-anomalies>>

 [discrete]
 [[anomaly-examples-blog-posts]]
@@ -0,0 +1,242 @@
[role="xpack"]
[testenv="platinum"]
[[geographic-anomalies]]
= Detecting anomalous locations in geographic data

If your data includes geographic fields, you can use {ml-features} to detect
anomalous behavior, such as a credit card transaction that occurs in an unusual
location or a web request that has an unusual source location.

[discrete]
[[geographic-anomalies-prereqs]]
== Prerequisites

To run this type of {anomaly-job}, you must have <<setup,{ml-features} set up>>.
You must also have time series data that contains spatial data types. In
particular, you must have:

* two comma-separated numbers of the form `latitude,longitude`,
* a {ref}/geo-point.html[`geo_point`] field,
* a {ref}/geo-shape.html[`geo_shape`] field that contains point values, or
* a {ref}/search-aggregations-metrics-geocentroid-aggregation.html[`geo_centroid`] aggregation

The latitude and longitude must be in the range -180 to 180 and represent a
point on the surface of the Earth.

This example uses the sample eCommerce orders and sample web logs data sets. For
more information, see
{kibana-ref}/get-started.html#gs-get-data-into-kibana[Add the sample data].

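If you assemble `latitude,longitude` strings yourself, a quick sanity check can catch out-of-range values before they reach a job. The following is a minimal sketch; `parse_lat_long` is a hypothetical helper, not part of any Elastic client:

```python
def parse_lat_long(value: str) -> tuple[float, float]:
    """Parse a 'latitude,longitude' string into a coordinate pair.

    Both numbers must be in the range -180 to 180; representing a point
    on the Earth's surface additionally implies a latitude in -90 to 90.
    """
    parts = value.split(",")
    if len(parts) != 2:
        raise ValueError(f"expected 'latitude,longitude', got: {value!r}")
    lat, lon = float(parts[0]), float(parts[1])
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude out of range: {lat}")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude out of range: {lon}")
    return lat, lon

print(parse_lat_long("41.12,-71.34"))  # → (41.12, -71.34)
```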

[discrete]
[[geographic-anomalies-visualize]]
== Explore your geographic data

To get the best results from {ml} analytics, you must understand your data. You
can use the **{data-viz}** in the **{ml-app}** app for this purpose. Search for
specific fields or field types, such as geo-point fields in the sample data sets.
You can see how many documents contain those fields within a specific time
period and sample size. You can also see the number of distinct values, a list
of example values, and preview them on a map. For example:

[role="screenshot"]
image::images/weblogs-data-visualizer-geopoint.jpg[A screenshot of a geo_point field in {data-viz}]

[discrete]
[[geographic-anomalies-jobs]]
== Create an {anomaly-job}

There are a few limitations to consider before you create this type of job:

. You cannot create forecasts for {anomaly-jobs} that contain geographic
functions.
. You cannot add <<ml-rules,custom rules with conditions>> to detectors that
use geographic functions.

If those limitations are acceptable, try creating an {anomaly-job} that uses
the <<ml-lat-long,`lat_long` function>> to analyze your own data or the sample
data sets.

To create an {anomaly-job} that uses the `lat_long` function, in {kib} you must
click **Create job** on the **{ml-cap} > {anomaly-detect-cap}** page and select
the advanced job wizard. Alternatively, use the
{ref}/ml-put-job.html[create {anomaly-jobs} API].

For example, create a job that analyzes the sample eCommerce orders data set to
find orders with unusual coordinates (`geoip.location` values) relative to the
past behavior of each customer (`user` ID):

[role="screenshot"]
image::images/ecommerce-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the eCommerce data in {kib}]

.API example
[%collapsible]
====
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/ecommerce-geo <1>
{
  "analysis_config" : {
    "bucket_span":"15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates by user",
        "function": "lat_long",
        "field_name": "geoip.location",
        "by_field_name": "user"
      }
    ],
    "influencers": [
      "geoip.country_iso_code",
      "day_of_week",
      "category.keyword"
    ]
  },
  "data_description" : {
    "time_field": "order_date"
  }
}

PUT _ml/datafeeds/datafeed-ecommerce-geo <2>
{
  "job_id": "ecommerce-geo",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "indices": [
    "kibana_sample_data_ecommerce"
  ]
}

POST _ml/anomaly_detectors/ecommerce-geo/_open <3>

POST _ml/datafeeds/datafeed-ecommerce-geo/_start <4>
{
  "end": "2021-06-19T24:00:00Z"
}
--------------------------------------------------
<1> Create the {anomaly-job}.
<2> Create the {dfeed}.
<3> Open the job.
<4> Start the {dfeed}. Since the sample data sets often contain timestamps that
are later than the current date, it is a good idea to specify the appropriate
end date for the {dfeed}.
====

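If you script job creation, a request body like the one above can be assembled programmatically before sending it with any HTTP client. The following sketch builds an equivalent body as a plain dict; `lat_long_job_config` is a hypothetical helper and its parameter names are illustrative assumptions, not part of the Elastic clients:

```python
def lat_long_job_config(field_name, by_field_name=None, influencers=(),
                        time_field="timestamp", bucket_span="15m"):
    """Build a request body for an anomaly job with one lat_long detector."""
    detector = {"function": "lat_long", "field_name": field_name}
    if by_field_name:
        detector["by_field_name"] = by_field_name
    return {
        "analysis_config": {
            "bucket_span": bucket_span,
            "detectors": [detector],
            "influencers": list(influencers),
        },
        "data_description": {"time_field": time_field},
    }

# Reproduce the eCommerce example body from the snippet above.
body = lat_long_job_config(
    "geoip.location",
    by_field_name="user",
    influencers=["geoip.country_iso_code", "day_of_week", "category.keyword"],
    time_field="order_date",
)
```

The resulting dict could then be sent as the body of a `PUT _ml/anomaly_detectors/<job_id>` request.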
Alternatively, create a job that analyzes the sample web logs data set to detect
events with unusual coordinates (`geo.coordinates` values) or unusually high
sums of transferred data (`bytes` values):

[role="screenshot"]
image::images/weblogs-advanced-wizard-geopoint.jpg[A screenshot of creating an {anomaly-job} using the web logs data in {kib}]

.API example
[%collapsible]
====
[source,console]
--------------------------------------------------
PUT _ml/anomaly_detectors/weblogs-geo <1>
{
  "analysis_config" : {
    "bucket_span":"15m",
    "detectors": [
      {
        "detector_description": "Unusual coordinates",
        "function": "lat_long",
        "field_name": "geo.coordinates"
      },
      {
        "function": "high_sum",
        "field_name": "bytes"
      }
    ],
    "influencers": [
      "geo.src",
      "extension.keyword",
      "geo.dest"
    ]
  },
  "data_description" : {
    "time_field": "timestamp",
    "time_format": "epoch_ms"
  }
}

PUT _ml/datafeeds/datafeed-weblogs-geo <2>
{
  "job_id": "weblogs-geo",
  "query": {
    "bool": {
      "must": [
        {
          "match_all": {}
        }
      ]
    }
  },
  "indices": [
    "kibana_sample_data_logs"
  ]
}

POST _ml/anomaly_detectors/weblogs-geo/_open <3>

POST _ml/datafeeds/datafeed-weblogs-geo/_start <4>
{
  "end": "2021-07-15T22:00:00Z"
}
--------------------------------------------------
<1> Create the {anomaly-job}.
<2> Create the {dfeed}.
<3> Open the job.
<4> Start the {dfeed}. Since the sample data sets often contain timestamps that
are later than the current date, it is a good idea to specify the appropriate
end date for the {dfeed}.
====

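Because the sample data sets are installed with timestamps around (and often after) the install date, a script that starts a {dfeed} may want to compute an explicit `end` value rather than hard-coding one. A minimal sketch, assuming an end a few weeks in the future covers the sample data; `datafeed_end` is an illustrative helper:

```python
from datetime import datetime, timedelta, timezone

def datafeed_end(days_ahead: int = 30) -> str:
    """Return an ISO 8601 UTC timestamp usable as a datafeed 'end' value."""
    end = datetime.now(timezone.utc) + timedelta(days=days_ahead)
    return end.strftime("%Y-%m-%dT%H:%M:%SZ")

print(datafeed_end())  # e.g. a timestamp 30 days from now
```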

[discrete]
[[geographic-anomalies-results]]
== Analyze the results

After the {anomaly-jobs} have processed some data, you can view the results in
{kib}.

TIP: If you used APIs to create the jobs and {dfeeds}, you cannot see them
in {kib} until you follow the prompts to synchronize the necessary saved objects.

When you select a period that contains an anomaly in the swim lane results, you
can see a map of the typical and actual coordinates. For example, the `jackson`
user ID typically shops in Los Angeles, so their purchase in New York is
anomalous in the eCommerce sample data:

[role="screenshot"]
image::images/ecommerce-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the eCommerce data in Anomaly Explorer]

Likewise, there are time periods in the web logs sample data where there are
both unusually high sums of data transferred and unusual geographical
coordinates:

[role="screenshot"]
image::images/weblogs-anomaly-explorer-geopoint.jpg[A screenshot of an anomalous event in the web logs data in Anomaly Explorer]

You can use the top influencer values to further filter your results and
identify possible contributing factors or patterns of behavior.
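The same filtering idea applies when you inspect results programmatically: keep the high-scoring records and look at their influencers. A minimal sketch, assuming record dicts shaped roughly like the response of the get records API; `top_records` and the sample data are illustrative, not output from a real job:

```python
def top_records(records, min_score=75.0):
    """Keep high-scoring anomaly records, sorted by descending record_score."""
    hits = [r for r in records if r.get("record_score", 0) >= min_score]
    return sorted(hits, key=lambda r: r["record_score"], reverse=True)

# Made-up records for illustration only.
records = [
    {"record_score": 91.2, "influencers": [{"influencer_field_name": "geo.src"}]},
    {"record_score": 12.4, "influencers": []},
]
print(top_records(records)[0]["record_score"])  # → 91.2
```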

When you try this type of {anomaly-job} with your own data, it might take
some experimentation to find the best combination of buckets, detectors, and
influencers to detect the type of behavior you're seeking.

For more information about {anomaly-detect} concepts, see <<ml-concepts>>.
For the full list of functions that you can use in {anomaly-jobs}, see
<<ml-functions>>. For more {anomaly-detect} examples, see <<anomaly-examples>>.

[discrete]
[[geographic-anomalies-next]]
== What's next

* {kibana-ref}/maps.html[Learn more about **Maps**]
* <<ml-configuring-alerts,Generate alerts for your {anomaly-jobs}>>

docs/en/stack/ml/anomaly-detection/index.asciidoc

+2
@@ -59,6 +59,8 @@ include::{es-repo-dir}/ml/anomaly-detection/ml-configuring-detector-custom-rules

 include::{es-repo-dir}/ml/anomaly-detection/ml-configuring-categories.asciidoc[leveloffset=+2]

+include::geographic-anomalies.asciidoc[leveloffset=+2]
+
 include::{es-repo-dir}/ml/anomaly-detection/ml-configuring-populations.asciidoc[leveloffset=+2]

 include::{es-repo-dir}/ml/anomaly-detection/ml-configuring-transform.asciidoc[leveloffset=+2]
