Commit ca90b14

[DOCS] Refresh regression screenshots with histograms (elastic#1267)
1 parent 05f8547 commit ca90b14

5 files changed: +53 -33 lines changed

docs/en/stack/ml/df-analytics/flightdata-regression.asciidoc (+53 -33)
@@ -89,10 +89,14 @@ results.
 
 To predict the number of minutes delayed for each flight:
 
+. Verify that your environment is set up properly to use {ml-features}. If the
+{stack} {security-features} are enabled, you need a user that has authority
+to create and manage {dfanalytics-jobs}. See <<setup>>.
+
 . Create a {dfanalytics-job}.
 +
 --
-You can use the wizard on the *Machine Learning* > *Data Frame Analaytics* tab
+You can use the wizard on the *{ml-app}* > *Data Frame Analytics* tab
 in {kib} or the {ref}/put-dfanalytics.html[create {dfanalytics-jobs}] API.
 
 [role="screenshot"]
@@ -195,10 +199,8 @@ POST _ml/data_frame/analytics/model-flight-delays/_start
 [role="screenshot"]
 image::images/flights-regression-details.png["Statistics for a {dfanalytics-job} in {kib}"]
 
-The job has four main phases (reindexing, loading data, analyzing, and writing
-results). When all the phases have completed, the job stops and the results are
-ready to view and evaluate. Consult <<ml-dfa-phases>> to learn more about the
-different phases.
+When the job stops, the results are ready to view and evaluate. To learn more
+about the job phases, see <<ml-dfa-phases>>.
 
 
 .API example
@@ -230,46 +232,63 @@ The API call returns the following response:
 "progress_percent" : 100
 },
 {
-"phase" : "analyzing",
+"phase" : "feature_selection",
+"progress_percent" : 100
+},
+{
+"phase" : "coarse_parameter_search",
+"progress_percent" : 100
+},
+{
+"phase" : "fine_tuning_parameters",
+"progress_percent" : 100
+},
+{
+"phase" : "final_training",
 "progress_percent" : 100
 },
 {
 "phase" : "writing_results",
 "progress_percent" : 100
+},
+{
+"phase" : "inference",
+"progress_percent" : 100
 }
 ],
 "data_counts" : {
-"training_docs_count" : 11759,
-"test_docs_count" : 1300,
+"training_docs_count" : 11210,
+"test_docs_count" : 1246,
 "skipped_docs_count" : 0
 },
 "memory_usage" : {
-"timestamp" : 1587590328000,
-"peak_usage_bytes" : 2424894
+"timestamp" : 1596237978801,
+"peak_usage_bytes" : 2204548,
+"status" : "ok"
 },
 "analysis_stats" : {
 "regression_stats" : {
-"timestamp" : 1587590328000,
+"timestamp" : 1596237978801,
 "iteration" : 18,
 "hyperparameters" : {
-"alpha" : 13913.440706141744,
-"downsample_factor" : 0.8296546656515433,
-"eta" : 0.04216457735949444,
-"eta_growth_rate_per_tree" : 1.0264998162827081,
+"alpha" : 168825.7788898173,
+"downsample_factor" : 0.9033277769849748,
+"eta" : 0.04884738703731517,
+"eta_growth_rate_per_tree" : 1.0299887790757198,
 "feature_bag_fraction" : 0.5504020748926737,
-"gamma" : 722.9233202705029,
-"lambda" : 1.0278806525490607,
+"gamma" : 1454.4275926774008,
+"lambda" : 2.1114872989215074,
 "max_attempts_to_add_tree" : 3,
 "max_optimization_rounds_per_hyperparameter" : 2,
-"max_trees" : 483,
+"max_trees" : 427,
 "num_folds" : 4,
 "num_splits_per_feature" : 75,
-"soft_tree_depth_limit" : 3.105960810136212,
+"soft_tree_depth_limit" : 5.8014874129785,
 "soft_tree_depth_tolerance" : 0.13448633124842999
 },
 "timing_stats" : {
-"elapsed_time" : 168362,
-"iteration_time" : 9691
+"elapsed_time" : 124851,
+"iteration_time" : 15081
 },
 "validation_loss" : {
 "loss_type" : "mse",
@@ -302,7 +321,8 @@ predict with the {reganalysis}. It also shows a column for the prediction values
 (`ml.FlightDelayMin_prediction`) and a column that indicates whether the
 document was used in the training set (`ml.is_training`). You can filter the
 table to show only testing or training data and you can select which fields are
-shown in the table.
+shown in the table. You can also enable histogram charts to get a better
+understanding of the distribution of values in your data.
 
 If you do not use {kib}, you can see the same information by using the standard
 {es} search command to view the results in the destination index.
@@ -321,13 +341,13 @@ The snippet below shows a part of a document with the annotated results:
 [source,console-result]
 ----
 ...
-"DestRegion" : "UK",
-"OriginAirportID" : "LHR",
+"DestCountry" : "GB",
+"DestRegion" : "GB-ENG",
+"OriginAirportID" : "CAN",
 "DestCityName" : "London",
-"FlightDelayMin" : 66,
 "ml" : {
-"FlightDelayMin_prediction" : 62.527,
-"is_training" : false
+"FlightDelayMin_prediction" : 10.039840698242188,
+"is_training" : true
 }
 ...
 ----
@@ -376,7 +396,7 @@ POST _ml/data_frame/_evaluate
 "predicted_field": "ml.FlightDelayMin_prediction", <4>
 "metrics": {
 "r_squared": {},
-"mean_squared_error": {}
+"mse": {}
 }
 }
 }
@@ -395,11 +415,11 @@ The API returns a response like this:
 ----
 {
 "regression" : {
-"mean_squared_error" : {
-"error" : 3006.517622042659
+"mse" : {
+"value" : 3125.3396943667544
 },
 "r_squared" : {
-"value" : 0.6794200914263231
+"value" : 0.6659988649180306
 }
 }
 }
@@ -423,7 +443,7 @@ POST _ml/data_frame/_evaluate
 "predicted_field": "ml.FlightDelayMin_prediction",
 "metrics": {
 "r_squared": {},
-"mean_squared_error": {}
+"mse": {}
 }
 }
 }
@@ -436,4 +456,4 @@ POST _ml/data_frame/_evaluate
 
 If you don't want to keep the {dfanalytics-job}, you can delete it. For example,
 use {kib} or the {ref}/delete-dfanalytics.html[delete {dfanalytics-job} API].
-When you delete {dfanalytics-jobs}, the destination indices remain intact.
+When you delete {dfanalytics-jobs}, the destination indices remain intact.
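
Note on the renamed metric: this change switches the evaluation request and response from `mean_squared_error` to `mse` and keeps `r_squared`. As a rough illustration of what those two values measure, here is a minimal Python sketch that assumes the standard textbook definitions. The function names and sample numbers are hypothetical and are not taken from the {es} implementation or from the flight data set.

```python
def mse(actual, predicted):
    """Mean squared error: the average of the squared prediction errors."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - (residual SS / total SS)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("r_squared is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical delay values in minutes, for illustration only.
sample_actual = [0, 10, 20, 30]
sample_predicted = [1, 9, 22, 28]

sample_mse = mse(sample_actual, sample_predicted)
sample_r2 = r_squared(sample_actual, sample_predicted)
```

Under these definitions, a lower `mse` and an `r_squared` closer to 1 both indicate predictions that track the actual values more closely.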