@@ -89,10 +89,14 @@ results.

To predict the number of minutes delayed for each flight:

+ . Verify that your environment is set up properly to use {ml-features}. If the
+ {stack} {security-features} are enabled, you need a user that has authority
+ to create and manage {dfanalytics-jobs}. See <<setup>>.
+
. Create a {dfanalytics-job}.
+
--
- You can use the wizard on the *Machine Learning* > *Data Frame Analaytics* tab
+ You can use the wizard on the *{ml-app}* > *Data Frame Analytics* tab
in {kib} or the {ref}/put-dfanalytics.html[create {dfanalytics-jobs}] API.

[role="screenshot"]
@@ -195,10 +199,8 @@ POST _ml/data_frame/analytics/model-flight-delays/_start
[role="screenshot"]
image::images/flights-regression-details.png["Statistics for a {dfanalytics-job} in {kib}"]

- The job has four main phases (reindexing, loading data, analyzing, and writing
- results). When all the phases have completed, the job stops and the results are
- ready to view and evaluate. Consult <<ml-dfa-phases>> to learn more about the
- different phases.
+ When the job stops, the results are ready to view and evaluate. To learn more
+ about the job phases, see <<ml-dfa-phases>>.


.API example
@@ -230,46 +232,63 @@ The API call returns the following response:
"progress_percent" : 100
},
{
- "phase" : "analyzing",
+ "phase" : "feature_selection",
+ "progress_percent" : 100
+ },
+ {
+ "phase" : "coarse_parameter_search",
+ "progress_percent" : 100
+ },
+ {
+ "phase" : "fine_tuning_parameters",
+ "progress_percent" : 100
+ },
+ {
+ "phase" : "final_training",
"progress_percent" : 100
},
{
"phase" : "writing_results",
"progress_percent" : 100
+ },
+ {
+ "phase" : "inference",
+ "progress_percent" : 100
}
],
"data_counts" : {
- "training_docs_count" : 11759,
- "test_docs_count" : 1300,
+ "training_docs_count" : 11210,
+ "test_docs_count" : 1246,
"skipped_docs_count" : 0
},
"memory_usage" : {
- "timestamp" : 1587590328000,
- "peak_usage_bytes" : 2424894
+ "timestamp" : 1596237978801,
+ "peak_usage_bytes" : 2204548,
+ "status" : "ok"
},
"analysis_stats" : {
"regression_stats" : {
- "timestamp" : 1587590328000,
+ "timestamp" : 1596237978801,
"iteration" : 18,
"hyperparameters" : {
- "alpha" : 13913.440706141744,
- "downsample_factor" : 0.8296546656515433,
- "eta" : 0.04216457735949444,
- "eta_growth_rate_per_tree" : 1.0264998162827081,
+ "alpha" : 168825.7788898173,
+ "downsample_factor" : 0.9033277769849748,
+ "eta" : 0.04884738703731517,
+ "eta_growth_rate_per_tree" : 1.0299887790757198,
"feature_bag_fraction" : 0.5504020748926737,
- "gamma" : 722.9233202705029,
- "lambda" : 1.0278806525490607,
+ "gamma" : 1454.4275926774008,
+ "lambda" : 2.1114872989215074,
"max_attempts_to_add_tree" : 3,
"max_optimization_rounds_per_hyperparameter" : 2,
- "max_trees" : 483,
+ "max_trees" : 427,
"num_folds" : 4,
"num_splits_per_feature" : 75,
- "soft_tree_depth_limit" : 3.105960810136212,
+ "soft_tree_depth_limit" : 5.8014874129785,
"soft_tree_depth_tolerance" : 0.13448633124842999
},
"timing_stats" : {
- "elapsed_time" : 168362,
- "iteration_time" : 9691
+ "elapsed_time" : 124851,
+ "iteration_time" : 15081
},
"validation_loss" : {
"loss_type" : "mse",
@@ -302,7 +321,8 @@ predict with the {reganalysis}. It also shows a column for the prediction values
(`ml.FlightDelayMin_prediction`) and a column that indicates whether the
document was used in the training set (`ml.is_training`). You can filter the
table to show only testing or training data and you can select which fields are
- shown in the table.
+ shown in the table. You can also enable histogram charts to get a better
+ understanding of the distribution of values in your data.

If you do not use {kib}, you can see the same information by using the standard
{es} search command to view the results in the destination index.
@@ -321,13 +341,13 @@ The snippet below shows a part of a document with the annotated results:
[source,console-result]
----
...
- "DestRegion" : "UK",
- "OriginAirportID" : "LHR",
+ "DestCountry" : "GB",
+ "DestRegion" : "GB-ENG",
+ "OriginAirportID" : "CAN",
"DestCityName" : "London",
- "FlightDelayMin" : 66,
"ml" : {
- "FlightDelayMin_prediction" : 62.527,
- "is_training" : false
+ "FlightDelayMin_prediction" : 10.039840698242188,
+ "is_training" : true
}
...
----
@@ -376,7 +396,7 @@ POST _ml/data_frame/_evaluate
"predicted_field": "ml.FlightDelayMin_prediction", <4>
"metrics": {
"r_squared": {},
- "mean_squared_error": {}
+ "mse": {}
}
}
}
@@ -395,11 +415,11 @@ The API returns a response like this:
----
{
"regression" : {
- "mean_squared_error" : {
- "error" : 3006.517622042659
+ "mse" : {
+ "value" : 3125.3396943667544
},
"r_squared" : {
- "value" : 0.6794200914263231
+ "value" : 0.6659988649180306
}
}
}
@@ -423,7 +443,7 @@ POST _ml/data_frame/_evaluate
"predicted_field": "ml.FlightDelayMin_prediction",
"metrics": {
"r_squared": {},
- "mean_squared_error": {}
+ "mse": {}
}
}
}
@@ -436,4 +456,4 @@ POST _ml/data_frame/_evaluate

If you don't want to keep the {dfanalytics-job}, you can delete it. For example,
use {kib} or the {ref}/delete-dfanalytics.html[delete {dfanalytics-job} API].
- When you delete {dfanalytics-jobs}, the destination indices remain intact.
+ When you delete {dfanalytics-jobs}, the destination indices remain intact.
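
Reviewer note: the evaluation requests in this change ask for the `mse` and `r_squared` metrics. As a rough illustration of what those two numbers measure, the sketch below computes them with the textbook definitions. The function names and the toy values are hypothetical and are not taken from the flight dataset; the diff does not show how the evaluate API computes its results, so treat this as an aid to reading the response, not as the API's implementation.

```python
from typing import Sequence


def mean_squared_error(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Average of the squared differences between actual and predicted values."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def r_squared(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """1 - (residual sum of squares / total sum of squares)."""
    if len(actual) != len(predicted) or len(actual) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1 - ss_res / ss_tot


# Hypothetical toy values (minutes of delay), for illustration only.
toy_actual = [0.0, 10.0, 20.0, 30.0]
toy_predicted = [5.0, 10.0, 15.0, 30.0]

toy_mse = mean_squared_error(toy_actual, toy_predicted)
toy_r2 = r_squared(toy_actual, toy_predicted)
print(f"mse={toy_mse}, r_squared={toy_r2}")
```

A lower `mse` means the predictions sit closer to the actual values, and an `r_squared` nearer to 1 means the model explains more of the variance in the actual values.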