Accuracy metric fails when the dependent variable is of type 'boolean' or 'integer' #49796

przemekwitek · 2019-12-03T14:55:24Z

I've run a classification analysis on a synthetic dataset that tries to detect a circle on a plane.
I've indexed docs containing points on a 2D plane together with a dependent variable ("is the point inside the unit circle"). The analysis finished correctly, but then I tried to evaluate the results using the following request:

{
  "index": "circle-ml",
  "query": {
    "term": {
      "ml.is_training": false
    }
  },
  "evaluation": {
    "classification": {
      "actual_field": "in_unit_circle",
      "predicted_field": "ml.in_unit_circle_prediction.keyword",
      "metrics": {
        "accuracy": {},
        "multiclass_confusion_matrix": {}
      }
    }
  }
}

The evaluation reported an accuracy of 0 because it could not find any point for which the dependent variable was equal to the prediction.
The problem is that the dependent variable is a boolean and the prediction is a string, and the Painless script is:

doc[''{0}''].value == doc[''{1}''].value
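To illustrate why a strict equality check like this yields an accuracy of 0, here is a minimal Python sketch (an analogy, not the actual Painless evaluation code). The sample documents, the shortened field name `prediction`, and the helper `strict_accuracy` are hypothetical; the only thing taken from the report is that the actual value is a boolean while the prediction is a string.

```python
# Hypothetical sample documents: the dependent variable is a boolean,
# while the prediction was emitted as a string.
docs = [
    {"in_unit_circle": True, "prediction": "true"},
    {"in_unit_circle": False, "prediction": "false"},
    {"in_unit_circle": True, "prediction": "false"},
]


def strict_accuracy(docs, actual_field, predicted_field):
    """Fraction of docs whose actual value strictly equals the predicted value."""
    if not docs:
        return 0.0
    # A boolean never compares equal to a string, so no doc is counted as a match.
    matches = sum(1 for d in docs if d[actual_field] == d[predicted_field])
    return matches / len(docs)


accuracy = strict_accuracy(docs, "in_unit_circle", "prediction")
print(accuracy)
```

In this sketch the first two documents are predicted correctly, yet none of them is counted, because the comparison is between values of different types. An integer dependent variable compared against a string such as `"1"` would fail in the same way.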

Two solutions I see here are:

1. (simpler) relax the equality check so that it treats boolean true and string "true" as equal
2. (more involved) make the C++ code report the prediction using the type of the dependent variable. The type of the dependent variable can be passed down from Java.
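The simpler option could look roughly like the following Python sketch, which compares canonical string forms instead of raw values. The helpers `canonical` and `relaxed_equals` are hypothetical names introduced for illustration and are not part of Elasticsearch or the merged fix; the pull requests referenced in this issue suggest the type-aware (more involved) route was the one taken.

```python
def canonical(value):
    """Map a value to a canonical string form (hypothetical helper)."""
    # Booleans are handled first so that True becomes "true", not "True".
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def relaxed_equals(actual, predicted):
    """Treat values as equal when their canonical string forms match."""
    return canonical(actual) == canonical(predicted)


# Boolean true vs. string "true", and integer 1 vs. string "1".
print(relaxed_equals(True, "true"))
print(relaxed_equals(1, "1"))
print(relaxed_equals(False, "true"))
```

One trade-off of this approach is that it hides the type mismatch rather than removing it, so other consumers of the prediction field would still see a string.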

Also, the same scenario should be reproduced for integer types.

elasticmachine · 2019-12-03T14:55:28Z

Pinging @elastic/ml-core (:ml)

przemekwitek · 2019-12-09T13:47:00Z

The fix has been merged to master and 7.x branches.
I've just verified manually that it worked.

przemekwitek added >non-issue :ml Machine learning v8.0.0 v7.6.0 labels Dec 3, 2019

przemekwitek self-assigned this Dec 3, 2019

przemekwitek changed the title ~~Accuracy metric fails when the dependent variable is of type 'boolean'~~ Accuracy metric fails when the dependent variable is of type 'boolean' or 'integer' Dec 4, 2019

This was referenced Dec 5, 2019

Emit predicted category using an appropriate JSON type. elastic/ml-cpp#877

Merged

Pass prediction_field_type to C++ analytics process #49861

Merged

przemekwitek closed this as completed Dec 9, 2019

This was referenced Feb 3, 2020

[meta] 7.6 release elastic/elasticsearch-net#4340

Closed

[meta] 7.6 release elastic/elasticsearch-net#4341

Closed

jakelandis added v8.0.0-alpha1 and removed v8.0.0 labels Jul 26, 2021

ChrisHegarty unassigned przemekwitek Oct 29, 2024
