Accuracy metric fails when the dependent variable is of type 'boolean' or 'integer' #49796

Closed
przemekwitek opened this issue Dec 3, 2019 · 2 comments

@przemekwitek
Contributor

przemekwitek commented Dec 3, 2019

I've run a classification analysis on a synthetic dataset that tries to detect a circle on a plane.
I've indexed docs containing points on a 2D plane together with a dependent variable ("is the point inside a unit circle"). The analysis finished correctly, but then I tried to evaluate the results using the following request:

{
  "index": "circle-ml",
  "query": {
    "term": {
      "ml.is_training": false
    }
  },
  "evaluation": {
    "classification": {
      "actual_field": "in_unit_circle",
      "predicted_field": "ml.in_unit_circle_prediction.keyword",
      "metrics": {
        "accuracy": {},
        "multiclass_confusion_matrix": {}
      }
    }
  }
}

The evaluation reported an accuracy of 0 because it could not find any point for which dependent_variable was equal to the prediction.
The problem is that the dependent variable is a boolean while the prediction is a string, and the Painless script that compares them is:

doc[''{0}''].value == doc[''{1}''].value
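
To illustrate why a strict comparison like this yields an accuracy of 0, here is a minimal Python stand-in (not Painless). The documents and the `strict_accuracy` helper are hypothetical, and it assumes the predicted class is stored as the lowercase strings "true" and "false", as the example in this issue suggests.

```python
# Hypothetical documents: the actual field is a boolean, while the
# prediction field holds the class name as a string.
docs = [
    {"actual": True, "predicted": "true"},
    {"actual": False, "predicted": "false"},
    {"actual": True, "predicted": "false"},
]


def strict_accuracy(docs):
    """Fraction of docs whose actual value equals the prediction exactly."""
    matches = sum(1 for d in docs if d["actual"] == d["predicted"])
    return matches / len(docs)


# A boolean never compares equal to a string, so even the two
# correct predictions above are counted as mismatches.
print(strict_accuracy(docs))  # 0.0
```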

Two solutions I see here are:

  • (simpler) relax the equality check so that it treats the boolean true and the string "true" as equal
  • (more involved) make the C++ code report the prediction using the type of the dependent variable. That type can be passed down from Java.

Also, the same scenario should be reproduced for integer types.
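
The simpler option, extended to integers, could look roughly like the sketch below. It is a Python stand-in for illustration only, not the Painless or Java change that was eventually merged. The names `canonical`, `relaxed_equal` and `relaxed_accuracy` are hypothetical, and the lowercase "true"/"false" spelling of boolean classes is an assumption taken from the example above.

```python
def canonical(value):
    """Map a value to the string form assumed to be used for class names."""
    # Check bool before anything else: in Python, bool is a subclass of int.
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def relaxed_equal(actual, predicted):
    """Compare two values by their canonical string forms."""
    return canonical(actual) == canonical(predicted)


def relaxed_accuracy(pairs):
    """Fraction of (actual, predicted) pairs that match under relaxed equality."""
    pairs = list(pairs)
    if not pairs:
        return 0.0
    return sum(relaxed_equal(a, p) for a, p in pairs) / len(pairs)


# Boolean and integer dependent variables compared with string predictions.
print(relaxed_accuracy([(True, "true"), (False, "true"), (3, "3"), (4, "5")]))
```

Comparing string forms keeps the check in one place, at the cost of treating, for example, the integer 1 and the string "1" as the same class, which is the intended relaxation here.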

@elasticmachine
Collaborator

Pinging @elastic/ml-core (:ml)

przemekwitek changed the title from "Accuracy metric fails when the dependent variable is of type 'boolean'" to "Accuracy metric fails when the dependent variable is of type 'boolean' or 'integer'" on Dec 4, 2019
@przemekwitek
Contributor Author

The fix has been merged to master and 7.x branches.
I've just verified manually that it worked.
