[ML] DF Analytics results: Support for feature_importance #55805

Closed
walterra opened this issue Jan 24, 2020 · 2 comments · Fixed by #61761
Assignees
Labels
bug Fixes for quality problems that affect the customer experience Feature:Data Frame Analytics ML data frame analytics features :ml v7.8.0

Comments

@walterra
Contributor

walterra commented Jan 24, 2020

feature_importance is missing from regression and classification job results. It is only written when the parameter num_top_feature_importance_values is set in the job configuration. To support feature_importance in the UI we need to do two things:

  • In the job creation flyout, make num_top_feature_importance_values available as an optional input field so it gets added to the configuration. At the moment this can only be done via the advanced editor and manually editing the JSON.
  • For the results pages, feature_importance fields need to be made available in the dropdown for selecting table columns, similar to how feature_influence is made available for outlier detection jobs. On top of that, we can use it for the same color coding we do for outlier detection.

Documentation about feature_importance and num_top_feature_importance_values can be found here: https://www.elastic.co/guide/en/elasticsearch/reference/master/put-dfanalytics.html
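
As a rough illustration of what the flyout would need to produce, here is a minimal Python sketch of a regression job configuration that includes the optional parameter. The helper name build_regression_job_config and the index and field names ("houses", "houses-predictions", "price") are hypothetical; only the source/dest/analysis layout and num_top_feature_importance_values come from the linked documentation.

```python
import json


def build_regression_job_config(source_index, dest_index, dependent_variable,
                                num_top_feature_importance_values=None):
    """Build the request body for a regression data frame analytics job.

    Hypothetical helper: the parameter is only added when the user
    supplied a value, mirroring an optional input field in the flyout.
    """
    analysis = {"dependent_variable": dependent_variable}
    if num_top_feature_importance_values is not None:
        analysis["num_top_feature_importance_values"] = num_top_feature_importance_values
    return {
        "source": {"index": source_index},
        "dest": {"index": dest_index},
        "analysis": {"regression": analysis},
    }


# Index and field names below are made up for illustration.
config = build_regression_job_config("houses", "houses-predictions", "price", 2)
print(json.dumps(config, indent=2))
```

Leaving the field empty omits the parameter from the body entirely, which matches today's behavior of jobs created without the advanced editor.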

@walterra walterra added :ml Feature:Data Frame Analytics ML data frame analytics features labels Jan 24, 2020
@elasticmachine
Contributor

Pinging @elastic/ml-ui (:ml)

@Winterflower

Hi @walterra - I'm exploring feature_importance interpretation at the moment with some datasets and I wanted to bring up a potential issue that might affect the presentation of this data in the DF Analytics results UI. This feature behaves differently from the feature influence values we had in outlier detection, which is probably worth keeping in mind in the UI design.

Suppose we set num_top_feature_importance_values to 2. This means that the analytics process will output values for at most the top two feature importance fields, but this is calculated per document. There is no guarantee that every data point/document will have the same two fields as its two most important ones. Hence, you can end up in a situation where you have, say, 10 fields in the results index with the ml.feature_importance prefix, but not every doc has a value in each of the 10. So you might end up with a table that looks a bit like this:

[Screenshot: Screen Shot 2020-02-26 at 2 13 34 PM]
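
To make the sparsity concrete, here is a small Python sketch of how a results table could be assembled from such documents. It assumes each document is a flat dict whose keys carry the ml.feature_importance prefix described above; the feature names, values, and helper names are made up for illustration and are not taken from the actual results index or the Kibana code.

```python
PREFIX = "ml.feature_importance."


def feature_importance_columns(docs):
    """Collect the union of feature importance fields across all docs."""
    columns = set()
    for doc in docs:
        columns.update(key for key in doc if key.startswith(PREFIX))
    return sorted(columns)


def to_table_rows(docs):
    """Return (columns, rows); a cell is None where a doc lacks the field."""
    columns = feature_importance_columns(docs)
    rows = [[doc.get(column) for column in columns] for doc in docs]
    return columns, rows


# Hypothetical results with num_top_feature_importance_values set to 2:
# each doc reports its own top two features, so the field sets differ.
docs = [
    {"ml.feature_importance.a": 0.4, "ml.feature_importance.b": -0.1},
    {"ml.feature_importance.b": 0.2, "ml.feature_importance.c": 0.3},
]
columns, rows = to_table_rows(docs)
for row in rows:
    print(row)
```

With only two documents the table already has three columns and two empty cells, so the UI would need a deliberate way to render missing values.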

@walterra walterra added the bug Fixes for quality problems that affect the customer experience label Mar 26, 2020
@walterra walterra added v7.8.0 and removed v7.7.0 labels Mar 30, 2020