Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[ML] Add decision path charts to exploration results table #73561

Merged
merged 50 commits into from
Sep 9, 2020

Conversation

qn895
Copy link
Member

@qn895 qn895 commented Jul 28, 2020

Summary

This PR adds the decision path popover to the exploration table. Popover includes both the chart view and the JSON view.

Clicking on the ml.feature_importance cell will show both tabs.
Screen Shot 2020-08-20 at 3 45 22 PM
Screen Shot 2020-08-20 at 4 29 55 PM

  • If there's only 1 feature, no need to show the chart.
  • For regression jobs:
    • Calculate the baseline, and add the feature importance values in order of ascending absolute importance to arrive at the final predicted value.

Screen Shot 2020-08-20 at 4 01 53 PM

  • Show other which is the residual feature importance for when number of features included is less than number of feature importance set

Screen Shot 2020-08-20 at 4 01 20 PM

  • For binary classification jobs:

    • Show dropdown for user to pick what class name/path to visualize. By default pick the first option since option has highest probability and is the ml.prediction value.
      2020-08-20 at 4 02 PM
  • For multiclass classification jobs: (pending on backend)
    multiclass_decision_path

Checklist

Delete any items that are not applicable to this PR.

@walterra
Copy link
Contributor

I love where this is going! Some feedback:

  • Can we style the tabs to have smaller font and padding (for example same as the data grid header)?
  • Ideally we should dynamically adjust the popover height so the chart is viewable without scrolling.
  • Suggest to make the padding between features significantly smaller.

Here's a mockup with the suggestions: (For this one I set the multiplier for the row height like this: height: mappedFeatureImportance.length * 22)

image

@walterra
Copy link
Contributor

The SHAP library calls this chart type a "Decision Plot", should we also go with this naming? In this case I'd rename the tab names in the popover to Decision Plot | JSON instead of just Chart | JSON. (Regardless of the name we decide to use, IMHO we should be more specific in the tab title than just saying Chart)

@qn895
Copy link
Member Author

qn895 commented Jul 29, 2020

The SHAP library calls this chart type a "Decision Plot", should we also go with this naming? In this case I'd rename the tab names in the popover to Decision Plot | JSON instead of just Chart | JSON. (Regardless of the name we decide to use, IMHO we should be more specific in the tab title than just saying Chart)

Will do!

I love where this is going! Some feedback:

  • Can we style the tabs to have smaller font and padding (for example same as the data grid header)?
  • Ideally we should dynamically adjust the popover height so the chart is viewable without scrolling.
  • Suggest to make the padding between features significantly smaller.

Here's a mockup with the suggestions: (For this one I set the multiplier for the row height like this: height: mappedFeatureImportance.length * 22)

image

That definitely looks much more neat and compact. Will try to make the tabs smaller.

@peteharverson
Copy link
Contributor

This is looking good! A few extra comments in addition to the suggestions from @walterra:

  • For the point tooltips, limit the value to 3 sigfigs (I'm unsure if the values in the JSON tab should also be limited to 3 sigfigs):

image

  • Popover definitely needs to adjust to the number of features (up to a maximum, at which point having a scrollbar makes sense)

  • The features should be sorted from top to bottom in order of descending magnitude(importance). This doesn't seem to be the case for some of my jobs (classification ones possibly).

  • Add some brief explanation text and/or an info tooltip to the chart tab to explain what the chart is displaying (using the text for the decision path viz in https://www.elastic.co/blog/feature-importance-for-data-frame-analytics-with-elastic-machine-learning as a guide).

  • Is it possible to add gridlines to the chart? If so, would be interesting to see how it looks.

  • Add 'Prediction' x-axis label

  • Some of my jobs don't have a single feature_importance column, so I don't see the chart / JSON popover. Do we know why we don't get a single column for some jobs?

image

  • I guess this might be because the baseline calculation is still a WIP, but the x-axis value for the top point doesn't end up at the predicted value. The x-axis value should be I think Moving from the bottom of the plot to the top, SHAP values (= feature importance) for each feature are added to the model's base value. This shows how each feature contributes to the overall prediction. Which in this case would result I think in the top point be plotted at the predicted value of -0.059:

image

- Add grid lines
- Adjust sorting for classification from biggest at the top to smallest at the bottom
- Change chart to using cumulative values
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
@qn895
Copy link
Member Author

qn895 commented Jul 29, 2020

@walterra I renamed the tab to Decision Plot, updated the height (small and compact if num features > 3). Unfortunately, I have confirmed with the Eui team that there's not a way right now for us to override the padding of that EuiContentPopover. Also there seems to be a bug with the overflow so I'm working with the Eui team to figure out the fix.

Example: 3 features vs 8 features
Screen Shot 2020-07-29 at 3 18 41 PM
Screen Shot 2020-07-29 at 3 16 09 PM

@Pete I have also

  • Added sorting for the classification results, although I think this is temporary as
  • Added gridlines
  • Added Prediction as x axis label
  • Changed tooltips to 3 significant figures
  • Corrected the prediction decision plot where it will start at the baseline and add the feature importance until it reaches the predicted value at the top. Although I'm not sure if it's due to the significant figures or what but I noticed some rows that didn't add up quite right.

Screen Shot 2020-07-29 at 3 16 09 PM

Some of my jobs don't have a single feature_importance column, so I don't see the chart / JSON popover. Do we know why we don't get a single column for some jobs?

As for this I'm not too sure. I wasn't able to create this by setting num_features to 0 since it should hide the column automatically. Might need to look at what job config gives you that result.

@pete
Copy link

pete commented Jul 29, 2020

@qn895 Thank you, I think that is pretty cool to do but I don't have any strong opinions on this changeset, as I am completely unfamiliar with Kibana.

@peteharverson
Copy link
Contributor

Looking at my job configs which give me two empty feature_importance columns in the data grid, I think it only happens when I have num_top_feature_importance_values=0.

image

I don't know if we want to / can hide those columns from the data grid column picker if num_top_feature_importance_values=0?

# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
# Please enter a commit message to explain why this merge is necessary,
# especially if it merges an updated upstream into a topic branch.
#
# Lines starting with '#' will be ignored, and an empty message aborts
# the commit.
@peteharverson
Copy link
Contributor

peteharverson commented Sep 4, 2020

This is looking good! Just wondered if it's possible to control the formatting on the x-axis and tooltip value depending on magnitude of the values, so for example, we'd display 1000 rather than 1.00e+3 here:

image

Update: Tested after edits in 5fd5ff4 and formatting looks good.

image

@stevedodson
Copy link
Contributor

@peteharverson - reviewed this today with you and after going through some examples it looks good. My only concern is that the docs (linked to from the popup) have a clear concrete example of the chart with explanation.

An idea is that a specific prediction like AvgTicketPrice and show how the features are summed with the average to make the prediction + a similar concrete example for classification.

@qn895
Copy link
Member Author

qn895 commented Sep 8, 2020

This is looking good! Just wondered if it's possible to control the formatting on the x-axis and tooltip value depending on magnitude of the values, so for example, we'd display 1000 rather than 1.00e+3 here:

Updated here 5fd5ff4

We have DEFAULT_RESULTS_FIELD for this in plugins/ml/public/application/data_frame_analytics/common/constants.ts

Thanks @walterra! I have moved this DEFAULT_RESULTS_FIELD to x-pack/plugins/ml/common/constants/data_frame_analytics.ts so both the application and the server can share this constant here a93e740 and here 5fd5ff4

Copy link
Contributor

@peteharverson peteharverson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested latest edits and LGTM!

@walterra
Copy link
Contributor

walterra commented Sep 9, 2020

Latest code LGTM!

I noticed how the code is structured causes the ML page load bundle to grow in size because of the changes. My suspicion this is happening because part of the data grid code that's shared with the transforms plugin grew because of the decision plot code (even if the decision plot itself isn't used by the transform plugin).

I think this is something worth investigating in a follow up so I created an issue for it here #77041

@qn895
Copy link
Member Author

qn895 commented Sep 9, 2020

@elasticmachine merge upstream

@kibanamachine
Copy link
Contributor

💚 Build Succeeded

Build metrics

@kbn/optimizer bundle module count

id value diff baseline
ml 1395 +9 1386

async chunks size

id value diff baseline
ml 8.0MB -230.3KB 8.2MB

page load bundle size

id value diff baseline
ml 847.8KB ⚠️ +263.4KB 584.4KB

distributable file count

id value diff baseline
default 45476 +4 45472

History

To update your PR or re-run it, just comment with:
@elasticmachine merge upstream

@qn895 qn895 merged commit 619108c into elastic:master Sep 9, 2020
@qn895 qn895 deleted the feature-importance branch September 9, 2020 16:41
qn895 added a commit to qn895/kibana that referenced this pull request Sep 9, 2020
…3561)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
qn895 added a commit that referenced this pull request Sep 9, 2020
) (#77082)

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
gmmorris added a commit to gmmorris/kibana that referenced this pull request Sep 10, 2020
* master: (25 commits)
  [bugfix] Replace panel flyout opens 2 flyouts (elastic#76931)
  clean up test (elastic#76887)
  [Enterprise Search] Update shared API request handler (elastic#77112)
  [Maps] convert ESAggSource to TS (elastic#76999)
  Add plugin status API - take 2 (elastic#76732)
  Adds lens as a readable saved object for read-only dashboard users (elastic#77067)
  Skip checking for the reserved realm (elastic#76687)
  Reporting/diagnostics (elastic#74314)
  Reporting/Test: unskip non-screenshot tests (elastic#77088)
  Move metrics to setup and add cgroup metrics (elastic#76730)
  [Enterprise Search] Add Overview landing page/plugin (elastic#76734)
  First pass. Change TS type. Update OpenAPI (elastic#76434)
  [CI] Balance xpack ci groups a bit (elastic#77068)
  [Security_solution][Detections] Refactor signal ancestry to allow multiple parents (elastic#76531)
  [Maps] convert MetricsEditor to TS (elastic#76727)
  IndexMigrator: fix non blocking migration wrapper promise rejection (elastic#77018)
  [Enterprise Search] Update config data endpoint to v2 (elastic#76970)
  [ML] Add decision path charts to exploration results table (elastic#73561)
  Bump eventemitter3 from 4.0.0 to 4.0.7 (elastic#77016)
  [Ingest Pipelines] Add descriptions for ingest processors K-S (elastic#76981)
  ...
gmmorris added a commit to gmmorris/kibana that referenced this pull request Sep 10, 2020
* master: (41 commits)
  [bugfix] Replace panel flyout opens 2 flyouts (elastic#76931)
  clean up test (elastic#76887)
  [Enterprise Search] Update shared API request handler (elastic#77112)
  [Maps] convert ESAggSource to TS (elastic#76999)
  Add plugin status API - take 2 (elastic#76732)
  Adds lens as a readable saved object for read-only dashboard users (elastic#77067)
  Skip checking for the reserved realm (elastic#76687)
  Reporting/diagnostics (elastic#74314)
  Reporting/Test: unskip non-screenshot tests (elastic#77088)
  Move metrics to setup and add cgroup metrics (elastic#76730)
  [Enterprise Search] Add Overview landing page/plugin (elastic#76734)
  First pass. Change TS type. Update OpenAPI (elastic#76434)
  [CI] Balance xpack ci groups a bit (elastic#77068)
  [Security_solution][Detections] Refactor signal ancestry to allow multiple parents (elastic#76531)
  [Maps] convert MetricsEditor to TS (elastic#76727)
  IndexMigrator: fix non blocking migration wrapper promise rejection (elastic#77018)
  [Enterprise Search] Update config data endpoint to v2 (elastic#76970)
  [ML] Add decision path charts to exploration results table (elastic#73561)
  Bump eventemitter3 from 4.0.0 to 4.0.7 (elastic#77016)
  [Ingest Pipelines] Add descriptions for ingest processors K-S (elastic#76981)
  ...
rylnd added a commit to rylnd/kibana that referenced this pull request Sep 10, 2020
* master: (38 commits)
  Reporting/Test: unskip non-screenshot tests (elastic#77088)
  Move metrics to setup and add cgroup metrics (elastic#76730)
  [Enterprise Search] Add Overview landing page/plugin (elastic#76734)
  First pass. Change TS type. Update OpenAPI (elastic#76434)
  [CI] Balance xpack ci groups a bit (elastic#77068)
  [Security_solution][Detections] Refactor signal ancestry to allow multiple parents (elastic#76531)
  [Maps] convert MetricsEditor to TS (elastic#76727)
  IndexMigrator: fix non blocking migration wrapper promise rejection (elastic#77018)
  [Enterprise Search] Update config data endpoint to v2 (elastic#76970)
  [ML] Add decision path charts to exploration results table (elastic#73561)
  Bump eventemitter3 from 4.0.0 to 4.0.7 (elastic#77016)
  [Ingest Pipelines] Add descriptions for ingest processors K-S (elastic#76981)
  [Metrics UI] Replace Snapshot API with Metrics API (elastic#76253)
  legacy utils cleanup (elastic#76608)
  [ML] Account for "properties" layer in find_file_structure mappings (elastic#77035)
  fixed typo
  Upgrade to Kea 2.2 (elastic#77047)
  a11y tests on spaces home page including feature control  (elastic#76515)
  [ML] Transforms list: persist pagination through refresh interval (elastic#76786)
  [ML] Replace all use of date_histogram interval with fixed_interval (elastic#76876)
  ...
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants