-
Notifications
You must be signed in to change notification settings - Fork 738
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ENHANCEMENT: Add a prediction function that also calculates uncertainty in the prediction #584
ENHANCEMENT: Add a prediction function that also calculates uncertainty in the prediction #584
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## develop #584 +/- ##
===========================================
+ Coverage 75.71% 75.73% +0.02%
===========================================
Files 72 72
Lines 9101 9109 +8
===========================================
+ Hits 6891 6899 +8
Misses 2210 2210
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Sentry. |
Thanks for this PR @degenfabian. I do have some suggestions regarding your questions. There is an internal function called ebm_predict_scores that you can use to simplify this processing. You can see an example of it being used here: and also here: https://github.com/interpretml/interpret/blob/develop/python/interpret-core/interpret/glassbox/_ebm/_ebm.py#L1040-L1049 The ebm_predict_scores function is designed to handle individual bagged models. To do this you would replace self.intercept_ and self.term_scores_ with slices from self.bagged_intercept_ and self.bagged_scores_. With this change you shouldn't need the preprocessor anymore, and it should also handle the condition where max_bins != max_interaction_bins. |
Thank you so much for the feedback @paulbkoch, it was really helpful! I tried implementing it to the best of my ability and it looks a lot cleaner now! Do you mind taking another look at it? As you said it, max_bins != max_interaction_bins is not an issue anymore with this updated version! Attached is another plot with decision boundary on the left and prediction uncertainty on the right, calculated with the updated version of the function. |
e760490
to
15216ae
Compare
Thanks @degenfabian, the code looks good. Before I merge though, could you add a test? |
Of course, just added them! So sorry I didn't do that earlier and thank you so much for your valuable and insightful feedback! |
…ction Signed-off-by: Fabian Degen <fabian.degen@mytum.de>
…when calling uncertainty prediction function Signed-off-by: Fabian Degen <fabian.degen@mytum.de>
…s_with_uncertainty Signed-off-by: Fabian Degen <fabian.degen@mytum.de>
…tion function Signed-off-by: Fabian Degen <fabian.degen@mytum.de>
Signed-off-by: Fabian Degen <fabian.degen@mytum.de>
c79aa52
to
8fcf7c4
Compare
Signed-off-by: Fabian Degen <fabian.degen@mytum.de>
This PR intends to fix issue #235. Since release v0.3.0 individual bagged models are stored inside the EBM which made it possible to create a prediction function that also calculates uncertainty by calculating the mean and standard deviation across bags (as suggested in the issue #235 by @interpret-ml).
Since this issue was created, some implementation details were changed and I was not able to access the transform function from EBMPreprocessor anymore. Therefore I edited the function construct_bins in _preprocessor.py to return the main_preprocessor and then safe it as class attribute in EBMModel.
I am not sure if it should be kept this way because an earlier release removed this attribute from EBMModel, but I do not know how else to access the functionality of the transform method (except duplicating it). I would be happy about guidance regarding this.
Another implementation detail which has changed was the structure of bins in the transform function, which threw an error when calling the discretize function. I fixed this by always accessing the last element in the bins list, but I am also happy about guidance here for cleaner or more robust solutions.
Last but not least, this currently only works if max_bins == max_interaction_bins, so at the moment I am just informing the user when calling that these attributes have to be equal.
I also tested the uncertainty estimates on the breast cancer dataset from sklearn and have attached the plot I made to this PR. I think it looks good :)
![Figure_1](https://private-user-images.githubusercontent.com/106864199/382100497-c46b7c35-516d-46e1-bc4a-24b6c5a9d19c.png?jwt=eyJhbGciOiJIUzI1NiIsInR5cCI6IkpXVCJ9.eyJpc3MiOiJnaXRodWIuY29tIiwiYXVkIjoicmF3LmdpdGh1YnVzZXJjb250ZW50LmNvbSIsImtleSI6ImtleTUiLCJleHAiOjE3MzkxMTQ2NTYsIm5iZiI6MTczOTExNDM1NiwicGF0aCI6Ii8xMDY4NjQxOTkvMzgyMTAwNDk3LWM0NmI3YzM1LTUxNmQtNDZlMS1iYzRhLTI0YjZjNWE5ZDE5Yy5wbmc_WC1BbXotQWxnb3JpdGhtPUFXUzQtSE1BQy1TSEEyNTYmWC1BbXotQ3JlZGVudGlhbD1BS0lBVkNPRFlMU0E1M1BRSzRaQSUyRjIwMjUwMjA5JTJGdXMtZWFzdC0xJTJGczMlMkZhd3M0X3JlcXVlc3QmWC1BbXotRGF0ZT0yMDI1MDIwOVQxNTE5MTZaJlgtQW16LUV4cGlyZXM9MzAwJlgtQW16LVNpZ25hdHVyZT00MDRmYWNjMjVmOWNmOGRmMjI3Yjg3MjFmYjYzMGIyZWJmYTU2Y2M1MGYxNjViYjhkNWRlZGQ5M2U3ZjBiODlmJlgtQW16LVNpZ25lZEhlYWRlcnM9aG9zdCJ9.MGupmgY0mtUoiclO6A8B_397-fFFjZgMdVFnicwTmOU)