Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

remove uses of predict(reshape = TRUE) for lightgbm models (fixes #40) #41

Merged
merged 3 commits into from
Jul 18, 2022
Merged

remove uses of predict(reshape = TRUE) for lightgbm models (fixes #40) #41

merged 3 commits into from
Jul 18, 2022

Conversation

jameslamb
Copy link
Contributor

Fixes #40.

As described in that issue, the reshape argument to predict.lgb.Booster() has been removed in the development version of {lightgbm}. This PR proposes changes to make {bonsai} compatible with old and new versions of {lightgbm} by removing the use of that argument.

Question for Reviewers

Is the following statement true?

generating leaf-index or SHAP feature contributions directly using {bonsai} is not supported

If not, then this PR might need some changes. See https://github.com/microsoft/LightGBM/blob/44fe591a60d7427d64997c37b22768a92c97c47b/R-package/R/lgb.Predictor.R#L323-L326 for reference on why I'm asking.

Thanks for your time and consideration.

Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR! Implementation looks good, besides that one comment.

Confirming that

generating leaf-index or SHAP feature contributions directly using {bonsai} is not supported

isn't from our documentation? Could you supply a reprex with lightgbm only showing the kind of output you're looking for?

R/lightgbm.R Outdated Show resolved Hide resolved
Co-authored-by: Simon P. Couch <simonpatrickcouch@gmail.com>
@jameslamb
Copy link
Contributor Author

jameslamb commented Jul 18, 2022

Could you supply a reprex with lightgbm only showing the kind of output you're looking for

Correct, that line about SHAP values and leaf-index "predictions" isn't from {bonsai}'s documentation. Sorry if my use of block-quote there was confusing.

{lightgbm}'s predict() method can be used to generate 4 types of predictions.

  • predict(...) = predictions which might be transformed into a different form than the objective function, based on task (e.g. probabilities for classification)
  • predict(rawscore = TRUE) = value of the objective function
  • predict(predleaf = TRUE) = a [num_rows_in_new_data, num_trees] matrix where a cell [i, j] gives the index of the leaf node that the i-th sample falls into in the j-th tree
  • predict(predcontrib = TRUE) = a [num_rows_in_new_data, num_features + 1] matrix where a cell [i, j] gives the contribution of feature j to the prediction for the i-th sample, and where the final column is the Shapley "base value"

But as I type this out, I realize...it doesn't actually matter for this PR. On both lightgbm <= 3.3.2 and the upcoming v4.0.0, the shape of the returned objects for those predictions doesn't depend on the reshape argument. Sorry for wasting your time with it!

For a reproducible example of these different prediction types, you could for example see the excellent example provided in this issue: microsoft/LightGBM#5223.

I'll get the tests working here shortly and request a re-review. I guess there must have been something I missed in the way that I ran them locally that differs from how this project's CI is set up, and maybe that ended up in me not noticing that some tests which are failing in CI were skipped locally.

@simonpcouch
Copy link
Contributor

Much appreciated, thanks for the references! I see—our machinery might indeed trip up on that output format, as yall seem to have come across. Glad it seems there's a workaround that feels good.

devtools::test() should set those environmental variables as expected during testing, if you're up for that change in workflow. Apologies for the frustrations in translating between our folks' tooling. :)

@jameslamb jameslamb requested a review from simonpcouch July 18, 2022 15:04
@jameslamb
Copy link
Contributor Author

Ok I think I've resolved the issues, sorry about that!

Copy link
Contributor

@simonpcouch simonpcouch left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect, thanks!

@simonpcouch simonpcouch merged commit b403e2a into tidymodels:main Jul 18, 2022
@jameslamb jameslamb deleted the lightgbm-reshape branch July 18, 2022 15:28
@github-actions
Copy link

This pull request has been automatically locked. If you believe you have found a related problem, please file a new issue (with a reprex: https://reprex.tidyverse.org) and link to this issue.

@github-actions github-actions bot locked and limited conversation to collaborators Jan 11, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

'reshape' argument to predict() will be removed in LightGBM v4.0.0
2 participants