
How exactly LightGBM predictions are obtained? #3571

Closed
maksymiuks opened this issue Nov 16, 2020 · 3 comments

maksymiuks commented Nov 16, 2020

Hi

First of all, I'd like to thank you for your work on this package; I consider it a great tool. I'm working on a dedicated R interface for tree-ensemble models that computes SHAP values quickly using C++ code via Rcpp, and LightGBM is one of the packages in its scope. For that I need to know how exactly the committee of trees is aggregated. From my inspection of the code, I have a hunch that the final prediction is the sum of the predictions of all trees, with some intercept subtracted. Am I correct? If so, how can I find that intercept? I wasn't able to locate it in the model object. My goal is to obtain the plain sum of the predictions of all trees.

Best Regards
Szymon Maksymiuk

@maksymiuks changed the title from "How exactly LightGBM predictions are acquire?" to "How exactly LightGBM predictions are obtained?" on Nov 16, 2020
@guolinke
Collaborator

Except for multi-class tasks, the prediction is the sum of the predictions of each tree.
For some tasks, like binary classification, there can be a transformation after the sum, such as a sigmoid.
For multi-class, you need to sum the predictions by class first (trees are organized as tree[i * K + j], where i is the iteration, j is the class id, and K is the number of classes), and then apply softmax to get the class probabilities.
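For reference, here is a minimal Python sketch that checks these statements against LightGBM's Python API (predict(raw_score=True), predict(pred_leaf=True), dump_model()). It assumes a binary objective; the synthetic scikit-learn dataset and the leaf_values helper for walking the dumped tree structure are illustrative assumptions, not part of LightGBM itself.

```python
import numpy as np
import lightgbm as lgb
from scipy.special import expit
from sklearn.datasets import make_classification

# Illustrative binary-classification data.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
booster = lgb.train({"objective": "binary", "verbose": -1},
                    lgb.Dataset(X, label=y), num_boost_round=20)

def leaf_values(node, out=None):
    # Illustrative helper: map leaf_index -> leaf_value by walking the nested
    # tree_structure returned by Booster.dump_model().
    if out is None:
        out = {}
    if "leaf_value" in node and "split_index" not in node:
        out[node.get("leaf_index", 0)] = node["leaf_value"]
    else:
        leaf_values(node["left_child"], out)
        leaf_values(node["right_child"], out)
    return out

tables = [leaf_values(t["tree_structure"])
          for t in booster.dump_model()["tree_info"]]
leaf_idx = booster.predict(X, pred_leaf=True).astype(int)  # (n_samples, n_trees)

# The raw score is the plain sum of one leaf value per tree.
manual_raw = np.array([sum(tables[t][leaf_idx[i, t]] for t in range(len(tables)))
                       for i in range(X.shape[0])])
assert np.allclose(manual_raw, booster.predict(X, raw_score=True))

# For the binary objective, the reported probability is sigmoid(raw score).
assert np.allclose(expit(manual_raw), booster.predict(X))
```

For multi-class models the same idea should apply per class: pred_leaf then returns num_iteration * num_class columns laid out as tree[i * K + j], and applying softmax across the K per-class raw sums reproduces the reported probabilities.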

@btrotta
Collaborator

btrotta commented Dec 7, 2020

I'm not sure if I'm understanding your question correctly, but I think by "intercept" you mean something like the baseline constant prediction? E.g., for a binary classification problem where the training labels are 90% ones and 10% zeros, we would start from that base rate of 0.9 (on the raw-score scale, its log-odds) and then add trees to improve accuracy. This is indeed how LightGBM works, and this constant value is added to the leaf values of the first tree. So if you use Booster.save_model() (https://lightgbm.readthedocs.io/en/latest/pythonapi/lightgbm.Booster.html?highlight=save%20model#lightgbm.Booster.save_model), the leaf values of the first tree include this baseline value.
The relevant part of the C++ code is in GBDT::TrainOneIter:

bool GBDT::TrainOneIter(const score_t* gradients, const score_t* hessians) {
In the first iteration (when gradients and hessians are nullptr), it calls BoostFromAverage, which calculates the constant initial prediction. It then fits the tree to the errors from that constant prediction, and finally calls AddBias to add the constant to the individual leaf values.
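As a rough illustration of this, here is a minimal Python sketch that trains a small regression model and inspects the dumped leaf values: the first tree's leaves should sit near the label mean (the BoostFromAverage constant plus a shrunken correction), while later trees' leaves sit near zero. The shifted make_regression dataset and the collect_leaves helper are assumptions made for the example.

```python
import numpy as np
import lightgbm as lgb
from sklearn.datasets import make_regression

# Illustrative data, shifted so the label mean is clearly non-zero.
X, y = make_regression(n_samples=500, n_features=10, noise=1.0, random_state=0)
y = y + 1000.0

booster = lgb.train(
    {"objective": "regression", "boost_from_average": True, "verbose": -1},
    lgb.Dataset(X, label=y), num_boost_round=10)

def collect_leaves(node, out):
    # Gather all leaf values from dump_model()'s nested tree_structure.
    if "leaf_value" in node and "split_index" not in node:
        out.append(node["leaf_value"])
    else:
        collect_leaves(node["left_child"], out)
        collect_leaves(node["right_child"], out)
    return out

trees = booster.dump_model()["tree_info"]
first_tree_leaves = collect_leaves(trees[0]["tree_structure"], [])
last_tree_leaves = collect_leaves(trees[-1]["tree_structure"], [])

print("label mean:            ", y.mean())                   # ~1000
print("mean leaf value, tree 0:", np.mean(first_tree_leaves))  # expected near the label mean
print("mean leaf value, last tree:", np.mean(last_tree_leaves))  # expected much closer to zero
```

In other words, no separate intercept is stored in the model object: the constant computed by BoostFromAverage is folded into the first tree's leaf values by AddBias, so a plain sum of per-tree leaf values already includes it.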

@github-actions

This issue has been automatically locked because there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues, including a reference to this one.

github-actions bot locked as resolved and limited conversation to collaborators on Aug 23, 2023