-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Path smoothing #2950
Path smoothing #2950
Conversation
Below is a basic script to test on the Boston housing data and the results. The script tries several different regularisation parameters (individually, not in combination) and selects the best value for each, then plots the results.
|
I think I fixed the problem with the gpu version but CI is still failing, because of a problem with the R package. Not sure if this is due to my code changes or not... |
@btrotta If you rebase to Sorry for the inconvenience! |
@jameslamb We need your help here:
|
ah! I checked the diff and this note is not a problem for now. This is mainly there to keep the hosting costs for CRAN down, so that people don't (for example) check large files with datasets into their packages. But the check is interesting because it's checked on the installed package, so in our case I want to investigate this more, but it's not a problem caused by this PR. The NOTE is triggered by an installed package over 5.0 MB and I think we were juuuust under that on Mac and linux before. This is one of those weird NOTEs that CRAN will sometimes ignore if you explain it to them. For now @btrotta , just change this line to 4 allowed notes: https://github.com/microsoft/LightGBM/blob/master/.ci/test_r_package.sh#L94. This doesn't need to block this PR and I can address it with a more long-term answer in a separate PR. For what it's worth, I see it on #2936 s well. |
@jameslamb thanks, that fixed it! It's now failing because of a broken link in the docs (not related to this PR). |
Ha sorry, we have a few annoying things happening in our CI this week. The |
@btrotta as soon as I wrote that, I went to review that PR and saw that it could be merged! If you rebase to |
// desc = if `path_smooth > 0` then `min_data_in_leaf` must be at least `2`. | ||
// desc = larger values give stronger regularisation | ||
// descl2 = the weight of each node is `(n / path_smooth) * w + w_p / (n / path_smooth + 1)`, where `n` is the number of samples in the node, `w` is the calculated node weight, and `w_p` is the weight of the parent node | ||
double path_smooth = 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
more details about w
and w_p
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some more explanation, hope it's clearer now.
if (USE_MAX_OUTPUT) { | ||
if (max_delta_step > 0 && std::fabs(ret) > max_delta_step) { | ||
return Common::Sign(ret) * max_delta_step; | ||
ret = Common::Sign(ret) * max_delta_step; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it seems this could be moved outside too.
double gain_shift; | ||
if (USE_SMOOTHING) { | ||
gain_shift = GetLeafGainGivenOutput<USE_L1>( | ||
sum_gradient, sum_hessian, meta_->config->lambda_l1, meta_->config->lambda_l2, parent_output); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@btrotta could you double confirm here? if this is correct, could you add some detailed comments to make it clear?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right, this was not quite correct. Have made a slight change, let me know if it's still unclear.
} | ||
if (USE_SMOOTHING) { | ||
ret = ret * (num_data / smoothing) / (num_data / smoothing + 1) \ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe only apply this when smoothing > kEpsilon
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have done the check for smoothing > kEpsilon
outside this function (e.g. in FuncForNumricalL2
and FuncForCategoricalL1
) , and I only pass the template parameter USE_SMOOTHING=true
if smoothing > kEpsilon
. Alternatively we could move the check inside CalculateSplittedLeafOutput
, and then we wouldn't need the template parameter USE_SMOOTHING
. This would be simpler but we would lose the speedup that the template provides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
USE_SMOOTHING
may be useless for this function. the soothing output can be simplified as below。
weight for current output is w = n/(n+s)
and ret = w * ret + (1-w) * parent_out
. n
must be greater zero. s=0
means no path smoothing.
meta_->config->max_delta_step); | ||
double current_gain; | ||
bool use_smoothing = meta_->config->path_smooth > kEpsilon; | ||
if (use_smoothing) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
for these GatherInfoxxx
functions, maybe you can use GetLeafGain<true, true, true>
, and check `path_smooth > kEpsilon' inside?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
okay, maybe we can make GatherInfoxxx
to template function too. For example, template<bool USE_SMOOTHING> GatherInfoForThresholdNumericalInner
. And check the use_smoothing in GatherInfoForThresholdNumerical. By this way, we can reduce these if ... else.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed as suggested.
Thanks @btrotta ! |
meta_->config->max_delta_step); | ||
double current_gain; | ||
bool use_smoothing = meta_->config->path_smooth > kEpsilon; | ||
if (use_smoothing) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there are lot of if(use_smoothing)
in code.
if(use_smoothing){
func<true, true, true>();
}else {
func<true, true, false>();
}
it's better use func<true, true, use_smoothing>();
instead
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This gives a compiler error because the value of use_smoothing
isn't known at compile time.
double output_without_split = CalculateSplittedLeafOutput<USE_L1, USE_MAX_OUTPUT, USE_SMOOTHING>( | ||
sum_gradient, sum_hessian, meta_->config->lambda_l1, meta_->config->lambda_l2, | ||
meta_->config->max_delta_step, meta_->config->path_smooth, num_data, parent_output); | ||
double gain_shift = GetLeafGainGivenOutput<USE_L1>( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
may be a new function that CalculateSplittedLeafOutput
then GetLeafGainGivenOutput
? it seems some other places also use this.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I realised we can just use the existing function GetLeafGain
here.
@jameslamb please take a look
|
@guolinke Can you please check your DM in Slack? |
@jameslamb Created #2986 to not pollute this PR. |
@StrikerRUS Thanks! |
@guolinke Yes that would be great! Thanks! |
it this ready to merge? |
@guolinke The CI is still failing, I'm not sure how to fix it. |
@jameslamb Thanks, that has fixed the problem from before. I also fixed some linting errors. But now I'm getting a new failure which wasn't happening previously:
|
It was a random network conection issue during conda installation:
I re-run that job and now everything is green. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.ci/test_r_package.sh
Outdated
@@ -98,7 +98,7 @@ if grep -q -R "WARNING" "$LOG_FILE_NAME"; then | |||
exit -1 | |||
fi | |||
|
|||
ALLOWED_CHECK_NOTES=2 | |||
ALLOWED_CHECK_NOTES=4 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb Should be 3, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes @btrotta can you please change to 3?
A lot has changed in our CI since this PR was first opened, sorry 😬
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ALLOWED_CHECK_NOTES=4 | |
ALLOWED_CHECK_NOTES=3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb Is a similar increment needed for Windows?
LightGBM/.ci/test_r_package_windows.ps1
Line 102 in 2c18a0f
$ALLOWED_CHECK_NOTES = 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No worries, fixed now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ping @jameslamb for Windows updates
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@StrikerRUS nothing is required on this PR since the Windows R tests are passing. We have it set to 3 and are generating exactly 3:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@jameslamb OK. I just was afraid that we can hit #2950 (comment) on Windows if let say the current size is 4.99 during some runs it can be calculated as 5, like in #2988.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we do we can bump it back up and it wouldn't bother me. I just don't think this PR has to be the place where we deal with it. If CI is passing for the R package right now, nothing should be changed.
@StrikerRUS thanks for your comments, I've fixed those issues now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks for your comments, I've fixed those issues now.
Many thanks! Please address one more comment about inconsistency in naming of new parameter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks great to me, thanks @btrotta ! I'm going to leave just a Comment
review...my review shouldn't count towards a merge.
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
This implements the path smoothing idea proposed by @MaxHalford in #2790.
I implemented a slightly simpler version which does not use leaf depth. For each node (except the root), the output is calculated as
((n / s) * original_output + parent_output) / (n / s + 1)
wheren
is number of data in leaf,s
is a regularisation parameter (largers
means more smoothing),original_output
is the unsmoothed output, andparent_output
is the output of the parent node (which has itself been smoothed previously).The reasoning behind this is that it's similar to the Bayesian calculation of the posterior mean given a prior expectation (see, e.g. Section 2 here: http://www.ams.sunysb.edu/~zhu/ams570/Bayesian_Normal.pdf). The posterior estimate of the true mean (in our case, the leaf output) is the weighted average (weighted by sample size and the sample and prior variance) of the sample and prior mean (the parent output). (This is more of an analogy than a real model, since the leaf output isn't actually the mean of a sample.) The parameter
s
represents the expected ratio of child node variance to parent node variance; smalls
means we expect the ratio to be small, i.e. the data is not noisy and the model can explain a lot of the variance, so we don't need much regularisation.I haven't tested this much on real datasets. Based on the couple that I tried, it definitely improves regularisation, but doesn't seem to be dramatically different to existing regularisation methods. Given that it requires non-trivial code changes, it may not be worth the maintenance effort. Feel free to close this PR if so.