-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use double precision in threaded calculation of linear tree coefficients (fixes #5226) #5368
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for fixing this! I left a few initial suggestions for your consideration.
@@ -109,6 +109,7 @@ include_directories(${EIGEN_DIR}) | |||
|
|||
# See https://gitlab.com/libeigen/eigen/-/blob/master/COPYING.README | |||
add_definitions(-DEIGEN_MPL2_ONLY) | |||
add_definitions(-DEIGEN_DONT_PARALLELIZE) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you please add this to the flags used by the R package as well?
LightGBM/R-package/configure.ac
Line 38 in 44fe591
LGB_CPPFLAGS="${LGB_CPPFLAGS} -DEIGEN_MPL2_ONLY" |
LightGBM/R-package/configure.win
Line 22 in 44fe591
LGB_CPPFLAGS="${LGB_CPPFLAGS} -DEIGEN_MPL2_ONLY" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This change seems to be causing CI to fail; I think the problem is that I need to regenerate the configure
file in R-package
. Is there a way to do this on Windows?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For non-Windows, we have a comment-triggered CI job that will update configure
based on changes to configure.ac
. I'll trigger that job right now.
For configure.win
, nothing needs to be regenerated. configure.win
is executed directly, instead of being used as a template.
These things are documented at https://github.com/microsoft/LightGBM/tree/master/R-package#changing-the-cran-package but that README is fairly large so it's easy to miss.
fd = FileLoader(EXAMPLES_DIR / 'binary_classification', 'binary', | ||
'train_linear.conf') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would you consider just re-defining EXAMPLES_DIR
at the top of this file + using load_breast_cancer()
to get a binary classification dataset, the way that other tests in this file do?
I worry that having one test file import from another could cause issues for pytest
in the future (even though right now I don't notice any). There are not any other places in this project's Python tests today where one test_*.py
file imports from another.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed it to use some randomly generated data (I was unable to reproduce the failure with the breast cancer dataset).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah ok, I didn't realize that it could be dataset-specific. Sorry for creating extra work for you! I guess we could have also avoided "test file importing another test file" by moving FileLoader
to utils.py
. That would probably good to do anyway (in a separate PR).
Anyway, if the new randomly-generated data is sufficient to reproduce the underlying issue fixed by this PR, I'm good with it!
/gha run r-configure |
hmmm the run-configure job failed (build link)
I'll try one more run, and if it still fails I'll look into this later. Sorry for the disruption 😭 |
/gha run r-configure |
I see the issue! I believe that updating @btrotta I just regenerated Thankfully, autoconf only generated a change on one line. Can you please change this line to LGB_CPPFLAGS="${LGB_CPPFLAGS} -DEIGEN_MPL2_ONLY -DEIGEN_DONT_PARALLELIZE" Sorry again for the disruption. |
One other thing... as a maintainer you have permissions to push branches directly to LightGBM instead of using your fork. I recommend doing that in the future. |
@jameslamb Thanks for your help, I've updated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you!
@jameslamb If you're happy with the changes, could you please approve? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks a lot for the fix!
I'm leaving my "request changes" review and not re-reviewing until the issue that has been LightGBM's CI for the last few days (#5362 (comment)) is fixed. I don't want this to be merged until the CI is fixed and we run it over these changes one more time. Sorry for the delay. Hopefully that issue will be fixed soon. |
@jameslamb Ok, no problem. |
Maybe we can close and reopen this PR to rerun the ci? Since the ci tests are run with the merged version automatically. |
I personally prefer to merge the target branch (in this repo, To avoid this situation:
If CI passes for this particular PR I'm ok with merging it to keep making progress on all the other PRs, but please consider that for the future. |
@jameslamb Thank you. I will follow this rule in the future. |
It seems that all ci tests are passed. Maybe we can cancel the change request now? |
oh yep, sorry about that! Meant to come back and approve yesterday. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 thanks again for the help @btrotta !
No problem! I've found it helpful. There is definitely a trade-off there. Especially since our CUDA CI jobs take 1-2 hours to run and we can only have a single job running at a time across the whole repo, extra CI runs can slow down development in the whole project. My approach is:
So, for example, once #5384 builds successfully, I'll just merge it even though this PR was merged to |
This pull request has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this. |
Fixes #5266
When calculating the linear tree coefficients, we need to calculate some matrix products. This calculation is multi-threaded for efficiency. However, floating-point addition can give slightly different results depending on the order terms are added, and this was causing the calculated coefficients to vary depending on the number of threads. I've resolved this by making these matrices double precision instead of single, so the inaccuracies are less significant. This will not use much additional memory, since the size of these matrices is only
O(num_features ^ 2)
.I also added a preprocessor directive to make Eigen calls single-threaded (since Eigen is always called inside a for-loop that is already parallelized.