perf: improve lightgbm training performance 4x-10x by setting num_threads to be cores-1 by default for single dataset mode #1282
Conversation
/azp run
Azure Pipelines successfully started running 1 pipeline(s).
Codecov Report

@@            Coverage Diff             @@
##           master    #1282      +/-   ##
==========================================
+ Coverage   83.33%   83.37%   +0.04%
==========================================
  Files         300      300
  Lines       13828    13830       +2
  Branches      672      675       +3
==========================================
+ Hits        11523    11531       +8
+ Misses       2305     2299       -6

Continue to review full report at Codecov.
An added benefit seems to be that the LightGBM tests finish faster as well - the lightgbm1 unit tests finished in 11 minutes in this build versus 33 minutes in the previous build 😅
Force-pushed "…eads to be cores-1" from 0ffee85 to 6fb7fc4
In benchmarking, it was discovered that LightGBM training time could be reduced 4x-10x by setting num_threads equal to the number of machine cores minus one in single dataset mode (which is now the default mode).
This was actually already suggested in the LightGBM docs:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_threads
Using all cores can cause poor network communication, which leads to very slow training times.
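As a rough illustration only (not the PR's actual Scala implementation), the default-resolution logic amounts to the Python sketch below; `resolve_num_threads` is a hypothetical helper name:

```python
import multiprocessing

def resolve_num_threads(user_value=None):
    """Hypothetical helper: honor an explicit num_threads, else default to cores - 1."""
    if user_value is not None:
        return user_value
    # Leave one core free for network communication, per the LightGBM docs.
    return max(1, multiprocessing.cpu_count() - 1)
```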
In one scenario, with a 37 GB dataset on disk and the parameters
learning_rate = 0.1, num_leaves = 768, num_trees = 1000, min_data_in_leaf = 15000, max_bin = 512,
training took 5.5 hours without setting num_threads, and 1.2 hours with num_threads set to (number of machine cores) - 1. The size of the improvement will vary with the parameters used.
This PR sets the distributed LightGBM num_threads parameter to (number of cores) - 1 by default when it is not specified; a value set explicitly by the user still takes precedence.
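For reference, overriding the default from PySpark would look roughly like the sketch below. The parameter names (learningRate, numLeaves, numIterations, minDataInLeaf, maxBin, numThreads) follow SynapseML's camelCase convention but are assumptions here and should be checked against the LightGBM learner docs; the values mirror the benchmark scenario above.

```python
import multiprocessing
from synapse.ml.lightgbm import LightGBMClassifier

# Rough usage sketch; parameter names are assumptions based on SynapseML's
# camelCase convention, and the values mirror the benchmark scenario above.
model = LightGBMClassifier(
    learningRate=0.1,
    numLeaves=768,
    numIterations=1000,      # corresponds to num_trees
    minDataInLeaf=15000,
    maxBin=512,
    # Explicit override: with this PR it is no longer required, since the
    # default is already (cores - 1). Note that cpu_count() here reflects
    # the driver machine; on a cluster you would size against executor cores.
    numThreads=multiprocessing.cpu_count() - 1,
)
```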