
perf: improve lightgbm training performance 4x-10x by setting num_threads to be cores-1 by default for single dataset mode #1282

Merged: 2 commits, merged on Dec 3, 2021


imatiach-msft (Contributor)

Benchmarking showed that LightGBM training time could be reduced 4x-10x by setting `num_threads` to the number of machine cores minus 1 in single dataset mode (which is now the default mode).
This was in fact already suggested in the LightGBM docs:
https://lightgbm.readthedocs.io/en/latest/Parameters.html#num_threads

> for distributed learning, do not use all CPU cores because this will cause poor performance for the network communication

Poor network communication can make training very slow.

In one scenario, with a 37 GB dataset (size on disk) and the parameters
`learning_rate = 0.1, num_leaves = 768, num_trees = 1000, min_data_in_leaf = 15000, max_bin = 512`,
training took 5.5 hours without setting `num_threads` and 1.2 hours with `num_threads` set to (number of machine cores) - 1. The improvement will vary depending on the parameters used.
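
For reference, the speedup implied by the two timings in this scenario works out as follows (simple arithmetic on the numbers above, nothing more):

```math
\text{speedup} = \frac{5.5\ \text{h}}{1.2\ \text{h}} \approx 4.6\times
```

This falls at the low end of the 4x-10x range quoted above.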

This PR sets the default number of threads for distributed LightGBM to (number of cores) - 1 in single dataset mode when the user does not specify the parameter. Users can still override it.
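
A minimal Python sketch of the defaulting rule described above, for illustration only. The helper `resolve_num_threads` and its arguments are hypothetical (the actual change is in the Scala `LightGBMBase.scala`). Using `os.cpu_count()` as the core count is an assumption, and so is clamping the result to at least 1 thread:

```python
import os
from typing import Optional


def resolve_num_threads(user_num_threads: Optional[int],
                        use_single_dataset_mode: bool,
                        num_cores: Optional[int] = None) -> Optional[int]:
    """Hypothetical helper: choose the num_threads value passed to LightGBM.

    - An explicit user value is always returned unchanged (user override).
    - In single dataset mode with no user value, one core is left free for
      network communication, so the result is cores - 1 (at least 1).
    - Otherwise None is returned, meaning "keep LightGBM's own default".
    """
    if user_num_threads is not None:
        return user_num_threads  # the user's explicit setting always wins
    if not use_single_dataset_mode:
        return None
    if num_cores is None:
        # Assumption: logical CPU count of this machine; os.cpu_count() may return None
        num_cores = os.cpu_count() or 1
    # Assumption: never go below 1 thread, even on a single-core machine
    return max(1, num_cores - 1)
```

For example, `resolve_num_threads(None, True, 16)` returns 15, while `resolve_num_threads(8, True, 16)` returns the user's 8. Note that `os.cpu_count()` reports logical CPUs, whereas the LightGBM docs recommend setting `num_threads` relative to real (physical) cores, so on hyper-threaded machines a physical-core count may be the better input.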

imatiach-msft (Contributor, Author)

/azp run

azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

codecov-commenter commented Dec 2, 2021

Codecov Report

Merging #1282 (a52e0f8) into master (6ea8a9a) will increase coverage by 0.04%.
The diff coverage is 100.00%.


@@            Coverage Diff             @@
##           master    #1282      +/-   ##
==========================================
+ Coverage   83.33%   83.37%   +0.04%     
==========================================
  Files         300      300              
  Lines       13828    13830       +2     
  Branches      672      675       +3     
==========================================
+ Hits        11523    11531       +8     
+ Misses       2305     2299       -6     

| Impacted Files | Coverage Δ |
|---|---|
| ...osoft/azure/synapse/ml/lightgbm/LightGBMBase.scala | 94.94% <100.00%> (+0.05%) ⬆️ |
| ...azure/synapse/ml/lightgbm/LightGBMClassifier.scala | 91.11% <100.00%> (ø) |
| ...oft/azure/synapse/ml/lightgbm/LightGBMRanker.scala | 64.17% <100.00%> (ø) |
| .../azure/synapse/ml/lightgbm/LightGBMRegressor.scala | 74.13% <100.00%> (ø) |
| ...crosoft/azure/synapse/ml/lightgbm/TrainUtils.scala | 85.98% <0.00%> (+2.54%) ⬆️ |
| ...crosoft/azure/synapse/ml/io/http/HTTPClients.scala | 86.66% <0.00%> (+3.33%) ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 6ea8a9a...a52e0f8.

imatiach-msft (Contributor, Author)

An added benefit seems to be that the LightGBM tests finish faster as well: the lightgbm1 unit tests finished in 11 min in this build, versus 33 min in the previous build 😅


3 participants