fix: lightgbm getting stuck when empty partition is chosen as the main worker in singleDatasetMode #1458

Merged

imatiach-msft
Contributor

@imatiach-msft imatiach-msft commented Mar 30, 2022

Summary

LightGBM can hang when singleDatasetMode is enabled and an empty partition is, by random chance, chosen as the main worker on the executor.
The fix is to allow only non-empty partitions to be chosen as the main worker.
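
The idea behind the fix can be pictured with a minimal sketch. This is not the code from the PR, which is not shown here; it assumes the main worker is elected per executor by an atomic compare-and-set, and `MainWorkerElection`, `tryBecomeMainWorker`, and `partitionIsEmpty` are hypothetical names chosen for illustration.

```scala
import java.util.concurrent.atomic.AtomicLong

// Hypothetical stand-in for per-executor shared state; not SynapseML's actual class.
final class MainWorkerElection {
  private val NoWorker = -1L
  private val mainWorkerId = new AtomicLong(NoWorker)

  /** Returns true if this task became the main worker.
    * Tasks with an empty partition never take part in the election (assumed behavior). */
  def tryBecomeMainWorker(taskId: Long, partitionIsEmpty: Boolean): Boolean =
    if (partitionIsEmpty) false
    else mainWorkerId.compareAndSet(NoWorker, taskId) // first non-empty task wins

  /** The elected main worker, if any task has won so far. */
  def mainWorker: Option[Long] = {
    val id = mainWorkerId.get()
    if (id == NoWorker) None else Some(id)
  }
}

object MainWorkerElectionDemo {
  def main(args: Array[String]): Unit = {
    val election = new MainWorkerElection
    // Task 0 has an empty partition and arrives first; it must not win.
    println(election.tryBecomeMainWorker(taskId = 0L, partitionIsEmpty = true))
    println(election.tryBecomeMainWorker(taskId = 1L, partitionIsEmpty = false))
    println(election.tryBecomeMainWorker(taskId = 2L, partitionIsEmpty = false))
    println(election.mainWorker)
  }
}
```

In this sketch the empty task is rejected even though it arrives first, so the role falls to the first task that actually holds rows.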

Tests

Run the test "Verify LightGBM Classifier won't get stuck on empty partitions" many times. It is defined in the file:
https://github.com/microsoft/SynapseML/blob/master/lightgbm/src/test/scala/com/microsoft/azure/synapse/ml/lightgbm/split1/VerifyLightGBMClassifier.scala#L610
Without the fix, by random chance it will often get stuck after several runs.
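
Because the hang is intermittent, repeating a run under a timeout is one way to catch it. The following is a rough sketch of such a harness, not part of the PR or of the SynapseML test suite; `RepeatUntilStuck`, `firstStuckRun`, and the `trial` callback are hypothetical, and the demo uses a sleeping stand-in rather than the real test.

```scala
import java.util.concurrent.{Executors, ThreadFactory, TimeUnit, TimeoutException}

object RepeatUntilStuck {
  /** Runs `trial` up to `runs` times, each on a fresh daemon thread, and returns
    * the index of the first run that does not finish within `timeoutMs`, if any.
    * An exception thrown by `trial` propagates to the caller as an ExecutionException. */
  def firstStuckRun(runs: Int, timeoutMs: Long)(trial: Int => Unit): Option[Int] =
    (0 until runs).find { i =>
      // Daemon threads so a run that never returns cannot keep the JVM alive.
      val executor = Executors.newSingleThreadExecutor(new ThreadFactory {
        def newThread(r: Runnable): Thread = {
          val t = new Thread(r)
          t.setDaemon(true)
          t
        }
      })
      val future = executor.submit(new Runnable { def run(): Unit = trial(i) })
      try {
        future.get(timeoutMs, TimeUnit.MILLISECONDS)
        false // finished in time
      } catch {
        case _: TimeoutException => true // treated as stuck
      } finally {
        executor.shutdownNow() // interrupt the worker if it is still running
      }
    }

  def main(args: Array[String]): Unit = {
    // Stand-in trial: run 3 blocks far longer than the timeout.
    val stuck = firstStuckRun(runs = 5, timeoutMs = 200L) { i =>
      if (i == 3) Thread.sleep(60000L)
    }
    println(stuck)
  }
}
```

A timeout can only suggest a hang, so the limit should be well above the normal duration of a run.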

Dependency changes

No dependency changes.

@imatiach-msft imatiach-msft force-pushed the ilmat/fix-lightgbm-stuck branch from 8036c82 to 375f08f Compare March 30, 2022 05:02
@imatiach-msft imatiach-msft changed the title from "Fix lightgbm getting stuck when empty partition is chosen as the main worker in singleDatasetMode" to "fix: lightgbm getting stuck when empty partition is chosen as the main worker in singleDatasetMode" Mar 30, 2022
@imatiach-msft
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@codecov-commenter

codecov-commenter commented Mar 30, 2022

Codecov Report

Merging #1458 (aca0398) into master (3eb4661) will decrease coverage by 0.09%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master    #1458      +/-   ##
==========================================
- Coverage   84.41%   84.31%   -0.10%     
==========================================
  Files         295      288       -7     
  Lines       14761    14633     -128     
  Branches      705      709       +4     
==========================================
- Hits        12460    12338     -122     
+ Misses       2301     2295       -6     
Impacted Files Coverage Δ
...osoft/azure/synapse/ml/lightgbm/LightGBMBase.scala 95.78% <100.00%> (+0.03%) ⬆️
...rosoft/azure/synapse/ml/lightgbm/SharedState.scala 89.28% <100.00%> (+0.82%) ⬆️
...zure/synapse/ml/lightgbm/TaskTrainingMethods.scala 100.00% <100.00%> (ø)
...org/apache/spark/ml/param/JsonEncodableParam.scala 55.55% <0.00%> (-27.78%) ⬇️
...crosoft/azure/synapse/ml/io/http/HTTPClients.scala 67.64% <0.00%> (-8.83%) ⬇️
...g/apache/spark/ml/param/PythonWrappableParam.scala 66.66% <0.00%> (-8.34%) ⬇️
...re/src/main/python/synapse/ml/core/schema/Utils.py 67.10% <0.00%> (-5.27%) ⬇️
...soft/azure/synapse/ml/cognitive/TextToSpeech.scala 84.84% <0.00%> (-3.04%) ⬇️
...oft/azure/synapse/ml/cognitive/TextAnalytics.scala 86.20% <0.00%> (-2.69%) ⬇️
.../azure/synapse/ml/cognitive/TextAnalyticsSDK.scala 86.01% <0.00%> (-1.40%) ⬇️
... and 9 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 3eb4661...aca0398.

@imatiach-msft imatiach-msft force-pushed the ilmat/fix-lightgbm-stuck branch from 375f08f to cf55dbb Compare March 31, 2022 04:43
@imatiach-msft imatiach-msft force-pushed the ilmat/fix-lightgbm-stuck branch from cf55dbb to aca0398 Compare March 31, 2022 04:46
@imatiach-msft
Contributor Author

/azp run

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@mhamilton723 mhamilton723 merged commit 63c1235 into microsoft:master Mar 31, 2022
4 participants