Why is there almost no speedup with distributed learning? #1316
Comments
hi @JWenBin can you please try the new single dataset mode with numThreads set to num cores - 1? In performance testing we saw a big speedup with that configuration. For more information on the new single dataset mode, please see the PR description: this new mode was created after extensive internal benchmarking.
Thank you for your reply! I tried that just now; the speed improved a lot, but AUC and accuracy became too low (less than 0.6). It looks like if I use setUseSingleDatasetMode(true), I need to change my other params at the same time.
hi @JWenBin
I deleted setUseBarrierExecutionMode(true) while keeping setUseSingleDatasetMode(true) and retrained the model. My AUC returned to its normal level. But I still don't know how setUseBarrierExecutionMode(true) affects setUseSingleDatasetMode(true) during training.
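For reference, the working configuration described above can be sketched roughly as follows. This is a minimal sketch, not the poster's actual code: the column names and the `train` DataFrame are placeholders, and the thread count (executor cores - 1) follows the maintainer's suggestion, assuming the 16-core executors from the spark-submit commands below.

```scala
import com.microsoft.azure.synapse.ml.lightgbm.LightGBMClassifier

// Single dataset mode enabled, barrier execution mode left at its
// default (disabled) -- combining the two degraded AUC to below 0.6
// in the report above.
val lgbm = new LightGBMClassifier()
  .setFeaturesCol("features")        // placeholder column name
  .setLabelCol("label")              // placeholder column name
  .setUseSingleDatasetMode(true)     // one native LightGBM dataset per executor
  .setNumThreads(15)                 // 16 executor cores - 1

val model = lgbm.fit(train)          // `train` is a placeholder DataFrame
```

This sketch targets the synapseml 0.9.x Scala API referenced in the thread; setter names may differ in other versions.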
First, I tried 2 Spark slaves; it took about 11 minutes to train my model.
submit info: spark-submit --master yarn --num-executors 2 --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar ......
Second, I tried only one Spark slave; it took about 12 minutes to train my model.
submit info: spark-submit --master yarn --num-executors 1 --executor-memory 19G --executor-cores 16 --conf spark.dynamicAllocation.enabled=false --jars s3://EMR/jars/synapseml-vw_2.12-0.9.4.jar,s3://EMR/jars/synapseml_2.12-0.9.4.jar,s3://EMR/jars/client-sdk-1.14.0.jar ......
My results show that LightGBM gets almost no speedup from distributed learning, even though CPU utilization was above 95% on each Spark slave. Why is there almost no speedup with distributed learning?
My cluster/data/code Info:
spark slaves: 2 × (16 vCores, 32 GiB each);
spark version: Spark 3.1.2, Hive 3.1.2, ZooKeeper 3.5.7;
dependency: synapseml_2.12-0.9.4.jar;
training data set: 5,377,937 rows;
code:
Thanks in advance!