[dask] DaskLGBMClassifier very slow and not using CPU #3797
Comments
Changed to:
Also, the accuracy is really low:
It should be around 0.76 (plain lightgbm).
With n_workers=1, threads_per_worker=16, npartitions=16:
With n_workers=4, threads_per_worker=4, npartitions=16:
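For reference, a minimal sketch (not the exact benchmark script) of how these knobs map onto Dask: `n_workers` and `threads_per_worker` configure the `LocalCluster`, while `npartitions` controls how the training data is split across workers. The path is a placeholder.

```python
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster

# n_workers / threads_per_worker set the cluster topology
cluster = LocalCluster(n_workers=1, threads_per_worker=16)  # or n_workers=4, threads_per_worker=4
client = Client(cluster)

# npartitions controls how many chunks the DataFrame is split into
ddf = dd.read_csv("train.csv").repartition(npartitions=16)  # "train.csv" is a placeholder path
```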
@jameslamb In the example above there might be some issues (possible data leakage from train to test through the Dask partitions, because of the way I lump train and test together to create a consistent label encoding and then partition the lumped data). A better way to do this is to do the integer encoding outside of Dask and read train and test into Dask separately (sketched after this comment). Plain lightgbm (no Dask):
Results:
With Dask:
Results:
It is still slow and CPU % is low (same as before), but this is a better setup for comparing the AUCs. Logs from the Dask run:
Diagnostics:
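A sketch of the "encode outside of Dask" approach mentioned above (paths and the categorical column names are placeholders, not the ones from the benchmark):

```python
import pandas as pd
import dask.dataframe as dd

train = pd.read_csv("train.csv")
test = pd.read_csv("test.csv")

# Fit one consistent integer encoding over train+test in plain pandas,
# then apply it to each frame separately (no lumping inside Dask).
for col in ["cat1", "cat2"]:  # placeholder categorical columns
    categories = pd.concat([train[col], test[col]]).astype("category").cat.categories
    train[col] = pd.Categorical(train[col], categories=categories).codes
    test[col] = pd.Categorical(test[col], categories=categories).codes

# Only the already-encoded frames are handed to Dask, as separate collections.
dtrain = dd.from_pandas(train, npartitions=16)
dtest = dd.from_pandas(test, npartitions=16)
```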
With 1 worker, 16 threads, 16 partitions it's OK:
Very similar to plain lightgbm:
Also with 4 workers, 4 threads each, 16 partitions:
So maybe this bug is not such a big deal.
Changing the number of workers, threads, and partitions:
Thanks for the report! I can look into this more carefully in a few days. Right now we're focusing on other things in the Dask interface. One possibility for your consideration: Dask will start spilling to disk when a worker's memory utilization approaches 60% (https://distributed.dask.org/en/latest/worker.html#memory-management). This can drastically slow down processing. It's possible that in the …
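A quick way to sanity-check the disk-spilling hypothesis (a sketch, assuming default dask.distributed settings; "8GB" is an illustrative value, not a recommendation):

```python
import dask
import dask.distributed  # loads the distributed config defaults

# Fractions of the worker memory limit at which data starts moving to disk
print(dask.config.get("distributed.worker.memory.target"))
print(dask.config.get("distributed.worker.memory.spill"))

# Giving each worker an explicit, generous memory limit helps rule spilling out
from dask.distributed import Client, LocalCluster
cluster = LocalCluster(n_workers=4, threads_per_worker=4, memory_limit="8GB")
client = Client(cluster)
```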
Since this now works for everything but the N-workers / 1-thread-per-worker case, I don't think that fixing this should be a huge priority. If you look at the screenshot above, RAM utilization while training was about 4%. I also included …
Hi @szilard. I only have 8 CPUs on my laptop, but these are the results I get with the current master:
I also see high CPU usage. Are you still able to reproduce this issue?
Awesome, sounds great. I'll check. Is this included in the latest release I can install with pip?
Not yet, I can ping you here once 3.2 is released.
Sounds good. Based on your results the issue should be resolved, but once the new release is out, I'll check again.
Release
This issue has been automatically closed because it has been awaiting a response for too long. When you have time to work with the maintainers to resolve this issue, please post a new comment and it will be re-opened. If the issue has been locked for editing by the time you return to it, please open a new issue and reference this one. Thank you for taking the time to improve LightGBM!
@szilard this issue was closed today by a bot we use to close issues that are awaiting a response.
@jameslamb Sure, thanks. I think it's fixed based on @jmoralez's results above.
This issue has been automatically locked since there has not been any recent activity since it was closed. To start a new related discussion, open a new issue at https://github.com/microsoft/LightGBM/issues including a reference to this.
Using @jameslamb's Dockerfile to set up dask+lightgbm:
Then run this code:
It runs very slowly (>30 minutes vs. regular lightgbm in <4 seconds) and it is also not using the CPUs while running.
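A minimal sketch of a comparable DaskLGBMClassifier run (not the exact script from this report; it assumes a LightGBM build that ships the Dask estimators, i.e. 3.2+, and all paths, column names, and parameters are placeholders):

```python
import dask.dataframe as dd
from dask.distributed import Client, LocalCluster
from lightgbm import DaskLGBMClassifier

cluster = LocalCluster(n_workers=4, threads_per_worker=4)
client = Client(cluster)

# Placeholder data: a CSV with a binary "label" column
ddf = dd.read_csv("train.csv").repartition(npartitions=16)
X = ddf.drop(columns=["label"])
y = ddf["label"]

clf = DaskLGBMClassifier(n_estimators=100)
clf.fit(X, y)
```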
For comparison, regular lightgbm:
runs in 3.7 seconds.
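The non-Dask baseline, sketched on the same placeholder data:

```python
import pandas as pd
from lightgbm import LGBMClassifier

df = pd.read_csv("train.csv")  # placeholder path
clf = LGBMClassifier(n_estimators=100)
clf.fit(df.drop(columns=["label"]), df["label"])
```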