TPOT freezing at 0% with n_jobs >4 on linux with large dataset #876
It seems that there is some kind of threading deadlock (possibly related to this old issue in joblib). Could you please update joblib (> 0.13.2) and scikit-learn (>= 0.21) via conda or pip, and then reinstall the TPOT development branch with the command below?
pip install --upgrade --no-deps --force-reinstall git+https://github.com/EpistasisLab/tpot.git@development
We recently noticed that the joblib module bundled inside scikit-learn (< 0.20), which is based on an older version of joblib, was deprecated (see #867). It may cause this issue because it lacks some important updates for limiting the number of threads in joblib (> 0.12, see the joblib changelog). Let me know whether this solution works.
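Before reinstalling, it may help to confirm which versions are actually installed in the environment TPOT runs in, because naive string comparison gets versions wrong ("0.9" sorts after "0.21"). The sketch below uses only the standard library (`importlib.metadata`, Python 3.8+). The helper names `parse_version` and `meets_minimum` are hypothetical, and the simple parser ignores pre-release tags instead of ordering them the way pip does.

```python
from importlib.metadata import PackageNotFoundError, version


def parse_version(text):
    """Turn '0.21.3' into (0, 21, 3); stops at the first non-numeric part (e.g. 'rc1')."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
        if len(digits) != len(piece):
            # Suffix such as 'rc1' or 'post1': keep the number, ignore the rest.
            break
    return tuple(parts)


def meets_minimum(package, minimum):
    """Return True if `package` is installed at a version >= `minimum`."""
    try:
        installed = version(package)
    except PackageNotFoundError:
        return False
    # Tuples compare element by element, so (0, 21, 3) >= (0, 21) is True.
    return parse_version(installed) >= parse_version(minimum)


if __name__ == "__main__":
    # Versions discussed in this thread.
    for pkg, minimum in [("joblib", "0.13.2"), ("scikit-learn", "0.21")]:
        print(pkg, "OK" if meets_minimum(pkg, minimum) else "missing or too old")
```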
Thanks for the quick reply! I installed the development branch and tried again with joblib == 0.13.2 and scikit-learn >= 0.21, but unfortunately it still freezes.
However, I just tried it again, and now it is stuck at 5% (54/1050).
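The counter in "5% (54/1050)" is the number of pipelines evaluated out of the total planned for the run. According to the TPOT documentation, a run evaluates population_size + generations × offspring_size pipelines, with offspring_size defaulting to population_size. The settings below are hypothetical: the actual values were not posted, and 50 and 20 are just one combination that yields 1050.

```python
from typing import Optional


def total_pipeline_evals(population_size: int, generations: int,
                         offspring_size: Optional[int] = None) -> int:
    """Number of pipelines TPOT is documented to evaluate in a full run."""
    # If offspring_size is not given, TPOT uses population_size.
    if offspring_size is None:
        offspring_size = population_size
    # Initial population plus one batch of offspring per generation.
    return population_size + generations * offspring_size


def progress_percent(done: int, total: int) -> float:
    """Share of planned pipeline evaluations completed, in percent."""
    return 100.0 * done / total


# Hypothetical settings: one of several combinations that give 1050.
total = total_pipeline_evals(population_size=50, generations=20)
print(total)                                   # 1050
print(round(progress_percent(54, total), 1))   # 5.1
```

So a bar frozen at 54/1050 means the run got through the first 54 pipeline evaluations, all of them still within the initial population, before stalling.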
Changing DEFAULT_THREAD_BACKEND = 'threading' to another backend, e.g. 'loky', in joblib's parallel.py worked for me.
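Instead of editing parallel.py inside the installed joblib package, joblib provides a `parallel_backend` context manager that selects the backend for `Parallel` calls made inside it. The sketch below is generic, and the `run_with_loky` helper is hypothetical. Whether wrapping TPOT's `fit` in the same context has the same effect as the source edit depends on whether TPOT's internal `Parallel` calls pin their own backend, which this thread does not establish.

```python
from math import sqrt

from joblib import Parallel, delayed, parallel_backend


def run_with_loky(values, n_jobs=2):
    """Run sqrt over `values` with the loky (process-based) backend."""
    # Parallel calls that do not name a backend themselves use loky worker
    # processes inside this block. Parallel() leaves n_jobs unset, so it
    # takes n_jobs from the context manager.
    with parallel_backend("loky", n_jobs=n_jobs):
        return Parallel()(delayed(sqrt)(v) for v in values)


if __name__ == "__main__":
    print(run_with_loky([1, 4, 9, 16]))  # [1.0, 2.0, 3.0, 4.0]
```

The advantage over patching the library is that the change is scoped to one block of your own script and does not disappear when joblib is upgraded or reinstalled.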
Not working. It only runs when n_jobs is set to 1.
@Chowkah Could you please explain how you got to 5%? I am still stuck at 0%.
I can't really pin it down. I tried several different things; sometimes it worked, sometimes not. Changing the parallel backend directly in joblib's parallel.py sometimes helped. I also changed my random seed to a different value, and with otherwise identical settings it worked. So the problem might be related to a specific algorithm (perhaps only with certain parameter settings) that makes TPOT freeze. However, I was not able to identify which one.
Thank you. Maybe I should start with the examples in the official documentation, make a few changes at a time, and see what happens.
Hello,
I have been trying to get TPOT running for a while now but always run into the same problem. I have a Linux machine with 24 cores. When I run TPOT on a large dataset (~6 million rows, ~20 features), it freezes at 0%, and after about 10-20 minutes CPU usage drops to a few percent. I already tried setting the multiprocessing start method to forkserver, without any change. I also tried the Dask implementation, but since max_eval_time_mins does not seem to work there, it runs forever.
However, the problem does not occur for every n_jobs != 1, only when n_jobs > 4. I do not know what else to try and would appreciate any suggestions.
Thanks!
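With 24 cores and n_jobs > 4, each worker's numerical libraries (BLAS/OpenMP) may also start their own thread pools, so the machine ends up with far more threads than cores. This oversubscription is the kind of problem the joblib thread-limiting changes mentioned earlier in the thread target. A common manual mitigation, not confirmed to fix this particular freeze, is to cap native threads through environment variables before numpy, scipy, or TPOT are imported. The sketch assumes your numpy/scipy build uses one of the listed runtimes; the `limit_native_threads` helper is hypothetical.

```python
import os

# Thread-count variables read by common native runtimes (assumption: your
# numpy/scipy build uses OpenMP, OpenBLAS, or MKL).
THREAD_ENV_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS")


def limit_native_threads(n_threads=1):
    """Cap per-process native threads. Must run before numpy/scipy are imported."""
    for name in THREAD_ENV_VARS:
        # setdefault keeps any value already exported in the shell.
        os.environ.setdefault(name, str(n_threads))
    return {name: os.environ[name] for name in THREAD_ENV_VARS}


if __name__ == "__main__":
    print(limit_native_threads(1))
    # Import numpy, scikit-learn, and TPOT only after this point.
```

Exporting the same variables in the shell before starting Python has the same effect and avoids any dependence on import order.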