max_eval_time_mins parameter doesn't stop a long-running eval #508
Comments
Thank you for the suggestion. I will look into it. One of my branches uses …
@weixuanfu2016's PR that should fix this issue has been merged into the development branch. @dnuffer, can you try the dev branch and let us know if that corrects your issue?
I tried the development branch, and it didn't fix the issue. I think it's probably because the stopit module is a pure-Python solution and so can only interrupt a thread once it runs some Python code; since the core of most ML training algorithms is written in non-Python code, stopit won't get a chance to interrupt the thread. I also tried my earlier suggestion, but it doesn't work because a pool of processes is used, and a process doesn't exit once an evaluation is complete, leaving the threads running. I have successfully used the timeout feature in hyperopt-sklearn, so I dug into how it works.
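A minimal sketch of the limitation described above (not TPOT code), assuming the `stopit` package's `ThreadingTimeout` context manager; `time.sleep` stands in for a long call into compiled model-fitting code:

```python
import time
import stopit  # pure-Python timeout library; assumed installed

start = time.perf_counter()
with stopit.ThreadingTimeout(2) as ctx:
    # Stand-in for a long call into compiled code (e.g. a model fit).
    # stopit delivers its timeout as an asynchronous exception, which the
    # interpreter only checks between Python bytecode instructions, so the
    # call below runs to completion before the timeout can take effect.
    time.sleep(10)

print(ctx.state == ctx.TIMED_OUT)   # True: the timeout did fire ...
print(time.perf_counter() - start)  # ... but only after ~10 s, not ~2 s
```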
Hmm, interesting. Thank you for these tests. I will look into it.
I can report similar issues. Using the development branch also does not fix it.
I just posted a PR #522 and used the approach in …
@rhiever For this way, I need to use … One drawback is that CTRL+C only works on Linux and Mac but not on Windows, so I added a warning message about it.
@weixuanfu2016 Tested on OSX 10.12 and Ubuntu 14.04 with a high-dimensionality dataset (I think poly features was getting stuck), looks good so far. Will update if it creeps back in.
The timeout_pipe branch has fixed the issue for me.
@dnuffer @CSNoyes Thank you for the feedback. @rhiever and I had second thoughts about this issue. We thought it might be related to the start methods in multiprocessing. I also reproduced the freezing issue when n_jobs > 1 on macOS and Linux, but everything seems to be all right when n_jobs = 1. @dnuffer @CSNoyes Could you please let me know the system environment, Python version, and n_jobs setting from when this issue happened? Thanks.
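For anyone wanting to test that hypothesis, here is a minimal sketch assuming a standard TPOT setup; the `forkserver` choice and the toy dataset are illustrative, not a confirmed fix from this thread:

```python
import multiprocessing

if __name__ == "__main__":
    # Pick an explicit start method before any worker processes are created
    # ("forkserver" and "spawn" are available on Linux/macOS; Windows only has "spawn").
    multiprocessing.set_start_method("forkserver")

    from sklearn.datasets import load_digits
    from sklearn.model_selection import train_test_split
    from tpot import TPOTClassifier

    X, y = load_digits(return_X_y=True)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

    tpot = TPOTClassifier(generations=5, population_size=20,
                          max_eval_time_mins=5, n_jobs=2, verbosity=2)
    tpot.fit(X_train, y_train)
    print(tpot.score(X_test, y_test))
```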
I tested the solution of …
I have been using Ubuntu 17.04 with Python 3.5.3. I've been mostly using n_jobs=22. I am using a dataset with a dimensionality of ~17,000.
Below is the demo for using …
The PR branch worked for me for the big dataset (approx. 600 MB). The 0.8/0.9 branch freezes.
@jaksmid did version 0.9 freeze with …? The reason why I closed that PR is that it did not save computation time with n_jobs > 1 in my tests.
Thanks @weixuanfu for the speedy response. If I add the … lines it seems to be working. Otherwise it utilises all cores at 100%, and after some time the CPU consumption per core drops to zero with no observable progress. Memory pressure does not seem to be a problem. Using Python 3.6.0 in a virtualenv on Mac OS Sierra. Please let me know if you need further information.
In my experiments, tpot still ignores the `max_eval_time_mins` parameter. I am able to stop the process by using … . I am using version 0.9.3 of tpot, Python 3.6.2, and OSX 10.13.6; tpot runs in single-thread mode (`n_jobs=1`). Please let me know if you need any further info.
During `fit()`, I noticed that some evaluations were running for over an hour even though `max_eval_time_mins=5`. This is because the code is not actually stopping the thread doing the eval.

In `class Interruptable_cross_val_score`, `stop()` is not actually interrupting the thread, but instead calling `self._stopevent.set()`, which doesn't stop the thread because nothing is checking that event, and then waiting for the thread to stop on its own. Python doesn't have a good way to interrupt threads. See the discussion at https://stackoverflow.com/questions/323972/is-there-any-way-to-kill-a-thread-in-python
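A minimal sketch of that failure mode, using placeholder names rather than TPOT's internals: setting an `Event` only helps if the worker code periodically checks it, and a worker stuck in one long blocking call never does.

```python
import threading
import time

stop_event = threading.Event()

def worker():
    # One long, uninterruptible piece of work standing in for a model fit;
    # the stop event is never consulted inside it.
    time.sleep(10)

t = threading.Thread(target=worker)
t.start()

stop_event.set()   # has no effect: nothing in worker() checks the event
t.join()           # still blocks for the full 10 seconds
```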
Given that `_wrapped_cross_val_score()` is already running in a separate process due to joblib, one solution would be to make `Interruptable_cross_val_score` a daemon thread, and then remove the call to `tmp_it.stop()` in `_wrapped_cross_val_score()`. Thus when the timeout passes, the process will exit cleanly.
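A minimal sketch of that daemon-thread idea (placeholder names such as `evaluate_pipeline`, not TPOT's actual implementation): run the scoring work in a daemon thread, wait up to the time limit, and stop waiting if it overruns.

```python
import threading

def timed_eval(evaluate_pipeline, timeout_secs, *args, **kwargs):
    """Run evaluate_pipeline(*args, **kwargs) but give up after timeout_secs."""
    result = {"score": float("-inf")}   # sentinel returned on timeout

    def target():
        result["score"] = evaluate_pipeline(*args, **kwargs)

    worker = threading.Thread(target=target, daemon=True)
    worker.start()
    worker.join(timeout_secs)           # returns when done or after the limit
    return result["score"]              # sentinel if the eval is still running
```

The daemon flag is the key design choice here: a daemon thread does not keep the joblib worker process alive, so the process can exit cleanly once the timeout has passed, which is the behaviour the proposal above relies on.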