Feature request: put the fit time in evaluated_individuals_ #780

Open
louisabraham opened this issue Oct 7, 2018 · 5 comments

Comments

@louisabraham

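It would be handy to have each pipeline's fit time recorded in evaluated_individuals_. GridSearchCV does this, for example (it reports fit times in cv_results_).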
I also think I encountered some strange pipelines that did not stop after max_eval_time_mins, and this would help me to reproduce the issue.
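
A minimal sketch of the idea, using only the standard library. The `SlowModel` class, the `timed_fit` helper, the dictionary layout, and the `fit_time` key are all hypothetical and only illustrate the request; they are not TPOT's actual structures.

```python
import time


class SlowModel:
    """Hypothetical stand-in for a pipeline: fit() just sleeps."""

    def __init__(self, delay_s):
        self.delay_s = delay_s

    def fit(self, X, y):
        time.sleep(self.delay_s)
        return self


def timed_fit(model, X, y):
    """Fit the model and return the wall-clock fit time in seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


# Hypothetical per-pipeline statistics, keyed by a pipeline name.
evaluated = {}
models = {"fast": SlowModel(0.01), "slow": SlowModel(0.05)}
for name, model in models.items():
    evaluated[name] = {"fit_time": timed_fit(model, [[0], [1]], [0, 1])}

for name, stats in evaluated.items():
    print(f"{name}: {stats['fit_time']:.3f} s")
```

With the fit time stored next to each evaluated pipeline, a pipeline that ran well past max_eval_time_mins would be easy to spot afterwards.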

@weixuanfu
Contributor

This is an old, known issue in Python.

The way CPython supports threading and asynchronous features affects the accuracy of the timeout. For more background on this issue, which cannot be fixed, please read what Python experts have written about threading, the GIL, and context switching, for example:

http://pymotw.com/2/threading/
https://wiki.python.org/moin/GlobalInterpreterLock

But I think it is a good idea to add this fit time into pipeline statistics.
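
To illustrate the timeout inaccuracy described above, here is a small sketch. It assumes the thread-based timeout works by asking CPython to raise an exception asynchronously in the worker thread through `PyThreadState_SetAsyncExc` (to my knowledge this is the mechanism stopit's threading mode relies on, but treat that as an assumption). The `Timeout` class and the function names are hypothetical. The exception is only delivered once the thread is executing Python bytecode again, so a long blocking call inside C code is not interrupted.

```python
import ctypes
import threading
import time


class Timeout(Exception):
    """Hypothetical exception used to signal the timeout."""


def worker(result):
    start = time.perf_counter()
    try:
        # Blocking call in C code; stands in for a long fit in compiled code.
        time.sleep(0.5)
        # Keep running bytecode until the pending exception is delivered.
        while True:
            time.sleep(0.001)
    except Timeout:
        result["elapsed"] = time.perf_counter() - start


def measure_delay(request_after_s=0.1):
    """Request a timeout after request_after_s; return when it really fired."""
    result = {}
    thread = threading.Thread(target=worker, args=(result,), daemon=True)
    thread.start()
    time.sleep(request_after_s)
    # Ask CPython to raise Timeout in the worker thread.
    modified = ctypes.pythonapi.PyThreadState_SetAsyncExc(
        ctypes.c_ulong(thread.ident), ctypes.py_object(Timeout)
    )
    if modified != 1:
        raise RuntimeError("could not schedule the exception")
    thread.join(timeout=5)
    return result["elapsed"]


elapsed = measure_delay()
print(f"timeout requested after 0.1 s, raised after {elapsed:.2f} s")
```

In this sketch the exception is requested after 0.1 s but only raised once the 0.5 s blocking call returns, which is the kind of overrun a thread-based timeout cannot avoid.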

@louisabraham
Author

I think the relevant code is here:

tpot/tpot/base.py

Lines 1236 to 1239 in 507b45d

parallel = Parallel(n_jobs=self._n_jobs, verbose=0, pre_dispatch='2*n_jobs')
tmp_result_scores = parallel(
    delayed(partial_wrapped_cross_val_score)(sklearn_pipeline=sklearn_pipeline)
    for sklearn_pipeline in sklearn_pipeline_list[chunk_idx:chunk_idx + chunk_size])

It seems you used threading_timeoutable from stopit to handle the timeout. Why didn't you use the timeout parameter of joblib.Parallel instead?

@louisabraham
Author

Oh, the timeout parameter of joblib.Parallel raises a timeout error if any task lasts too long.

Would joblib/joblib#366 allow for a more precise time control?

@weixuanfu
Contributor

Maybe, I will look into it. But two issues need attention when using the timeout in joblib:

  1. TPOT uses the joblib bundled with scikit-learn to avoid adding one more dependency, so we need to watch whether scikit-learn updates its built-in joblib.
  2. This timeout in joblib only works when n_jobs != 1. We need a workaround for this.

@louisabraham
Author

  1. Maybe we should just integrate some custom joblib code?
  2. I'm not sure what causes the issues with threads in the first place, but wouldn't multiprocessing.Process provide a timeout ability as well?
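
A rough sketch of the multiprocessing.Process idea from point 2, not a proposal for the actual implementation. The `run_with_timeout` helper is hypothetical, and the sketch assumes a POSIX system where the "fork" start method is available. A real version would also need to send the score back to the parent, for instance through a pipe or queue.

```python
import multiprocessing as mp
import time


def run_with_timeout(func, args=(), timeout_s=1.0):
    """Run func(*args) in a child process and kill it after timeout_s.

    Returns (finished, elapsed_seconds).
    """
    # Assumption: POSIX only; "fork" avoids re-importing the main module.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=func, args=args)
    start = time.perf_counter()
    proc.start()
    proc.join(timeout_s)  # wait at most timeout_s seconds
    finished = not proc.is_alive()
    if not finished:
        proc.terminate()  # the process is killed, unlike a thread
        proc.join()
    return finished, time.perf_counter() - start


if __name__ == "__main__":
    # time.sleep stands in for a long-running pipeline evaluation.
    print(run_with_timeout(time.sleep, (5,), timeout_s=0.2))
    print(run_with_timeout(time.sleep, (0.05,), timeout_s=2.0))
```

Unlike a thread, a process can be terminated even while it is blocked inside compiled code, at the cost of process start-up time and of copying data to the child.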
