Allow TPOT to run for the shortest (or longest) between generations and max_time_mins #504

Open
PGijsbers opened this issue Jun 23, 2017 · 7 comments

@PGijsbers
Contributor

PGijsbers commented Jun 23, 2017

When I set up an experiment, I often want to run a set number of generations but cut the run off if it exceeds X minutes.
For example, I want to run TPOT for 100 generations, but use at most 60 minutes of time.
I can also imagine someone wanting the exact opposite: run TPOT for at least 100 generations, but also for at least X minutes of time.

As is, you can only pick between a set maximum of generations or a certain time.
When both are provided, the set amount of generations actually does nothing, and TPOT is run for max_time_mins (this is explicit in the code).

Besides the lack of the options above, it also bothers me that max_time_mins specifies not so much a maximum time as a fixed time for which the experiment will run (and, practically speaking, never shorter).

The changes are very minor in code (I did this for my own versions), but I wanted to discuss if this is a good addition, and how this should change the constructor parameters.

@PGijsbers
Contributor Author

I am not sure what the best way to do this is.
The first thing that comes to mind is redesigning the parameters this way: replace max_time_mins with duration_mins and add stop_at_first_criteria (bool).

If either only duration_mins or generations is specified, then it will run for X minutes or generations, respectively (regardless of stop_at_first_criteria).

When both duration_mins and generations are specified and stop_at_first_criteria is True, then
it will stop when duration_mins have elapsed or generations generations have been evaluated, whichever comes first.

When both duration_mins and generations are specified and stop_at_first_criteria is False, then
it will stop only once duration_mins have elapsed and generations generations have been evaluated, i.e. whichever comes last.
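
The three cases above can be written as a single stop check. This is only a minimal sketch of the proposal, not TPOT code: the function name should_stop and its elapsed_mins and generations_done arguments are hypothetical, and duration_mins and stop_at_first_criteria are the proposed (not existing) parameter names.

```python
from typing import Optional


def should_stop(
    elapsed_mins: float,
    generations_done: int,
    duration_mins: Optional[float] = None,
    generations: Optional[int] = None,
    stop_at_first_criteria: bool = True,
) -> bool:
    """Hypothetical stop check for the proposed parameters (not TPOT code)."""
    if duration_mins is None and generations is None:
        raise ValueError("specify duration_mins, generations, or both")

    time_up = duration_mins is not None and elapsed_mins >= duration_mins
    gens_up = generations is not None and generations_done >= generations

    # Only one criterion given: it decides alone, whatever the flag says.
    if duration_mins is None:
        return gens_up
    if generations is None:
        return time_up

    # Both given: "whichever comes first" vs. "both must be satisfied".
    if stop_at_first_criteria:
        return time_up or gens_up
    return time_up and gens_up
```

With duration_mins=60 and generations=100, a run that is 61 minutes in but has only finished 40 generations would stop when stop_at_first_criteria is True and keep going when it is False.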

Alternatively, don't rename max_time_mins to duration_mins, so that existing code does not break.
The problem with max_time_mins is that the name would only be accurate when stopping at the first criterion, not when stopping at the last.

@weixuanfu
Contributor

Thank you for your suggestion. I agree with the scenario where stop_at_first_criteria is True, but I am a little confused about the scenario where stop_at_first_criteria is False. I feel that the parameter stop_at_first_criteria is not very clear. @rhiever do you think that we should add this function?

@rhiever
Contributor

rhiever commented Jun 23, 2017

You're indeed right that the generations and max_time_mins parameters are in conflict with one another. When we implemented max_time_mins, we purposely decided to have max_time_mins override generations because max_time_mins is a more practical way to tell TPOT how long it has to run its optimization procedure, whereas generations is more for users like ourselves who want to run fixed-evaluation-count experiments because we're comparing optimization methods.

I think it's technically possible to achieve your goal by setting max_eval_time_mins to max_time_mins / (population_size × generations). If max_eval_time_mins is indeed killing the evaluations on time, then the run should take no longer than the desired maximum amount of time, and will still run for the desired number of generations.
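
To make the arithmetic concrete, here is a small worked example using the numbers from the original post (100 generations, 60 minutes) and a population of 100. The helper name per_evaluation_budget_mins is made up for illustration, and the sketch assumes evaluations run one after another with no other overhead.

```python
def per_evaluation_budget_mins(max_time_mins, population_size, generations):
    """Evenly split a total time budget across all pipeline evaluations."""
    total_evaluations = population_size * generations
    if total_evaluations <= 0:
        raise ValueError("population_size and generations must be positive")
    return max_time_mins / total_evaluations


budget = per_evaluation_budget_mins(60, 100, 100)
print(f"{budget:.4f} minutes per evaluation ({budget * 60:.2f} seconds)")
# prints: 0.0060 minutes per evaluation (0.36 seconds)
```

Under these assumptions each pipeline gets well under a second, so this workaround trades the overall time limit for a very tight per-pipeline limit.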

If we can figure out a good way for these parameters to interact, I'm not opposed to tweaking it slightly. But I'd very much prefer to avoid adding another parameter.

Perhaps what we can do is have max_time_mins not override generations. If the user only sets max_time_mins=30 and leaves the rest of the TPOT parameters as default, then TPOT will run for 100 generations of 100 population and only be interrupted if the process takes >=30 minutes.

@PGijsbers
Contributor Author

> Perhaps what we can do is have max_time_mins not override generations. If the user only sets max_time_mins=30 and leaves the rest of the TPOT parameters as default, then TPOT will run for 100 generations of 100 population and only be interrupted if the process takes >=30 minutes.

And for a user-specified number of generations combined with max_time_mins, just keep the current behavior? In that case I would rather leave it as is, so that the behavior stays consistent.

But you are probably right that specifying by generations is not that important in practice (compared to specifying by time).

@rhiever
Contributor

rhiever commented Jun 24, 2017

In my proposed solution, if the user specifies max_time_mins and generations, then TPOT would quit either when max_time_mins is exceeded or generations iterations have passed. So TPOT could take less than max_time_mins if it completes generations iterations before that time limit, i.e., it won't continue until max_time_mins has elapsed as it currently does.

@PGijsbers
Contributor Author

Okay, I misunderstood that then.
Does this take away the user's ability to specify a certain duration (and not generations)?
Or would you for example then maybe allow an extra generations value (none or so) to indicate max_time_mins should be the only stopping criteria?

@rhiever
Contributor

rhiever commented Jun 27, 2017

> Or would you for example then maybe allow an extra generations value (none or so) to indicate max_time_mins should be the only stopping criteria?

That seems like a good idea to support both use cases.
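
The behavior the thread converges on (stop at whichever limit is hit first, with generations=None meaning time is the only limit) could look roughly like the toy loop below. It is a sketch under stated assumptions, not TPOT's implementation: run_generations, step, clock and FakeClock are hypothetical names, and the time limit is only checked between generations, so a generation in progress is never interrupted.

```python
import itertools
from typing import Callable, Optional


class FakeClock:
    """Manually advanced clock (in seconds) so the example runs instantly."""

    def __init__(self) -> None:
        self.now = 0.0

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds


def run_generations(
    step: Callable[[int], None],
    clock: Callable[[], float],
    generations: Optional[int] = 100,
    max_time_mins: Optional[float] = None,
) -> int:
    """Toy driver loop (not TPOT code); returns the generations completed."""
    if generations is None and max_time_mins is None:
        raise ValueError("need generations, max_time_mins, or both")

    start = clock()
    completed = 0
    # generations=None means "no generation cap": count upward indefinitely.
    gen_iter = itertools.count() if generations is None else range(generations)
    for gen in gen_iter:
        elapsed_mins = (clock() - start) / 60.0
        # Time limit reached before starting this generation: stop early.
        if max_time_mins is not None and elapsed_mins >= max_time_mins:
            break
        step(gen)
        completed += 1
    return completed


# Pretend every generation takes exactly one minute.
clock = FakeClock()
done = run_generations(lambda gen: clock.advance(60), clock,
                       generations=100, max_time_mins=30)
print(done)  # 30: the time limit is hit before 100 generations finish
```

Passing generations=10 with the same limit finishes after 10 generations without waiting out the remaining time, and generations=None runs until the time limit alone stops it.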

3 participants