feat(Pipeline): optimize()
#230
Conversation
Minor points, but one big one: does this have no tests?
Not yet, I wanted to flesh it out properly first before testing it. Also, the docs were taking too long to render, so I have a PR that fixes that first, coming up.
Any advice on what to test would be appreciated. It's the weird kind of function which merely pieces things together but doesn't implement much, i.e. I'd mainly be testing that the matches do indeed match; the one concrete thing would be the heuristic. More specifically, what should I test about this function that wouldn't be caught by the tests of its individual pieces?
Easy to read and clean!
I don't actually see what you would need to test here to be honest, assuming that the classes in use here are already tested individually, which they seem to be.
908b4d8 to 6874f42
Thank you for the reviews @LennartPurucker @aron-bram! I made changes based on your comments. Feel free to review post-mortem for any issues, but after adding some tests and verifying that it seems to be working as intended, I will merge once the automated tests pass. Note: ignore what I said about testing... I found small variable-name bugs which caused behaviours not to occur. Live by the tests, die by the tests.
@@ -146,8 +146,7 @@ def create(
!!! note

Subclasses should override this with more specific configuration
- but these 3 arguments should be all that's necessary to create
- the optimizer.
+ but these arguments should be all that's necessary to create the optimizer.
Future proofing docs 🙂
@@ -1083,7 +1083,7 @@ def register_optimization_loop(  # noqa: C901, PLR0915, PLR0912
walltime_limit=process_walltime_limit,
cputime_limit=process_cputime_limit,
)
- plugins = (*_plugins, plugin)
+ _plugins = (*_plugins, plugin)
I also tend to easily make these types of mistakes...
Alright, buckle up, this is a bigger one (780 lines added).
The objective of this PR was to enable pipeline.optimize(...) to support the fastest defaults towards just, well, evaluating a pipeline, while still letting it extend further into the future without being too locked down in terms of use case.

Get me HPO results quickly

Let's start with the simplest procedure:
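A minimal sketch of what that can look like, assuming the import paths below, a pipeline (Node) built elsewhere, and a hand-rolled target function; how the Trial.Report gets constructed inside it is elided, since this description doesn't spell that out:

```python
from amltk.optimization import Trial
from amltk.pipeline import Node

def evaluate(trial: Trial, pipeline: Node) -> Trial.Report:
    # Configure the pipeline from the trial, fit it, score it and wrap the
    # outcome in a Trial.Report (details elided in this sketch).
    ...

# "Just get me some HPO runs": rely on the defaults described below,
# i.e. an auto-selected optimizer and a local scheduler with one worker.
# `pipeline` is assumed to be a Node assembled elsewhere.
pipeline.optimize(target=evaluate)
```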
The first thing is the target= of .optimize(target=...), which should be a function that, given a Trial and a Node (the pipeline), returns a Trial.Report; formally: Callable[[Trial, Node], Trial.Report]. This is the user's target function, to use HPO nomenclature.

The target= can also be an EvaluationProtocol, which will be used to define pre-defined flows, so to speak. For example, a future addition would allow something like the sketch below.
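Purely hypothetical sketch of that future direction; HoldoutEvaluation is an invented name used only to illustrate the EvaluationProtocol idea and is not something this PR adds:

```python
from amltk.optimization import Trial
from amltk.pipeline import Node

# Invented stand-in for a pre-defined evaluation flow.  The real
# EvaluationProtocol interface is defined by the library, not by this sketch.
class HoldoutEvaluation:
    def __init__(self, X, y, test_size: float = 0.33):
        self.X, self.y, self.test_size = X, y, test_size

    def __call__(self, trial: Trial, pipeline: Node) -> Trial.Report:
        ...  # fit on a train split, score on the holdout split, build a report

# X and y are assumed to be a dataset loaded elsewhere.
pipeline.optimize(target=HoldoutEvaluation(X, y))
```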
Choosing an Optimizer and Creating one

The next parts revolve around the optimizer, namely optimizer= to select one, and the seed=, working_dir= and metrics= it will expect. These constitute the changes to the Optimizer class. By default it's None, and optimize() just tries to find some installed optimizer and use that. There could be extra work to also detect what's compatible with the pipeline, but if users rely on a custom search space definition, they can likely pass in the specific optimizer they wanted anyway.
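As a sketch (the exact values these parameters accept come from the Optimizer changes in this PR, not from this snippet; everything here is a placeholder):

```python
pipeline.optimize(
    target=evaluate,             # the target function from the first sketch
    optimizer=None,              # default: use whichever supported optimizer is installed
    seed=42,                     # forwarded to the optimizer that gets created
    working_dir="./hpo-output",  # where that optimizer keeps its files
    metrics=my_metrics,          # assumed to be defined elsewhere
)
```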
Scheduling

The next things to consider were about task execution, namely around the Scheduler and running it. By default it just uses n_workers: int = 1 and scheduler: Scheduler | None = None, meaning it runs the optimization in local processes with 1 worker. Both of these parameters can be freely changed though. There is also a host of parameters related to error control, plugins, max_trials= and more, which are documented in the code and understandable from there.
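For example, something along these lines; Scheduler.with_processes is how schedulers are built elsewhere in the library, and the rest is illustrative:

```python
from amltk.scheduling import Scheduler

# Scale the default local-process execution up to four workers.
pipeline.optimize(target=evaluate, n_workers=4, max_trials=50)

# Or hand in a pre-built Scheduler instead of letting optimize() create one.
scheduler = Scheduler.with_processes(4)
pipeline.optimize(target=evaluate, scheduler=scheduler, max_trials=50)
```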
Finer Control

The last major part to understand is setup_only: bool = False. We assume the default behaviour is as in the first code snippet above, "just get me some HPO runs". However, more advanced users (me) might want to have multiple pipelines set up to run HPO without starting them yet, caring more about setting up the entire flow; a sketch of this follows below. In that case, the pipeline is ready to be optimized and will run as soon as I call scheduler.run(), where notably I am now in control of the scheduler and the run() call.
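A sketch of that flow, assuming setup_only=True simply registers everything on the given scheduler without starting it:

```python
from amltk.scheduling import Scheduler

scheduler = Scheduler.with_processes(2)

# Register several optimization runs, but don't start anything yet.
pipeline_a.optimize(target=evaluate, scheduler=scheduler, setup_only=True)
pipeline_b.optimize(target=evaluate, scheduler=scheduler, setup_only=True)

# Both optimizations begin only once I start the scheduler myself.
scheduler.run()
```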
Other use cases considered:

- A History object which may be shared between pipeline optimization runs, via optimize(history=my_history). This won't lock or cause race conflicts, as all callbacks happen async in the main process (a sketch follows this list).
- on_begin, which allows you to do a host of custom things, which have been documented.
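A sketch of the shared-History case, assuming History can be imported from amltk.optimization and constructed directly:

```python
from amltk.optimization import History

# One History collecting reports from two separate optimization runs.
shared_history = History()
pipeline_a.optimize(target=evaluate, history=shared_history)
pipeline_b.optimize(target=evaluate, history=shared_history)
```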