
feat(Pipeline): optimize() #230

Merged: 13 commits merged into main on Jan 26, 2024
Conversation

@eddiebergman (Contributor) commented Jan 22, 2024

Alright, buckle up, this is a bigger one (780 lines added).

The objective of this PR was to give pipeline.optimize(...) fast defaults for just, well, evaluating a pipeline, while still leaving it room to extend into future use cases rather than being locked into one.

Get me HPO results quickly

Let's start with the simplest procedure:

pipeline = ...
history = pipeline.optimize(...)  # <- What needs to go in here

The first thing is the target= argument to optimize(), which should be a function that, given a Trial and a Node (the pipeline), returns a Trial.Report; formally: Callable[[Trial, Node], Trial.Report]. This is the user's target function, to use HPO nomenclature.
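For illustration, a target function with that signature might look like the following sketch. The Trial, Report, and Node classes below are minimal stand-ins to make the example self-contained, not amltk's actual API:

```python
from dataclasses import dataclass


# Minimal stand-ins for amltk's Trial, Trial.Report and Node, just to
# illustrate the Callable[[Trial, Node], Trial.Report] shape.
@dataclass
class Report:
    trial_name: str
    results: dict


@dataclass
class Trial:
    name: str
    config: dict

    def success(self, **results) -> Report:
        # Package up the evaluation results into a report for this trial.
        return Report(trial_name=self.name, results=results)


@dataclass
class Node:
    name: str


def my_target(trial: Trial, pipeline: Node) -> Report:
    # A real target would configure and evaluate the pipeline with the
    # trial's sampled config; we fake a score for illustration.
    score = trial.config.get("x", 0) ** 2
    return trial.success(score=score)


report = my_target(Trial(name="t0", config={"x": 3}), Node(name="pipe"))
print(report.results["score"])  # 9
```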

The target= can also be an EvaluationProtocol, which defines a pre-packaged evaluation flow, so to speak. For example, a future addition would allow something like:

pipeline.optimize(
    target=SklearnCVProtocol(X, y, splitter=...),  # User can use predefined evaluation protocols
    ...
)
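The nice property of this design is that a protocol object and a plain function reduce to the same callable shape. Here is a toy sketch of that idea; ToyCVProtocol and evaluate are hypothetical names for illustration, not amltk's API:

```python
from typing import Any, Callable


class ToyCVProtocol:
    """Illustrative stand-in for an evaluation protocol; not amltk's API."""

    def __init__(self, X: Any, y: Any, n_splits: int = 3):
        self.X, self.y, self.n_splits = X, y, n_splits

    def __call__(self, trial: Any, node: Any) -> dict:
        # A real protocol would fit and score `node` on each CV split;
        # we return dummy per-split scores for illustration.
        return {"scores": [0.0] * self.n_splits}


def evaluate(target: Callable, trial: Any, node: Any) -> dict:
    # Plain functions and protocol objects look identical from here.
    return target(trial, node)


print(evaluate(ToyCVProtocol(X=None, y=None), trial=None, node=None))
# {'scores': [0.0, 0.0, 0.0]}
```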

Choosing an Optimizer and Creating one

The next parts revolve around the optimizer, namely optimizer= to select one, and the seed=, working_dir= and metrics= it will expect. These constitute the changes to the Optimizer class. By default, optimizer= is None, and we simply try to find some installed optimizer and use that. There could be extra work to also detect what's compatible with the pipeline, but users with a custom search space definition will likely be able to pass in the specific optimizer they want anyway.

Scheduling

The next things to consider were about task execution, namely the Scheduler and running it. The defaults n_workers: int = 1 and scheduler: Scheduler | None = None mean that the optimization runs in local processes with 1 worker. Both of these parameters can be freely changed. There are also a host of parameters related to error control, plugins, max_trials= and more, which are documented in the code.
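The execution model amounts to "run up to max_trials trials across n_workers workers". A rough sketch of that shape, assuming nothing about amltk's internals (its Scheduler dispatches to local processes by default; a thread pool is used here only to keep the example self-contained):

```python
from concurrent.futures import ThreadPoolExecutor


def run_trials(target, configs, n_workers=1, max_trials=None):
    """Evaluate `target` over `configs` with a worker pool; order is preserved."""
    configs = configs[:max_trials]  # crude stand-in for a max_trials= limit
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        return list(pool.map(target, configs))


reports = run_trials(lambda cfg: cfg["x"] * 2, [{"x": i} for i in range(4)], n_workers=2)
print(reports)  # [0, 2, 4, 6]
```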

Finer Control

The last major part to understand is setup_only: bool = False. We assume the default behaviour is as in the code snippet above: "just get me some HPO runs". However, more advanced users (me) might want to have multiple pipelines set up to run HPO without starting them yet, caring more about setting up the entire flow.

pipeline = ...
scheduler = pipeline.optimize(..., setup_only=True)

In this case, the pipeline is ready to be optimized and will run as soon as I call scheduler.run() where notably, I am now in control of the scheduler and the run() call.


Other use cases considered:

  • You want to use your own History object which may be shared between pipeline optimization runs. optimize(history=my_history). This won't lock or cause race conflicts as all callbacks happen async in the main process.
  • You may want custom callbacks, this is handled through on_begin which allows you to do a host of custom things which have been documented.
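The setup_only flow, the shared history, and on_begin-style callbacks can be pictured together with a toy sketch. Everything below (ToyScheduler, the callback shape) is illustrative only, not amltk's API:

```python
# Toy sketch of setup_only=True plus a shared history and an
# on_begin-style callback; illustrative only, not amltk's API.
class ToyScheduler:
    def __init__(self, trials, history, callbacks=()):
        self.trials = trials
        self.history = history        # may be shared across schedulers
        self.callbacks = callbacks

    def run(self):
        for callback in self.callbacks:   # on_begin-style hooks fire first
            callback(self)
        for trial in self.trials:
            # All appends happen in the main process, so a shared
            # history needs no locking.
            self.history.append(trial)


shared_history = []
scheduler = ToyScheduler(
    ["trial-0", "trial-1"],
    shared_history,
    callbacks=[lambda s: print(f"starting {len(s.trials)} trials")],
)
# Nothing has run yet -- we decide when to call .run()
scheduler.run()
print(shared_history)  # ['trial-0', 'trial-1']
```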

@LennartPurucker (Collaborator) left a comment

Minor points, but one big one: does this have no tests?

Review comments on:
  • src/amltk/_util.py
  • src/amltk/pipeline/node.py
@eddiebergman (Contributor, Author)

Not yet, I wanted to flesh it out properly before testing it. Also, the docs were taking too long to render, so I have a PR coming up that fixes that first.

@eddiebergman (Contributor, Author) commented Jan 23, 2024

Any advice on what to test would be appreciated. It's the weird kind of function that merely pieces things together but doesn't implement much, i.e. I'd mainly be testing that the pieces do indeed match up; the one concrete thing would be the heuristic.

More concretely, what should I test about this function that wouldn't be caught by the tests of its individual pieces?

@aron-bram (Collaborator) left a comment
Easy to read and clean!
I don't actually see what you would need to test here to be honest, assuming that the classes in use here are already tested individually, which they seem to be.

Review comments on:
  • src/amltk/evalutors/evaluation_protocol.py
  • src/amltk/optimization/optimizer.py
@eddiebergman eddiebergman force-pushed the feat-pipeline-optimizer branch from 908b4d8 to 6874f42 Compare January 26, 2024 16:36
@eddiebergman (Contributor, Author) commented Jan 26, 2024

Thank you for the reviews @LennartPurucker @aron-bram! I made changes based on your comments.

Feel free to review post-mortem for any issues. After adding some tests and verifying that it seems to work as intended, I will merge once the automated tests pass.

Note: Ignore what I said about testing... I found small variable name bugs which caused behaviors not to occur. Live by the tests, die by the tests

@eddiebergman eddiebergman merged commit bded378 into main Jan 26, 2024
6 checks passed
@eddiebergman eddiebergman deleted the feat-pipeline-optimizer branch January 26, 2024 17:02
@@ -146,8 +146,7 @@ def create(
!!! note

Subclasses should override this with more specific configuration
but these 3 arguments should be all that's necessary to create
the optimizer.
but these arguments should be all that's necessary to create the optimizer.
A collaborator commented:

Future proofing docs 🙂

@@ -1083,7 +1083,7 @@ def register_optimization_loop( # noqa: C901, PLR0915, PLR0912
                 walltime_limit=process_walltime_limit,
                 cputime_limit=process_cputime_limit,
             )
-            plugins = (*_plugins, plugin)
+            _plugins = (*_plugins, plugin)
A collaborator commented:

I also tend to easily make these types of mistakes...

Labels: feature (A new feature)

Successfully merging this pull request may close these issues:

  • [Feat] Easy `pipeline.optimize(task_to_optimize, metric, **scheduler_arguments)`

3 participants