
Optimizer Evaluation Tools #155

Merged

Conversation

@byte-sculptor (Contributor) commented Oct 31, 2020

This PR does not involve big changes to the optimizer. We are mostly just testing it more thoroughly.

The bulk of this PR lives in two files:

  • OptimizerEvaluationReport.py
  • OptimizerEvaluator.py

Their usage is presented in the OptimizerEvaluationTools.ipynb Notebook. I highly encourage you to start your review by looking at that fully rendered notebook. You can view it here: https://github.com/microsoft/MLOS/blob/860b966849180fde2a5a6140323ab2700c36606f/source/Mlos.Notebooks/OptimizerEvaluationTools.ipynb

The goal of this work is to produce a framework to efficiently compare optimizer configurations across a variety of synthetic objective functions (see the rough usage sketch at the end of this description). It has already served its purpose by allowing me to:

  1. Find a bunch of bugs in the code - since we are stressing the optimizer a lot more here.
  2. Fine-tune the parameter ranges as we discovered that many of them didn't make sense.

Remaining work:

  • Linter
  • Address my own comments (mostly around making the unit tests stricter still)
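
For reviewers who want a feel for the shape of the workflow before opening the notebook, here is a rough sketch. It reuses only names that appear in this review thread; the OptimizerEvaluator constructor arguments and the optimizer_config object are assumptions, imports are omitted, and the notebook remains the authoritative reference.

# Rough sketch only, not the actual MLOS API: the constructor arguments and the
# optimizer_config object below are assumptions, and imports are omitted.
objective_function_config = objective_function_config_store.get_config_by_name('three_level_quadratic')

optimizer_evaluator = OptimizerEvaluator(
    optimizer_config=optimizer_config,                    # optimizer configuration under test (assumed name)
    objective_function_config=objective_function_config   # synthetic objective to stress it against
)
optimizer_evaluation_report = optimizer_evaluator.evaluate_optimizer()

# The report captures, among other things, how the surrogate model's goodness of fit
# evolved over the course of the optimization.
gof_df = optimizer_evaluation_report.regression_model_goodness_of_fit_state.get_goodness_of_fit_dataframe(DataSetType.TRAIN)
print(gof_df.tail())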

@byte-sculptor marked this pull request as ready for review on October 31, 2020 02:38
@sergiy-k commented Nov 3, 2020

In BayesianOptimization2.ipynb: "Our goal here is to find the global minimum of this function" but the objective says: "minimize=False)] # minimize=False => maximize=True"

@sergiy-k commented Nov 3, 2020

In BayesianOptimization2.ipynb: "We're first configuring the model to refit after every iteration and use 10 trees for the random forest"
I apologize for a naïve question but I'm curious why/how you picked 10? What would happen if you had 50 trees instead of 10? What if only 5?

@sergiy-k commented Nov 3, 2020

In OptimizerEvaluationTools.ipynb: "Let's create an objective function taht we will be configuring"
taht => that

Also in this section you assign objective_function_config = objective_function_config_store.get_config_by_name('5_mutually_exclusive_polynomials') but then immediately re-assign it to objective_function_config = objective_function_config_store.get_config_by_name('three_level_quadratic'). Why?

@sergiy-k commented Nov 3, 2020

In OptimizerEvaluationTools.ipynb, section "In [7]" keeps printing the optimizer's config many times. It does not look very useful, but it occupies a lot of space and makes the notebook look large. Does this output really need to be checked in?

@sergiy-k commented Nov 3, 2020

Mlos.Notebooks/OptimizerEvaluationTools.ipynb and Mlos.Python/OptimizerEvaluationTools.ipynb have a lot of identical content. Is it intentional to have both of them checked in?

@byte-sculptor (Contributor Author):

In BayesianOptimization2.ipynb: "We're first configuring the model to refit after every iteration and use 10 trees for the random forest"
I apologize for a naïve question but I'm curious why/how you picked 10? What would happen if you had 50 trees instead of 10? What if only 5?

Those are absolutely magic numbers that we will need to tune. The intuition I have for this parameter is this:

  • 5 trees might just work. It would be 2x less computationally expensive, but we would run the risk of degrading the goodness of fit and the accuracy of the uncertainty estimates.
  • 50 trees would also work, but we would spend 5x the fitting cost of 10 trees (and 10x that of 5 trees).

So I left it at 10, because it was good enough for demo purposes.

But you bring up an important point - how do we develop good intuitions behind these parameters?

My hope is that the evaluation tools I'm proposing in this PR will help us systematically evaluate these tradeoffs.
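
To make the tradeoff concrete, a quick sweep over the number of trees illustrates it. The sketch below uses scikit-learn's RandomForestRegressor on a toy quadratic as a stand-in for the optimizer's surrogate model and objective function; it is an illustration of the cost/accuracy tradeoff, not MLOS code.

# Illustrative sketch (not MLOS code): measure fit time and hold-out accuracy
# of a random forest as a function of the number of trees.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(500, 3))
y = (X ** 2).sum(axis=1) + rng.normal(scale=0.1, size=500)   # a simple quadratic objective
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_trees in (5, 10, 50):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    start = time.time()
    model.fit(X_train, y_train)
    print(f"{n_trees:>3} trees: fit took {time.time() - start:.3f}s, "
          f"hold-out R^2 = {model.score(X_test, y_test):.3f}")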

@sergiy-k commented Nov 3, 2020

@byte-sculptor, thank you! That's exactly the response I was looking for. :)
You might want to consider adding this explanation to the notebook.

@byte-sculptor (Contributor Author):

In OptimizerEvaluationTools.ipynb: "Let's create an objective function taht we will be configuring"
taht => that

Also in this section you assign objective_function_config = objective_function_config_store.get_config_by_name('5_mutually_exclusive_polynomials') but then immediately re-assign it to objective_function_config = objective_function_config_store.get_config_by_name('three_level_quadratic'). Why?

I wanted to evaluate the simpler 'three_level_quadratic' here; I just forgot to clean up. Fixed now.

@byte-sculptor (Contributor Author):

@sergiy-k, thank you for the thorough review :) I believe that I have addressed all of the above comments. There is no way for me to close them, so I'll just recap them here:

  • Fixed the minimum => maximum in BayesianOptimization2.ipynb.
  • Fixed typo.
  • Removed noisy logger output.
  • Removed Mlos.Python/OptimizerEvaluationTools.ipynb.
  • Added a note on magic numbers to OptimizerEvaluationTools.ipynb.
  • Cleaned up objective function that was overwritten.

@sergiy-k commented Nov 3, 2020

  • Added a note on magic numbers to OptimizerEvaluationTools.ipynb.

Thank you! I guess that this magic number (the number of trees for the random forest) is something that we can optimize with MLOS? I.e. the case you were talking about in the past - optimize MLOS with MLOS.

@byte-sculptor (Contributor Author):

  • Added a note on magic numbers to OptimizerEvaluationTools.ipynb.

Thank you! I guess that this magic number (the number of trees for the random forest) is something that we can optimize with MLOS? I.e. the case you were talking about in the past - optimize MLOS with MLOS.

Hopefully we can do exactly that! :) As you mentioned in the past, this will be similar to self-hosting compilers, in that a compiler compiles its own next version.

@amueller (Contributor) commented Nov 3, 2020

In OptimizerEvaluationTools.ipynb:
optimzier -> optimizer
taht -> that

In the BO literature, I don't think "surrogate model goodness of fit" on hold-out data is a commonly employed measure. Usually you don't care how bad your approximation is for the suboptimal regions as long as you know not to look there.

You're running several experiments in parallel, right? Isn't each surrogate model using multiple cores? Or does your random forest not train in parallel?

The notebook looks interesting, gonna look at the code now.

@@ -118,6 +118,16 @@ def CreateOptimizer(self, request: OptimizerService_pb2.CreateOptimizerRequest,
        self.logger.info(f"Created optimizer {optimizer_id}.")
        return OptimizerService_pb2.OptimizerHandle(Id=optimizer_id)

    def IsTrained(self, request, context): # pylint: disable=unused-argument
Contributor:

I'm a bit confused by the camel case for method names but it seems consistent in this file.

Contributor Author:

Yeah, they come from the .proto file. I believe gRPC demands consistent naming even between languages. I think Google's style guide recommends CamelCase for message types; I couldn't find what they recommend for function names, but it seemed sensible to stick to CamelCase there too.


    def add_pickled_optimizer(self, iteration: int, pickled_optimizer: bytes):
        assert iteration >= 0
        self.pickled_optimizers_over_time[iteration] = pickled_optimizer
Contributor:

Do these store the surrogate model as separate objects? These could get quite big in memory.

Contributor Author:

Yes, they do. Down the road we can have them written straight to disk.
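
A minimal sketch of what the disk-backed variant could look like; the method name and the output_dir parameter are hypothetical and not part of this PR:

import os

def add_pickled_optimizer_to_disk(self, iteration: int, pickled_optimizer: bytes, output_dir: str):
    # Hypothetical alternative: write each optimizer snapshot to its own file
    # instead of accumulating all of them in self.pickled_optimizers_over_time.
    assert iteration >= 0
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, f"optimizer_{iteration}.pickle"), "wb") as out_file:
        out_file.write(pickled_optimizer)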

)


# Parallel unit tests config.
Contributor:

I generally prefer not to have testing code together with the main code but I guess that's a matter of taste.

Contributor Author:

I created an issue (#163) to solicit opinions from the rest of the team. I think I could go either way. What are your arguments for and against?

mlos.global_values.tracer.trace_events.extend(optimizer_evaluation_report.execution_trace)

with pd.option_context('display.max_columns', 100):
    print(optimizer_evaluation_report.regression_model_goodness_of_fit_state.get_goodness_of_fit_dataframe(DataSetType.TRAIN).tail())
Contributor:

Is the goal to manually check the output? Or why are we printing in the unit test?

Contributor Author:

Of course not :) But I find the output useful during development.
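
One way to keep the output useful during development while making the test stricter is to also assert on the dataframe. A sketch, reusing only names from the snippet above and making no assumptions about individual column names:

# Sketch: assert on the goodness-of-fit dataframe in addition to printing it.
gof_df = optimizer_evaluation_report.regression_model_goodness_of_fit_state.get_goodness_of_fit_dataframe(DataSetType.TRAIN)
assert not gof_df.empty, "Expected at least one goodness-of-fit observation after the evaluation run."
# Stricter, column-level assertions (e.g. thresholds on individual metrics) can follow
# once the expected ranges are pinned down.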

future = executor.submit(optimizer_evaluator.evaluate_optimizer)
outstanding_futures.add(future)

done_futures, outstanding_futures = concurrent.futures.wait(outstanding_futures, return_when=concurrent.futures.ALL_COMPLETED)
Contributor:

You could also use joblib.Parallel for a slightly simpler interface that allows multiple backends.

Contributor Author:

Thanks! What are some pros and cons of joblib vs. concurrent.futures aside from a simpler API?

Contributor Author:

@amueller, I found this when researching comparisons: openvax/mhcflurry#59

What are your thoughts?

Contributor:

joblib indeed does batching of small jobs and probably does faster serialization of large arrays.
That comparison is from 2016, so I'm not sure it still holds. concurrent.futures has a much more powerful interface in terms of lazy evaluation; joblib is basically a simple parallel for-loop.
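
For reference, a minimal side-by-side sketch of the two interfaces for this workload, assuming evaluate_optimizer takes no arguments, as in the snippet above:

import concurrent.futures
from joblib import Parallel, delayed

n_evaluations = 4  # illustrative

# concurrent.futures version (as in this PR): explicit futures, which also allow
# lazy / incremental handling of results as they complete.
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(optimizer_evaluator.evaluate_optimizer) for _ in range(n_evaluations)]
    done_futures, _ = concurrent.futures.wait(futures, return_when=concurrent.futures.ALL_COMPLETED)
    reports = [future.result() for future in done_futures]

# joblib version: essentially a parallel for-loop with batching and pluggable backends.
reports = Parallel(n_jobs=4)(
    delayed(optimizer_evaluator.evaluate_optimizer)() for _ in range(n_evaluations)
)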

byte-sculptor and others added 2 commits on November 3, 2020 16:51:

  • …uator.py (Co-authored-by: Andreas Mueller <t3kcit@gmail.com>)
  • …uator.py (Co-authored-by: Andreas Mueller <t3kcit@gmail.com>)
@byte-sculptor merged commit 76100b0 into microsoft:main on Nov 4, 2020
@byte-sculptor (Contributor Author) commented Nov 4, 2020

In OptimizerEvaluationTools.ipynb:
optimzier -> optimizer
taht -> that

In the BO literature, I don't think "surrogate model goodness of fit" on hold-out data is a commonly employed measure. Usually you don't care how bad your approximation is for the suboptimal regions as long as you know not to look there.

That makes sense.

You're running several experiments in parallel, right? Is not each surrogate model using multiple cores? Or does your random forest not train in parallel?

Correct, parallelizing the RF training is a work item, but "premature optimization is the root of all evil". We should do it though :)
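
A sketch of one possible shape for that work item, fitting the forest's trees in parallel with one task per tree; forest.decision_trees, fit_tree, features_df, and targets_df are hypothetical names, each tree is assumed to handle its own subsampling inside fit(), and process-based parallelism assumes the trees are picklable:

# Hypothetical sketch, not MLOS API: fit each tree of the random forest in its own process.
import concurrent.futures

def fit_tree(tree, features_df, targets_df):
    # Each tree is assumed to do its own bootstrap/feature subsampling inside fit().
    tree.fit(features_df, targets_df)
    return tree

with concurrent.futures.ProcessPoolExecutor() as executor:
    fitted_trees = list(executor.map(
        fit_tree,
        forest.decision_trees,
        [features_df] * len(forest.decision_trees),
        [targets_df] * len(forest.decision_trees)
    ))
forest.decision_trees = fitted_trees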
