
Optimizer Evaluation Tools #155

Merged

Conversation

@byte-sculptor (Contributor) commented Oct 31, 2020

This PR does not involve big changes to the optimizer. We are mostly just testing it more thoroughly.

The bulk of this PR lives in two files:

  • OptimizerEvaluationReport.py
  • OptimizerEvaluator.py

Their usage is presented in the OptimizerEvaluationTools.ipynb Notebook. I highly encourage you to start your review by looking at that fully rendered notebook. You can view it here: https://github.com/microsoft/MLOS/blob/860b966849180fde2a5a6140323ab2700c36606f/source/Mlos.Notebooks/OptimizerEvaluationTools.ipynb

The goal of this work is to produce a framework to efficiently compare optimizer configurations across a variety of synthetic objective functions (see the rough usage sketch at the end of this description). It has already served its purpose by allowing me to:

  1. Find a bunch of bugs in the code - since we are stressing the optimizer a lot more here.
  2. Fine-tune the parameter ranges as we discovered that many of them didn't make sense.

Remaining work:

  • Linter
  • Address my own comments (mostly around making the unit tests stricter still)
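
For reviewers who want a feel for the shape of the workflow before opening the notebook, here is a rough sketch. It reuses only names that appear in this review thread; the OptimizerEvaluator constructor arguments and the optimizer_config object are assumptions, imports are omitted, and the notebook remains the authoritative reference.

# Rough sketch only, not the actual MLOS API: the constructor arguments and the
# optimizer_config object below are assumptions, and imports are omitted.
objective_function_config = objective_function_config_store.get_config_by_name('three_level_quadratic')

optimizer_evaluator = OptimizerEvaluator(
    optimizer_config=optimizer_config,                    # optimizer configuration under test (assumed name)
    objective_function_config=objective_function_config   # synthetic objective to stress it against
)
optimizer_evaluation_report = optimizer_evaluator.evaluate_optimizer()

# The report captures, among other things, how the surrogate model's goodness of fit
# evolved over the course of the optimization.
gof_df = optimizer_evaluation_report.regression_model_goodness_of_fit_state.get_goodness_of_fit_dataframe(DataSetType.TRAIN)
print(gof_df.tail())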

@byte-sculptor marked this pull request as ready for review on October 31, 2020 02:38
@sergiy-k commented Nov 3, 2020

In BayesianOptimization2.ipynb: "Our goal here is to find the global minimum of this function" but the objective says: "minimize=False)] # minimize=False => maximize=True"

@sergiy-k commented Nov 3, 2020

In BayesianOptimization2.ipynb: "We're first configuring the model to refit after every iteration and use 10 trees for the random forest"
I apologize for a naïve question but I'm curious why/how you picked 10? What would happen if you had 50 trees instead of 10? What if only 5?

@sergiy-k commented Nov 3, 2020

In OptimizerEvaluationTools.ipynb: "Let's create an objective function taht we will be configuring"
taht => that

Also in this section you assign objective_function_config = objective_function_config_store.get_config_by_name('5_mutually_exclusive_polynomials') but then immediately re-assign it to objective_function_config = objective_function_config_store.get_config_by_name('three_level_quadratic'). Why?

@sergiy-k commented Nov 3, 2020

In OptimizerEvaluationTools.ipynb, section "In [7]" keeps printing the optimizer's config many times. It does not look very useful, but it occupies a lot of space and makes the notebook look large. Does this output really need to be checked in?

@sergiy-k commented Nov 3, 2020

Mlos.Notebooks/OptimizerEvaluationTools.ipynb and Mlos.Python/OptimizerEvaluationTools.ipynb have a lot of identical content. Is it intentional to have both of them checked in?

@byte-sculptor (Contributor Author):

In BayesianOptimization2.ipynb: "We're first configuring the model to refit after every iteration and use 10 trees for the random forest"
I apologize for a naïve question but I'm curious why/how you picked 10? What would happen if you had 50 trees instead of 10? What if only 5?

Those are absolutely magic numbers that we will need to tune. The intuition I have for this parameter is this:

  • 5 trees might just work. It would be 2x less computationally expensive, but we would run the risk of degrading the goodness of fit and the accuracy of the uncertainty estimates.
  • 50 trees would also work, but we would spend 5x the fitting cost of 10 trees (and 10x that of 5 trees).

So I left it at 10, because it was good enough for demo purposes.

But you bring up an important point - how do we develop good intuitions behind these parameters?

My hope is that the evaluation tools I'm proposing in this PR will help us systematically evaluate these tradeoffs.
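
To make the tradeoff concrete, a quick sweep over the number of trees illustrates it. The sketch below uses scikit-learn's RandomForestRegressor on a toy quadratic as a stand-in for the optimizer's surrogate model and objective function; it is an illustration of the cost/accuracy tradeoff, not MLOS code.

# Illustrative sketch (not MLOS code): measure fit time and hold-out accuracy
# of a random forest as a function of the number of trees.
import time
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(500, 3))
y = (X ** 2).sum(axis=1) + rng.normal(scale=0.1, size=500)   # a simple quadratic objective
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

for n_trees in (5, 10, 50):
    model = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    start = time.time()
    model.fit(X_train, y_train)
    print(f"{n_trees:>3} trees: fit took {time.time() - start:.3f}s, "
          f"hold-out R^2 = {model.score(X_test, y_test):.3f}")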

@sergiy-k commented Nov 3, 2020

@byte-sculptor, thank you! That's exactly the response I was looking for. :)
You might want to consider adding this explanation to the notebook.

@byte-sculptor (Contributor Author):

In OptimizerEvaluationTools.ipynb: "Let's create an objective function taht we will be configuring"
taht => that

Also in this section you assign objective_function_config = objective_function_config_store.get_config_by_name('5_mutually_exclusive_polynomials') but then immediately re-assign it to objective_function_config = objective_function_config_store.get_config_by_name('three_level_quadratic'). Why?

I wanted to evaluate the simpler 'three_level_quadratic' here; I just forgot to clean up. Fixed now.

@byte-sculptor (Contributor Author):

@sergiy-k, thank you for the thorough review :) I believe that I have addressed all of the above comments. There is no way for me to close them, so I'll just recap them here:

  • Fixed the minimum => maximum in BayesianOptimization2.ipynb.
  • Fixed typo.
  • Removed noisy logger output.
  • Removed Mlos.Python/OptimizerEvaluationTools.ipynb.
  • Added a note on magic numbers to OptimizerEvaluationTools.ipynb.
  • Cleaned up objective function that was overwritten.

@sergiy-k commented Nov 3, 2020

  • Added a note on magic numbers to OptimizerEvaluationTools.ipynb.

Thank you! I guess that this magic number (the number of trees for the random forest) is something that we can optimize with MLOS? I.e. the case you were talking about in the past - optimize MLOS with MLOS.

@byte-sculptor (Contributor Author):

  • Added a note on magic numbers to OptimizerEvaluationTools.ipynb.

Thank you! I guess that this magic number (the number of trees for the random forest) is something that we can optimize with MLOS? I.e. the case you were talking about in the past - optimize MLOS with MLOS.

Hopefully we can do exactly that! :) As you mentioned in the past, this will be similar to self-hosting compilers, in that a compiler compiles its own next version.

@amueller (Contributor) commented Nov 3, 2020

In OptimizerEvaluationTools.ipynb:
optimzier -> optimizer
taht -> that

In the BO literature, I don't think "surrogate model goodness of fit" on hold-out data is a commonly employed measure. Usually you don't care how bad your approximation is for the suboptimal regions as long as you know not to look there.

You're running several experiments in parallel, right? Isn't each surrogate model using multiple cores? Or does your random forest not train in parallel?

The notebook looks interesting, gonna look at the code now.

@@ -118,6 +118,16 @@ def CreateOptimizer(self, request: OptimizerService_pb2.CreateOptimizerRequest,
        self.logger.info(f"Created optimizer {optimizer_id}.")
        return OptimizerService_pb2.OptimizerHandle(Id=optimizer_id)

    def IsTrained(self, request, context): # pylint: disable=unused-argument
Contributor:

I'm a bit confused by the camel case for method names but it seems consistent in this file.

Contributor Author:

Yeah, they come from the .proto file. I believe gRPC demands consistent naming even between languages. I think Google's style guide recommends CamelCase for message types; I couldn't find what they recommend for function names, but it seemed sensible to stick to CamelCase there too.


    def add_pickled_optimizer(self, iteration: int, pickled_optimizer: bytes):
        assert iteration >= 0
        self.pickled_optimizers_over_time[iteration] = pickled_optimizer
Contributor:

Do these store the surrogate model as separate objects? These could get quite big in memory.

Contributor Author:

Yes, they do. Down the road we can have them written straight to disk.
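
A minimal sketch of what the disk-backed variant could look like; the method name and the output_dir parameter are hypothetical and not part of this PR:

import os

def add_pickled_optimizer_to_disk(self, iteration: int, pickled_optimizer: bytes, output_dir: str):
    # Hypothetical alternative: write each optimizer snapshot to its own file
    # instead of accumulating all of them in self.pickled_optimizers_over_time.
    assert iteration >= 0
    os.makedirs(output_dir, exist_ok=True)
    with open(os.path.join(output_dir, f"optimizer_{iteration}.pickle"), "wb") as out_file:
        out_file.write(pickled_optimizer)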

)


# Parallel unit tests config.
Contributor:

I generally prefer not to have testing code together with the main code but I guess that's a matter of taste.

Contributor Author:

I created an issue (#163) to solicit opinions from the rest of the team. I think I could go either way. What are your arguments for and against?

mlos.global_values.tracer.trace_events.extend(optimizer_evaluation_report.execution_trace)

with pd.option_context('display.max_columns', 100):
    print(optimizer_evaluation_report.regression_model_goodness_of_fit_state.get_goodness_of_fit_dataframe(DataSetType.TRAIN).tail())
Contributor:

Is the goal to manually check the output? Or why are we printing in the unit test?

Contributor Author:

Of course not :) But I find the output useful during development.
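
One way to keep the output useful during development while making the test stricter is to also assert on the dataframe. A sketch, reusing only names from the snippet above and making no assumptions about individual column names:

# Sketch: assert on the goodness-of-fit dataframe in addition to printing it.
gof_df = optimizer_evaluation_report.regression_model_goodness_of_fit_state.get_goodness_of_fit_dataframe(DataSetType.TRAIN)
assert not gof_df.empty, "Expected at least one goodness-of-fit observation after the evaluation run."
# Stricter, column-level assertions (e.g. thresholds on individual metrics) can follow
# once the expected ranges are pinned down.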

future = executor.submit(optimizer_evaluator.evaluate_optimizer)
outstanding_futures.add(future)

done_futures, outstanding_futures = concurrent.futures.wait(outstanding_futures, return_when=concurrent.futures.ALL_COMPLETED)
Contributor:

You could also use joblib.Parallel for a slightly simpler interface that allows multiple backends.

Contributor Author:

Thanks! What are some pros and cons of joblib vs. concurrent.futures aside from a simpler API?

Contributor Author:

@amueller, I found this when researching comparisons: openvax/mhcflurry#59

What are your thoughts?

Contributor:

joblib indeed does batching of small jobs and probably does faster serialization of large arrays.
That comparison is from 2016, so I'm not sure it still holds. concurrent.futures has a much more powerful interface in terms of lazy evaluation; joblib is basically a simple parallel for-loop.
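
For reference, a minimal side-by-side sketch of the two interfaces for this workload, assuming evaluate_optimizer takes no arguments, as in the snippet above:

import concurrent.futures
from joblib import Parallel, delayed

n_evaluations = 4  # illustrative

# concurrent.futures version (as in this PR): explicit futures, which also allow
# lazy / incremental handling of results as they complete.
with concurrent.futures.ProcessPoolExecutor() as executor:
    futures = [executor.submit(optimizer_evaluator.evaluate_optimizer) for _ in range(n_evaluations)]
    done_futures, _ = concurrent.futures.wait(futures, return_when=concurrent.futures.ALL_COMPLETED)
    reports = [future.result() for future in done_futures]

# joblib version: essentially a parallel for-loop with batching and pluggable backends.
reports = Parallel(n_jobs=4)(
    delayed(optimizer_evaluator.evaluate_optimizer)() for _ in range(n_evaluations)
)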

byte-sculptor and others added 2 commits on November 3, 2020 16:51:

  • …uator.py (Co-authored-by: Andreas Mueller <t3kcit@gmail.com>)
  • …uator.py (Co-authored-by: Andreas Mueller <t3kcit@gmail.com>)
@byte-sculptor merged commit 76100b0 into microsoft:main on Nov 4, 2020
@byte-sculptor (Contributor Author) commented Nov 4, 2020

In OptimizerEvaluationTools.ipynb:
optimzier -> optimizer
taht -> that

In the BO literature, I don't think "surrogate model goodness of fit" on hold-out data is a commonly employed measure. Usually you don't care how bad your approximation is for the suboptimal regions as long as you know not to look there.

That makes sense.

You're running several experiments in parallel, right? Is not each surrogate model using multiple cores? Or does your random forest not train in parallel?

Correct, parallelizing the RF training is a work item, but "premature optimization is the root of all evil". We should do it though :)
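
A sketch of one possible shape for that work item, fitting the forest's trees in parallel with one task per tree; forest.decision_trees, fit_tree, features_df, and targets_df are hypothetical names, each tree is assumed to handle its own subsampling inside fit(), and process-based parallelism assumes the trees are picklable:

# Hypothetical sketch, not MLOS API: fit each tree of the random forest in its own process.
import concurrent.futures

def fit_tree(tree, features_df, targets_df):
    # Each tree is assumed to do its own bootstrap/feature subsampling inside fit().
    tree.fit(features_df, targets_df)
    return tree

with concurrent.futures.ProcessPoolExecutor() as executor:
    fitted_trees = list(executor.map(
        fit_tree,
        forest.decision_trees,
        [features_df] * len(forest.decision_trees),
        [targets_df] * len(forest.decision_trees)
    ))
forest.decision_trees = fitted_trees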
