init checkin to add LassoCV and RERF to optimizers #263

Merged

Conversation

@edcthayer (Contributor) commented Aug 6, 2021

Added LassoCrossValidated (LassoCV) and RegressionEnhancedRandomForest (RERF) regression models to the list of surrogate models available for optimizers. This required creating MultiObjective versions for each of these regression models. Fixed some bugs found via testing with random surrogate_model parameters. Details below by file added/changed:

source/Mlos.Python/mlos/Optimizers/BayesianOptimizerConfigStore.py:

  1. Added LassoCV, MultiObjectiveLassoCV, RERF, and MultiObjectiveRERF to the surrogate_model_implementation list and expanded the resulting hyper grid. The default remains the historical HomogeneousRandomForest.

source/Mlos.Python/mlos/Optimizers/BayesianOptimizer.py:

  1. Added if-elif-else code to correctly instantiate the surrogate_model based on the optimizer_config.surrogate_model_implementation.
  2. Extended assert test to include new surrogate models.
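
The if-elif-else dispatch described above might look roughly like the following. This is an illustrative sketch only: the class names come from the PR, but the real constructors take input/output spaces and model configs, so stand-in classes are used here to make the control flow runnable.

```python
# Stand-ins for the real MLOS multi-objective surrogate model classes.
class MultiObjectiveHomogeneousRandomForest: pass
class MultiObjectiveLassoCrossValidated: pass
class MultiObjectiveRegressionEnhancedRandomForest: pass

def instantiate_surrogate_model(implementation_name):
    # Mirrors the if-elif-else added in BayesianOptimizer.__init__ and the
    # extended assert: unknown implementations fail fast.
    if implementation_name == "HomogeneousRandomForestRegressionModel":
        return MultiObjectiveHomogeneousRandomForest()
    elif implementation_name == "LassoCrossValidatedRegressionModel":
        return MultiObjectiveLassoCrossValidated()
    elif implementation_name == "RegressionEnhancedRandomForestRegressionModel":
        return MultiObjectiveRegressionEnhancedRandomForest()
    else:
        raise ValueError(f"Unknown surrogate model implementation: {implementation_name}")
```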

source/Mlos.Python/mlos/Optimizers/RegressionModels/LassoCrossValidatedConfigStore.py:

  1. Corrected dimension type for LassoCV cv parameter (continuous --> discrete).
  2. Restricted ranges on some model_configs to avoid Windows faults discovered in random config tests.
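
The cv fix above matters because scikit-learn's LassoCV interprets cv as an integer fold count, so sampling it from a continuous range yields invalid floats. A minimal sketch, using a stand-in for the MLOS DiscreteDimension class (the real class lives in the MLOS Spaces module; its exact API is not shown in this diff):

```python
import random

class DiscreteDimension:
    # Stand-in: samples integers from an inclusive [min, max] range,
    # the key property the corrected config dimension needs.
    def __init__(self, name, min, max):
        self.name, self.min, self.max = name, min, max
    def random(self):
        return random.randint(self.min, self.max)

# cv must be a whole number of cross-validation folds.
cv_dimension = DiscreteDimension(name="cv", min=2, max=10)
sample = cv_dimension.random()
assert isinstance(sample, int) and 2 <= sample <= 10
```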

source/Mlos.Python/mlos/Optimizers/RegressionModels/MultiObjectiveLassoCrossValidated.py:
New class to allow LassoCV for multi-objective optimizations.

source/Mlos.Python/mlos/Optimizers/RegressionModels/MultiObjectiveRegressionEnhancedRandomForest.py:
New class to allow RERF for multi-objective optimizations.
Note: the .copy() on line 41 is needed because the model_config.perform_initial_random_forest_hyper_parameter_search value is changed (True -> False) once the grid search for the random forest fit() completes.
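
The shared-mutable-config hazard behind that .copy() can be shown in a few lines. The field name comes from the PR; SimpleNamespace stands in for the real model config class:

```python
from types import SimpleNamespace
import copy

# Without a copy, every per-objective model shares one config object, so the
# first fit() flipping the flag disables the grid search for its siblings.
base = SimpleNamespace(perform_initial_random_forest_hyper_parameter_search=True)
shared = [base, base]
shared[0].perform_initial_random_forest_hyper_parameter_search = False  # simulate fit()
assert shared[1].perform_initial_random_forest_hyper_parameter_search is False  # clobbered

# With a copy per objective, each model keeps its own state.
template = SimpleNamespace(perform_initial_random_forest_hyper_parameter_search=True)
per_objective = [copy.copy(template) for _ in range(2)]
per_objective[0].perform_initial_random_forest_hyper_parameter_search = False
assert per_objective[1].perform_initial_random_forest_hyper_parameter_search is True
```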

source/Mlos.Python/mlos/Optimizers/RegressionModels/MultiObjectiveRegressionEnhancedRandomForest.py:

  1. Correctly capture the random forest hyper parameters returned from the grid search (line 320).
  2. Cleaned up some initializations.

source/Mlos.Python/mlos/Optimizers/RegressionModels/unit_tests/TestMultiObjectiveLassoCrossValidated.py:
New unit tests for new class.

source/Mlos.Python/mlos/Optimizers/RegressionModels/unit_tests/TestMultiObjectiveRegressionEnhancedRandomForest.py:
New unit tests for new class.

@edcthayer edcthayer requested a review from sergiy-k August 6, 2021 22:19
@@ -59,20 +61,47 @@ def __init__(

# Now let's put together the surrogate model.
#
print(f'self.optimizer_config.surrogate_model_implementation: {self.optimizer_config.surrogate_model_implementation}')

Suggested change
print(f'self.optimizer_config.surrogate_model_implementation: {self.optimizer_config.surrogate_model_implementation}')
self.logger.info(f'self.optimizer_config.surrogate_model_implementation: {self.optimizer_config.surrogate_model_implementation}')

CategoricalDimension(name="fit_intercept", values=[False, True]),
CategoricalDimension(name="normalize", values=[False, True]),
CategoricalDimension(name="precompute", values=[False, True]),
DiscreteDimension(name="max_iter", min=0, max=10 ** 5),
ContinuousDimension(name="tol", min=0, max=2 ** 10),
DiscreteDimension(name="max_iter", min=100, max=5 * 10 **3),

Suggested change
DiscreteDimension(name="max_iter", min=100, max=5 * 10 **3),
DiscreteDimension(name="max_iter", min=100, max=5 * (10 ** 3)),

@@ -89,6 +91,10 @@ def __init__(
self.partial_hat_matrix_ = 0
self.regressor_standard_error_ = 0

# THE HACK

We may need to explain a little more here. If I remember right:

When LassoCV is used as part of RERF, it cannot reasonably compute the upper and lower bounds on its input space dimensions, as they are a polynomial combination of inputs to RERF. Thus, it approximates them with the empirical min and max. These approximations are biased: the lower bound is too large, the upper bound is too small. Consequently, during scoring, LassoCV is likely to see input outside of these bounds, but we still want LassoCV to produce predictions for those points. So we introduce a little hack: whenever LassoCV is instantiated as part of RERF, it should skip input filtering on predict. This field controls this behavior.

Feel free to just copy-paste that in, or polish it to your liking!
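
The behavior described above can be sketched as follows. The class, field, and bound names here are illustrative, not the actual MLOS API; the point is only the flag's effect on predict:

```python
class LassoSketch:
    # Stand-in model with empirical input bounds and the "hack" flag.
    def __init__(self, lower, upper, skip_input_filtering_on_predict=False):
        self.lower, self.upper = lower, upper
        self.skip_input_filtering_on_predict = skip_input_filtering_on_predict

    def predict(self, xs):
        if not self.skip_input_filtering_on_predict:
            # Normal path: drop inputs outside the (biased-inward) bounds.
            xs = [x for x in xs if self.lower <= x <= self.upper]
        return [2.0 * x for x in xs]  # stand-in for the real prediction

standalone = LassoSketch(lower=0.0, upper=1.0)
inside_rerf = LassoSketch(lower=0.0, upper=1.0, skip_input_filtering_on_predict=True)
assert standalone.predict([0.5, 1.5]) == [1.0]        # out-of-bounds point dropped
assert inside_rerf.predict([0.5, 1.5]) == [1.0, 3.0]  # hack: predict anyway
```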

# add small noise to x to remove singularity,
# expect prediction confidence to be reduced (wider intervals) by doing this
self.logger.info(
f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10^10."
f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10^4."

10**4

Suggested change
f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10^4."
f"Adding noise to design matrix used for prediction confidence due to condition number {condition_number} > 10**4."


It's clear what you mean... but my CDO strongly suggests that we should stick to the Python exponentiation operator :)
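
The jitter the log message describes can be sketched like this: if the design matrix used for prediction confidence is near-singular, small noise restores numerical stability at the cost of slightly wider intervals. The 10**4 threshold comes from the diff; the function name and noise scale are assumptions:

```python
import numpy as np

def stabilize_design_matrix(x, threshold=10**4, noise_scale=1e-6, seed=0):
    # Add small Gaussian noise when the matrix is ill-conditioned, so the
    # downstream hat-matrix computation does not blow up.
    if np.linalg.cond(x) > threshold:
        rng = np.random.default_rng(seed)
        x = x + rng.normal(scale=noise_scale, size=x.shape)
    return x

# A rank-deficient matrix has an enormous condition number; jitter reduces it.
x = np.array([[1.0, 2.0], [2.0, 4.0]])  # second row = 2 * first row
assert np.linalg.cond(x) > 10**4
x_jittered = stabilize_design_matrix(x)
assert np.linalg.cond(x_jittered) < np.linalg.cond(x)
```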



class MultiObjectiveLassoCrossValidated(NaiveMultiObjectiveRegressionModel):
"""Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
@byte-sculptor (Contributor) commented Sep 8, 2021

Suggested change
"""Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
"""Maintains multiple LassoCrossValidatedRegressionModels each predicting a different objective.

)


# We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space.

Suggested change
# We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space.
# We just need to assert that the model config belongs in lasso_cross_validated_config_store.parameter_space.



class MultiObjectiveRegressionEnhancedRandomForest(NaiveMultiObjectiveRegressionModel):
"""Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.

Suggested change
"""Maintains multiple HomogeneousRandomForestRegressionModels each predicting a different objective.
"""Maintains multiple RegressionEnhancedRandomForestRegressionModels each predicting a different objective.

)


# We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space.

Suggested change
# We just need to assert that the model config belongs in homogeneous_random_forest_config_store.parameter_space.
# We just need to assert that the model config belongs in regression_enhanced_random_forest_config_store.parameter_space.

for output_dimension in output_space.dimensions:
print(f'output_dimension.name: {output_dimension.name}')
lasso_model = LassoCrossValidatedRegressionModel(
model_config=model_config,

You copy the model_config in multi-objective RERF, but not here. Why?

@edcthayer (Author) replied

Values in the model config are altered by the random forest GridSearchCV in RERF. When these configs are assigned to different objectives, they stomp all over each other. I'll track down the lines in the RERF model that alter the model_config and explain this in the MultiObjectiveRERF code where you've spotted this difference.

# TODO : determine min sample needed to fit based on model configs
random_forest_should_fit = True
return root_base_model_should_fit and random_forest_should_fit
# since polynomial basis functions decrease the degrees of freedom (TODO: add reference),

This is neat :)
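
The degrees-of-freedom point above can be made concrete: a degree-d polynomial basis in n inputs has comb(n + d, d) terms, so the lasso stage needs at least that many samples before a fit is meaningful. A minimal sketch of such a gate (helper names are illustrative, not the MLOS API):

```python
from math import comb

def num_polynomial_features(num_inputs, degree):
    # Count of monomials of total degree <= degree in num_inputs variables,
    # including the bias term: C(num_inputs + degree, degree).
    return comb(num_inputs + degree, degree)

# 2 inputs, degree 2 -> 1, x1, x2, x1^2, x1*x2, x2^2
assert num_polynomial_features(2, 2) == 6

def should_fit(num_samples, num_inputs, degree):
    # Require at least as many samples as basis terms.
    return num_samples >= num_polynomial_features(num_inputs, degree)

assert should_fit(10, 2, 2) is True
assert should_fit(5, 2, 2) is False
```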

num_testing_samples = 10
elif objective_function_config_name == '5_mutually_exclusive_polynomials':
num_training_samples = 100
num_testing_samples = 50

Suggested change
num_testing_samples = 50
num_testing_samples = 50
else:
assert False

num_testing_samples = 10
elif objective_function_config_name == '5_mutually_exclusive_polynomials':
num_training_samples = 100
num_testing_samples = 50

Suggested change
num_testing_samples = 50
num_testing_samples = 50
else:
assert False

@edcthayer edcthayer merged commit 791d670 into microsoft:main Sep 30, 2021