Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SMAC optimizer does not support mixed input space #666

Closed
ephoris opened this issue Feb 5, 2024 · 5 comments · Fixed by #667
Closed

SMAC optimizer does not support mixed input space #666

ephoris opened this issue Feb 5, 2024 · 5 comments · Fixed by #667
Labels
bug Something isn't working mlos-core tests Add or fix unit tests

Comments

@ephoris
Copy link
Contributor

ephoris commented Feb 5, 2024

Using the SMAC optimizer in conjunction with a mix of different parameter space types will throw an error. Minimum reproducible example is attached.

import pandas as pd
import mlos_core.optimizers
import ConfigSpace as CS


def objective(x: pd.DataFrame):
    out = x.values
    out = out.sum()

    return out


def run_optimization(optimizer: mlos_core.optimizers.BaseOptimizer):
    suggested_value = optimizer.suggest()
    target_value = objective(suggested_value)
    print(f"{suggested_value}")
    print(f"{target_value=}")
    optimizer.register(suggested_value, pd.Series([target_value]))

    return


if __name__ == "__main__":
    parameter_space = CS.ConfigurationSpace(seed=0)
    dummy_val_one = CS.Float("dummy_one", (1, 10.0), default=5.0)
    dummy_val_two = CS.Integer("dummy_two", (1, 10), default=5)
    parameter_space.add_hyperparameters([dummy_val_one, dummy_val_two])

    optimizer = mlos_core.optimizers.SmacOptimizer(parameter_space=parameter_space)

    n_iterations = 10
    for _ in range(n_iterations):
        run_optimization(optimizer)

Results in an error

Traceback (most recent call last):
  File "*/tmp/mlos_tmp.py", line 33, in <module>
    run_optimization(optimizer)
  File "*/tmp/mlos_tmp.py", line 18, in run_optimization
    optimizer.register(suggested_value, pd.Series([target_value]))
  File "*/anaconda3/lib/python3.9/site-packages/mlos_core/optimizers/optimizer.py", line 91, in register
    return self._register(configurations, scores, context)
  File "*/anaconda3/lib/python3.9/site-packages/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py", line 247
, in _register
    for config, score in zip(self._to_configspace_configs(configurations), scores.tolist()):
  File "*/anaconda3/lib/python3.9/site-packages/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py", line 337
, in _to_configspace_configs
    return [
  File "*/anaconda3/lib/python3.9/site-packages/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py", line 338
, in <listcomp>
    ConfigSpace.Configuration(self.optimizer_parameter_space, values=config.to_dict())
  File "*/anaconda3/lib/python3.9/site-packages/ConfigSpace/configuration.py", line 90, in __init__
    raise IllegalValueError(hp, value)
ConfigSpace.exceptions.IllegalValueError: Value 1.0: (<class 'float'>) is not allowed for hyperparameter dummy_two, Type: UniformInteger, Ran
ge: [1, 10], Default: 5

The SMAC optimizer register function results in a call to SmacOptimizer._to_configspace_configs which calls a Dataframe.iterrows() method. As per the pandas documentation on pandas.Dataframe.iterrows, the iterrows method does not preserve dtypes when consumed (see note 1).

Unless I am using the API wrong, the change seems to be a simple fix to the SmacOptimizer._to_configspace_configs method to use NamedTuples. I can submit a PR if this issue is real.

@bpkroth
Copy link
Contributor

bpkroth commented Feb 5, 2024

Hi @ephoris , thanks so much for reporting this. This does indeed look like a bug, though I'm surprised we missed that test case. We'd be happy to accept a PR. Could you please include a test case for it as well? Thanks!

@bpkroth bpkroth added bug Something isn't working tests Add or fix unit tests mlos-core labels Feb 5, 2024
@bpkroth
Copy link
Contributor

bpkroth commented Feb 5, 2024

@motus - fyi
I seem to recall you mentioning something about that converter recently.

@bpkroth
Copy link
Contributor

bpkroth commented Feb 5, 2024

We should probably replace all instances of iterrows with itertuples if it's easy. There's only 4 cases atm.

@ephoris
Copy link
Contributor Author

ephoris commented Feb 5, 2024

I know that itertuples will instead return a NamedTuple. To prevent too many downstream changes, NamedTuples can be converted back to dictionaries using the _asdict method, albeit it's a bit slow.

I wrote a small test following the format of some of the others, seems the bug only applies if you have multiple numeric types. If you add in a CategoricalHyperParameter all dtypes are preserved.

I created a test but my fix doesn't seem to work in the pytest infrastructure. I can take a closer look to see what's going on.

@bpkroth
Copy link
Contributor

bpkroth commented Feb 5, 2024

Please at least submit a draft PR for now. Then we can try and help take a look. Thanks!

@bpkroth bpkroth linked a pull request Feb 6, 2024 that will close this issue
bpkroth added a commit that referenced this issue Feb 7, 2024
Addressing issue discussed in #666

---------

Co-authored-by: Sergiy Matusevych <sergiy.matusevych@gmail.com>
Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working mlos-core tests Add or fix unit tests
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants