Fix mixed numeric datatypes for optimizers #667

ephoris · 2024-02-05T20:31:03Z

Addressing issue discussed in #666

…conversion

Note that if you have a dataframe whose columns are all numeric but of mixed types (for example int64 and float64), df.iloc[0] will convert every value to float64 because a Series has a single dtype.

…dataframes

This reverts commit 6c3a482.

ephoris · 2024-02-05T20:34:12Z

@microsoft-github-policy-service agree

This reverts commit 55bf6b3. I realized the change is required after all because, again, df.iloc[0] converts all items to a common type, since a pandas.Series has a single dtype. For numeric values, everything is implicitly converted to numpy.float64.
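The upcast described above can be reproduced with a small sketch. The column names `x` and `y` and their values are made up for illustration and are not taken from the MLOS tests.

```python
import numpy as np
import pandas as pd

# Hypothetical single-row configuration: one integer and one float column.
df = pd.DataFrame({"x": [1], "y": [0.5]})

# Each column keeps its own dtype inside the DataFrame.
print(df.dtypes.to_dict())

# Selecting a row builds a Series, which has exactly one dtype,
# so the integer value is upcast to a float.
row = df.iloc[0]
print(row.dtype, type(row["x"]))
```

Reading the value column by column (for example `df["x"].iloc[0]`) does not go through a mixed-type row and so keeps the integer.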

motus

Looks nice! Next step, we should make sure LlamaTune does not choke on ConfigSpace instances that have conditionals (like, with the tunables that have special values). For starters, LlamaTune should be able to support that input: tunable_to_configspace_test.py:32

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py

mlos_core/mlos_core/spaces/adapters/llamatune.py

mlos_core/mlos_core/tests/optimizers/optimizer_test.py

Co-authored-by: Sergiy Matusevych <sergiy.matusevych@gmail.com>

ephoris · 2024-02-06T02:13:59Z

Hm, okay, I noticed that llamatune_opt_test.py:49 failed. Looking into it, I see that some tests in mlos_core_opt_df_test.py:45 define parameters with special characters, for example kernel_sched_migration_cost_ns!type. Such a column name is fine when converting from a pandas.DataFrame to a dict, but the NamedTuples returned by itertuples() rename columns that are not valid Python identifiers to positional names, which is why I am getting a test failure. I think we might have to use a different solution instead of pandas.DataFrame.itertuples().
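A minimal sketch of the renaming, assuming a two-column dataframe. The `vmSize` column and both values are hypothetical; only the `kernel_sched_migration_cost_ns!type` name comes from the comment above.

```python
import pandas as pd

# Hypothetical configuration row; the second column name is not a valid
# Python identifier because it contains "!".
df = pd.DataFrame({
    "vmSize": [2],
    "kernel_sched_migration_cost_ns!type": ["special"],
})

# With the default name="Pandas", itertuples() builds a namedtuple and
# replaces invalid field names with positional ones.
row = next(df.itertuples(index=False))
print(row._fields)

# The original column name is no longer available as an attribute.
print(hasattr(row, "kernel_sched_migration_cost_ns!type"))
```

Any code that looks values up by the original column name therefore has to map the positional fields back to `df.columns` itself.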

ephoris · 2024-02-06T20:22:57Z

Since I am assuming MLOS wants to support generic strings for hyperparameter names, we can revert to iterrows(). By first casting the dataframe to the object dtype, we preserve the numeric types when calling iterrows() (see smac_optimizer.py:339). It's not as elegant, unfortunately, but it prevents pandas from upcasting integers to floats.
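The effect of the object cast can be sketched as follows. This is an illustration with made-up column names, not the code at the referenced line.

```python
import numpy as np
import pandas as pd

# Hypothetical configuration with one integer and one float column.
df = pd.DataFrame({"x": [1], "y": [0.5]})

# Plain iterrows(): each row is a float64 Series, so the integer is upcast.
_, upcast = next(df.iterrows())

# Casting to object first: each row is an object Series, so every value
# keeps the type it had in its own column.
_, kept = next(df.astype("O").iterrows())

print(upcast.dtype, type(upcast["x"]))
print(kept.dtype, type(kept["x"]), type(kept["y"]))
```

The trade-off is that object columns lose vectorized numeric operations, which should not matter much when the dataframe is only being unpacked row by row.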

ephoris · 2024-02-06T21:34:45Z

The linter failed on 983763f. I fixed it and reran the linter locally; sorry for the spam. I think the workflows should pass this time around.

mlos_core/mlos_core/tests/optimizers/optimizer_test.py

bpkroth · 2024-02-07T15:56:52Z

Hm, okay, I noticed that llamatune_opt_test.py:49 failed. Looking into it, I see that some tests in mlos_core_opt_df_test.py:45 define parameters with special characters, for example kernel_sched_migration_cost_ns!type. Such a column name is fine when converting from a pandas.DataFrame to a dict, but the NamedTuples returned by itertuples() rename columns that are not valid Python identifiers to positional names, which is why I am getting a test failure. I think we might have to use a different solution instead of pandas.DataFrame.itertuples().

You could also try using itertuples(name=None) to get non-NamedTuples back.
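A sketch of what that suggestion could look like, pairing the plain tuples with `df.columns` to rebuild dictionaries. The `vmSize` and `frequency` columns and all values are hypothetical.

```python
import pandas as pd

# Hypothetical configuration row mixing an integer, a float, and a column
# whose name is not a valid Python identifier.
df = pd.DataFrame({
    "vmSize": [2],
    "frequency": [0.5],
    "kernel_sched_migration_cost_ns!type": ["special"],
})

# name=None yields plain tuples, so no field renaming takes place and the
# values can be paired with the original column names.
configs = [
    dict(zip(df.columns, values))
    for values in df.itertuples(index=False, name=None)
]
print(configs[0])
```

Because the values are read column by column, the integer is not upcast to a float here.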

bpkroth · 2024-02-07T15:57:47Z

The linter failed on 983763f. I fixed it and reran the linter locally; sorry for the spam. I think the workflows should pass this time around.

No worries. I generally suggest using the devcontainer and running make check and make test locally while you work. That should run the same tests we run in CI.

bpkroth · 2024-02-07T16:00:22Z

Final thought: I think there are a few places in mlos_bench where we call to_numeric that might need to be re-examined in light of this.
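One way this concern might show up, sketched under the assumption that values are stored as strings in a single column and converted together. The data is made up and this is not the mlos_bench code.

```python
import pandas as pd

# Hypothetical value column holding an integer and a float as strings.
mixed = pd.Series(["8", "0.5"], dtype=object)
ints_only = pd.Series(["8", "16"], dtype=object)

# Converting the whole column at once picks one dtype for all values,
# so the integer is upcast whenever a float is present.
print(pd.to_numeric(mixed).dtype)
print(pd.to_numeric(ints_only).dtype)
```

Converting each value on its own would avoid the shared dtype, at the cost of a Python-level loop.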

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

mlos_core/mlos_core/tests/optimizers/optimizer_test.py

bpkroth

LGTM. Thanks!

bpkroth · 2024-02-07T18:45:57Z

Final thought: I think there are a few places in mlos_bench where we call to_numeric that might need to be re-examined in light of this.

We can take this up elsewhere I think.

mlos_bench/mlos_bench/storage/util.py

ephoris · 2024-02-07T19:09:39Z

Hm, okay, I noticed that llamatune_opt_test.py:49 failed. Looking into it, I see that some tests in mlos_core_opt_df_test.py:45 define parameters with special characters, for example kernel_sched_migration_cost_ns!type. Such a column name is fine when converting from a pandas.DataFrame to a dict, but the NamedTuples returned by itertuples() rename columns that are not valid Python identifiers to positional names, which is why I am getting a test failure. I think we might have to use a different solution instead of pandas.DataFrame.itertuples().

You could also try using itertuples(name=None) to get non-NamedTuples back.

Unfortunately, while itertuples() preserves dtypes, it also preserves pandas.NA values. Normally I would say this is good; however, pandas.NA is not treated the same as None. This leads to situations where the ConfigSpace package throws errors, because it only handles None values. For now, I guess pandas.DataFrame.astype('O') is the most elegant solution, since iterrows() will implicitly convert pandas.NA to None, which most of the code paths were set up to deal with.
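The difference between pandas.NA and None can be seen in a short sketch. The nullable `Int64` column and the `na_to_none` helper are assumptions for illustration; neither is part of this PR.

```python
import pandas as pd


def na_to_none(value):
    """Hypothetical helper: map any pandas missing-value marker to None."""
    return None if pd.isna(value) else value


# Hypothetical nullable integer column with one missing entry.
df = pd.DataFrame({"x": pd.array([1, None], dtype="Int64")})

# itertuples() hands back pandas.NA for the missing entry, and pandas.NA
# is a different object from None.
raw = [values[0] for values in df.itertuples(index=False, name=None)]
print(raw[1] is pd.NA, raw[1] is None)

# Normalizing explicitly gives code that only checks for None what it expects.
clean = [na_to_none(value) for value in raw]
print(clean)
```

An explicit normalization step like this would be one alternative to relying on an implicit conversion.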

ephoris · 2024-02-07T19:10:28Z

Thanks @bpkroth and @motus. Great learning experience on my end, I hope this bug fix will be helpful and prevent errors in the future.

ephoris added 5 commits February 5, 2024 15:33

[Fix] Add fix for pandas.Dataframe.iterrows() dropping dtypes during …

0eafaf0

…conversion

[WIP] Add test for mixed input spaces

8395321

[WIP] Change suggestion building in pytest example

71afa15

Note that if you have a dataframe whose columns are all numeric but of mixed types (for example int64 and float64), df.iloc[0] will convert every value to float64 because a Series has a single dtype.

[Fix] Change iterrows to itertuples to preserve datatypes in numeric …

9f129a5

…dataframes

Revert "[WIP] Change suggestion building in pytest example"

55bf6b3

This reverts commit 6c3a482.

ephoris force-pushed the ephoris/smac/iter_bug branch from 729ca08 to 55bf6b3 Compare February 5, 2024 20:33

motus approved these changes Feb 6, 2024

mlos_core/mlos_core/optimizers/bayesian_optimizers/smac_optimizer.py (outdated)

mlos_core/mlos_core/spaces/adapters/llamatune.py (outdated)

mlos_core/mlos_core/tests/optimizers/optimizer_test.py (outdated)

ephoris and others added 2 commits February 5, 2024 19:53

Update mlos_core/mlos_core/spaces/adapters/llamatune.py

bd272c8

Co-authored-by: Sergiy Matusevych <sergiy.matusevych@gmail.com>

[Revert] Copyright lines removed by mistake

617afe9

motus added the mlos-core label Feb 6, 2024

[Fix] Revert to iterrows, but typecast dataframe to object type

983763f

bpkroth linked an issue Feb 6, 2024 that may be closed by this pull request

SMAC optimizer does not support mixed input space #666

Closed

bpkroth added the bug Something isn't working label Feb 6, 2024

ephoris marked this pull request as ready for review February 6, 2024 20:24

ephoris requested a review from a team as a code owner February 6, 2024 20:24

[Test] Fix tagging on args

ff33ce4

Merge branch 'main' into ephoris/smac/iter_bug

6c6a30a

bpkroth reviewed Feb 7, 2024

mlos_core/mlos_core/tests/optimizers/optimizer_test.py (outdated)

bpkroth reviewed Feb 7, 2024

mlos_core/mlos_core/tests/optimizers/optimizer_test.py (outdated)

bpkroth reviewed Feb 7, 2024

mlos_core/mlos_core/tests/optimizers/optimizer_test.py (outdated)

bpkroth reviewed Feb 7, 2024

mlos_core/mlos_core/tests/optimizers/optimizer_test.py (outdated)

bpkroth reviewed Feb 7, 2024

mlos_core/mlos_core/tests/optimizers/optimizer_test.py (outdated)

ephoris and others added 3 commits February 7, 2024 12:39

Update mlos_core/mlos_core/tests/optimizers/optimizer_test.py

4b4ec33

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

Update mlos_core/mlos_core/tests/optimizers/optimizer_test.py

423d582

Co-authored-by: Brian Kroth <bpkroth@users.noreply.github.com>

[Test] Clean optimizer_test to include seed and checks

2bc2e50

ephoris force-pushed the ephoris/smac/iter_bug branch from fb31586 to 2bc2e50 Compare February 7, 2024 18:20

bpkroth reviewed Feb 7, 2024

mlos_core/mlos_core/tests/optimizers/optimizer_test.py

Update mlos_core/mlos_core/tests/optimizers/optimizer_test.py

05becea

bpkroth approved these changes Feb 7, 2024

bpkroth reviewed Feb 7, 2024

mlos_bench/mlos_bench/storage/util.py

bpkroth merged commit 9175d18 into microsoft:main Feb 7, 2024
11 of 12 checks passed

bpkroth mentioned this pull request Feb 7, 2024

Add tests for kv_df_to_dict and mixed integer types #669

Closed

ephoris deleted the ephoris/smac/iter_bug branch February 7, 2024 21:00
