Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bayesian optimization on molecules test #268

Merged
merged 25 commits into from
Aug 28, 2023

Conversation

Copy link
Contributor

@jduerholt jduerholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Swagata,

thank you very much. I just let some comments regardin the benchmark stuff. Start with fixing them and I will later add comments regarding the rest.

Best,

Johannes

def __init__(
self,
filename: str,
benchmark: Dict,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
benchmark: Dict,
domain: Domain,

Just do it via the domain, there is all info that you need.


def __init__(
self,
filename: str,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
filename: str,
experiments: pd.DataFrame

Let us directly dump in the dataframe, then the user should care about reading in the stuff. We only need then to check that the columns in the dataframe match the feature keys in the domain.

self.main_file[self.benchmark["output"]] = (
df[self.benchmark["output"]].dropna().to_numpy().reshape(-1, 1)
)
input_feature = CategoricalMolecularInput(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is then all obsolete, as you get the domain directly by the user.

from bofire.data_models.objectives.api import MaximizeObjective


class Molecule_benchmark(Benchmark):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would make this more agnostic and not have it just for molecules. It is a "LookUpTableBenchmark", as we pass in some combination of inputs to f and return the value from the table/dataframe, this can also work for other data than molecules. Ofc, we have to raise an error in _f if the combination is not found.


Returns:
pd.DataFrame: output values of the function. Columns are benchmark["output"] and valid_benchmark["output"].
"""
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

raise an error if the input combination from X is not found in the data.

Copy link
Contributor

@jduerholt jduerholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi Swagata, I let some comments.

def __init__(
self,
domain: Domain,
LookUpTable: pd.DataFrame,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
LookUpTable: pd.DataFrame,
lookup_table: pd.DataFrame,

just call it lookup_table, arguments are always snake case. You also have to adjust below.

from bofire.data_models.domain.api import Domain


class LookUpTableBenchmark(Benchmark):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
class LookUpTableBenchmark(Benchmark):
class LookupTableBenchmark(Benchmark):

It is one word "lookup".

location = []
for i in X.index:
condition = np.ones(len(self.LookUpTable), dtype=bool)
for k in self.domain.inputs.get_keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looping over the single entries is kind of cumbersome. Try something like this:

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [True, False, True]})

# create a sample series
s = pd.Series([2, 'b', False])

# check if the values of the series show up in a row of the dataframe
mask = df.isin(s)

# get the index of the rows that match the values in the series
index = mask.index[mask.all(axis=1)]

print(index)

If it matches all values, you can then use the index to return via loc the correct output values.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isin() does not work with input keys having similar categories. I used pd.merge which is also a nice substitute for for loops

X_temp = self.LookUpTable.loc[location]
X_temp.index = pd.RangeIndex(len(X_temp))
Y = pd.DataFrame()
for k in self.domain.outputs.get_keys():
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The dataframe which is read in in the first place should already have the valid keywords. When you then have the complete set of matching indices, you can just loc the columns and rows that you need and return them with resetted index.

df._merge == "left_only", df.columns != "_merge"
].proxy_index.to_list()
raise ValueError(f"Input combination {indices} not found in Look up table")
Y = X_temp[self.domain.outputs.get_keys()]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do it in one line ;)

Copy link
Contributor

@jduerholt jduerholt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. Thanks!

@jduerholt jduerholt merged commit b7fc7a7 into experimental-design:main Aug 28, 2023
10 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants