-
Notifications
You must be signed in to change notification settings - Fork 23
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bayesian optimization on molecules test #268
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Swagata,
thank you very much. I just let some comments regardin the benchmark stuff. Start with fixing them and I will later add comments regarding the rest.
Best,
Johannes
def __init__( | ||
self, | ||
filename: str, | ||
benchmark: Dict, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
benchmark: Dict, | |
domain: Domain, |
Just do it via the domain, there is all info that you need.
|
||
def __init__( | ||
self, | ||
filename: str, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
filename: str, | |
experiments: pd.DataFrame |
Let us directly dump in the dataframe, then the user should care about reading in the stuff. We only need then to check that the columns in the dataframe match the feature keys in the domain.
self.main_file[self.benchmark["output"]] = ( | ||
df[self.benchmark["output"]].dropna().to_numpy().reshape(-1, 1) | ||
) | ||
input_feature = CategoricalMolecularInput( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is then all obsolete, as you get the domain directly by the user.
from bofire.data_models.objectives.api import MaximizeObjective | ||
|
||
|
||
class Molecule_benchmark(Benchmark): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would make this more agnostic and not have it just for molecules. It is a "LookUpTableBenchmark", as we pass in some combination of inputs to f
and return the value from the table/dataframe, this can also work for other data than molecules. Ofc, we have to raise an error in _f
if the combination is not found.
|
||
Returns: | ||
pd.DataFrame: output values of the function. Columns are benchmark["output"] and valid_benchmark["output"]. | ||
""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
raise an error if the input combination from X
is not found in the data.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi Swagata, I let some comments.
def __init__( | ||
self, | ||
domain: Domain, | ||
LookUpTable: pd.DataFrame, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LookUpTable: pd.DataFrame, | |
lookup_table: pd.DataFrame, |
just call it lookup_table
, arguments are always snake case. You also have to adjust below.
from bofire.data_models.domain.api import Domain | ||
|
||
|
||
class LookUpTableBenchmark(Benchmark): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
class LookUpTableBenchmark(Benchmark): | |
class LookupTableBenchmark(Benchmark): |
It is one word "lookup".
location = [] | ||
for i in X.index: | ||
condition = np.ones(len(self.LookUpTable), dtype=bool) | ||
for k in self.domain.inputs.get_keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looping over the single entries is kind of cumbersome. Try something like this:
import pandas as pd
# create a sample dataframe
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [True, False, True]})
# create a sample series
s = pd.Series([2, 'b', False])
# check if the values of the series show up in a row of the dataframe
mask = df.isin(s)
# get the index of the rows that match the values in the series
index = mask.index[mask.all(axis=1)]
print(index)
If it matches all values, you can then use the index to return via loc the correct output values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isin() does not work with input keys having similar categories. I used pd.merge which is also a nice substitute for for loops
X_temp = self.LookUpTable.loc[location] | ||
X_temp.index = pd.RangeIndex(len(X_temp)) | ||
Y = pd.DataFrame() | ||
for k in self.domain.outputs.get_keys(): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dataframe which is read in in the first place should already have the valid keywords. When you then have the complete set of matching indices, you can just loc the columns and rows that you need and return them with resetted index.
df._merge == "left_only", df.columns != "_merge" | ||
].proxy_index.to_list() | ||
raise ValueError(f"Input combination {indices} not found in Look up table") | ||
Y = X_temp[self.domain.outputs.get_keys()] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do it in one line ;)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. Thanks!
@jduerholt ,
Replicated https://github.com/leojklarner/gauche/blob/main/notebooks/Bayesian%20Optimisation%20Over%20Molecules.ipynb