Bayesian optimization on molecules test #268

swagataroy123 · 2023-08-23T11:54:25Z

@jduerholt,
Replicated https://github.com/leojklarner/gauche/blob/main/notebooks/Bayesian%20Optimisation%20Over%20Molecules.ipynb

jduerholt

Hi Swagata,

Thank you very much. I just left some comments regarding the benchmark stuff. Start by fixing them, and I will add comments on the rest later.

Best,

Johannes

jduerholt · 2023-08-23T15:17:41Z

bofire/benchmarks/molecule_benchmark.py

+    def __init__(
+        self,
+        filename: str,
+        benchmark: Dict,


Suggested change

-        benchmark: Dict,
+        domain: Domain,

Just do it via the domain; it contains all the info that you need.

jduerholt · 2023-08-23T15:19:15Z

bofire/benchmarks/molecule_benchmark.py

+
+    def __init__(
+        self,
+        filename: str,


Suggested change

-        filename: str,
+        experiments: pd.DataFrame

Let us pass in the dataframe directly; the user is then responsible for reading in the data. We then only need to check that the columns in the dataframe match the feature keys in the domain.
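A minimal sketch of the column check described above, assuming the feature keys are available as a plain list of strings. The helper name `check_columns` and the example keys are hypothetical and not part of BoFire; in the benchmark the keys would be taken from the domain.

```python
from typing import Sequence

import pandas as pd


def check_columns(experiments: pd.DataFrame, feature_keys: Sequence[str]) -> None:
    """Raise a ValueError if any expected feature key is not a dataframe column."""
    missing = [key for key in feature_keys if key not in experiments.columns]
    if missing:
        raise ValueError(f"Columns missing from experiments: {missing}")


# Hypothetical keys and data, for illustration only.
feature_keys = ["smiles", "yield"]
experiments = pd.DataFrame({"smiles": ["CCO", "CCN"], "yield": [0.4, 0.7]})
check_columns(experiments, feature_keys)  # all keys present, so nothing is raised
```

Failing early in the constructor keeps the later lookup logic free of checks for missing columns.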

jduerholt · 2023-08-23T15:19:54Z

bofire/benchmarks/molecule_benchmark.py

+        self.main_file[self.benchmark["output"]] = (
+            df[self.benchmark["output"]].dropna().to_numpy().reshape(-1, 1)
+        )
+        input_feature = CategoricalMolecularInput(


This is then all obsolete, as you get the domain directly from the user.

jduerholt · 2023-08-23T15:22:03Z

bofire/benchmarks/molecule_benchmark.py

+from bofire.data_models.objectives.api import MaximizeObjective
+
+
+class Molecule_benchmark(Benchmark):


I would make this more agnostic and not have it just for molecules. It is a "LookUpTableBenchmark": we pass some combination of inputs to f and return the value from the table/dataframe, so it can also work for data other than molecules. Of course, we have to raise an error in _f if the combination is not found.
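To illustrate the idea, here is a rough sketch of a data-agnostic lookup, assuming the table fits in memory and each input combination appears at most once. The helpers `build_lookup` and `evaluate`, and the column names, are hypothetical stand-ins and not the BoFire implementation.

```python
import pandas as pd


def build_lookup(table, input_keys, output_keys):
    """Map each input combination (as a tuple) to its output values."""
    inputs = table[input_keys].itertuples(index=False, name=None)
    outputs = table[output_keys].to_dict("records")
    return dict(zip(inputs, outputs))


def evaluate(X, lookup, input_keys, output_keys):
    """Return the stored outputs for every row of X, or raise if a row is unknown."""
    results = []
    for combo in X[input_keys].itertuples(index=False, name=None):
        if combo not in lookup:
            raise ValueError(f"Input combination {combo} not found in lookup table")
        results.append(lookup[combo])
    return pd.DataFrame(results, columns=output_keys)


# Hypothetical data: nothing here is specific to molecules.
table = pd.DataFrame(
    {"solvent": ["water", "ethanol"], "catalyst": ["Pd", "Ni"], "yield": [0.4, 0.7]}
)
input_keys, output_keys = ["solvent", "catalyst"], ["yield"]
lookup = build_lookup(table, input_keys, output_keys)
Y = evaluate(table[input_keys].iloc[[1]], lookup, input_keys, output_keys)
```

The explicit loop is only there to make the "found or raise" contract visible; a vectorized variant is preferable for larger tables.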

jduerholt · 2023-08-23T15:22:49Z

bofire/benchmarks/molecule_benchmark.py

+
+        Returns:
+            pd.DataFrame: output values of the function. Columns are benchmark["output"] and valid_benchmark["output"].
+        """


Raise an error if the input combination from X is not found in the data.

jduerholt

Hi Swagata, I left some comments.

jduerholt · 2023-08-25T07:21:47Z

bofire/benchmarks/LookUpTableBenchmark.py

+    def __init__(
+        self,
+        domain: Domain,
+        LookUpTable: pd.DataFrame,


Suggested change

-        LookUpTable: pd.DataFrame,
+        lookup_table: pd.DataFrame,

Just call it lookup_table; arguments are always snake case. You also have to adjust its usages below.

jduerholt · 2023-08-25T07:22:13Z

bofire/benchmarks/LookUpTableBenchmark.py

+from bofire.data_models.domain.api import Domain
+
+
+class LookUpTableBenchmark(Benchmark):


Suggested change

-class LookUpTableBenchmark(Benchmark):
+class LookupTableBenchmark(Benchmark):

It is one word "lookup".

jduerholt · 2023-08-25T07:28:57Z

bofire/benchmarks/LookUpTableBenchmark.py

+        location = []
+        for i in X.index:
+            condition = np.ones(len(self.LookUpTable), dtype=bool)
+            for k in self.domain.inputs.get_keys():


This looping over the single entries is kind of cumbersome. Try something like this:

import pandas as pd

# create a sample dataframe
df = pd.DataFrame({'A': [1, 2, 3], 'B': ['a', 'b', 'c'], 'C': [True, False, True]})

# create a sample series
s = pd.Series([2, 'b', False])

# check if the values of the series show up in a row of the dataframe
mask = df.isin(s)

# get the index of the rows that match the values in the series
index = mask.index[mask.all(axis=1)]
print(index)

If a row matches all values, you can then use the index with loc to return the correct output values.

swagataroy123 · 2023-08-25

isin() does not work when several input keys have similar categories. I used pd.merge instead, which is also a nice substitute for for loops.
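The difference can be sketched as follows, under the assumption that two input columns share categories. The column names, data, and the helper `lookup` are hypothetical; only `DataFrame.isin` and `DataFrame.merge` with `indicator=True` are real pandas calls.

```python
import pandas as pd

# Two input columns that share the categories "a" and "b".
lookup_table = pd.DataFrame({"A": ["a", "a"], "B": ["b", "c"], "y": [1.0, 2.0]})
input_keys = ["A", "B"]  # hypothetical input keys

# Query a combination that is NOT in the table: A == "b" and B == "a".
X = pd.DataFrame({"A": ["b"], "B": ["a"]})

# isin() with a list only tests membership and ignores which column a value
# came from, so the row (A="a", B="b") is wrongly reported as a match.
false_match = lookup_table[input_keys].isin(X.iloc[0].tolist()).all(axis=1)


def lookup(X, lookup_table, input_keys, output_keys):
    """Left-merge the queries onto the table and raise for unmatched rows."""
    merged = X[input_keys].merge(
        lookup_table, on=input_keys, how="left", indicator=True
    )
    missing = merged.loc[merged["_merge"] == "left_only", input_keys]
    if not missing.empty:
        raise ValueError(f"Input combinations not found in lookup table:\n{missing}")
    return merged[output_keys]


# A combination that does exist is resolved column by column.
Y = lookup(pd.DataFrame({"A": ["a"], "B": ["c"]}), lookup_table, input_keys, ["y"])
```

A left merge keeps the query order, but it would return extra rows if the table contained duplicate input combinations, so this sketch assumes they are unique.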

jduerholt · 2023-08-25T07:30:32Z

bofire/benchmarks/LookUpTableBenchmark.py

+        X_temp = self.LookUpTable.loc[location]
+        X_temp.index = pd.RangeIndex(len(X_temp))
+        Y = pd.DataFrame()
+        for k in self.domain.outputs.get_keys():


The dataframe that is read in should already have the valid keywords. Once you have the complete set of matching indices, you can just loc the columns and rows that you need and return them with a reset index.
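As a small illustration of that last step, assuming the matching row labels have already been determined. The table, `output_keys`, and `matching_index` below are made-up values.

```python
import pandas as pd

# Hypothetical lookup table whose columns already use the domain's keys.
lookup_table = pd.DataFrame(
    {"x1": ["a", "b", "c"], "x2": [1, 2, 3], "y": [0.1, 0.2, 0.3]}
)
output_keys = ["y"]      # assumed output keys
matching_index = [2, 0]  # row labels matched for the queried inputs, in query order

# Select rows and output columns in one loc call, then renumber from zero.
Y = lookup_table.loc[matching_index, output_keys].reset_index(drop=True)
```

Passing `drop=True` discards the old row labels instead of adding them back as a column.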

jduerholt · 2023-08-28T07:50:51Z

bofire/benchmarks/LookupTableBenchmark.py

+                df._merge == "left_only", df.columns != "_merge"
+            ].proxy_index.to_list()
+            raise ValueError(f"Input combination {indices} not found in Look up table")
+        Y = X_temp[self.domain.outputs.get_keys()]


Do it in one line ;)

jduerholt

Looks good. Thanks!

swagataroy123 and others added 8 commits August 23, 2023 13:53

Test Bo on molecules

764babb

Merge branch 'experimental-design:main' into molecule

f044fb2

check path

a6a1b3a

change path

189affc

test path in git

b294e5e

test path in git

e9442c7

changed path

4611394

full file path

858acfc

jduerholt requested changes Aug 23, 2023

swagataroy123 added 2 commits August 24, 2023 17:55

changes

90f55b7

changes with git path

f73146e

jduerholt requested changes Aug 25, 2023

swagataroy123 and others added 6 commits August 25, 2023 14:47

changes 2

451e743

changes 2.1

58767a4

Merge branch 'experimental-design:main' into molecule

02e97d7

Rename LookUpTableBenchmark.py to LookupTableBenchmark.py

cf6e02e

Rename test_LookUpTable_benchmark.py to test_LookupTable_benchmark.py

07bf62b

Merge branch 'experimental-design:main' into molecule

9e6e158

jduerholt reviewed Aug 28, 2023

swagataroy123 and others added 9 commits August 28, 2023 10:13

Merge branch 'experimental-design:main' into molecule

668e080

changes final 1.0

32805f3

Merge branch 'molecule' of https://github.com/swagataroy123/bofire in…

0b8c724

…to molecule

lint check on json file

e73ad03

pyright changes

a7fdff9

pyright

6f315d6

pyright 2.0

f66c09c

pyright 3.0

ce77c5a

changes final

fcab950

jduerholt approved these changes Aug 28, 2023

jduerholt merged commit b7fc7a7 into experimental-design:main Aug 28, 2023
