Tanimoto Random Features

This repository accompanies the paper "Tanimoto Random Features for Scalable Molecular Machine Learning", published at NeurIPS 2023. It contains a minimal Python implementation of the methods described in the paper, code to reproduce the experiments and plots, and all numerical results presented in the paper.

The purpose of this code is to reproduce the results of the paper in a simple way, not to provide the best possible Tanimoto random features. This means:

The code will not be updated with future improvements to the method (e.g. newer, more accurate random features), since such improvements are not part of the original paper.
If you wish to deploy Tanimoto random features in practice, treat this code as a starting point and expect to modify or improve it.

If you use this code or wish to deploy it, feel free to contact Austin via email or open a GitHub issue.

Code overview

The layout of this repository is as follows:

trf23/: a minimal python package implementing Tanimoto random features and Tanimoto Gaussian processes
tests/: code to test trf23
experiment_scripts/: main python scripts to run experiments
official_results/: the results of all experiments reported in the paper
plotting_scripts/: python scripts to make plots
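For readers new to the topic, the sketch below illustrates the quantity that Tanimoto random features approximate: the Tanimoto (MinMax) similarity between two non-negative fingerprint vectors, computed exactly. It is a dependency-free illustration only; the function names `tanimoto_similarity` and `tanimoto_kernel_matrix` are hypothetical and are not the trf23 API, and returning 1.0 for two all-zero vectors is an assumed convention.

```python
from typing import List, Sequence


def tanimoto_similarity(x: Sequence[float], y: Sequence[float]) -> float:
    """Exact MinMax (generalized Tanimoto) similarity of two non-negative vectors.

    For binary fingerprints this reduces to |intersection| / |union|.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(v < 0 for v in x) or any(v < 0 for v in y):
        raise ValueError("entries must be non-negative")
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))
    if denominator == 0:
        # Both vectors are all-zero; treating them as identical is an assumption.
        return 1.0
    return numerator / denominator


def tanimoto_kernel_matrix(
    xs: Sequence[Sequence[float]], ys: Sequence[Sequence[float]]
) -> List[List[float]]:
    """Pairwise similarities; entry [i][j] compares xs[i] with ys[j]."""
    return [[tanimoto_similarity(x, y) for y in ys] for x in xs]
```

Building this matrix exactly costs time quadratic in the number of molecules, which is the cost that random feature approximations are meant to avoid on large datasets.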

Running instructions

Experiments

First, set up a python environment. We provide two files to help with this:

environment.yml: a minimal conda environment file
environment-exact.yml: the exact conda environment used to run the experiments

The easiest thing to do is create a new conda environment:

conda env create -f environment.yml
conda activate trf23

However, feel free to set up the environment any way you like: the results should not be highly dependent on which exact versions of the packages are used. The remaining instructions assume you have a python environment set up.

To check your Python environment, run the tests as described under Testing (python -m pytest tests/).

Random feature analysis

To reproduce the random feature analysis, run:

bash run_random_feature_analysis.sh

This will write outputs to results/random_feature_analysis.

Regression

The arguments for the regression experiments are more complicated, so we provide a python script which prints the commands to launch the experiments. The experiments can be run in parallel, for example by running:

python print_regression_expt_commands.py | xargs -I {} -P 2 bash -c {}

This will write outputs to results/regression.
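If xargs is unavailable on your system, the same pattern (take one command per line and run a fixed number at a time) can be sketched in Python. This is an illustrative helper and not part of the repository: the name `run_commands` is hypothetical, and it runs each command through the default system shell rather than bash.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor
from typing import Iterable, List


def run_commands(commands: Iterable[str], max_workers: int = 2) -> List[int]:
    """Run shell commands concurrently; return exit codes in input order."""
    # Drop blank lines so piped-in text with trailing newlines is handled.
    cleaned = [c.strip() for c in commands if c.strip()]

    def run_one(command: str) -> int:
        # shell=True uses the default shell (/bin/sh on POSIX), not bash.
        # Only pass commands you trust.
        return subprocess.run(command, shell=True).returncode

    # max_workers plays the role of the -P argument to xargs.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, cleaned))
```

One way to use it is to pipe the output of print_regression_expt_commands.py into a small script that calls `run_commands(sys.stdin, max_workers=2)` and then checks that every returned exit code is 0.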

Bayesian optimization

Similar to the regression experiments, these experiments can be launched by running a script to print commands:

python print_bo_expt_commands.py | xargs -I {} -P 2 bash -c {}

Plotting

Plots can be generated using the following commands:

python plotting_scripts/random_feature_analysis.py --results_dir official_results/random_feature_analysis --output_dir plots/  # random features
python plotting_scripts/tabulate_regression_results.py --results_dir official_results/regression --output_dir plots/  # regression
python plotting_scripts/bo_experiments.py --results_dir official_results/bo/F2 --output_dir plots/  # BO

Citation

If you find this work useful we would appreciate a citation! Until the NeurIPS 2023 proceedings are released, feel free to cite our arXiv version:

@article{tripp2023tanimoto,
  title={Tanimoto Random Features for Scalable Molecular Machine Learning},
  author={Tripp, Austin and Bacallado, Sergio and Singh, Sukriti and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel},
  journal={arXiv preprint arXiv:2306.14809},
  year={2023}
}

Development

Although this repo is unlikely to be actively developed, we nonetheless encourage the use of pre-commit and testing.

Formatting

Use pre-commit to enforce formatting, large file checks, etc.

If not already installed in your environment, run:

conda install pre-commit

To install the pre-commit hooks:

pre-commit install

Testing

We use pytest to run tests. Install pytest and run:

python -m pytest tests/
