This repository provides code to accompany the paper "Tanimoto Random Features for Scalable Molecular Machine Learning", published at NeurIPS 2023. It contains a minimal Python implementation of the methods described in the paper, code to reproduce the experimental results, all numerical results presented in the paper, and code to reproduce the plots.
The purpose of this code is to reproduce the results of the paper in a simple way, not to provide the best possible Tanimoto random features. This means:
- The code will not be updated with future improvements to the method (e.g. newer, more accurate random features), since such improvements are not part of the original paper.
- If you wish to deploy Tanimoto random features in practice, you should probably modify/improve this code (even though it could be a good starting point).
If you use this code or wish to deploy it, feel free to contact Austin via email or open a GitHub issue.
The layout of this repository is as follows:
- trf23/: a minimal python package implementing Tanimoto random features and Tanimoto Gaussian processes
- tests/: code to test trf23
- experiment_scripts/: main python scripts to run experiments
- official_results/: the results of all experiments performed in this paper
- plotting_scripts/: python scripts to make plots
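For readers new to the topic, the quantity that Tanimoto random features approximate can be written in a few lines. The sketch below computes the min-max form of the Tanimoto similarity between two non-negative vectors, which reduces to intersection over union for binary fingerprints. It is not code from the trf23 package: the function name `tanimoto_similarity` is hypothetical, and returning 1.0 for two all-zero vectors is an assumed convention.

```python
def tanimoto_similarity(x, y):
    """Min-max Tanimoto similarity between two non-negative vectors.

    Illustrative only; this is not the trf23 API.
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    if any(value < 0 for value in (*x, *y)):
        raise ValueError("entries must be non-negative")

    # Sum of element-wise minima over sum of element-wise maxima.
    numerator = sum(min(a, b) for a, b in zip(x, y))
    denominator = sum(max(a, b) for a, b in zip(x, y))

    # Assumed convention: two all-zero vectors count as identical.
    if denominator == 0:
        return 1.0
    return numerator / denominator


if __name__ == "__main__":
    # Binary fingerprints: one shared "on" bit out of three distinct "on" bits.
    print(tanimoto_similarity([1, 1, 0, 0], [1, 0, 1, 0]))
    # Count fingerprints: minima sum to 2, maxima sum to 4.
    print(tanimoto_similarity([2, 0, 1], [1, 1, 1]))
```

Computing this exactly for every pair of molecules costs time quadratic in the dataset size, which is the scaling problem that random features are meant to address.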
First, set up a python environment. We provide two files to help with this:
- environment.yml: a minimal conda environment file
- environment-exact.yml: the exact conda environment used to run the experiments
The easiest thing to do is create a new conda environment:
conda env create -f environment.yml
conda activate trf23
However, feel free to set up the environment any way you like: the results should not be highly dependent on which exact versions of the packages are used. The remaining instructions assume you have a python environment set up.
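If you build the environment by hand, it can help to record which package versions you ended up with so they can be compared against environment-exact.yml. The following is a minimal sketch using only the standard library; the helper name `report_versions` is hypothetical, and the package list in the example contains only the two tools named in this README, so extend it with whatever packages you care about.

```python
from importlib import metadata


def report_versions(package_names):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in package_names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Illustrative list only; it is not the repository's full dependency list.
    for name, version in report_versions(["pytest", "pre-commit"]).items():
        print(f"{name}: {version if version is not None else 'not installed'}")
```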
To check your python environment, you can run the tests with pytest (see the testing instructions at the end of this document).
To reproduce the random feature analysis, run:
bash run_random_feature_analysis.sh
This will write outputs to results/random_feature_analysis.
The arguments for the regression experiments are more complicated, so we provide a python script which prints the commands to launch the experiments. The experiments can be run in parallel, for example by running:
python print_regression_expt_commands.py | xargs -I {} -P 2 bash -c {}
This will write outputs to results/regression.
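If xargs is not available, or you want to collect exit codes, the same pattern can be approximated in Python. The sketch below is an assumption-laden stand-in, not part of the repository: the helpers `run_one`, `run_commands`, and `printed_commands` are hypothetical names, and it assumes the printing script emits one complete shell command per line, as the xargs invocation implies.

```python
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def run_one(command):
    """Run one command line with `bash -c` and return its exit code."""
    return subprocess.run(["bash", "-c", command]).returncode


def run_commands(commands, max_workers=2):
    """Run the commands with at most `max_workers` running at a time.

    Exit codes are returned in the same order as `commands`.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, commands))


def printed_commands(script="print_regression_expt_commands.py"):
    """Collect the non-empty lines printed by the command-printing script."""
    result = subprocess.run(
        [sys.executable, script], capture_output=True, text=True, check=True
    )
    return [line for line in result.stdout.splitlines() if line.strip()]


# Example (commented out because it would launch the full set of experiments):
# exit_codes = run_commands(printed_commands(), max_workers=2)
```

Here `max_workers` plays the role of the `-P 2` flag. Threads are sufficient because each worker only waits on a child process.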
Similar to the regression experiments, the Bayesian optimization (BO) experiments can be launched by running a script to print commands:
python print_bo_expt_commands.py | xargs -I {} -P 2 bash -c {}
Plots can be generated using the following commands:
python plotting_scripts/random_feature_analysis.py --results_dir official_results/random_feature_analysis --output_dir plots/ # random features
python plotting_scripts/tabulate_regression_results.py --results_dir official_results/regression --output_dir plots/ # regression
python plotting_scripts/bo_experiments.py --results_dir official_results/bo/F2 --output_dir plots/ # BO
If you find this work useful we would appreciate a citation! Until the NeurIPS 2023 proceedings are released, feel free to cite our arXiv version:
@article{tripp2023tanimoto,
title={Tanimoto Random Features for Scalable Molecular Machine Learning},
author={Tripp, Austin and Bacallado, Sergio and Singh, Sukriti and Hern{\'a}ndez-Lobato, Jos{\'e} Miguel},
journal={arXiv preprint arXiv:2306.14809},
year={2023}
}
Although this repo is unlikely to be actively developed, we nonetheless encourage the use of pre-commit and testing.
Use pre-commit to enforce formatting, large file checks, etc.
If not already installed in your environment, run:
conda install pre-commit
To install the pre-commit hooks:
pre-commit install
We use pytest to run tests.
Install pytest and run:
python -m pytest tests/