SuperFold is a derivative of AlphaFold2, AlphaFold-Multimer, and ColabFold with some novel improvements. It is intended only for running single-sequence predictions very quickly and takes advantage of some time and memory saving hacks found by krypton and others in the community. This package is also intended only for use by IPD labs and was written with our computing resources (digs, janelia, perlmutter, hyak [todo]) in mind.
Any publication that discloses findings arising from using this source code should contact the contributors for
citation or authorhip agreements, as some relevant code (for instance, the initial_guess
functionality) may
need to be published beforehand or concurrently since they constitute novel work.
TODO
- Download this git repo
$git clone git@github.com:rdkibler/superfold.git
$cd superfold
- Download the alphafold weights or find an existing path to the weights
$realpath /path/to/alphafold_weights/ > alphafold_weights.pth
. "params/" should be a child dir of the alphafold_weights dir- (optional, if you don't want to install pyrosetta)
$conda config --add channels https://username:password@conda.graylab.jhu.edu
use the username and password provided by Comotion when you licensed it. This is technically optional because we don't currently use pyrosetta because of silent_tools, so you could remove the pyrosetta line from the .yml and be fine $conda create --name pyroml --file pyroml.yml
$source activate ~/.conda/envs/pyroml
$pip install absl-py dm-tree tensorflow ml-collections
$pip install --upgrade "jax[cuda]" -f https://storage.googleapis.com/jax-releases/jax_releases.html
- `$pip install dm-haiku
- Compile MMalign
While the AlphaFold code is licensed under the Apache 2.0 License, the AlphaFold parameters are made available for non-commercial use only under the terms of the CC BY-NC 4.0 license. Please see the Disclaimer below for more detail.
The AlphaFold parameters are available from
https://storage.googleapis.com/alphafold/alphafold_params_2021-10-27.tar, and
are downloaded as part of the scripts/download_all_data.sh
script. This script
will download parameters for:
- 5 models which were used during CASP14, and were extensively validated for structure prediction quality (see Jumper et al. 2021, Suppl. Methods 1.12 for details).
- 5 pTM models, which were fine-tuned to produce pTM (predicted TM-score) and (PAE) predicted aligned error values alongside their structure predictions (see Jumper et al. 2021, Suppl. Methods 1.9.7 for details).
- 5 AlphaFold-Multimer models that produce pTM and PAE values alongside their structure predictions.
Place a path to the directory containing the model parameters in a file called
alphafold_weights.pth
The outputs will be saved in the directory provided via the --output_dir
flag
(defaults to output/
). The outputs include the unrelaxed structure, the relaxed structure
if the --amber_relax
flag is used, a reports.txt
metadata summary for all predictions if the --summarize
flag is used
, and prediction metadata for each prediction in individual .json files.
The contents of each output file are as follows:
*_unrelaxed.pdb
– A PDB format text file containing the predicted structure with chain IDs rearranged to best match that of the input PDB file, if provided. The b-factor column contains the per-residue pLDDT scores ranging from0
to100
, where100
means most confident. Note that, because low is better for real experimental B-factor, care must be taken when running applications that interpret the B-factor column, such as molecular replacement.*_relaxed*.pdb
– A PDB format text file containing the predicted structure, after performing an Amber relaxation procedure on the unrelaxed structure prediction (see Jumper et al. 2021, Suppl. Methods 1.8.6 for details). The chain IDs are rearranged and the b-factor column is filled like for the*_unrelaxed.pdb
files.*_prediction_results.json
– A JSON format text file containing the times taken to run each section of the AlphaFold pipeline.- Mean pLDDT scores in
mean_plddt
serve as an overall per-target monomer confidence score and is the average over all per-residue plddts per target. The range of possible values is from0
to100
, where100
means most confident). - Present only if using pTM models: predicted TMalign-score (
pTMscore
field contains a scalar). As a predictor of a global superposition metric, this score is designed to also assess whether the model is confident in the overall domain packing. - Present only if using the
--output_pae
flag and using pTM models: predicted pairwise aligned errors.pae
contains a NumPy array of shape [N_res, N_res] with the range of possible values from0
tomax_predicted_aligned_error
, where0
means most confident). This can serve for a visualisation of domain packing confidence within the structure. - Present only if using pTM models: predicted pairwise aligned error summaries.
mean_pae
: the mean of the entire [N_res, N_res] pae matrixpae_intrachain_X
: The mean of the pae matrix between residues corresponding to chain Xpae_interchain_XY
: The mean of the pae matrix between residues corresponding to chain X going to residues corresponding to chain Y and vise versa
- Mean pLDDT scores in
- Ryan Kibler
- Adam Broerman
- Phil Leung
SuperFold communicates with and/or references the following separate libraries and packages:
- Abseil
- Biopython
- Chex
- Haiku
- Immutabledict
- JAX
- matplotlib
- ML Collections
- NumPy
- OpenMM
- OpenStructure
- pandas
- pymol3d
- SciPy
- Sonnet
- TensorFlow
- Tree
- tqdm
- MMalign
We thank all their contributors and maintainers!
This is not an officially supported Google product.
Copyright 2021 DeepMind Technologies Limited.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at https://www.apache.org/licenses/LICENSE-2.0.
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.
The AlphaFold parameters are made available for non-commercial use only, under the terms of the Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0) license. You can find details at: https://creativecommons.org/licenses/by-nc/4.0/legalcode
Use of the third-party software, libraries or code referred to in the Acknowledgements section above may be governed by separate terms and conditions or license provisions. Your use of the third-party software, libraries or code is subject to any such terms and you should check that you can comply with any applicable restrictions or terms and conditions before use.
MM-align (Version 20210816): complex structure alignment
References: S Mukherjee, Y Zhang. Nucl Acids Res 37(11):e83 (2009)
Please email comments and suggestions to yangzhanglab@umich.edu