Ultra-fast in-silico structure mutation

License

MIT (LICENSE)
MIT (LICENSE_RoseTTAFold)
EMBER3D

Konstantin Weissenow, Michael Heinzinger, Burkhard Rost

Technical University of Munich

This repository contains the code for the EMBER3D protein structure and mutation effect prediction system. EMBER3D is currently provided as a prototype release for preview purposes. The system is still under active development.

A Google Colab notebook for structure prediction and the rendering of protein mutation movies (PMM) can be found here.

System requirements & installation

Please install EMBER3D on a Linux machine.

Create a new virtual environment, e.g. using conda:

conda create -n EMBER3D python=3.8
conda activate EMBER3D

If you use CUDA 11, you can install the dependencies from the provided requirements.txt (this takes a couple of minutes on a typical desktop computer, depending on your internet connection):

pip install -r requirements.txt

If you plan to render protein mutation movies, you additionally need PyMOL and ffmpeg, which can be installed with conda, e.g.:

conda install -c conda-forge ffmpeg
conda install -c conda-forge pymol-open-source

Note for different CUDA versions: We do not yet provide package lists for other CUDA versions. If you use a different version, please install the following packages with pip or conda:

torch (1.11)
dgl
pyg (aka torch-geometric)
e3nn
psutil
transformers
sentencepiece
biopython
matplotlib
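After a manual installation, it can help to check that every listed package is importable before running a prediction. The following is a minimal sketch using only the Python standard library. The import module names (for example `torch_geometric` for pyg and `Bio` for biopython) are assumptions based on how these packages are commonly imported, and the helper `missing_packages` is hypothetical and not part of EMBER3D. The check does not verify versions such as torch 1.11.

```python
import importlib.util

# Package name (as listed above) -> module name used for import.
# The module names are assumptions based on common usage of these packages.
REQUIRED = {
    "torch": "torch",
    "dgl": "dgl",
    "pyg": "torch_geometric",
    "e3nn": "e3nn",
    "psutil": "psutil",
    "transformers": "transformers",
    "sentencepiece": "sentencepiece",
    "biopython": "Bio",
    "matplotlib": "matplotlib",
}


def missing_packages(required=REQUIRED):
    """Return the package names whose import module cannot be found."""
    return [
        pkg
        for pkg, module in required.items()
        if importlib.util.find_spec(module) is None
    ]


if __name__ == "__main__":
    missing = missing_packages()
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```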

Regular prediction mode

You can compute structure predictions based on FASTA sequences using

python predict.py -i <FASTA> -o <OUTPUT_DIRECTORY>

The ProtT5 protein language model used to generate sequence embeddings is downloaded on first use (2.3 GB) and stored by default in the directory 'ProtT5-XL-U50'. You can change this directory with the --t5_model parameter. By default, the script produces PDB files and distance maps. You can disable either output with the parameters --no-pdb and --no-distance-maps, respectively.

Predictions for average-length protein sequences take less than a second, but the initial model loading causes a one-time cost of several seconds (depending on system speed). For efficiency, provide a single FASTA file with multiple sequences instead of calling the script multiple times with single-sequence inputs.
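If your sequences are spread over many single-sequence files, you could merge them into one FASTA file first, so that the model is loaded only once. Below is a minimal sketch using only the standard library. The function `merge_fasta` and the file names in the usage note are hypothetical and not part of EMBER3D.

```python
from pathlib import Path


def merge_fasta(input_paths, output_path):
    """Concatenate several FASTA files into one; return the number of records."""
    n_records = 0
    with open(output_path, "w") as out:
        for path in input_paths:
            for line in Path(path).read_text().splitlines():
                line = line.strip()
                if not line:
                    continue  # skip blank lines between or after records
                if line.startswith(">"):
                    n_records += 1  # each header line starts a new record
                out.write(line + "\n")
    return n_records
```

For example, `merge_fasta(sorted(Path("single_seqs").glob("*.fasta")), "all.fasta")` writes one combined file, which can then be passed to predict.py via -i.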

Mutation effect prediction

You can predict structures for all single amino-acid variants (SAVs) for the sequence(s) in a FASTA file using

python predict_sav.py -i <FASTA> -o <OUTPUT_DIRECTORY>
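To get a feeling for the size of the SAV space, the sketch below enumerates all single amino-acid variants of a sequence: 19 substitutions per position for a sequence of standard residues. It is only an illustration. How predict_sav.py enumerates or names variants internally is not shown here, and the label format (wild type, 1-based position, mutant, e.g. "M1A") is an assumption.

```python
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def enumerate_savs(sequence):
    """Yield (label, mutant_sequence) for every single amino-acid variant."""
    for pos, wild_type in enumerate(sequence):
        for mutant in AMINO_ACIDS:
            if mutant == wild_type:
                continue  # substituting a residue with itself is not a variant
            label = f"{wild_type}{pos + 1}{mutant}"  # 1-based position (assumed format)
            yield label, sequence[:pos] + mutant + sequence[pos + 1:]
```

A sequence of length L made of standard residues therefore yields 19 × L variants, e.g. 1,900 predictions for a 100-residue protein.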

In addition to PDB files and distance maps, the SAV prediction script computes the structural difference between the prediction for each mutant and the wild-type prediction, measured in lDDT (1.0 = most similar, 0.0 = least similar). These structure deltas are rendered as an image (mutation_matrix.png) and also written to a text file for downstream consumption (mutation_log.txt).
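lDDT compares inter-atomic distances in a reference structure with the same distances in a model, so no superposition is needed. The sketch below is a simplified, Cα-only global variant, shown purely to illustrate the idea. It is not EMBER3D's implementation. The function `ca_lddt`, the choice of the wild type as reference, and the commonly used defaults (15 Å inclusion radius, thresholds of 0.5, 1, 2 and 4 Å) are assumptions.

```python
import math


def ca_lddt(reference, model, cutoff=15.0, thresholds=(0.5, 1.0, 2.0, 4.0)):
    """Simplified global lDDT on C-alpha coordinates (lists of (x, y, z))."""
    if len(reference) != len(model):
        raise ValueError("reference and model must have the same length")
    preserved = 0
    total = 0
    n = len(reference)
    for i in range(n):
        for j in range(i + 1, n):
            d_ref = math.dist(reference[i], reference[j])
            if d_ref >= cutoff:
                continue  # only pairs close together in the reference count
            d_mod = math.dist(model[i], model[j])
            diff = abs(d_ref - d_mod)
            total += len(thresholds)
            # count the thresholds under which this distance is preserved
            preserved += sum(diff < t for t in thresholds)
    return preserved / total if total else 0.0
```

Here `reference` would hold the Cα coordinates of the wild-type prediction and `model` those of a mutant. Identical structures score 1.0.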

Mutation movie rendering

From previously computed SAV predictions (see above), you can render movies using

python render_mutation_movie.py <FASTA> <OUTPUT_DIRECTORY>

Alternatively, you can do both steps (prediction + movie rendering) at once using

./create_SAV_movie.sh <FASTA> <OUTPUT_DIRECTORY>

Example movie: 3KDE_3_C.mp4
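If you prefer to drive the two steps from Python, for example inside a larger pipeline, the documented commands can be chained with `subprocess`. This is a minimal sketch that simply runs the predict_sav.py and render_mutation_movie.py invocations shown above one after the other. It assumes it is started from the repository root, and it makes no claim about what create_SAV_movie.sh does internally. The helper names are hypothetical.

```python
import subprocess
import sys


def sav_movie_commands(fasta, output_dir, python=sys.executable):
    """Build the two documented command lines: SAV prediction, then rendering."""
    return [
        [python, "predict_sav.py", "-i", fasta, "-o", output_dir],
        [python, "render_mutation_movie.py", fasta, output_dir],
    ]


def run_sav_movie(fasta, output_dir):
    """Run both steps in order; raises CalledProcessError if a step fails."""
    for cmd in sav_movie_commands(fasta, output_dir):
        subprocess.run(cmd, check=True)
```

Using `check=True` ensures that rendering is not attempted when the prediction step fails.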

Webserver mode

You can run a simple webserver for the visualization of predictions by starting

python webserver.py

and pointing your browser to http://localhost:24398/ (or to the address of the machine the server is running on). You can change the default port number using the -d parameter when starting the webserver.

Acknowledgements

We reuse several modules of the RoseTTAFold architecture and use the SE(3)-Transformer implementation from NVIDIA.

Citing

For now, please cite this work as follows:

@article {Weissenow2022.11.14.516473,
	author = {Weissenow, Konstantin and Heinzinger, Michael and Steinegger, Martin and Rost, Burkhard},
	title = {Ultra-fast protein structure prediction to capture effects of sequence variation in mutation movies},
	elocation-id = {2022.11.14.516473},
	year = {2022},
	doi = {10.1101/2022.11.14.516473},
	publisher = {Cold Spring Harbor Laboratory},
	URL = {https://www.biorxiv.org/content/early/2022/11/16/2022.11.14.516473},
	eprint = {https://www.biorxiv.org/content/early/2022/11/16/2022.11.14.516473.full.pdf},
	journal = {bioRxiv}
}
