AlphaFold

See FASRC Docs

Running AlphaFold Examples

We recommend running AlphaFold on GPU partitions: AlphaFold is optimized for GPUs and runs much faster there than on CPUs alone. See slurm partitions for the specifics of each partition.

Below you will find an example slurm script, run_alphafold.sh, that uses the fasta file 5ZE6_1.fasta. This example assumes that run_alphafold.sh and the fasta file named in my_fasta are located in the same directory. If they are located in different directories, you will have to edit my_fasta_path.

In the run_alphafold.sh script, you will have to edit:

  • SBATCH directives to suit your needs (e.g. time -t, number of cores -c, amount of memory --mem)
  • my_fasta to point to your own fasta file
  • (optional) my_output_dir if you would like your output to go somewhere else
  • (optional) my_fasta_path
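
As a minimal sketch of the last bullet, the snippet below builds a full path for a fasta file that lives outside the submission directory and checks that it exists. The directory name `inputs` and the variable `my_fasta_dir` are hypothetical and not part of the original scripts. Note that the monomer script passes `$my_fasta` directly to `--fasta_paths`, so there the full path would have to be passed to that flag instead.

```shell
#!/bin/bash
# Hypothetical layout: the fasta file is not in the directory you submit from
my_fasta=5ZE6_1.fasta
my_fasta_dir="$PWD/inputs"              # hypothetical directory, adjust to your own
my_fasta_path="$my_fasta_dir/$my_fasta"

# Report early whether the file can be found, before any job is submitted
if [ -f "$my_fasta_path" ]; then
    echo "Using fasta file: $my_fasta_path"
else
    echo "Fasta file not found: $my_fasta_path" >&2
fi
```

In the script itself, `--fasta_paths` would then receive `$my_fasta_path`.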

Note: AlphaFold screen output goes to the stderr file (.err) rather than the stdout file (.out).
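
Because progress messages land in the `.err` file, a quick way to check on a job is to look at the end of the newest one. The helper below is a sketch under that assumption; the function name `show_latest_err` is made up for illustration, and the `AF_mono_` prefix matches the `#SBATCH -e` pattern used in the monomer script.

```shell
#!/bin/bash
# Print the last 20 lines of the most recently modified <prefix>*.err file
show_latest_err() {
    local prefix="$1" latest
    # ls -t sorts by modification time, newest first
    latest=$(ls -t "${prefix}"*.err 2>/dev/null | head -n 1)
    if [ -n "$latest" ]; then
        echo "== $latest =="
        tail -n 20 "$latest"
    else
        echo "No ${prefix}*.err file found yet"
    fi
}

show_latest_err AF_mono_
```

For the multimer script, the prefix would be `AF_multi_`.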

Monomer batch job

This example takes about 1 hour to run on Cannon in the gpu partition with 8 cores (-c 8).

Slurm script

#!/bin/bash
#SBATCH -J AF_monomer # Job name
#SBATCH -p gpu # Partition(s) (separate with commas if using multiple)
#SBATCH --gres=gpu:1 # number of GPUs
#SBATCH -c 8 # Number of cores
#SBATCH -t 03:00:00 # Time (D-HH:MM:SS)
#SBATCH --mem=60G # Memory
#SBATCH -o AF_mono_%j.out # Name of standard output file
#SBATCH -e AF_mono_%j.err # Name of standard error file
# set fasta file name
# NOTE: assumes this is in the directory you are running this script in
# note that you can run multiple proteins _sequentially_ (with the same model type)
# the names need to be provided as "protein1.fasta,protein2.fasta"
# if running multimer, provide one multifasta file
# indicate oligomeric state by including extra copies of a sequence
# they still require different _names_ though
my_fasta=5ZE6_1.fasta
# create and set path of output directory
my_output_dir=output
mkdir -p $my_output_dir
# set model type (monomer, multimer, monomer_casp14, monomer_ptm)
# see notes under fasta file if running multimer
my_model_type=monomer
# max pdb age
# use if you want to avoid recent templates
# format yyyy-mm-dd
my_max_date="2100-01-01"
# run AlphaFold monomer using Singularity
singularity run --nv --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=$SLURM_CPUS_PER_TASK,LD_LIBRARY_PATH=/usr/local/cuda-11.1/targets/x86_64-linux/lib/ --bind /n/holylfs04-ssd2/LABS/FAS/alphafold_database:/data /n/singularity_images/FAS/alphafold/alphafold_2.3.1.sif \
--data_dir=/data/ \
--bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--db_preset=full_dbs \
--fasta_paths=$my_fasta \
--max_template_date=$my_max_date \
--mgnify_database_path=/data/mgnify/mgy_clusters_2022_05.fa \
--model_preset=$my_model_type \
--obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
--output_dir=$my_output_dir \
--pdb70_database_path=/data/pdb70/pdb70 \
--template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
--uniref30_database_path=/data/uniref30/UniRef30_2021_03 \
--uniref90_database_path=/data/uniref90/uniref90.fasta \
--use_gpu_relax=True

Fasta file

>5ZE6_1
MNLEKINELTAQDMAGVNAAILEQLNSDVQLINQLGYYIVSGGGKRIRPMIAVLAARAVGYEGNAHVTIAALIEFIHTATLLHDDVVDESDMRRGKATANAAFGNAASVLVGDFIYTRAFQMMTSLGSLKVLEVMSEAVNVIAEGEVLQLMNVNDPDITEENYMRVIYSKTARLFEAAAQCSGILAGCTPEEEKGLQDYGRYLGTAFQLIDDLLDYNADGEQLGKNVGDDLNEGKPTLPLLHAMHHGTPEQAQMIRTAIEQGNGRHLLEPVLEAMNACGSLEWTRQRAEEEADKAIAALQVLPDTPWREALIGLAHIAVQRDR
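
Before submitting, it can help to confirm how many sequences a fasta file holds, since a monomer run is expected to contain a single record like the one above. The sketch below counts header lines (those starting with `>`); the function name `count_fasta_records` is hypothetical.

```shell
#!/bin/bash
# Count fasta records by counting header lines that start with ">"
count_fasta_records() {
    grep -c '^>' "$1"
}

my_fasta=5ZE6_1.fasta
if [ -f "$my_fasta" ]; then
    echo "$my_fasta contains $(count_fasta_records "$my_fasta") sequence record(s)"
else
    echo "$my_fasta not found in $PWD" >&2
fi
```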

Multimer batch job

This example takes about 1-2 hours to run on Cannon in the gpu partition with 8 cores (-c 8).

Slurm script

#!/bin/bash
#SBATCH -J AF_multimer # Job name
#SBATCH -p gpu # Partition(s) (separate with commas if using multiple)
#SBATCH --gres=gpu:1 # number of GPUs
#SBATCH -c 8 # Number of cores
#SBATCH -t 03:00:00 # Time (D-HH:MM:SS)
#SBATCH --mem=60G # Memory
#SBATCH -o AF_multi_%j.out # Name of standard output file
#SBATCH -e AF_multi_%j.err # Name of standard error file
# set fasta file name
# NOTE: assumes this is in the directory you are running this script in
# note that you can run multiple proteins _sequentially_ (with the same model type)
# the names need to be provided as "protein1.fasta,protein2.fasta"
# if running multimer, provide one multifasta file
# indicate oligomeric state by including extra copies of a sequence
# they still require different _names_ though
my_fasta=T1083_T1084.fasta
# set fasta-specific subfolder and filepath
# handling different possible .fasta suffixes
# (strip only a trailing suffix, so other parts of the name are left intact)
fasta_name="${my_fasta%.fasta}"
fasta_name="${fasta_name%.faa}"
fasta_name="${fasta_name%.fa}"
mkdir -p $fasta_name
cp $my_fasta $PWD/$fasta_name
my_fasta_path=$PWD/$fasta_name/$my_fasta
# create and set path of output directory
my_output_dir=af2_out
mkdir -p $PWD/$fasta_name/$my_output_dir
my_output_dir_path=$PWD/$fasta_name/$my_output_dir
# set model type (monomer, multimer, monomer_casp14, monomer_ptm)
# see notes under fasta file if running multimer
my_model_type=multimer
# max pdb age
# use if you want to avoid recent templates
# format yyyy-mm-dd
my_max_date="2100-01-01"
# run AlphaFold multimer using Singularity
singularity run --nv --env TF_FORCE_UNIFIED_MEMORY=1,XLA_PYTHON_CLIENT_MEM_FRACTION=4.0,OPENMM_CPU_THREADS=$SLURM_CPUS_PER_TASK,LD_LIBRARY_PATH=/usr/local/cuda-11.1/targets/x86_64-linux/lib/ --bind /n/holylfs04-ssd2/LABS/FAS/alphafold_database:/data -B .:/etc --pwd /app/alphafold /n/singularity_images/FAS/alphafold/alphafold_2.3.1.sif \
--data_dir=/data/ \
--bfd_database_path=/data/bfd/bfd_metaclust_clu_complete_id30_c90_final_seq.sorted_opt \
--db_preset=full_dbs \
--fasta_paths=$my_fasta_path \
--max_template_date=$my_max_date \
--mgnify_database_path=/data/mgnify/mgy_clusters_2022_05.fa \
--model_preset=$my_model_type \
--obsolete_pdbs_path=/data/pdb_mmcif/obsolete.dat \
--output_dir=$my_output_dir_path \
--template_mmcif_dir=/data/pdb_mmcif/mmcif_files \
--uniref30_database_path=/data/uniref30/UniRef30_2021_03 \
--uniref90_database_path=/data/uniref90/uniref90.fasta \
--pdb_seqres_database_path=/data/pdb_seqres/pdb_seqres.txt \
--uniprot_database_path=/data/uniprot/uniprot.fasta \
--use_gpu_relax=True

Fasta file

>T1083
GAMGSEIEHIEEAIANAKTKADHERLVAHYEEEAKRLEKKSEEYQELAKVYKKITDVYPNIRSYMVLHYQNLTRRYKEAAEENRALAKLHHELAIVED
>T1084
MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH
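
The script comments note that an oligomeric state is indicated by including extra copies of a sequence, each under a different name. As a sketch of that idea, the helper below prints a multifasta with a given number of copies of one chain; the function name `make_homo_oligomer` and the `_1`, `_2` naming scheme are assumptions made for illustration, not a convention required by AlphaFold.

```shell
#!/bin/bash
# Print <copies> fasta records of the same sequence, each with a distinct name
make_homo_oligomer() {
    local name="$1" seq="$2" copies="$3" i
    for ((i = 1; i <= copies; i++)); do
        printf '>%s_%d\n%s\n' "$name" "$i" "$seq"
    done
}

# Example: two copies of the T1084 chain shown above
make_homo_oligomer T1084 \
    MAAHKGAEHHHKAAEHHEQAAKHHHAAAEHHEKGEHEQAAHHADTAYAHHKHAEEHAAQAAKHDAEHHAPKPH 2
```

Redirecting the output to a file (for example `> T1084_dimer.fasta`, a hypothetical name) gives a multifasta that could be set as `my_fasta` in the multimer script.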

Submitting a slurm batch job that runs AlphaFold

Log in to Cannon (see login instructions). Go to the directory where run_alphafold.sh (or run_alphafold_multi.sh) is located. Then submit a slurm batch job with one of the following commands:

# monomer job
sbatch run_alphafold.sh

# multimer job
sbatch run_alphafold_multi.sh
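
After submitting, you will usually want the job ID so you can check the queue and find the log files. The sketch below relies on `sbatch --parsable`, which prints only the job ID, and on the `%j` pattern in the `#SBATCH -o` and `-e` lines above; the helper name `log_names_for_job` is hypothetical. The submission step is skipped when `sbatch` is not available.

```shell
#!/bin/bash
# Build the log file names produced by "-o <prefix>_%j.out" and "-e <prefix>_%j.err"
log_names_for_job() {
    local prefix="$1" job_id="$2"
    echo "${prefix}_${job_id}.out ${prefix}_${job_id}.err"
}

if command -v sbatch >/dev/null 2>&1; then
    # --parsable prints "jobid" (or "jobid;cluster"); keep only the job ID
    job_id=$(sbatch --parsable run_alphafold.sh | cut -d';' -f1)
    squeue -j "$job_id"                      # show the job's state in the queue
    log_names_for_job AF_mono "$job_id"      # where stdout and stderr will be written
else
    echo "sbatch not found; run this on a Cannon login node"
fi
```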