samplchallenges · davidlmobley · Dec 1, 2021 · Dec 1, 2021
diff --git a/host_guest/Analysis/Submissions/WP6-ponder.txt b/host_guest/Analysis/Submissions/WP6-ponder.txt
@@ -0,0 +1,108 @@
+#
+# Results for SAMPL9 WP6 Host-Guest Challenge
+#
+# PREDICTIONS
+#
+Predictions:
+WP6-G1,   -5.56, 0.07, 2.0,,,
+WP6-G2,  -11.57, 0.08, 2.0,,,
+WP6-G3,   -6.13, 0.05, 2.0,,,
+WP6-G4,   -4.75, 0.05, 2.0,,,
+WP6-G5,   -5.05, 0.09, 2.0,,,
+WP6-G6,   -5.40, 0.06, 2.0,,,
+WP6-G7,   -4.93, 0.05, 2.0,,,
+WP6-G8,   -1.20, 0.07, 2.0,,,
+WP6-G9,   -4.24, 0.06, 2.0,,,
+WP6-G10,  -9.47, 0.07, 2.0,,,
+WP6-G11,  -5.57, 0.05, 2.0,,,
+WP6-G12, -10.97, 0.10, 2.0,,,
+WP6-G13, -15.33, 0.06, 2.0,,,
+
+#
+# PARTICIPANT NAME
+#
+Participant name:
+Jay Ponder
+
+#
+# PARTICIPANT ORGANIZATION
+#
+Participant organization:
+Washington University in St. Louis
+
+#
+# NAME OF METHOD
+#
+Name:
+DDM/AMOEBA/BAR
+
+#
+# SOFTWARE
+#
+Software:
+Tinker8 V8.10 (CPU)
+Tinker9 V1.0 (GPU)
+Psi4 V1.4
+
+#
+# METHODOLOGY
+#
+Method:
+We have computed absolute binding free energies for all the WP6 host-guest
+systems via explicit solvent all-atom molecular dynamics simulations using
+a standard double decoupling protocol and the polarizable atomic multipole
+AMOEBA force field. All simulations were performed with the Tinker8 and
+Tinker9 software running on CPUs and GPUs, respectively.
+AMOEBA parameters were generated manually by
+members of the Ponder lab, or via the AMOEBA FORGE parameterization engine
+developed by Chris Ho in collaboration with the Ponder lab. Our standard
+parameterization protocols and guidelines from the published literature
+were followed. Each guest was modeled as either a mono-cation or di-cation
+as appropriate. The WP6 host was parameterized as all carboxylates and with
+a total charge of -12. A 1:1 stoichiometry was assumed for each complex.
+
+For each guest, a series of MD simulations were performed starting from the
+guest in water (solvation leg) and from the host-guest complex in water
+(host-guest leg). In both legs a series of windows was used to first annihilate
+electrostatics in the guest, followed by decoupling of guest vdw interactions.
+The calculations were performed on initial 50 Ang cubic systems under the NPT
+ensemble, and with twelve chloride ions added to the solvation simulations
+to match the net charge of the host in the host-guest simulations. All of the
+simulations used PME for long range electrostatics, and a 9 Ang cutoff on vdw
+terms incremented by an isotropic vdw long range correction. A two-stage
+RESPA-style integrator was used for the MD with a 2 fs outer time step. MD
+trajectory snapshots were saved every 1 ps. For host-guest MD windows, a
+single flat-bottomed harmonic distance restraint between groups of atoms
+was used to maintain binding of the guest. These restraints were chosen such
+that they were not violated during unrestrained simulation runs on the bound
+host-guest complex.
+
+Each sampling window was simulated for 10 ns and the initial 1 ns was
+discarded as equilibration. The production simulations beyond the initial
+1 ns were then analyzed using the standard BAR method between adjacent
+windows to compute free energy differences. The difference between the
+sum of the solvation and host-guest legs, after analytical correction of
+the host-guest sum for release of the flat-bottomed harmonic restraint,
+was taken as the binding energy estimate. Statistical error was estimated
+for each BAR calculation, using the analytical formula suggested in Bennett's
+original paper on the BAR method. These errors were combined to get a total
+statistical error for each overall binding free energy prediction.
+
+Alternative binding poses were explored via initial simulations or full
+binding free energy calculations for some guests. For guests 3 and 12,
+which are chiral, we computed the binding energy of each guest enantiomer to
+a single enantiomer of the host. For both of these guests the binding free
+energy was similar for the two stereoisomers, and we have chosen to report
+the value for the tighter binding stereoisomer.
+
+#
+# METHOD CATEGORY
+#
+Category:
+Alchemical
+
+#
+# RANKED PREDICTION
+#
+Ranked:
+True
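
The free-energy bookkeeping described in the WP6-ponder.txt method text above can be sketched roughly as follows. This is a minimal illustration, not the Tinker workflow used for the submission. Several things here are assumptions: the temperature (298.15 K), the restraint energy form `0.5*k*(r - r0)**2` beyond the flat region, the sign convention (each leg is the free energy of decoupling the guest), combining window errors in quadrature, and all function names.

```python
import math

# Assumptions for illustration (not stated in the submission):
# temperature of 298.15 K, and a restraint energy of 0.5*k*(r - r0)**2 beyond r0.
KT = 0.0019872041 * 298.15      # kcal/mol
STD_VOLUME = 1660.539           # cubic Angstroms per molecule at 1 mol/L


def leg_total(windows):
    """Sum per-window (dG, error) pairs; errors are combined in quadrature."""
    total = sum(dg for dg, _ in windows)
    error = math.sqrt(sum(err * err for _, err in windows))
    return total, error


def restraint_correction(r0, k, n=2000):
    """Return -kT*ln(V_eff / V_std) for a flat-bottomed harmonic restraint.

    V_eff is the integral of 4*pi*r^2*exp(-U(r)/kT): exact inside the flat
    region (r <= r0), midpoint rule over the harmonic tail beyond r0.
    """
    v_eff = 4.0 / 3.0 * math.pi * r0 ** 3
    width = 10.0 * math.sqrt(KT / k)   # the tail is negligible beyond this
    step = width / n
    for i in range(n):
        r = r0 + (i + 0.5) * step
        u = 0.5 * k * (r - r0) ** 2
        v_eff += 4.0 * math.pi * r * r * math.exp(-u / KT) * step
    return -KT * math.log(v_eff / STD_VOLUME)


def binding_free_energy(solvation_windows, complex_windows, r0, k):
    """Combine the two decoupling legs and the standard-state restraint term."""
    dg_solv, err_solv = leg_total(solvation_windows)
    dg_cplx, err_cplx = leg_total(complex_windows)
    dg_bind = dg_solv - dg_cplx + restraint_correction(r0, k)
    return dg_bind, math.hypot(err_solv, err_cplx)
```

Each window entry would be the BAR estimate and its statistical error for one pair of adjacent windows; `r0` is in Angstroms and `k` in kcal/mol/Ang^2.
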
diff --git a/host_guest/Analysis/Submissions/WP6-voelz-lab_EE_RL_8_unranked.txt b/host_guest/Analysis/Submissions/WP6-voelz-lab_EE_RL_8_unranked.txt
diff --git a/host_guest/Analysis/Submissions/WP6-voelz-lab_EE_all_data_unranked.txt b/host_guest/Analysis/Submissions/WP6-voelz-lab_EE_all_data_unranked.txt
diff --git a/host_guest/Analysis/Submissions/WP6-voelz-lab_EE_ranked.txt b/host_guest/Analysis/Submissions/WP6-voelz-lab_EE_ranked.txt
diff --git a/host_guest/Analysis/Submissions/WP6_submissions.txt b/host_guest/Analysis/Submissions/WP6_submissions.txt
@@ -0,0 +1,115 @@
+# Results for WP6
+#
+# This file will be automatically parsed.  It must contain the following eight elements:
+# predictions, participant name, participant organization, name of method, software listing, method, method category, and ranked.
+# These elements must be provided in the order shown.
+# The file name must begin with the word "WP6" and then be followed by an underscore or dash.
+#
+# FILE FORMAT: All comment lines in this file (which begin with #) will be ignored.
+# Please use only UTF-8 characters in the non-comment fields. If your information (e.g. your name, etc.)
+# contains a non-UTF-8 character, you may note it in comments near that entry.
+#
+#
+# PREDICTIONS
+# Please explicitly describe how you handle ions and pKa effects.
+#
+# The data in each prediction line should be structured as follows, with all (up to six) numbers in kcal/mol.
+# host-guest ID (note that the host varies!), Free energy, free energy SEM, free energy model uncertainty,
+# enthalpy, enthalpy SEM, enthalpy model uncertainty
+# The free energy, free energy SEM, and free energy model uncertainty are REQUIRED.
+# The corresponding quantities for binding enthalpy are optional.
+#
+# Note that the "model uncertainty" should be your estimate of ACCURACY of this particular approach
+# for the compound considered.
+#
+#
+# The list of predictions must begin with the "Predictions:" keyword, as illustrated here.
+Predictions:
+WP6-G1, -10.2, 0.1, 1.0
+WP6-G2, -12.5, 0.1, 1.0
+WP6-G3, -9.5, 0.1, 1.0
+WP6-G5, -11.5, 0.1, 1.0
+WP6-G6, -9.8, 0.1, 1.0
+WP6-G7, -7.8, 0.1, 1.0
+WP6-G8, -6.9, 0.1, 1.0
+WP6-G9, -6.2, 0.1, 1.0
+WP6-G10, -10.7, 0.1, 1.0
+WP6-G11, -8.7, 0.1, 1.0
+WP6-G12, -11.8, 0.1, 1.0
+WP6-G13, -10.2, 0.1, 1.0
+
+#
+#
+# Please list your name, using only UTF-8 characters as described above. The "Participant name:" entry is required.
+Participant name:
+Xibing He
+#
+#
+# Please list your organization/affiliation, using only UTF-8 characters as described above.
+Participant organization:
+University of Pittsburgh
+#
+#
+# Please provide a brief (40 character limit) informal yet informative name of the method used.
+# If using an MD-based method we suggest using the format: Method/EnergyModel/WaterModel/Sampling/[Additional-details-here] , though your name must respect the 40 character limit.
+# otherwise you may create your own following the sample text; please edit to your taste.
+# The "Name:" keyword is required, as shown here.
+# 40 character limit.
+Name:
+ELIE/GAFF2-ABCG2/TIP3P/MD/MMPBSA
+#
+# All major software packages used and their versions
+# Following is sample text; please edit to your taste.
+# The "Software:" keyword is required.
+Software:
+Amber 18
+#
+# Methodology and computational details.
+# Level of detail should be at least that used in a publication.
+# Please include the values of key parameters, with units, and explain how any
+# statistical uncertainties were estimated.
+# Use as many lines of text as you need.
+# Please explicitly describe how you handle ions (e.g. counterions) and pKa effects
+# Following is sample text; please edit to your taste.
+# All text following the "Method:" keyword will be regarded as part of your free text methods description.
+Method:
+All MD simulations were performed with the pmemd.cuda program from
+AMBER18, then in-house programs/scripts were used for MM-PBSA 
+calculation and ELIE fitting.
+
+Each MD simulation ran for 100 ns. The bonded and initial Lennard-Jones
+parameters were obtained from GAFF2. Partial atomic charges were
+generated with our newly developed ABCG2 method.
+Sodium or chloride counterions were added only as needed to
+neutralize the total charge of each host-guest system; no additional
+counterions were added. The starting structures were obtained by
+docking from Glide initially, but were relaxed through short preliminary
+MD simulations. Each system was solvated with ~1900 TIP3P waters in an
+orthorhombic box whose dimensions were approximately 43 x 43 x 43
+Angstroms.
+
+Production simulations were run in the NPT ensemble, with temperature
+control using a Langevin thermostat with collision frequency 5.0 ps^-1
+and pressure control provided by the default barostat. Direct
+space nonbonded interactions were truncated with a 10.0 Angstrom cutoff,
+whereas long-range electrostatics were handled with the PME method,
+using default AMBER settings. SHAKE constraints were applied to bonds
+involving hydrogen, and the simulation time step was set to 2 fs.
+#
+#
+# METHOD CATEGORY SECTION
+#
+# State which method category your prediction method is better described as:
+# `Alchemical`, `Quantum`, `Other Physical`, `Empirical`, `Mixed`, or `Other`.
+# Pick only one category label.
+# The `Category:` keyword is required.
+Category:
+Other Physical
+#
+# All submissions must either be ranked or non-ranked.
+# Only one ranked submission per participant is allowed.
+# Multiple ranked submissions from the same participant will not be judged.
+# Non-ranked submissions are accepted so we can verify that they were made before the deadline.
+# The "Ranked:" keyword is required, and expects a Boolean value (True/False)
+Ranked:
+True
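
The template comments in the file above describe a keyword-delimited format that is parsed automatically. A minimal sketch of such a parser is given below. It is not the SAMPL analysis code: the function names are hypothetical, and only the rules stated in the template are assumed (lines beginning with `#` are ignored, sections start at the listed keywords, prediction lines carry three to six numbers, the method name is at most 40 characters, and `Ranked:` is `True` or `False`).

```python
KEYWORDS = (
    "Predictions:", "Participant name:", "Participant organization:",
    "Name:", "Software:", "Method:", "Category:", "Ranked:",
)


def parse_submission(text):
    """Split a submission file into keyword sections, skipping '#' comments."""
    sections = {key: [] for key in KEYWORDS}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        if line in KEYWORDS:
            current = line
        elif current is not None:
            sections[current].append(line)
    return sections


def parse_predictions(lines):
    """Map each host-guest ID to its list of numeric fields (3 to 6 values)."""
    predictions = {}
    for line in lines:
        fields = [field.strip() for field in line.split(",")]
        values = [float(field) for field in fields[1:] if field]
        if not 3 <= len(values) <= 6:
            raise ValueError(f"expected 3 to 6 numbers: {line!r}")
        predictions[fields[0]] = values
    return predictions


def load_submission(text):
    """Parse a submission and apply the checks stated in the template."""
    sections = parse_submission(text)
    name = " ".join(sections["Name:"])
    if len(name) > 40:
        raise ValueError("method name exceeds the 40 character limit")
    ranked = " ".join(sections["Ranked:"])
    if ranked not in ("True", "False"):
        raise ValueError("Ranked: expects True or False")
    return {
        "predictions": parse_predictions(sections["Predictions:"]),
        "name": name,
        "category": " ".join(sections["Category:"]),
        "ranked": ranked == "True",
    }
```

Empty trailing fields, as in `WP6-G1,   -5.56, 0.07, 2.0,,,`, are dropped before the numeric fields are counted, so both the three-column and six-column styles used in these submissions are accepted.
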
diff --git a/host_guest/Analysis/Submissions/WP6_submissions_dserillon.txt b/host_guest/Analysis/Submissions/WP6_submissions_dserillon.txt
@@ -0,0 +1,123 @@
+# Results for WP6
+#
+# This file will be automatically parsed.  It must contain the following eight elements:
+# predictions, participant name, participant organization, name of method, software listing, method, method category, and ranked.
+# These elements must be provided in the order shown.
+# The file name must begin with the word "WP6" and then be followed by an underscore or dash.
+#
+# FILE FORMAT: All comment lines in this file (which begin with #) will be ignored.
+# Please use only UTF-8 characters in the non-comment fields. If your information (e.g. your name, etc.)
+# contains a non-UTF-8 character, you may note it in comments near that entry.
+#
+#
+# PREDICTIONS
+# Please explicitly describe how you handle ions and pKa effects.
+#
+# The data in each prediction line should be structured as follows, with all (up to six) numbers in kcal/mol.
+# host-guest ID (note that the host varies!), Free energy, free energy SEM, free energy model uncertainty,
+# enthalpy, enthalpy SEM, enthalpy model uncertainty
+# The free energy, free energy SEM, and free energy model uncertainty are REQUIRED.
+# The corresponding quantities for binding enthalpy are optional.
+#
+# Note that the "model uncertainty" should be your estimate of ACCURACY of this particular approach
+# for the compound considered.
+#
+#
+# The list of predictions must begin with the "Predictions:" keyword, as illustrated here.
+Predictions:
+WP6-G1, -8.47, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G2, -11.39, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G3, -7.84, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G4, -8.35, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G5, -5.20, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G6, -6.99, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G7, -7.64, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G8, -9.53, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G9, -8.81, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G10, -10.47, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G11, -7.16, 0.57, 0.61, 0.0,  0.0, 0.0
+WP6-G12, -6.38, 0.57, 0.61, 0.0, 0.0, 0.0
+WP6-G13, -10.17, 0.57, 0.61, 0.0, 0.0, 0.0
+
+#
+#
+# Please list your name, using only UTF-8 characters as described above. The "Participant name:" entry is required.
+Participant name:
+Dylan SERILLON
+#
+#
+# Please list your organization/affiliation, using only UTF-8 characters as described above.
+Participant organization:
+University of Barcelona
+#
+#
+# Please provide a brief (40 character limit) informal yet informative name of the method used.
+# If using an MD-based method we suggest using the format: Method/EnergyModel/WaterModel/Sampling/[Additional-details-here] , though your name must respect the 40 character limit.
+# otherwise you may create your own following the sample text; please edit to your taste.
+# The "Name:" keyword is required, as shown here.
+# 40 character limit.
+Name:
+MACHINE-LEARNING/NNET/DRAGON-descriptors
+#
+# All major software packages used and their versions
+# Following is sample text; please edit to your taste.
+# The "Software:" keyword is required.
+Software:
+xtb Version 6.1 
+Gromacs Version 2018.1
+Open Babel 2.3.2
+AutoDock Vina 1.1.2
+MOE Version 2018
+chimera production version 1.13.1
+VMD for LINUXAMD64, version 1.9.3
+#
+# Methodology and computational details.
+# Level of detail should be at least that used in a publication.
+# Please include the values of key parameters, with units, and explain how any
+# statistical uncertainties were estimated.
+# Use as many lines of text as you need.
+# Please explicitly describe how you handle ions (e.g. counterions) and pKa effects
+# Following is sample text; please edit to your taste.
+# All text following the "Method:" keyword will be regarded as part of your free text methods description.
+Method:
+SUBMISSION #1  -- RANKED SUBMISSION -- NEURAL NETWORK
+
+
+Binding free energy prediction using machine learning methods:
+
+1) All the binding data from the Binding Database (BindingDB) have been extracted and parsed. All the guests involved in the BindingDB and SAMPL challenges are reconstructed from SMILES in two steps using obabel and CORINA:
+(i) generating 150 3D conformers with a genetic algorithm,
+(ii) and then selecting the lowest-energy conformers.
+These conformers are then minimized at the semiempirical level using xtb (GFN2-xTB), giving an optimized 3D structure for each guest.
+The guests from the SAMPL6, SAMPL7 and SAMPL8-DRUG-ABUSE challenges are extracted from the challenge repositories and only minimized at the semiempirical level using xtb (GFN2-xTB), giving optimized 3D structures.
+In total, 807 structures are extracted following this approach.
+
+The same method is used to reconstruct the hosts. In total, 29 different hosts are extracted and constructed from the SMILES provided by BindingDB following the same protocol.
+- For the hosts BDBM197280, BDBM197287, BDBM197309, BDBM197310 and BDBM36281 from the previous SAMPL challenges, SMILES reconstruction failed, so the 3D structures were extracted from the corresponding SAMPL repositories and then minimized at the semiempirical level using xtb (GFN2-xTB), giving optimized 3D structures.
+- BDBM36250 also could not be reconstructed from SMILES; this cyclodextrin was instead extracted from the structurally close PDB entry 4J3U, manually modified with a molecular builder, and then minimized at the semiempirical level with the same procedure as before.
+2) For both host and guest structures, the DRAGON molecular descriptors are calculated.
+3) The descriptors of the guest dataset and the host dataset are reduced separately using the R software with different approaches: a) deleting the descriptors that have near-zero variance using the caret package; b) deleting the most correlated descriptors using the caret package; c) using principal component analysis (PCA) to combine the descriptors that explain most of the variability.
+4) The host dataset and the guest dataset are merged to form the final dataset, where each line corresponds to a guest interacting with a specific host.
+
+To predict the binding free energy, several machine learning regression models are used: neural network, knn, polynomial SVM and random forest. By modifying the parameters and using repeated 10-fold cross-validation on those ML models, thousands of different models are generated (respectively 186,500 nnet, 245,700 rf, 54,600 rf and 20 knn).
+Test/training data partitions of 30/70, 25/75 and 15/85 were tested; the 15/85 partition is the one selected for the prediction, resulting in a set of 687 cases for training and 120 cases for the test set.
+
+Our best model, used to make predictions on SAMPL9, is a neural network built with the "nnet" function, which gave RMSE=0.57, Rsquare=0.92 and MAE=0.30 on the training set and RMSE=0.61, Rsquare=0.93 and MAE=0.34 on the test set, suggesting that the prediction is not excessively biased by overtraining.
+#
+#
+# METHOD CATEGORY SECTION
+#
+# State which method category your prediction method is better described as:
+# `Alchemical`, `Quantum`, `Other Physical`, `Empirical`, `Mixed`, or `Other`.
+# Pick only one category label.
+# The `Category:` keyword is required.
+Category:
+machine learning
+#
+# All submissions must either be ranked or non-ranked.
+# Only one ranked submission per participant is allowed.
+# Multiple ranked submissions from the same participant will not be judged.
+# Non-ranked submissions are accepted so we can verify that they were made before the deadline.
+# The "Ranked:" keyword is required, and expects a Boolean value (True/False)
+Ranked:
+True
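
The descriptor-reduction and scoring steps described in the method text above can be illustrated with a small Python sketch. The submission itself used R and the caret package, so this is only an analogue under stated assumptions: a plain variance threshold stands in for caret's near-zero-variance filter, a greedy pairwise filter stands in for its correlation filter, the thresholds `min_var` and `max_abs_corr` are arbitrary, the PCA step is omitted, and all function names are hypothetical.

```python
import math


def variance(values):
    """Sample variance of a list of numbers."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values) / (len(values) - 1)


def pearson(a, b):
    """Pearson correlation of two equal-length, non-constant lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def reduce_descriptors(columns, min_var=1e-8, max_abs_corr=0.9):
    """Return descriptor names kept after variance and correlation filtering.

    columns maps a descriptor name to its values over all molecules.
    A descriptor is dropped if its variance is at most min_var (assumed
    threshold) or if it correlates too strongly with one already kept.
    """
    kept = []
    for name, values in columns.items():
        if variance(values) <= min_var:
            continue
        if any(abs(pearson(values, columns[k])) > max_abs_corr for k in kept):
            continue
        kept.append(name)
    return kept


def rmse(predicted, observed):
    """Root-mean-square error between predictions and reference values."""
    n = len(observed)
    return math.sqrt(sum((p - o) ** 2 for p, o in zip(predicted, observed)) / n)


def mae(predicted, observed):
    """Mean absolute error between predictions and reference values."""
    return sum(abs(p - o) for p, o in zip(predicted, observed)) / len(observed)
```

In this sketch the host and guest descriptor tables would each be passed through `reduce_descriptors` separately before merging, and `rmse` and `mae` would be evaluated on the training and test partitions.
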