Docking studies on MurC ligase #46
@jhjensen2 Would be really interesting if you could include a variety of scoring functions.
Do you mean scoring functions in Glide (like SP and HTVS), or other docking programs, like Vina?
I was thinking of GNINA, RF-Score, X-score etc. We could also see if other groups would like to use this dataset to test other custom scoring functions.
Ah, OK. Is the idea to get some kind of consensus model?
Firstly, amazed by your work! Secondly, I mean using different scoring functions (rfscore, xscore, similar to Schrödinger's Glide SP & XP) to evaluate the same dataset. So, yes, like a consensus model (not sure if I am using the right terminology; something like cross docking?). And I remember the SMINA programme has rfscore functionality compiled into it. IDK if I am right, and I hope this helps! And please bear with my shallow understanding!
smina does not have rfscore built-in, but it isn't hard to get rfscore working (https://github.com/oddt/rfscorevs). Do any of these compounds have known activity? If you grab the gnina binary (https://github.com/gnina/gnina/releases/download/v1.0.1/gnina) you can rescore your already docked poses with
Embedded in the output file will be the Vina default empirical score (minimizedAffinity), and estimates of the pose quality (CNNscore) and binding affinity (CNNaffinity) from a convolutional neural network (there is also a CNN_VS field which is CNNscore*CNNaffinity, but I don't know yet if this is a useful thing). This command will be much faster if run on a machine with a GPU, but a GPU is not required. It would also be interesting to redock the full set with GNINA.
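A minimal sketch of pulling those fields out of the rescored file with only the Python standard library, assuming gnina wrote an SDF whose records carry `minimizedAffinity`, `CNNscore` and `CNNaffinity` data tags as described above. The helper names are hypothetical, CNN_VS is recomputed as CNNscore*CNNaffinity purely for illustration, and the example record reuses the native-ligand values quoted in this thread (Vina -7.3, CNNscore 0.8, CNNaffinity 5.7).

```python
import re
from typing import Dict, Iterator, List

# Matches SDF data-item headers such as "> <CNNscore>" and captures the tag name.
TAG_RE = re.compile(r"^>\s*<([^>]+)>")


def iter_sdf_records(text: str) -> Iterator[str]:
    """Yield the molecule records of an SDF file, which are separated by '$$$$'."""
    for record in text.split("$$$$"):
        if record.strip():
            yield record


def parse_tags(record: str) -> Dict[str, str]:
    """Return the data items of one SDF record as {tag: first value line}."""
    tags: Dict[str, str] = {}
    lines = record.splitlines()
    for i, line in enumerate(lines):
        match = TAG_RE.match(line)
        if match and i + 1 < len(lines):
            tags[match.group(1)] = lines[i + 1].strip()
    return tags


def gnina_scores(text: str) -> List[Dict[str, float]]:
    """Collect the Vina and CNN scores of every pose (hypothetical helper)."""
    rows = []
    for record in iter_sdf_records(text):
        tags = parse_tags(record)
        cnn_score = float(tags["CNNscore"])
        cnn_affinity = float(tags["CNNaffinity"])
        rows.append({
            "minimizedAffinity": float(tags["minimizedAffinity"]),  # kcal/mol, lower is better
            "CNNscore": cnn_score,        # predicted pose quality, 0 to 1
            "CNNaffinity": cnn_affinity,  # predicted affinity, higher is better
            "CNN_VS": cnn_score * cnn_affinity,
        })
    return rows


# Hypothetical one-pose SDF record (no atoms) carrying the three score tags.
EXAMPLE_SDF = """native_ligand
  hypothetical example

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
> <minimizedAffinity>
-7.3

> <CNNscore>
0.8

> <CNNaffinity>
5.7

$$$$
"""

if __name__ == "__main__":
    for row in gnina_scores(EXAMPLE_SDF):
        print(row)
```

For real files, RDKit's `Chem.SDMolSupplier` exposes the same tags through `GetProp`; the stdlib version above just avoids the dependency.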
@jhjensen2 would it be possible to provide a link to a file containing the 260K structures that were docked?
@drc007 The structures from the ZINC database used for docking are available here: https://www.dropbox.com/s/yzgv8emshj8391j/zinc_sdf.tar.bz2?dl=0 The Enamine database structures are here:
Great! @cstein do you also have the Enamine compounds?
@dkoes none of the compounds have known activity against MurC AFAIK. The aim is to find possible alternative leads to the compound family found by AZ. We hope to identify good binders using genetic algorithms/docking, but the advantage of the compounds in the Enamine set is that they are purchasable.
I have updated my comment above with the structure files from both the ZINC database and the Enamine database.
Great work, great discussion all. We'll be discussing next steps Tue 6th July 2pm London if you'd like to join - see #47. Obvious question: the top-scoring compounds identified from ZINC (v nice scores) - how do we most easily get our hands on them?
Copy all the SMILES into the SMILES search in MolPort: For MTO vendors, you'll have to go through them individually.
I have a Python script that can search the ZINC database for vendors.
@drc007 If you send me the script, I can try to include it in the notebook.
@jhjensen2 I've emailed it to you.
Got it. Looks like I need a ZINC ID though, which I don't have. Any idea how I get that from the SMILES? Or is it possible to modify the script to search with SMILES?
I presumed they would be with the structures downloaded from ZINC?
Unfortunately not. I got the SMILES from this repo a few years back.
This script might do the trick
I've played around with different options for finding vendors for the ZINC compounds, and MolPort seems to work best, i.e. it is the most "honest" about what can and cannot actually be purchased, and it suggests similar compounds in the latter case. From what I understand, many molecules in ZINC are not actually purchasable.
ZINC has an option for "in stock" when selecting tranches. I don't know how these molecules were selected.
Hi @jhjensen2 @cstein @dkoes @drc007, forgive me if this is a naive question, but regarding the structures of MurC with the Enamine/ZINC compounds docked: are we able to download and visualise them in e.g. PyMOL, or is proprietary software needed? I'm interested in the extent of overlap, and whether we're "painting" an interior surface that is available for ligand binding, and hence for novel compound design.
@mattodd @jhjensen2 @dkoes If the docked structures can be exported in SDF format, then they could be viewed in PyMOL alongside the 6X9F crystal structure https://www.rcsb.org/structure/6X9F.
Would the original Maestro file also be helpful, if people have Schrödinger's free academic visualiser? Whatever is simple and low effort here, I guess.
@cstein might have saved these files; otherwise we can easily redock a few of the best-scoring molecules. He's on vacation this week, though.
I ran the top 20 ZINC molecules through MolPort. Compounds 2, 4, 12, 13, and 18 are in stock at Enamine, while compound 14 is in stock at Eximed. Let me know if I should check additional ones.
I played around with 6X9F for a bit in Pharmit. It was surprisingly difficult to find ligands that matched the hydrogen-bond network of the cognate ligand and had good steric complementarity to the receptor. I didn't find anything in MolPort, but there were a few hits in the make-on-demand libraries (MCULE and Chemspace). They still aren't great, and some are decidedly non-drug-like. At best they score about the same as the native ligand, not really better. For reference, the native ligand has a Vina score of -7.3 kcal/mol, a CNNscore of 0.8, and a CNNaffinity of 5.7. The hits I find most interesting are QZBDWZONHQAXTD-UHFFFAOYSA-N, XAIARAJIYMRDTF-UHFFFAOYSA-N, and CSC076664421. I've attached the two Pharmit screens (JSON). The results come from applying these screens, minimizing within Pharmit (with filtering criteria), and then doing an offline minimization/scoring with gnina. Let me know if there is anything of interest.
@mattodd I have the pose-viewer files for the ZINC database, but the file is 1.4 GB. Would the first 100 structures be of interest, or another subset I can extract for you? The Enamine pose-viewer file is only 55 MB, so that is easily sharable 👍 The pose-viewer file contains both the prepared 6X9F structure and all the bound ligands. These can be viewed in the free Maestro interface.
Yes, murC. Good idea about docking AZ8074 to murD and murE.
No, we haven't. Can Glide do this and, if so, how?
Here are the XP docking scores for AZ8074 from @cstein (columns: system, PDB, docking_score):
Here are the top 88 molecules from the GA search, with murD and murE scores:
During our last Zoom meeting the question came up about the reliability of docking scores, so Casper and I decided to investigate this using the D4 dataset from this paper. There, DOCKER was used to dock 138 million molecules to the D4 dopamine receptor (5WIU), and 549 of these molecules were made and their activity measured (% antagonist displacement at 10 μM). Crucially, the molecules were selected to span all docking scores and are structurally quite diverse (i.e. none of the homologous series that are so common in activity data sets). 122 molecules (22%) showed significant activity (>50% antagonist displacement). Here's a plot of activity vs. docking score from the paper. Here I count the total number of molecules and the number of active molecules within certain score ranges. As you can see, the proportion of active molecules increases with lower (better) docking scores. So if you pick a random molecule with a docking score < -65, there is a 37% chance that it is active. Let's compare this to picking a random molecule from the 138 million (i.e. without using docking). 86% have a score above (worse than) -40 and 13% have a score between -50 and -40. A random molecule thus has a <1% chance of being active (0.86*0 + 0.13*0.06 ≈ 0.008). That's the value of docking. Let's look at how Glide performs. Here are the results for XP with and without LigPrep (I have adjusted the cutoffs to get roughly the same number of molecules in each bin). The percentage of actives is a bit higher when using LigPrep, so this is probably what we should use going forward. The percentage of actives is higher for Glide than for DOCKER. However, note that the chance of randomly picking an active molecule from the 549-molecule dataset is 22%, so Glide is not better than random at identifying inactive molecules, while DOCKER is. Unfortunately, there are very few molecules with very low docking scores, so the percentage of actives in those bins has quite a large error bar.
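To make the binning and the size of those error bars concrete, here is a small sketch that counts molecules and actives per score bin and attaches a 95% Wilson score interval to each hit rate. The Wilson interval is an assumed choice of error bar for illustration, not necessarily what was used for the tables above, and the scores and activity labels in the example are made up rather than taken from the D4 dataset. Bins are right-closed, matching the (-9, -8] notation used in this thread.

```python
import math
from typing import List, Sequence, Tuple


def bin_counts(scores: Sequence[float], is_active: Sequence[bool],
               edges: Sequence[float]) -> List[Tuple[float, float, int, int]]:
    """Count (total, active) molecules in each bin (lo, hi] defined by consecutive edges."""
    counts = []
    for lo, hi in zip(edges, edges[1:]):
        in_bin = [active for score, active in zip(scores, is_active) if lo < score <= hi]
        counts.append((lo, hi, len(in_bin), sum(in_bin)))
    return counts


def wilson_interval(active: int, total: int, z: float = 1.96) -> Tuple[float, float]:
    """Approximate 95% Wilson score interval for the hit rate active/total."""
    if total == 0:
        raise ValueError("empty bin: hit rate undefined")
    p = active / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half_width, centre + half_width


if __name__ == "__main__":
    # Made-up Glide-like scores (lower is better) and activity labels.
    scores = [-9.1, -8.4, -7.6, -7.3, -7.1, -6.5, -6.2]
    actives = [True, False, True, True, False, False, True]
    for lo, hi, n, k in bin_counts(scores, actives, [-10, -8, -7, -6]):
        low, high = wilson_interval(k, n)
        print(f"({lo}, {hi}]: {k}/{n} active, 95% CI {low:.2f}-{high:.2f}")
    # Same 50% hit rate, very different precision:
    print(wilson_interval(5, 10))    # wide interval
    print(wilson_interval(50, 100))  # much narrower
```

With 10 molecules in a bin, a 50% hit rate is compatible with anything from roughly 24% to 76%; with 100 molecules the interval shrinks to roughly 40-60%.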
So, based on this data, it's not really certain that molecules with a score < -8 are more likely to be active than molecules with a score between -8 and -6. Using larger bins gives more precise percentages, but over a wider range of scores. Finally, we've talked about using several docking programs to create a consensus score. So here are the same results, where I have removed all molecules with DOCKER scores > -55. Unfortunately, using DOCKER doesn't help to weed out inactive molecules with good Glide scores. If we assume that these results are representative of murC ligase, then we can say that molecules with docking scores < -7 have roughly a 50% chance of being active. That is much, much better than picking a molecule at random, but I am not sure how it compares to an expert medicinal chemist. There is some indication that molecules with scores < -8 are more likely to be active than those between -8 and -7, but there are too few examples to be certain of this. There is simply too little data to say anything about scores < -9 vs (-9, -8]. In general, the chances of finding molecules with XP Glide scores < -9 in molecule libraries are very low. GAs can be used to generate many more, so I hope that can be tested.
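The consensus step described here (keep molecules with good Glide scores, then drop those whose DOCKER score is above -55) amounts to comparing hit rates on two subsets. A minimal sketch, assuming each tested molecule has a Glide XP score, a DOCKER score and an active/inactive label; the `Tested` records in the example are hypothetical, not the D4 data.

```python
from typing import List, NamedTuple, Optional


class Tested(NamedTuple):
    glide: float   # Glide XP score, lower is better
    docker: float  # DOCKER score from the original screen, lower is better
    active: bool   # e.g. >50% antagonist displacement at 10 uM


def hit_rate(mols: List[Tested], glide_cutoff: float,
             docker_cutoff: Optional[float] = None) -> Optional[float]:
    """Fraction of actives among molecules with glide < glide_cutoff.

    If docker_cutoff is given, molecules with docker > docker_cutoff are
    removed first (the consensus step). Returns None for an empty selection.
    """
    selected = [m for m in mols
                if m.glide < glide_cutoff
                and (docker_cutoff is None or m.docker <= docker_cutoff)]
    if not selected:
        return None
    return sum(m.active for m in selected) / len(selected)


if __name__ == "__main__":
    # Hypothetical molecules, for illustration only.
    mols = [
        Tested(-8.0, -60.0, True),
        Tested(-7.5, -50.0, False),
        Tested(-7.2, -58.0, True),
        Tested(-7.8, -45.0, False),
        Tested(-6.0, -62.0, True),
    ]
    print(hit_rate(mols, -7.0))         # Glide alone: 0.5
    print(hit_rate(mols, -7.0, -55.0))  # Glide + DOCKER consensus: 1.0
```

With real data the filtered subset is usually small, so each rate deserves its own error bar before concluding whether the consensus helps.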
My code had a bug, so the last table in the previous post is wrong. Here are the corrected results, where I have removed all molecules with DOCKER scores > -55. So, yes, the success rate for molecules with good Glide scores can be increased to almost 60% by also using DOCKER. Another thing we talked about is focusing on molecules that have good docking scores for murC, murD, and murE. If we assume that the success rate is 50% for each of the three targets, and that activity on each target is independent, then a molecule with good Glide scores for all three has only a 12.5% chance (0.5^3) of being active on all three targets. So, with only a 50% success rate, this probably doesn't make sense.
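The 12.5% figure is the product of the per-target success rates, which holds only if activity on murC, murD and murE is treated as independent. A sketch under that assumption (the function name is hypothetical; `math.prod` needs Python 3.8+):

```python
import math
from typing import Sequence


def p_active_on_all(success_rates: Sequence[float]) -> float:
    """Probability of being active on every target, assuming independent outcomes."""
    return math.prod(success_rates)


if __name__ == "__main__":
    print(p_active_on_all([0.5, 0.5, 0.5]))  # 0.125, the 12.5% quoted above
    print(p_active_on_all([0.6, 0.6, 0.6]))  # about 0.216 with the ~60% DOCKER-filtered rate
```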
Hello everyone, I am Kato, an Erasmus Master's student from KU Leuven, Belgium, who has recently joined Professor Todd's lab at UCL. As part of my internship and master's thesis, I will be working on this OSA MurLigase project for the next nine months. I am looking forward to this multidisciplinary and international collaboration! I was asked to get quotes from MCule or Enamine for some compounds predicted by @jhjensen2. I am still waiting for the quote from Enamine, but attached you can find the one from MCule. However, I noticed that when searching via MolPort, there is a remarkable price difference for the same supplier, 'UkrOrgSynthesis'. Quote attached as well, just to be sure!
Welcome aboard @KatoLeonard! Just out of curiosity, which molecules did you pick?
Thank you! I picked the first two of the Enamine series, i.e. Z1603489873 (SMILES: Cc1ccc(CC(=O)Nc2cccc(-n3ccc(C(=O)O)n3)c2)o1) and Z2581487631 (SMILES: Cn1cnc(C(=O)NC2(CC(=O)O)CCOCC2)c1).
I chose them randomly, but perhaps if I can access the original data file, I can take into account the different poses of the molecules in MurC?
Dear Casper (Prof. Steinmann @cstein), sorry we missed your message two months ago, but we would really appreciate it if you could share both the ZINC and Enamine pose-viewer files with us! Would it be possible to share the data with us through Dropbox? Many thanks! @KatoLeonard This message may answer your questions!
If you are testing compounds, would it be possible to test some of the compounds I proposed (see the Jul 6 message)? I'd be happy to order the MCULE compounds (send me shipping info via email). The ChemSpace compound is available from Enamine (Z1980956983), so it would be more cost-effective to add it to an existing order.
@KatoLeonard But those are from the screen of the smaller DDS. We have subsequently screened the larger HLL set and found molecules with better docking scores (see this post from July 15). @cstein has rescored them using LigPrep, and we'll share the results (including docking poses) here within the next few days. Stay tuned!
It turns out that we get worse docking scores with LigPrep, so I suggest selecting molecules from this batch. @cstein is working on getting the docking poses. Time permitting, he'll also dock them with DOCKER.
I have (finally) extracted the 20 best poses from our docking studies on the HLL database. They are available for download as a .zip file with both SP and XP results. You can match the docking scores for the SP results with our (@jhjensen2) July 15th post (#46 (comment)). Any feedback is welcome, of course!
Here are our results for the Enamine Hit Locator Library (HLL) of 234K molecules. The docking is performed with Glide using the SP scoring methodology and the 6X9F crystal structure. The 1000 best binders are then redocked using the more accurate XP scoring methodology. The ligands are prepared with LigPrep. Compounds with more than 5 rotatable bonds (bad for accumulation) or docking scores > -7.0 (AZ8074 has a docking score of -7.2) are removed, as are molecules with DOCK scores > -50. All remaining compounds have a globularity < 0.25, which is good for accumulation, but some of them have computed logP values > 3.5 (Z57909504, Z2038227779, and Z57907808). If our results for the D4 dopamine receptor are transferable, then there's a 60% chance that these molecules are "good" binders. @cstein, can you upload the docking poses for these molecules? The CSV file can be found here: HLL_top_ligprep.csv
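The selection rules listed above can be expressed as a small filter. This is a sketch only: the column names (`name`, `xp_score`, `dock_score`, `rot_bonds`, `globularity`, `logp`) are hypothetical and may not match HLL_top_ligprep.csv, and the example rows are invented. Globularity and logP are flagged rather than used for removal, mirroring how the post treats them.

```python
import csv
import io
from typing import Dict, List

# Thresholds taken from the post; the column names used below are hypothetical.
MAX_ROT_BONDS = 5       # more than 5 rotatable bonds -> removed
MAX_XP_SCORE = -7.0     # Glide score > -7.0 -> removed (AZ8074 scores -7.2)
MAX_DOCK_SCORE = -50.0  # DOCK score > -50 -> removed
GLOBULARITY_OK = 0.25   # globularity < 0.25 is considered good for accumulation
LOGP_HIGH = 3.5         # computed logP > 3.5 is flagged


def select_candidates(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Apply the removal criteria and annotate the survivors with warning flags."""
    kept: List[Dict[str, object]] = []
    for row in rows:
        if int(row["rot_bonds"]) > MAX_ROT_BONDS:
            continue
        if float(row["xp_score"]) > MAX_XP_SCORE:
            continue
        if float(row["dock_score"]) > MAX_DOCK_SCORE:
            continue
        kept.append({
            **row,
            "globular": float(row["globularity"]) < GLOBULARITY_OK,
            "high_logp": float(row["logp"]) > LOGP_HIGH,
        })
    return kept


# Invented example rows, for illustration only.
EXAMPLE_CSV = """name,xp_score,dock_score,rot_bonds,globularity,logp
mol_A,-8.1,-60,4,0.10,2.1
mol_B,-7.5,-55,6,0.12,1.0
mol_C,-6.8,-62,3,0.20,2.0
mol_D,-7.9,-45,2,0.15,1.5
mol_E,-7.3,-52,5,0.18,3.9
"""

if __name__ == "__main__":
    rows = list(csv.DictReader(io.StringIO(EXAMPLE_CSV)))
    for mol in select_candidates(rows):
        print(mol["name"], mol["xp_score"], "high logP" if mol["high_logp"] else "")
```

Rotatable-bond counts and logP could be computed with RDKit (`Descriptors.NumRotatableBonds`, `Crippen.MolLogP`) if they are not already in the file.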
The pose-viewer files for the above ligands are extracted and available here (compare the titles in Maestro with the title ID in the .csv file). Download link: https://www.dropbox.com/s/llc4ofxqahgr8ei/dock_sorted_poses.zip?dl=0
Here are the results from a genetic algorithm search for primary amines (reminder: primary amines are less likely to be pumped out of the cell). You can see the first 10 molecules above (the rest are in this CSV file: amines_20211126.csv), with the name, docking score, and number of synthetic steps predicted by PostEra's Manifold retrosynthesis program. The molecules are sorted by number of synthesis steps and then by docking score. Things to help guide molecule selection:
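The ordering used for the amine list (fewest predicted synthesis steps first, then best docking score within each step count) is a two-key sort. A sketch with hypothetical field names and made-up values:

```python
from typing import List, NamedTuple


class Candidate(NamedTuple):
    name: str
    docking_score: float  # lower (more negative) is better
    steps: int            # predicted number of synthetic steps


def order_for_selection(candidates: List[Candidate]) -> List[Candidate]:
    """Sort by number of steps, breaking ties by docking score (best first)."""
    return sorted(candidates, key=lambda c: (c.steps, c.docking_score))


if __name__ == "__main__":
    ranked = order_for_selection([
        Candidate("amine_1", -7.0, 2),
        Candidate("amine_2", -8.0, 3),
        Candidate("amine_3", -7.5, 2),
    ])
    print([c.name for c in ranked])  # ['amine_3', 'amine_1', 'amine_2']
```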
Here is the link to a .zip file with two pose-viewer files (rigid and confgen) for the results that @jhjensen2 posted a few days ago. When you load them, they are in the same order as in the .csv files that Jan posted. Download link: https://www.dropbox.com/s/d7leyvs7yul3opc/amines_GB-GA_poses_20211129.zip?dl=0
I have made a few slides for tomorrow's meeting. I've gone through and picked seven molecules for possible synthesis, based on their poses, ease of synthesis, etc.
In preparation for our study on using a genetic algorithm to find good binders for MurC (more on that later), Casper Steinmann (@cstein) has docked 250K molecules from the ZINC database and 10K molecules from Enamine's Diversity Discovery Set.
The docking is performed with Glide using the SP scoring methodology and the 6X9F crystal structure. The 1000 best binders from each set are then redocked using the more accurate XP scoring methodology. Below we show the 100 molecules with the best (lowest) docking scores.
For comparison, AZ8074 has a docking score of -7.2 using the same methodology.
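The two-stage protocol (fast SP scoring of everything, XP redocking of the 1000 best, report the top 100) is a standard screening funnel. A sketch of the bookkeeping only: `sp_score` and `xp_score` are hypothetical stand-ins for Glide SP and XP runs, which in practice are separate jobs rather than Python functions.

```python
import heapq
from typing import Callable, Hashable, Iterable, List, Tuple

Score = Callable[[Hashable], float]


def docking_funnel(ligands: Iterable[Hashable], sp_score: Score, xp_score: Score,
                   n_redock: int = 1000, n_report: int = 100) -> List[Tuple[float, Hashable]]:
    """Score all ligands with sp_score, rescore the n_redock best with xp_score,
    and return the n_report best (score, ligand) pairs. Lower scores are better."""
    sp_results = ((sp_score(lig), lig) for lig in ligands)
    shortlist = heapq.nsmallest(n_redock, sp_results, key=lambda pair: pair[0])
    xp_results = [(xp_score(lig), lig) for _, lig in shortlist]
    xp_results.sort(key=lambda pair: pair[0])
    return xp_results[:n_report]


if __name__ == "__main__":
    # Toy scoring functions on integer "ligands", for illustration only.
    top = docking_funnel(range(10), sp_score=lambda x: -x, xp_score=lambda x: -(x % 3),
                         n_redock=4, n_report=2)
    print(top)  # [(-2, 8), (-1, 7)]
```

`heapq.nsmallest` only keeps the shortlist in memory, which matters when the first stage covers hundreds of thousands of ligands.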
Here are screenshots of the top 20 molecules for Enamine and ZINC, but all the data can be accessed in this notebook.
Enamine
ZINC