Docking studies on MurC ligase #46

Open
jhjensen2 opened this issue Jul 2, 2021 · 59 comments

Comments

@jhjensen2

In preparation for our study on using a genetic algorithm to find good binders for MurC (more on that later), Casper Steinmann (@cstein) has docked 250K molecules from the ZINC database and 10K molecules from Enamine's Discovery Diversity Set.

The docking is performed with Glide using the SP scoring methodology and the 6X9F crystal structure. The 1000 best binders from each set are then redocked using the more accurate XP scoring methodology. Below we show the 100 molecules with the best (lowest) docking scores.

For comparison AZ8074 has a docking score of -7.2 using the same methodology.
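
A minimal sketch of the two-stage selection described above, assuming the scores are available as (molecule ID, score) pairs. The function names, the data layout, and the toy values are hypothetical and are not taken from the notebook; only the top-1000 cutoff and the AZ8074 reference score of -7.2 come from the post.

```python
# Hypothetical layout: each record is (molecule_id, docking_score).
# Lower (more negative) scores are better.

def select_for_redocking(sp_results, n=1000):
    """Return the n best-scoring SP entries, best first."""
    return sorted(sp_results, key=lambda rec: rec[1])[:n]


def better_than_reference(xp_results, reference=-7.2):
    """Keep entries whose XP score is lower (better) than the reference."""
    return [rec for rec in xp_results if rec[1] < reference]


# Toy data, for illustration only.
sp = [("mol_a", -6.1), ("mol_b", -8.3), ("mol_c", -7.0), ("mol_d", -5.2)]
top = select_for_redocking(sp, n=2)
print(top)
print(better_than_reference(top))
```

In the real workflow the second function would be applied to the XP scores of the redocked molecules, not to the SP scores reused here for brevity.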

Here are screenshots of the top 20 molecules for Enamine and ZINC, but all the data can be accessed in this notebook.

Enamine
Screenshot 2021-07-02 at 11 00 06
ZINC
Screenshot 2021-07-02 at 11 00 51

@drc007
Contributor

drc007 commented Jul 2, 2021

@jhjensen2 Would be really interesting if you could include a variety of scoring functions.

@jhjensen2
Author

Do you mean scoring functions in Glide (like SP and HTVS), or other docking programs, like Vina?

@drc007
Contributor

drc007 commented Jul 2, 2021

I was thinking of GNINA, RF-Score, X-score etc. We could also see if other groups would like to use this dataset to test other custom scoring functions.

@jhjensen2
Author

Ah, OK. Is the idea to get some kind of consensus model?
We have some experience with SMINA, but not the other ones you mention.

@Yuhang-CADD
Contributor

Yuhang-CADD commented Jul 2, 2021

Firstly, amazed by your work!

Secondly,
If I am not thinking in the wrong way, my understanding of Chris's suggestion is:

Using different scoring functions (RF-Score, X-Score, similar to Schrödinger's Glide SP & XP) to evaluate the same dataset. So, yes, like a consensus model (not sure if I am using the right terminology, like a cross-docking?)

And I remember the SMINA programme has rfscore functionality compiled in it.

IDK if I am right, and hope this helps! And please bear with my shallow understanding!

@dkoes

dkoes commented Jul 2, 2021

smina does not have rfscore built-in, but it isn't hard to get rfscore working (https://github.com/oddt/rfscorevs).

Do any of these compounds have known activity?

If you grab the gnina binary (https://github.com/gnina/gnina/releases/download/v1.0.1/gnina) you can rescore your already docked poses with

gnina -r receptor.pdb -l docked_poses.sdf.gz --minimize -o gnina_minimized.sdf.gz

Embedded in the output file will be the Vina default empirical score (minimizedAffinity), and estimates of the pose quality (CNNscore) and binding affinity (CNNaffinity) from a convolutional neural network (there is also a CNN_VS field which is CNNscore*CNNaffinity, but I don't know yet if this is a useful thing). This command will be much faster if run on a machine with a GPU, but a GPU is not required.
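
A minimal sketch of how the fields named above could be pulled out of the rescored file without a cheminformatics toolkit. It assumes the usual SD layout (a `> <name>` header line followed by a single-line value, records terminated by `$$$$`); the function names and the example record are hypothetical, and the numbers in the example are made up for illustration.

```python
import gzip


def read_sd_tags(lines):
    """Yield one dict of SD data fields per molecule record."""
    tags, name = {}, None
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("$$$$"):            # end of one molecule record
            yield tags
            tags, name = {}, None
        elif line.startswith(">") and "<" in line:
            # Header such as ">  <CNNscore>": keep the text between < and >
            name = line[line.index("<") + 1 : line.rindex(">")]
        elif name is not None:
            if line.strip():
                tags[name] = line.strip()      # single-line values only
            name = None


def summarize(tags):
    """Convert the gnina fields to floats and recompute CNN_VS."""
    score = float(tags["CNNscore"])
    affinity = float(tags["CNNaffinity"])
    return {
        "vina": float(tags["minimizedAffinity"]),
        "CNNscore": score,
        "CNNaffinity": affinity,
        "CNN_VS": score * affinity,
    }


def iter_gnina_scores(path):
    """Read a gzipped SDF written by gnina and yield one summary per pose."""
    with gzip.open(path, "rt") as handle:
        for tags in read_sd_tags(handle):
            yield summarize(tags)


# Made-up record, only to show the expected layout.
example = """ligand_1
  hypothetical

  0  0  0  0  0  0  0  0  0  0999 V2000
M  END
>  <minimizedAffinity>
-7.3

>  <CNNscore>
0.8

>  <CNNaffinity>
5.7

$$$$
"""

for record in read_sd_tags(example.splitlines()):
    print(summarize(record))
```

Calling `iter_gnina_scores("gnina_minimized.sdf.gz")` on the output of the command above would give one dictionary per pose, which can then be sorted on whichever field is of interest.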

It would also be interesting to redock the full set with GNINA.

@drc007
Contributor

drc007 commented Jul 2, 2021

@jhjensen2 would it be possible to provide a link to a file containing the 260K structures that were docked?

@cstein

cstein commented Jul 2, 2021

@drc007 The structures of the zinc database files used for docking are available here: https://www.dropbox.com/s/yzgv8emshj8391j/zinc_sdf.tar.bz2?dl=0

The enamine database structures are here:
https://www.dropbox.com/s/dg34ay1utkaqvtw/enamine_sdf.tar.bz2?dl=0

@jhjensen2
Author

Great! @cstein do you also have the Enamine compounds?

@jhjensen2
Author

@dkoes none of the compounds have known activity against MurC AFAIK. The aim is to find possible alternative leads to the compound family found by AZ. We hope to identify good binders using genetic algorithms/docking but the advantage of the compounds in the Enamine set is that they are purchasable.

@cstein

cstein commented Jul 3, 2021

I have updated my comment above with both structure files from the zinc database and the enamine database.

@mattodd
Contributor

mattodd commented Jul 5, 2021

Great work, great discussion all. We'll be discussing next steps Tue 6th July 2pm London if you'd like to join - see #47. Obvious question: the top scoring compounds identified from Zinc (v nice scores) - how do we most easily get our hands on them?

@dkoes

dkoes commented Jul 5, 2021

Copy all the SMILES into the SMILES search in MolPort:
https://www.molport.com/shop/find-chemicals-by-smiles

For MTO vendors, you'll have to go through them individually.

@drc007
Contributor

drc007 commented Jul 6, 2021

I have a python script that can search the ZINC database for vendors.

@jhjensen2
Author

@drc007 If you send me the script I can try to include it in the notebook

@drc007
Contributor

drc007 commented Jul 6, 2021

@jhjensen2 I've emailed it to you.

@jhjensen2
Author

Got it. Looks like I need a ZINC ID though, which I don't have. Any idea how I get that from the SMILES? Or is it possible to modify the script to search with SMILES?

@drc007
Contributor

drc007 commented Jul 6, 2021

I presumed they would be with the structures downloaded from ZINC?

@jhjensen2
Author

Unfortunately not. I got the smiles from this repo a few years back

@jhjensen2
Author

This script might do the trick

@jhjensen2
Author

I've played around with different options for finding vendors for the ZINC compounds, and MolPort seems to work best, i.e. it is the most "honest" about what can and cannot be purchased, and it suggests similar compounds in the latter case. From what I understand, many molecules in ZINC are not actually purchasable.

@drc007
Contributor

drc007 commented Jul 6, 2021

ZINC has an "in stock" option when selecting tranches, but I don't know how the molecules were selected.

@mattodd
Contributor

mattodd commented Jul 6, 2021

Hi @jhjensen2 @cstein @dkoes @drc007, forgive me if this is a naive question about the structures of MurC with the Enamine/ZINC structures docked: are we able to download and visualise them in e.g. PyMOL, or is proprietary software needed? I'm interested in the extent of overlap, and whether we're "painting" an interior surface that is available for ligand binding, and hence novel compound design.

@drc007
Contributor

drc007 commented Jul 6, 2021

@mattodd @jhjensen2 @dkoes If the docked structures can be exported in sdf format then they could be viewed in PyMOL using the 6X9F crystal structure https://www.rcsb.org/structure/6X9F.

@mattodd
Contributor

mattodd commented Jul 6, 2021

Would the “maestro” original file also be helpful if people have the Schrödinger academic free visualiser? Whatever is simple and low effort here, I guess.

@jhjensen2
Author

@cstein might have saved these files, otherwise we can easily redock a few of the best scoring molecules. He's on vacation this week, though.

@jhjensen2
Author

I ran the top 20 ZINC molecules through MolPort. Compounds 2, 4, 12, 13, and 18 are in stock at Enamine, while compound 14 is in stock at Eximed.

Let me know if I should check additional ones.

@dkoes

dkoes commented Jul 6, 2021

I played around with 6X9F for a bit in Pharmit. It was surprisingly difficult to find ligands that matched the hydrogen bond network of the cognate ligand and had good steric complementarity to the receptor. I didn't find anything in MolPort, but there were a few hits in the make-on-demand libraries (MCULE and Chemspace). They still aren't great and some are decidedly non-drug-like. At best they score about the same as the native ligand, not really better. For reference, the native ligand has a Vina score of -7.3 kcal/mol, CNNscore of 0.8, and CNNaffinity of 5.7.

The hits I find most interesting are QZBDWZONHQAXTD-UHFFFAOYSA-N, XAIARAJIYMRDTF-UHFFFAOYSA-N, and CSC076664421.

I've attached the two pharmit screens (json). The results are from applying these screens, minimizing within pharmit (with filtering criteria), and then doing an offline minimization/scoring with gnina (gnina -r rec.pdb -l minimized_results.sdf.gz --minimize -o results.sdf.gz). A PyMOL session file is provided as well.

Let me know if there is anything of interest.

pharmit_screen.tar.gz

@cstein

cstein commented Jul 6, 2021

> Would the “maestro” original file also be helpful if people have the Schrödinger academic free visualiser? Whatever is simple and low effort here, I guess.

@mattodd I have the pose-viewer files for the ZINC database, but the file is 1.4 GB in size. Would the first 100 structures be of interest, or is there another subset I can extract for you? The Enamine pose-viewer file is only 55 MB, so that is easily shareable 👍 The pose-viewer file contains both the prepared 6X9F structure and all bound ligands. These can be viewed in the free Maestro interface.

@eyermanncj
Contributor

eyermanncj commented Aug 2, 2021 via email

@jhjensen2
Author

jhjensen2 commented Aug 2, 2021

Yes, murC. Good idea about docking AZ8074 to murD and murE

@eyermanncj
Contributor

eyermanncj commented Aug 2, 2021 via email

@jhjensen2
Author

No we haven't. Can Glide do this and, if so, how?

@eyermanncj
Contributor

eyermanncj commented Aug 2, 2021 via email

@jhjensen2
Author

jhjensen2 commented Aug 3, 2021

Here are the XP docking scores for AZ8074 from @cstein

system  PDB   docking_score
MurC    6x9f  -6.85162
MurD    5a5f  -2.15905
MurE    7b6g  -4.75534

@jhjensen2
Author

Here are the top 88 molecules from the GA search with murD and murE scores
GA_top_SA.csv

@jhjensen2
Author

jhjensen2 commented Aug 15, 2021

During our last Zoom meeting the question came up about the reliability of docking scores. So Casper and I decided to investigate this using the D4 dataset from this paper.

Here they used DOCK to dock 138 million molecules to the D4 dopamine receptor (5WIU), then made 549 of these molecules and measured their activity (% antagonist displacement at 10 μM). Crucially, the molecules were selected to span all docking scores and are structurally quite diverse (i.e. no homologous series, which are so common in activity data sets). 122 molecules (22%) showed significant activity (>50% antagonist displacement).

Here’s a plot of the activity vs docking score from the paper

Screenshot 2021-08-15 at 14 39 56

Here I count the total number and number of active molecules within certain score ranges.

Screenshot 2021-08-15 at 14 40 41

As you can see the proportion of active molecules increases with lower (better) docking scores. So if you pick a random molecule with a docking score < -65 there is a 37% chance that it is active.
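
A minimal sketch of the counting behind such a table, assuming each tested molecule is available as a (docking score, active?) pair. The bin edges and the toy data are hypothetical and do not reproduce the D4 numbers.

```python
from bisect import bisect_left


def hit_rate_by_bin(molecules, edges):
    """Count molecules and actives per score bin.

    edges must be ascending. Bin 0 holds scores <= edges[0], bin i holds
    edges[i-1] < score <= edges[i], and the last bin holds scores > edges[-1].
    Returns a list of (total, actives, fraction_active) tuples, best bin first.
    """
    totals = [0] * (len(edges) + 1)
    actives = [0] * (len(edges) + 1)
    for score, is_active in molecules:
        i = bisect_left(edges, score)
        totals[i] += 1
        actives[i] += bool(is_active)
    return [(t, a, a / t if t else None) for t, a in zip(totals, actives)]


# Toy data, for illustration only.
data = [(-70, True), (-66, False), (-60, True), (-60, False), (-45, False), (-30, False)]
table = hit_rate_by_bin(data, edges=[-65, -50, -40])
for row in table:
    print(row)
```

Empty bins report `None` for the fraction so that they are not mistaken for a 0% hit rate.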

Let’s compare this to picking a random molecule from the 138 million (i.e. without using docking). 86% have a score worse (higher) than -40 and 13% have a score between -50 and -40. A random molecule thus has a <1% chance of being active (0.86*0 + 0.13*0.06 ≈ 0.008). That’s the value of docking.
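
The baseline estimate is a weighted average of the per-bin hit rates, which can be sketched as below. The two (share, hit rate) pairs are the ones quoted in the post; like the post, this rough estimate leaves out the remaining ~1% of the library with better scores. The variable names are arbitrary.

```python
# (fraction of the library in the score range, fraction active in that range)
bins = [
    (0.86, 0.00),  # worst-scoring range: no actives observed
    (0.13, 0.06),  # scores between -50 and -40
]

baseline = sum(share * rate for share, rate in bins)
print(f"{baseline:.2%}")
```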

Let’s look at how Glide performs. Here are the results for XP with and without using LigPrep (I have adjusted the cutoffs to get roughly the same number of molecules in each bin).

Screenshot 2021-08-15 at 14 41 26

The percentage of actives is a bit higher when using LigPrep, so this is probably what we should use going forward.

The percentage of actives is higher for Glide than for DOCK. However, notice that the chance of randomly picking an active molecule from the 549-molecule dataset is 22%, so Glide is not better than random at identifying inactive molecules, while DOCK is.

Unfortunately, there are very few molecules with very low docking scores, so that percentage of actives has quite a large error bar. So it’s not really certain, based on this data, that molecules with a score < -8 are more likely to be active than molecules with a score between -8 and -6.
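
One way to put a number on that error bar is a binomial confidence interval. The sketch below uses the Wilson score interval, which is a choice made here for illustration and not something used in the original analysis; the counts in the example are hypothetical.

```python
from math import sqrt


def wilson_interval(actives, total, z=1.96):
    """Approximate 95% Wilson score interval for a proportion of actives."""
    if total <= 0:
        raise ValueError("total must be positive")
    p = actives / total
    denom = 1 + z**2 / total
    centre = (p + z**2 / (2 * total)) / denom
    half = z * sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return centre - half, centre + half


# Same observed hit rate (50%), very different certainty.
small = wilson_interval(5, 10)     # hypothetical bin with few molecules
large = wilson_interval(50, 100)   # hypothetical bin with many molecules
print(small)
print(large)
```

If the intervals of two neighbouring score bins overlap heavily, the data cannot distinguish their hit rates, which is the situation described above for the best-scoring bins.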

Using larger bin sizes gives us more precise percentages but for a larger range of scores.

Screenshot 2021-08-15 at 14 42 40

Finally, we’ve talked about using several docking programs to create a consensus score. So here are the same results where I have removed all molecules with DOCK scores > -55.

Screenshot 2021-08-15 at 14 43 11

Unfortunately, using DOCK doesn’t help to weed out inactive molecules with good Glide scores.

If we assume that these results are representative of murC ligase, then we can say that molecules with docking scores < -7 are likely to have a 50% chance of being active. That is much, much better than picking a molecule at random, but I am not sure how it compares to an expert MedChemist.

There is some indication that molecules with scores < -8 are more likely to be active than those between -8 and -7, but there are too few examples to be certain of this.

There is simply too little data to be able to say anything about scores <-9 vs (-9, -8]. In general, the chances of finding molecules with XP Glide scores <-9 in molecule libraries are very low. GAs can be used to generate many more, so I hope that can be tested.

@jhjensen2
Author

My code had a bug, so the last table in the previous post is wrong. Here are the corrected results where I have removed all molecules with DOCK scores > -55.

Screenshot 2021-08-16 at 12 17 51

So, yes, the success rate for molecules with good Glide scores can be increased to almost 60% by also using DOCK.
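
A minimal sketch of this kind of consensus filter, assuming each molecule carries a Glide score, a DOCK score, and a measured activity flag. The dictionary keys, the Glide cutoff of -7, and the toy data are assumptions for illustration; only the DOCK cutoff of -55 is taken from the post.

```python
def consensus_hit_rate(molecules, glide_cutoff=-7.0, dock_cutoff=-55.0):
    """Hit rate among molecules that pass both score cutoffs.

    Molecules with DOCK scores above dock_cutoff are removed, then only
    those with Glide scores below glide_cutoff are counted.
    Returns (n_kept, n_active, fraction_active or None if nothing is kept).
    """
    kept = [
        m for m in molecules
        if m["dock"] <= dock_cutoff and m["glide"] < glide_cutoff
    ]
    n_active = sum(1 for m in kept if m["active"])
    rate = n_active / len(kept) if kept else None
    return len(kept), n_active, rate


# Toy data, for illustration only.
mols = [
    {"glide": -8.1, "dock": -60.0, "active": True},
    {"glide": -7.5, "dock": -45.0, "active": False},  # removed by the DOCK cutoff
    {"glide": -7.2, "dock": -58.0, "active": False},
    {"glide": -6.0, "dock": -62.0, "active": True},   # removed by the Glide cutoff
]
print(consensus_hit_rate(mols))
```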

Another thing we talked about is focusing on molecules that have good docking scores for murC, murD, and murE. If we assume that the success rate is 50% for all three targets and we have a molecule with good Glide scores for all three, then there's only a 12.5% chance (0.5^3) that the molecule will be active on all three targets. So, with only a 50% success rate this probably doesn't make sense.
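
The same arithmetic as a small sketch, under the (strong) assumption that activity on each target is independent of the others. The function name is arbitrary.

```python
from math import prod


def all_active_probability(success_rates):
    """Probability of activity on every target, assuming independence."""
    return prod(success_rates)


print(all_active_probability([0.5, 0.5, 0.5]))
```

With the same assumption, a per-target success rate of about 80% would be needed to get above a 50% chance of activity on all three targets, since 0.8^3 ≈ 0.51.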

@KatoLeonard

Hello everyone,

I am Kato, an Erasmus Master student from KU Leuven - Belgium, who has recently joined Professor Todd's lab at UCL. As part of my internship and master's thesis, I will be working on this OSA MurLigase project for the next nine months. I am looking forward to this multidisciplinary and international collaboration!

I was asked to look up quotes on MCule or Enamine for some compounds predicted by @jhjensen2. I am still waiting for the quote from Enamine, but attached you can find the one from MCule. However, I noticed that when searching via MolPort, there is a remarkable price difference for the same supplier 'UkrOrgSynthesis'. Quote attached as well, just to be sure!

MCule Ukrorgsyntez.pdf

MolPort UKOrgSynthesis.pdf

@jhjensen2
Author

Welcome aboard @KatoLeonard! Just out of curiosity, which molecules did you pick?

@KatoLeonard

Thank you! I picked the first two of the Enamine series, so Z1603489873 (SMILES: Cc1ccc(CC(=O)Nc2cccc(-n3ccc(C(=O)O)n3)c2)o1 ) and Z2581487631 (SMILES: Cn1cnc(C(=O)NC2(CC(=O)O)CCOCC2)c1 )

@KatoLeonard

I have chosen them randomly, but perhaps if I can access the original data file, I can take into account the different poses of the molecules in MurC?

@Yuhang-CADD
Contributor

Yuhang-CADD commented Sep 23, 2021

> Would the “maestro” original file also be helpful if people have the Schrödinger academic free visualiser? Whatever is simple and low effort here, I guess.
>
> @mattodd I have the pose-viewer files for the ZINC database, but it is 1.4 GB in size. Would the first 100 structures be of interest or another subset I can extract for you? The enamine pose-viewer file is only 55 mb so that is easily sharable 👍 The pose-viewer file contains both the 6x9f prepared structure as well as all ligands bound. These can be viewed in the free Maestro interface.

Dear Casper (Prof. Steinmann @cstein), sorry we missed your message two months ago, but we would really appreciate it if you could share both the ZINC and Enamine pose-viewer files with us! Would it be possible to share the data through Dropbox? Many thanks!

@KatoLeonard This message may answer your questions!

@dkoes

dkoes commented Sep 23, 2021

If you are testing compounds, would it be possible to test some of the compounds I proposed (see Jul 6 message)? I'd be happy to order the MCULE compounds (send me shipping info via email). The ChemSpace compound is available from ENAMINE (Z1980956983) so it would be more cost effective to add it to an existing order.

@jhjensen2
Author

@KatoLeonard But those are from the screen of the smaller DDS. We have subsequently screened the larger HLL set and found molecules with better docking scores (see this post from July 15). @cstein has rescored them using LigPrep and we'll share the results (including docking poses) here within the next few days. Stay tuned

@jhjensen2
Author

Turns out that we get worse docking scores with LigPrep, so I suggest selecting molecules from this batch. @cstein is working on getting the docking poses. Time permitting, he'll also dock them with DOCK.

@cstein

cstein commented Oct 5, 2021

I have (finally) extracted the 20 best poses of our docking studies on the HLL database. They are available for download as a .zip file with both SP and XP results. You can match the docking scores for the SP results with our (@jhjensen2) July 15th post (#46 (comment)).

Any feedback is welcome of course!

@jhjensen2
Author

jhjensen2 commented Oct 11, 2021

Here are our results for the Enamine Hit Locator library of 234K molecules. The docking is performed with Glide using the SP scoring methodology and the 6X9F crystal structure. The 1000 best binders from each set are then redocked using the more accurate XP scoring methodology. The scoring is done with LigPrep.

Compounds with more than 5 rotatable bonds (bad for accumulation) or docking scores > -7.0 (AZ8074 has a docking score of -7.2) are removed, as are molecules with DOCK scores > -50. All compounds have a globularity < 0.25, which is good for accumulation. But some of them have computed logP values > 3.5 (Z57909504, Z2038227779, and Z57907808).
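
A minimal sketch of the filtering described above, assuming the descriptors are already computed and stored per molecule. The dictionary keys and the toy values are hypothetical; the thresholds are the ones quoted in the post. Globularity and logP are only reported as flags here, since the post describes them as observations and not as removal criteria.

```python
def passes_filters(mol):
    """Apply the three removal criteria: rotatable bonds, Glide XP, DOCK."""
    return (
        mol["rotatable_bonds"] <= 5
        and mol["glide_xp"] <= -7.0
        and mol["dock"] <= -50.0
    )


def accumulation_flags(mol, max_globularity=0.25, max_logp=3.5):
    """Report descriptors relevant to accumulation without filtering on them."""
    return {
        "low_globularity": mol["globularity"] < max_globularity,
        "high_logp": mol["logp"] > max_logp,
    }


# Toy entries, for illustration only.
candidates = [
    {"id": "cpd_1", "rotatable_bonds": 4, "glide_xp": -7.8, "dock": -56.0,
     "globularity": 0.10, "logp": 2.1},
    {"id": "cpd_2", "rotatable_bonds": 7, "glide_xp": -8.2, "dock": -60.0,
     "globularity": 0.12, "logp": 3.9},
    {"id": "cpd_3", "rotatable_bonds": 3, "glide_xp": -7.1, "dock": -45.0,
     "globularity": 0.20, "logp": 1.5},
]

selected = [m["id"] for m in candidates if passes_filters(m)]
print(selected)
print(accumulation_flags(candidates[1]))
```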

If our results for the D4 dopamine receptor are transferable, then there's a 60% chance these molecules are "good" binders.

@cstein can you upload the docking poses for these molecules?

The csv file can be found here HLL_top_ligprep.csv

Screenshot 2021-10-11 at 15 48 23

@cstein

cstein commented Oct 29, 2021

The poseviews for the above ligands are extracted and available here (compare the titles in Maestro with the title id in the .csv file)

Download link: https://www.dropbox.com/s/llc4ofxqahgr8ei/dock_sorted_poses.zip?dl=0

@jhjensen2
Author

Screenshot 2021-11-26 at 14 01 36

Here are the results from a genetic algorithm search for primary amines (reminder: primary amines are less likely to be pumped out of the cell).

You can see the first 10 molecules above (the rest are in this csv file: amines_20211126.csv), with the name, docking score, and number of synthetic steps predicted by Postera’s Manifold retrosynthesis program. The molecules are sorted in order of number of synthesis steps and then by docking scores.
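
The ordering described above (fewest synthesis steps first, then best docking score) can be sketched with a two-part sort key. The field names and the toy rows are hypothetical and do not correspond to the columns of the csv file.

```python
# Toy rows, for illustration only.
rows = [
    {"name": "amine_a", "score": -8.4, "steps": 3},
    {"name": "amine_b", "score": -9.1, "steps": 2},
    {"name": "amine_c", "score": -8.0, "steps": 2},
    {"name": "amine_d", "score": -9.5, "steps": 3},
]

# Sort by number of steps (ascending), then by docking score (most negative first).
ranked = sorted(rows, key=lambda r: (r["steps"], r["score"]))
print([r["name"] for r in ranked])
```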

Things to help guide molecule selection:

  • Look at the docking poses (@cstein will post these soon).  Does the ligand make the same kinds of interactions (e.g. H-bonds) with the protein as AZ5595 and AZ8074 do? Is the protonation state realistic?
  • Consider the ease of synthesis. To see Manifold’s retrosynthesis predictions you can paste the SMILES strings from the csv file into their website https://postera.ai/manifold/. (You have to make an account but it’s free.) Do the suggestions make sense? Are the building blocks available at a reasonable price? How easy would it be to make derivatives? You can see an example Manifold output below.
  • Consider chemical diversity. For example, several molecules have a quinolinone group. Maybe just try 1-2 of those?

Screenshot 2021-11-26 at 14 08 59

@cstein

cstein commented Nov 29, 2021

Here is the link for a .zip file with two pose-viewer files (rigid and confgen) for the results that @jhjensen2 posted a few days ago. When you load them up, they are in the same order as in the .csv files that Jan posted.

Link for download: https://www.dropbox.com/s/d7leyvs7yul3opc/amines_GB-GA_poses_20211129.zip?dl=0

@jhjensen2
Author

I have made a few slides for tomorrow's meeting. I've gone through and picked seven molecules for possible synthesis, based on their poses, ease of synthesis, etc.
Amine_poses.pdf
