Clarification regarding the number of molecular building blocks. Why they are different from JT-VAE? #9

Srilok · 2023-01-03T19:30:13Z

Hello,

First, I really enjoyed reading the paper. Amazing work!

I have a question regarding the number of building blocks used for generating small molecules. Appendix A.3 of the paper states that there are a total of 105 unique building blocks (after accounting for different attachment points) and that they were obtained by the process suggested in the JT-VAE paper (Jin et al., 2020). However, in the JT-VAE paper, the total vocabulary size is $|\chi|=780$, obtained from the same ZINC dataset. My understanding is that both papers use the same decomposition procedure. If that is correct, why is the number of building blocks different here? What am I missing? If the procedures are not the same, can you please explain the difference?

Thank you so much for your help

MKorablyov · 2023-01-03T19:48:26Z

The building blocks in the two papers are not the same, but they are quite similar. In both cases we represent molecules as junction trees, which means the graph of blocks contains no cycles. Ours are obtained by BRICS followed by Bemis-Murcko decomposition. Finally, we had a chemist curate our set of building blocks. In the end, I think our building blocks ended up slightly smaller and more rigid than those of JT-VAE, and they worked better for us in practice.

BRICS: https://www.rdkit.org/docs/source/rdkit.Chem.BRICS.html
Bemis-Murcko: https://rdkit.org/docs/source/rdkit.Chem.Scaffolds.MurckoScaffold.html
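For readers who want to try the two steps named above, here is a minimal sketch using RDKit. It is not the authors' pipeline: the function name `candidate_blocks`, the decision to skip acyclic fragments, and counting by canonical scaffold SMILES are all assumptions made for illustration, and the manual curation by a chemist is not reproduced.

```python
from collections import Counter

from rdkit import Chem
from rdkit.Chem import BRICS
from rdkit.Chem.Scaffolds import MurckoScaffold


def candidate_blocks(smiles_list):
    """Count candidate ring blocks (hypothetical helper, not from the repo).

    Each molecule is split with BRICS, then every fragment is reduced to
    its Bemis-Murcko scaffold. Returns a Counter keyed by scaffold SMILES.
    """
    counts = Counter()
    for smi in smiles_list:
        mol = Chem.MolFromSmiles(smi)
        if mol is None:
            continue  # skip SMILES that RDKit cannot parse
        # BRICSDecompose returns a set of fragment SMILES; cut bonds are
        # marked with isotope-labelled dummy atoms such as [16*].
        for frag_smi in BRICS.BRICSDecompose(mol):
            frag = Chem.MolFromSmiles(frag_smi)
            if frag is None:
                continue
            scaffold = MurckoScaffold.GetScaffoldForMol(frag)
            if scaffold.GetNumAtoms() == 0:
                continue  # assumption: acyclic fragments are ignored here
            counts[Chem.MolToSmiles(scaffold)] += 1
    return counts


if __name__ == "__main__":
    # Two arbitrary example molecules (aspirin and paracetamol).
    demo = ["CC(=O)Oc1ccccc1C(=O)O", "CC(=O)Nc1ccc(O)cc1"]
    for block, n in candidate_blocks(demo).most_common():
        print(block, n)
```

Run over a whole dataset, this kind of uncurated decomposition can be expected to yield far more unique blocks than a hand-curated list, so any filtering step on top of it (for example by frequency or size) would be a further assumption.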

Srilok · 2023-01-10T15:15:25Z

Thank you. After performing BRICS followed by Bemis-Murcko decomposition on the 250k SMILES dataset, I get 8962 unique building blocks. Can you please say a bit more about the curation process? How did you narrow this down to the smaller list of 105 building blocks?

Also, how did you determine the attachment points (block_r in data/blocks_PDB_105.json)?

Thank you

Srilok · 2023-01-18T00:02:06Z

Hi @MKorablyov, just following up on my earlier comment. It would be really helpful if you could provide those details. Thank you so much!

yuxuan9982 · 2024-10-31T05:39:59Z

I ran into the same situation. Following the JT-VAE paper gives more than 700 blocks. Doing BRICS followed by Bemis-Murcko decomposition also gives 8900+ unique building blocks, and I have no idea how to narrow them down. Deciding on the attachment points is also a challenge for me. Thanks in advance to anyone who can offer suggestions!

bengioe mentioned this issue Apr 1, 2024

Want to know the detailed preparation process of dataset #18

Open
