Clarification regarding the number of molecular building blocks. Why are they different from JT-VAE? #9
Comments
The building blocks in the two papers are not the same, but they are quite similar. In both cases we represent molecules as junction trees, which means there are no cycles. Ours are obtained by BRICS followed by Bemis-Murcko decomposition. Finally, we had a chemist curate our set of building blocks. In the end, I think our building blocks ended up slightly smaller and more rigid than those of JT-VAE, and they worked better for us in practice. BRICS: https://www.rdkit.org/docs/source/rdkit.Chem.BRICS.html
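For readers trying to reproduce the raw (uncurated) block list, here is a minimal sketch of "BRICS followed by Bemis-Murcko" using RDKit. It is not the authors' code. The function name `candidate_blocks`, the choice to take the Murcko scaffold of each BRICS fragment separately, and the choice to keep acyclic fragments unchanged are all assumptions made for illustration; the manual curation step is not covered.

```python
from rdkit import Chem
from rdkit.Chem import BRICS
from rdkit.Chem.Scaffolds import MurckoScaffold


def candidate_blocks(smiles_list):
    """Collect unique candidate building blocks from an iterable of SMILES.

    Hypothetical helper: a sketch of the pipeline, not the published method.
    """
    blocks = set()
    for smiles in smiles_list:
        mol = Chem.MolFromSmiles(smiles)
        if mol is None:  # skip SMILES that RDKit cannot parse
            continue
        # BRICSDecompose returns fragment SMILES; labelled dummy atoms
        # (e.g. [4*]) mark the bonds that were cut.
        for frag_smiles in BRICS.BRICSDecompose(mol):
            frag = Chem.MolFromSmiles(frag_smiles)
            if frag is None:
                continue
            # Assumption: the Bemis-Murcko step is applied per fragment.
            scaffold = MurckoScaffold.GetScaffoldForMol(frag)
            if scaffold.GetNumAtoms() == 0:
                # Acyclic fragment has no scaffold; keep it as-is (assumption).
                blocks.add(frag_smiles)
            else:
                blocks.add(Chem.MolToSmiles(scaffold))
    return blocks
```

Calling `candidate_blocks` on the full SMILES list gives the set of unique candidates before any curation. The exact count depends on choices like the ones flagged above, so it need not match the numbers quoted in this thread.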
Thank you. After performing the BRICS followed by Bemis-Murcko decomposition on the 250k SMILES dataset, I get 8962 unique building blocks. Can you please comment a bit more about the curation process? How did you narrow down to a smaller list of 105 building blocks? Also, how did you determine the attachment points ( Thank you
Hi @MKorablyov, just following up on my comment earlier. It would be really helpful if you could provide those details. Thank you so much!
I ran into the same situation. Following the JT-VAE paper gives more than 700 blocks. If I apply BRICS followed by Bemis-Murcko decomposition, the number of unique building blocks is also 8900+, and I have no idea how to narrow it down. Deciding the attachment points is also a challenge for me. Thanks in advance for any suggestions!
Hello,
First, I really enjoyed reading the paper. Amazing work!
I have a question regarding the number of building blocks used for generating small molecules. Appendix A.3 of the paper states that there are a total of 105 unique building blocks (after accounting for different attachment points) and that they were obtained by the process suggested by the JT-VAE paper (Jin et al. (2020)). However, in the JT-VAE paper, the total vocabulary size is $|\chi|=780$, obtained from the same ZINC dataset. My understanding is that the two procedures are the same. If that is correct, why is the number of building blocks different here? What am I missing? If they are not the same, can you please explain the difference?
Thank you so much for your help