Ask a question #6

moyu3003 · 2022-11-02T09:05:38Z

Hello Authors.

Regarding your publications "Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT" and "SMILES Pair Encoding: A Data-Driven Substructure Tokenization Algorithm for Deep Learning", I ran into the following problems while reproducing your work and would like to ask for your advice.

1. In the first paper (MolPMoFiT), what is the basis for the data augmentation part of the utils.py you uploaded, and how do you determine that the augmented molecules have the same properties as the original molecules? Also, part of this code contains errors, and I do not know the reason.
2. In the second paper (SPE), you split the molecules using SPE, which performs better than the ECFP encoding, but are the resulting substructures accurate in terms of interpretation? Also, after the molecules are split this way, in what form is the data fed into the network model?
3. Could you share the complete code?

I hope to receive your reply. Thank you very much!


Lu

2022.11.02

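Dear authors,

Regarding your publications "SMILES Pair Encoding: A Data-Driven Substructure Tokenization Algorithm for Deep Learning" and

"Inductive transfer learning for molecular activity prediction: Next-Gen QSAR Models with MolPMoFiT",

I ran into the following problems while reproducing your work and would like to ask for your advice:

1. In the second paper, what is the basis for the data augmentation part of the utils.py you uploaded, and how do you determine that the augmented molecules have the same properties as the original molecules? Also, for reasons I do not know, part of this code contains errors.

2. In the first paper, you split the molecules using SPE, which performs better than the ECFP encoding, but are the resulting substructures accurate in terms of interpretation? Also, after the molecules are split this way, in what form is the data fed into the network model?

3. Could you share a complete copy of the code?

I hope to receive your reply. Thank you very much!

Lu

2022.11.02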
