Question about getting the best entire route cost before ValueMLP network is ready #9

Mike575 · 2020-12-28T13:19:12Z

Hi, I am sincerely thankful for your code and paper.

I'm currently trying to train the ValueMLP network on my own dataset, but I ran into a problem when constructing the resulting dataset from the training routes. The resulting dataset needs the best entire route cost vi, but I have no idea how to get the best entire route cost before the ValueMLP network is ready. It looks like a chicken-and-egg problem.

binghong-ml · 2021-01-04T21:47:22Z

Hi Mike, the route cost is the sum of the costs of the individual reactions in the route. The reaction costs can be arbitrary and should be defined by the user. In our experiment, the reaction cost is defined as the negative log-likelihood of the reaction (sec 5.2), which is given by the one-step model (Eq. 1). You don't need to use the value network to generate the cost before building the dataset.
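To make the definition above concrete, here is a minimal sketch of a route cost computed as the sum of per-reaction negative log-likelihoods. The function names and the example likelihoods are hypothetical and are not taken from the repository; in practice the likelihoods would come from the one-step model.

```python
import math

def reaction_cost(likelihood):
    """Cost of one reaction as its negative log-likelihood.

    `likelihood` is assumed to be the one-step model's probability
    for the reaction, in the interval (0, 1].
    """
    if not 0.0 < likelihood <= 1.0:
        raise ValueError("likelihood must be in (0, 1]")
    return -math.log(likelihood)

def route_cost(likelihoods):
    """Cost of a route: the sum of the costs of its reactions."""
    return sum(reaction_cost(p) for p in likelihoods)

# Hypothetical three-step route with made-up likelihoods.
example_route = [0.9, 0.5, 0.8]
total = route_cost(example_route)
print(total)
```

Because the cost is a sum of negative logs, it equals the negative log of the product of the likelihoods, so no value network is involved at this stage.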

Mike575 · 2021-01-06T07:43:59Z

Thank you sincerely for your answer. I still have some questions about building the resulting dataset.

Why is the number of (fps, values) pairs in the training dataset 398581 at the beginning? And how are they filtered down to 299102 pairs?
How should I build negative samples? For example, if a given target molecule has 19 candidate reactions and 1 best reaction at the root node, can I generate 19 negative samples? For each negative sample, should I append the candidate reaction's cost to data_dict['reaction_costs'], the true route cost to data_dict['target_values'], and the candidate reactant's fingerprint to data_dict['reactant_fps']? Is that right?

binghong-ml · 2021-01-12T01:49:38Z

Why is the number of (fps, values) pairs in the training dataset 398581 at the beginning? And how are they filtered down to 299102 pairs?

I don't remember the exact numbers. But it seems that you are talking about the total number of data points in the whole dataset and in the training split. If that is the case, the training data points were sampled randomly from the entire dataset. The remaining data points are used for validation and testing.
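A random split of this kind could be sketched as follows. This is only an illustration: the function name, the seed, the dataset size, and the training fraction are all assumptions, since the exact numbers and procedure are not stated in this thread.

```python
import random

def random_split(num_points, train_fraction, seed=0):
    """Randomly split indices 0..num_points-1 into training and held-out sets.

    The held-out indices would then be divided further into
    validation and test sets.
    """
    indices = list(range(num_points))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility
    n_train = int(num_points * train_fraction)
    return indices[:n_train], indices[n_train:]

# Hypothetical sizes; the real dataset size and fraction may differ.
train_idx, held_out_idx = random_split(1000, 0.75)
```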

How should I build negative samples? For example, if a given target molecule has 19 candidate reactions and 1 best reaction at the root node, can I generate 19 negative samples?

Yes, that's what I did.

For each negative sample, should I append the candidate reaction's cost to data_dict['reaction_costs'],

Yes.

append the true route cost to data_dict['target_values'],

Yes, the best route cost.

append the candidate reactant's fingerprint to data_dict['reactant_fps']?

Yes.
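Putting the answers above together, negative-sample construction for one target molecule might look like the sketch below. Only the three data_dict keys come from this thread; the helper name, the (cost, fingerprint) tuple layout, the toy fingerprints, and the example costs are assumptions made for illustration.

```python
def add_negative_samples(data_dict, candidates, best_index, best_route_cost):
    """Append one negative sample per non-best candidate reaction.

    candidates: list of (reaction_cost, reactant_fingerprint) tuples
                for the reactions proposed at the root node (assumed layout).
    best_index: position of the best reaction in `candidates`.
    best_route_cost: cost of the best entire route for the target.
    """
    for i, (cost, fingerprint) in enumerate(candidates):
        if i == best_index:
            continue  # the best reaction is not a negative sample
        data_dict['reaction_costs'].append(cost)
        data_dict['target_values'].append(best_route_cost)
        data_dict['reactant_fps'].append(fingerprint)
    return data_dict

data_dict = {'reaction_costs': [], 'target_values': [], 'reactant_fps': []}

# Hypothetical root node: 1 best reaction plus 19 other candidates,
# each with a made-up cost and a toy 4-bit fingerprint.
candidates = [(0.1 * (i + 1), [i % 2, 1, 0, 1]) for i in range(20)]
add_negative_samples(data_dict, candidates, best_index=0, best_route_cost=3.2)
```

With 20 candidates and one of them marked as best, this yields 19 negative samples, each paired with the same best route cost.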
