Question about getting the best entire route cost before ValueMLP network is ready #9

Open
Mike575 opened this issue Dec 28, 2020 · 3 comments

Mike575 commented Dec 28, 2020

Hi, thank you very much for your code and paper.

I'm currently trying to train the ValueMLP network on my own resulting dataset, but I ran into a problem when constructing that dataset from the training routes. The resulting dataset needs the best entire route cost vi, and I don't see how to obtain that cost before the ValueMLP network is ready. It seems like a chicken-and-egg problem.

@binghong-ml
Owner

Hi Mike, the route cost is the sum of the costs of the individual reactions in the route. Reaction costs can be arbitrary and should be defined by the user. In our experiments, the reaction cost is defined as the negative log-likelihood of the reaction (Sec. 5.2), which is given by the one-step model (Eq. 1). You don't need the value network to generate the costs before building the dataset.
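
To make the cost definition above concrete, here is a minimal sketch. It assumes you already have, for each reaction in a route, the probability assigned to it by a one-step model. The function names and the example probabilities are hypothetical and are not taken from the repository.

```python
import math

def reaction_cost(prob):
    """Cost of a single reaction: negative log-likelihood of its probability.

    `prob` is assumed to be the one-step model's probability for the reaction.
    """
    if not 0.0 < prob <= 1.0:
        raise ValueError("probability must be in (0, 1]")
    return -math.log(prob)

def route_cost(reaction_probs):
    """Cost of a whole route: the sum of its individual reaction costs."""
    return sum(reaction_cost(p) for p in reaction_probs)

# Made-up probabilities for a three-step route, for illustration only.
example_probs = [0.9, 0.5, 0.8]
example_cost = route_cost(example_probs)
```

Because the costs are negative log-likelihoods, summing them is the same as taking the negative log of the product of the probabilities, so a lower route cost corresponds to a more likely route under the one-step model. No value network is involved at this stage.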

@Mike575
Author

Mike575 commented Jan 6, 2021

Thank you sincerely for your answer. I still have some questions about building the resulting dataset.

  1. Why is the number of (fps, values) pairs in the training dataset 398581 at the beginning? And how are they filtered down to 299102 pairs?
  2. How should negative samples be built? For example, if a given target molecule has 19 candidate reactions and 1 best reaction at the root node, can I generate 19 negative samples? For each negative sample, should I append the candidate reaction's cost to data_dict['reaction_costs'], append the true route cost to data_dict['target_values'], and append the candidate reactant's fingerprint to data_dict['reactant_fps']? Am I right?

@binghong-ml
Owner

  1. Why is the number of (fps, values) pairs in the training dataset 398581 at the beginning? And how are they filtered down to 299102 pairs?

I don't remember the exact numbers, but it seems you are referring to the total number of data points in the whole dataset and the number in the training split. If so, the training data points were sampled randomly from the entire dataset, and the remaining data points were used for validation and testing.
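
A random split of this kind can be sketched as follows. This is only an illustration: the training fraction, the seed, and the function name are assumptions, since the exact procedure and numbers are not stated in this thread.

```python
import random

def random_split(n_items, train_frac, seed=0):
    """Randomly split indices 0..n_items-1 into a training part and the rest.

    `train_frac` and `seed` are illustrative parameters, not values from the
    original experiments.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # local RNG, reproducible shuffle
    n_train = int(round(n_items * train_frac))
    return indices[:n_train], indices[n_train:]

# Example: 100 data points, with an assumed 75% going to training.
train_idx, heldout_idx = random_split(100, 0.75)
```

The held-out indices would then be divided again into validation and test sets in whatever proportion you choose.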

  2. How should negative samples be built? For example, if a given target molecule has 19 candidate reactions and 1 best reaction at the root node, can I generate 19 negative samples?

Yes, that's what I did.

For each negative sample, should I append the candidate reaction's cost to data_dict['reaction_costs'],

Yes.

append the true route cost in data_dict['target_values'],

Yes, the best route cost.

append the candidate reactant's fingerprint in data_dict['reactant_fps']?

Yes.
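
Putting the answers above together, the negative-sample construction could look roughly like the sketch below. The three `data_dict` keys come from the question. The helper name, the shape of `candidates`, and the toy fingerprints are assumptions made for illustration and may not match the repository's actual data layout.

```python
def add_negative_samples(data_dict, candidates, best_route_cost):
    """Append one negative sample per non-best candidate reaction.

    `candidates` is assumed to be a list of (reaction_cost, reactant_fp)
    pairs for the candidate reactions at the root node, excluding the best
    one. `reactant_fp` is treated as an opaque fingerprint object.
    """
    for cost, fp in candidates:
        data_dict['reaction_costs'].append(cost)            # candidate reaction's own cost
        data_dict['target_values'].append(best_route_cost)  # best route cost for the target
        data_dict['reactant_fps'].append(fp)                # candidate reactant's fingerprint
    return data_dict

# Toy example: 19 candidates with made-up costs and 4-bit fingerprints.
data_dict = {'reaction_costs': [], 'target_values': [], 'reactant_fps': []}
candidates = [(0.1 * (i + 1), [i % 2, 0, 1, 0]) for i in range(19)]
add_negative_samples(data_dict, candidates, best_route_cost=2.5)
```

After this call, each of the three lists holds 19 entries, and every entry in `data_dict['target_values']` equals the best route cost, as confirmed in the replies above.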
