Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add function to check if there is a route with a given set of starting molecules #34

Closed
wants to merge 5 commits into from

Conversation

AustinT
Copy link
Collaborator

@AustinT AustinT commented Nov 14, 2023

In the paper FusionRetro they proposed checking whether the starting molecules for a route found during multi-step planning match the starting materials used in an actual synthesis route. This is essentially a less strict version of checking whether an exact ground-truth route is found. This PR implements code to check for the existence of a route with a specific set of starting materials.

In FusionRetro this metric is called "set-wise exact match", however this name is very vague so I instead called it is_route_with_starting_mols. I only wrote a version for AND/OR graphs because these typically contain a prohibitively large number of routes to explicitly enumerate them. The method is essentially a brute-force search with pruning of search branches. The inner workings of the methods are described reasonably-well in the docstrings (but I want to improve the description before merging)

I also fixed a random typo in a conftest.py docstring that I found while writing tests.

TODO:

  • I need to add a CHANGELOG entry for this.
  • should appr
  • Check that pruned search works reasonably well in practice

@AustinT AustinT requested a review from kmaziarz November 14, 2023 17:47
Copy link
Contributor

@kmaziarz kmaziarz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A very general question: what would happen if we just set the is_purchasable attribute to False for all nodes apart from those in starting_mols, and then computed the has_solution attribute using run_message_passing? Would that not give us what we want? Presumably this would give an affirmative answer even if a route exists that uses a subset of starting_mols, so a related question would be whether we only care about routes that use exactly that set of leaves instead of its subset?

(Even more generally, I wonder how useful this graph-based formulation of this metric is: if a graph contains a combinatorial number of routes, then stating that there exists a route with the right set of leaves seems not so practically useful. But I guess it doesn't hurt to have a function to compute this.)

Comment on lines +79 to +83
# Initialize to empty sets, except for nodes with purchasable mols
node_to_mols: dict[ANDOR_NODE, set[Molecule]] = {n: set() for n in graph.nodes()}
for n in graph.nodes():
if isinstance(n, OrNode):
node_to_mols[n].add(n.mol)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment says "purchasable", but is this reflected in the code?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch: I used to check whether a molecule is purchasable but removed it to make the method more general (i.e. the nodes marked purchasable in the graph don't need to be the starting molecules). I will fix this by changing the comment.

@kmaziarz kmaziarz requested a review from mrwnmsr November 14, 2023 21:55
@AustinT
Copy link
Collaborator Author

AustinT commented Nov 14, 2023

A very general question: what would happen if we just set the is_purchasable attribute to False for all nodes apart from those in starting_mols, and then computed the has_solution attribute using run_message_passing? Would that not give us what we want? Presumably this would give an affirmative answer even if a route exists that uses a subset of starting_mols, so a related question would be whether we only care about routes that use exactly that set of leaves instead of its subset?

(Even more generally, I wonder how useful this graph-based formulation of this metric is: if a graph contains a combinatorial number of routes, then stating that there exists a route with the right set of leaves seems not so practically useful. But I guess it doesn't hurt to have a function to compute this.)

You are right the has_solution approach would compute whether there exists a route that uses any subset of the starting molecules. The FusionRetro paper specifies an exact match. I'm not totally convinced by this metric but I think it is a proxy for a proposed route not being too different than a real route. I agree that the route existing "somewhere in the graph" is maybe not super useful.

I thought a bit about how to do this but could not think of a better way than what I coded: my impression is that a dynamic programming solution is not possible here because it there is no clear way to divide the problem into independent subproblems.

@AustinT AustinT marked this pull request as draft November 15, 2023 13:51
@AustinT
Copy link
Collaborator Author

AustinT commented Nov 15, 2023

I converted this to a draft PR since some experiments I am running are very slow. I might change the implementation.

@AustinT
Copy link
Collaborator Author

AustinT commented Nov 15, 2023

Just added some extra pruning code to increase the efficiency. I updated the main PR description to reflect this.

@AustinT
Copy link
Collaborator Author

AustinT commented Nov 21, 2023

I think I will close this PR because:

  1. I did not end up using the function it provided
  2. In some instances the function was very slow (taking hours for just a modest graph). Unclear why this is.
  3. I am not 100% sure it is correct (I wrote tests but they were not super thorough)
  4. The precursor matching metric can be calculated directly from synthesis routes (added in Add method to get starting molecules of a synthesis route #36)

We can re-visit and re-open this PR later if we want to add this function.

@AustinT AustinT closed this Nov 21, 2023
AustinT added a commit that referenced this pull request Nov 21, 2023
First fix: error message for converting an AND/OR graph to a synthesis
route would say there was more than one node if in fact there were zero
nodes.

Second fix: reaction string in `conftest.py`. This was also fixed in
#34, but since I am not merging that I decided to fix it here.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants