Use cycle aware algorithm for propagating markers #4645
petgraph::Graph is a delicate data structure: node and edge identifiers are only stable across certain operations, so edges have to be removed carefully.
The strongly connected components algorithm finds all cyclic subgraphs and also yields a topological order of them. This seems to be a good way to replace Topo with something cycle-aware. The new algorithm uses the fact that in a strongly connected component, every vertex reaches every other vertex through some path. That means we can combine the markers of all internal edges in a component. As an optimization, only edges pointing out of the current component are updated. I didn't look outside this function, so I don't know if that's a reasonable choice.
Any reason in particular you chose Kosaraju's algorithm over Tarjan's? I thought the latter was faster.
Thank you for looking at this -- I really appreciate it! I'm trying to reason through this with examples... Given this graph:
I think we want the marker on the blue node to be A and B and E, but with this strategy, would it be... (A or B or C or D) and E? Since we'd OR the edges into the SCC along with all the edges within the SCC?
Oops, I haven't given and/or any thought at all, just moved existing code around, so you're probably right. If we take that picture, but you had two edges at the start, A and A', both with the same target.
Same picture but more colors: Each edge is a marker condition, and we want each node to have the complete marker condition required for it to be included in a resolution... So I think the desired end state here is:
Yellow could also be written as: A or (A and B and C and D), but that just simplifies to A. In general, what I was trying to do was: edges into a node are |
@Peiffap the reason was that this implementation is non-recursive, so I know it doesn't have any stack issues. Don't know if it matters, though 🙂
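To illustrate the non-recursive point, here is a minimal std-only sketch of Kosaraju's algorithm with explicit stacks. It is not petgraph's implementation (petgraph ships its own `kosaraju_scc` and `tarjan_scc`); the name `scc_iterative` and the adjacency-list input are assumptions made for this example.

```rust
/// Strongly connected components via Kosaraju's algorithm, using explicit
/// stacks instead of recursion. `adj[u]` lists the successors of node `u`.
/// Components come out in topological order of the condensed graph.
/// (Hypothetical helper for illustration; not part of petgraph or uv.)
fn scc_iterative(adj: &[Vec<usize>]) -> Vec<Vec<usize>> {
    let n = adj.len();

    // Pass 1: depth-first search on the graph, recording nodes in postorder.
    let mut visited = vec![false; n];
    let mut postorder = Vec::with_capacity(n);
    for start in 0..n {
        if visited[start] {
            continue;
        }
        visited[start] = true;
        // Each frame is (node, index of the next successor to try).
        let mut stack = vec![(start, 0usize)];
        while let Some(frame) = stack.last_mut() {
            let u = frame.0;
            if let Some(&v) = adj[u].get(frame.1) {
                frame.1 += 1;
                if !visited[v] {
                    visited[v] = true;
                    stack.push((v, 0));
                }
            } else {
                // All successors handled: the node is finished.
                postorder.push(u);
                stack.pop();
            }
        }
    }

    // Build the transposed graph (every edge reversed).
    let mut radj = vec![Vec::new(); n];
    for (u, succs) in adj.iter().enumerate() {
        for &v in succs {
            radj[v].push(u);
        }
    }

    // Pass 2: walk the transposed graph in reverse postorder. Every search
    // that starts from an unassigned node collects exactly one component.
    let mut assigned = vec![false; n];
    let mut components = Vec::new();
    for &root in postorder.iter().rev() {
        if assigned[root] {
            continue;
        }
        assigned[root] = true;
        let mut component = Vec::new();
        let mut stack = vec![root];
        while let Some(u) = stack.pop() {
            component.push(u);
            for &v in &radj[u] {
                if !assigned[v] {
                    assigned[v] = true;
                    stack.push(v);
                }
            }
        }
        component.sort_unstable();
        components.push(component);
    }
    components
}
```

Both passes keep their state in heap-allocated vectors, so the depth of the graph does not translate into call-stack depth.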
Thanks. So I'm trying to find the rule for the cycle. Before we propagate, what's the node weight for Yellow, it's already A? And for Red, already B? |
It starts off as |
There is no nice property for each component/cyclic part as a unit. I think the current code (feedback arc set) seems ok so far then. I don't know how it selects edges to remove, so if it makes the wrong choice the result could be incorrect? Based on two cases I've looked at, reverse postorder seems to work well, it would skip the edges we want to skip? (No definition - probably doesn't hold up) I think your graph has exactly one initial node (one root), which makes this easier. |
I'm trying to follow along and understand what you are trying to do here (both with this particular implementation and the general propagation problem you are attempting to solve). If I understand correctly, you are trying to find, for each node N in the graph, all possible paths starting from a root node (i.e. a node with no incoming edges) and ending in N, and then you want to combine the markers found along the various paths. Is that right? |
Yes, that's right. |
Thanks for confirming. DFS from the root nodes, propagating markers along as long as there is no cycle; when there is a cycle, you simply backtrack without propagating (because the propagation would lead to something like (markers to A) or ((markers to A) and (markers in cycle)), which simplifies to (markers to A)). |
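A rough sketch of that idea, under some loud assumptions: markers are modelled as plain label sets in disjunctive normal form rather than uv's real marker type, and `propagate`, `visit` and `or_insert` are hypothetical names. The walk enumerates every cycle-free path from the root, so it can take exponential time on dense graphs; it is meant to show the rule, not to be an efficient implementation.

```rust
use std::collections::BTreeSet;

/// A conjunction of marker names, e.g. {"A", "D"} stands for `A and D`.
type Conj = BTreeSet<&'static str>;
/// A disjunction of conjunctions, e.g. {{"A"}, {"B", "C"}} is `A or (B and C)`.
type Marker = BTreeSet<Conj>;

/// ORs `conj` into `marker`, applying absorption: `X or (X and Y)` is `X`.
fn or_insert(marker: &mut Marker, conj: Conj) {
    // An existing term with a subset of the labels already covers `conj`.
    if marker.iter().any(|term| term.is_subset(&conj)) {
        return;
    }
    // Drop existing terms that the new, more general term covers.
    marker.retain(|term| !conj.is_subset(term));
    marker.insert(conj);
}

/// Walks every cycle-free path from `root`, AND-ing edge labels along the
/// path and OR-ing the result into the marker of each node reached.
/// Edges are (from, to, label) triples.
fn propagate(
    edges: &[(usize, usize, &'static str)],
    node_count: usize,
    root: usize,
) -> Vec<Marker> {
    let mut markers = vec![Marker::new(); node_count];
    // The root is unconditional: a single empty conjunction.
    markers[root].insert(Conj::new());
    let mut on_path = vec![false; node_count];
    on_path[root] = true;
    visit(edges, root, &Conj::new(), &mut on_path, &mut markers);
    markers
}

fn visit(
    edges: &[(usize, usize, &'static str)],
    node: usize,
    so_far: &Conj,
    on_path: &mut [bool],
    markers: &mut [Marker],
) {
    for &(from, to, label) in edges {
        // Backtrack at a cycle: a node already on the current path would
        // only contribute a term that absorption removes again.
        if from != node || on_path[to] {
            continue;
        }
        let mut conj = so_far.clone();
        conj.insert(label);
        or_insert(&mut markers[to], conj.clone());
        on_path[to] = true;
        visit(edges, to, &conj, on_path, markers);
        on_path[to] = false;
    }
}
```

For a root with edge A into a cycle whose edges are B, C and D, the first node of the cycle ends up with just A, matching the simplification described above.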
I was looking at this graph. It had to be a bit complicated to show that the simple approaches I was thinking of didn't work. (You could hope this graph is not realistic but I think it's a possible graph?) Finding all paths without cycles in the path sounds right. For example, to the node 3 I think all these are valid:
You could imagine the last two paths being realistic and relevant when the marker A is inactive?

It's probably an optimization to run the all-paths search in each SCC separately. I think the cycle-free thing means we can require that no path has a prefix that is another complete path (to that node). It's only necessary to find paths starting with the edges incident to the SCC (we should always have some incident edge, because the root is never in a cycle, hopefully?)

@Peiffap So if we run DFS from both node 2 and node 4, then we find all the required paths from 2 and 4 to 3. I think that the DFS from the root (0) would not find all the required paths to 3.

graph in Rust

```rust
// Example graph: nodes 0..=8, edge weights are marker names.
let mut gr = petgraph::Graph::new();
let n0 = gr.add_node("0");
let n1 = gr.add_node("1");
let n2 = gr.add_node("2");
let n3 = gr.add_node("3");
let n4 = gr.add_node("4");
let n5 = gr.add_node("5");
let n6 = gr.add_node("6");
let n7 = gr.add_node("7");
let n8 = gr.add_node("8");
gr.add_edge(n0, n2, "A");
gr.add_edge(n0, n1, "B");
gr.add_edge(n2, n3, "D");
gr.add_edge(n3, n6, "I");
gr.add_edge(n3, n4, "E");
gr.add_edge(n1, n4, "C");
gr.add_edge(n4, n5, "F");
gr.add_edge(n5, n6, "H");
gr.add_edge(n6, n2, "J");
gr.add_edge(n5, n7, "K");
gr.add_edge(n6, n8, "L");
gr.add_edge(n5, n3, "G");
```
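To make the "paths from 2 and 4 to 3" claim concrete, here is a std-only sketch that enumerates simple paths while staying inside one component. It mirrors the example graph as a plain adjacency list instead of using petgraph; `example_graph`, `simple_paths` and `extend` are hypothetical helpers, and the search is a plain backtracking DFS, not the algorithm from any particular PR.

```rust
/// Adjacency list for the example graph above (nodes 0..=8), with
/// successors listed in the order the edges were added.
fn example_graph() -> Vec<Vec<usize>> {
    vec![
        vec![2, 1],    // 0 -> 2 (A), 0 -> 1 (B)
        vec![4],       // 1 -> 4 (C)
        vec![3],       // 2 -> 3 (D)
        vec![6, 4],    // 3 -> 6 (I), 3 -> 4 (E)
        vec![5],       // 4 -> 5 (F)
        vec![6, 7, 3], // 5 -> 6 (H), 5 -> 7 (K), 5 -> 3 (G)
        vec![2, 8],    // 6 -> 2 (J), 6 -> 8 (L)
        vec![],
        vec![],
    ]
}

/// Collects every simple path (no repeated node) from `from` to `to`,
/// stepping only onto nodes for which `allowed` is true.
fn simple_paths(
    adj: &[Vec<usize>],
    allowed: &[bool],
    from: usize,
    to: usize,
) -> Vec<Vec<usize>> {
    let mut found = Vec::new();
    let mut path = vec![from];
    extend(adj, allowed, to, &mut path, &mut found);
    found
}

fn extend(
    adj: &[Vec<usize>],
    allowed: &[bool],
    to: usize,
    path: &mut Vec<usize>,
    found: &mut Vec<Vec<usize>>,
) {
    let u = *path.last().expect("path always holds the start node");
    if u == to {
        found.push(path.clone());
        return;
    }
    for &v in &adj[u] {
        // Stay inside the allowed subgraph and never revisit a node.
        if allowed[v] && !path.contains(&v) {
            path.push(v);
            extend(adj, allowed, to, path, found);
            path.pop();
        }
    }
}
```

With `allowed` set to the component {2, 3, 4, 5, 6}, the search from 2 yields the single path 2, 3, and the search from 4 yields 4, 5, 6, 2, 3 and 4, 5, 3.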
I looked at computing all non-repeating paths here: petgraph/petgraph#650. It can be used on the whole graph or on a subgraph (intention: use it on one strongly connected component at a time). I'm not sure if it does the useful thing we need, but it might. I wanted to write the algorithm separately from the uv logic; I think it's better that way and the code is easier to understand. (Not saying the code has to live in petgraph.)
I meant the root of the graph in uv's code - the white node in @charliermarsh's picture and node 0 in my picture. IIUC, this is always present and it means any nontrivial SCC (has more than one node) does not contain the root and always has edges incident to the SCC (from the root or from other predecessor scc). This is useful in my example - we know that there are incident edges, we only need to compute paths from 2 and 4 in the scc? |
@bluss - did you determine that doing what we did previously but with reverse-postorder rather than topo doesn't work? |
That PR looks nice! I'll give it a more detailed look later, but it makes a lot of sense to me to focus on trails (i.e. no repeated edges) rather than paths (no repeated nodes).

I think you can make the assumption that the starting node never has incoming edges; I believe it just represents a starting state, i.e. with no propagated markers. If that isn't the case yet for some reason, you should still be able to extend the graph by adding a new "root" node with no incoming edges?

I think you're right about incident edges; for the example graph, 2 and 4 are the only possible entry points to the SCC, so you could limit yourself to computing the trails to those and then computing the trails starting in those to the rest of the SCC. Perhaps it also makes sense to only compute trails to nodes that have outgoing edges from the SCC (i.e. 5 and 6), since all the "internal" nodes (only 3 in your example) in the SCC will be covered by the trails from (2 or 4) to (5 or 6)?
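The entry and exit nodes mentioned here can be found with one pass over the edges. The following is a small std-only sketch; `entry_exit_nodes` is a hypothetical helper, and component membership is assumed to be available as a boolean slice.

```rust
/// Returns (entry nodes, exit nodes) of the component marked by `in_scc`.
/// An entry node has an incoming edge from outside the component; an exit
/// node has an outgoing edge that leaves it. Edges are (from, to) pairs.
fn entry_exit_nodes(edges: &[(usize, usize)], in_scc: &[bool]) -> (Vec<usize>, Vec<usize>) {
    let mut entries = Vec::new();
    let mut exits = Vec::new();
    for &(from, to) in edges {
        if !in_scc[from] && in_scc[to] && !entries.contains(&to) {
            entries.push(to);
        }
        if in_scc[from] && !in_scc[to] && !exits.contains(&from) {
            exits.push(from);
        }
    }
    entries.sort_unstable();
    exits.sort_unstable();
    (entries, exits)
}
```

For the example graph with the component {2, 3, 4, 5, 6}, this gives entry nodes 2 and 4 and exit nodes 5 and 6.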
(One slight hitch is that we want to remove markers if you have an inbound edge that doesn't have any markers. Right now that's represented by |
Do you mean an inbound edge from the root node, or any inbound edge? If it's the former and you apply the change you're discussing, wouldn't you end up with |
Yes that's right -- inbound edge from the root node (propagated outward), so we could simplify at the end. |
@charliermarsh yes I think with my example graph it doesn't work to visit in that order. |
The complexity of the solution doesn't feel right; it feels disproportionately complicated. 🙂 I think we have an algorithm to do it now, using all_simple_paths_from (petgraph/petgraph#650). The choice of how to use node vs. edge weights doesn't matter so much (it can be changed?), but I think it would be good to use edge weights as read-only inputs and only update node weights.
Example
Compute
We process all paths from all input nodes together because all input nodes to an SCC need to be considered in parallel. If there never was more than one input node, most of the complexity would be gone. I believe we have the following rules for how 'no marker'/None combines: None OR X = None; None AND X = X.
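The two None rules can be written down directly. In this sketch a `String` stands in for uv's real marker type, and the function names `or` and `and` are assumptions for illustration only.

```rust
/// `None` stands for "no marker": the dependency is included unconditionally.
/// A `String` stands in for the real marker type in this sketch.
type Marker = Option<String>;

/// None OR X = None: if one alternative is unconditional, so is the result.
fn or(a: Marker, b: Marker) -> Marker {
    match (a, b) {
        (Some(x), Some(y)) => Some(format!("({}) or ({})", x, y)),
        _ => None,
    }
}

/// None AND X = X: an unconditional edge adds no restriction.
fn and(a: Marker, b: Marker) -> Marker {
    match (a, b) {
        (Some(x), Some(y)) => Some(format!("({}) and ({})", x, y)),
        (x, None) => x,
        (None, y) => y,
    }
}
```

Put differently, None behaves like "always true": it absorbs everything under OR and is the identity under AND.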
This is my attempt at this - I don't exactly have much prior experience with
uv's codebase, and w.r.t. graph algorithms I'm pretty rusty.
Let's look at this critically and try to find out if it does the right thing.
There are useful pictures in this article:
https://en.wikipedia.org/wiki/Strongly_connected_component
Test Plan
Using existing tests