Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

irinterp: add Tarjan SCC algorithm for reachability #52966

Merged
merged 2 commits into from
Jan 23, 2024

Conversation

topolarity
Copy link
Member

@topolarity topolarity commented Jan 18, 2024

This PR optimizes kill_edge! for IR interp.

The basic algorithm flow is:

   1. Check whether `target` of dead edge is unreachable, which is true iff:
       - Reducible CFG node: there are no live incoming forward edges (predecessors)
       - Irreducible CFG node: Tarjan's SCC algorithm reports no live incoming forward edges to the SCC
   2. If `target` is dead, repeat (1) for all of its outgoing edges

This maintains reachability information very efficiently, especially for reducible CFG's which are overwhelmingly common.

As an added bonus, CFGReachability can also be consulted to check which BasicBlocks are part of an irreducible loop.

@topolarity topolarity requested a review from Keno January 18, 2024 16:38
@topolarity
Copy link
Member Author

topolarity commented Jan 18, 2024

This is 95% ready for review, but there's still a bit of clean-up and a test or two that I want to add before merging.

All done!

@topolarity topolarity force-pushed the ct/tarjan-reachability branch 2 times, most recently from d2cdc21 to bf659ab Compare January 19, 2024 02:29
@topolarity topolarity requested a review from aviatesk January 19, 2024 04:25
This optimizes `kill_edge!` for IR interp. The basic algorithm flow is:

   1. Check whether `target` of dead edge is unreachable, i.e. iff:

       - Reducible CFG node: there is no live incoming forward edge

       - Irreducible CFG node: Tarjan's SCC algorithm reports no live
         incoming forward edges to the SCC

   2. If `target` is dead, repeat (1) for all of its outgoing edges

This maintains reachability information very efficiently, especially
for reducible CFG's which are overwhelmingly common. As an added bonus
CFGReachability can also be consulted to check which BasicBlocks are
part of an irreducible loop.
@topolarity topolarity force-pushed the ct/tarjan-reachability branch 2 times, most recently from 1d015fb to d6adba1 Compare January 19, 2024 16:37
@Keno
Copy link
Member

Keno commented Jan 19, 2024

I'm quite suspicious of the use of the domtree in this. Domtree by itself already maintains reachability, so if the domtree is used internally, it needs to be maintained, so this will be no faster. What I would have expected here is an implementation of decremental scc maintenance (see https://theory.stanford.edu/~virgi/cs267/papers/lacki-dec-scc.pdf for the description of the algorithm - there's also more recent algorithms that are asymptotically better but way more complicated https://arxiv.org/abs/1901.03615) and no domtree.

@topolarity
Copy link
Member Author

I'm quite suspicious of the use of the domtree in this. Domtree by itself already maintains reachability, so if the domtree is used internally, it needs to be maintained, so this will be no faster. What I would have expected here is an implementation of decremental scc maintenance (see https://theory.stanford.edu/~virgi/cs267/papers/lacki-dec-scc.pdf for the description of the algorithm - there's also more recent algorithms that are asymptotically better but way more complicated https://arxiv.org/abs/1901.03615) and no domtree.

I'm not 100% sure it's the same as the paper you link, but this is a decremental SCC maintenance algorithm (the key optimization is that when removing an edge it only traverses nodes in the SCC of the target block).

The novel trick is that if you know that a dominates b, then we can safely ignore back-edges b -> a when considering reachability in that graph (because the entry block reaches a through that edge only if it can reach a through some other edge).

Since removing edges only adds dominance relations, there's no requirement to keep the domtree up-to-date

@Keno
Copy link
Member

Keno commented Jan 20, 2024

Alright, I've finally understood what you're trying to do here. What you're maintaining is basically the reducible condensation of the CFG.

@topolarity
Copy link
Member Author

Yeah, that sounds right.

Just to put it in other words, this is computing the condensation of the CFG with reducible back-edges removed. Probably could have said that more clearly, sorry 😅

Copy link
Member

@Keno Keno left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is sensible. At some point, I'd like some more docs and explanations (probably in the docstring of CFGReachability). I'd also like to see a refactor of the tarjan code to split out the filter explicitly (or maybe we should define a graph abstraction and then have a filtered version). But since this solves immediate performance issues, I think this is fine to go in.

This still isn't an entirely generic version of the algorithm, but it's
certainly a step in the right direction.
@topolarity
Copy link
Member Author

I'm going to merge this for now, and we can circle back on the domtree-related questions when we resolve #53013 and #53011

@topolarity topolarity merged commit 5824c73 into JuliaLang:master Jan 23, 2024
7 checks passed
@topolarity
Copy link
Member Author

Also just to record a few benchmarks:

     CFG size |    old     |   new     | speedup
     6 blocks | 2.922e-06  | 5.79e-07  | (5.04663x faster)
    25 blocks | 5.174e-06  | 3.48e-07  | (14.8678x faster)
   134 blocks | 3.7766e-05 | 6.58e-07  | (57.3951x faster)
 49500 blocks | 0.00761317 | 2.852e-06 | (2669.42x faster)

These are the average edge-deletion times for a few different IR's - mostly from the sysimage, except for the last one which is a downstream application with very large functions.

The smaller cases are overwhelmingly more common, so this won't make a significant difference for most normal functions.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants