-
Notifications
You must be signed in to change notification settings - Fork 12.7k
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Auto merge of #85013 - Mark-Simulacrum:dominators-bitset, r=pnkfelix
Replace dominators algorithm with simple Lengauer-Tarjan This PR replaces our dominators implementation with that of the simple Lengauer-Tarjan algorithm, which is (to my knowledge and research) the currently accepted 'best' algorithm. The more complex variant has higher constant time overheads, and Semi-NCA (which is arguably a variant of Lengauer-Tarjan too) is not the preferred variant by the first paper cited in the documentation comments: simple Lengauer-Tarjan "is less sensitive to pathological instances, we think it should be preferred where performance guarantees are important" - which they are for us. This work originally arose from noting that the keccak benchmark spent a considerable portion of its time (both instructions and cycles) in the dominator computations, which sparked an interest in potentially optimizing that code. The current algorithm largely proves slow on long "parallel" chains where the nearest common ancestor lookup (i.e., the intersect function) does not quickly identify a root; it is also inherently a pointer-chasing algorithm so is relatively slow on modern CPUs due to needing to hit memory - though usually in cache - in a tight loop, which still costs several cycles. This was replaced with a bitset-based algorithm, previously studied in literature but implemented directly from dataflow equations in our case, which proved to be a significant speed up on the keccak benchmark: 20% instruction count wins, as can be seen in [this performance report](https://perf.rust-lang.org/compare.html?start=377d1a984cd2a53327092b90aa1d8b7e22d1e347&end=542da47ff78aa462384062229dad0675792f2638). This algorithm is also relatively simple in comparison to other algorithms and is easy to understand. However, these performance results showed a regression on a number of other benchmarks, and I was unable to get the bitsets to perform well enough that those regressions could be fully mitigated. The implementation "attempt" is seen here in the first commit, and is intended to be kept primarily so that future optimizers do not repeat that path (or can easily refer to the attempt). The final version of this PR chooses the simple Lengauer-Tarjan algorithm, and implements it along with a number of optimizations found in literature. The current implementation is a slight improvement for many benchmarks, with keccak still being an outlier at ~20%. The implementation in this PR first implements the most basic variant of the algorithm directly from the pseudocode on page 16, physical, or 28 in the PDF of the first paper ("Linear-Time Algorithms for Dominators and Related Problems"). This is then followed by a number of commits which update the implementation to apply various performance improvements, as suggested by the paper. Finally, the last commit annotates the implementation with a number of comments, mostly drawn from the paper, which intend to help readers understand what is going on - these are incomplete without the paper, but writing them certainly helped my understanding. They may be helpful if future optimization attempts are attempted, so I chose to add them in.
- Loading branch information
Showing
2 changed files
with
235 additions
and
51 deletions.
There are no files selected for viewing
275 changes: 224 additions & 51 deletions
275
compiler/rustc_data_structures/src/graph/dominators/mod.rs
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters