Utilize dominators for constructing extended basic blocks #142
Previously, we used recursive jump translation to implement extended basic blocks. However, this approach makes loop paths hard to detect: the loop's entry block ends up inside the extended block, so its use frequency is never updated. We therefore remove recursive jump translation, which lets us update the use frequency of the loop's entry block accurately.

As the results below show, we gain 4% on CoreMark and 37% on SciMark2, but lose 1% on Dhrystone.

* Intel Core i7-11700

| Metric    | origin  | proposed | Speedup |
|-----------|---------|----------|---------|
| CoreMark  | 2193.28 | 2289.26  | +4%     |
| SciMark2  | 13.45   | 18.48    | +37%    |
| Dhrystone | 1413.11 | 1447.11  | -1%     |

See: sysprog21#142
The effectiveness has been confirmed. We should keep looking for faster approaches to loop detection.
Quoted from Basic Blocks and CFG, the definition of extended basic block (EBB):
We can identify loops by using dominators:
back edge: An edge in the flow graph, whose head dominates its tail (example - edge from B6 to B4).
A loop consists of all nodes dominated by its entry node (head of the back edge) and having exactly one back edge in it.
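The dominator-based loop identification described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in this project: the CFG is a made-up example (it is not the figure the quote refers to), the function names `dominators`, `back_edges`, and `natural_loop` are hypothetical, and the loop body is collected with the usual "natural loop" construction (the head plus every node that can reach the tail without passing through the head).

```python
def dominators(succ, entry):
    """Iterative data-flow: dom(n) = {n} | intersection of dom(p) over preds p."""
    nodes = set(succ)
    preds = {n: set() for n in nodes}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    dom = {n: set(nodes) for n in nodes}  # start from "everything dominates n"
    dom[entry] = {entry}
    changed = True
    while changed:
        changed = False
        for n in nodes - {entry}:
            incoming = [dom[p] for p in preds[n]]
            new = {n} | (set.intersection(*incoming) if incoming else set())
            if new != dom[n]:
                dom[n] = new
                changed = True
    return dom


def back_edges(succ, dom):
    """Edges tail -> head whose head dominates their tail."""
    return [(tail, head) for tail in succ for head in succ[tail]
            if head in dom[tail]]


def natural_loop(succ, tail, head):
    """Head plus all nodes that reach tail without going through head."""
    preds = {n: set() for n in succ}
    for n, targets in succ.items():
        for t in targets:
            preds[t].add(n)
    loop = {head, tail}
    stack = [tail] if tail != head else []
    while stack:
        n = stack.pop()
        for p in preds[n]:
            if p not in loop:  # head is already in the set, so the walk stops there
                loop.add(p)
                stack.append(p)
    return loop


# Hypothetical CFG (successor lists); every node appears as a key.
cfg = {
    "B1": ["B2", "B3"],
    "B2": ["B4"],
    "B3": ["B4"],
    "B4": ["B5"],
    "B5": ["B6"],
    "B6": ["B4", "B7"],
    "B7": [],
}

dom = dominators(cfg, "B1")
print(back_edges(cfg, dom))                    # [('B6', 'B4')]
print(sorted(natural_loop(cfg, "B6", "B4")))   # ['B4', 'B5', 'B6']
```

The set-based iteration is chosen for readability; it is quadratic in the worst case, so a translator that runs this on hot paths would likely want a faster dominator algorithm.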
Intercept contains an effective dominator implementation. See
Usage:
Similarly, blink comes with an approach to detect loops during code generation.