-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cmd/compile: lay out loop-free, likeliness-free control flow more compactly #20356
Comments
Seems like the layout pass would benefit from being made generally loop-aware. |
Does it improve performance by laying b6 and b19 together? Does the old What tool did you use to generate the graphs? I have done this a few times with pen and paper. A tool seems very helpful. |
CL https://golang.org/cl/43464 mentions this issue. |
I think it will, in larger functions. There are trade-offs--number of jumps, fwd vs backward, code compactness, jump encoding, etc. I need to experiment, although there are enough other layout-related things in flight (CL 43293, issue #20355), that I'd like to wait a little bit. Just filing this so I don't forget and to gather input from experienced hands.
I don't know. I'd be curious to find out.
I used CL 43464, which gopherbot has helpfully linked to. It has major problems, though. Maybe I'll email golang-dev to get opinions and solicit help on it... |
@laboger had mentioned that our likeliness/layout was not all that they had hoped. One possibility I had intended to try was to introduce at least one more level of (un)likeliness, for things like branches to panics where we are "really sure" about likeliness, versus all the cases where we're making less-educated guesses. As a general rule we expect p(loopbackedge) > p(return) > p(panic). |
Note to self: Interesting test cases for this are moderate-complexity autogenerated SSA rule functions, like rewriteValuedec_OpStore_0. |
See CL 43293 for branches to panics; could be improved, but it is a simple and fairly effective first cut. And see the first (somewhat awful) patchset of that CL for what adding new likeliness levels looks like. And yes, I think spending some time in 1.10 on likeliness and code layout is worthwhile. |
Another observation for 1.10 use. It appears that a lot of the non-compactness I observed here may simply be due to the fact that posdegree (and zerodegree) are effectively stacks instead of queues. Though that may become irrelevant if lay out code based primarily on loop nesting information. |
CL https://golang.org/cl/43501 mentions this issue. |
Noticed while looking at #20356. Cuts 160k (1%) off of the cmd/compile binary. Change-Id: If2397bc6971d6be9be6975048adecb0b5efa6d66 Reviewed-on: https://go-review.googlesource.com/43501 Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com> TryBot-Result: Gobot Gobot <gobot@golang.org> Reviewed-by: Keith Randall <khr@golang.org>
I think CL 43491 (and possibly @dr2chase's follow-up) is enough to address the 1.9 regression. Punting this to 1.10. |
DO NOT SUBMIT This CL adds CFG graphs to ssa.html. It execs dot to generate SVG, which then gets inlined into the html. Some standard naming and javascript hacks enable integration with the rest of ssa.html: Clicking on blocks highlights the relevant part of the CFG graph, and vice versa. Sample output and screenshots can be seen in issues golang#20355 and golang#20356. There are serious problems with this CL, though. Performance: * Calling dot after every pass is noticeably slow. * The generated output is giant. * The browser is very slow to render the resulting page. * Clicking on blocks is even slower than before. * Some things I want to do, like allow the user to change the table column widths, lock up the browser. Appearance: * The CFGs can easily be large and overwhelming. Toggling them on/off might be workable, if the performance concerns above were addressed. * I can't figure out the right size to render the CFGs; simple ones are currently oversized and cartoonish, while larger ones are unreadable. * They requires an unsatisfying amount of explanation (see golang#20356). Block layout information is particularly inferior/confusing. * Dead blocks float awkwardly in the sky, with no indication that they are dead. * It'd be nice to somehow add visual information about loops, which we can calculate, and which is non-obvious in large graphs, but I don't know how. * It'd be nice to add more information per block, like the number of values it contains, or even the values themselves, but adding info to a node makes the graph even less readable. Just adding the f.Blocks index in parens was not good. Bugs, incompleteness: * I seem to have broken highlighting around the entire block in the text. * Need to hook up some way to communicate dot-related errors without bringing down the process. * Might need some way to enable/disable dot entirely. Change-Id: I19abc3007f396bdb710ba7563668d343c0924feb
I am looking at this. I was able to generate CFG using tip and the @josharian tool for the program in the topic. It's somewhat different from what was posted 1+ year ago (not a surprise). I got this: For the full picture see the attached ssa.html: Now I wonder how can we optimize the layout? What is the desired order of blocks in this case? EDIT: I have uploaded improved versions of CFGs pictures. |
For anyone who will work on this. Current version of gc does not have branching in layout pass for this code: if i%2 == 0 {
odd = 2
} Layout pre/post b3 has the following code: b3: ← b2
v21 (+10) = ADDQconst <int> [-1069] v52 (odd[int])
v57 (+8) = SHRQconst <int> [63] v7
v60 (?) = MOVQconst <int> [0]
v36 (+8) = ADDQ <int> v57 v7
v31 (+8) = SARQconst <int> [1] v36
v50 (+8) = SHLQconst <int> [1] v31
v15 (8) = CMPQ <flags> v7 v50
v23 (13) = CMOVQEQ <int> v5 v21 v15 (odd[int])
v24 (+13) = ADDQ <int> v23 v52 (x[int])
Plain → b8 (8) It uses conditional move and thus you cannot replicate the bug with the code in the topic! |
Code that reproduces the problem: package p
//go:noinline
func g() {
}
func f() int {
x := 0
for i := 0; i < 10; i++ {
odd := 0
if i%2 == 0 {
odd = 2 // not 1, otherwise this branch gets optimized away!
g()
}
x += odd
// Distract the layout pass with a bunch of loops.
for j := 0; j < 10; j++ {
for j := 0; j < 10; j++ {
for j := 0; j < 10; j++ {
x++
}
}
}
}
return x
} |
I agree that it would solve this problem generally, in the case of i%2 == 0 we are pretty safe to assume that it's 50%/50%, while the compiler estimates that as unlikely. I've read in some very popular book that compiler could profile the code first to estimate likeness of branches and then make the optimization. Laughs and idealistic approaches aside, we can try to introduce what David has suggested. @josharian what do you think? |
While we're on the topic, it seems there is no github issue for PGO, that may well be the only good way to address issues such as this or the midstack inlining one. Is it deemed completely out of scope for gc? |
I’ll file a PGO issue. |
In this code, we have even odds of the
if i%2 == 0
branch being taken. But the code layout is pretty uneven. In this CFG, b3 is that branch, and b6 and b19 are the taken/not-taken blocks; both feed into b7. It seems like a good layout for this would be b3 b6 b19 b7 or b3 b19 b6 b7. But we put b19 at the very end of the function.This is a simplified version of something that also happens in the fannkuch benchmark. See also #20355 and #18977.
For details on how to read this image, see #20355.
cc @randall77 @dr2chase @cherrymui
Marking 1.9Maybe because we removed the old backend's instruction re-ordering pass during 1.9; this may help prevent regressions from that.
The text was updated successfully, but these errors were encountered: