cmd/compile: combine extension with register loads and stores on amd64 #15300
Comments
Yes, I don't have any good ideas here. Some modified Arg value might work, say with a memory layout type and a post-load type, but it's ugly.
Note that it's not just Args, it's also loading spilled registers.
DO NOT REVIEW

[code freeze]

This CL teaches SSA to recognize code of the form

    // b is a boolean value, i is an int of some flavor
    if b {
        i = 1
    } else {
        i = 0
    }

and use b's underlying 0/1 representation for i instead of generating jumps.

Unfortunately, it does not work on the obvious code:

    func bool2int(b bool) int {
        if b {
            return 1
        }
        return 0
    }

This is left for future work. Note that the existing phiopt optimizations also don't work for:

    func neg(b bool) bool {
        if b {
            return false
        }
        return true
    }

In the meantime, runtime authors and the like can use:

    func bool2int(b bool) int {
        var i int
        if b {
            i = 1
        } else {
            i = 0
        }
        return i
    }

This compiles to:

    "".bool2int t=1 size=16 args=0x10 locals=0x0
        0x0000 00000 (x.go:25) TEXT "".bool2int(SB), $0-16
        0x0000 00000 (x.go:25) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
        0x0000 00000 (x.go:25) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (x.go:32) MOVBLZX "".b+8(FP), AX
        0x0005 00005 (x.go:32) MOVBQZX AL, AX
        0x0008 00008 (x.go:32) MOVQ AX, "".~r1+16(FP)
        0x000d 00013 (x.go:32) RET

The extraneous MOVBQZX is golang#15300.

This optimization also helps range and slice. The compiler must protect against pointers pointing to the end of a slice/string. It does this by increasing a pointer by either 0 or 1 * elemsize, based on a condition. This CL optimizes away a jump in that code.

This CL triggers 382 times while compiling the standard library.

Updates golang#6011

Change-Id: Ia7c1185f8aa223c543f91a3cd6d4a2a09c691c70
CL https://golang.org/cl/22711 mentions this issue.
This CL teaches SSA to recognize code of the form

    // b is a boolean value, i is an int of some flavor
    if b {
        i = 1
    } else {
        i = 0
    }

and use b's underlying 0/1 representation for i instead of generating jumps.

Unfortunately, it does not work on the obvious code:

    func bool2int(b bool) int {
        if b {
            return 1
        }
        return 0
    }

This is left for future work. Note that the existing phiopt optimizations also don't work for:

    func neg(b bool) bool {
        if b {
            return false
        }
        return true
    }

In the meantime, runtime authors and the like can use:

    func bool2int(b bool) int {
        var i int
        if b {
            i = 1
        } else {
            i = 0
        }
        return i
    }

This compiles to:

    "".bool2int t=1 size=16 args=0x10 locals=0x0
        0x0000 00000 (x.go:25) TEXT "".bool2int(SB), $0-16
        0x0000 00000 (x.go:25) FUNCDATA $0, gclocals·23e8278e2b69a3a75fa59b23c49ed6ad(SB)
        0x0000 00000 (x.go:25) FUNCDATA $1, gclocals·33cdeccccebe80329f1fdbee7f5874cb(SB)
        0x0000 00000 (x.go:32) MOVBLZX "".b+8(FP), AX
        0x0005 00005 (x.go:32) MOVBQZX AL, AX
        0x0008 00008 (x.go:32) MOVQ AX, "".~r1+16(FP)
        0x000d 00013 (x.go:32) RET

The extraneous MOVBQZX is #15300.

This optimization also helps range and slice. The compiler must protect against pointers pointing to the end of a slice/string. It does this by increasing a pointer by either 0 or 1 * elemsize, based on a condition. This CL optimizes away a jump in that code.

This CL triggers 382 times while compiling the standard library. Updating code to utilize this optimization is left for future CLs.

Updates #6011

Change-Id: Ia7c1185f8aa223c543f91a3cd6d4a2a09c691c70
Reviewed-on: https://go-review.googlesource.com/22711
Run-TryBot: Josh Bleecher Snyder <josharian@gmail.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
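For anyone who wants to try the two forms side by side, here is a minimal sketch. The names `bool2intBranchy`, `bool2intPhi`, and `advance` are made up for illustration and do not come from the CL; `advance` only mimics the "add 0 or 1 * elemsize" idea from the range/slice case and is not what the compiler emits internally. Both conversion functions return the same values; whether either compiles without a jump depends on the Go version in use.

```go
package main

import "fmt"

// bool2intBranchy is the "obvious" form, which the commit message says
// the optimization does not yet handle.
func bool2intBranchy(b bool) int {
	if b {
		return 1
	}
	return 0
}

// bool2intPhi is the if/else assignment form that the commit message
// says the optimization recognizes.
func bool2intPhi(b bool) int {
	var i int
	if b {
		i = 1
	} else {
		i = 0
	}
	return i
}

// advance is a hypothetical helper: it moves an offset forward by
// either 0 or 1 * elemSize depending on a condition.
func advance(offset, elemSize int, more bool) int {
	return offset + bool2intPhi(more)*elemSize
}

func main() {
	for _, b := range []bool{false, true} {
		fmt.Println(b, bool2intBranchy(b), bool2intPhi(b), advance(8, 4, b))
	}
}
```

Building with `go build -gcflags=-S` prints the generated assembly, which is one way to check whether a jump or a redundant extension is still present for a given toolchain.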
Change https://golang.org/cl/115617 mentions this issue:
We generate MOVBLZX for byte-sized LoadReg, so (MOVBQZX (LoadReg (Arg))) is the same as (LoadReg (Arg)). Remove those zero extensions where possible. Triggers several times during all.bash.

Fixes #25378
Updates #15300

Change-Id: If50656e66f217832a13ee8f49c47997f4fcc093a
Reviewed-on: https://go-review.googlesource.com/115617
Run-TryBot: Ilya Tocar <ilya.tocar@intel.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
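The rewrite described in that commit message can be sketched on a toy expression tree. This is an illustration only: `Value`, its `Size` field, and `dropRedundantZeroExt` are hypothetical names, not the compiler's `ssa.Value` or its generated rewrite rules. The sketch assumes, as the commit message states, that a byte-sized LoadReg already zero-extends.

```go
package main

import "fmt"

// Value is a toy stand-in for an SSA value: an opcode plus arguments.
// It is NOT the compiler's ssa.Value type.
type Value struct {
	Op   string
	Size int // width in bytes of the loaded value; only used for LoadReg
	Args []*Value
}

// String prints the value in the parenthesized form used in the commit message.
func (v *Value) String() string {
	s := "(" + v.Op
	for _, a := range v.Args {
		s += " " + a.String()
	}
	return s + ")"
}

// dropRedundantZeroExt rewrites (MOVBQZX (LoadReg x)) to (LoadReg x) when
// the load is byte-sized, on the assumption that such a load already
// zero-extends. Anything else is left alone.
func dropRedundantZeroExt(v *Value) *Value {
	for i, a := range v.Args {
		v.Args[i] = dropRedundantZeroExt(a)
	}
	if v.Op == "MOVBQZX" && len(v.Args) == 1 {
		if a := v.Args[0]; a.Op == "LoadReg" && a.Size == 1 {
			return a
		}
	}
	return v
}

func main() {
	load := &Value{Op: "LoadReg", Size: 1, Args: []*Value{{Op: "Arg"}}}
	v := &Value{Op: "MOVBQZX", Args: []*Value{load}}
	fmt.Println("before:", v)
	fmt.Println("after: ", dropRedundantZeroExt(v))
}
```

The size check matters: an extension applied to a wider load still changes the value, so the sketch keeps it.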
Extending a register load or store currently generates suboptimal code with the SSA backend. In some cases, this is a regression from the old backend. For example:
func load8(i uint8) uint64 { return uint64(i) }
Generates:
The old back end generates:
I tried fixing this in CL 21838, but the fix was partial and probably not in the right place.
It's hard to do this as part of the arch-specific rewrite rules, because (a) you lose the type extension work that the ssa conversion did for you and have to recreate it later and (b) you don't know where all the register loads and stores will be, because regalloc hasn't run.
It's also hard to do this as part of converting final SSA to instructions (genvalue), since that step is geared to handle one value at a time, in isolation.
Teaching regalloc to combine these MOVs seems arch-specific and would further complicate already complicated machinery. Maybe the thing to do is to add an arch-specific rewrite pass after regalloc ("peep"?), using hand-written rewrite rules. Input requested.
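To make the "peep" idea concrete, here is a rough sketch of what a hand-written post-regalloc rule could look like on a toy instruction list. `Inst` and `peep` are hypothetical and the mnemonics are only modeled on amd64 assembly; this is not how the compiler represents instructions. The single rule fuses a byte load into a register, immediately followed by a zero-extension of that same register onto itself, into one extending load.

```go
package main

import "fmt"

// Inst is a toy two-operand instruction (source, destination).
// It is NOT the compiler's instruction representation.
type Inst struct {
	Op, Src, Dst string
}

// peep makes one pass over insts and fuses
//
//	MOVB    src, R
//	MOVBQZX R, R
//
// into a single "MOVBQZX src, R". The extension must read and write the
// register the load just wrote; otherwise the un-extended value might
// still be needed and the pair is left untouched.
func peep(insts []Inst) []Inst {
	var out []Inst
	for i := 0; i < len(insts); i++ {
		cur := insts[i]
		if i+1 < len(insts) {
			next := insts[i+1]
			if cur.Op == "MOVB" && next.Op == "MOVBQZX" &&
				next.Src == cur.Dst && next.Dst == cur.Dst {
				out = append(out, Inst{Op: "MOVBQZX", Src: cur.Src, Dst: cur.Dst})
				i++ // skip the extension that was just absorbed
				continue
			}
		}
		out = append(out, cur)
	}
	return out
}

func main() {
	before := []Inst{
		{"MOVB", "i+8(FP)", "AX"},
		{"MOVBQZX", "AX", "AX"},
		{"MOVQ", "AX", "ret+16(FP)"},
	}
	for _, in := range peep(before) {
		fmt.Printf("%s\t%s, %s\n", in.Op, in.Src, in.Dst)
	}
}
```

Because such a pass runs after register allocation, it can see the concrete registers on both instructions, which is exactly the information the arch-specific rewrite rules lack.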
Relatedly, for those extension MOVs that remain, we should test whether CBW and friends are desirable: they are shorter, but they are register-restricted, and the internet disagrees about whether they are as fast.
Here are some test cases:
cc @randall77