cmd/compile: structs with more than the hardcoded 4 words limits will always be spilled onto the stack even when passed in and out through registers by regabi #72897
Comments
Now, I expected the above to be the end of the story. But I tried unrolling the arguments, such that X takes a dozen uintptrs as arguments and then returns them. And... the code is exactly what I want?!

func X2(
	x0 F,
	x1 uint64, x2 uint64, x3 uint64, x4 uint64,
	x5 uint64, x6 uint64, x7 uint64, x8 uint64,
) (
	F,
	uint64, uint64, uint64, uint64,
	uint64, uint64, uint64, uint64,
) {
	return x0(x0, x1, x2, x3, x4, x5, x6, x7, x8)
}

TEXT command-line-arguments.X2(SB), NOSPLIT|ABIInternal, $80-72
	PUSHQ BP
	MOVQ SP, BP
	SUBQ $72, SP
	MOVQ (AX), R12
	MOVQ AX, DX
	CALL R12
	ADDQ $72, SP
	POPQ BP
	RET

It is somewhat obnoxious that this is forced to waste stack space for the callee, which will be nosplit as well, but that's not something I can reasonably expect to work.

So, this is a bug with struct arguments specifically: struct arguments have the same ABI as individual values, but they result in terrible register allocation outcomes in the body.

On the bright side, at least I can write threaded interpreters in Go, it's just going to be excruciating until this bug is fixed XD
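To make the workaround concrete, here is a minimal sketch of keeping a struct only at the edges while the handlers themselves take unrolled arguments, in the same shape as F above but shrunk to three words. The names op, state, call, and incr are hypothetical and exist only for this illustration; whether the unrolled form avoids the spills on a given toolchain is not verified here and would need to be checked against the generated assembly.

```go
package main

import "fmt"

// op mirrors the shape of F, reduced to three words: each handler receives
// the continuation plus the unrolled state fields and returns them in the
// same order. (Hypothetical type, for illustration only.)
type op func(next op, pc, acc uint64) (op, uint64, uint64)

// state is used only at the edges of the interpreter, so the handlers on
// the hot path never take or return a struct.
type state struct {
	next    op
	pc, acc uint64
}

// call unpacks the struct into scalar arguments, runs one handler, and
// packs the results back into a struct.
func (s state) call() state {
	next, pc, acc := s.next(s.next, s.pc, s.acc)
	return state{next: next, pc: pc, acc: acc}
}

// incr is a sample handler: it advances pc, bumps acc, and keeps the same
// continuation.
func incr(next op, pc, acc uint64) (op, uint64, uint64) {
	return next, pc + 1, acc + 1
}

func main() {
	s := state{next: incr}
	s = s.call()
	s = s.call()
	fmt.Println(s.pc, s.acc)
}
```

The packing and unpacking in call is exactly the boilerplate that makes this approach tedious: every field has to be listed by hand at each boundary.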
As an addendum, here's a godbolt with all the code, and some other functions. https://godbolt.org/z/bs8xvMsM7

Of interest is this variant:

//go:nosplit
func Y(s state) state {
	s.x1++
	return s
}

Not only does this spill its guts EXACTLY like
And a microbenchmark, for good measure:
I am actually shocked at it being 10x slower. I expected a 2x slowdown at most!

Harness:

package x

import "testing"

type state struct {
	x0 func(state) state
	x1 uint64
	x2 uint64
	x3 uint64
	x4 uint64
	x5 uint64
	x6 uint64
	x7 uint64
	x8 uint64
}

//go:nosplit
func X(s state) state {
	return s.x0(s)
}

//go:nosplit
func Y(s state) state {
	s.x1++
	return s
}

type F func(
	x0 F,
	x1 uint64, x2 uint64, x3 uint64, x4 uint64,
	x5 uint64, x6 uint64, x7 uint64, x8 uint64,
) (
	F,
	uint64, uint64, uint64, uint64,
	uint64, uint64, uint64, uint64,
)

//go:nosplit
func X2(
	x0 F,
	x1 uint64, x2 uint64, x3 uint64, x4 uint64,
	x5 uint64, x6 uint64, x7 uint64, x8 uint64,
) (
	F,
	uint64, uint64, uint64, uint64,
	uint64, uint64, uint64, uint64,
) {
	return x0(x0, x1, x2, x3, x4, x5, x6, x7, x8)
}

//go:nosplit
func Y2(
	x0 F,
	x1 uint64, x2 uint64, x3 uint64, x4 uint64,
	x5 uint64, x6 uint64, x7 uint64, x8 uint64,
) (
	F,
	uint64, uint64, uint64, uint64,
	uint64, uint64, uint64, uint64,
) {
	x1++
	return x0, x1, x2, x3, x4, x5, x6, x7, x8
}

func BenchmarkX(b *testing.B) {
	b.Run("slow", func(b *testing.B) {
		for b.Loop() {
			X(state{x0: Y})
		}
	})
	b.Run("fast", func(b *testing.B) {
		for b.Loop() {
			X2(Y2, 0, 0, 0, 0, 0, 0, 0, 0)
		}
	})
}
My bad, it looks like I can't read assembly at 2am.

I have a fix for this here: https://go-review.googlesource.com/c/go/+/561695 but it makes compilation scale worse than pseudo-linearly.

Maybe a dup of #65495.

That is very relevant here and would make CL 561695 acceptable to submit, imo:
The test I wrote for CL 561695 tests the exact same behavior (although it's not as complete):

// asmcheck

// Copyright 2024 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package a

type desc struct{ t, x, y, z, p int }

func enc() byte {
	// amd64:-"MOV.+autotmp"
	return byte(process().z)
}

//go:noinline
func process() desc {
	return desc{}
}

I do think this is a dup.
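For readers unfamiliar with the directive in that test: as I understand the convention used by Go's codegen tests, a leading - before the quoted regexp makes it a negative check, so the amd64 assembly generated for that line must not contain a match. The sketch below only imitates that idea with the standard regexp package; violates is a hypothetical helper, not part of the real test harness, and the assembly lines are invented for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// violates reports whether any assembly line matches the pattern of a
// negative check such as amd64:-"MOV.+autotmp".
// (Hypothetical helper; the real harness works on compiler output.)
func violates(pattern string, asm []string) bool {
	re := regexp.MustCompile(pattern)
	for _, line := range asm {
		if re.MatchString(line) {
			return true
		}
	}
	return false
}

func main() {
	// Invented assembly lines: the first set moves a value through a
	// compiler temporary (autotmp), the second stays in registers.
	spilled := []string{
		"MOVQ AX, command-line-arguments..autotmp_3-40(SP)",
		"MOVQ command-line-arguments..autotmp_3-16(SP), AX",
	}
	clean := []string{
		"CALL command-line-arguments.process(SB)",
		"MOVL DI, AX",
	}
	fmt.Println(violates(`MOV.+autotmp`, spilled)) // true
	fmt.Println(violates(`MOV.+autotmp`, clean))   // false
}
```

In other words, the test passes only when no MOV instruction touches an autotmp stack slot while extracting the z field from the returned struct.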
Go version
gotip version devel go1.25-38d146d5 Sun Mar 16 15:46:25 2025 -0700 linux/amd64
Output of go env in your module/workspace:
What did you do?
I wrote the following package and generated assembly from it.
This is an example of threaded code, where all operations are replaced with function calls that consume the whole state and return it by value, so that the state can remain in registers. This is a technique for writing highly optimized parsers, such as the one in UPB and Protobuf C++.
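As a rough illustration of the threaded-code shape described here, the following is a separate minimal sketch, not the package from this report. The names state, prog, dispatch, inc, and double are hypothetical. Each handler takes the whole state by value, updates it, and hands it to the next handler with a call in tail position (Go does not guarantee that such calls are turned into jumps).

```go
package main

import "fmt"

// state is a hypothetical interpreter state that is always passed and
// returned by value, so that it could in principle live in registers.
type state struct {
	pc  int    // index of the next handler in prog
	acc uint64 // accumulator
}

// prog is a made-up program: one handler per instruction. It is assigned
// in run rather than initialized here, which avoids an initialization
// cycle between prog, the handlers, and dispatch.
var prog []func(state) state

// dispatch fetches the next handler and calls it with the whole state.
func dispatch(s state) state {
	if s.pc >= len(prog) {
		return s
	}
	op := prog[s.pc]
	s.pc++
	return op(s)
}

// inc and double are sample handlers: each does its work and then passes
// the state on to the next handler through dispatch.
func inc(s state) state    { s.acc++; return dispatch(s) }
func double(s state) state { s.acc *= 2; return dispatch(s) }

// run executes inc, inc, double, inc starting from a zero state.
func run() uint64 {
	prog = []func(state) state{inc, inc, double, inc}
	return dispatch(state{}).acc
}

func main() {
	fmt.Println(run()) // prints 5
}
```

The point of the pattern is that no handler ever stores the state behind a pointer, which is what makes spills inside each handler body so costly.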
What did you see happen?
The output was the following assembly.
There are several problems with this code:
s is spilled in two separate places, even though the ABI requires the caller to guarantee spill space for arguments.
state's shape is pointer-free except for the x0 *funcval, so this can't be aiding stack scanning.
*x = y; y = *x.
What did you expect to see?
I expected to see approximately the following code.
Of course, this breaks symbolization of arguments/returns in backtraces (and, presumably, debuggers). To my knowledge, there is no way to instruct gc to not perform such spills.