Description
What version of Go are you using (go version)?
$ go version
go version go1.20.5 linux/amd64
Does this issue reproduce with the latest release?
With the latest stable release (Go 1.20.5), yes. I have not tested the Go 1.21 release candidate.
What operating system and processor architecture are you using (go env)?
go env output:
$ go env
GO111MODULE=""
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/jacob/.cache/go-build"
GOENV="/home/jacob/.config/go/env"
GOEXE=""
GOEXPERIMENT=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/jacob/go/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/jacob/go"
GOPRIVATE=""
GOPROXY="direct"
GOROOT="/usr/lib/golang"
GOSUMDB="off"
GOTMPDIR=""
GOTOOLDIR="/usr/lib/golang/pkg/tool/linux_amd64"
GOVCS=""
GOVERSION="go1.20.5"
GCCGO="gccgo"
GOAMD64="v1"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/home/jacob/git/fyne/go.mod"
GOWORK=""
CGO_CFLAGS="-O2 -g"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-O2 -g"
CGO_FFLAGS="-O2 -g"
CGO_LDFLAGS="-O2 -g"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -Wl,--no-gc-sections -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build38465015=/tmp/go-build -gno-record-gcc-switches"
What did you do?
Link to code: https://go.dev/play/p/78gWoQEaUui
We have the AddXY method, which is inlineable with a cost of 12, and the Components method, which is inlineable with a cost of 5. Both are methods on the size type:
type size struct {
	x, y int
}

func (s size) AddXY(x, y int) size {
	return size{s.x + x, s.y + y}
}

func (s size) Components() (int, int) {
	return s.x, s.y
}
We then create an interface and two functions that accept it. Composing AddXY() and Components() for simpler code gives AddSlowVector(), which gets a cost of 85 and thus isn't inlineable. However, manually inlining AddXY() to produce AddFastVector() gets a cost of 79, so that function is inlineable, 15x faster, and makes zero allocations.
type vector interface {
	Components() (int, int)
}

func (s size) AddFastVector(v vector) size {
	x, y := v.Components()
	return size{s.x + x, s.y + y}
}

func (s size) AddSlowVector(v vector) size {
	return s.AddXY(v.Components())
}
What did you expect to see?
I would expect these two to produce the exact same code, because both Components() and AddXY() are inlineable with very small costs (their sum is less than the cost limit of 80). Simplifying AddFastVector() so that it does not duplicate the code from AddXY() should still give an inlineable function that is not much slower. It would seem logical for the compiler to produce the fast function by inlining AddXY() into the slow function.
What did you see instead?
Simplifying AddFastVector() into AddSlowVector() produces a function that is 15x slower and allocates once per call. Not using an interface for the input decreases the inline cost a lot and does not exhibit the same slowdown. I understand that interfaces need to be devirtualized and that this produces some overhead, but in my opinion there should not be such a huge difference between two functions that both take an interface.
BenchmarkPosition_Add/AddFastVector()-8    775631107     1.492 ns/op    0 B/op   0 allocs/op
BenchmarkPosition_Add/AddSlowVector()-8     43955944    22.84 ns/op    16 B/op   1 allocs/op
BenchmarkPosition_Add/AddFastSize()-8     1000000000     0.4322 ns/op   0 B/op   0 allocs/op
BenchmarkPosition_Add/AddSlowSize()-8     1000000000     0.4953 ns/op   0 B/op   0 allocs/op