What version of Go are you using (go version)?
$ go version
go version go1.15.6 linux/amd64
Does this issue reproduce with the latest release?
Yes.
What operating system and processor architecture are you using (go env)?
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cs/.cache/go-build"
GOENV="/home/cs/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/cs/development/golang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/cs/development/golang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/cs/go1.15.6"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/cs/go1.15.6/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build804757202=/tmp/go-build -gno-record-gcc-switches"
What did you do?
I maintain zrepl, a tool for ZFS replication written in Go.
Some users have been hitting Go runtime panics and/or lock-ups on FreeBSD.
- go runtime panics and/or zrepl process hangs on FreeBSD 12 zrepl/zrepl#343 (comment): fatal error: bad g->status in ready
- go runtime panics and/or zrepl process hangs on FreeBSD 12 zrepl/zrepl#343 (comment): fatal error: wirep: invalid p state
- FreeBSD >= 12.2, < 13.0: Go runtime deadlocks and/or panics on multicore systems zrepl/zrepl#411 (comment): hung in runtime.newobject
The first reports came in July 2020 from a user running a development version of FreeBSD (12-STABLE, so it must have been somewhere between 12.1-RELEASE and 12.2-RELEASE).
The problem occurred with both Go 1.13.5 and 1.14.*.
At the time I asked for the problem to be reproduced on an official binary release of FreeBSD, but that did not happen.
After I updated my personal FreeBSD server to 12.2-RELEASE, I started to encounter issues similar to those reported in July 2020.
I have not yet encountered runtime panics, but I have seen several lock-ups (for lack of a better term) of the Go runtime.
The last of the links above contains a stack trace of a goroutine blocked forever on runtime.newobject (stack obtained using dlv).
Summary of my triaging since July:
- We have ruled out that it's due to the one use of unsafe in zrepl by removing the unsafe code path in a test build. The panics / lock-ups still occurred.
- The problems reliably disappear when the process is limited to one CPU using the OS scheduler (cpuset).
- Most often the problems happen while sockets are being used. They do not happen when the daemon is idle.
- We have ruled out faulty hardware.
- The issue has only occurred on Intel systems on bare metal so far. I was unable to reproduce it in a stress test between two FreeBSD VMs on a Ryzen 1700X.
- I suspect the root cause is one of the following:
  - FreeBSD kernel bug introduced between 12.1-RELEASE and 12.2-RELEASE
  - Go runtime bug (would need to be present in multiple Go versions though)
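
For anyone who wants to try the single-CPU workaround from the list above, here is a minimal sketch of a launcher that wraps a command in FreeBSD's cpuset(1) with a CPU list. The helper name pinnedCommand and the choice of CPU 0 are illustrative assumptions, not part of zrepl; the sketch only builds and prints the command line and does not start the daemon.

```go
package main

import (
	"fmt"
	"os/exec"
	"strings"
)

// pinnedCommand is a hypothetical helper (not part of zrepl). It wraps the
// given command in FreeBSD's cpuset(1) so that the child may only be
// scheduled on the CPUs named in cpuList, for example "0".
func pinnedCommand(cpuList string, name string, args ...string) *exec.Cmd {
	full := append([]string{"-l", cpuList, name}, args...)
	return exec.Command("cpuset", full...)
}

func main() {
	// Assumption: the daemon is started as "zrepl daemon" and CPU 0 is used.
	cmd := pinnedCommand("0", "zrepl", "daemon")

	// Only print the command line here. To actually start the daemon,
	// wire up cmd.Stdout and cmd.Stderr and call cmd.Run() on a FreeBSD host.
	fmt.Println(strings.Join(cmd.Args, " "))
}
```

This restricts scheduling at the OS level, which is what made the problems disappear in my tests. It is a workaround for narrowing things down, not a fix.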
It would be very helpful to get a quick explanation of what these panics mean so that I can narrow down my audit of the changes between FreeBSD 12.1-RELEASE and 12.2-RELEASE.
Also, I can offer a Go / FreeBSD developer access, via tmate or similar, to the system with the locked-up daemon.
The lock-up usually occurs after 2-3 days on my system, sometimes sooner, but I can leave it in the locked-up state for a day or two.
Related zrepl issues: