runtime: panics and/or deadlocks on FreeBSD 12.2-RELEASE #43873

Open
@problame

Description

What version of Go are you using (go version)?

$ go version
go version go1.15.6 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env
GO111MODULE="on"
GOARCH="amd64"
GOBIN=""
GOCACHE="/home/cs/.cache/go-build"
GOENV="/home/cs/.config/go/env"
GOEXE=""
GOFLAGS=""
GOHOSTARCH="amd64"
GOHOSTOS="linux"
GOINSECURE=""
GOMODCACHE="/home/cs/development/golang/pkg/mod"
GONOPROXY=""
GONOSUMDB=""
GOOS="linux"
GOPATH="/home/cs/development/golang"
GOPRIVATE=""
GOPROXY="https://proxy.golang.org,direct"
GOROOT="/home/cs/go1.15.6"
GOSUMDB="sum.golang.org"
GOTMPDIR=""
GOTOOLDIR="/home/cs/go1.15.6/pkg/tool/linux_amd64"
GCCGO="gccgo"
AR="ar"
CC="gcc"
CXX="g++"
CGO_ENABLED="1"
GOMOD="/dev/null"
CGO_CFLAGS="-g -O2"
CGO_CPPFLAGS=""
CGO_CXXFLAGS="-g -O2"
CGO_FFLAGS="-g -O2"
CGO_LDFLAGS="-g -O2"
PKG_CONFIG="pkg-config"
GOGCCFLAGS="-fPIC -m64 -pthread -fmessage-length=0 -fdebug-prefix-map=/tmp/go-build804757202=/tmp/go-build -gno-record-gcc-switches"

What did you do?

I maintain zrepl, a tool for ZFS replication written in Go.
Some users have been hitting Go runtime panics and/or lock-ups on FreeBSD.

The first reports came in July 2020 from a user running a development version of FreeBSD (12-STABLE, so somewhere between 12.1-RELEASE and 12.2-RELEASE).
The problem occurred with both Go 1.13.5 and 1.14.*.
At the time I asked for the problem to be reproduced on an official binary release of FreeBSD, but that did not happen.

After I updated my personal FreeBSD server to 12.2-RELEASE, I started to encounter issues similar to those reported in July 2020.
I have not yet encountered runtime panics, but I have seen several lock-ups (for lack of a better term) of the Go runtime.
The last of the links above contains a stack trace of a goroutine blocked forever in runtime.newobject (stack obtained using dlv).

Summary of my triaging since July:

  • We have ruled out that it's due to the one use of unsafe in zrepl by removing the unsafe code path in a test build. The panics / lock-ups still occurred.
  • The problems reliably stop when the process is limited to one CPU using the OS scheduler (cpuset).
  • Most often the problems happen while sockets are being used. They do not happen when the daemon is idle.
  • We have ruled out faulty hardware.
  • The issue has only occurred on Intel systems on bare metal so far. I was unable to reproduce it in a stress test between two FreeBSD VMs on a Ryzen 1700X.
  • I suspect the root cause is one of the following:
    • FreeBSD kernel bug introduced between 12.1-RELEASE and 12.2-RELEASE
    • Go runtime bug (would need to be present in multiple Go versions though)

It would be very helpful to get a quick explanation of what these panics mean so that I can narrow down my audit of the changes between FreeBSD 12.1-RELEASE and 12.2-RELEASE.

Also, I can offer a Go / FreeBSD developer access (via tmate or similar) to the system with the locked-up daemon.
The lock-up usually occurs after 2-3 days on my system, sometimes sooner, and I can leave the daemon in the locked-up state for a day or two.

Related zrepl issues:

Metadata

Labels: NeedsInvestigation, OS-FreeBSD, compiler/runtime
Status: Triage Backlog