Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

runtime: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?) #37688

Closed
peterbourgon opened this issue Mar 5, 2020 · 19 comments
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Milestone

Comments

@peterbourgon
Copy link

What version of Go are you using (go version)?

$ go version
go1.14 linux/amd64

Does this issue reproduce with the latest release?

Yes.

What operating system and processor architecture are you using (go env)?

go env Output
$ go env

What did you do?

I built a version of my program with a minor code change, and for the first time switching from the go 1.13 compiler to go 1.14. I then deployed that version of the program to my staging environment, all linux/amd64 machines.

What did you expect to see?

The program to start up normally on all hosts.

What did you see instead?

On about half of the hosts, the program crashed on startup with slight variations on this message:

runtime: pointer 0xc0034fbd40 to unused region of span span.base()=0xc006bf2000 span.limit=0xc006c67300 span.state=1
runtime: found in object at *(0xc000425110+0x28)
object=0xc000425110 s.base()=0xc000424000 s.limit=0xc000425fb0 s.spanclass=28 s.elemsize=208 s.state=mSpanInUse
 *(object+0) = 0x1010100000080
 *(object+8) = 0xc0034fb980
 *(object+16) = 0x8726ce
 *(object+24) = 0x0
 *(object+32) = 0x0
 *(object+40) = 0xc0034fbd40 <==
 *(object+48) = 0xc0a6c8
 *(object+56) = 0xc0034fbaf0
 *(object+64) = 0x872431
 *(object+72) = 0x0
 *(object+80) = 0x0
 *(object+88) = 0x0
 *(object+96) = 0x0
 *(object+104) = 0x0
 *(object+112) = 0x0
 *(object+120) = 0x0
 *(object+128) = 0x0
 *(object+136) = 0x0
 *(object+144) = 0x0
 *(object+152) = 0x0
 *(object+160) = 0x0
 *(object+168) = 0x0
 *(object+176) = 0x0
 *(object+184) = 0x0
 *(object+192) = 0x0
 *(object+200) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
runtime stack:
runtime.throw(0xbc9a25, 0x3e)
        /opt/go/src/runtime/panic.go:1112 +0x72 fp=0x7fb9c77fdd30 sp=0x7fb9c77fdd00 pc=0x436132
runtime.badPointer(0x7fba3c4cf108, 0xc0034fbd40, 0xc000425110, 0x28)
        /opt/go/src/runtime/mbitmap.go:380 +0x230 fp=0x7fb9c77fdd78 sp=0x7fb9c77fdd30 pc=0x416410
runtime.findObject(0xc0034fbd40, 0xc000425110, 0x28, 0x7fba3c659d20, 0xc000036e98, 0x7)
        /opt/go/src/runtime/mbitmap.go:416 +0x9b fp=0x7fb9c77fddb0 sp=0x7fb9c77fdd78 pc=0x4164bb
runtime.scanobject(0xc000425110, 0xc000036e98)
        /opt/go/src/runtime/mgcmark.go:1274 +0x235 fp=0x7fb9c77fde40 sp=0x7fb9c77fddb0 pc=0x421555
runtime.gcDrain(0xc000036e98, 0x2)
        /opt/go/src/runtime/mgcmark.go:1032 +0x1fb fp=0x7fb9c77fde98 sp=0x7fb9c77fde40 pc=0x420d5b
runtime.gcBgMarkWorker.func2()
        /opt/go/src/runtime/mgc.go:1951 +0x11e fp=0x7fb9c77fded8 sp=0x7fb9c77fde98 pc=0x46383e
runtime.systemstack(0x7fb9e6ffcb70)
        /opt/go/src/runtime/asm_amd64.s:370 +0x66 fp=0x7fb9c77fdee0 sp=0x7fb9c77fded8 pc=0x466976
runtime.mstart()
        /opt/go/src/runtime/proc.go:1041 fp=0x7fb9c77fdee8 sp=0x7fb9c77fdee0 pc=0x43af10

I don't use unsafe or cgo in my program directly, though some of my transitive dependencies might, I can't say for sure.

@myitcv myitcv added the NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one. label Mar 5, 2020
@myitcv
Copy link
Member

myitcv commented Mar 5, 2020

cc @aclements @randall77

@bcmills
Copy link
Contributor

bcmills commented Mar 5, 2020

See also #37506, #37502, #35541, #32324.

(CC @mknyszek)

@bcmills bcmills added this to the Go1.14.1 milestone Mar 5, 2020
@bcmills bcmills changed the title fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?) runtime: fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?) Mar 5, 2020
@bcmills
Copy link
Contributor

bcmills commented Mar 5, 2020

@peterbourgon, does a binary built with -race exhibit the same failure mode? (And, importantly, does it detect any races?)

@peterbourgon
Copy link
Author

Frustratingly, I've been unable to reproduce, even with the same binary on the same host. I'm still working at it...

@axw
Copy link
Contributor

axw commented Mar 9, 2020

I'm seeing the same issue with a package I maintain. Fails with 1.14.0 without -race, does not fail with -race (and there are no races reported) or 1.13.8. Also on linux/amd64.

To reproduce:

go test -count=100 go.elastic.co/apm/module/apmrestful

If I skip all the tests that involve panic recovery, then it no longer reproduces:

go test -count=100 -run 'TestMassageRoutePath|TestContainerFilter$|TestContainerFilterUnknownRoute|TestHandlerHTTPSuite/TestImplicitWrite' go.elastic.co/apm/module/apmrestful

@randall77
Copy link
Contributor

If I skip all the tests that involve panic recovery, then it no longer reproduces:

Possibly related to #37664 ? @danscales

@danscales
Copy link
Contributor

Though it's always possible, this doesn't look likely related to #37664 . In that bug, I would expect either an invalid pc during a traceback (gentraceback()) from a stale defer or a problem during adjustdefers() or tracebackdefers() in scanstack(). It would easy to confirm if we have a repro case (but that's hard to come by, I know).

@axw
Copy link
Contributor

axw commented Mar 10, 2020

@danscales the repro is in #37688 (comment), but I've applied https://go-review.googlesource.com/c/go/+/222420/ locally and it doesn't resolve the issue.

@cagedmantis cagedmantis modified the milestones: Go1.14.1, Go1.14.2 Mar 19, 2020
@axw
Copy link
Contributor

axw commented Mar 20, 2020

Small update: just confirmed that this issue persists with 1.14.1. I ran a git bisect between 1.13.8 and 1.14.0, and the first bad commit is dad6163

@ianlancetaylor
Copy link
Contributor

Thanks. That is https://golang.org/cl/190098 which added low-cost defers. @danscales

@danscales
Copy link
Contributor

Yes, thanks for bisecting it! I have reproduced it on my system, so I will start debugging it.

@danscales danscales self-assigned this Mar 20, 2020
@danscales
Copy link
Contributor

I understand the cause and have a simple fix. Working to see if I can create a much simpler test that can reproduce the problem. Not sure if that will be doable, because of the complex interactions and non-determinism involving the GC, etc.

@ianlancetaylor ianlancetaylor modified the milestones: Go1.14.2, Go1.15 Mar 20, 2020
@ianlancetaylor
Copy link
Contributor

@gopherbot Please open a backport to 1.14. This problem causes program crashes with no workaround.

@gopherbot
Copy link
Contributor

Backport issue(s) opened: #37968 (for 1.14).

Remember to create the cherry-pick CL(s) as soon as the patch is submitted to master, according to https://golang.org/wiki/MinorReleases.

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/224581 mentions this issue: runtime: fix code so defer is not added to g0 defer list

@gopherbot
Copy link
Contributor

Change https://golang.org/cl/225279 mentions this issue: [release-branch.go1.14] runtime: fix code so defer record is not added to g0 defer list during panic

gopherbot pushed a commit that referenced this issue Mar 24, 2020
…d to g0 defer list during panic

newdefer() actually adds the new defer to the current g's defer chain. That
happens even if we are on the system stack, in which case the g will be the g0
stack. For open-coded defers, we call newdefer() (only during panic processing)
while on the system stack, so the new defer is unintentionally added to the
g0._defer defer list. The code later correctly adds the defer to the user g's
defer list.

The g0._defer list is never used. However, that pointer on the g0._defer list can
keep a defer struct alive that is intended to be garbage-collected (smaller defers
use a defer pool, but larger-sized defer records are just GC'ed). freedefer() does
not zero out pointers when it intends that a defer become garbage-collected. So,
we can have the pointers in a defer that is held alive by g0._defer become invalid
(in particular d.link). This is the cause of the bad pointer bug in this issue

The fix is to change newdefer (only used in two places) to not add the new defer
to the gp._defer list. We just do it after the call with the correct gp pointer.
(As mentioned above, this code was already there after the newdefer in
addOneOpenDeferFrame.) That ensures that defers will be correctly
garbage-collected and eliminate the bad pointer.

This fix definitely fixes the original repro. I added a test and tried hard to
reproduce the bug (based on the original repro code), but awasn't actually able to
cause the bug. However, the test is still an interesting mix of heap-allocated,
stack-allocated, and open-coded defers.

For #37688
Fixes #37968

Fixes #37688

Change-Id: I1a481b9d9e9b9ba4e8726ef718a1f4512a2d6faf
Reviewed-on: https://go-review.googlesource.com/c/go/+/224581
Run-TryBot: Dan Scales <danscales@google.com>
TryBot-Result: Gobot Gobot <gobot@golang.org>
Reviewed-by: Keith Randall <khr@golang.org>
(cherry picked from commit 825ae71)
Reviewed-on: https://go-review.googlesource.com/c/go/+/225279
Run-TryBot: Ian Lance Taylor <iant@golang.org>
Reviewed-by: Dan Scales <danscales@google.com>
@abhinavdangeti
Copy link

Hello, I'm seeing panics such as the one this ticket shows with go1.14.4 but NOT with go1.14.2.

#37968 says these are addressed with go1.14.2, so have they sneaked back in somehow?

runtime: pointer 0xc00050d0d0 to unallocated span span.base()=0xc000506000 span.limit=0xc00050e000 span.state=0
runtime: found in object at *(0xc000125e00+0x50)
object=0xc000125e00 s.base()=0xc000124000 s.limit=0xc000125fe0 s.spanclass=14 s.elemsize=96 s.state=mSpanInUse
 *(object+0) = 0xc0005146f0
 *(object+8) = 0xc00032ae00
 *(object+16) = 0x2
 *(object+24) = 0x4
 *(object+32) = 0xc0005146c0
 *(object+40) = 0x4080000000000000
 *(object+48) = 0x409c480000000000
 *(object+56) = 0x4070000000000000
 *(object+64) = 0x4080000000000000
 *(object+72) = 0x4080000000000000
 *(object+80) = 0xc00050d0d0 <==
 *(object+88) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)
runtime stack:
runtime.throw(0x1c9b182, 0x3e)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/panic.go:1116 +0x72 fp=0x700004433d98 sp=0x700004433d68 pc=0x10363a2
runtime.badPointer(0xaa5ba20, 0xc00050d0d0, 0xc000125e00, 0x50)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mbitmap.go:380 +0x230 fp=0x700004433de0 sp=0x700004433d98 pc=0x1016e10
runtime.findObject(0xc00050d0d0, 0xc000125e00, 0x50, 0xaa56600, 0xc000053698, 0x24)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mbitmap.go:416 +0x9b fp=0x700004433e18 sp=0x700004433de0 pc=0x1016ebb
runtime.scanobject(0xc000125e00, 0xc000053698)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1274 +0x235 fp=0x700004433ea8 sp=0x700004433e18 pc=0x1021c65
runtime.gcDrain(0xc000053698, 0x3)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgcmark.go:1032 +0x1fb fp=0x700004433f00 sp=0x700004433ea8 pc=0x102146b
runtime.gcBgMarkWorker.func2()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/mgc.go:1940 +0x80 fp=0x700004433f40 sp=0x700004433f00 pc=0x1063e10
runtime.systemstack(0x2f00000)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/asm_amd64.s:370 +0x66 fp=0x700004433f48 sp=0x700004433f40 pc=0x1066ba6
runtime.mstart()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.4/go/src/runtime/proc.go:1041 fp=0x700004433f50 sp=0x700004433f48 pc=0x103b1d0

@abhinavdangeti
Copy link

abhinavdangeti commented Jun 19, 2020

Apologies, I had probably not cleaned up my workspace after I upgraded the go version the first time. I can't see the panics any longer.

I see the panics with 1.14.3 as well ..

runtime: pointer 0xc0005e30d0 to unused region of span span.base()=0xc000390000 span.limit=0xc000391f80 span.state=1
runtime: found in object at *(0xc000307860+0x50)
object=0xc000307860 s.base()=0xc000306000 s.limit=0xc000307fe0 s.spanclass=14 s.elemsize=96 s.state=mSpanInUse
 *(object+0) = 0xc000119b90
 *(object+8) = 0xc0002ef880
 *(object+16) = 0x1
 *(object+24) = 0x4
 *(object+32) = 0xc000119b60
 *(object+40) = 0x4080000000000000
 *(object+48) = 0x409c600000000000
 *(object+56) = 0x4070000000000000
 *(object+64) = 0x4080000000000000
 *(object+72) = 0x4080000000000000
 *(object+80) = 0xc0005e30d0 <==
 *(object+88) = 0x0
fatal error: found bad pointer in Go heap (incorrect use of unsafe or cgo?)

runtime stack:
runtime.throw(0x1c9b282, 0x3e)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/panic.go:1116 +0x72 fp=0x70000eab3d98 sp=0x70000eab3d68 pc=0x10363a2
runtime.badPointer(0x2f55038, 0xc0005e30d0, 0xc000307860, 0x50)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/mbitmap.go:380 +0x230 fp=0x70000eab3de0 sp=0x70000eab3d98 pc=0x1016e10
runtime.findObject(0xc0005e30d0, 0xc000307860, 0x50, 0x2f11a20, 0xc00004e698, 0x92)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/mbitmap.go:416 +0x9b fp=0x70000eab3e18 sp=0x70000eab3de0 pc=0x1016ebb
runtime.scanobject(0xc000307860, 0xc00004e698)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/mgcmark.go:1274 +0x235 fp=0x70000eab3ea8 sp=0x70000eab3e18 pc=0x1021c65
runtime.gcDrain(0xc00004e698, 0x3)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/mgcmark.go:1032 +0x1fb fp=0x70000eab3f00 sp=0x70000eab3ea8 pc=0x102146b
runtime.gcBgMarkWorker.func2()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/mgc.go:1940 +0x80 fp=0x70000eab3f40 sp=0x70000eab3f00 pc=0x1063d60
runtime.systemstack(0xb900000)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/asm_amd64.s:370 +0x66 fp=0x70000eab3f48 sp=0x70000eab3f40 pc=0x1066af6
runtime.mstart()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.3/go/src/runtime/proc.go:1041 fp=0x70000eab3f50 sp=0x70000eab3f48 pc=0x103b200

@networkimprov
Copy link

@abhinavdangeti, please open a new issue and include details on what you tried that failed, with a reproducer program or directions on how to replicate your situation.

ns-codereview pushed a commit to couchbase/cbft that referenced this issue Jun 19, 2020
See comments:
golang/go#37688 (comment)

See those panics with 1.14.3, 1.14.4.
And with 1.14.2:

runtime: nelems=5 nalloc=3 previous allocCount=1 nfreed=65534
fatal error: sweep increased allocation count

runtime stack:
runtime.throw(0x1c80007, 0x20)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/panic.go:1116 +0x72
runtime.(*mspan).sweep(0x2f5ec68, 0x2f5ec00, 0x7000049eee01)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mgcsweep.go:328 +0x8b8
runtime.(*mcentral).uncacheSpan(0x2727458, 0x2f5ec68)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mcentral.go:197 +0x79
runtime.(*mcache).releaseAll(0x2943560)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mcache.go:158 +0x6b
runtime.(*mcache).prepareForSweep(0x2943560)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mcache.go:185 +0x46
runtime.gcMarkTermination.func4.1(0xc00004f800)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mgc.go:1755 +0x2f
runtime.forEachP(0x1cb56c0)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/proc.go:1260 +0x119
runtime.gcMarkTermination.func4()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/mgc.go:1754 +0x2d
runtime.systemstack(0xa500000)
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/asm_amd64.s:370 +0x66
runtime.mstart()
	/Users/abhinavdangeti/.cbdepscache/exploded/x86_64/go-1.14.2/go/src/runtime/proc.go:1041

Change-Id: I11b95706a42db5665f3269f97adcfc82c394a293
Reviewed-on: http://review.couchbase.org/c/cbft/+/130933
Well-Formed: Build Bot <build@couchbase.com>
Reviewed-by: Chris Hillery <ceej@couchbase.com>
Tested-by: Abhinav Dangeti <abhinav@couchbase.com>
@golang golang locked and limited conversation to collaborators Jun 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
FrozenDueToAge NeedsInvestigation Someone must examine and confirm this is a valid issue and not a duplicate of an existing one.
Projects
None yet
Development

No branches or pull requests