runtime: fatal error: free list corrupted (3) #13287

Closed
msackman opened this issue Nov 17, 2015 · 3 comments

@msackman

Possible dup of #11411 and #12879

runtime: free list of span 0x7ea0832a2d40:
0xc87618ea00 -> 0x80c87618fa40 (BAD)
fatal error: free list corrupted

runtime stack:
runtime.throw(0x85ecc0, 0x13)
    /home/matthew/src/golang/go1.5.1/src/runtime/panic.go:527 +0x90
runtime.mSpan_Sweep(0x7ea0832a2d40, 0x18100000100, 0xc80002a801)
    /home/matthew/src/golang/go1.5.1/src/runtime/mgcsweep.go:186 +0x800
runtime.sweepone(0x439b12)
    /home/matthew/src/golang/go1.5.1/src/runtime/mgcsweep.go:97 +0x154
runtime.gosweepone.func1()
    /home/matthew/src/golang/go1.5.1/src/runtime/mgcsweep.go:109 +0x21
runtime.systemstack(0xc820023500)
    /home/matthew/src/golang/go1.5.1/src/runtime/asm_amd64.s:262 +0x79
runtime.mstart()
    /home/matthew/src/golang/go1.5.1/src/runtime/proc1.go:674

> go version
go version go1.5.1 linux/amd64

The software is a distributed database server. At the time, there were 3 servers running, all connected to each other (and all running on the same machine). All the servers are running the exact same binary. Clients would connect, run some tests, disconnect. I was asleep.

From the rest of the stack traces, it looks as though the panic happened 22 minutes after the server was started and 14 minutes after the last client disconnect (test had passed). All 3 connected servers would have been idle at this point.

Of the 3 servers, one (server1) failed with the above, one survived (server2) until the morning when I found it, and the other (server3) appears to have failed at exactly the same time with:

fatal error: C malloc failed

goroutine 77 [running]:
runtime.throw(0x8241f0, 0xf)
    /home/matthew/src/golang/go1.5.1/src/runtime/panic.go:527 +0x90 fp=0xc8e6765408 sp=0xc8e67653f0
runtime.cmalloc(0xa, 0x409617)
    /home/matthew/src/golang/go1.5.1/src/runtime/cgocall.go:148 +0x68 fp=0xc8e6765438 sp=0xc8e6765408
net._Cfunc_CString(0xc8202cb800, 0x9, 0xc8e67654e8)
    ??:0 +0x28 fp=0xc8e67654a8 sp=0xc8e6765438
net.cgoLookupIPCNAME(0xc8202cb800, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/cgo_unix.go:108 +0x13c fp=0xc8e67655d0 sp=0xc8e67654a8
net.cgoLookupIP(0xc8202cb800, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/cgo_unix.go:163 +0x56 fp=0xc8e6765628 sp=0xc8e67655d0
net.lookupIP(0xc8202cb800, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/lookup_unix.go:67 +0x94 fp=0xc8e6765698 sp=0xc8e6765628
net.glob.func15(0x8d0300, 0xc8202cb800, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/hook.go:10 +0x4d fp=0xc8e67656d8 sp=0xc8e6765698
net.lookupIPMerge.func1(0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/lookup.go:68 +0x71 fp=0xc8e6765758 sp=0xc8e67656d8
internal/singleflight.(*Group).doCall(0xc2c570, 0xc8764ae230, 0xc8202cb800, 0x9, 0xc8e6765950)
    /home/matthew/src/golang/go1.5.1/src/internal/singleflight/singleflight.go:93 +0x2c fp=0xc8e6765808 sp=0xc8e6765758
internal/singleflight.(*Group).Do(0xc2c570, 0xc8202cb800, 0x9, 0xc8e6765950, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/internal/singleflight/singleflight.go:63 +0x284 fp=0xc8e6765878 sp=0xc8e6765808
net.lookupIPMerge(0xc8202cb800, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/lookup.go:69 +0x9b fp=0xc8e6765988 sp=0xc8e6765878
net.lookupIPDeadline(0xc8202cb800, 0x9, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/lookup.go:91 +0xde fp=0xc8e6765bc0 sp=0xc8e6765988
net.internetAddrList(0x821bd8, 0x3, 0xc8202cb800, 0xf, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
    /home/matthew/src/golang/go1.5.1/src/net/ipsock.go:252 +0x6ee fp=0xc8e6765d28 sp=0xc8e6765bc0
net.ResolveTCPAddr(0x821bd8, 0x3, 0xc8202cb800, 0xf, 0x5d8abc, 0x0, 0x0)
    /home/matthew/src/golang/go1.5.1/src/net/tcpsock.go:56 +0x11b fp=0xc8e6765de8 sp=0xc8e6765d28
....my code.

There could have been memory pressure at the time, but I find it unlikely, given that indications are it was some 14 minutes after the last test finished (and passed). Syslog does not show any activity by the kernel OOM killer.

This server (server3) would only have been in this code because it was trying to reconnect to server1 after server1 had failed. So this would have been exactly 5 seconds after server1 had failed. With server1 having failed, I can't believe there really could have been any memory pressure in the system.

Thus I think this could be the same issue as #12879 in that the server is idle at the time. I do not know if the two crashes are or could be related at all. I shall attempt to see how reproducible this is.

@msackman
Author

I repeated the test. At the end of the test each server has about 2GB resident, so nothing unusual, and each falls completely idle as expected. Now, some 30 minutes later, there is still no crash.

@ianlancetaylor ianlancetaylor added this to the Go1.5.2 milestone Nov 17, 2015
@ianlancetaylor
Contributor

CC @aclements @RLH

@rsc
Contributor

rsc commented Nov 18, 2015

I'm going to go out on a limb and say this is a duplicate of #12879. I would very much like to see a simple way to reproduce this, though. If you find one, please comment on that issue. Thanks.
