runtime: free list of span 0x7ea0832a2d40:
0xc87618ea00 -> 0x80c87618fa40 (BAD)
fatal error: free list corrupted
runtime stack:
runtime.throw(0x85ecc0, 0x13)
/home/matthew/src/golang/go1.5.1/src/runtime/panic.go:527 +0x90
runtime.mSpan_Sweep(0x7ea0832a2d40, 0x18100000100, 0xc80002a801)
/home/matthew/src/golang/go1.5.1/src/runtime/mgcsweep.go:186 +0x800
runtime.sweepone(0x439b12)
/home/matthew/src/golang/go1.5.1/src/runtime/mgcsweep.go:97 +0x154
runtime.gosweepone.func1()
/home/matthew/src/golang/go1.5.1/src/runtime/mgcsweep.go:109 +0x21
runtime.systemstack(0xc820023500)
/home/matthew/src/golang/go1.5.1/src/runtime/asm_amd64.s:262 +0x79
runtime.mstart()
/home/matthew/src/golang/go1.5.1/src/runtime/proc1.go:674
> go version
go version go1.5.1 linux/amd64
The software is a distributed database server. At the time, there were 3 servers running, all connected to each other (and all running on the same machine). All the servers are running the exact same binary. Clients would connect, run some tests, disconnect. I was asleep.
From the rest of the stack traces, it looks as though the panic happened 22 minutes after the server was started and 14 minutes after the last client disconnect (test had passed). All 3 connected servers would have been idle at this point.
Of the 3 servers, one (server1) failed with the above, one survived (server2) until the morning when I found it, and the other (server3) appears to have failed at exactly the same time with:
There could have been memory pressure at the time, but I find it unlikely, given that the indications are it was some 14 minutes after the last test finished (and passed). Syslog does not show any activity by the kernel OOM killer.
This server (server3) would only have been in this code because it was trying to reconnect to server1 after server1 had failed. So this would have been exactly 5 seconds after server1 had failed. With server1 having failed, I can't believe there really could have been any memory pressure in the system.
Thus I think this could be the same issue as #12879 in that the server is idle at the time. I do not know if the two crashes are or could be related at all. I shall attempt to see how reproducible this is.
Repeated the test. At the end of the test each server has about 2GB resident, so nothing unusual. Each server does fall completely idle as expected. But now, some 30 minutes later, still no crash.
I'm going to go out on a limb and say this is a duplicate of #12879. I would very much like to see a simple way to reproduce this, though. If you find one, please comment on that issue. Thanks.
Possible dup of #11411 and #12879