runtime: fatal error: free list corrupted (3) #13287
Repeated the test. At the end of the test each server has about 2GB resident, so nothing unusual. The servers do fall completely idle, as expected. But now, some 30 minutes later, there is still no crash.
CC @aclements @RLH
I'm going to go out on a limb and say this is a duplicate of #12879. I would very much like to see a simple way to reproduce this, though. If you find one, please comment on that issue. Thanks.
Possible duplicate of #11411 and #12879.
The software is a distributed database server. At the time, there were 3 servers running, all connected to each other (and all running on the same machine). All the servers are running the exact same binary. Clients would connect, run some tests, disconnect. I was asleep.
From the rest of the stack traces, it looks as though the panic happened 22 minutes after the server was started and 14 minutes after the last client disconnect (test had passed). All 3 connected servers would have been idle at this point.
Of the 3 servers, one (server1) failed with the above, one survived (server2) until the morning when I found it, and the other (server3) appears to have failed at exactly the same time with:
There could have been memory pressure at the time, but I find it unlikely, given that the crash appears to have happened some 14 minutes after the last test finished (and passed). Syslog shows no activity by the kernel OOM killer.
This server (server3) would only have been in this code because it was trying to reconnect to server1 after server1 had failed. So this would have been exactly 5 seconds after server1 had failed. With server1 having failed, I can't believe there really could have been any memory pressure in the system.
Thus I think this could be the same issue as #12879, in that the server was idle at the time. I do not know whether the two crashes are, or could be, related at all. I shall attempt to see how reproducible this is.