segfault on node v6.3.1 on Ubuntu 14.04.4 #8074
Comments
Do you have a minimal example that reproduces it?
No, unfortunately I have not been able to minimize this. I noticed the TLS tag: the application does use TLS with Postgres and for external HTTP requests via axios. For incoming requests, though, TLS termination happens outside of the node process.
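For context, the setup described amounts to two independent TLS users in one process. The sketch below is only an illustration under assumptions (pg and axios are named in this thread; the hostnames, credentials, and database name are placeholders), not the reporter's actual code.

```js
const { Client } = require('pg');
const axios = require('axios');

// TLS user #1: Postgres over SSL (placeholder connection settings).
const db = new Client({
  host: 'db.example.com',
  port: 5432,
  user: 'app',
  password: 'secret',
  database: 'app',
  ssl: true, // TLS to Postgres
});

db.connect(function (err) {
  if (err) throw err;

  // TLS user #2: an outbound HTTPS request via axios (placeholder URL).
  axios.get('https://api.example.com/resource', {
    auth: { username: 'app', password: 'secret' },
  }).then(function (res) {
    console.log('axios status:', res.status);
    db.end();
  }).catch(function (err) {
    console.error('axios error:', err.message);
    db.end();
  });
});
```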
/cc @indutny
Yeah, the segfault occurs in the second part of this line, meaning that …
@uhoh-itsmaciek Have you tried with different node versions (e.g. v6.0.0, v5.x, v4.x) to find when this problem started?
I'll try these out and report back.
It appears that on 5.12, we don't get the axios 401s (from another service we run) that trigger this problem in the first place. I had not yet investigated why these 401s were happening and thought that they were legitimate, but maybe we're somehow corrupting the credentials we send with these requests in node 6?
Ok, on 5.12 if I force an error response from axios by intentionally sending the wrong credentials, I still get a segfault with a very similar trace, although it seems to consistently be for address 0x0 (which was not the case with 6).
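To illustrate what "forcing an error response" can look like, here is a small sketch using axios with deliberately wrong Basic-auth credentials; the URL and credentials are placeholders, not the reporter's actual endpoint.

```js
const axios = require('axios');

// Deliberately send wrong Basic-auth credentials so the remote service
// rejects the request (placeholder URL and credentials).
axios.get('https://api.example.com/protected', {
  auth: { username: 'app', password: 'definitely-wrong' },
}).then(function (res) {
  console.log('unexpected success:', res.status);
}).catch(function (err) {
  // axios rejects on non-2xx responses by default; err.response holds the status.
  if (err.response) {
    console.log('got error status:', err.response.status); // e.g. 401
  } else {
    console.error(err.message);
  }
});
```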
On 4.4.7, the asset precompilation fails for my app, so I can't easily check that. On 6.0.0, I get the segfault.
A colleague suggested that this seems similar to an issue fixed recently in Postgres--we briefly looked at the node code you linked, @addaleax, and it's not obviously the same issue, but perhaps it's related? In both cases, two independent clients are managing their own TLS connections in the same process, and at least in the Postgres case, the OpenSSL API made it difficult to do this safely.
@uhoh-itsmaciek Yeah, I mostly linked that code to explain where the label comes from – unfortunately, I can’t tell from the stack trace itself or the code that it points to why the crash happens. Something you can try (if that’s possible) is running your app with valgrind. It’s going to be terribly slow but it’s the best idea I have for now.
I'll see if I can set that up--it might be tricky to do on Heroku (and I have not been able to reproduce locally). In the meantime, I noticed that the endpoint I call with axios does not actually respond with 401s: according to the heroku router logs, my endpoint responds with 301s, but somehow axios sees a 401 response. I'm guessing this is related to the memory corruption, but I suppose it doesn't really tell us much.
@uhoh-itsmaciek may I ask you to upload a core dump of this crash? It can be inspected with llnode later on, and we should be able to find much more info from there!
Unfortunately, I can't get a core dump on Heroku--it's a limitation of the platform. I'll see if I can reproduce this locally using the Cedar-14 Docker image. I did get Valgrind building and running on Heroku, but I think reproducing this locally is worth trying first.
So I haven't gotten it running locally, but in the meantime, I tried turning off TLS on our Postgres connection, so we would be left with only a single user of TLS in the process (axios), but I was still able to reproduce.
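For reference, turning off TLS on the Postgres side is just a client-config change. This sketch assumes the pg client with the same placeholder connection settings as above; it is not the reporter's actual configuration.

```js
const { Client } = require('pg');

// Same placeholder connection settings, but with ssl disabled so that axios
// would be the only remaining TLS user in the process.
const db = new Client({
  host: 'db.example.com',
  port: 5432,
  user: 'app',
  password: 'secret',
  database: 'app',
  ssl: false, // plain TCP to Postgres, no TLS
});

db.connect(function (err) {
  if (err) throw err;
  db.query('SELECT 1', function (err, result) {
    if (err) throw err;
    console.log(result.rows);
    db.end();
  });
});
```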
I've got the app running locally in a container using the …
Maybe… it’s hard to tell without actually seeing it (although I guess the same goes for a core dump, too). I can say that valgrind is the first tool I run to when seeing something nondeterministic, and usually the only price it has is that it can take a lot of CPU time.
Alright, thanks. I'm working on some other things concurrently so it may be a few days, but I'll give it a shot.
In the meantime, a colleague told me that 6.4.0 has been released so I tried that, and I can still reproduce the segfault. That's what I expected, but just wanted to note that as another data point.
Ok, this is the valgrind output on heroku during the steps that reproduce the crash:
This is under node 6.4.0. For what it's worth, this is the only other output from valgrind in this session, right at node startup:
Does this help? It looks like some symbols are still missing--is that from openssl? Do you need that info? Are there debug symbols from openssl I can install to get more info there?
Just wanted to follow up: is there any additional info I can provide to help diagnose this issue?
@uhoh-itsmaciek Sorry… the valgrind output only seems to confirm that there is some kind of memory corruption, and if, as you indicated, these are the first lines with actual output, then even valgrind is pretty vague here. If I were in your position I’d probably start cluttering Node’s source code with …
I can run with an instrumented build of node, but I've never seriously worked with a large C++ project, and it's been years since I've touched C++ at all. I wouldn't know where to start. I guess we'll try to avoid the segfault and hope someone else stumbles onto this and can help debug further.
Also seeing this with node 6.9.2 and 7.2.1 on Linux (Ubuntu 16.04). Using …
Perhaps not exactly the same, but it looks similar.
For what it's worth, it seems to happen when a client terminates a connection, but it's sporadic and I haven't been able to reproduce reliably.
I've been running stress tests but have so far been unable to reproduce. |
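The reporter's stress tests aren't shown here; the following is only a rough sketch of the kind of loop one might use to hammer the TLS connect/handshake/teardown path, assuming a self-signed key.pem/cert.pem pair exists locally.

```js
const tls = require('tls');
const fs = require('fs');

// Placeholder paths to a self-signed key/cert pair.
const server = tls.createServer({
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
}, function (socket) {
  socket.on('error', function () {}); // ignore resets from abrupt teardown
  socket.write('hello\n');
});

server.listen(8443, function () {
  var remaining = 1000;
  (function connectOnce() {
    if (remaining-- === 0) {
      server.close();
      return;
    }
    var client = tls.connect(8443, { rejectUnauthorized: false }, function () {
      // Tear the connection down abruptly instead of ending it cleanly,
      // roughly like a client disappearing mid-session.
      client.destroy();
      connectOnce();
    });
    client.on('error', function () {});
  })();
});
```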
Still trying to narrow it down, will reply if I can figure out how to reproduce it reliably. In the meantime (not really sure what I'm doing) I've managed to grab a core dump, but it's 119 MB--is there a good/safe way to share something like that? Here's a preview from when I open it with gdb.
Yes, I’d assume that it’s resolved now. If somebody runs into this on a recent version of Node.js, we can always open a new issue for that.
This test has a dependency on the order in which the TCP connection is made and the TLS server handshake completes. It assumes those server-side events occur before the client-side write callback, which is not guaranteed by the TLS API. It usually passes with TLS1.3, but TLS1.3 didn't exist at the time the bug existed. Pin the test to TLS1.2, since the test shouldn't be changed in a way that doesn't trigger a segfault in 7.7.3: - #13184 (comment) PR-URL: #25508 Reviewed-By: Anna Henningsen <anna@addaleax.net> Reviewed-By: James M Snell <jasnell@gmail.com>
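The actual test change is in the linked PR. As an illustration only, pinning a handshake to TLS 1.2 with Node's tls options can look like the sketch below (minVersion/maxVersion are option names from later Node releases; older code used secureProtocol instead, and the key/cert paths are placeholders).

```js
const tls = require('tls');
const fs = require('fs');

// Cap both ends of the handshake at TLS 1.2.
const options = {
  key: fs.readFileSync('key.pem'),
  cert: fs.readFileSync('cert.pem'),
  minVersion: 'TLSv1.2',
  maxVersion: 'TLSv1.2',
};

const server = tls.createServer(options, function (socket) {
  console.log('server negotiated', socket.getProtocol()); // expect 'TLSv1.2'
  socket.end();
});

server.listen(0, function () {
  const client = tls.connect(server.address().port, {
    rejectUnauthorized: false,
    maxVersion: 'TLSv1.2',
  }, function () {
    console.log('client sees', client.getProtocol());
    client.end();
    server.close();
  });
});
```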
Hello,
I seem to be able to fairly reliably (maybe 50% of the time) reproduce a segfault with node v6.3.1 in my express application. I'm not using any native modules (except for segfault-handler to get the trace output below, and the error of course also occurs without segfault-handler). This is the output I get:
The error seems to happen right after I send a message to a client via socket.io, but the message itself seems to send correctly (that is, I see a log line after the .emit). I looked at the other open issues mentioning segfaults, but I did not see anything relevant.
Any ideas?
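For readers unfamiliar with socket.io, a minimal sketch of the pattern described above (emit a message, then log) looks roughly like this; the event name, payload, and port are placeholders, not the reporter's actual code.

```js
const http = require('http');
const socketIo = require('socket.io'); // assumed dependency of the app

const httpServer = http.createServer();
const io = socketIo(httpServer);

io.on('connection', function (socket) {
  // Placeholder event name and payload.
  socket.emit('update', { status: 'ok' });
  // A log line like this still appears after the emit, which is why the
  // message itself seems to go out before the crash.
  console.log('sent update to', socket.id);
});

httpServer.listen(3000);
```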