Skip to content
This repository has been archived by the owner on Apr 22, 2023. It is now read-only.

Socket not closed when using TLS #8615

Closed
misterdjules opened this issue Oct 24, 2014 · 12 comments
Closed

Socket not closed when using TLS #8615

misterdjules opened this issue Oct 24, 2014 · 12 comments
Milestone

Comments

@misterdjules
Copy link

Currently, it seems that sometimes the socket is not properly closed when using TLS to connect from a client to a server, get some data from the server and close the connection. The client hangs while it waits for the connection to be closed, and eventually exits after approximately 20 seconds.

In order to reproduce this issue, clone the repository that contains the code to reproduce the issue and follow the instructions in the README file.

The current code uses socket.io. Of course, socket.io is a big external dependency and I'm working on a reproduction that doesn't involve socket.io.

The issue cannot be reproduced with Node.js 0.10. Thanks to @migounette, we've been able to identify that the issue appears in node 0.11.10. After some more investigation, it seems that 1e066e4 is the change responsible for the issue.

@indutny
Copy link
Member

indutny commented Oct 24, 2014

What do you mean by 'properly'?

@misterdjules misterdjules changed the title Socket not closed properly when using TLS Socket not closed when using TLS Oct 24, 2014
@migounette
Copy link

@indutny the problem shows a fd leak, after running the sample on Rhel 6 or Ubuntu 14, the system is starving of fd, and lsof shows many fd with "can't identify protocol",
Moreover, the behavior between 0.10.32 and 0.11.x is that some socket are not closed leading to "can't identify protocol". After some regression testing, we have detected that some changes around the tls_wrap.cc with the eof may be a good candidate.

@misterdjules misterdjules added this to the 0.11.15 milestone Oct 27, 2014
@misterdjules
Copy link
Author

After some more investigation, I'd like to share my thoughts on the source of this issue.

I wasn't able to reproduce the issue without using web sockets. While I found it surprising at first, it seems that 1e066e4 may expose an issue with how ws (the WebSocket implementation used by Socket.io) handles cleaning up sockets.

ws cleans up resources used by a WebSocket connection in cleanupWebsocketResources. However, this function removes all events handlers for the underlying socket, including the handler for the finish event.

The problem is that the finish event handler is what allows the server to close the socket when it's done writing the data that remains to be written.

Removing the call to removeAllListeners in ws fixes the issue, but it probably has other undesirable effects.

How does 1e066e4 break this dubious behavior? Let's consider the case when the SSL shutdown is done by the client first, and the server hasn't done a SSL shutdown. The server will call onread with UV_EOF as nread. However, it is a "fake" EOF and at that point there's still some data to read on the socket. As a result, onread will not destroy the connection.

Later, when we actually read EOF, then another change in 1e066e4a4a88f97af865d965f104b5fe8136797f prevents onread to be called again with UV_EOF as nread.

At that point, and as mentionned above, we rely on the finish event handler to destroy the socket. But as we've seen, this handler is deleted by ws before it can be called.

Calling onread with UV_EOF unconditionnaly when nread < 0 in TLSCallbacks::DoRead also seems to fix this issue.

@migounette Could you please let me know if you came up with similar conclusions? Also, if the above explanation makes sense to you, could you please try one of above mentionned workarounds in your code and let us know if that fixes your issues?

@indutny I would be very happy if you could give us your thought on this too :)

Thank you!

@indutny
Copy link
Member

indutny commented Oct 28, 2014

@migounette is ws module using legacy TLS API? Anyway, going to check it all out anyway soon.

@migounette
Copy link

What do you mean by legacy TLS API ?

  1. At this stage I can confirm that no fd leak occurs with node 0.11.09
    (long run)
  2. With a modification in ws and the remove of remove all listeners, at
    this stage it also seems to work

Yann

On Tue, Oct 28, 2014 at 4:58 PM, Fedor Indutny notifications@github.com
wrote:

@migounette https://github.com/migounette is ws module using legacy TLS
API? Anyway, going to check it all out anyway soon.


Reply to this email directly or view it on GitHub
#8615 (comment).

@misterdjules
Copy link
Author

@migounette There was some confusion about the commit I pointed to (1e066e4). It looked like it was a change that impacted only the TLS legacy API (the first chunk is in _tls_legacy.js), hence the question. We clarified this during our discussion with @indutny, so I think it's safe to ignore his question.

I think we all agree now on what the root of the issue is, now the question is what do we fix and how do we fix it?

It's unfortunate that ws removes all listeners, including internal ones like finish as soon as it received the end event (that is, when the socket isn't necessarily done writing the data it needs to write to close the connection). We could probably fix ws by making it wait for both the end and the finish events.

However, other modules do and will call removeAllListeners at inappropriate times, and in the long term it's probably better to come up with a more robust solution than fixing every single one of them.

Ideally, calling removeAllListeners on a TLS socket would not remove important internal event handlers. There are several ways to do that, including:

  1. Making TLSSocket re-use net.Socket not through inheritance but with composition.
  2. Refactoring internal/private events handling to use separate data structures/APIs than non-internal events.

I don't know if we want to investigate 1) or 2) in the near future. We could also probably fix ws in the short term, that probably wouldn't hurt, but should not be considered as a robust long-term solution.

@tjfontaine @indutny Does that sound like a reasonable summary of the situation?

@migounette
Copy link

I agree, any use of removeAllListeners may lead to a false positive bug
based on a bad usage and that option 1 and 2 are costly in term
of re-factoring.

Tomorrow, I will look deeply for an elegant solution and respect the legacy
part of ws/ Not so easy !
Waiting for any comment from @tjfontaine https://github.com/tjfontaine
@indutny https://github.com/indutny

Thanks for the report

On Tue, Oct 28, 2014 at 11:00 PM, Julien Gilli notifications@github.com
wrote:

@migounette https://github.com/migounette There was some confusion
about the commit I pointed to (1e066e4
1e066e4).
It looked like it was a change that impacted only the TLS legacy API (the
first chunk is in _tls_legacy.js), hence the question. We clarified this
during our discussion with @indutny https://github.com/indutny, so I
think it's safe to ignore his question.

I think we all agree now on what the root of the issue is, now the
question is what do we fix and how do we fix it?

It's unfortunate that ws removes all listeners, including internal ones
like finish as soon as it received the end event (that is, when the
socket isn't necessarily done writing the data it needs to write to close
the connection). We could probably fix ws by making it wait for both the
end and the finish events.

However, other modules do and will call removeAllListeners at
inappropriate times, and in the long term it's probably better to come up
with a more robust solution than fixing every single one of them.

Ideally, calling removeAllListeners on a TLS socket would not remove
important internal event handlers. There are several ways to do that,
including:

  1. Making TLSSocket re-use net.Socket not through inheritance but with
    composition.
  2. Refactoring internal/private events handling to use separate data
    structures/APIs than non-internal events.

I don't know if we want to investigate 1) or 2) in the near future. We
could also probably fix ws in the short term, that probably wouldn't hurt,
but should not be considered as a robust long-term solution.

@tjfontaine https://github.com/tjfontaine @indutny
https://github.com/indutny Does that sound like a reasonable summary of
the situation?


Reply to this email directly or view it on GitHub
#8615 (comment).

@indutny
Copy link
Member

indutny commented Oct 29, 2014

@misterdjules I agree that we should consider internalising some events in the future, but for now I'm absolutely sure that removeAllListeners call is invalid usage of our APIs.

@migounette
Copy link

@indutny I agree based on the documentation:

"Removes all listeners, or those of the specified event. It's not a good idea to remove listeners that were added elsewhere in the code, especially when it's on an emitter that you didn't create (e.g. sockets or file streams)"

So a quick fix is to patch ws. @misterdjules do you want me to propose the fix to ws project ?

@migounette
Copy link

Proposal for fixing "ws": "0.4.31" used by socket.io 1.0.x as dependency

This fix shows no more fd leak on our test bed and after a long run.
Remove only documented events in order to stay in the philosophy,

function removeAllListeners(instance) {
if (isNodeV4) {
// node v4 doesn't actually remove all listeners globally,
// so we do that instead
instance._events = {};
}
else {
//
// Fix for resource leaks
// #8615 (comment)
//
if (instance.constructor.name === 'TLSSocket' || instance.constructor.name === 'Socket') {
//
// Remove only documented events 'connect', 'data', 'end', 'timeout', 'drain', 'error', 'close'
// http://nodejs.org/api/net.html
//
['connect', 'data', 'end', 'timeout', 'drain', 'error', 'close'].some(function(evt) {
instance.removeAllListeners(evt);
});
}
else {
instance.removeAllListeners();
}
}
}

@indutny
Copy link
Member

indutny commented Oct 29, 2014

Man, I wonder if socket.io could just keep track of the event listeners that it is adding? This how I usually do such things my projects, just removing the listeners that I have added without touching anything else.

@misterdjules
Copy link
Author

@migounette @tjfontaine Closing this issue as my recommendation is that we fix the original problem in ws. I also submitted a PR to ws that only removes custom event handlers.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants