-
Notifications
You must be signed in to change notification settings - Fork 7.3k
http client: Reasonable KeepAlive behavior #4769
Comments
i'm actually of the opinion that we should remove pooling from core. there are several approaches to this problem and all change the expected API behavior.
this problem is hard, there's more than one way to solve it, i think we should push it back in to userland. |
also, if i don't win this argument and get pooling ripped out of core, here's some things we'll need to remember.
|
In this day and age, an http client that doesn't support keepalives or retries is not usable. If we "push it back into userland", then we may as well push the whole http client back to userland. Yes, it's a hard problem. But it's far from impossible, and it's something that belongs in core. What we have now is wrong in every direction, and needs to be replaced either with something that isn't broken, or with nothing. |
Just to be clear: By "retries" I mean, "I tried to use a socket out of the pool, got an immediate ECONNRESET, so now I need to reconnect." Not "I got an error halfway through downloading the thing." The second case is an error. The first is a socket that was dead, but we don't know until we poke it. |
@isaacs totally clear on the ECONNRESET not being a thing after the response is written but you're missing the real problem. It's not "I tried to use a socket out of the pool, got an immediate ECONNRESET", its, I got assigned a socket and in the same tick i constructed all the request data and wrote it to the socket and I get a ECONNRESET in the future before I get a response. you'll pretty much never get an ECONNERESET before writing the request out because you'll always write the headers right when it gets assigned. you have to retry, know that you retried, and let the error emit the second time. also, for fear of mutating data more than intended, you can't retry anything but GET and HEAD and that's assuming some idiot's PHP site isn't mutating the backend data on GET based on querystring params. this kind of pooling isn't handled by any of our contemporaries. yes, they do "keep-alive" but they're using blocking IO and get out of most of these inflight issues, or they just pretend they don't exist. the problem with these bugs is that you don't see them unless you're under load which makes them hard to test and debug and it means it'll be months after release until we know if there are any side effects. now, if we do decide this must go in to core, we could construct it as a general client socket pool which would make it much more useful, general, and help out people building socket clients with pooling like redis. this was how i wanted it done originally but ryan went off and wrote it in to the http library and shipped it while it was still kinda broken and slow. my point is, if socket pooling is so important that not having it renders the http client useless then it's probably a big enough problem for people writing non-http socket clients that we should give them a solution. |
Re: abstracting out "pool some sockets" into a standalone thing, yes, absolutely, that is the correct abstraction. Implementation detail, we'll work that out when we're actually building it, post-v0.10. I'd like to see something like this: var pool = net.pool(configs for connect as well as pool size, etc.);
var conn = pool.connect(function onConnect() { ... }); And then we can just use that instead of net.connect. We could also do something fancy like
If you write the first packet of an http request to a socket, and get a RST packet back, then the correct behavior is to tear it down and try again. When I say "Immediate ECONNRESET", I mean, in the first The subtlety of the API needs to be worked out, obviously. Any socket pool needs to be able to swallow RSTs on sockets that haven't been used in a while, because the other side will often decide that they're dead, and start culling them. And that's kind of hard when the socket object has already been handed off to a user. So, we'll have to be smart enough to say, "This socket is actually from a pool, so let's just make a new connection, and swap out its _handle object" or something, under the hood. (Perhaps what we're pooling is not JS net.Socket objects, but just the tcp_wrap handles. TBD.) You don't have to worry about servers having interpreted the request at that point and idempotency (which you obviously DO have to worry about if you're retrying on any random error that occurs mid-stream). It never made it to the server, because there was never an ACK; instead, we got a RST. "That connection isn't here any more, try again." The server told us explicitly to reconnect, because that's what "reset" means.
It's handled by curl and by web browsers. That's all of our contemporaries. This is actually a pretty normal part of the TCP/HTTP dance that we just don't do. |
My .02: I would very much like to see the http client make connection management more transparent to me. The current semantics of the API require convoluted usage upstack to make it do what you want with regards to timeouts/backoff/etc. Also, any magic number of 5 or 8 is still a magic number - at the top of any client file I have the same I very much want a "http handle" that has a As to retry/backoff, this is not specific to HTTP, but any TCP-based system - I think node needs to do that for connection establishment (but notably not user requests), along with connection establishment timeout. It's required for any real system, and I know at least in all of our TCP/LDAP/HTTP systems, we have the same boilerplate setTimer/setup retry+backoff logic/wait for connection, loops. I suspect there's lots of wrong code out there doing this, including mine/ours because it's error prone and it needs to be replicated all over. I'm pretty sure what's described above would be fine - i.e., before there's data that's actually been ACK'd, just manage retry for me. After that, emit a RST, or whatever to consumer that my message got dropped. Everything above has nothing to do with a pool, that's just for a single socket. Now the question is around pooling - I suspect that a general TCP pool wouldn't necessarily work for HTTP or vice-versa; plenty of protocols allow full async communication over a single socket (i.e., LDAP/SPDY/...), in which case "checking out a connection and queuing" is completely artificial. On the flip side, you obviously need to checkout+queue for HTTP and most every DB protocol that I know of. I do agree with @isaacs that it's essentially standard these days for every user agent (don't forget java, where jakarta-commons still powers a significant percentage of clients that aren't web browsers out there) to do this, so I don't see how node can offer up HTTP as a supported "core" protocol and not do it for most users, but it wouldn't be the end of the world either to just say it's explicitly an upstack problem, IMO. So in tl;dr form:
|
the browser works more or less like what we have now. it does not keep connections alive that it's not using. the use case is quite different, almost all outgoing requests are known pretty quickly and a shorter life cycle so they don't have the same problems we do.
SDPY requires that you only use a single connection. HTTP 1.1 has keep-alive and pipelineing but the pipelining doesn't support multiplexing so almost nobody uses it because it ends up being slower than pooling.
The problem is, this logic must exist in the http response handling because HTTP allows for the response to dictate keep-alive and close so this isn't something you can be 100% consistent with given the input configuration like you're suggesting. The current API (agent) is terrible for writing your own handling of these cases but it's important to establish upfront that both connection:close and connection:keep-alive handling have to be defined for every outgoing http request. |
would also like some of @dannycoates' input on pooling. |
Yes, I understand; whatever SPDY "requires" w.r.t. to a single connection is sort of meaningless - clients can always make more. I was only saying that there are drastically different models so I don't see how a generic pool for all things sockets would work out.
Sorry, I meant by this that you want the "handle" object to either reuse the existing connection or create a new one for each request as dictated by the current state. That model would work, and be 1:1 of "thing I hold to socket". |
last time i looked at the SPDY spec it stated that you can't use more than one connection. sure you could make more but there's no incentive other than load testing. SPDY support will actually look a lot like websockets, the HTTPS connection get's "upgraded" and then it gets taken out of the pool. maybe the solution to these handlers is something like |
Are you sure about that? I'm pretty sure that browsers keep the underlying TCP connections open for a while in case you want to make more HTTP requests on top of them. Especially for sites that use HTTPS, this is a gigantic win. |
I think it's fair to say that non-browser implementations of HTTP clients do not typically have this kind of "linger keepalive" support. Node has better support for HTTP than just about everything else, by design, and this keepalive thing is one area that's still lacking. Let's not get distracted by the issue of SPDY and pipelining multiple requests over the same TCP. If those kinds of solutions were optimal, we wouldn't need HTTP. SPDY may indeed by the future, but right now in the present we need real HTTP keepalive, either in node or in userland. At Voxer we've done a version of this in our application code, and it's OK, but everyone should not have to implement their own keepalive logic to get the behavior they expect. |
SPDY and pipelining are separate concerns entirely. We don't have first-class SPDY support in core, and aren't likely to any time soon. But the node-spdy module is maintained by @indutny who is always deep in core development and will keep us from breaking anything it depends on. Our client doesn't do pipelining, and likely never will. Our client does do keepalive, just badly -- it'll only reuse connections that already have pending requests, but won't keep the socket alive if there are no requests pending, when it can do that perfectly easily. We don't handle ECONNRESET very nicely in the client at all, either, and using longer-lived keepalives will force that issue, I'm sure. But, there is clearly user demand for this, it's a reasonable user expectation, and hard or not, it's something we have to do. v0.12 is the version to land this in. |
i thought this was cleaned up? should make it's way to the ClientRequest object and emit error. |
It's significantly cleaner. But not still not optimum. With a keepalive connection, you want to be able to do handle ECONNRESET from the first handle.ondata() after the first write, and if you get that, make a new connection, and try again. That logic will go into the socketpool thingamajig, I imagine. |
Are +1, "me too!" comments unwelcome here? Because this is one. I need to implement custom agent/pooling behavior for my middleware, partly to more gracefully handle a connection limit my database provider has but mostly to deal with the horrible latency caused by lack of Keep-Alive. Personally I'd be happy with the built-in library letting me initialize an explicit HTTP connection rather than request, but in lieu of that a better default keep-alive strategy (and less weird ECONNRESET behaviour when maxSockets is exceeded) and some non-UTSL documentation for custom agents would do. |
i find +1 comments that have no explanation entirely useless. yours is quite detailed and, I for one, welcome it :) |
This was landed in master some time ago. Closing this issue. If there are other issues with it, please post new issues. Thanks for the feedback everyone :) |
@isaacs since the documentation on |
@ravi Would love to hear a bit more too, but I just wrapped up work implementing node-compatible (v0.11) 'http' module and so maybe I can give a few pointers meanwhile:
|
@natevw Greatly appreciate your response. The docs link you posted is indeed relevant, but unfortunately it seems to make matters murkier (at least to me): This section remains from the old doc:
This is essentially, IIUC, the behaviour in 0.10 (and earlier?) i.e., connections are opened with keep-alive but closed if no requests are pending (no grace period). However immediately after that, we read:
So, now we have a boolean option keepAlive whose behaviour is a bit ambiguous to me. If it is true then you opt into using HTTP keep-alive. Which is, I would guess, the HTTP KeepAlive describe in the second quoted section above wherein the socket is not closed automatically. Which raises the question, when does the logic outlined in the first quoted section apply? When the option keepAlive is false? I am guessing the right way to read the doc is:
I suspect however that my reading may be flawed. Also, the use of the word socket is a bit unfortunate, since in classical networking terminology there is an important distinction between a socket and a connection (or connected socket), which unluckily is quite relevant here. |
Time to read the code! :-) |
@ravi I think your understanding is mostly correct. Here's mine:
Hope this is helpful (and correct :-P) |
Everyone wants this. People expect it: #1958
The current system is build so that a socket will only be reused if there is a pending request. However, this is not optimal for many many use cases. It was designed that way due to a limitation that any connected socket would keep node open, even if we were not using it. But now we have
Socket.unref()
andSocket.ref()
.Also, 5 is always the wrong number, so it's a bad default.
The http client should keep a pool of sockets around for each server we connect to. This should be a a configurable max (8 by default) per protocol/host/port.
When a socket is in use, we
socket.ref()
. Then when it goes back into the pool, callsocket.unref()
. If a socket gets ECONNRESET while waiting in the pool, just ignore the error, and discard that connection.It's a bit of added bookkeeping, but this will make the http client behave much more expectably, and more like a web browser.
/cc @mranney @mcavage @bnoordhuis @piscisaureus @mikeal
The text was updated successfully, but these errors were encountered: