
attempt to make asynchttpserver better; fixes #15925; [backport:1.0] #15957

Merged: 6 commits merged from araq-async-madness into devel on Nov 13, 2020

Conversation

@Araq (Member) commented Nov 13, 2020

No description provided.

@Araq requested a review from dom96 on November 13, 2020, 13:51
@disruptek (Contributor)

And just like that, #12325 was fixed. 👍

@Araq (Member Author) commented Nov 13, 2020

@disruptek Yeah, I had a vague memory of your PR. Good we're finally sorting it out.

@ringabout (Member)

There is an issue related to processClient: #3324

@dom96 (Contributor) left a comment

If I understand the problem this is solving then I think you can simplify this and make it clearer to the users:

  • maxClients argument to AsyncHttpServer
  • increment a clientsCount counter in processClient
  • decrement after connection is closed
  • don't accept unless clientsCount < maxClients.
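A minimal sketch of that counter-based alternative, using plain asyncnet and hypothetical maxClients/clientsCount names (illustrative only, not the API this PR adds):

import asyncnet, asyncdispatch, net

const maxClients = 100            # would be a newAsyncHttpServer argument
var clientsCount = 0              # connections currently being served

proc processClient(client: AsyncSocket) {.async.} =
  inc clientsCount                # one more open connection
  try:
    await client.send("HTTP/1.1 200\r\ncontent-length: 0\r\n\r\n")
  finally:
    client.close()
    dec clientsCount              # freed once the connection is closed

proc serveLoop() {.async.} =
  let server = newAsyncSocket()
  server.setSockOpt(OptReuseAddr, true)
  server.bindAddr(Port(8080))
  server.listen()
  while true:
    if clientsCount < maxClients:
      asyncCheck processClient(await server.accept())
    else:
      await sleepAsync(10)        # back off until a client disconnects

waitFor serveLoop()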

Comment on lines +36 to +41
## while true:
## if server.shouldAcceptRequest(5):
## var (address, client) = await server.socket.acceptAddr()
## asyncCheck processClient(server, client, address, cb)
## else:
## poll()
Contributor:

This example is under "basic usage", you shouldn't complicate it.

Member Author:

People copy&paste this section, it needs to be reasonably complete rather than simple. In fact, I should probably add some basic error handling there too.
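For instance, a sketch of that loop with minimal error handling, written against the shouldAcceptRequest/acceptRequest procs this PR adds (the unused port argument matches the signature as it currently stands; the exception handling is illustrative, not a proposed final form):

import asynchttpserver, asyncdispatch

proc cb(req: Request) {.async.} =
  await req.respond(Http200, "Hello World")

proc main() {.async.} =
  var server = newAsyncHttpServer()
  server.listen(Port(8080))
  while true:
    if server.shouldAcceptRequest(assumedDescriptorsPerRequest = 5):
      try:
        await server.acceptRequest(Port(8080), cb)   # port is currently unused, see below
      except OSError:
        echo "accept failed, retrying"               # log and keep serving
    else:
      await sleepAsync(500)       # descriptors are scarce; wait and retry

waitFor main()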

proc serve*(server: AsyncHttpServer, port: Port,
            callback: proc (request: Request): Future[void] {.closure, gcsafe.},
            address = "";
            assumedDescriptorsPerRequest = 5) {.async.} =
Contributor:

This is very arbitrary and therefore likely error prone.

Member Author:

It's much better than no limit and limits are always arbitrary.

##
## You should prefer to call `acceptRequest` instead with a custom server
## loop so that you're in control over the error handling and logging.
listen server, port, address
Contributor:

This shouldn't be here, unless I'm missing something.

@Araq (Member Author), Nov 13, 2020:

It has to be there thanks to bad API design (it should be part of newAsyncHttpServer IMHO but this would be a breaking change).

@dom96 (Contributor), Nov 13, 2020:

Won't this literally break code? You'll have code in the wild calling listen twice now.

Edit: oh, I see, didn't realise you introduced the listen.

@Araq (Member Author) commented Nov 13, 2020

If I understand the problem this is solving then I think you can simplify this and make it clearer to the users:

This wouldn't work as well because serving a single async request can require multiple different FDs, say if you load data from a DB before serving the request. It's much more robust to ask the event loop instead.
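Concretely, "asking the event loop" means comparing how many descriptors the dispatcher is tracking against the process-wide limit, roughly what shouldAcceptRequest does with the activeDescriptors/maxDescriptors helpers this PR adds to asyncdispatch (a sketch, not the exact stdlib body):

import asyncdispatch

proc roomForRequest(assumedDescriptorsPerRequest = 5): bool =
  # Enough spare descriptors to cover one worst-case request
  # (client socket plus any DB connections, files, etc. it may open)?
  maxDescriptors() - activeDescriptors() >= assumedDescriptorsPerRequest

echo roomForRequest()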

@c-blake (Contributor) commented Nov 13, 2020

I think I agree with @Araq here. At first my reaction was kind of like @dom96's, but then I thought about it a little more. The current way and its mathematical complement (a clients limit plus maxDescriptors() subtraction) give basically equivalent control, but a user is probably more likely to be able to estimate/bound how many new descriptors a worst-case request might need. So, if it were the max-clients way, most users wanting to be safe would be doing that same subtraction to go back to the complementary sense anyway. It is also easier to default "spare fds"/show in example code this way.

There might be a "silent underprovisioning" argument to be made. The lowest max fds I've ever seen in 30 years of using Unix is 64, on HP-UX in the 1990s, IIRC. So, std(in|out|err) + a couple of overhead fds + 5 headroom still leaves (probably) ~54 concurrent connections in the crazy worst case. That's actually still a lot of concurrent connections, TBH. Except for debugging/testing, as for this problem, people never lower such small fd limits. So the "spare fds way" doesn't really risk underprovisioning, and the complex guesstimation above kind of exhibits what happens to careful users in the "complementary coordinates". Such guesstimation seems ugly & more error-prone. You may be a forked process inheriting a bunch of fds, too. So, I think specifying from the spares/headroom side is best.

I do think having a 5-line API call to crank up the soft max to the hard max (somewhere) would be helpful, basically just:

import posix # add more error checks, of course
var fdLim: RLimit
discard getrlimit(RLIMIT_NOFILE, fdLim)
fdLim.rlim_cur = fdLim.rlim_max
discard setrlimit(RLIMIT_NOFILE, fdLim)

Maybe call it softToHardDescriptorLimit or something like that. (In truth the only errors possible here are EFAULT for a bad fdLim addr and EINVAL for a bad RLIMIT_NOFILE, but it's good to check all one's errors.) It may even make sense to layer it, like softToHardLimit(resource = RLIMIT_NOFILE), so it could work for any of the RLIMIT_*.
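A sketch of that layered helper under the names proposed above (nothing like it exists in the stdlib; POSIX-only):

import os, posix

proc softToHardLimit*(resource: cint = RLIMIT_NOFILE) =
  ## Raise `resource`'s soft limit up to its hard limit.
  var lim: RLimit
  if getrlimit(resource, lim) != 0:
    raiseOSError(osLastError())
  lim.rlim_cur = lim.rlim_max
  if setrlimit(resource, lim) != 0:
    raiseOSError(osLastError())

softToHardLimit()                  # e.g. lift RLIMIT_NOFILE before serving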

@Araq merged commit 562c627 into devel on Nov 13, 2020
@Araq deleted the araq-async-madness branch on November 13, 2020, 19:57
proc serve*(server: AsyncHttpServer, port: Port,
            callback: proc (request: Request): Future[void] {.closure, gcsafe.},
            address = "";
            assumedDescriptorsPerRequest = 5) {.async.} =
Member:

factor the assumedDescriptorsPerRequest = 5 default into a constant shared with the same default used in

proc shouldAcceptRequest*(server: AsyncHttpServer;
                          assumedDescriptorsPerRequest = 5): bool {.inline.} =
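One illustrative way to do that factoring, with a hypothetical constant name so the two defaults cannot drift apart (simplified signatures, placeholder bodies):

const defaultAssumedDescriptorsPerRequest = 5

proc shouldAcceptRequest(assumedDescriptorsPerRequest = defaultAssumedDescriptorsPerRequest): bool =
  true                             # placeholder body

proc serve(assumedDescriptorsPerRequest = defaultAssumedDescriptorsPerRequest) =
  discard shouldAcceptRequest(assumedDescriptorsPerRequest)

serve()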

@timotheecour (Member)

@Araq good PR, but please always squash and merge. This PR will break git bisect workflows, which I and others rely on a lot to find regressions/accidental bugfixes etc., because it keeps intermediate broken commits.


Can we do the proposal in #8664? It would prevent such issues in the future:

in nim-lang/Nim/settings, disable "Allow merge commits", as follows: [...]

(and even force squash and merge); in the rare cases where a squash and merge isn't desirable, the other merge strategies could be enabled temporarily

@dom96 (Contributor) left a comment

This wouldn't work as well because serving a single async request can require multiple different FDs, say if you load data from a DB before serving the request. It's much more robust to ask the event loop instead.

Then perhaps it should be part of asyncdispatch itself?


@Araq (Member Author) commented Nov 13, 2020

Won't this literally break code? You'll have code in the wild calling listen twice now.

But previously listen wasn't exported so I don't see how this could happen.

@Araq (Member Author) commented Nov 13, 2020

Then perhaps it should be part of asyncdispatch itself?

The APIs that are added are part of asyncdispatch, yes.

@c-blake (Contributor) commented Nov 13, 2020

The best place to limit fd creation for a network service is just before accept(2). Then the usual listen(2) backlog will just keep the requester clients in limbo until fds have been freed up and then accept(2) as usual. I believe that's what @Araq did.

@Araq (Member Author) commented Nov 13, 2020

(and even force squash and merge); in the rare cases where a squash and merge isn't desirable, the other merge strategies could be enabled temporarily

I know, it was a simple mistake. Sometimes GitHub defaults to the wrong button for some reason.

@dom96 (Contributor) commented Nov 13, 2020

The APIs that are added are part of asyncdispatch, yes.

No, I mean the handling of FDs. Wouldn't it be better to have it handled in accept automatically?

@c-blake (Contributor) commented Nov 13, 2020

There could be a safeAccept, I suppose, that no-ops if there aren't enough spare fds, but probably the event loop then also needs to be aware of a no-op return, as opposed to just successful or an error case to raise.
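A hypothetical safeAccept along those lines (illustrative only; it reports the no-op back to the caller via Option so the loop can decide to wait and retry), assuming asyncdispatch's activeDescriptors/maxDescriptors helpers:

import asyncnet, asyncdispatch, options

proc safeAccept(server: AsyncSocket,
                assumedDescriptorsPerRequest = 5): Future[Option[AsyncSocket]] {.async.} =
  # No-op when the event loop says descriptors are scarce; otherwise accept.
  if maxDescriptors() - activeDescriptors() < assumedDescriptorsPerRequest:
    return none(AsyncSocket)
  return some(await server.accept())

The caller would treat none as "poll()/sleep and try again later", which is the event-loop awareness mentioned above.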

@Araq (Member Author) commented Nov 13, 2020

Maybe, but this would be an even more invasive change. Also: people requested an API like mine for asynchttpserver before; nobody requested this feature for accept.

@c-blake (Contributor) commented Nov 13, 2020

Technically, this same problem exists for literally every system call returning a file descriptor. So, some kind of safeOpen template-y/-ish system might be warranted someday/somehow, but I think that could live side-by-side with @Araq's API.

@dom96 (Contributor) commented Nov 13, 2020

Then the usual listen(2) backlog will just keep the requester clients in limbo until fds have been freed up and then accept(2) as usual.

Huh, this implies that the OS should handle this for us then. If so why do we need a separate mechanism in the stdlib?

@c-blake (Contributor) commented Nov 13, 2020

The OS still limits your open fds. What I meant is that the clients will not see an ECONNREFUSED-type situation until the backlog buffer gets filled up. So, just not doing the accept syscall is more graceful, deferring failure if there is just a very transient spike in connections.

@Araq (Member Author) commented Nov 13, 2020

Technically, this same problem exists for literally every system call returning a file descriptor. So, some kind of safeOpen template-y/-ish system might be warranted someday/somehow, but I think that could live side-by-side with @Araq's API.

That's true but please consider:

This PR with an API change will be backported to version 1.0.x. I consider it a critical omission -- more critical than other bugfixes that we backported -- but at the same time I tried my best to keep the changes to a minimum.

@c-blake (Contributor) commented Nov 13, 2020

Oh, sure. I was just responding to the @dom96 idea of safeAccept. Re: the listen backlog/overall dynamics, it is also true that few logical situations allow you to just punt to a no-op and try again later. So, utility of safeOpen may be pretty limited.

@dom96 (Contributor) commented Nov 13, 2020

Does anyone have a real repro for issue #15925? All I see are synthetic code samples that don't seem realistic to me. I'm quite confused about how this would crop up in real code without it being due to a real FD leak.

@Araq (Member Author) commented Nov 13, 2020

Does anyone have a real repro for issue #15925?

As far as I know, this affects our very own forum software. (Still investigating, though.)

@c-blake (Contributor) commented Nov 13, 2020

The original case was in the Forum thread: the wrk HTTP workhorse tester with 1024 parallel connections (-c 1024). That is a lot more than is typical organically, unless you are under a DoS attack or are a Big Tech company. (EDIT: but our stuff should still handle this scenario.)

@dom96 (Contributor) commented Nov 13, 2020

Well... we just tested his exploit and it doesn't do anything. Furthermore, I checked the server and it has been running for 2 months without crashes.

@dom96 (Contributor) commented Nov 13, 2020

So my recommendation is to change this slightly:

  • It shouldn't be the default
  • The docs should be reverted to the basic example
  • Docs should be provided showing how to use this new API

Comment on lines +336 to +337
proc acceptRequest*(server: AsyncHttpServer, port: Port,
                    callback: proc (request: Request): Future[void] {.closure, gcsafe.}) {.async.} =
Contributor:

also, the port here is unused

@Araq (Member Author), Nov 13, 2020:

Ah, good one. Will do a follow-up PR later.

@c-blake (Contributor) commented Nov 13, 2020

I'm not sure how the Forum server process never died, but you also said "server", not "server process", so maybe it's ambiguous? I didn't look at the Forum code, anyway.

I do think it is not great example code/library behavior to just exhaust file descriptors. Reserving at least a few for the user code just makes very basic sense to me. I'm not sure why anyone would be against it. You need at least 1 free for the accept itself to even work.

My simpler test in #15925 (comment) behaved as one would expect with that wrk HTTP test program, judging from the straces that I did. I haven't yet tried it with the recent @Araq change. Could be wrinkles.

@Araq (Member Author) commented Nov 14, 2020

I'm not sure how the Forum server process never died, but you also said server not server process. So, maybe ambiguous? I didn't look at the Forum code, anyway.

The Forum process itself never died.

@c-blake (Contributor) commented Nov 14, 2020

I don't know what's going on with the Forum recovery from overload, but this simple program:

import asynchttpserver, asyncdispatch, asyncnet, posix

# Artificially lower the soft fd limit to 7 to provoke descriptor exhaustion.
var fdLim = RLimit(rlim_cur: 7, rlim_max: 1024)
discard setrlimit(RLIMIT_NOFILE, fdLim)
const
  s = "HTTP/1.1 200\r\ncontent-type: text/html\r\ncontent-length: 3\r\n\r\nHi\n"
proc svc(req: Request, staticDir="") {.async.} = await req.client.send(s)
proc cb(req: Request) {.async.} = await req.svc("static")

var server = newAsyncHttpServer()
waitFor server.serve(Port(8080), cb)

dies immediately with the old code even for wrk --latency -d 1s -t 1 -c 3 http://localhost:8080/ yet runs fine with wrk --latency -d 1s -t 1 -c 1024 http://localhost:8080/ with the new code. Raising an avoidable exception under overload (without an easy plan to clear the condition) seems bad while gracefully handling massive overload seems good. So, I think the new code is both more robust & a better example of managing a scarce (on Unix) resource. Like all code, it may be imperfect.
