Design: do we need batched accept / accept_nowait? #14
Here's another case where this comes up: draining a listening socket means handling all currently pending connection attempts when you know that no new ones will be arriving. That's a pretty natural fit for accept_nowait. This is pretty obscure, but if you need it, you really need it. And who knows, maybe someday some OS will make it possible to do something similar for AF_INET{,6} listening sockets, which would be really handy.
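For concreteness, here's a minimal sketch of that drain pattern, assuming a hypothetical `accept_nowait()` that returns a queued connection or raises `trio.WouldBlock` when nothing is pending (not an existing Trio API):

```python
import trio

async def drain_listener(listener, handle, nursery):
    # We know no new connection attempts will arrive, so just handle
    # whatever is already queued on the listening socket, then stop.
    while True:
        try:
            conn = listener.accept_nowait()  # hypothetical non-blocking accept
        except trio.WouldBlock:
            break  # queue is empty -- draining is done
        nursery.start_soon(handle, conn)
```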
Ah-hah, I thought of a case where batched accept makes a difference: say we have a mixed workload, involving a CPU-bound "background task" that regularly hogs the CPU for long stretches (yielding only every 20 ms or so), plus light connection-handling tasks that each need only ~0.1 ms of work.

I think this situation is kind of problematic all around... it would be difficult to service a connection in 0.1 ms in Python, and only yielding every 20 ms is not so great, and if the goal is that the heavy background task shouldn't interfere with the light connection tasks, then wouldn't the Right solution be to make a better scheduling algorithm (#32) or just assign a low static priority to the background task? But I'm sure this kind of thing happens, at least at times.

I guess that from this point of view, the point of a batched accept is that normally an accept loop is a very light task (it does very little work between calls to accept), so standard round-robin scheduling will tend to implicitly assign it a very low priority (each of its individual time slices ends up being very small compared to those of other tasks, and round-robin says they all get the same number of time slices). So it's a heuristic way of compensating for a particular failure mode of the not-very-clever round-robin scheduler.
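To illustrate the heuristic, here's a rough sketch of a batched accept loop, again using a hypothetical `accept_nowait()`: the accept task blocks for the first connection, then drains whatever else is already queued before yielding back to the scheduler, so each of its time slices does a full batch of work instead of a single accept.

```python
import trio

async def serve(listener, handle):
    async with trio.open_nursery() as nursery:
        while True:
            conn = await listener.accept()  # block until at least one connection
            nursery.start_soon(handle, conn)
            while True:
                try:
                    conn = listener.accept_nowait()  # hypothetical: drain the backlog
                except trio.WouldBlock:
                    break
                nursery.start_soon(handle, conn)
```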
It looks like #636 will solve the API parts of this issue, by moving the
Something I just realized: at some point we may make
Note that in the year 2022 we can punt batched accept up into the kernel level with io_uring and `IORING_ACCEPT_MULTISHOT`, which continuously pushes accepted connections onto the completion queue without needing to do `accept()`.
https://bugs.python.org/issue27906 suggests that it's important to be able to accept multiple connections in a single event loop tick.
It's trivial to implement, too -- just `accept_nowait`.
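Something along these lines, say -- a sketch of how an `accept_nowait()` could be built on a plain non-blocking stdlib socket (the class and method names here are illustrative, not an existing Trio API):

```python
import socket
import trio

class Listener:
    def __init__(self, sock: socket.socket):
        sock.setblocking(False)  # so accept() never blocks the event loop
        self._sock = sock

    def accept_nowait(self):
        try:
            return self._sock.accept()  # (conn, address) if one is queued
        except BlockingIOError:
            raise trio.WouldBlock from None  # nothing pending right now
```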
But... I am... not currently able to understand how/why this can help. Consider a simple model where, after accepting each connection, we do a fixed bolus of CPU-bound work taking T seconds and then immediately send the response, so each connection is handled in a single loop step. If we accept only 1 connection per loop step, then each loop step takes `1 * T` seconds and we handle 1 connection per T seconds on average. If we accept 100 connections per loop step, then each loop step takes `100 * T` seconds and we still handle 1 connection per T seconds on average.

Did the folks in the bug report above really just need an increased backlog parameter to absorb bursts? Batching `accept()` certainly increases the effective listen queue depth (basically making it unlimited), but "make your queues unbounded" is not a generically helpful strategy.

The above analysis is simplified in that (a) it ignores other work going on in the system and (b) it assumes each connection triggers a fixed amount of synchronous work. If the conclusion is wrong, it's probably because one of these factors matters somehow. The "other work" part obviously could matter if that work is throttlable at the event loop level, in the sense that when loop steps take longer it actually gets less done. Whether that holds here is not clear (the whole idea of batching accept is to make it not have this property, so if this is how we're designing all our components then it doesn't work...).
I guess one obvious piece of "other work" that scales with the number of passes through the loop is just, loop overhead. One would hope this is not too high, but it is not nothing.
If we do want this, then #13 will want to use it.
The same question probably applies to `recvfrom`/`sendto` -- right now a task can only send or receive 1 UDP packet per event loop tick.
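As a sketch of what the UDP analogue might look like, assuming a hypothetical `recvfrom_nowait()` mirroring `accept_nowait()` (not a real API), a task could pull every already-queued datagram in one tick:

```python
import trio

async def recv_batch(udp_sock, max_packets=100, bufsize=65536):
    # Block for the first datagram, then opportunistically drain the rest.
    packets = [await udp_sock.recvfrom(bufsize)]
    while len(packets) < max_packets:
        try:
            packets.append(udp_sock.recvfrom_nowait(bufsize))  # hypothetical
        except trio.WouldBlock:
            break
    return packets
```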