
Work around Windows kernel bad asymptotic behavior when there are too many operations outstanding on the same handle #1280

Closed
njsmith opened this issue Oct 31, 2019 · 5 comments · Fixed by #1832


@njsmith

njsmith commented Oct 31, 2019

So as discussed over here, apparently the CancelIoEx operation on Windows becomes super-slow when you have a lot of operations outstanding on the same handle. And #1269 makes it so that every wait_readable/wait_writable call issues an operation on the same handle. So cancelling a bunch of these calls becomes like O(n^3) or something.

And we also issue CancelIoEx whenever we transition from 1->2 or 2->1 waiters on the same socket, so this can also be hit during normal usage, without Trio-level cancellation. (I think a typical case would be when you have a socket where one task is blocked for a long time in wait_readable, while another task is sending as fast as it can, so it keeps going in and out of wait_writable.)

The overhead isn't really noticeable up to like 1000 sockets or so, but eventually it dominates: at 10k sockets, cancelling a wait_readable is like 10x slower, and at 30k sockets, it's like 100x slower, getting into multiple ms per call.

@piscisaureus is the one who pointed this out, and he says that he's found that the cost is based on the number of AFD_POLL operations issued per AFD helper handle. So as a workaround, we could open multiple AFD helper handles, and spread our AFD_POLL operations around over them. This will involve some annoying bookkeeping, but it shouldn't be too bad – basically we'll want to keep a pool of AFD handles, and for each one set a limit on how many total simultaneous operations we'll allow. We'll keep a pool of handles that are below their limit, and pull from there if we can; and if we run out, we'll allocate a new handle.
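A minimal sketch of the bookkeeping I have in mind (purely illustrative: `_open_afd_helper_handle()` and the limit constant below are placeholders, not actual Trio internals):

    MAX_OPS_PER_AFD_HANDLE = 500  # placeholder limit, real value to be tuned

    class AFDHandlePool:
        def __init__(self):
            # maps helper handle -> number of AFD_POLL operations currently in flight on it
            self._op_counts = {}

        def acquire(self):
            # Prefer an existing helper handle that's still below its limit
            for handle, count in self._op_counts.items():
                if count < MAX_OPS_PER_AFD_HANDLE:
                    self._op_counts[handle] = count + 1
                    return handle
            # Every handle is saturated (or none exist yet), so open a fresh one
            handle = _open_afd_helper_handle()  # placeholder for opening \Device\Afd and registering it with the IOCP
            self._op_counts[handle] = 1
            return handle

        def release(self, handle):
            # Called when an AFD_POLL operation on this handle completes or is cancelled
            self._op_counts[handle] -= 1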

There's a script in notes-to-self/socket-scaling.py to measure the problem, and that can be used to validate our fix.

@njsmith

njsmith commented Nov 1, 2019

I just realized also: it's possible that poll operations on different handles don't interact with each other, in which case we might be able to simplify the code a lot by using separate handles for read polls and write polls. Should check this.

@njsmith

njsmith commented Nov 1, 2019

...so I just modified my little test script in notes-to-self/afd-lab.py to use separate AFD helper handles for the POLL_SEND and POLL_RECEIVE operations... and the weird interference bugs still happened. Darn.

According to @piscisaureus in #52 (comment), this doesn't happen if you also use different completion ports for the two handles (what the heck Windows why does that matter none of this makes any sense). But that's no use to us, because a single event loop has to funnel everything through a single completion port (unless we want to use threads but that's worse than what we're doing now).

@njsmith

njsmith commented Nov 1, 2019

For reference, here's how I modified the script to use separate helper handles (and also double-check I wasn't missing any errors):

modified   notes-to-self/afd-lab.py
@@ -132,7 +132,7 @@ class AFDLab:
             print(f"Poll for {flags!r}: {sys.exc_info()[1]!r}")
             raise
         out_flags = AFDPollFlags(poll_info.Handles[0].Events)
-        print(f"Poll for {flags!r}: got {out_flags!r}")
+        print(f"Poll for {flags!r}: got {out_flags!r} ({lpOverlapped.Internal})")
         return out_flags
 
 
@@ -145,7 +145,8 @@ def fill_socket(sock):
 
 
 async def main():
-    afdlab = AFDLab()
+    afdlab1 = AFDLab()
+    afdlab2 = AFDLab()
 
     a, b = socket.socketpair()
     a.setblocking(False)
@@ -157,13 +158,13 @@ async def main():
         print("-- Iteration start --")
         async with trio.open_nursery() as nursery:
             nursery.start_soon(
-                afdlab.afd_poll,
+                afdlab1.afd_poll,
                 a,
                 AFDPollFlags.AFD_POLL_SEND,
             )
             await trio.sleep(2)
             nursery.start_soon(
-                afdlab.afd_poll,
+                afdlab2.afd_poll,
                 a,
                 AFDPollFlags.AFD_POLL_RECEIVE,
             )

@njsmith

njsmith commented Nov 2, 2019

I also just double-checked, and confirmed that if you use separate IOCP completion ports, then everything works fine, which matches Bert's experience. Man, I do not understand Windows at all.

    async def t1():
        afdlab = AFDLab()
        with trio.move_on_after(10):
            await afdlab.afd_poll(a, AFDPollFlags.AFD_POLL_SEND)

    async def t2():
        afdlab = AFDLab()
        with trio.move_on_after(10):
            await afdlab.afd_poll(a, AFDPollFlags.AFD_POLL_RECEIVE)

    while True:
        print("-- Iteration start --")
        async with trio.open_nursery() as nursery:
            nursery.start_soon(
                trio.to_thread.run_sync, trio.run, t1
            )
            await trio.sleep(2)
            nursery.start_soon(
                trio.to_thread.run_sync, trio.run, t2
            )
            await trio.sleep(2)
            print("Sending another byte")
            b.send(b"x")

@piscisaureus

> I just realized also: it's possible that poll operations on different handles don't interact with each other, in which case we might be able to simplify the code a lot by using separate handles for read polls and write polls. Should check this.

Yeah I have tried that too at some point. Just like yourself, I found that it didn't work.
