
Work around Windows kernel bad asymptotic behavior when there are too many operations outstanding on the same handle #1280

Closed
njsmith opened this issue Oct 31, 2019 · 5 comments · Fixed by #1832


@njsmith

njsmith commented Oct 31, 2019

So as discussed over here, apparently the CancelIoEx operation on Windows becomes super-slow when you have a lot of operations outstanding on the same handle. And #1269 makes it so that every wait_readable/wait_writable call issues an operation on the same handle. So cancelling a bunch of these calls becomes like O(n^3) or something.

And we also issue CancelIoEx whenever we transition from 1->2 or 2->1 waiters on the same socket, so this can also be hit during normal usage, without Trio-level cancellation. (I think a typical case would be when you have a socket where one task is blocked for a long time in wait_readable, while another task is sending as fast as it can, so it keeps going in and out of wait_writable.)

The overhead isn't really noticeable up to like 1000 sockets or so, but eventually it dominates: at 10k sockets, cancelling a wait_readable is like 10x slower, and at 30k sockets, it's like 100x slower, getting into multiple ms per call.

@piscisaureus is the one who pointed this out, and he says that he's found that the cost is based on the number of AFD_POLL operations issued per AFD helper handle. So as a workaround, we could open multiple AFD helper handles, and spread our AFD_POLL operations around over them. This will involve some annoying bookkeeping, but it shouldn't be too bad – basically we'll want to keep a pool of AFD handles, and for each one set a limit on how many total simultaneous operations we'll allow. We'll keep a pool of handles that are below their limit, and pull from there if we can; and if we run out, we'll allocate a new handle.
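A minimal sketch of the bookkeeping I have in mind (purely illustrative: `_open_afd_helper_handle()` and the limit constant below are placeholders, not actual Trio internals):

    MAX_OPS_PER_AFD_HANDLE = 500  # placeholder limit, real value to be tuned

    class AFDHandlePool:
        def __init__(self):
            # maps helper handle -> number of AFD_POLL operations currently in flight on it
            self._op_counts = {}

        def acquire(self):
            # Prefer an existing helper handle that's still below its limit
            for handle, count in self._op_counts.items():
                if count < MAX_OPS_PER_AFD_HANDLE:
                    self._op_counts[handle] = count + 1
                    return handle
            # Every handle is saturated (or none exist yet), so open a fresh one
            handle = _open_afd_helper_handle()  # placeholder for opening \Device\Afd and registering it with the IOCP
            self._op_counts[handle] = 1
            return handle

        def release(self, handle):
            # Called when an AFD_POLL operation on this handle completes or is cancelled
            self._op_counts[handle] -= 1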

There's a script in notes-to-self/socket-scaling.py to measure the problem, and that can be used to validate our fix.

@njsmith

njsmith commented Nov 1, 2019

I just realized also: it's possible that poll operations on different handles don't interact with each other, in which case we might be able to simplify the code a lot by using separate handles for read polls and write polls. Should check this.

@njsmith

njsmith commented Nov 1, 2019

...so I just modified my little test script in notes-to-self/afd-lab.py to use separate AFD helper handles for the POLL_SEND and POLL_RECEIVE operations... and the weird interference bugs still happened. Darn.

According to @piscisaureus in #52 (comment), this doesn't happen if you also use different completion ports for the two handles (what the heck Windows why does that matter none of this makes any sense). But that's no use to us, because a single event loop has to funnel everything through a single completion port (unless we want to use threads but that's worse than what we're doing now).

@njsmith

njsmith commented Nov 1, 2019

For reference, here's how I modified the script to use separate helper handles (and also double-check I wasn't missing any errors):

modified   notes-to-self/afd-lab.py
@@ -132,7 +132,7 @@ class AFDLab:
             print(f"Poll for {flags!r}: {sys.exc_info()[1]!r}")
             raise
         out_flags = AFDPollFlags(poll_info.Handles[0].Events)
-        print(f"Poll for {flags!r}: got {out_flags!r}")
+        print(f"Poll for {flags!r}: got {out_flags!r} ({lpOverlapped.Internal})")
         return out_flags
 
 
@@ -145,7 +145,8 @@ def fill_socket(sock):
 
 
 async def main():
-    afdlab = AFDLab()
+    afdlab1 = AFDLab()
+    afdlab2 = AFDLab()
 
     a, b = socket.socketpair()
     a.setblocking(False)
@@ -157,13 +158,13 @@ async def main():
         print("-- Iteration start --")
         async with trio.open_nursery() as nursery:
             nursery.start_soon(
-                afdlab.afd_poll,
+                afdlab1.afd_poll,
                 a,
                 AFDPollFlags.AFD_POLL_SEND,
             )
             await trio.sleep(2)
             nursery.start_soon(
-                afdlab.afd_poll,
+                afdlab2.afd_poll,
                 a,
                 AFDPollFlags.AFD_POLL_RECEIVE,
             )

@njsmith

njsmith commented Nov 2, 2019

I also just double-checked, and confirmed that if you use separate IOCP completion ports, then everything works fine, which matches Bert's experience. Man, I do not understand Windows at all.

    async def t1():
        afdlab = AFDLab()
        with trio.move_on_after(10):
            await afdlab.afd_poll(a, AFDPollFlags.AFD_POLL_SEND)

    async def t2():
        afdlab = AFDLab()
        with trio.move_on_after(10):
            await afdlab.afd_poll(a, AFDPollFlags.AFD_POLL_RECEIVE)

    while True:
        print("-- Iteration start --")
        async with trio.open_nursery() as nursery:
            nursery.start_soon(
                trio.to_thread.run_sync, trio.run, t1
            )
            await trio.sleep(2)
            nursery.start_soon(
                trio.to_thread.run_sync, trio.run, t2
            )
            await trio.sleep(2)
            print("Sending another byte")
            b.send(b"x")

@piscisaureus

> I just realized also: it's possible that poll operations on different handles don't interact with each other, in which case we might be able to simplify the code a lot by using separate handles for read polls and write polls. Should check this.

Yeah I have tried that too at some point. Just like yourself, I found that it didn't work.
