-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Check for pending IO in the portable thread pool's worker threads #82245
Conversation
- When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Fixes dotnet#82207
Tagging subscribers to this area: @mangod9 Issue Details
Fixes #82207
|
Just to build confidence, do we know why this happens / can we find out why this happens? Is it documented somewhere? |
src/libraries/System.Private.CoreLib/src/System/Threading/PortableThreadPool.WorkerThread.cs
Outdated
Show resolved
Hide resolved
I don't know yet, still looking into it. |
Looks like the failures are unrelated. |
There is a brief note in AcceptEx docs:
I'm not sure about the details though, I have a question out to Windows folks. I'll go ahead and merge this for now, as it seems like something that should be fixed anyway. |
We're patching this in 7.0 right? |
cc @mconnew |
This looks like it has a pathological case where multiple IO operations can be pending for long periods of time with no real work to complete. This will cause a tight loop spinning CPU. The known scenario I'm concerned about is CoreWCF (or WCF client) which both tend to make IO calls on the IO thread pool. This is because after completing a pending receive, they send back an ACK and make the next pending receive call on the same thread that the completion happened on. The NetTcp transport has a default receive timeout of 10 minutes. For each active connection (which can be thousands in the server case, and in a 3-tier app the middle tier can have thousands of WCF clients) we have a pending receive. We had a customer issue with WCF where their employees would all log in to the client app within a short period of time (around 200,000 new client connections within about 15 minutes) which would have a burst of threads handling the connection handshakes. That case ended up being an lsass bottleneck, but their application would hit a problem with this implementation. In that scenario, you would have 200,000 pending IO connections across 400+ IO threads, so every IO thread has many pending IO operations. Once things settled down, the request rate was very low and the IO thread pool would shrink over time. The IO thread pool was able to shrink as new work was only getting executed via a call to GetQueuedCompletionStatus. The special OS behavior of threads bound to an IO completion port means a thread won't be preemptively context switched away from if it's not in an alertable state (doing actual work) which means all the extra IO threads calling GetQueuedCompletionStatus wouldn't end up running any new work. With this new implementation, that constraint doesn't exist as the work is dispatched to a non IOCP thread, which means the "extra" threads are still going to pick up "new" work, and that new work will be able to initiate new IO, and keeping the thread around. You could end up with hundreds of threads spinning consuming CPU, with any IO completions being evenly distributed between them. With enough concurrent connections, the threads will be self maintaining and never clean up, spending most of their time consuming CPU. |
- Port of dotnet#82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for dotnet#82207
- Port of dotnet#82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for dotnet#82207
…ds (#82248) * [6.0] Check for pending IO in the portable thread pool's worker threads - Port of #82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for #82207 * Refactor Windows-specific code
…ds (#82246) * [7.0] Check for pending IO in the portable thread pool's worker threads - Port of #82245 - When Resource Monitor is attached, some async IO operations are bound to the thread that issued it even though the IO handle is bound to an IOCP. If the thread exits, the async IO operation is aborted. This can lead to hangs or unexpected exceptions. - Added a check that was missing in the portable thread pool implementation to prevent exiting a worker thread when it has pending IO Port of fix for #82207 * Refactor Windows-specific code
Fixes #82207