Improve io thread scheduler #4862
Conversation
This has been verified by vendors to not cause any regressions. An internal dependent team has validated that this improves the responsiveness of WCF when opening synchronously while experiencing thread pool contention. Merging because of this extra validation.
Hi @mconnew, our team recently upgraded a .NET Core 3.1 API to .NET 6; under the hood the API makes multiple WCF calls to another system. We also tried upgrading to the preview version "4.10.0-preview1.22261.2", but still ran into the same issue. Today we saw this PR of yours from 3 days ago! We took the main branch and built the System.ServiceModel.xx DLLs, and the issue is now fixed! A couple of questions:
I don't know why this issue is worse in .NET 6. I believe the team who asked for this change was experiencing the problem on .NET Core 3.1. There is another change I didn't explicitly call out, which was around staying async when not calling external code. The problem we have is that we must not break custom binding elements and need to maintain expected behavior. If you have a custom channel layer and call Open() on the channel, there is an expectation that Open() is called on the custom channel layer and not BeginOpen. With the original code, when you called Open() it would call the internal OpenAsync() and continue on another thread. When that code needed to open the next channel layer, it would call Open() again, as the original call was synchronous. This meant that when you had several layers all using inbox channel layers, you would consume a thread calling Open() which then blocked waiting for OpenAsync() to complete, once for each layer. The change I made detects that the next layer is an inbox implementation and passes on the info that Open() was originally called synchronously, but then calls OpenAsync(), as we know the inbox layer supports that (and it won't be a surprise hitting an unimplemented code path). This means you only block the single original thread which called Open() synchronously.

I don't know what changed in .NET 6 to make this problem worse. Either it needs more concurrent threads to do the same amount of work, or the thread pool doesn't grow as easily as in earlier versions. This is only an issue when you call WCF synchronously.

There are a couple of suggestions I have. If you share a channel between multiple threads (you call CreateChannel() and have multiple threads use that channel at the same time), always open the channel before first use (cast to ICommunicationObject), and open it asynchronously. You can do that like this:

```csharp
var channel = factory.CreateChannel();
var commObj = (ICommunicationObject)channel;
await Task.Factory.FromAsync(commObj.BeginOpen, commObj.EndOpen, null);
```
Otherwise there's a behavior that ensures in-order sending of requests (requests go out on the wire in the same order that the calls were made), and it causes an issue. Basically, the first request detects that the channel needs to be opened, places the actual request into a queue, and calls open. Once the channel is opened, it starts draining the queue, making one request at a time. When a new request is made, if that queue isn't empty, it adds itself onto the end of the queue. This means that if you initially have a lot of requests happening concurrently using a single channel, you can end up making the requests one at a time until the queue is empty. This is an easy situation to get into if you had some thread contention when you made the first call and you share a channel instance. So basically, always explicitly open your channel if there's any sharing between threads.

The next suggestion is to ensure your contract is only using async operations. The client and server don't need to match, so if your service implementation is sync, your client can still be async, as long as the name of the operation is compatible. This means the method
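To illustrate the async-only contract suggestion, a minimal sketch is below; the names (ICalculator, AddAsync) are hypothetical and not from this PR. The service side can keep a synchronous implementation with a compatible operation name.

```csharp
using System.ServiceModel;
using System.Threading.Tasks;

// Hypothetical client-side contract using only Task-based (async) operations.
// The service can still expose a synchronous Add operation; the two sides are
// matched by operation name, so their signatures don't need to be identical.
[ServiceContract]
public interface ICalculator
{
    [OperationContract]
    Task<int> AddAsync(int x, int y);
}
```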
If you are using a generated client, enable channel factory caching. If your client type is called

The final hammer you can throw at the problem is to increase the MinThreads on the thread pool so that it ramps up quicker. Work out how many worker threads you typically use at stable state, then have this code at the start of your program:

```csharp
int workerThreads, completionPortThreads;
ThreadPool.GetMinThreads(out workerThreads, out completionPortThreads);
workerThreads = typicalWorkerThreadsMeasured;
ThreadPool.SetMinThreads(workerThreads, completionPortThreads);
```

You can find out the number of threads you currently have with this code:
```csharp
int workerThreads, completionPortThreads;
ThreadPool.GetAvailableThreads(out workerThreads, out completionPortThreads);
```

Or you can take a dump and use the SOS command !threadpool in WinDbg. Setting MinThreads removes the warmup period and allows the thread pool to immediately ramp up to the configured number of threads if needed. You might also need to increase the completion port threads too, as async IO completes on completion port threads, but the issue this PR fixed isn't affected by the completion port thread limit; that would be a general scale-up issue related to doing lots of IO.

Edit: Just realized I didn't answer your final question. This fix will be in .NET 7. We are almost certain to release another preview/rc release with these changes well before then.
Thanks heaps for the detailed reply and advice. We rolled back our WCF DLL to the current release, v4.9.0. The changes we made:

P.S. Our team from Brisbane, Australia is super thankful for the fix and your advice.
SynchronizationContext.Current doesn't flow through an asynchronously completed await. The existing code only worked for scheduling code to an IO thread up until the second await which completed asynchronously. The first await correctly schedules code to run using the sync context, but then Current would be null. This meant any subsequent awaits would use the default synchronization context, which means the continuation happened on the thread pool. This change creates a TaskScheduler based on the synchronization context and executes the code through that. This way the synchronization context is ignored (so a null value doesn't matter) and continuations correctly execute using the IOThreadScheduler.
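As a rough sketch of the technique described above (not the actual WCF source; the helper name is hypothetical), a TaskScheduler created from the current SynchronizationContext keeps continuations flowing through that context even after SynchronizationContext.Current becomes null across an asynchronously completed await:

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class SyncContextSchedulerSketch
{
    // Capture the ambient SynchronizationContext once as a TaskScheduler and run the
    // async work through it. Awaits inside 'work' then resume via TaskScheduler.Current
    // even when SynchronizationContext.Current is null, so continuations keep using the
    // intended scheduler instead of falling back to the thread pool.
    public static Task<TResult> RunOnCurrentContextAsync<TResult>(Func<Task<TResult>> work)
    {
        // Throws if no SynchronizationContext is installed on the calling thread.
        TaskScheduler scheduler = TaskScheduler.FromCurrentSynchronizationContext();

        return Task.Factory.StartNew(
            work,
            CancellationToken.None,
            TaskCreationOptions.DenyChildAttach,
            scheduler).Unwrap();
    }
}
```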
There was also a bug where we could block the IOThreadScheduler dispatch thread, because we were dispatching code to the IOThreadScheduler which effectively called Task.Wait(). This was only not a problem because of the previously mentioned bug: continuations were being scheduled to the thread pool, which meant the Task.Wait() would complete. This should improve performance, as we no longer block the IOThreadScheduler with a Wait() call, so any other code queued up on the scheduler will progress more quickly.
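A hedged before/after sketch of that point (scheduler and DoWorkAsync are placeholders, not the WCF code): blocking in Task.Wait() inside work queued to a scheduler ties up its dispatch thread, whereas queueing the async work and returning the unwrapped Task does not.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

static class DispatchSketch
{
    static Task DoWorkAsync() => Task.Delay(100);   // placeholder for real async work

    // Anti-pattern: the queued delegate blocks the scheduler's dispatch thread in Wait(),
    // so nothing else queued to that scheduler can make progress in the meantime.
    static Task QueueBlocking(TaskScheduler scheduler) =>
        Task.Factory.StartNew(
            () => DoWorkAsync().Wait(),
            CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler);

    // Preferred: queue the async work and hand back the unwrapped Task; the dispatch
    // thread only runs the synchronous part and is free again at the first await.
    static Task QueueAsync(TaskScheduler scheduler) =>
        Task.Factory.StartNew(
            DoWorkAsync,
            CancellationToken.None, TaskCreationOptions.DenyChildAttach, scheduler).Unwrap();
}
```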