Another deadlock - or at least a very slow lock #107

IanYates · 2017-09-09T05:27:17Z

I added this to #57 but thought that since it's closed, the repro for an bug there may not have been noticed.

Here's the last post from that thread... Basically it's an issue which v3 doesn't exhibit but v4 and v5 do. I think I may have hit this in production, just after I finally moved to v4.x after staying on a patched v3.x for many months.

It's intermittent but definitely does reproduce for me after a few executions. Hoping you can shed some light on the issue :)
Thanks for the library and the general community work!

Post from #57 starts now...

The repro provided by @Allon-Guralnek will cause the issue for me.

In LinqPad I've got this

AsyncLock l = new AsyncLock();

void Main()
{
	Task.Run(() =>
	{

		Console.WriteLine("Starting tasks...");
		var tasks = Enumerable.Range(0, 100).Select(i => Task.Run(() => Do1(i))).ToArray();

		Console.WriteLine("Tasks started. Waiting for all to complete...");
		Task.WaitAll(tasks);

		Console.WriteLine("All done!");
	}).Wait();
}

void Do1(int i)
{
	if (i == 0)
		Thread.Sleep(100);

	using (l.Lock())
	{
		// mutate shared state atomically
	}

	Console.WriteLine("Done: " + i);
}

void Do2(int i)
{
	if (i == 0)
		Thread.Sleep(100);

	lock (l)
	{
		// mutate shared state atomically
	}

//	Console.WriteLine("Done: " + i);
}

You'll note that I've got a Task.Run() in the main() method to get it started.

Sometimes it runs through just fine. Other times it'll stop.

Example output

Starting tasks...
Done: 1
Tasks started. Waiting for all to complete...
Done: 2
Done: 99

and 1 minute 13 seconds later I'm still nowhere :(
After 1 minute 28 seconds it suddenly finished

Starting tasks...
Done: 1
Tasks started. Waiting for all to complete...
Done: 2
Done: 99
Done: 4
Done: 3
Done: 6
Done: 7
Done: 8
Done: 5
Done: 9
Done: 98
Done: 0
Done: 10
Done: 11
Done: 12
Done: 13
Done: 14
{{snip}}All the rest were in order{{/snip}}

But I can hit stop & start and get it working again. 9 times out of 10 it works properly. Removing the Console.WriteLine inside the loop seems to make it fail more often.

With LinqPad I can quickly swap between
Nito.AsyncEx.Coordination NuGet package - this made the poor output above
Nito.AsyncEx NuGet package v3.x - works
Nito.AysyncEx NuGet package v4.x (strong named one was convenient to use) - failed first go with no output at all :(
Nito.AsyncEx v3.x with one line patched from earlier - works

So this appears to be a regression where v3.x - with the one-line patch from earlier in this issue - is good but AsyncEx v4.x and v5.x are both bad.

It's a tricky one as it's intermittent and does eventually come good.

The text was updated successfully, but these errors were encountered:

StephenCleary · 2017-09-10T16:54:24Z

Yes, this is actually a known problem (though it didn't have a GH issue before). In fact, this is exactly the reason why AsyncEx v5 is still in prerelease. One of my commercial clients first made me aware of it, and while they're waiting for a fix they just cranked up the min-threads on the thread pool as a workaround.

This problem comes up most often when:

Many thread pool threads are synchronously waiting on an AsyncEx primitive.
The thread pool is exhausted.

What's actually happening is that the first lock goes through just fine (synchronously). Then the thread pool is exhausted with a bunch of threads synchronously blocked on that lock. Then when the first thread unlocks, it releases the next waiter, but requires a thread pool thread to do so. Thus, it must wait until the thread pool gets around to running that "release" code. Meanwhile, the thread pool is injecting one new thread every 2 seconds or so, in no rush. So we end up with a lock convoy with huge wait times due to the thread pool injection rate.

This is due to the core change of v4 -> v5: all continuations are asynchronous in the core of all the primitives (AsyncWaitQueue is the "core"). This simplifies the code considerably, but does assume that there is available room on the thread pool.

For a solution, I'm trying out variations of a custom "completer" thread. Something very similar to Ayende's blog here.

IanYates · 2017-09-10T23:35:52Z

Hi @StephenCleary - thanks very much for the comprehensive answer.

Ayende's solution looks pretty smart. Having a thread set aside to "keep things going" whilst also letting the ThreadPool race to handle the continuation might add a tiny bit of overhead (negligible probably) but avoids this, which is a huge plus.

Your mention of min threads does remind me that in our web app I've bumped up min threads as it helped with something similar a while ago. Maybe I've just run into it again? Anyway, v3.x (with the one change discussed about a year ago from #57 ) is working fine for me at the moment.

I only ran into this, I think, because in our web app I have some leases on some image & blob objects which are acquired asynchronously. There's also a background process which releases them from memory if they haven't been actively used in a while. That's handled using IDisposable, and to ensure a lease isn't returned whilst disposing (returning a lease after dispose just disposes rather than pooling again) I took a sync lock during the Dispose.
I still have that code, but since the only thing really doing a Dispose is that background task, it does the resource cleanup asynchronously, effectively using an AsyncDispose call, before calling the real Dispose. Therefore the real Dispose just checks to see if disposed is already set, and if so, doesn't do any extra work. I'm safe enough in case I do a Dispose via other means in future but avoid the sync-lock on an AsyncLock.

Thanks again for the great library & support!
Feel free to close this issue, or if it helps with planning, leave it open.

molinch · 2017-10-16T08:12:42Z

Basically these starvation issues can only ever happen when using the AsyncEx primitives both synchronously and asynchronously.

Isn't it?

StephenCleary · 2017-10-16T12:16:44Z

No; the problem can be caused by either synchronous or asynchronous APIs when the thread pool is saturated. At the core, both sync and async lock APIs require a thread pool to become unlocked, and that's what causes the issue.

molinch · 2017-10-18T06:52:43Z

Thanks it's clear now.

This is due to the core change of v4 -> v5: all continuations are asynchronous in the core of all the primitives (AsyncWaitQueue is the "core"). This simplifies the code considerably, but does assume that there is available room on the thread pool.

For a solution, I'm trying out variations of a custom "completer" thread.

Don't you think that spawning your own thread to solve this issue would be both overkill and complex? Making continuations synchronous again, so we don't have this issue, seems way nicer and easier to debug.

StephenCleary · 2017-10-18T13:19:22Z

Making the continuations synchronous has its own problems. For a given TaskCompletionSource, its continuations can either be forced-asynchronous (used by v5) or unspecified (used by v4).

Synchronization primitives in v4 avoid this thread pool exhaustion issue by using synchronous continuations. However, since end-user code is also expressed as a continuation, we have to force those continuations to be asynchronous or else we'd end up with a callback-under-lock scenario, which very easily results in deadlock. So in v4, all continuations are actually semi-asynchronous; they're synchronous enough to avoid the thread pool exhaustion but asynchronous enough to avoid the deadlock. As you can imagine, this complicates the code considerably.

I do much prefer the approach forcing asynchrony for end-user continuations, just from a code perspective (simpler, easier to prove correct, easier to maintain). Assuming I keep this, there are two options for avoiding the thread pool exhaustion issue: introducing a separate thread, and separating end-user continuations from internal continuations (requiring separate TCSs). I've played with both and am definitely leaning towards the thread solution.

LarsKemmann · 2017-10-18T14:09:10Z

Anytime I've used AsyncEx I've typically bought into multithreading enough that paying the overhead of one more dedicated thread was not an issue for performance. I don't think it would make a difference, one way or the other, for the experience of debugging client code. The thread solution gets my vote if it's simpler to maintain.

molinch · 2017-10-18T14:17:31Z

The beauty of asynchronous programming is that "there is no thread". I learned it from Stephen, both from his book and articles.

If it's easier to maintain I understand of course, but I just find it a bit "sad" we have to rely on spawning a manual Thread, which isn't controlled by the ThreadPool. It still feels complicated.

IanYates · 2017-10-19T06:41:09Z

Possibly complicated, but it's entirely abstracted away by the library to the point where 99% of library consumers won't even know it's there.

Making continuations asynchronous avoids the "sometimes sync, sometimes async" behaviour that brought jQuery promises a bad name. I appreciate the environment is different - since JS is completely single-threaded - but the consistency argument still applies (see http://blog.izs.me/post/59142742143/designing-apis-for-asynchrony and https://medium.com/@bluepnume/intentionally-unleashing-zalgo-with-promises-ab3f63ead2fd for the JS-side of the fence).

molinch · 2017-10-19T07:03:46Z

Thanks Ian these are very good arguments, I agree most people will just consume the library, without looking deep in the internals.

One remark though: even today when you await in C#, it may go synchronous or asynchronous as well.
If the Task is already completed then the continuation will most likely be synchronous.
So what you describe is already happening now (I have to admit it's not the usual use case).

StephenCleary · 2017-10-19T13:52:41Z

Just to clarify, the synchronous vs asynchronous continuations that I'm talking about are only when there is a continuation, so from the end-user's perspective this only applies when the await is asynchronous. When I'm talking about the continuation being synchronous or asynchronous, that's from the perspective of the TCS-completing thread.

To use a lock example, LockAsync will behave synchronously if the lock is unlocked; that's not going to change. The situation we're fixing here is when the lock is already locked, and some code UseResource calls LockAsync (which behaves asynchronously since the lock is already locked) and awaits on the returned task. Then when some thread releases the lock, we need to complete that TCS, but we can't do it while holding an internal lock. We need to push that completion outside of the lock, which is most easily done by forcing the continuation to be asynchronous.

So there are still plenty of situations with AsyncEx where end-user code will await, and it will behave synchronously, just like other TAP code. This issue is purely about a careful internal design that will be maintainable, prevent deadlocks, and also continue to work when the thread pool is exhausted. It won't have any effect on end-user semantics as far as which operations act synchronously and which ones are asynchronous.

This actually comes down to how await schedules continuations. Specifically, it always schedules them synchronously - so, after an await acts asynchronously, when its task is completed, the asynchronous method that did the await attempts to resume synchronously on the same thread and stack that is completing the task. When I discovered this, I found it very surprising and reported it as a bug (closed as "by design"). It can cause deadlocks (as described in my bug report) as well as stack overflows, unless end-user code take steps to prevent it.

It's my personal opinion that JavaScript's promise callback and await semantics are much better than C#'s await semantics. I think Microsoft was too focused on micro-benchmarks and missed out on providing an easier-to-use abstraction for developers.

IanYates · 2017-10-20T01:21:33Z

Reading the VS release notes for 15.4.1 (out as of about 40 minutes ago) I came across this Roslyn thread
dotnet/roslyn#22650 (comment)

It's not quite the same issue but it reminded me of this discussion. A block synchronous resource - disk - coupled with lots of async calls piling up, then the thread pool starving (or growing very slowly). Their problem is they also gobbled up RAM at the same time. The solution they seem to be discussing (and have released) is to have a single thread to which the other tasks post their requests for work.

Sorry if this is off-topic - I just found it interesting as another manifestation of a problem in the same ballpark as this thread's issue.

StephenCleary · 2017-10-22T19:15:14Z

That is very interesting. Particularly since the thread pool has special dedicated I/O threads so that the limited thread injection rate shouldn't affect that kind of scenario. But in reality, a lot of the time the BCL code running on the I/O thread will post work to a worker thread so it "jumps" over to the more limiting side before it executes end-user code.

waynebrantley · 2017-12-09T18:01:06Z

Why not just create a SemaphoreSlim(1,1) and use WaitAsync() on it. That has the effect of locking?

StephenCleary · 2017-12-10T19:41:11Z

@waynebrantley Yes, that's a fine workaround while this bug is still in AsyncEx.

sjb-sjb · 2018-01-08T13:29:23Z

Would a wrapped SemaphoreSlim(1,1) cause similar issues under high load?

StephenCleary · 2018-01-08T15:10:59Z

@sjb-sjb No; it does not depend on a thread pool thread being free in order to release.

sjb-sjb · 2018-01-08T17:35:35Z

I wonder what the rationale is for using the queuing approach rather than the older SemaphoreSlim method.

StephenCleary · 2018-01-08T18:59:14Z

Historically, SemaphoreSlim did not support asynchronous operations at the time AsyncLock was written. I'm not sure if SemaphoreSlim supports them on all the AsyncEx platforms today. But even if it did, AsyncLock has some advanced usages (namely, integration with AsyncConditionVariable and the ability to use custom queues to implement alternative locking strategies such as prioritized locks) that can't be supported with a SemaphoreSlim wrapper.

Under the hood, SemaphoreSlim also uses a queuing approach, but their implementation avoids the thread pool exhaustion issue here. They use an approach more similar to what I used in AsyncEx v4 and earlier.

jam40jeff · 2018-01-08T19:02:09Z

SemaphoreSlim does not guarantee the order in which the threads are released.

Having this guarantee is an important feature of AsyncLock IMO.

jesuissur · 2018-01-11T13:44:48Z

@StephenCleary Which min-thread settings in the thread pool should be enough to alleviate the problem without causing others?

ThreadPool.SetMinThreads(Int32, Int32) Method

StephenCleary · 2018-01-11T14:58:18Z

@jesuissur Bumping the min-threads is one option, and some v5 users are using that as a workaround for now.

jesuissur · 2018-01-11T15:01:05Z

@StephenCleary Sorry, I think my question was not clear enough. Which min-threads value should be a good starting point to try the bumping workaround?

StephenCleary · 2018-01-11T15:04:03Z

@jesuissur That depends on your application type and server. I'd recommend getting the minimum threads first and then increasing it.

ygoe · 2018-06-28T13:17:11Z

Can the thread pool exhaustion be detected upfront? So if you start off something that needs one additional thread, could you know that it's going to be a problem and do something about it while a thread is still awake?

StephenCleary · 2018-06-29T02:20:40Z

@ygoe: I've never tried. There's ThreadPool.GetAvailableThreads, but it has unclear semantics - is the value the number of currently available threads, or the number of threads that can be created without a cooldown delay?

ygoe · 2018-06-30T12:39:13Z

I'm sorry I don't know it myself. Just tried to throw in an idea to get this resolved. My own library is now tied to being a pre-release because of this dependency.

StephenCleary · 2018-07-21T14:43:20Z

I've been working on a few different experiments related to this issue. During my travelling for OSCON, I thought it would be useful to write out the background and context in which I'm thinking about this problem. Doing this was a great exercise; it allowed me to step back and identify each of the issues at hand (and then choose a new approach). This comment is mostly a brain-dump for now, but some of it will be reworked into a CONTRIBUTING doc at some point.

Central Postulates:

AsyncEx uses locks, not lockless code. Lockless code is difficult to understand and verify, and - even more dangerously - it's very common to think one understands how to write lockless code when this is not the case, due to the incredible complexity of memory models. See Double-Checked Locking Is Broken for a good historical example from Java.
The greatest cardinal sin of concurrent programming is invoking arbitrary code while holding a lock. In the general case, this includes completing a TCS, since continuations may run synchronously.

AsyncEx Code

Most AsyncEx coordination primitive code is straightforward and doesn't even use async/await. However:

Some methods do use async/await.
One interaction is particularly notable: the relationship between AsyncLock and AsyncConditionVariable. The ACV needs to be able to unlock and relock the AL without affecting the IDisposable "key" that the user already has.
Coordination primitives which queue are built on a wait queue abstraction, which permits users to do fancy things like create priority locks. This wait queue abstraction does force us to use sync-over-async in some places instead of using the (preferred) boolean argument hack; AsyncConditionVariable is a good example of this.

So, a good cross-section for experimentation purposes is:

AsyncManualResetEvent (extremely simple, no queue)
AsyncSemaphore (simple with queue)
AsyncLock + AsyncConditionVariable (internal interaction, and code that requires sync-over-async).

When I run experiments like this, I focus on these four types. If it works for them, it should work anywhere.

AsyncEx v4

In v4, all TCS's used synchronous continuations, just like "normal" async code. v4 avoided invoking end-user callbacks under lock by separating the TCS completion code from internal logic. This necessitated a bit of a "dance" between the wait queue and the coordination primitives, with the completion code represented by a disposable.

This worked well enough, but has always had one theoretical issue: stack overflow is a possibility. The TPL does have heuristics in place to prevent this, but they don't always work. AFAIK, this was never actually observed in production.

In v5, one of the main changes was to use asynchronous continuations for TCS's. This simplified the wait queue code (and coordination primitive code) by removing the "dance" completely. Unfortunately, this has caused a dependency on the thread pool even for internal operations. And this issue is not theoretical; it is very rare but has certainly been observed.

Possible Solutions

Solutions must maintain the Central Postulates above, in addition to these considerations:

If we have a dedicated thread, then we want to keep end-user code from ever running on that thread. The thread must be dedicated to executing our own short, nonblocking internal code.
Our internal code must not depend on a thread pool thread. In the original example code for this problem, Lock returned an IDisposable whose Dispose implementation dependended on a thread pool thread. However, it is permissible to depend on a thread pool thread to resume executing asynchronous end-user code.

The solutions that I've considered (some of these have experimental code in different branches):

ForceInline:
- Use RunContinuationsAsynchronously | DenyChildAttach flag for all TCS tasks. These can be returned to the user or our own code.
- Our code uses ForceInline to override the RunContinuationsAsynchronously flag and force our continuations to be synchronous.
- Evaluation:
  👍 Similar to how the old code works.
  👎 No protection from stack exhaustion. The old code did (in theory).
  👎 Users can use ForceInline to override our own force-asynchronous behavior.
Dedicated Thread:
- Use RunContinuationsAsynchronously | DenyChildAttach flag for all TCS tasks. These can be returned to the user or our own code.
- Our code uses ResumeOnDedicatedThread to override the RunContinuationsAsynchronously flag and force the continuations to run asynchronously on our dedicated thread.
- Our code does not need to complete TCS's from the dedicated thread.
  👍 Our code always runs on the dedicated thread.
  👎 Users can use ForceInline to override our own force-asynchronous behavior, and thus execute end-user code on our dedicated thread.
TCS Pair with ConfigureAwait(false)
- FirstTcs is a normal TCS.
- SecondTcs is a normal TCS, linked from FirstTcs by giving FirstTcs a continuation that propagates its completion to SecondTcs, wrapped with an explicit Task.Run.
- Our code awaits FirstTcs using ConfigureAwait(false).
- User only sees SecondTcs, which is forced asynchronous (due to Task.Run).
  👍 Similar to how the old code works, including (theoretical) protection from stack exhaustion.
  👍 Users cannot force synchronous behavior using ForceInline, due to the explicit Task.Run.
  👎 Our awaits are not guaranteed to be synchronous. If stack exhaustion is imminent, or if the user is (indirectly) completing a TCS that doesn't want to run our continuation, then we will still require a thread pool thread. The old v4 code had this same problem.
TCS Pair with ForceInline
- Just like above, but instead of awaiting FirstTcs.ConfigureAwait(false), our code awaits FirstTcs.ForceInline().
  👍 Similar to how the old code works.
  👍 Users cannot force synchronous behavior using ForceInline, due to the explicit Task.Run.
  👎 No protection from stack exhaustion. The old code did (in theory).
TCS Pair with ResumeOnDedicatedThread
- Just like above, but instead of awaiting FirstTcs.ConfigureAwait(false), our code awaits FirstTcs.ResumeOnDedicatedThread().
  👍 Our code always runs on the dedicated thread.
  👍 Users cannot use ForceInline to override our own force-asynchronous behavior, due to the explicit Task.Run. This prevents users from executing end-user code on our dedicated thread.

Conclusion

After writing all these out, I believe the best solution going forward is the last one. My next step is to experiment with that solution on the experimental cross-section (AsyncManualResetEvent + AsyncSemaphore + AsyncLock + AsyncConditionVariable), and if that looks good, apply it through the rest of the code.

ygoe · 2019-02-10T21:07:25Z

I'm very sorry to sound demanding, but this project appears dead to me. This library does a really great job and is really helpful in my project. I've read a lot of helpful and profound comments and answers from you in several places, that's why I consider your work valuable. I decided to use the v5 branch of AsyncEx to be more future-proof and I think there was also another reason to use it. But this ongoing delay of a release is starting to get in my way now. I'm about to release a library myself and it makes use of some of the AsyncEx functions. NuGet forces me to label my package as a pre-release because it has a pre-release dependency. So I'm considering to remove that package dependency and incorporate all referenced code into my own code base. This isn't something I like to do, but it seems the only way out of this release handling. A regular fork of the library would be more difficult because I'd need a new name and do all the packaging. And even though forks might immediately help others, too, they usually hurt the community in the long run.

So if you could find some time to resolve this last issue or maybe at least note it as a known issue in a stable release, I'd really appreciate that. There have been some suggestions in this discussion that sound promising. I know I'm not very helpful in the matter, I've tried before but this just exceeds my experience.

StephenCleary · 2019-02-10T21:22:09Z

@ygoe Yes, I have considered releasing a stable version with this as a known bug. The perfectionist in me doesn't like that, but commercial software does it all the time.

I've done a ton of work in different branches, approaching the problem different ways, but hitting blocks each time. I did finally find a permanent solution, but haven't had time to implement it (the final solution also requires careful documentation).

At the same time, I'm aware of the problems with keeping a library prerelease for so long. My current plan is to try to finish this issue once and for all over MVP summit (one time of the year when I have a lot more free time than normal). If I don't finish it then, then I promise I'll release it as stable with a known issue.

ygoe · 2019-02-10T22:20:02Z

@StephenCleary Thank you for the update. I'm looking forward to end of March this year. :-)

peter-perot · 2019-08-27T17:50:05Z

Concerning the use of SemaphoreSlim, I observe the same issue here, but only when async and sync stuff are mixed.

Here is the code to reproduce this issue using SemaphoreSlim:

using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

namespace Example
{
  public class Program
  {
    static readonly SemaphoreSlim s = new SemaphoreSlim(1, 1);

    static void Main()
    {
      Task.Run(() =>
      {
        Console.WriteLine("Starting tasks...");
        var sw = new Stopwatch();
        sw.Start();

        var tasks = RunTasks(onlySync: true, onlyAsync: false);

        Console.WriteLine("Sync tasks started. Waiting for all to complete...");
        Task.WaitAll(tasks);

        sw.Stop();

        Console.WriteLine("All sync tasks done! Time elapsed: " + sw.Elapsed.TotalSeconds + " seconds");

        sw.Reset();
        sw.Start();

        tasks = RunTasks(onlySync: false, onlyAsync: true);

        Console.WriteLine("Async tasks started. Waiting for all to complete...");
        Task.WaitAll(tasks);

        sw.Stop();

        Console.WriteLine("All async tasks done! Time elapsed: " + sw.Elapsed.TotalSeconds + " seconds");

        sw.Reset();
        sw.Start();

        tasks = RunTasks(onlySync: false, onlyAsync: false);

        Console.WriteLine("Mixed tasks started. Waiting for all to complete...");
        Task.WaitAll(tasks);

        sw.Stop();

        Console.WriteLine("All mixed tasks done! Time elapsed: " + sw.Elapsed.TotalSeconds + " seconds");
      }).Wait();

      Console.ReadLine();
    }

    static Task[] RunTasks(bool onlySync, bool onlyAsync)
    {
      bool mixed = !onlySync && !onlyAsync;
      var tasks = new Task[30];

      for (int i = 0; i < 30; i++)
      {
        Task task;
        int i1;

        if (mixed)
        {
          if (i % 2 == 0)
          {
            i1 = i;
            task = Task.Run(() => DoWithSemaphore(i1));
          }
          else
          {
            i1 = i;
            task = Task.Run(async () => await DoWithSemaphoreAsync(i1));
          }
        }
        else if (onlySync)
        {
          i1 = i;
          task = Task.Run(() => DoWithSemaphore(i1));
        }
        else
        {
          i1 = i;
          task = Task.Run(async () => await DoWithSemaphoreAsync(i1));
        }

        tasks[i] = task;
      }

      return tasks;
    }

    static void DoWithSemaphore(int i)
    {
      s.Wait();

      try
      {
        // mutate shared state atomically
        Thread.Sleep(100);
      }
      finally
      {
        s.Release();
      }

      Console.WriteLine("Done: " + i);
    }

    static async Task DoWithSemaphoreAsync(int i)
    {
      await s.WaitAsync();

      try
      {
        // mutate shared state atomically
        await Task.Delay(100);
      }
      finally
      {
        s.Release();
      }

      Console.WriteLine("Done (async): " + i);
    }
  }
}

When running the code and we only use "sync" tasks, the program completes after about 3 seconds.
When running the code and we only use "async" tasks, the program completes after about 3 seconds.
When running the code and we use mixed tasks, the program takes much longer to terminate (it varies a bit).

When you take a look at the source code of SemaphoreSlim, you can see that they are using a TaskNode class that is just a derived Task<> constructed with parameter TaskCreationOptions.RunContinuationsAsynchronously.

As soon as async waiters are in the game (see line 358 of SemaphoreSlim), all follow-up sync waiters are also put in the queue of async waiters.

On the other side, if we only have sync waiters (and no async ones), a pulse-wait approach is used, which does not need a thread pool.

And if we only have async waiters, the problem does not come into play, since we do not lock a task; instead we await it, and when awaited, the underlying thread goes back to the tread pool.

So, I don't think that using SemaphoreSlim solves the issue.

@StephenCleary You mentioned that you have a final solution in mind. I'm looking forward for it and a little bit curious what kind of solution it is.

Havret · 2020-01-12T12:24:36Z

@StephenCleary How about this approach? Is it safe?

tmendez3 · 2020-04-21T14:58:01Z

Hey @StephenCleary, I just ran into this problem in production. Until a v5 fix is in place, do you recommend downgrading to v4 or increasing the thread pool minimum thread count? Also, any idea as to when the issue might be resolved in v5?

qstarin · 2020-05-04T19:40:51Z

@tmendez3 Whether or not increasing your thread pool minimum will help depends on your system.

I use AsyncLock mainly as part of a cache-aside implementation to keep multiple consumers from going to remote caches or system-of-record for the same cache key and I call through that code 500M times a day.

It doesn't matter what I set minimum threads to, when using v5 it will sporadically (between a few times a week and a few times a day) use them all up and as many more as it can create before the process gets recycled for being unresponsive. If that start happening on more than 1 of my VMs at a time it's a quick and slippery slope to cascading overload failure.

On v3, everything is peachy, been running this system for years with 99.995% reliability. I haven't really given v4 a shot, so I cannot comment on that. I also target .Net Framework v4.7.2 - it sounds like maybe Core does not exhibit this issue - of course I'm going through updating dependencies as part of an effort to target Core (no small task on an active 15 year old code base serving nearly a billion requests a month).

I do have two key prefixes in my system that can end up calling AsyncLock.Lock() and AsyncLock.LockAsync() from different places on the same AsyncLock object (every unique cache key gets its own AsyncLock) and I haven't yet tested if removing that overlap clears up our issues with v5 or if they persist.

StephenCleary · 2020-05-17T23:23:11Z

Thanks for that use case @qstarin.

I did de-prioritize this for quite a while due to concerns that fixing this wouldn't actually fix the underlying issue. Specifically, if your code is exhausting the thread pool, sure that would cause a deadlock (technically slow lock) with AsyncEx, but that same underlying problem of thread exhaustion would just cause other asynchronous code to have the same issue (severely delayed, deadlock-like behavior); this is because almost all I/O APIs require a free thread pool thread for completion.

However, I do want to fix this issue.

There are a couple of paths forward. One is the "fully correct" solution outlined above; this would require several consecutive days of work and considerably complicate the implementation. Unfortunately, my opportunity for "several consecutive days of work" on a hobby project isn't great. Historically, I've been able to do the "hard" work on AsyncEx at the Microsoft MVP conference. But 2020's MVP conference was virtual so I lost that opportunity, and next year's conference is also virtual.

The other solution is to essentially take the AsyncEx 4 solution. This would only require a weekend or so, not an entire week. Technically, this solution is incorrect because it assumes ExecuteSynchronously always executes synchronously. This is true the vast majority of the time but could be false if the stack is too deep at the point the lock is released. However, I think this is an acceptable drawback; in all the years of AsyncEx, no one has actually reported that as a real production issue. On the other hand, I have suspicions that the stack overflow detection heuristic (which never worked that well on .NET Framework) is more sensitive on .NET Core - and if that is true - this situation would happen more commonly on .NET Core. But I'm guessing that it wouldn't trigger this scenario often enough to cause this issue (which always happens with the current v5).

Well, there are no obviously correct answers. :) But I think that in the interest of resolving this issue for AsyncEx v6 and above (the fix would require a major version rev), I'm going to take a weekend (sometime...) and restore the v4-style completion behavior.

…e time The minimum number of threads in the .NET thread pool is increased to 4. This should avoid that an application hangs if a continuation cannot run, because there is no thread to execute it. Also see: StephenCleary/AsyncEx#107

milspiir · 2022-04-25T14:03:22Z

So any news on this?

qstarin · 2022-04-25T15:06:23Z

@milspiir Not that I've seen, and I'm still on v3.

I did remove any and all mixed sync/async locking on any AsyncLock objects and that made no difference in the deadlocking behavior observed in v5. I also tested v4 and found similar sporadic deadlocking.

A simple SemaphoreSlim lock in place of AsyncLock does not appear to deadlock, but performs quite badly. v3 appears to perform noticeably better in both uncontested and contested (especially contested) use than v4 & v5 - all of which perform vastly better than a simple SemaphoreSlim.

Technically, this solution is incorrect because it assumes ExecuteSynchronously always executes synchronously. This is true the vast majority of the time but could be false if the stack is too deep at the point the lock is released. However, I think this is an acceptable drawback; in all the years of AsyncEx, no one has actually reported that as a real production issue.

fwiw, @StephenCleary, I've had StackOverflowExceptions trigger with the default v3 behavior of resuming TaskCompletionSource's synchronously - it happens when I have close to 300 or more tasks waiting on the same lock object. In those cases, providing AsyncLock with an implementation of IAsyncWaitQueue that creates the TaskCompletionSource with the TaskCreationOptions.RunContinuationsAsynchronously option, but is otherwise the same as the wait queue in v3 of the library, avoids the deadlock - but also performs more poorly than the included implementation that continues synchronously.

StephenCleary · 2022-05-18T12:03:08Z

I'm still planning to eventually fix this, and I still haven't had time. As noted previously, my primary hacking time is during conferences, which haven't happened since covid hit.

BenJury · 2022-11-28T16:47:17Z

As an aside, we just got bit by this. Removing a background task (started with Task.Factory.StartNew()) 'fixes' the issue, although obviously has some significant drawbacks.

leus · 2023-08-23T19:32:37Z

Hello Stephen. I'm currently being hit by this (I think). Hope you get a conference soon :-)

StephenCleary self-assigned this Sep 10, 2017

StephenCleary mentioned this issue Sep 26, 2017

Release v4 and v5 #71

Closed

StephenCleary added this to the 5.0.0 milestone Dec 10, 2017

This was referenced Dec 10, 2017

What is pending official release #115

Closed

Feature request: SemaphoreSlim.LockAsync #89

Closed

StephenCleary mentioned this issue Apr 6, 2018

Question: Release Date? #130

Closed

njannink mentioned this issue Aug 13, 2018

Updated Nito lib LykkeCity/Lykke.HttpClientGenerator#6

Merged

StephenCleary mentioned this issue Oct 10, 2018

Question - Are the Microsoft.Bcl libraries now required? #75

Closed

jmhinnen mentioned this issue Feb 20, 2019

App hangs - OnlineClientManager and UserSettingsManager aspnetboilerplate/aspnetboilerplate#4303

Open

StephenCleary added the bug label Feb 25, 2019

stay2be mentioned this issue Nov 29, 2019

Slow performance on Linux abpframework/abp#2026

Closed

Havret mentioned this issue Jan 11, 2020

Use asynchronous signaling to invoke pending send operations when Producer recovered Havret/dotnet-activemq-artemis-client#29

Closed

StephenCleary mentioned this issue Sep 13, 2020

Single-threaded uses, especially of AsyncSemaphore and DefaultAsyncWaitQueue #218

Closed

markusschaber mentioned this issue Nov 29, 2021

High CPU Load for idle clusters akkadotnet/akka.net#5400

Closed

StephenCleary mentioned this issue May 18, 2022

Deadlock with AsyncReaderWriterLock #255

Open

markusschaber mentioned this issue Jun 29, 2022

Huge performance penalty when using async and sync methods together kpreisser/AsyncReaderWriterLockSlim#5

Closed

SergeiGorlovetsky mentioned this issue Jul 26, 2022

Lock in OnlineClientManager was flagged by Microsoft support as contributing to deadlock in high volume traffic scenarious aspnetboilerplate/aspnetboilerplate#6500

Closed

Another deadlock - or at least a very slow lock #107

Another deadlock - or at least a very slow lock #107

Comments

IanYates commented Sep 9, 2017

StephenCleary commented Sep 10, 2017

IanYates commented Sep 10, 2017

molinch commented Oct 16, 2017

StephenCleary commented Oct 16, 2017

molinch commented Oct 18, 2017 • edited Loading

StephenCleary commented Oct 18, 2017

LarsKemmann commented Oct 18, 2017

molinch commented Oct 18, 2017 • edited Loading

IanYates commented Oct 19, 2017

molinch commented Oct 19, 2017 • edited Loading

StephenCleary commented Oct 19, 2017

IanYates commented Oct 20, 2017

StephenCleary commented Oct 22, 2017

waynebrantley commented Dec 9, 2017

StephenCleary commented Dec 10, 2017

sjb-sjb commented Jan 8, 2018

StephenCleary commented Jan 8, 2018

sjb-sjb commented Jan 8, 2018

StephenCleary commented Jan 8, 2018

jam40jeff commented Jan 8, 2018 • edited Loading

jesuissur commented Jan 11, 2018

StephenCleary commented Jan 11, 2018

jesuissur commented Jan 11, 2018

StephenCleary commented Jan 11, 2018

ygoe commented Jun 28, 2018

StephenCleary commented Jun 29, 2018

ygoe commented Jun 30, 2018

StephenCleary commented Jul 21, 2018

Central Postulates:

AsyncEx Code

AsyncEx v4

Possible Solutions

Conclusion

ygoe commented Feb 10, 2019

StephenCleary commented Feb 10, 2019

ygoe commented Feb 10, 2019

peter-perot commented Aug 27, 2019 • edited Loading

Havret commented Jan 12, 2020

tmendez3 commented Apr 21, 2020

qstarin commented May 4, 2020 • edited Loading

StephenCleary commented May 17, 2020

milspiir commented Apr 25, 2022

qstarin commented Apr 25, 2022

StephenCleary commented May 18, 2022

BenJury commented Nov 28, 2022

leus commented Aug 23, 2023

molinch commented Oct 18, 2017 •

edited

Loading

molinch commented Oct 18, 2017 •

edited

Loading

molinch commented Oct 19, 2017 •

edited

Loading

jam40jeff commented Jan 8, 2018 •

edited

Loading

peter-perot commented Aug 27, 2019 •

edited

Loading

qstarin commented May 4, 2020 •

edited

Loading