Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ContentStore locking exceptions in async code #17246

Merged
merged 4 commits into from
Oct 11, 2024

Conversation

ronaldbarendse
Copy link
Contributor

Prerequisites

  • I have added steps to test this contribution in the description below

Description

After large Deploy operations (like an environment restore) has completed and the CMS tries to update the content cache (NuCache) as part of completing the Umbraco scope, the following exceptions were thrown:

System.AggregateException: One or more errors occurred. (Write lock must be acquried.) (Exceptions were thrown by listed actions. (Object synchronization method was called from an unsynchronized block of code.) (Object synchronization method was called from an unsynchronized block of code.) (Object synchronization method was called from an unsynchronized block of code.))
   at Umbraco.Cms.Infrastructure.Scoping.Scope.TryFinally(Action[] actions)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.RobustExit(Boolean completed, Boolean onException)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.DisposeLastScope()
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Deploy.Infrastructure.Core.AltScope.Dispose()
   at Umbraco.Deploy.Infrastructure.Work.WorkItems.SessionWorkItem.<>c__DisplayClass24_0.<Complete>b__0()
...
 System.InvalidOperationException: Write lock must be acquried.
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.EnsureLocked()
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.SetLocked(ContentNodeKit kit)
   at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.NotifyLocked(IEnumerable`1 payloads, Boolean& draftChanged, Boolean& publishedChanged)
   at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.Notify(JsonPayload[] payloads, Boolean& draftChanged, Boolean& publishedChanged)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.NotifyPublishedSnapshotService(IPublishedSnapshotService service, AppCaches appCaches, JsonPayload[] payloads)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.Refresh(JsonPayload[] payloads)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.DeliverLocal[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.Deliver[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.QueueRefresh[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, IEnumerable`1 payloads)
   at Umbraco.Extensions.DistributedCacheExtensions.RefreshContentCache(DistributedCache dc, IEnumerable`1 changes)
   at Umbraco.Cms.Core.Cache.ContentTreeChangeDistributedCacheNotificationHandler.Handle(IEnumerable`1 entities)
   at Umbraco.Cms.Core.Cache.DistributedCacheNotificationHandlerBase`2.Handle(IEnumerable`1 notifications)
System.AggregateException: One or more errors occurred. (Could not enter monitor before timeout in content store)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.TryFinally(Action[] actions)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.RobustExit(Boolean completed, Boolean onException)
   at Umbraco.Cms.Infrastructure.Scoping.Scope.DisposeLastScope()
   at Umbraco.Cms.Infrastructure.Scoping.Scope.Dispose()
   at Umbraco.Deploy.Infrastructure.Core.AltScope.Dispose()
   at Umbraco.Deploy.Infrastructure.Work.WorkItems.SessionWorkItem.<>c__DisplayClass24_0.<Complete>b__0()
...
System.TimeoutException: Could not enter monitor before timeout in content store
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.Lock(WriteLockInfo lockInfo, Boolean forceGen)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.ScopedWriteLock..ctor(ContentStore store, Boolean scoped)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.<GetScopedWriteLock>b__29_0(Boolean scoped)
   at Umbraco.Cms.Core.Scoping.ScopeContextualBase.<>c__DisplayClass2_0`1.<Get>b__0()
   at Umbraco.Cms.Core.Scoping.ScopeContext.Enlist[T](String key, Func`1 creator, Action`2 action, Int32 priority)
   at Umbraco.Cms.Core.Scoping.ScopeContextualBase.Get[T](ICoreScopeProvider scopeProvider, String key, Func`2 ctor)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.GetScopedWriteLock(ICoreScopeProvider scopeProvider)
   at Umbraco.Cms.Infrastructure.PublishedCache.PublishedSnapshotService.Notify(JsonPayload[] payloads, Boolean& draftChanged, Boolean& publishedChanged)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.NotifyPublishedSnapshotService(IPublishedSnapshotService service, AppCaches appCaches, JsonPayload[] payloads)
   at Umbraco.Cms.Core.Cache.ContentCacheRefresher.Refresh(JsonPayload[] payloads)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.DeliverLocal[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.Deliver[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Infrastructure.Sync.ServerMessengerBase.QueueRefresh[TPayload](ICacheRefresher refresher, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, TPayload[] payload)
   at Umbraco.Cms.Core.Cache.DistributedCache.RefreshByPayload[TPayload](Guid refresherGuid, IEnumerable`1 payloads)
   at Umbraco.Extensions.DistributedCacheExtensions.RefreshContentCache(DistributedCache dc, IEnumerable`1 changes)
   at Umbraco.Cms.Core.Cache.ContentTreeChangeDistributedCacheNotificationHandler.Handle(IEnumerable`1 entities)
   at Umbraco.Cms.Core.Cache.DistributedCacheNotificationHandlerBase`2.Handle(IEnumerable`1 notifications)

This all originates from the ContentStore either not 'seeing' it already has a write lock or getting a timeout when trying to get it (essentially deadlocking). Looking at the locking implementation in this class, it uses Monitor, which has the following requirement:

Because the Monitor class has thread affinity, the thread that acquired a lock must release the lock by calling the Monitor.Exit method.

But looking at the code and stack trace, the lock/monitor is entered and exited in different methods and they can potentially be called from different threads, which is especially likely if it contains async calls in-between them (Deploy heavily uses async and does more work in a single lock, further increasing the chances of exiting on a different thread).

I've added a test showcasing one of the exceptions and will push a fix that uses SemaphoreSlim.

@ronaldbarendse
Copy link
Contributor Author

As expected, the test without fix results in this exception:

Failed SetLocked [462 ms]
  Error Message:
   System.Threading.SynchronizationLockException : Object synchronization method was called from an unsynchronized block of code.
  Stack Trace:
     at System.Threading.Monitor.Exit(Object obj)
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.Release(WriteLockInfo lockInfo, Boolean commit) in /_/src/Umbraco.PublishedCache.NuCache/ContentStore.cs:line 412
   at Umbraco.Cms.Infrastructure.PublishedCache.ContentStore.ScopedWriteLock.Release(Boolean completed) in /_/src/Umbraco.PublishedCache.NuCache/ContentStore.cs:line 311
   at Umbraco.Cms.Core.Scoping.ScopeContextualBase.Dispose() in /_/src/Umbraco.Infrastructure/Scoping/ScopeContextualBase.cs:line 76
   at Umbraco.Cms.Tests.Integration.Umbraco.Infrastructure.PublishedCache.ContentCacheTests.SetLocked() in /_/tests/Umbraco.Tests.Integration/Umbraco.Infrastructure/PublishedCache/ContentCacheTests.cs:line 75

I've applied the same change in SnapDictionary (which has more locking tests in SnapDictionaryTests that all still pass). I also noticed an unused MonitorLock class and obsoleted it.

The only remaining usage of Monitor that can potentially enter/exit on different threads is in HttpContextRequestAppCache:

protected override void EnterReadLock()
{
var locker = GetLock();
if (locker == null)
{
return;
}
Monitor.Enter(locker);
}
protected override void EnterWriteLock()
{
var locker = GetLock();
if (locker == null)
{
return;
}
Monitor.Enter(locker);
}
protected override void ExitReadLock()
{
var locker = GetLock();
if (locker == null)
{
return;
}
if (Monitor.IsEntered(locker))
{
Monitor.Exit(locker);
}
}
protected override void ExitWriteLock()
{
var locker = GetLock();
if (locker == null)
{
return;
}
if (Monitor.IsEntered(locker))
{
Monitor.Exit(locker);
}
}

ExamineIndexRebuilder also uses Monitor, but always exits within the same method and therefore doesn't have the same thread synchronization issues.

Copy link
Member

@bergmania bergmania left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code looks good to me

@bergmania bergmania merged commit c3db345 into v13/dev Oct 11, 2024
12 of 15 checks passed
@bergmania bergmania deleted the v13/bugfix/contentstore-locking branch October 11, 2024 07:45
bergmania pushed a commit that referenced this pull request Oct 11, 2024
* Add ContentCache test

* Use SemaphoreSlim as write lock

* Apply lock imrpovements to SnapDictionary

* Obsolete unused MonitorLock

(cherry picked from commit c3db345)
@ronaldbarendse
Copy link
Contributor Author

Awesome, let's verify whether this not only fixes the test, but also the runtime exceptions!👍🏻

Looking more into HttpContextRequestAppCache, I can see the locks are all entered/exited within the same methods (and thus same threads): it's just that the EnterReadLock()/ExitReadLock() and EnterWriteLock()/ExitWriteLock() are protected methods and can potentially be called incorrectly, but that's currently not the case...

bergmania pushed a commit that referenced this pull request Oct 16, 2024
* Add ContentCache test

* Use SemaphoreSlim as write lock

* Apply lock imrpovements to SnapDictionary

* Obsolete unused MonitorLock

(cherry picked from commit c3db345)
@sauron
Copy link

sauron commented Nov 11, 2024

Hello, I wanted to touch base with you all, because, since this was applied automatically by Umbraco Cloud,
we are getting the (Recursive locks not allowed) on tasks we didn't have before while doing simple things,
like moving nodes between folders
Screenshot 2024-11-11 at 11 14 00

or while running recurring Jobs that pulls from APIs and store locally.
The RecurringJob is a simple request to an API then insert or update into Umbraco under a folder like:

var results = "pull from API and parse with the Proper class"
foreach (var item in results)
{
    IContent dbNode = null;
    IPublishedContent? found = _publishedContentQuery.Content("<parentFOlderID>").Children().Where(x => x.Value<string>("externalId") == item.Id.Trim()).FirstOrDefault();
    if(found != null){
        dbNode = ContentService.GetById(found.Id);
    } else {
        dbNode = ContentService.Create(track.Title, int.Parse(parentId), "<nodeType>");
    }
    // Set Values
    dbNode.SetValue("...." , item.Id);
    // .....

    try
    {
        ContentService.SaveAndPublish(dbNode);
    }
    catch (AggregateException ae)
    {
        Console.WriteLine("Caught aggregate exception behavior"+ae.Message);
    }
}

While using try/catch avoid raising the error, we are wondering if there is a way to "release" the node being updated/inserted or a different approach we should take to run this recurring jobs to avoid this error.

For moving content between folders, we are "workarounding" it by moving smaller groups of content.

Is this the intended behavoir? Should I open a new issue?

@ronaldbarendse
Copy link
Contributor Author

Hello, I wanted to touch base with you all, because, since this was applied automatically by Umbraco Cloud,
we are getting the (Recursive locks not allowed) on tasks we didn't have before while doing simple things,

Looking at the stack trace, the error happens in a notification handler (as notifications are published in the TryFinally() method after completing/disposing the scope). Do you have any custom handlers registered that trigger when moving content (like ContentSavedNotification or ContentMovedNotification)? And the code that's running in the recurring job, is that creating its own scope before creating/saving content, as is shown in the documentation?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants