Transaction deadlocks within a simple DB model #14371
Comments
The two workarounds are actually similar: when you remove the FK property StateItem.InspectionId, EF Core creates an FK property "InspectionId" in shadow state. That shadow FK is nullable, so by convention the constraint is created with ON DELETE NO ACTION. The real property isn't nullable, hence by convention the corresponding database constraint is created with ON DELETE CASCADE. So this workaround also changes the cascade behavior, like the first one, but on a different constraint. |
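To make the convention described above concrete, here is a minimal sketch of setting the delete behavior explicitly instead of relying on the nullability of the FK. The entity shapes and the `SketchContext` name are assumptions for illustration only (the actual classes live in the attached repro and may differ); `HasOne`, `WithMany`, `HasForeignKey` and `OnDelete` are the regular EF Core fluent API calls.

```csharp
using System.Collections.Generic;
using Microsoft.EntityFrameworkCore;

// Hypothetical entity shapes, loosely modeled on the names used in this thread.
public class Inspection
{
    public long Id { get; set; }
    public ICollection<StateItem> StateItems { get; } = new HashSet<StateItem>();
}

public class StateItem
{
    public long Id { get; set; }

    // Non-nullable FK property: by convention the relationship is required,
    // so the database constraint would be created with ON DELETE CASCADE.
    public long InspectionId { get; set; }
    public Inspection Inspection { get; set; }
}

// Hypothetical context; only the relationship configuration matters here.
public class SketchContext : DbContext
{
    public DbSet<Inspection> Inspections { get; set; }

    protected override void OnModelCreating(ModelBuilder modelBuilder)
    {
        modelBuilder.Entity<StateItem>()
            .HasOne(s => s.Inspection)
            .WithMany(i => i.StateItems)
            .HasForeignKey(s => s.InspectionId)
            // Overrides the cascade convention for this one constraint.
            // Dependents must then be deleted explicitly before the principal.
            .OnDelete(DeleteBehavior.Restrict);
    }
}
```

With a configuration along these lines the cascade behavior no longer changes silently when an FK property is added or removed, which is what made the two workarounds look unrelated.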
Here are the contents of the xml_deadlock_report with MaxBatchSize(1):
<deadlock>
<victim-list>
<victimProcess id="process1d7a8c24ca8" />
</victim-list>
<process-list>
<process id="process1d7a8c24ca8" taskpriority="0" logused="2872" waitresource="KEY: 8:72057594043301888 (9dd6aff0b38d)" waittime="3969" ownerId="31287" transactionname="user_transaction" lasttranstarted="2019-01-10T16:48:02.680" XDES="0x1d7b59d8490" lockMode="RangeS-U" schedulerid="8" kpid="26864" status="suspended" spid="60" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2019-01-10T16:48:02.740" lastbatchcompleted="2019-01-10T16:48:02.740" lastattention="1900-01-01T00:00:00.740" clientapp="Core .Net SqlClient Data Provider" hostname="DIVEGA13" hostpid="1714060944" loginname="REDMOND\divega" isolationlevel="read committed (2)" xactid="31287" currentdb="8" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="2" stmtstart="58" stmtend="138" sqlhandle="0x02000000e0100005ffcfde2f1eba4da32aefa2075a13a1840000000000000000000000000000000000000000">
unknown </frame>
<frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
(@p0 bigint)SET NOCOUNT ON;
DELETE FROM [StateItem]
WHERE [Id] = @p0;
SELECT @@ROWCOUNT;
</inputbuf>
</process>
<process id="process1d7b5047c28" taskpriority="0" logused="3140" waitresource="KEY: 8:72057594043301888 (ef8398b4b071)" waittime="3969" ownerId="31285" transactionname="user_transaction" lasttranstarted="2019-01-10T16:48:02.680" XDES="0x1d7a5ce8490" lockMode="RangeS-U" schedulerid="6" kpid="26596" status="suspended" spid="59" sbid="0" ecid="0" priority="0" trancount="2" lastbatchstarted="2019-01-10T16:48:02.743" lastbatchcompleted="2019-01-10T16:48:02.743" lastattention="1900-01-01T00:00:00.743" clientapp="Core .Net SqlClient Data Provider" hostname="DIVEGA13" hostpid="1714060944" loginname="REDMOND\divega" isolationlevel="read committed (2)" xactid="31285" currentdb="8" lockTimeout="4294967295" clientoption1="673185824" clientoption2="128056">
<executionStack>
<frame procname="adhoc" line="2" stmtstart="58" stmtend="138" sqlhandle="0x02000000e0100005ffcfde2f1eba4da32aefa2075a13a1840000000000000000000000000000000000000000">
unknown </frame>
<frame procname="unknown" line="1" sqlhandle="0x0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000">
unknown </frame>
</executionStack>
<inputbuf>
(@p0 bigint)SET NOCOUNT ON;
DELETE FROM [StateItem]
WHERE [Id] = @p0;
SELECT @@ROWCOUNT;
</inputbuf>
</process>
</process-list>
<resource-list>
<keylock hobtid="72057594043301888" dbid="8" objectname="DeadlockDatabase.dbo.StateDetail" indexname="IX_StateDetail_VehicleStateItemId" id="lock1d7b56d1000" mode="RangeX-X" associatedObjectId="72057594043301888">
<owner-list>
<owner id="process1d7b5047c28" mode="RangeX-X" />
</owner-list>
<waiter-list>
<waiter id="process1d7a8c24ca8" mode="RangeS-U" requestType="wait" />
</waiter-list>
</keylock>
<keylock hobtid="72057594043301888" dbid="8" objectname="DeadlockDatabase.dbo.StateDetail" indexname="IX_StateDetail_VehicleStateItemId" id="lock1d7b56d1080" mode="X" associatedObjectId="72057594043301888">
<owner-list>
<owner id="process1d7a8c24ca8" mode="X" />
</owner-list>
<waiter-list>
<waiter id="process1d7b5047c28" mode="RangeS-U" requestType="wait" />
</waiter-list>
</keylock>
</resource-list>
</deadlock>
|
Note for triage: I have tried several things that, in my past experience, help work around deadlocks, and even a few long shots:
Nothing helped: as long as there are two levels of cascade deletes on Inspection --> StateItem --> StateDetail, if two concurrent sessions try to delete two different StateItems, a deadlock ends up occurring at the point where the two sessions try to access StateDetail with different types of locks, presumably to apply the cascade delete. At this point this looks like a limitation in SQL Server. Here are a few things I would like to discuss in triage:
|
@divega to follow up with @AndriySvyryd and to reach out to the SQL Server team. |
I changed the repro with some simplifications and added some instrumentation/console output, and I was able to observe that the deadlock occurs with almost perfect regularity every other iteration. This led me to a new hypothesis: that the deadlock could actually involve contention between a task currently running and locks held by the homologous task from the previous iteration. This could be the case if, for example, the transaction was actually completing asynchronously and resources were still being held even after the SqlClient transaction was completed and the connection closed. I ran some tests that seem to support (or at least not refute) this hypothesis: I introduced two new root entities and changed the program so that each even-numbered iteration affects roots 1 and 2, and each odd-numbered iteration affects roots 3 and 4. This reduced the number of deadlocks significantly (1 every 100 iterations), although the number of transactions is the same. A few more observations:
Here is the simplified repro (paste this in a console application):
using Microsoft.EntityFrameworkCore;
using System;
using System.Collections.Generic;
using System.ComponentModel.DataAnnotations;
using System.Linq;
using System.Threading.Tasks;

namespace DeadlockRepro
{
    public class Blog
    {
        public int BlogId { get; set; }
        public ICollection<Post> Posts { get; } = new HashSet<Post>();
    }

    public class Post
    {
        public int PostId { get; set; }
        // Workaround 1: comment out this required attribute
        [Required]
        public Blog Blog { get; set; }
        public ICollection<Comment> Comments { get; } = new HashSet<Comment>();
    }

    public class Comment
    {
        public int CommentId { get; set; }
        // Workaround 2: comment out this required attribute
        [Required]
        public Post Post { get; set; }
    }

    public class DeadlockContext : DbContext
    {
        public DbSet<Blog> Blogs { get; set; }

        protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
        {
            // disabling pooling reduces the number of deadlocks significantly
            optionsBuilder.UseSqlServer(
                @"Server=(localdb)\mssqllocaldb;Database=DeadlockDatabase;Trusted_Connection=True;ConnectRetryCount=0;Pooling=true;",
                options => options.MaxBatchSize(1));
        }
    }

    public class Program
    {
        static void Main(string[] args)
        {
            // Recreate the database and seed four empty blogs (the roots).
            using (var db = new DeadlockContext())
            {
                db.Database.EnsureDeleted();
                db.Database.EnsureCreated();
                db.Blogs.Add(new Blog());
                db.Blogs.Add(new Blog());
                db.Blogs.Add(new Blog());
                db.Blogs.Add(new Blog());
                db.SaveChanges();
            }

            for (int i = 0; i < 100; i++)
            {
                try
                {
                    var baseline = 0;
                    // uncomment to alternate between blogs 1, 2 and 3, 4 for each iteration
                    // baseline = i % 2 == 0 ? 0 : 2;
                    var t1 = CauseDeadlock(baseline + 1);
                    var t2 = CauseDeadlock(baseline + 2);
                    Task.WaitAll(t1, t2);
                    Console.WriteLine($"Iteration {i} completed successfully.");
                }
                catch (AggregateException ex)
                {
                    Console.WriteLine($"Iteration {i} failed with {ex?.InnerException?.InnerException?.InnerException?.Message}");
                }
            }
        }

        // Replaces all posts (and their comments) of one blog in a single SaveChanges.
        private static async Task CauseDeadlock(int id)
        {
            using (var db = new DeadlockContext())
            {
                var blog =
                    db.Blogs
                        .Include(b => b.Posts)
                        .ThenInclude(p => p.Comments)
                        .First(b => b.BlogId == id);

                foreach (var post in blog.Posts.ToList())
                {
                    blog.Posts.Remove(post);
                }

                blog.Posts.Add(new Post { Comments = { new Comment() } });
                blog.Posts.Add(new Post { Comments = { new Comment() } });
                blog.Posts.Add(new Post { Comments = { new Comment() } });
                await db.SaveChangesAsync();
            }
        }
    }
} |
@AndriySvyryd could you check with the latest repro whether you can find anything I could be missing? |
I think this is the same issue we had previously when we didn't have a deterministic order for SaveChanges. When it deadlocks the new posts and comments are being inserted before the old ones are deleted. I think we don't have more reports of this because the entities need to be queried in this specific way to produce this change application order. We can probably fix it by adding an explicit dependency between deletes and inserts to the same table when applying the topological sort. |
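The idea of adding an explicit delete-before-insert dependency to the topological sort can be sketched roughly as follows. This is not EF Core's actual implementation: `Command`, `Op` and `CommandSorter` are hypothetical names, the sort is a plain Kahn's algorithm, and a cycle simply throws here, whereas a real fix would need to treat the extra edges as soft and break cycles.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Op { Insert, Delete }

// Hypothetical stand-in for a modification command produced by SaveChanges.
public sealed class Command
{
    public Command(string table, Op op) { Table = table; Op = op; }
    public string Table { get; }
    public Op Op { get; }
    public override string ToString() => $"{Op} {Table}";
}

public static class CommandSorter
{
    // fkEdges holds index pairs (Before, After) that come from FK constraints,
    // e.g. "insert the principal before the dependent".
    public static List<Command> Sort(
        IReadOnlyList<Command> commands,
        IEnumerable<(int Before, int After)> fkEdges)
    {
        var successors = commands.Select(_ => new HashSet<int>()).ToArray();
        foreach (var (before, after) in fkEdges)
        {
            successors[before].Add(after);
        }

        // Extra dependency: every delete on a table precedes every insert on that table.
        for (var d = 0; d < commands.Count; d++)
        {
            for (var i = 0; i < commands.Count; i++)
            {
                if (commands[d].Op == Op.Delete
                    && commands[i].Op == Op.Insert
                    && commands[d].Table == commands[i].Table)
                {
                    successors[d].Add(i);
                }
            }
        }

        // Kahn's algorithm: repeatedly emit commands with no unprocessed predecessors.
        var inDegree = new int[commands.Count];
        foreach (var set in successors)
        {
            foreach (var after in set)
            {
                inDegree[after]++;
            }
        }

        var ready = new Queue<int>(
            Enumerable.Range(0, commands.Count).Where(n => inDegree[n] == 0));
        var sorted = new List<Command>();
        while (ready.Count > 0)
        {
            var n = ready.Dequeue();
            sorted.Add(commands[n]);
            foreach (var after in successors[n])
            {
                if (--inDegree[after] == 0)
                {
                    ready.Enqueue(after);
                }
            }
        }

        if (sorted.Count != commands.Count)
        {
            throw new InvalidOperationException("Cycle detected; this sketch does not break cycles.");
        }

        return sorted;
    }
}
```

For the repro's shape, passing the commands Insert Post, Insert Comment, Delete Comment, Delete Post with the FK edges (0, 1) and (2, 3) yields an order in which both deletes run before both inserts, which is the ordering the deadlocking runs lacked.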
Oh, I thought we had already done that! 😮 Thanks @AndriySvyryd! Clearing up milestone to discuss priority in the next triage. |
@divega Currently we only do that in the second order sort |
…cks. This dependency isn't hard, so cycle breaking is introduced. Make the implementations of Multigraph TopologicalSort and BatchingTopologicalSort more alike and faster Fixes #14371
Hello all. I'm facing an extremely weird issue with EF Core 2.2 (also briefly tested against 2.0). A little bit of backstory:
We received a bug report from production for a project I'm currently working on. After investigation, I discovered the application is sometimes encountering transaction deadlocks, given certain specific conditions. I was trying to figure out a root cause for over a day, unsuccessfully.
Then I decided to create a minimal repro of the problem, which led me here. Here's the situation:
Environment:
Windows 10 1809 x64,
SQL Server Express 2017 with latest patches
EF Core 2.2 on .NET Core 2.2
Technical details
I have a very simple DB model. It's basically a 3-level tree structure, with the individual roots being disjoint (they don't share any entities). We modify the trees in transactions, using the ReadCommitted isolation level. The error occurs when our ASP.NET Core web app receives two requests to modify two different trees AND the transactions run for over 600ms. The deadlock only occurs if we delete some entities from the tree. Under these conditions, the two operations sometimes cause a deadlock, resulting in an exception and one request failing. But since these trees share no content, operations on the trees should not, in my opinion, produce any deadlocks.
I managed to reduce the problem into a unit test project, which I'm attaching. I even managed to find a workaround of sorts. It turns out that if I switch one of the foreign keys in the model from Cascade to Restrict (and then manually delete the child entities), the deadlocks stop happening. This is mentioned in the Solution 2 comment in file DatabaseContext.cs in my demo.
Another, even weirder solution is to comment out a property of an entity. The code then works without deadlocks even with Cascade deletes. This is in file StateItem.cs as Solution 1.
I see no reason why any of the workarounds should prevent a database deadlock. Could someone please have a look at the code and try to explain what's going on in there? As I mentioned, this came from production and I'd like to resolve it sooner rather than later, so any insight is greatly appreciated.
The repro code
DeadlockRepro.zip