-
Notifications
You must be signed in to change notification settings - Fork 292
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Deadlock when using SinglePhaseCommit with distributed transactions #1800
Comments
@roji thanks. We will resume back on the issues on Tuesday. Monday is a holiday in Canada. |
I did a bit more investigation into this. First, here's a minimal repro console app - target net7.0 with SDK 7.0.100-rc.1.22431.12 or later. As you can see it's very trivial - anyone using distributed transactions with .NET 7.0 is likely to hit it: while (true)
{
using var transaction = new CommittableTransaction();
// Uncommenting the following makes the deadlock go away as a workaround. If the transaction is promoted before
// the first SqlClient enlistment, it never goes into the delegated state.
// _ = TransactionInterop.GetTransmitterPropagationToken(transaction);
await using var conn = new SqlConnection(...);
await conn.OpenAsync();
conn.EnlistTransaction(transaction);
// Enlisting the second transaction causes causes the transaction to be promoted.
// After this, the transaction state will be "delegated" (delegated to SQL Server), and the commit below will
// trigger a call to SqlDelegatedTransaction.SinglePhaseCommit.
await using var conn2 = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false");
await conn2.OpenAsync();
conn2.EnlistTransaction(transaction);
Console.WriteLine("Committing transaction...");
transaction.Commit();
Console.WriteLine("Transaction committed.");
} The deadlock was introduced in #1042, when the lock scope in SinglePhaseCommit was extended to include the call to Note that the netfx version of SqlDelegatedTransaction doesn't include this change (and so neither did System.Data.SqlClient), which explains why this deadlock didn't occur with .NET Framework. |
One additional note on this... SqlClient currently calls CleanupConnectionOnTransactionCompletion twice when a transaction completes:
Removing one of these should make the deadlock go away. Alternatively, detecting inside that cleanup has already occured - and specifically that the transaction has already been detached - would allow skipping taking the lock, and again resolve the deadlock. This may be safer than reducing the scope of the connection lock in SinglePhaseCommit as indicated above (although I don't have the full picture here). |
Hi @roji Thanks for your findings! I'm a little occupied this week, and haven't got chance to dig deep, yet. I hope you have already, but if not, I'd like to confirm #237 doesn't reoccur (fixed by #543 - found cause here) as that's the main cause we had to limit enlistment for externally Aborted transactions. I'll try getting back to you in a day or two! |
Hi @cheenamalhotra, please could we have an update on a rough timescale for this? |
Hello @cheenamalhotra . |
Very sorry for being caught up for longer than I thought.. I hope this would be fixed soon so you can be unblocked! cc @David-Engel |
In .NET 7.0, the OleTx/MSDTC distributed transactions support has been ported over from .NET Framework (dotnet/runtime#715). This means that it will now be possible to use distributed transactions with .NET 7.0, on Windows only.
We've received a bug report (dotnet/runtime#76010) with two separate console programs, enlisting which propagate a transaction. The initiating program (the "outer" transaction) enlists only a single SqlConnection to the transaction, and hangs when the transaction is committed. This happens quite consistently, but not 100% reliably. To reproduce this, simply run the .NET 7 version of the application in dotnet/runtime#76010 in two separate windows; the first one should hang.
When the program hangs, two threads are stuck with the following stack traces:
Stack 1
Stack 2:
Here is the general flow leading up to this state:
When the TransactionScope is disposed, we get to
SqlDelegatedTransaction.SinglePhaseCommit()
, which is the SqlClient part that interacts with System.Transactions.a.
SqlDelegatedTransaction.SinglePhaseCommit()
locksconnection
(the SqlInternalConnectionTds)b. It then sends Commit to SQL Server. Since the transaction is delegated, the triggers the commit with MSDTC, which causes commit notifications to be sent to the enlisted parties. This includes the running program, which starts thread 2 below running concurrently.
b. Finally, it proceeds to call
enlistment.Committed()
, which isSinglePhaseEnlistment.Committed()
, while keeping the lock on SqlInternalConnectionTds. That method then takes_internalEnlistment.SyncRoot
, which is the InternalTransaction.The above commit causes us to get a notification from the native layer (MSDTC) that the transaction committed (triggered above in 1b).
a. We get to
InternalTransaction.DistributedTransactionOutcome()
, which locks the InternalTransaction.b. We then get to
DbConnectionInternal.DetachTransaction()
(DbConnectionInternal is registered as a listener on the Transaction.TransactionCompleted event), which attempts to lockthis
(the SqlInternalConnectionTds).So thread 1 locks the SqlInternalConnectionTds and then the InternalTransaction, while thread 2 - which runs concurrently - locks InternalTransaction and then SqlInternalConnectionTds. This produces a deadlock.
I'm not an expert in this code, but moving the call to
enlistment.Committed()
(code) outside of the lock could be a way to resolve this deadlock.Note that I couldn't reproduce the deadlock in .NET Framework; there's likely some timing differences which make the deadlock manifest far more frequently on .NET Core, but AFAICT the bug is in Framework as well. Note that this repro uses SinglePhaseCommit - only one connection is enlisted to the TransactionScope in the application. I recommend also checking having two connections to force 2PC - the same bug could be present in that flow as well.
Many thanks to @nathangreaves for providing the original repro code.
/cc @ajcvickers
The text was updated successfully, but these errors were encountered: