Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deadlock in WriteToServerAsync (possible reentrancy issue) #755

Closed
coderb opened this issue Oct 9, 2020 · 11 comments
Closed

Deadlock in WriteToServerAsync (possible reentrancy issue) #755

coderb opened this issue Oct 9, 2020 · 11 comments

Comments

@coderb
Copy link
Contributor

coderb commented Oct 9, 2020

(version nuget 4.7.0)

I'm not sure sure if you'll consider this a bug or by design, but there seems to be an inconsistency between the sync and async versions of bulk copy- WriteToServer and WriteToServerAsync.

When executing inside a transaction, the sync version WriteToServer allows for commands to be executed during the call to WriteToServer but prior to the first bulk inserted row.

When using the async version WriteToServerAsync if you try to execute a command even prior to the first row being sent it results in a deadlock (waiting on a semphore).

Unfortunately I'm porting a large amount of code to use async and this is going to be a big job to address. Is this something that you would consider addresing in SqlClient?

@karinazhou
Copy link
Member

Hi @coderb ,
Do you have MARS enabled in your connection string when doing bulk copy? If MARS is enabled for your case, could you try to disable it to see how things are going?

We got a relevant issue #716 about WriteToServerAsync. And we are currently investigating it.

I notice that you are using nuget 4.7.0 which I think is the System.Data.SqlClient driver. Could you also try your application with Microsoft.Data.SqlClient v2.0.1 (latest stable version)? The source code for both drivers is slightly different and more fixes are applied to MDS.

And if possible, could you provide us a standalone repro application?

Thanks,

@coderb
Copy link
Contributor Author

coderb commented Oct 9, 2020

@karinazhou hi karina- nope, no MARS, in fact i didn't even know it existed.

so the actual version i'm using the is latest stable nuget 2.01. i misspoke in the original post.

do you consider what i am describing to be a bug or (assuming what i'm seeing is accurate) is it by design?

thanks

@karinazhou
Copy link
Member

@coderb It sounds like a bug in the driver if the same code works for WriteToServer but fails for WriteToServerAsync due to deadlock. In order to understand what your application is doing and see what triggers the deadlock in the driver, it would be helpful with the reproducible code if you can provide.

Thanks,

@coderb
Copy link
Contributor Author

coderb commented Oct 16, 2020

@karinazhou please see the reproduction at https://github.com/coderb/SqlClientBulkDeadlock

@karinazhou
Copy link
Member

@coderb Thank you for the repro. We will test it locally.

@karinazhou
Copy link
Member

Hi @coderb ,

I tried your repro app. Actually, the deadlock is not inside WriteToServerAsync(). If you comment out this piece of code:

            var sql = "select count(*) from " + TableName;
            var transConnection = transTransaction.Connection;
            var command1 = transConnection.CreateCommand();
            command1.Transaction = transTransaction;
            command1.CommandText = sql;
            int? existingCount = Unwrap<int>(await command1.ExecuteScalarAsync());

the RunAsync() works fine. The deadlock happens when command1.ExecuteScalarAsync() is called. I need to look deeper into this to see why this causes the deadlock.

Thanks,

@coderb
Copy link
Contributor Author

coderb commented Oct 27, 2020

@karinazhou the code you are commenting out is called as a result of the call to WriteToServerAsync.

@karinazhou
Copy link
Member

Hi @coderb ,

You are right. SqlBulkCopy.WriteToServerAsync() requires the result returned by SqlCommand.ExecuteScalarAsync() in this case. I did more tests with your repro app:

  1. I tested it targeting .NET Framework after modifying a bit the code, it falls into the same deadlock as what we see in .NET Core.
  2. Instead of using a single transaction, I pass a single connection to both SqlBulkCopy and SqlCommand. I can see the same deadlock happening.
  3. I tried with two separate connections, one of which is for SqlBulkCopy and the other is for SqlCommand. Both connections are using the same connection string. This can avoid the deadlock and the application executes without issue.

Let me explain why the deadlock happens in an asynchronous scenario. For each SqlConnection, there is an internal SyncAsyncLock object (_parserLock) which has a SemaphoreSlim object. When the SqlBulkCopy.WriteToServerAsync() is called, we hit the SyncAsyncLock.Wait() method with canReleaseFromAnyThread = true.

https://github.com/dotnet/SqlClient/blob/master/src/Microsoft.Data.SqlClient/netcore/src/Microsoft/Data/SqlClient/SqlInternalConnectionTds.cs#L321

internal void Wait(bool canReleaseFromAnyThread)
{
      Monitor.Enter(_semaphore); // semaphore is used as lock object, no relation to SemaphoreSlim.Wait/Release methods
      if (canReleaseFromAnyThread || _semaphore.CurrentCount == 0)
      {
          _semaphore.Wait();
          if (canReleaseFromAnyThread)
          {
              Monitor.Exit(_semaphore);
          }
          else
          {
              _semaphore.Release();
          }
      }
}

image

At this point, the worker thread takes the semaphore but won't release it since it needs the result from SqlCommand.ExecuteScalarAsync(). Let the program runs and it hit the SyncAsyncLock.Wait() again when executing SQLBatch:

image

You can see from the parallel stack, the second wait happens on the same thread which has already taken the semaphore. When it hits _semaphore.Wait(), it will deadlock.

For the synchronous scenario, it doesn't deadlock because the _semaphore.Wait(); is never called when canReleaseFromAnyThread = false.

@coderb
Copy link
Contributor Author

coderb commented Oct 30, 2020

so it does seem like this is a reentrancy issue where the database thread is trying to wait() on a semaphore that it has already acquired. is this something that can be fixed in the sqlclient?

i don't think doing the ExecuteScalarAsync from a different connection is an option since i'm trying to execute from inside a transaction.

@karinazhou
Copy link
Member

@coderb We have a PR to address the deadlock in your case. Before we can merge that, we need to make sure that there won't be any potential impact on the common scenarios where different connections are used for the data source and SqlBulkCopy instance.

@cheenamalhotra
Copy link
Member

Closing issue as PR #779 has been merged.
Fix will be available in v2.1.0 stable release.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants