-
Notifications
You must be signed in to change notification settings - Fork 197
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Losing connection to sql server database is not always causing the lock to be lost #133
Comments
@OskarKlintrot thanks for your interest in the library. A few things to look into:
|
.AddSingleton<IDistributedLockProvider>(provider =>
new SqlDistributedSynchronizationProvider(provider.GetRequiredService<IConfiguration>().GetConnectionString(connectionStringName))
)
I'll update my original post with your feedback. |
Here's the log after 5-ish restarts: 02:17:52 info: Microsoft.Hosting.Lifetime[0] Application started. Press Ctrl+C to shut down.
02:17:52 info: Microsoft.Hosting.Lifetime[0] Hosting environment: Development
02:17:52 info: Microsoft.Hosting.Lifetime[0] Content root path: --------------------
02:17:53 info: Samples.Job1[0] Acquired cancelable lock.
02:17:53 info: Samples.Job1[0] Doing stuff.
02:18:23 info: Samples.Job1[0] Cancellation is not requested.
02:18:23 info: Samples.Job1[0] Doing stuff.
02:18:23 info: Samples.Job1[0] Cancellation is not requested.
02:18:23 fail: Samples.Job2[0] Exception Microsoft.Data.SqlClient.SqlException thrown.
02:18:23 fail: Samples.Job1[0] Exception System.InvalidOperationException thrown.
02:18:23 info: Samples.Job1[0] Will try to acquire lock.
02:18:23 info: Samples.Job2[0] Will try to acquire lock.
02:18:31 info: Samples.Job2[0] Acquired cancelable lock.
02:18:31 info: Samples.Job2[0] Doing stuff.
02:18:41 info: Samples.Job2[0] Cancellation is not requested.
02:18:41 fail: Samples.Job1[0] Exception Microsoft.Data.SqlClient.SqlException thrown.
02:18:42 info: Samples.Job1[0] Will try to acquire lock.
02:18:42 fail: Samples.Job2[0] Exception System.InvalidOperationException thrown.
02:18:42 info: Samples.Job2[0] Will try to acquire lock.
02:18:50 info: Samples.Job2[0] Acquired cancelable lock.
02:18:50 info: Samples.Job2[0] Doing stuff.
02:18:59 info: Samples.Job2[0] Cancellation is not requested.
02:18:59 fail: Samples.Job1[0] Exception Microsoft.Data.SqlClient.SqlException thrown.
02:18:59 info: Samples.Job1[0] Will try to acquire lock.
02:18:59 fail: Samples.Job2[0] Exception System.InvalidOperationException thrown.
02:18:59 info: Samples.Job2[0] Will try to acquire lock.
02:19:07 info: Samples.Job2[0] Acquired cancelable lock.
02:19:07 info: Samples.Job2[0] Doing stuff.
02:19:18 info: Samples.Job2[0] Cancellation is not requested.
02:19:18 fail: Samples.Job1[0] Exception Microsoft.Data.SqlClient.SqlException thrown.
02:19:18 info: Samples.Job1[0] Will try to acquire lock.
02:19:18 fail: Samples.Job2[0] Exception System.InvalidOperationException thrown.
02:19:18 info: Samples.Job2[0] Will try to acquire lock.
02:19:26 info: Samples.Job1[0] Acquired cancelable lock.
02:19:26 info: Samples.Job1[0] Doing stuff.
02:19:39 fail: Samples.Job2[0] Exception Microsoft.Data.SqlClient.SqlException thrown.
02:19:39 info: Samples.Job2[0] Will try to acquire lock.
02:19:48 info: Samples.Job2[0] Acquired cancelable lock.
02:19:48 info: Samples.Job2[0] Doing stuff.
02:19:56 info: Samples.Job1[0] Cancellation is not requested.
02:19:56 info: Samples.Job1[0] Doing stuff.
02:20:18 info: Samples.Job2[0] Cancellation is not requested.
02:20:18 info: Samples.Job2[0] Doing stuff. |
Thanks for the updates @OskarKlintrot ! Just a few more questions:
|
03:00:18 fail: Sample.App.Job2[0] Exception Microsoft.Data.SqlClient.SqlException thrown:
Microsoft.Data.SqlClient.SqlException (0x80131904): Cannot continue the execution because the session is in the kill state. A severe error occurred on the current command. The results, if any, should be discarded.
at Microsoft.Data.SqlClient.SqlConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
at Microsoft.Data.SqlClient.TdsParser.TryRun(RunBehavior runBehavior, SqlCommand cmdHandler, SqlDataReader dataStream, BulkCopySimpleResultSet bulkCopyHandler, TdsParserStateObject stateObj, Boolean& dataReady)
at Microsoft.Data.SqlClient.SqlCommand.FinishExecuteReader(SqlDataReader ds, RunBehavior runBehavior, String resetOptionsString, Boolean isInternal, Boolean forDescribeParameterEncryption, Boolean shouldCacheForAlwaysEncrypted)
at Microsoft.Data.SqlClient.SqlCommand.CompleteAsyncExecuteReader(Boolean isInternal, Boolean forDescribeParameterEncryption)
at Microsoft.Data.SqlClient.SqlCommand.InternalEndExecuteNonQuery(IAsyncResult asyncResult, Boolean isInternal, String endMethod)
at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryInternal(IAsyncResult asyncResult)
at Microsoft.Data.SqlClient.SqlCommand.EndExecuteNonQueryAsync(IAsyncResult asyncResult)
at System.Threading.Tasks.TaskFactory`1.FromAsyncCoreLogic(IAsyncResult iar, Func`2 endFunction, Action`1 endAction, Task`1 promise, Boolean requiresSynchronization) --- End of stack trace from previous location ---
at Medallion.Threading.Internal.Data.DatabaseCommand.InternalExecuteAndPropagateCancellationAsync[TState,TResult](TState state, Func`3 executeAsync, CancellationToken cancellationToken, Boolean isConnectionMonitoringQuery) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\DatabaseCommand.cs:line 165
at Medallion.Threading.Internal.Data.DatabaseCommand.ExecuteAsync[TResult](Func`3 executeAsync, Func`2 executeSync, CancellationToken cancellationToken, Boolean disallowAsyncCancellation, Boolean isConnectionMonitoringQuery) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\DatabaseCommand.cs:line 92
at Medallion.Threading.SqlServer.SqlApplicationLock.ExecuteAcquireCommandAsync(DatabaseConnection connection, String lockName, TimeoutValue timeout, CancellationToken cancellationToken) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.SqlServer\SqlApplicationLock.cs:line 64
at Medallion.Threading.SqlServer.SqlApplicationLock.Medallion.Threading.Internal.Data.IDbSynchronizationStrategy<System.Object>.TryAcquireAsync(DatabaseConnection connection, String resourceName, TimeoutValue timeout, CancellationToken cancellationToken) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.SqlServer\SqlApplicationLock.cs:line 47
at Medallion.Threading.Internal.Data.MultiplexedConnectionLock.TryAcquireAsync[TLockCookie](String name, TimeoutValue timeout, IDbSynchronizationStrategy`1 strategy, TimeoutValue keepaliveCadence, CancellationToken cancellationToken, Boolean opportunistic)
at Medallion.Threading.Internal.Data.MultiplexedConnectionLock.TryAcquireAsync[TLockCookie](String name, TimeoutValue timeout, IDbSynchronizationStrategy`1 strategy, TimeoutValue keepaliveCadence, CancellationToken cancellationToken, Boolean opportunistic) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\MultiplexedConnectionLock.cs:line 94
at Medallion.Threading.Internal.Data.MultiplexedConnectionLockPool.TryAcquireAsync[TLockCookie](String connectionString, String name, TimeoutValue timeout, IDbSynchronizationStrategy`1 strategy, TimeoutValue keepaliveCadence, CancellationToken cancellationToken) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\MultiplexedConnectionLockPool.cs:line 91
at Medallion.Threading.Internal.Data.MultiplexedConnectionLockPool.TryAcquireAsync[TLockCookie](String connectionString, String name, TimeoutValue timeout, IDbSynchronizationStrategy`1 strategy, TimeoutValue keepaliveCadence, CancellationToken cancellationToken) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\MultiplexedConnectionLockPool.cs:line 97
at Medallion.Threading.Internal.DistributedLockHelpers.Wrap[THandle](ValueTask`1 handleTask, Func`2 factory) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\DistributedLockHelpers.cs:line 41
at Medallion.Threading.Internal.DistributedLockHelpers.ThrowTimeoutIfNull[T](ValueTask`1 task, String object) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\DistributedLockHelpers.cs:line 142
at Medallion.Threading.Internal.Helpers.Convert[TDerived,TBase](ValueTask`1 task, ValueTaskConversion _) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Helpers.cs:line 24
at Sample.App.JobBase.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\OskarKlintrot\source\repos\Sample\Sample.App\Workers.cs:line 41
at Sample.App.JobBase.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\OskarKlintrot\source\repos\Sample\Sample.App\Workers.cs:line 65 ClientConnectionId:7b765ec6-eaef-4a74-b8ee-5214c20ed456 Error Number:596,State:1,Class:21
03:00:18 fail: Sample.App.Job1[0] Exception System.InvalidOperationException thrown:
System.InvalidOperationException: BeginExecuteNonQuery requires an open and available Connection. The connection's current state is closed.
at Microsoft.Data.SqlClient.SqlCommand.ValidateCommand(Boolean isAsync, String method)
at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQuery(TaskCompletionSource`1 completion, Boolean sendToPipe, Int32 timeout, Boolean& usedCache, Boolean asyncWrite, Boolean inRetry, String methodName)
at Microsoft.Data.SqlClient.SqlCommand.BeginExecuteNonQueryInternal(CommandBehavior behavior, AsyncCallback callback, Object stateObject, Int32 timeout, Boolean inRetry, Boolean asyncWrite)
at Microsoft.Data.SqlClient.SqlCommand.BeginExecuteNonQueryAsync(AsyncCallback callback, Object stateObject)
at System.Threading.Tasks.TaskFactory`1.FromAsyncImpl(Func`3 beginMethod, Func`2 endFunction, Action`1 endAction, Object state, TaskCreationOptions creationOptions)
at System.Threading.Tasks.TaskFactory`1.FromAsync(Func`3 beginMethod, Func`2 endMethod, Object state)
at Microsoft.Data.SqlClient.SqlCommand.InternalExecuteNonQueryAsync(CancellationToken cancellationToken) --- End of stack trace from previous location ---
at Medallion.Threading.Internal.Data.DatabaseCommand.ExecuteAsync[TResult](Func`3 executeAsync, Func`2 executeSync, CancellationToken cancellationToken, Boolean disallowAsyncCancellation, Boolean isConnectionMonitoringQuery) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\DatabaseCommand.cs:line 88
at Medallion.Threading.SqlServer.SqlApplicationLock.ExecuteReleaseCommandAsync(DatabaseConnection connection, String lockName, Boolean isTry) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.SqlServer\SqlApplicationLock.cs:line 71
at Medallion.Threading.Internal.Data.MultiplexedConnectionLock.ReleaseAsync[TLockCookie](IDbSynchronizationStrategy`1 strategy, String name, TLockCookie lockCookie) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\MultiplexedConnectionLock.cs:line 159
at Medallion.Threading.Internal.Data.MultiplexedConnectionLock.ReleaseAsync[TLockCookie](IDbSynchronizationStrategy`1 strategy, String name, TLockCookie lockCookie) in C:\Users\mikea_000\Documents\Interests\CS\DistributedLock\DistributedLock.Core\Internal\Data\MultiplexedConnectionLock.cs:line 173
at Sample.App.JobBase.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\OskarKlintrot\source\repos\Sample\Sample.App\Workers.cs:line 65
at Sample.App.JobBase.ExecuteAsync(CancellationToken stoppingToken) in C:\Users\OskarKlintrot\source\repos\Sample\Sample.App\Workers.cs:line 65
I made a repo with a fully working sample: https://github.com/OskarKlintrot/DistributedLock-bug |
Ok using your example code I was able to repro locally and I have a fix which seems to resolve the issue for me. I've published a prerelease version with my fix. Can you install it locally and let me know whether or not this resolves the issue for you? As for what I changed: Basically in
I ran into this too. The reason you don't see it is that when the cancellation triggers it completes the
I still think you'll generally want usings with your registrations; just have to be careful depending on the particular pattern! |
I installed the prerelease version and restarted the server 10 times (
Oh, that explains it! Nice catch!
Thanks for the explanation, which makes sense! What I did instead is basically this: CancellationTokenRegistration? cancellationTokenRegistration = null;
try
{
cancellationTokenRegistration = handle.HandleLostToken.Register(() => _logger.LogError("Lost lock."));
}
catch (Exception exception)
{ }
finally
{
if (cancellationTokenRegistration.HasValue)
{
await cancellationTokenRegistration.Value.DisposeAsync();
}
} |
I just finished a little experiment. I raised the time before retrying to require a lock to 10 sec so the jobs would take turn picking the lock. I also raised the time before restarting the SQL Server to 60 sec. Then I restarted the SQL Server 100 times and everything still looks perfectly fine: (I used a static field to count the lost locks, hence 100 times) |
What's the plan forward btw? Should I use the pre-release package or will you release a stable version soon? Or will you bump |
Awesome thanks for the rigorous testing!
I plan to release this as a stable patch fix to DL.Core and to all of the other SQL-based libraries (including DL.SqlServer). I may bundle in some other small fixes as well after reviewing the backlog. While I expect to have this out relatively soon (should have some time to prioritize this weekend), I'd suggest that you use the prerelease version in the meantime so that you are unblocked. |
Awesome, thanks for a very quick fix! Looking forward to the stable patch fix! I'll use the prerelease until then. |
If I run my code (simplified below) with two different instances locally (
dotnet run
or VS for the first anddotnet run --no-build
for the second) and restart my db a couple of times odds are quite high that eventually an instance will just keep going without the lock (in the while-loop). It could be as few restarts as 3-4 restarts. My understanding where that when you accesshandle.HandleLostToken.IsCancellationRequested
it will make sure that the lock still exists?Or is that not working together withI just tried withoutCancellationTokenSource.CreateLinkedTokenSource
?CancellationTokenSource.CreateLinkedTokenSource
and get the same result. Not sure what is going on...Update: I can replicate it with just one instance (even if the log gets a bit tricker to follow then):
The text was updated successfully, but these errors were encountered: