Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix async deadlock issue when sending attention fails due to network failure #1766

Merged
merged 1 commit into from
Oct 4, 2022

Conversation

cheenamalhotra
Copy link
Member

@cheenamalhotra cheenamalhotra commented Sep 17, 2022

In an edge case scenario, when a network failure occurs when drives attempts to send an attention signal, the driver fails to throw exception and terminate connection, leaving the connection open indefinitely, when async APIs are used to query the database. This can cause applications to go into hang due to driver getting into a deadlock situation as the async callbacks won't clear.

The issue is associated with the synchronous timeout flow that kicks off on ReadAsyncCallback, which will only be executed when timeout threads are not able to acquire thread from Threadpool and execute until callback returns on the other thread and finds out that the timeout is expired but attention is not yet triggered.

This PR fixes the issue by enabling 'asyncClose' flag on ThrowExceptionAndWarning methods when performing timeout activity via 'ReadAsyncCallback" as the native SNI needs to clear the callback on termination of connection. This was a missed opportunity in PR #906 that caused this edge case scenario to affect a few customers.

I tried adding tests, but it won't be possible to neatly add a test, without causing a lot of code changes, therefore skipping here. Linking repro below which can be easily executed and tested with the PR.

Link to Repro App: https://gist.github.com/cheenamalhotra/94d501e093a0392520fe2095c4bb7491

@codecov
Copy link

codecov bot commented Sep 17, 2022

Codecov Report

Base: 71.59% // Head: 71.35% // Decreases project coverage by -0.24% ⚠️

Coverage data is based on head (85dcba9) compared to base (7d348eb).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1766      +/-   ##
==========================================
- Coverage   71.59%   71.35%   -0.25%     
==========================================
  Files         293      293              
  Lines       61376    61376              
==========================================
- Hits        43940    43792     -148     
- Misses      17436    17584     +148     
Flag Coverage Δ
addons 92.38% <ø> (ø)
netcore 74.95% <100.00%> (-0.27%) ⬇️
netfx 69.19% <100.00%> (-0.14%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files Coverage Δ
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 81.57% <100.00%> (-1.33%) ⬇️
...c/Microsoft/Data/SqlClient/TdsParserStateObject.cs 81.58% <100.00%> (-0.51%) ⬇️
...ata/SqlClient/SqlConnectionTimeoutErrorInternal.cs 30.35% <0.00%> (-11.61%) ⬇️
...ActiveDirectoryAuthenticationTimeoutRetryHelper.cs 63.63% <0.00%> (-4.55%) ⬇️
...Microsoft/Data/ProviderBase/DbConnectionFactory.cs 17.04% <0.00%> (-3.41%) ⬇️
...crosoft/Data/ProviderBase/DbReferenceCollection.cs 92.63% <0.00%> (-3.16%) ⬇️
...e/src/Microsoft/Data/SqlClient/SNI/SNITcpHandle.cs 61.64% <0.00%> (-1.86%) ⬇️
...core/src/Microsoft/Data/SqlClient/SqlConnection.cs 77.31% <0.00%> (-1.21%) ⬇️
...c/Microsoft/Data/SqlClient/SqlConnectionFactory.cs 68.85% <0.00%> (-0.82%) ⬇️
... and 11 more

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

Copy link
Contributor

@DavoudEshtehari DavoudEshtehari left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 🎉

@@ -3603,7 +3604,7 @@ private Task SNIWritePacket(SNIHandle handle, SNIPacket packet, out UInt32 sniEr
{
SqlClientEventSource.Log.TryTraceEvent("<sc.TdsParser.WritePacket|Info> write async returned error code {0}", (int)sniError);
AddError(_parser.ProcessSNIError(this));
ThrowExceptionAndWarning(callerHasConnectionLock);
ThrowExceptionAndWarning(callerHasConnectionLock, false);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@cheenamalhotra Did we purposely call ThrowExceptionAndWarning with the asyncClose parameter hard coded to false here? In the same line in netcore, we passed the local value of asyncClose to the same parameter. (Line 3573 in the netcore TdsParserStateObject.cs file above in this PR.)

This seems like a mistake and I've changed this line to match netcore in PR #2073.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants