Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Connection gets stuck in SNITCPHandle.TryConnectParallel #2192

Open
sksk571 opened this issue Oct 20, 2023 · 5 comments
Open

Connection gets stuck in SNITCPHandle.TryConnectParallel #2192

sksk571 opened this issue Oct 20, 2023 · 5 comments
Labels
⏳ Waiting for Customer We're waiting for your response :)

Comments

@sksk571
Copy link

sksk571 commented Oct 20, 2023

Describe the bug

SqlConnection.Open gets stuck in SNITCPHandle.TryConnectParallel and times out when there is no available threads in thread pool.

Exception message:

Microsoft.Data.SqlClient.SqlException (0x80131904): A network-related or instance-specific error occurred while establishing a connection to SQL Server. The server was not found or was not accessible. Verify that the instance name is correct and that SQL Server is configured to allow remote connections. (provider: TCP Provider, error: 40 - Could not open a connection to SQL Server: Could not open a connection to SQL Server)

Stack trace:

   at Microsoft.Data.SqlClient.SqlInternalConnection.OnError(SqlException exception, Boolean breakConnection, Action`1 wrapCloseInAction)
   at Microsoft.Data.SqlClient.TdsParser.ThrowExceptionAndWarning(TdsParserStateObject stateObj, Boolean callerHasConnectionLock, Boolean asyncClose)
   at Microsoft.Data.SqlClient.TdsParser.Connect(ServerInfo serverInfo, SqlInternalConnectionTds connHandler, Boolean ignoreSniOpenTimeout, Int64 timerExpire, SqlConnectionString connectionOptions, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean ignoreSniOpenTimeout, TimeoutTimer timeout, Boolean withFailover)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(ServerInfo serverInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString connectionOptions, SqlCredential credential, TimeoutTimer timeout)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(TimeoutTimer timeout, SqlConnectionString connectionOptions, SqlCredential credential, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance)
   at Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(DbConnectionPoolIdentity identity, SqlConnectionString connectionOptions, SqlCredential credential, Object providerInfo, String newPassword, SecureString newSecurePassword, Boolean redirectedUserInstance, SqlConnectionString userConnectionOptions, SessionData reconnectSessionData, Boolean applyTransientFaultHandling, String accessToken, DbConnectionPool pool)
   at Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(DbConnectionOptions options, DbConnectionPoolKey poolKey, Object poolGroupProviderInfo, DbConnectionPool pool, DbConnection owningConnection, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(DbConnectionPool pool, DbConnection owningObject, DbConnectionOptions options, DbConnectionPoolKey poolKey, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionPool.CreateObject(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.UserCreateRequest(DbConnection owningObject, DbConnectionOptions userOptions, DbConnectionInternal oldConnection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, UInt32 waitForMultipleObjectsTimeout, Boolean allowCreate, Boolean onlyOneCheckConnection, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(DbConnection owningObject, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(DbConnection owningConnection, TaskCompletionSource`1 retry, DbConnectionOptions userOptions, DbConnectionInternal oldConnection, DbConnectionInternal& connection)
   at Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(DbConnection outerConnection, DbConnectionFactory connectionFactory, TaskCompletionSource`1 retry, DbConnectionOptions userOptions)
   at Microsoft.Data.SqlClient.SqlConnection.TryOpen(TaskCompletionSource`1 retry, SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open(SqlConnectionOverrides overrides)
   at Microsoft.Data.SqlClient.SqlConnection.Open()

We have an ASP.NET HTTP application running in Alpine Linux in K8S. The application makes synchronous SQL requests to the database. During the peak load we sometimes experience a cascade of connection failures.

Dump files collected during the incident contain a number of threads waiting in SNITCPHandle.TryConnectParallel. This method uses sync-over-async to connect to multiple IP addresses in parallel. This, coupled with the thread pool exhaustion caused by a big amount of incoming requests, may be the reason for timeouts.

Stack trace from the dump file:

System.Threading.ManualResetEventSlim.Wait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.SpinThenBlockingWait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.InternalWaitCore(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.Wait(Int32, System.Threading.CancellationToken)
System.Threading.Tasks.Task.Wait(System.TimeSpan)
Microsoft.Data.SqlClient.SNI.SNITCPHandle.TryConnectParallel(System.String, Int32, System.TimeSpan, Boolean, Boolean ByRef, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.SNITCPHandle..ctor(System.String, Int32, Int64, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.SNIProxy.CreateTcpHandle(Microsoft.Data.SqlClient.SNI.DataSource, Int64, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.SNIProxy.CreateConnectionHandle(System.String, Boolean, Int64, Byte[] ByRef, Byte[][] ByRef, Boolean, Boolean, Boolean, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef)
Microsoft.Data.SqlClient.SNI.TdsParserStateObjectManaged.CreatePhysicalSNIHandle(System.String, Boolean, Int64, Byte[] ByRef, Byte[][] ByRef, Boolean, Boolean, Boolean, Microsoft.Data.SqlClient.SqlConnectionIPAddressPreference, System.String, Microsoft.Data.SqlClient.SQLDNSInfo ByRef, Boolean)
Microsoft.Data.SqlClient.TdsParser.Connect(Microsoft.Data.SqlClient.ServerInfo, Microsoft.Data.SqlClient.SqlInternalConnectionTds, Boolean, Int64, Boolean, Boolean, Boolean, Boolean, Microsoft.Data.SqlClient.SqlAuthenticationMethod)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.AttemptOneLogin(Microsoft.Data.SqlClient.ServerInfo, System.String, System.Security.SecureString, Boolean, Microsoft.Data.ProviderBase.TimeoutTimer, Boolean)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.LoginNoFailover(Microsoft.Data.SqlClient.ServerInfo, System.String, System.Security.SecureString, Boolean, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, Microsoft.Data.ProviderBase.TimeoutTimer)
Microsoft.Data.SqlClient.SqlInternalConnectionTds.OpenLoginEnlist(Microsoft.Data.ProviderBase.TimeoutTimer, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, System.String, System.Security.SecureString, Boolean)
Microsoft.Data.SqlClient.SqlInternalConnectionTds..ctor(Microsoft.Data.ProviderBase.DbConnectionPoolIdentity, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SqlCredential, System.Object, System.String, System.Security.SecureString, Boolean, Microsoft.Data.SqlClient.SqlConnectionString, Microsoft.Data.SqlClient.SessionData, Boolean, System.String, Microsoft.Data.ProviderBase.DbConnectionPool)
Microsoft.Data.SqlClient.SqlConnectionFactory.CreateConnection(Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.Common.DbConnectionPoolKey, System.Object, Microsoft.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions)
Microsoft.Data.ProviderBase.DbConnectionFactory.CreatePooledConnection(Microsoft.Data.ProviderBase.DbConnectionPool, System.Data.Common.DbConnection, Microsoft.Data.Common.DbConnectionOptions, Microsoft.Data.Common.DbConnectionPoolKey, Microsoft.Data.Common.DbConnectionOptions)

To reproduce

The following code reproduces the issue in Ubuntu 22.04 running in WSL. MultiSubnetFailover=True switches SNITCPHandle to use TryConnectParallel and triggers the bug.

using Microsoft.Data.SqlClient;

// Create thread pool exhaustion condition
List<Task> tasks = new();
for (int i = 0; i < 10000; ++i)
    tasks.Add(Waiter());

Console.WriteLine("Press any key to execute query, press q to quit...");
// Run synchronous query
while (Console.ReadKey().KeyChar != 'q')
    Console.WriteLine(Version());

async Task Waiter()
{
    await Task.Delay(100);
    Task.Delay(Timeout.Infinite).Wait();
}

string Version()
{
    using var conn = new SqlConnection("Server=tcp:127.0.0.1,1433;User ID=sa;Password=Password1;Trust server certificate=True;MultiSubnetFailover=True");
    conn.Open();
    var cmd = conn.CreateCommand();
    cmd.CommandText = "SELECT @@VERSION AS Version";
    var r = cmd.ExecuteReader();
    if (r.Read())
    {
        return (string)r["Version"];
    }
    return string.Empty;
}

Expected behavior

Synchronous SqlConnection.Open should be able to connect to SQL server regardless of the current ThreadPool usage.

Further technical details

Microsoft.Data.SqlClient version: 5.1.1
.NET target: .NET6
SQL Server version: SQL Server 2022
Operating system: Alpine 3.18 in a Docker container

Additional context
SQL server for the repro case was installed in Docker using the following command

docker run -e "ACCEPT_EULA=Y" -e "MSSQL_SA_PASSWORD=Password1" -p 1433:1433 -d mcr.microsoft.com/mssql/server:2022-latest
@JRahnama
Copy link
Member

JRahnama commented Oct 20, 2023

@sksk571 this seems similar to issue #422 on the root cause. I can suggest to use MaxPoolsize of 20 or limiting available threads on the application.

Correction: Try increasing the minimum available threads.

@JRahnama JRahnama added ⏳ Waiting for Customer We're waiting for your response :) and removed untriaged labels Oct 24, 2023
@sksk571
Copy link
Author

sksk571 commented Mar 12, 2024

What I find strange in this scenario is why SNITCPHandle uses TryConnectParallel even when server hostname resolves into just one IP. Can synchronous Connect be used in this case?

@David-Engel
Copy link
Contributor

What I find strange in this scenario is why SNITCPHandle uses TryConnectParallel even when server hostname resolves into just one IP. Can synchronous Connect be used in this case?

Your connection string is specifying MultiSubnetFailover=True yet also specifying a single IP address (Server=tcp:127.0.0.1,1433). Since there aren't multiple IPs to try, MultiSubnetFailover isn't needed. Turn it off so that the TryConnectParallel path isn't used.

@sksk571
Copy link
Author

sksk571 commented Apr 17, 2024

This is just an example to reproduce the bug. In production we use multi subnet failover as a part of our DR strategy and the server hostname resolves into multiple IPs there.

@JRahnama increasing MinThreads didn't work in a linked issue why do you suggest to try it here? As I understand, increasing MinThreads has a negative performance impact because threads are created more often.

@adc-cjewett
Copy link

I believe we're running into an issue similar to this and our usage of MultiSubnetFailover with a server hostname that resolves to multiple IPs matches what @sksk571 does. Seems to only happen on Linux as well.

Are there other options to resolve this issue?

Message: Unobserved task exception
Exception: System.AggregateException: A Task's exception(s) were not observed either by Waiting on the Task or accessing its Exception property. As a result, the unobserved exception was rethrown by the finalizer thread. (Connection timed out)  ---> System.Net.Sockets.SocketException (110): Connection timed out
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.ThrowException(SocketError error, CancellationToken cancellationToken)
   at System.Net.Sockets.Socket.AwaitableSocketAsyncEventArgs.System.Threading.Tasks.Sources.IValueTaskSource.GetResult(Int16 token)
   at System.Threading.Tasks.ValueTask.ValueTaskSourceAsTask.<>c.<.cctor>b__4_0(Object state)
--- End of stack trace from previous location ---
   at System.Data.SqlClient.SNI.SNITCPHandle.ParallelConnectHelper(Socket socket, Task connectTask, TaskCompletionSource`1 tcs, StrongBox`1 pendingCompleteCount, StrongBox`1 lastError, List`1 sockets)
   --- End of inner exception stack trace ---

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
⏳ Waiting for Customer We're waiting for your response :)
Projects
Status: Needs More Info
Development

No branches or pull requests

4 participants