
Support parallel Physical Connection creation for multi-threaded scenarios #408

Closed
joshmouch opened this issue Feb 4, 2020 · 33 comments · Fixed by #466
Labels
🐛 Bug! Issues that are bugs in the drivers we maintain. 📈 Performance Issues that are targeted to performance improvements.

Comments

@joshmouch

I'm trying to spin up 50 threads to immediately open 50 SqlConnections against a Microsoft SQL Server connection pool. However, no matter what I try, the SqlConnections seem to open only one at a time. In other words, if each connection takes 2 seconds to open, then the 50th thread's Open call takes 100 seconds to complete. Further to this point, if the Connection Timeout isn't set to more than 100 seconds, the connection attempts eventually start timing out. With the default Connection Timeout of 30, the 16th thread's Open times out.

To put this another way: the 1st SqlConnection.Open takes 2 seconds, the 2nd takes 4 seconds, the 3rd takes 6 seconds... and the 16th takes 32 seconds, which causes a timeout exception.

I also tried setting Min Pool Size=50 and opening just a single SqlConnection, and that shows the same behavior. There seems to be a background process that keeps opening connections until it reaches 50, but the 16th one times out because it exceeds the 30-second connection timeout.

It's like something internal is blocking, allowing only this one-at-a-time addition to the connection pool.
I've asked around on Stack Overflow, and the responses suggest that if you open 50 connections in parallel, they should all open.

So this makes me wonder whether this is a .NET Core bug or a "feature".

@joshmouch joshmouch changed the title ADO.Net connection pooling doesn't seem to support multiple threads well SqlConnection connection pooling doesn't seem to support multiple threads well Feb 4, 2020
@benaadams
Member

How are you spinning up the threads?

@joshmouch
Author

joshmouch commented Feb 4, 2020

@benaadams, I've tried this two ways: with and without threads. (Without threads, I just set "Min Pool Size=50" in the connection string and open a single connection, which triggers a background process that starts opening connections.)

But one of the iterations with threads is:


var tasks = new List<Task>();
var connections = new ConcurrentBag<SqlConnection>();
for (int i = 0; i < 50; i++)
{
	var task = new Task(() =>
	{
		var connection = new SqlConnection(sqlConnectionString);
		connections.Add(connection);
		connection.Open();
		var a = 1; // If I set breakpoint here, there is a 2 second delay between each time it gets hit.  I get a Connection Timeout before the 16th iteration.
	}, TaskCreationOptions.LongRunning);
	task.Start();
	tasks.Add(task);
}

Task.WaitAll(tasks.ToArray());

tasks = new List<Task>();
foreach (var connection in connections)
{
	var task = new Task(() =>
	{
		connection.Close();
	}, TaskCreationOptions.LongRunning);
	task.Start();
	tasks.Add(task);
}
Task.WaitAll(tasks.ToArray());

@joshmouch
Author

I should also add that I'm running the following SQL to monitor the connections, and I see them being created at a rate of 1 every 2 seconds with both methods (multi-threading, or a single thread with Min Pool Size set).

SELECT  des.program_name
      , des.login_name
      , des.host_name
      , COUNT(des.session_id) [Connections]
FROM    sys.dm_exec_sessions des
INNER JOIN sys.dm_exec_connections DEC
        ON des.session_id = DEC.session_id
WHERE   des.is_user_process = 1
        AND des.status != 'running'
GROUP BY des.program_name
      , des.login_name
      , des.host_name
HAVING  COUNT(des.session_id) > 2
ORDER BY COUNT(des.session_id) DESC

@joshmouch
Author

In the multi-threading scenario, if I look at the thread list once all 50 threads have started, most of them (48 or so) are waiting with the following stack trace:

Not Flagged		44212	41	Worker Thread	<No Name>	System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitMultiple
System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitMultiple(System.ReadOnlySpan<System.Threading.WaitHandle> waitHandles, bool waitAll, int millisecondsTimeout)
System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle[] waitHandles, bool waitAll, int millisecondsTimeout)
System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitAny(System.Threading.WaitHandle[] waitHandles, int millisecondsTimeout)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(System.Data.Common.DbConnection owningObject, uint waitForMultipleObjectsTimeout, bool allowCreate, bool onlyOneCheckConnection, Microsoft.Data.Common.DbConnectionOptions userOptions, out Microsoft.Data.ProviderBase.DbConnectionInternal connection)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(System.Data.Common.DbConnection owningObject, System.Threading.Tasks.TaskCompletionSource<Microsoft.Data.ProviderBase.DbConnectionInternal> retry, Microsoft.Data.Common.DbConnectionOptions userOptions, out Microsoft.Data.ProviderBase.DbConnectionInternal connection)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionFactory.TryGetConnection(System.Data.Common.DbConnection owningConnection, System.Threading.Tasks.TaskCompletionSource<Microsoft.Data.ProviderBase.DbConnectionInternal> retry, Microsoft.Data.Common.DbConnectionOptions userOptions, Microsoft.Data.ProviderBase.DbConnectionInternal oldConnection, out Microsoft.Data.ProviderBase.DbConnectionInternal connection)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionInternal.TryOpenConnectionInternal(System.Data.Common.DbConnection outerConnection, Microsoft.Data.ProviderBase.DbConnectionFactory connectionFactory, System.Threading.Tasks.TaskCompletionSource<Microsoft.Data.ProviderBase.DbConnectionInternal> retry, Microsoft.Data.Common.DbConnectionOptions userOptions)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionClosed.TryOpenConnection(System.Data.Common.DbConnection outerConnection, Microsoft.Data.ProviderBase.DbConnectionFactory connectionFactory, System.Threading.Tasks.TaskCompletionSource<Microsoft.Data.ProviderBase.DbConnectionInternal> retry, Microsoft.Data.Common.DbConnectionOptions userOptions)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlConnection.TryOpen(System.Threading.Tasks.TaskCompletionSource<Microsoft.Data.ProviderBase.DbConnectionInternal> retry)
Microsoft.Data.SqlClient.dll!Microsoft.Data.SqlClient.SqlConnection.Open()


@joshmouch
Author

If I use OpenAsync instead of Open, the stuck thread has a similar stack trace:

Not Flagged		62476	26	Worker Thread	<No Name>	System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitMultiple
System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitMultiple(System.ReadOnlySpan<System.Threading.WaitHandle> waitHandles, bool waitAll, int millisecondsTimeout)
System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitMultiple(System.Threading.WaitHandle[] waitHandles, bool waitAll, int millisecondsTimeout)
System.Private.CoreLib.dll!System.Threading.WaitHandle.WaitAny(System.Threading.WaitHandle[] waitHandles, int millisecondsTimeout)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionPool.TryGetConnection(System.Data.Common.DbConnection owningObject, uint waitForMultipleObjectsTimeout, bool allowCreate, bool onlyOneCheckConnection, Microsoft.Data.Common.DbConnectionOptions userOptions, out Microsoft.Data.ProviderBase.DbConnectionInternal connection)
Microsoft.Data.SqlClient.dll!Microsoft.Data.ProviderBase.DbConnectionPool.WaitForPendingOpen()
System.Private.CoreLib.dll!System.Threading.ThreadHelper.ThreadStart_Context(object state)
System.Private.CoreLib.dll!System.Threading.ExecutionContext.RunInternal(System.Threading.ExecutionContext executionContext, System.Threading.ContextCallback callback, object state)
System.Private.CoreLib.dll!System.Threading.ThreadHelper.ThreadStart()

@benaadams
Member

I think unless you are specifying MaxPoolSize=N in the connection string, it should default to 100 connections; so this is a bit strange. /cc @cheenamalhotra

You might want to open this issue in https://github.com/dotnet/runtime (or have it transferred)

@joshmouch
Author

I have MaxPoolSize=500 just in case that was a factor, but it wasn't.

@joshmouch
Author

joshmouch commented Feb 4, 2020

I should also note that if I turn connection pooling off, these 50 connections open almost all at the same time and then close almost all at the same time. So the client/server itself isn't a factor. In other words, opening 50 connections simultaneously poses no issue until pooling is enabled.

@cheenamalhotra
Member

We can transfer this issue to dotnet/sqlclient and investigate more.

@cheenamalhotra cheenamalhotra transferred this issue from dotnet/core Feb 4, 2020
@cheenamalhotra
Member

@joshmouch

I'm looking into the issue. Could you also test and confirm whether this issue is reproducible with previous System.Data.SqlClient versions or is only happening recently, to rule out a regression?

@Wraith2
Contributor

Wraith2 commented Feb 6, 2020

When you say you've tried it with and without threads, do you mean that you manually created 100 threads and used them to connect, or that you used the thread pool? With the thread pool directly, or with tasks (indirectly the thread pool, assuming ConfigureAwait(false)), you'll hit the rate limiter on thread creation through the pool, which injects roughly one new thread per second. That means you'd see blocking and slow creation of threads to service the queued requests. Even with 100 connection objects, if all the threads are busy (assuming you don't connect and go idle), you can't do more than the pool's thread count in parallel.
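One way to take the thread-pool injection throttle out of the picture is to raise the pool's minimum worker-thread count before starting the parallel opens. A minimal sketch; the value 50 is just an assumption matching the number of parallel opens in this repro, not a recommended setting:

```csharp
using System;
using System.Threading;

class ThreadPoolWarmup
{
    static void Main()
    {
        // The CLR thread pool injects threads beyond the current minimum at a
        // throttled rate (roughly one per second), which can serialize bursts
        // of blocking work items. Raising the minimum sidesteps that throttle.
        ThreadPool.GetMinThreads(out int minWorkers, out int minIocp);
        Console.WriteLine($"default min: workers={minWorkers}, iocp={minIocp}");

        // Allow up to 50 worker threads to start without the injection delay.
        ThreadPool.SetMinThreads(50, minIocp);
    }
}
```

If the opens still serialize after this, the bottleneck is in the driver or server rather than thread-pool starvation.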

@joshmouch
Author

What I mean by "without threads" is that if I simply set "Min Pool Size=50", then do this:

var connection1 = new SqlConnection(sqlConnectionString);
connection1.Open();
connection1.Close();
connection1.Dispose();
System.Threading.Thread.Sleep(60000);

Then while the Thread.Sleep is running, I can see the connections opening up 1 by 1 about every 2 seconds.

And what I mean by "with threading" is that I remove "Min Pool Size" from the connection string and then try to open 50 connections manually at the same time.

In answer to your question: with threads, it should not be using the thread pool, because I have the LongRunning option set. But just in case, I tried this as well, with the same result:


var threads = new List<Thread>();
var connections = new ConcurrentBag<SqlConnection>();
for (int i = 0; i < 50; i++)
{
	var thread = new Thread(() =>
	{
		var connection = new SqlConnection(sqlConnectionString);
		connections.Add(connection);
		connection.Open();
		var a = 1; // If I set a breakpoint here, there is a 2 second delay between each time it gets hit.  I get a Connection Timeout before the 16th iteration.
	});
	thread.Start();
	threads.Add(thread);
}

foreach (var thread in threads)
{
	thread.Join();
}

@cheenamalhotra
Member

Hi @joshmouch

I tried using your repro from the comment above but could not reproduce the problem. I ran the test on Windows with both Native and Managed SNI (the implementation used on Unix).

Could you provide the following info:

  • Connection String
  • Client OS
  • .NET Core version
  • SQL Server version

Please provide specific details as necessary to reproduce.

@joshmouch
Author

@cheenamalhotra ,
So with both cases, you saw 50 connections open up to the SQL server, immediately?

@Wraith2
Contributor

Wraith2 commented Feb 21, 2020

This is another case where a user is asking to create n (where n > 30) connections instantly. Unless those are n connections to n different database servers, it seems like something of an antipattern. What can a single process be doing where it's capable of feeding 50 queries at once?

@joshmouch
Author

@Wraith2 50 is an arbitrary number. I'm actually looking for around 10.

@cheenamalhotra
Member

408.zip

Attached is my repro. I ran it in debug mode, where we can toggle Managed SNI, and the logs didn't show any such behavior; all connections connect instantly.

Unless this is specific to Unix (I'm going to try that today), Windows doesn't reproduce this error with either Native or Managed SNI.

@Wraith2
Contributor

Wraith2 commented Feb 21, 2020

The Unix implementation could hit the same problem I identified in #422: if even a single threadpool thread is unavailable at the time the connections are initiated, one or more may block and fail.

@joshmouch
Author

joshmouch commented Mar 9, 2020

@cheenamalhotra ,
I ran your console project and got the the 2-second per connection problem. So you're saying that you don't see the connections each taking 2 seconds longer than the previous connection?

I had to update a few things:
You had a direct reference to "Microsoft.Data.SqlClient" that I had to replace with the latest NuGet package (v1.1.1).
Also, I had to modify the line that checks for the environment variable to allow nulls, because that environment variable doesn't exist on my machine.
Here's the output:

(screenshot: console output from the repro showing each connection opening roughly 2 seconds after the previous one)

@cheenamalhotra
Member

It's strange to see you getting different results. I also tested with a direct reference to M.D.S 1.1.1 and without the environment variable on Windows. It uses Native SNI (SNI.dll), which should work anyhow.

Did you also make any modifications to the test in the way you connect?

@joshmouch
Author

@cheenamalhotra , I only modified the connection string to point to my development SQL instance.

@cheenamalhotra
Member

I think your server is probably slow to connect here.

Anyway, when connecting to Azure DB I do see the behavior, but that's because the connections are not getting closed.

I'll try to explain how it works:

  1. There is only 1 TDS stream to connect to SQL Server with. When you open 50 tasks, the 1st of them creates a new physical connection to the server.
    1(a) If you do not close this connection, the second one in the queue will also create a physical connection, and so on.
    1(b) If you close the connection in the first task before the second one has grabbed one, the second finds it in the pool and reuses it without opening a new one (which is expensive in your case).
    1(c) If you close the connection, but not before the second task tries to get one, one more connection is needed to serve it, which ends up as a second physical connection.

This behavior is as expected: as I said, there is only 1 TDS stream to SQL Server that the driver communicates on, so opening 50 "physical" connections is going to be sequential.
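The serialization described above can be made visible by timestamping each Open against a shared stopwatch. This is a sketch, not part of the attached repro; `sqlConnectionString` is assumed to be a valid pooled connection string:

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Diagnostics;
using System.Threading;
using Microsoft.Data.SqlClient;

class OpenTimings
{
    static void Main()
    {
        string sqlConnectionString = "..."; // assumption: your pooled connection string
        var sw = Stopwatch.StartNew();
        var connections = new ConcurrentBag<SqlConnection>();
        var threads = new List<Thread>();

        for (int i = 0; i < 10; i++)
        {
            var thread = new Thread(() =>
            {
                var connection = new SqlConnection(sqlConnectionString);
                connection.Open();
                connections.Add(connection);
                // If physical creation is serialized, these timestamps climb in
                // roughly equal steps instead of clustering around one value.
                Console.WriteLine($"opened after {sw.ElapsedMilliseconds} ms");
            });
            thread.Start();
            threads.Add(thread);
        }

        threads.ForEach(t => t.Join());
        foreach (var connection in connections)
            connection.Dispose(); // return everything to the pool
    }
}
```

Because all connections are held until the end, none can be satisfied from the pool, so every Open needs its own physical connection.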

@joshmouch
Author

joshmouch commented Mar 9, 2020

@cheenamalhotra, you said you were getting different results. So were you actually seeing the same behavior against the local SQL server in your test?

And just to clarify: you're saying the driver is not capable of opening more than one physical connection at a time? So if you have "Min Pool Size=100" in the connection string, it'll take 200 seconds until the connections become available? And if you try to access them before those 200 seconds are up, one should expect a "Connection Timeout" exception?

@cheenamalhotra
Member

cheenamalhotra commented Mar 9, 2020

Yes, I think it was the same; it didn't surface in my case because connection time was in milliseconds, and all 50 connections could be made within 1 second.

You're saying that the driver is not capable of opening more than one physical connection at a time. So if you have "Min Pool Size=100" in connection string, it'll take 200 seconds until the connections become available

Yes, this is true. It will take 100 * (time to make 1 physical connection) to open 100 physical connections. This is the current design of the driver.

I've attached a log with milliseconds captured; you can see it takes a difference of 1-2 milliseconds to create a physical connection to a local SQL Server 2019, and 100-150 milliseconds to connect to Azure DB.
log.txt
The first connection needs some warm-up time, hence it's always higher.

But 2 seconds (2000 ms) is far too slow to connect, so you should investigate that case. And if you are opening 50 such slow connections back to back, it will take 100 seconds.

@cheenamalhotra
Member

Also, to add:

This is not Linux-specific; I've also tested this with the System.Data.Odbc and System.Data.OleDb namespaces, and the behavior is the same.

cc @saurabh500

@cheenamalhotra
Member

@joshmouch

Can you also give me the output of SELECT @@VERSION from your server? It will help me investigate whether an alternative is available, as I see behavioral differences between regular and Azure DBs across different client drivers.

@joshmouch
Author

@cheenamalhotra, I also tried a different server, and it no longer took 2 seconds per connection!
So while it's odd that these connect in sequence, the other server connects fast enough that it doesn't matter to me anymore for my unit tests.

@cheenamalhotra
Member

cheenamalhotra commented Mar 9, 2020

No problem @joshmouch

After further investigation with different cases, and comparing behaviors with other SQL client drivers, I don't see this as a limitation of the protocol. Some delay is possible in later connections due to server resource limitations, but the driver should be able to initiate connections in parallel.

I'll keep this issue open to investigate better ways of opening connections considering multi-threading scenarios like this at our end.

This is only applicable to Azure connections using federated authentication modes with MSAL.NET. The MSAL.NET library, which no other client drivers currently use, is also in doubt.

@cheenamalhotra cheenamalhotra changed the title SqlConnection connection pooling doesn't seem to support multiple threads well Support parallel Physical Connection creation for multi-threaded scenarios Mar 9, 2020
@cheenamalhotra cheenamalhotra added the 📈 Performance Issues that are targeted to performance improvements. label Mar 9, 2020
@cheenamalhotra
Member

cheenamalhotra commented Mar 11, 2020

Hi @joshmouch

The PR #466 fixes this issue for Azure connections. Please give the NuGet package attached below a try (it contains the PR changes):

Microsoft.Data.SqlClient.2.0.0-dev.nupkg.zip

<PackageReference Include="Microsoft.Data.SqlClient" Version="2.0.0-dev"/>

We will consider shipping this change in next release.

@cheenamalhotra cheenamalhotra added the 🐛 Bug! Issues that are bugs in the drivers we maintain. label Mar 11, 2020
@OQO

OQO commented Jun 11, 2020

@cheenamalhotra is the PR #466 fix already in Nuget package 2.0.0-preview4.20142.4 from 22 May 2020?

I am still encountering sequential rather than parallel connections to Azure SQL while using this package, similar to what you show in log.txt. Please let me know, so I know whether I should open a new issue with a clean repro.

@cheenamalhotra
Member

@OQO
Yes, please open a new issue with a repro if possible; that way we can verify faster.

@saurabh500
Contributor

saurabh500 commented Jun 12, 2020

@cheenamalhotra @OQO FWIW, this serial opening of concurrent connections for the same pool, even in multi-threaded scenarios, is intentional and by design of the connection pool.

From the past owners of SqlClient I understand that the idea was to not overwhelm a SQL server with a flood of connections, whether from a single connection pool that needs many connections or from many clients trying to connect to a single SQL server while a failover is in progress.
The recommendation is to use Min Pool Size in the connection string and allow the pool to warm up to that size after the first connection attempt is made. This is something that can be tried. After that, the connection pool will try to maintain Min Pool Size connections in the pool.

As for whether the behavior should be changed: it may be OK to relax the requirement for Azure SQL DBs, but it may be good to keep this behavior for on-prem SQL servers.
There are scenarios where a server fails over and multiple clients with connection pools are trying to log in to the same server. In those cases, we may not want the server overwhelmed by a slew of incoming connections.
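The Min Pool Size warm-up recommendation above looks roughly like this in practice. A sketch only: the server and database names are placeholders, and the 5-second wait is an arbitrary allowance for the background warm-up, not a documented figure:

```csharp
using System;
using System.Threading;
using Microsoft.Data.SqlClient;

class MinPoolSizeWarmup
{
    static void Main()
    {
        // Min Pool Size asks the pool to keep at least 10 physical connections
        // alive once the pool exists; it does not make creation parallel.
        string cs = "Server=myserver;Database=mydb;Integrated Security=true;" +
                    "Min Pool Size=10";

        using (var warmup = new SqlConnection(cs))
        {
            warmup.Open(); // first Open creates the pool and starts the warm-up
        }

        // The pool fills toward Min Pool Size in the background, one physical
        // connection at a time; give it a moment before demanding all of them.
        Thread.Sleep(TimeSpan.FromSeconds(5));
    }
}
```

Doing this warm-up at application startup moves the sequential-creation cost out of the latency-sensitive path.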

@OQO

OQO commented Jun 13, 2020

@cheenamalhotra Thank you. After I asked, I guessed the same, so I already did. Please see #601.
