Consider a better design for ADO.NET connection pooling #24856

roji · 2018-01-31T19:59:40Z

In today's ADO.NET, connection pooling is purely an internal concern of ADO providers - no pooling API is exposed to the user in any way. It seems to be an unwritten contract that providers are supposed to pool by default, and disable pooling if the Pooling=false connection string parameter is used. The following is a (too lengthy) analysis of possible directions to get the conversation started.

Disadvantages of the current model

Since the pool isn't exposed, DbConnection.Open() must look up the pool internally, keyed on the connection string. A naïve implementation using a dictionary would add a dictionary string lookup for each pooled open, which is a significant overhead.
Pooling has to be reimplemented (efficiently!) by each provider.
The provider-implemented pool is forced upon the user - not really possible to have competing pool implementations.
Since pooling is internal to the provider, DbConnection is typically a lightweight façade to a more heavyweight internal "physical connection" class. The relationship and bookkeeping between these two objects adds complexity (and possibly some perf hit). The DbConnection façade is also newed-up and disposed, adding a needless allocation.

Option 1: An ADO.NET user-facing pool API

The connection string lookup imposed by the internal pooling mechanism could be mitigating by adding a simple, user-facing API. User code would look something like this:

var factory = DbProviderFactories.GetFactory(providerName);
var connFactory = factory.GetConnectionFactory(connectionString);
using (var conn = connFactory.GetConnection()) { ... }

Regarding the naming, instead of GetConnectionFactory():

We could have GetPool(), although seems to imply that all providers implement pooling, but some shouldn't (e.g. in-memory DBs or sqlite)
We could have GetDatabase(), although databases and pools aren't the same thing (e.g. you can have multiple pools for the same database0.
Maybe GetConnectionProvider()?
ConnectionFactory is abstract enough that it could be used for other purposes (e.g. a wrapper of other connection factories which returns connections that allow interception of certain events, or whatever).
JDBC calls it DataSource which isn't too bad IMHO.

Option 2: A complete pooling API (not just user facing)

The above only provides a user-facing abstraction which allows avoiding the lookup (and possibly opens up possibility for composing connection provider), but the pool itself is still implemented by each provider. We could go further and provide a more complete abstraction that allows writing connection pools. This would allow pooling implementations which aren't connected to any specific provider (and which could be used with any provider). Users would be able to select a 3rd-party pooling implementation which fits their specific application needs, rather than being locked into a single, provider-implemented pool.

This mainly has to do with defining standard public APIs for communication between the ADO provider and the pool in use. For example, when a connection is returned to the pool its state may need to be reset in a provider-specific, the DbConnection would have to expose that API. There may be other things as well.

If we go down this route, we could also optionally provide a provider-independent, highly efficient connection pool that could be used instead of the providers' ones.

Comparison with JDBC

JDBC has had competing pool implementations for a very long time, which can be used with any JDBC database provider. A nice inspiration is https://brettwooldridge.github.io/HikariCP/, which claims to be the highest-performing pool implementation. The JDBC API includes several abstractions for manaing pooling (ConnectionPoolDataSource, PooledConnection), although more research is needed to understand exactly how the pooling model operates there.

Things to keep in mind

Connections enlisted in a TransactionScope and which are closed before the TransactionScope is disposed: they cannot be reused until the transaction commits/rolls back, and need to be returned if the user opens another connection with the same database and transaction to avoid escalation to a distributed transaction. This would all have some impact on the pooling.
Including/omitting authentication information in the key. When using "integrated security" the username can be omitted, I know SqlServer has some provisions for including that information alongside the connection string in the key.
Ability to include some arbitrary data on the pool. For example, Npgsql caches the password-less connection string on the pool (Persist Security Info=false). There could be other things.

/cc @anpete @ajcvickers @divega @davidfowl

Edit by @roji 15/3: Added "things to keep in mind" with some new points.

The text was updated successfully, but these errors were encountered:

divega · 2018-02-08T02:15:58Z

System.Data Triage: This looks like a good idea, although we are not going to implement it right away. Re option 2, we should consider providing an implicit connection pool that implements know best practices, e.g. to avoid locking and yield good performance, so that ADO.NET providers can reuse it. However that would probably either require defining a public type for the underlying connection object or having the pool be general purpose and not an ADO.NET thing.

Moving to Future for now.

The PoolManager is responsible for looking up a connector pool given a connection string. The previous implementation used a ConcurrentDictionary, but 99% of all apps will have a single pool and the lookup is needlessly expensive. Reimplemented PoolManager as an append-only array of tuples where reading is a lock-free operation (but not adding). Note that this is needed only because ADO.NET doesn't have a proper user-facing API which exposes pooling to the user, otherwise the user could simply take a reference to the pool and avoid this (complex) lookup altogether. See: https://github.com/dotnet/corefx/issues/26714

ConnectorPool is now fully lock-free - benchmarks showed consierable contention on the lock in the previous implementation in highly parallel scenarios. Also reimplemented PoolManager. The previous implementation used a ConcurrentDictionary, but 99% of all apps will have a single pool and the lookup is needlessly expensive. It is now an append-only array of tuples where reading is a lock-free operation (but not adding). Note that this is needed only because ADO.NET doesn't have a proper user-facing API which exposes pooling to the user, otherwise the user could simply take a reference to the pool and avoid this (complex) lookup altogether. See: https://github.com/dotnet/corefx/issues/26714 Also corrected some inaccurate performance counter updating. Closes #1839

Grauenwolf · 2019-01-11T02:04:41Z

If we go down this route, we could also optionally provide a provider-independent, highly efficient connection pool that could be used instead of the providers' ones.

That sounds reasonable.

But I would like to point out a huge advantage of System.Data over JDBC is that we don't have to think about connection pooling. Things just work regardless of which database provider we use.

So compared to the other things on the list this is a very low priority item.

roji · 2019-01-12T17:58:04Z

@Grauenwolf there's one more option that's not really listed above - provide a generic pool implementation suited to database connections, but without exposing it to the user. Providers would then be able to use that implementation instead of their current internal implementations, without any impact on how the user interacts with connections or with the provider (it would be a purely an internal implementation detail). This would still provide many of the advantages listed above - a single, highly-tuned and tested reusable implementation.

But I would like to point out a huge advantage of System.Data over JDBC is that we don't have to think about connection pooling.

Having said that, I've worked with JDBC in the past, and I can't say I understand why not thinking about connection pooling can be considered a "huge advantage"... Dealing with the pool seems to be a very minor detail that's taken care of in 1-2 lines of code... Can you explain what's bothersome in the JDBC approach?

Things just work regardless of which database provider we use.

If you mean that you don't need to concern yourself with pooling, then that's true. However, precisely since every provider has its own pool implementation, things don't just work the same everywhere, as each provider has implementation differences (as well as tuning options and various knobs). Externalizing the pool would provide a consistent pooling experience regardless of provider.

bricelam · 2019-02-07T22:08:56Z

The internal System.Data.ProviderBase namespace contains common pooling logic shared by Odbc, OleDb & SqlClient. It might be worth taking a look at.

roji · 2019-02-07T23:28:10Z

Will do, thanks for the pointer @bricelam!

ghost · 2023-08-13T12:02:36Z

Due to lack of recent activity, this issue has been marked as a candidate for backlog cleanup. It will be closed if no further activity occurs within 14 more days. Any new comment (by anyone, not necessarily the author) will undo this process.

This process is part of our issue cleanup automation.

cincuranet · 2023-08-13T13:51:27Z

Pinging.

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the Future milestone Jan 31, 2020

ajcvickers mentioned this issue Jun 22, 2020

Consolidate the connection pool code at one location #24808

Closed

roji mentioned this issue May 3, 2021

SqlServer.Core: Performance-oriented SQL Server .NET driver dotnet/datalab#6

Closed

roji mentioned this issue Oct 13, 2021

Add Connecting event handler/callback mysql-net/MySqlConnector#1056

Closed

roji mentioned this issue Feb 4, 2022

Introduce DbDataSource abstraction to System.Data #64812

Closed

ghost added backlog-cleanup-candidate An inactive issue that has been marked for automated closure. no-recent-activity labels Aug 13, 2023

ghost removed no-recent-activity backlog-cleanup-candidate An inactive issue that has been marked for automated closure. labels Aug 13, 2023

adamsitnik mentioned this issue Dec 4, 2023

DB health checks: design Xabaril/AspNetCore.Diagnostics.HealthChecks#2113

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider a better design for ADO.NET connection pooling #24856

Consider a better design for ADO.NET connection pooling #24856

roji commented Jan 31, 2018

divega commented Feb 8, 2018 •

edited

Loading

Grauenwolf commented Jan 11, 2019

roji commented Jan 12, 2019

bricelam commented Feb 7, 2019 •

edited

Loading

roji commented Feb 7, 2019

ghost commented Aug 13, 2023

cincuranet commented Aug 13, 2023

Consider a better design for ADO.NET connection pooling #24856

Consider a better design for ADO.NET connection pooling #24856

Comments

roji commented Jan 31, 2018

Disadvantages of the current model

Option 1: An ADO.NET user-facing pool API

Option 2: A complete pooling API (not just user facing)

Comparison with JDBC

Things to keep in mind

divega commented Feb 8, 2018 • edited Loading

Grauenwolf commented Jan 11, 2019

roji commented Jan 12, 2019

bricelam commented Feb 7, 2019 • edited Loading

roji commented Feb 7, 2019

ghost commented Aug 13, 2023

cincuranet commented Aug 13, 2023

divega commented Feb 8, 2018 •

edited

Loading

bricelam commented Feb 7, 2019 •

edited

Loading