-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Introduce DbDataSource abstraction to System.Data #64812
Comments
Tagging subscribers to this area: @roji, @ajcvickers Issue DetailsSummaryThis proposes introducing a new ADO.NET abstraction called DbDataSource, which represents a data source from which (1) connections can be retrieved, and against which (2) commands can be executed directly. tl;dr: // Provider-specific data source construction:
var dataSourceBuilder = new NpgsqlDataSourceBuilder
{
ConnectionString = "<CONNECTION_STRING>",
// Set authentication callbacks, auth tokens, ILoggerFactory, and any other options
};
using var dataSource = dataSourceBuilder.Build();
// Use the data source as a connection factory:
await using (var connection = await dataSource.OpenConnectionAsync())
{
// Use the connection as usual
}
// Provider-agnostic data source construction:
using var dataSource = dbProviderFactory.CreateDataSource("<CONNECTION_STRING>");
// Get a command that executes directly against the data source, without explicitly referring to the connection
await using (var command = dataSource.CreateCommand("SELECT * FROM Blogs"))
await using (var reader = command.ExecuteReaderAsync())
{
// Process results as usual
} Summary of benefits
Connection instantiation and dependency injectionIn ADO.NET, DbConnection implementations are typically instantiated directly, passing the connection string to the constructor: using var connection = new NpgsqlConnection("<CONNECTION_STRING>"); This requires knowing the connection string at every site where a connection is instantiated. In addition, further authentication details are frequently needed beyond the connection string (SSL callbacks, auth tokens...), which also need to be set on the connection before it's opened. In practice, this means that user applications typically need to create some sort of factory for creating connections, to centralize this information. Using ADO.NET with dependency injectionFor example, when using ADO.NET (or Dapper) with DI, it's possible to configure a scoped connection as follows: builder.Services.AddScoped(_ => new NpgsqlConnection(
builder.Configuration.GetConnectionString("BloggingContext"))); However, if multiple connections are needed for any reason, then this becomes more complicated, and some sort of factory is necessary. Rather than forcing every application to build this, DbDataSource would allow the following: builder.Services.AddSingleton(sp => NpgsqlDataSource.Create(
builder.Configuration.GetConnectionString("BloggingContext"))); Provider-specific sugar APIs could make this nicer: builder.Services.AddNpgsqlDataSource(
builder.Configuration.GetConnectionString("BloggingContext"), o => o
.UseEncryption(...)
.SomeOtherConfig(...)); Applications can then get injected with the DbDataSource, and open connections through it. Alternatively, they can configure a scoped connection from the DbDataSource to be injected directly. API point for connection configurationThe fact that connections are currently directly instantiated makes it difficult to add any sort of configuration APIs; the main way to tweak ADO.NET behavior is currently via connection strings, but that's not appropriate in many cases. For example, let's suppose an ADO.NET provider wishes to add support Microsoft.Extensions.Logging. It's not possible to pass ILoggerFactory via the connection string, and while it's possible to add a DbConnection constructor overload which accepts ILoggerFactory, that would mean re-instantiating all the required loggers each time a connection is created - which isn't acceptable for perf reasons. With DbDataSource, the logging factory could be assigned directly on the provider's DbDataSource implementation, and all connection retrieved from that DbDataSource would use it. In a similar vein, Npgsql has an extensible plugin system for modifying type mappings. This allows users to read PostgreSQL date/time values as NodaTime types, instead of as built-in .NET types such as DateTime. Similarly, database spatial types can be read as types from the NetTopologySuite library. At this point, Npgsql plugins can be enabled either globally (affecting all connections in the program), or for a specific NpgsqlConnection instance (generally inappropriate for perf reasons). It would be more appropriate to enable such plugins at the DbDataSource level, allowing multiple type mapping schemes to exist in the same applications without any perf overhead, etc. Other possible features enabled by DbDataSource could be centrally specifying SQL statements to be prepared on all physical connections, SQL to execute on every physical connection when it's first opened (e.g. to set some state variable), etc. It's suggested that ADO.NET drivers expose builders - e.g. NpgsqlDataSourceBuilder - which can configure all provider-specific aspects of the data source. Once a data source has been built, it should generally be immutable with respect to configuration options since connections may already have been handed out. An exception to this would probably be the rotation of authentication tokens. DbProviderFactoryADO.NET already contains an abstraction called using var connection = dbProviderFactory.CreateConnection(); The resulting DbConnection has no connection string or any other information on it; DbProviderFactory is nothing more than an abstraction for instantiating the provider-specific DbConnection implementation. It is generally a singleton type which represents the driver, as opposed to a database, a connection pool or similar. We could theoretically retrofit DbProviderFactory with the above new APIs instead of introducing a new DbDataSource type. However, mixing the two concerns could be confusing, aswe'd then have two types of DbProviderFactory instances:
As a result, I'm proposing to keep DbProviderFactory and DbDataSource separate, and to add a new CreateDataSource method on the DbProviderFactory abstraction, to allow DbDataSource to be created and used in database-agnostic code. Executing commands directly on DbDataSourceBeyond being just a factory for connections, we can allow executing commands directly on DbDataSource: await using var command = dataSource.CreateCommand("SELECT ...");
command.Parameters.Add(...);
var result = await command.ExecuteScalarAsync(); The main advantage here is API usability: in many (most?) scenarios, there's no need for users to concern themselves with connections - they simply want to execute a query against a database. Any scenarios which involve connection state (e.g. transaction) do require explicitly handling connections, but in the common case where no connection state is involved, we can simplify things. In typical DI scenarios, it's expected that a DbDataSource get injected, and that most commands would be executed directly against it. Note that this isn't a Dapper-like improvement of ADO.NET: users still deal with the standard DbCommand API, add parameters and read results as usual. In fact, Dapper could add extension methods for better usability over DbDataSource just as it does over DbConnection. A second advantage is to open up execution models which bypass the concept of a connection - this is what Npgsql does with multiplexing. In this model, commands are provided to the driver for execution on any arbitrary connection; the driver is free to coalesce unrelated commands together for better efficiency, and to continue sending commands on connections even when they're already busy, allowing better utilization of pooled connections. Executing a command directly on a DbDataSource is the right API for this kind of execution model. To make this available on all ADO.NET providers without requiring changes, the default implementation of DbDataSource.CreateCommand would return a DbCommand wrapper over the native DbCommand returned by the driver's DbProviderFactory.CreateCommand. DbCommand.ExecuteReader (and other execution methods) would be overridden to open the connection (retrieved from DbDataSource.GetConnection) before execution, and to close it afterwards (e.g. with CommandBehavior.CloseConnection). Since the connection is owned by the command in this scenario, accessing DbCommand.Connection should raise an exception. Additional notes
API proposalpublic abstract class DbDataSource : IDisposable, IAsyncDisposable
{
public abstract string ConnectionString { get; } // TODO: Nullable?
protected abstract DbConnection GetDbConnection();
protected virtual DbConnection OpenDbConnection()
{
var connection = GetDbConnection();
connection.Open();
return connection;
}
protected virtual async ValueTask<DbConnection> OpenDbConnectionAsync(CancellationToken cancellationToken = default)
{
var connection = GetDbConnection();
await connection.OpenAsync(cancellationToken);
return connection;
}
protected virtual DbCommand CreateDbCommand()
{
var connection = GetDbConnection();
var command = connection.CreateCommand();
// Return a DbCommand wrapper over connection and command, which takes care of opening/closing the connection around executions.
}
// protected virtual DbCommand CreateDbBatch() ...
public DbConnection GetConnection() => GetDbConnection();
public DbConnection OpenConnection() => OpenDbConnection();
public ValueTask<DbConnection> OpenConnectionAsync() => OpenDbConnectionAsync();
public abstract void Dispose();
public abstract ValueTask DisposeAsync();
} For provider-agnostic code, we'd add the following on System.Data.Common.DbProviderFactory: public class DbProviderFactory
{
public virtual DbDataSource CreateDataSource(string? connectionString) => new DefaultDataSource(this, connectionString);
} To make this work on existing providers, the default implementation of CreateDataSource would return an internal "shim" implementation implemented which simply gets connections from the DbProviderFactory that created it: internal sealed class DefaultDataSource : DbDataSource
{
private readonly DbProviderFactory _dbProviderFactory;
private readonly string _connectionString;
internal DefaultDataSource(DbProviderFactory dbProviderFactory, string connectionString)
=> (_dbProviderFactory, _connectionString) = (dbProviderFactory, connectionString);
public override string ConnectionString => _connectionString;
protected override DbConnection GetDbConnection()
{
var connection = _dbProviderFactory.CreateConnection();
if (connection is null)
{
throw new InvalidOperationException("DbProviderFactory returned a null connection");
}
connection.ConnectionString = _connectionString;
return connection;
}
// TODO: Command execution
} Edit history
|
This would be great. I am getting tired of implementing factories for producing connections... though I often need to create a connection that varies based on something variable (known at runtime) other than the actual connection string. Almost something like a named data source that could be pre-registered and requested ala httpclient. To explain, we have a single application, but multiple tenant databases. My factories often are built around named retrieval and the factory is able to build a connection using the correct connection string for the customer I'm working with. It would be great if this could somehow be supported natively, especially with DI. |
@pinkfloydx33 thanks for the feedback. As a connection factory abstraction, a single DbDataSource isn't meant to handle multiple databases - that requires you to specify which database (or tenant) you want when you request a connection (and the connection objects themselves can't be pooled together if they're connected to different databases). Of course, you can have a tenant manager service in front of multiple DbDataSource instances, each corresponding to a single connection string (and your tenant manager service would be responsible for instantiating a DbDataSource when a connection to a new tenant is requested). In other words, I'd expect multi-tenant multi-database applications to have a DbDataSource per tenant. |
I like it; I think a lot of existing code has to build their own concept of a factory for DI, even if it's as simple as
I'm not yet convinced that this is a good API. For symmetry with other Secondly, the fact that a connection string is the only way to customise the Finally, it's hard to imagine a scenario where
This should be non-null. Even if a driver can be used with all defaulted values in the connection string, that could be expressed as
Connection pools are usually implicit by connection string (“This typically involves an internal dictionary lookup on the connection string”). If
This actually seems a little confusing to me; coming to the API with fresh eyes, I would expect the Should getting back to the |
If DbDataSource is immutable it's really just a type that encapsulates the two properties on a connection string config setting, the value and the provider type, isn't it? I think I've mentioned before that I'm not a fan of passing connection strings around because they're missing the really important information about the provider. It also encourages people to manipulate them as strings when i think using the ConnectionStringBuilder class is a better idea. Is it worth considering getting a correctly types connection string builder from the DbDataSource? It might be valuable to provide extensions to Microsoft.Extensions configuration readers so that DbDataSources can be created by users directly from config files rather than making them resolve the types themselves. So provide a |
On that abstract base class, perhaps yes. But it's highly likely that derived types will add properties for LoggerFactory and callbacks for client certificates or rotating passwords (e.g., Amazon IAM).
FWIW, |
Am I right in that the main purpose of this class is solely dependency injection support? As to me every data access library under the sun has implemented code which supports what this class covers, so not sure what core problems it is solving, besides DI related ones? |
A simple implementation (for discussion purposes) is in MySqlConnector: |
Hey @FransBouma,
Well, not really :) I tried to make the other arguments above, but here's an attempt at a much tighter summary:
So this isn't something that will directly help everyone, and end users using EF Core/LLBLGen Pro probably won't care, though it could still be important for the interaction between the data layer and the ADO provider (e.g. how the EF Core layer manages type mapper plugins in the ADO layer). Hope that helps explain the goals here... |
That's a good question. I think that symmetry with other DbProviderFactory methods (i.e. parameterless methods) isn't necessarily very important, whereas I agree the immutability of DbDataSource (at least with respect to the connection string) is critical (otherwise what happens when the connection string is modified after connections have been handed out etc.). It so happens that existing ADO.NET abstractions (connection, command) are mutable, whereas we want the one we're introducing to be immutable. I think a DbDataSourceBuilder abstraction would have little value.. The only connection configuration detail that's cross-database is the connection string; once you want to touch anything else you have to code against provider-specific API anyway - and at that point there's no real reason to use DbProviderFactory (just new up your provider's DbDataSource builder, or whatever mechanism they have for creating a DbDataSource). In other words, if we had DbProviderFactory.CreateDataSourceBuilder, the only thing you can do with the builder it returns is to set the connection string, and then get the DbDataSource; may as well just provide the connection string directly on the DbProviderFactory method, no?
Sure, though I think this a somewhat inherent limitation of wanting to write database-agnostic code, and at the same time tweak provider-specific settings. Doing this via the connection string may work well enough for simple settings (configure an int/bool thing), but becomes problematic for more complex stuff (register SSL certificate validation code, set an ILoggerFactory...). Do you have other ideas here?
Well, first, the DbDataSource could still be e.g. registered once in DI, whereas if you're creating connections and setting connection strings on them every time, you have multiple parts of your program needing to reference the connection string etc. Second, I agree that for programs which don't ever want to reference any provider-specific code, this would be of somewhat limited use; those programs would generally not be able to configure SSL callbacks or other complex provider-specific things. However, there's also another model: programs which are OK with referencing provider features for setting up the DbDataSource (via a provider-specific builder), but once setup is complete, the DbDataSource is there and can be consumed via database-agnostic code. So I see the value here less in the setup (which has to be provider-specific the moment you want to do something complex), and more in the consumption of a DbDataSource that has already been set up. Re .NET 7.0, are you saying that this wouldn't be useful since programs would still need code that works on older versions of .NET? If so, is that objection specific to this proposal somehow? I mean, it can be said against almost any new feature in .NET 7.0 that it wouldn't be useful since programs need to target older .NET versions... Which I'm not sure is a strong argument, since end programs can typically target the latest .NET 7.0 if they want to. Maybe I've misunderstood you here though.
Agreed, that's also in line with how the property is null-annotated on DbConnection (for reading).
That's an important point indeed, thanks. As you wrote, if I instantiate two DbDataSources via new, typically that would mean two separate connection pools even if they have the same connection string (this could be an advantage in some cases). And this is indeed not the case with DbDataSources acquired from the shim, since in that case there's only one pool for a given connection string. I'll add a note for that above.
I agree - it's unfortunate. I almost proposed a different name to this feature because of this (e.g. DbDatabase, DbConnectionSource), but the names I came up with were inferior in various ways, and doing that just because of this property seemed like too much.
I think that the cloning concern should probably be handled separately, i.e. without necessarily tying it to DbDataSource (something like Clone is definitely an option). BTW another need is to clone a connection while changing a setting; in the Npgsql EF Core provider, for database health checking I clone an ADO connection but without pooling, to ensure that Open actually creates a new physical connection etc. So I have a provider-specific thing for that, but I'm not aware of anyone asking for this in general... Apart from that, I can see how getting back to the DbDataSource could be useful (though not sure of concrete scenarios yet). But it would also not be possible to shim that without wrapping the connection, which I'd rather avoid unless there's a good reason to do so. But am open to ideas to the contrary! |
Can you give a concrete example scenario for this? Since DbDataSource needs to be immutable, I assume you're describing a case where some application code needs to know whether e.g. the connection string for some DbDataSource contains some option or not? For that specific scenario (which seems quite rare, at least I haven't seen requests for it)... In theory I could see the use of exposing a DbConnectionStringBuilder on DbDataSource. However, this would have to be immutable too, since changing that DbConnectionStringBuilder cannot impact the DbDataSource; and DbConnectionStringBuilder is a completely mutable class. Note also that DbDataSource doesn't make this better or worse - people already have to do deal with this today, where a connection string is externally-specified. If they need to, they can always parse the connection string themselves (by constructing a DbConnectionStringBuilder), and then check the setting etc. Hopefully I've understood your question...
As @bgrainger wrote, the idea is also for providers to have their own DbDataSource implementations which support more than just connection strings (e.g. SSL callbacks), and it would be impossible to specify those in config (since .NET code is involved). But if someone does want to simply instantiate a DbDataSource out of a simple connection string, it should be very trivial to do that via the proposed DbProviderFactory.CreateDataSource method. |
That was indeed a weak argument; please ignore it. 😀 |
I like the idea of having an object that encapsulates all database specific (in contrast to provider/driver specific) settings and thereby functioning as a factory for a concrete database/configuration pair. While this provides little additional benefit to users of providers that are already able to serialize/deserialize all their settings to/from a connection string, I do see the upside for providers that have extended configuration options that cannot be easily represented in a connection string. Implicitly using this type as the boundary for connection pools is neat. Being able to execute some simple SQL statements without much setup (and multiple The docs/intellisense would need to be very specific about whether an object returned from an It makes sense that using this abstraction should result in a slight performance boost. While this might not be significant for 98% of the userbase, it could be for high performance apps (and of course benchmarks, but benchmarks alone are probably not a viable argument to introduce another abstraction). @roji Are there any preliminary benchmark results? BTW, is this kind of encapsulation typical in other stacks/frameworks/languages and if so what type names do they use? |
Thanks for the feedback @lauxjpn.
Agreed - and any disposable object returned from these any of this methods should be disposed as usual.
Sure. What do you see as still lacking here, any suggestions?
Not yet, and I agree the perf improvement is unlikely to be very significant. I see the perf improvement more as a nice side-benefit than a primary reason for doing this.
A big inspiration for this proposal came from the JDBC DataSource type, which is similar both in design and in naming. JDBC drivers are expected to implement their own DataSource, which doesn't do pooling; database-agnostic pooling packages (e.g. HikariCP, c3p0...) provide their own DataSource implementation overlayed upon the driver's, adding pooling (see notes above about how we could go in this direction, though I'm not sure we should). But JDBC's DataSource is purely a connection factory (API docs). Various other languages/drivers also have a way to just "execute SQL" on a database without getting a connection first, unlocking stateless execution models as I described above. If you're interested I can dig around and show an example or two. |
@roji I think the remaining issue is with the existing command/data reader model. It feels just too clunky for simple data access in 2022. And whenever someone is supposed to use a Something like the following would be nice: // returns something like IEnumerable<IRow>
var horsies = dataSource.ExecuteSql($"SELECT * FROM Horsies WHERE IWouldMuchRatherBeAUnicorn = {someCondition}");
foreach (var horsie in horsies)
{
var horsieName = horsie.GetValue<string>("Name");
// ...
} Some stuff would need to be figured out in regards to automatic resource disposal. For example, query associated resources could be disposed when the parent data source gets disposed or the next Or maybe it would be simpler to manage the database resources with the (I have not thought through all the implications here.)
I think it is long overdue (unless it already exists and I missed it) to introduce a general .NET runtime attribute that can be used to annotate a method with the intended action a user is supposed to take, once the app is done with the returned (Or maybe simpler: the attribute marks those methods that return objects that should usually not be disposed by the caller.) The IDE's IntelliSense could then help generating/fixing the caller code (e.g. adding/suggesting a Because basically every time I don't new-up an |
Ah, I see - this is the kind of proposal pushes ADO.NET more in the direction of e.g. Dapper. I don't think that's a bad idea, but I specifically tried to avoid going there, at least in this particular proposal. In other words, we can always suggest something like the above, both on DbConnection and on DbDataSource - I see this proposal as largely orthogonal to that. Specifically to your proposal, yeah, there's lots to figure out - like how to generate parameter placeholders (i.e. #25022). Most importantly, for the above to be safe against SQL injection, it would have to use FormattableString; we've had APIs like that in EF Core and they've been problematic at least in some aspects. But I'd rather we kept that specific conversation separate, just so that we can get the above through without having to deal with the additional complexity.
Isn't that what the IDisposable/IAsyncDisposable interfaces already do? I get your point that when you're not newing up yourself, there may be an ambiguity as to if the API handing you the IDisposable already disposes, but at least in this case it's a relatively straightforward factory API, so the fact that it's the user's responsibility to dispose seems pretty clear (though I agree it should be documented). |
I agree. And I think your proposal takes a first step into that general direction. Other functionality can always be added later. As for my example code, it is really just a blend of a couple of things that would work nicely together. Not needing to use explicit parameters would be nice, but even just being able to execute some SQL without that tedious I think that access to a general API for simple query scenarios would line-up nicely with the recent push for simpler code (like the new console app templates etc.) à la Python. But that is not really part of this proposal and could be handled/discussed in a different one, once this one made it through.
Yeah, my comment is really only about the ambiguity when it comes to who should dispose a disposable object. The A random ambiguity examples is RelationalDatabaseFacadeExtensions.GetDbConnection(DatabaseFacade) in EF Core. Though nowadays, it is properly documented that you usually should not dispose the returned object yourself. Anyway, this ambiguity issue and a related attribute as a possible solution to it would need to be handled in a different proposal as well and is not specific to |
I completely agree. Adding usability improvements to ADO.NET has been a long-standing discussion (not everyone agrees 😉), but I can indeed see a path forward where the raw database access API is something that's nice and simple (which it is definitely not today).
👍 |
FYI will wait on this a little longer in case people have any further feedback, and then I'll flag this as ready-for-review (and flesh out the precise API and a possible implementation for the shim). |
Interesting. In my TDS Implement I also skipped the command and execute a Sql string directly from the connection. I think mapping to Poco or some kind of datareader /writer is possible. When reading the columns from the TDS stream a user defined expression / function can be called to the mapping to poco or datareader. I did a hardcode poco implementation but that not necessary. A row and Columnreader is implemented here and here |
A columnreader give you also the possibility to a conversion / streaming before the complete row is read. A big jsoncolomn can parsed before creating a big string. Also images etc can be streamed to the client before reading a complete row |
This proposal isn't about skipping the command - it's about skipping the connection. You're still executing a DbCommand, you'd just be doing it directly against a DbDataSource instead of against a DbCommand. As such, you'd still be using DbDataReader to consume the results - mapping to POCO is definitely out of scope here (and probably not a great fit for a low-level database API like ADO.NET).
Streaming column values without reading the complete row is already possible in ADO.NET via CommandBehavior.SequentialAccess. This allows you to get a readable Stream or TextReader for a big column, and so the entire row never needs to be buffered in memory etc. |
public abstract class DbDataSource : IDisposable, IAsyncDisposable
{
public DbConnection GetConnection();
public DbConnection OpenConnection();
public ValueTask<DbConnection> OpenConnectionAsync();
public DbCommand CreateCommand(string? commandText = null);
public DbBatch CreateBatch();
public virtual void Dispose();
public virtual ValueTask DisposeAsync();
public abstract string ConnectionString { get; }
protected abstract DbConnection GetDbConnection();
protected virtual DbConnection OpenDbConnection();
protected virtual async ValueTask<DbConnection> OpenDbConnectionAsync(CancellationToken cancellationToken = default);
protected virtual DbCommand CreateDbCommand(string? commandText = null);
protected virtual DbBatch CreateDbBatch();
} |
|
There aren't any. Unfortunately the ADO object model forces all providers to use a connection string as the serialized form of the connection specific settings. Even if a new api were added at this point to keep the data in another way there is a 20 year legacy of using strings so we need to provide a way to get the string from it.
It's one of the the primary pieces of information that this api is wrapping in a useful way. You can't add the connection string to DI as the string type. This type provides users (end users and code users) with two important pieces of information together, the provider factory information and the connection string that will be needed by that provider factory. |
It's an interesting question.. As @Wraith2 says, the connection string is already baked into the ADO.NET API - DbConnection has it, etc. So at least in that sense we're in line with the rest of the API. But a DbDataSource encompassing multiple databases (and therefore connection strings) is an important notion - there could definitely be a layering DbDataSource that accepts other, primitive DbDataSources. It's true that there wouldn't be a meaningful connection string to expose, but it's not the end of the world either - an empty string doesn't seem too problematic for this case. I can't work up a very strong argument either way - exposing the connection string seems like it could be useful in some cases (like debugging), and for the few cases where it isn't relevant, it can be left unset... Any additional thoughts?
Note that as of now, the API doesn't expose a DbProviderFactory (but is itself a factory for connections). The idea is indeed for DbDataSource to abstract a database (which typically means connection string + driver), but not necessarily to provide users access to those details (as opposed to opening a connection). Also, @GSPP's comment about a layered DbDataSource holds true for DbProviderFactory as well - you could imagine a data source that hands out connections to different databases using different providers (though I admit that's a bit far-fetched). |
Everyone, a PR implementing DbDataSource is out: #70006. I'd appreciate a thorough review from anyone who's interested. With regards to the API review comments (#64812 (comment)):
Note: a WIP implementation in Npgsql can be found here. |
I read through the changes last night and had nothing to add or ask. Looks good to me. |
If the only difference between |
If you believe that's the right thing, I can accept that. From my own experience I'd say I have rarely (if ever) used the
Looking at the API it seems both FWIW, I think you still want to come up with a name for these kind of commands, not for the API itself, but I think this will be a very useful thing for the documentation in order to describe what these kind of commands can go. Having a concept also allows you to refer to them when throwing exceptions, e.g. "You cannot set a connection on connection-less commands". People can then search for "connection-less commands" to learn what these are and how they work. Otherwise you have to refer to them in a convoluted way, like "You cannot set a connection on commands created by DbDataSource.CreateCommand()", which might also be confusing when people get these command objects from a different place, e.g. |
@Wraith2 thanks for taking the time to review! @ithline collapsing OpenConnection and Createconnection into a single method is possible, however:
|
That's a good point - we can indeed make them just always return null when read (the getter currently also throws in the PR); this may unblock certain scenarios where code inspects these properties for some reason. For Connection specifically that may not be a good idea, since in the normal case a command with no connection isn't executable, but in this case it is; so code making standard assumptions would fail here, and a fail-fast exception from the getter may be better (for transaction it seems more OK). What do you think?
You're right - in my head I've been calling these "data source commands", though "connection-less commands" is probably better. Will keep this in mind for the docs. |
@roji my thought process was that If the public signature would be From provider implementation it is a bit more work, but it would result in a cleaner public API, mainly because |
The thing is that almost nobody needs to get an un-opened connection, but everyone would be "burdened" with understanding what the optional bool parameter is about (e.g. Intellisense). This is very different from the situation with DbCommand and commandText, since every command requires a commandText in order to work. I basically want to separate/hide away the unopen scenario because very few users need it.
I get that, though I personally don't feel that OpenConnection conveys that either way (it can be read as "open the connection" or "open a connection"); FWIW I felt the same way about GetConnection - after all, we have IEnumerable.GetEnumerator in .NET which returns a new instance. |
Summary
This proposes introducing a new ADO.NET abstraction called DbDataSource, which represents a data source from which (1) connections can be retrieved, and against which (2) commands can be executed directly. tl;dr:
Summary of benefits
Connection instantiation and dependency injection
In ADO.NET, DbConnection implementations are typically instantiated directly, passing the connection string to the constructor:
This requires knowing the connection string at every site where a connection is instantiated. In addition, further authentication details are frequently needed beyond the connection string (SSL callbacks, auth tokens...), which also need to be set on the connection before it's opened. In practice, this means that user applications typically need to create some sort of factory for creating connections, to centralize this information.
Using ADO.NET with dependency injection
For example, when using ADO.NET (or Dapper) with DI, it's possible to configure a scoped connection as follows:
However, if multiple connections are needed for any reason, then this becomes more complicated, and some sort of factory is necessary. Rather than forcing every application to build this, DbDataSource would allow the following:
Provider-specific sugar APIs could make this nicer:
Applications can then get injected with the DbDataSource, and open connections through it. Alternatively, they can configure a scoped connection from the DbDataSource to be injected directly.
API point for connection configuration
The fact that connections are currently directly instantiated makes it difficult to add any sort of configuration APIs; the main way to tweak ADO.NET behavior is currently via connection strings, but that's not appropriate in many cases.
For example, let's suppose an ADO.NET provider wishes to add support Microsoft.Extensions.Logging. It's not possible to pass ILoggerFactory via the connection string, and while it's possible to add a DbConnection constructor overload which accepts ILoggerFactory, that would mean re-instantiating all the required loggers each time a connection is created - which isn't acceptable for perf reasons. With DbDataSource, the logging factory could be assigned directly on the provider's DbDataSource implementation, and all connection retrieved from that DbDataSource would use it.
In a similar vein, Npgsql has an extensible plugin system for modifying type mappings. This allows users to read PostgreSQL date/time values as NodaTime types, instead of as built-in .NET types such as DateTime. Similarly, database spatial types can be read as types from the NetTopologySuite library. At this point, Npgsql plugins can be enabled either globally (affecting all connections in the program), or for a specific NpgsqlConnection instance (generally inappropriate for perf reasons). It would be more appropriate to enable such plugins at the DbDataSource level, allowing multiple type mapping schemes to exist in the same applications without any perf overhead, etc.
Other possible features enabled by DbDataSource could be centrally specifying SQL statements to be prepared on all physical connections, SQL to execute on every physical connection when it's first opened (e.g. to set some state variable), etc.
It's suggested that ADO.NET drivers expose builders - e.g. NpgsqlDataSourceBuilder - which can configure all provider-specific aspects of the data source. Once a data source has been built, it should generally be immutable with respect to configuration options since connections may already have been handed out. An exception to this would probably be the rotation of authentication tokens.
DbProviderFactory
ADO.NET already contains an abstraction called
DbProviderFactory
, which can be used to create connections and other ADO.NET objects. However, this abstraction only serves for writing database-agnostic code that isn't specific to a certain driver:The resulting DbConnection has no connection string or any other information on it; DbProviderFactory is nothing more than an abstraction for instantiating the provider-specific DbConnection implementation. It is generally a singleton type which represents the driver, as opposed to a database, a connection pool or similar.
We could theoretically retrofit DbProviderFactory with the above new APIs instead of introducing a new DbDataSource type. However, mixing the two concerns could be confusing, aswe'd then have two types of DbProviderFactory instances:
As a result, I'm proposing to keep DbProviderFactory and DbDataSource separate, and to add a new CreateDataSource method on the DbProviderFactory abstraction, to allow DbDataSource to be created and used in database-agnostic code.
Executing commands directly on DbDataSource
Beyond being just a factory for connections, we can allow executing commands directly on DbDataSource:
The main advantage here is API usability: in many (most?) scenarios, there's no need for users to concern themselves with connections - they simply want to execute a query against a database. Any scenarios which involve connection state (e.g. transaction) do require explicitly handling connections, but in the common case where no connection state is involved, we can simplify things. In typical DI scenarios, it's expected that a DbDataSource get injected, and that most commands would be executed directly against it.
Note that this isn't a Dapper-like improvement of ADO.NET: users still deal with the standard DbCommand API, add parameters and read results as usual. In fact, Dapper could add extension methods for better usability over DbDataSource just as it does over DbConnection.
A second advantage is to open up execution models which bypass the concept of a connection - this is what Npgsql does with multiplexing. In this model, commands are provided to the driver for execution on any arbitrary connection; the driver is free to coalesce unrelated commands together for better efficiency, and to continue sending commands on connections even when they're already busy, allowing better utilization of pooled connections. Executing a command directly on a DbDataSource is the right API for this kind of execution model.
To make this available on all ADO.NET providers without requiring changes, the default implementation of DbDataSource.CreateCommand would return a DbCommand wrapper over the native DbCommand returned by the driver's DbProviderFactory.CreateCommand. DbCommand.ExecuteReader (and other execution methods) would be overridden to open the connection (retrieved from DbDataSource.GetConnection) before execution, and to close it afterwards (e.g. with CommandBehavior.CloseConnection). Since the connection is owned by the command in this scenario, accessing DbCommand.Connection and Transaction should raise an exception.
Additional notes
ValueTask<T>
which isn't covariant. So we use the old ADO.NET "covariance" pattern (protected virtual plus public non-virtual that gets replaced in implementations).API proposal
For provider-agnostic code, we'd add the following on System.Data.Common.DbProviderFactory:
To make this work on existing providers, the default implementation of CreateDataSource would return an internal "shim" implementation implemented which simply gets connections from the DbProviderFactory that created it:
Edit history
The text was updated successfully, but these errors were encountered: