Better API for ADO.NET command batching #15375

roji · 2015-10-07T10:13:10Z

ADO.NET currently supports batching several statements in one command by packing them into a single string as follows:

cmd.CommandText =
        "UPDATE message SET text = $text1 WHERE id = 1;" +
        "UPDATE message SET text = $text2 WHERE id = 2";

While workable, it's a problematic API. The PostgreSQL protocol, for example, needs to send each single SQL statement as a separate protocol message, forcing Npgsql to split CommandText by semicolon at the client side, a pretty complex task (need to understand when the semicolon is quoted, etc.).

It would be much better to provide a list-based API, where the user provides, say, an array of strings. A specific provider can still choose to join by semicolon if needed, but this should be done at the provider level rather than by the user.

For backwards compatibility, the CommandText property would still support the legacy semicolon batching mode. A boolean LegacyBatchingMode connection string parameter, true by default, could be turned off to disable it (thereby removing the client-side SQL parsing in Npgsql, etc.).

kkurni · 2015-10-30T02:32:05Z

@roji Does PostgreSQL support SQL Batching ?

It is used for SQL Batching support.
We sent the whole SQL Statement to Server and sql server will understand that statement.
https://msdn.microsoft.com/en-us/library/vstudio/haa3afyz(v=vs.100).aspx

e.g "select 1; update *...; select 2;"
Sql Server will return 2 results set.
How this will work in PostgreSQL ?

roji · 2015-10-30T08:22:36Z

@kkurni, here are some details. PostgreSQL supports two network protocols: simple and extended (here are the detailed docs).

In the simple protocol you can send a single message with "select 1; update *...; select 2;", PostgresSQL will parse out the multiple statements and return 2 result sets. However, the simple protocol has several shortcomings: no binary data, no parameter support, no prepared statements. For these reasons, Npgsql 3.x switched to the extended protocol.

In the extended protocol you send a series of message - Parse, Describe, Bind, Execute, Sync. This allows you to do everything, but you can send only one SQL statement each time. You can still pack multiple messages into a single packet, achieving good network performance.

Npgsql 3.x does allow users to set DbCommand.CommandText to "select 1; update *...; select 2;". However, since it uses the extended protocol it is forced to parse the statements client-side, and send multiple sets of messages in a single packet (2 Parse messages each with a single statement, 2 Describe messages, and so on).

To summarize, PostgreSQL does support SQL batching, but requires you to submit the different SQL statements in separate protocol messages (if you use the extended protocol). Therefore, the ADO.NET batching API, which requires you to pack multiple statements into a single string with semicolon separators, is unsuitable...

Let me know if you need any more details!

kkurni · 2015-11-03T02:58:44Z

Thanks @roji for providing us with this detail.

This specification seems to be a provider specific.
As there are other providers which is fine to use CommandText e.g SqlClient.
Do you have any other provider which require this ?

If we end up implementing your request, this will be breaking changes for other provider as well.
This will be much simpler for specific provider to override this if they need to.
E.g

    public class MyCommand : DbCommand
    {
        public List<string> CommandList
        {
            get;
            set;
        }

        private string _commandText;
        public override string CommandText
        {
            get { return _commandText; }
            set
            {
                if (IsExtendedQuery(value))
                {
                    throw new InvalidOperationException("Use CommandList for extended Query");
                }
                _commandText = value;
            }
        }

        private bool IsExtendedQuery(string query)
        {
            //check if it is extended query
            return false; //or true
        }
    }

roji · 2015-11-03T05:50:43Z

@kkurni, I think I didn't explain myself clearly...

The PostgreSQL extended vs. simple protocol isn't really relevant here, it was just to give more context. The problem is that ADO.NET's current batching API assumes that provider drivers can simply send a single string with multiple SQL statements and semicolons, which the backend will take care of splitting and executing. This simply isn't the case for PostgreSQL (extended protocol).

The solution could simply be to add a string[] CommandTexts property to ADO.NET's DbCommand. The default setter implementation for this property would simply be to join the array on semi-colon and set the DbCommand's CommandText to it - this would allow people to use the new batching API with providers such as SqlClient, without a single change to those providers. In Npgsql's case, however, I would override this property and construct the PostgreSQL protocol messages directly from the array, without any sort of semicolon parsing or joining.

This would provide a structured, fully-flexible batching API while fully preserving all existing ADO.NET providers.

Please let me know if this is unclear in any way.

kkurni · 2015-11-03T07:54:10Z

Thanks @roji for clarifying this.

by adding string[] CommandTexts will add more complexity for other providers which doesn't need it.
e.g How will you manage CommandTexts and CommandText value ?
Which one will be the source of truth ?

cmd.CommandTexts = new string[] {"select 1;", "select 2;"};
cmd.CommandText = "select 3;";
Assert.Equals(String.Join(cmd.CommandTexts), cmd.CommandText); //this will be false.

We can potentially add a flag like you suggested above boolean LegacyBatchingMode, which will decide CommandText or CommandTexts as source of truth or throw exception if developer set the wrong setter.
But again this will add additional complexity for other providers which doesn't need this.

It will be simpler to add string[] CommandTexts only to provider which need this. E.g PostGreSQL.
Due to specific requirement of PostGreSQL, your custom provider can handle this separately without impacting on other existing provider.

roji · 2015-11-03T09:27:25Z

The addition of a CommandTexts property definitely adds a little of complexity (as always when something is added), but I'm not sure this complexity is added to other providers. If, as I proposed above, the default implementation of this property is to simply set and get the CommandText property, then things are extremely simple - there is only one CommandText value, and CommandTexts reads and writes to it). No need for any sort of legacy flag - the last set of either CommandText or CommandTexts will be the definitive one.

So again, unless I'm missing something this adds a useful, more structured batching API without complexifying existing providers in any way.

Of course I could add CommandTexts to Npgsql, but then I'm forcing people to write Npgsql-specific code in order to do batching. The point is that ADO.NET is supposed to be a provider-independent API, but its batching API models things in a way which isn't.

kkurni · 2015-11-03T19:28:15Z

How will CommandTexts read back the value from CommandText ?
Will it need Split CommandText to return them as a list ?
This will push back the complexity from split those CommandText in the first place.

Except that we only allow to SetCommandTexts without able to read them back.
e.g cmd.SetCommandText(string[])

roji · 2015-11-03T20:40:53Z

Providing a set-only function definitely could work, as you suggest.

roji · 2015-12-01T16:40:18Z

@YoungGah, please let me know if you have any questions on this. It's extremely problematic to require providers to fully parse SQL only because their database protocol doesn't allow sending semicolon-joined statements.

YoungGah · 2015-12-01T20:55:00Z

@roji, I understand the issue and I agree that the API is not optimal. In the past, we discussed providing a better API (similar to what you proposed) within the owning team, but due to complexity of error handling and other higher priority works, we never got to add the new API. We will evaluate and prioritize this request post V1, but the ask is clear and make sense. We will update on this request after we release V1.

roji · 2015-12-02T07:53:25Z

Great, thanks @YoungGah. This is obviously not high-priority - existing providers such as Npgsql already have some sort of answer to the problem. But it's an important long-term goal to be able to cut out SQL parsing.

roji · 2015-12-25T08:20:37Z

There's one more related issue with the ADO.NET batching API: managing parameters. Parameters are currently managed as a collection on the DbCommand - they are therefore "common" to all statements being sent in a batch; there's no association between a parameter and a specific statement in a batch.

Here's why this is a problem. In the PostgreSQL wire protocol, each statement (whether batched or not) comes with its own parameters - there's no way to send a single parameter and then to reference it from two different statements. This makes ADO.NET's "one parameter collection for all batched statements" approach totally incompatible with PostgreSQL. The current workaround is, again, to parse the statement SQLs, identify parameter placeholders, and send each statement in the batch with its own set of parameters.

If and when you guys get around to working on this, I'd recommend you review how the JDBC API works, it may provide some answers.

NickCraver · 2016-01-22T03:00:35Z

@roji I'm curious here: for what range do you consider this API useful? By that I mean, what's the reasonable upper-bound before we're into bulk insert territory anyway? I'm thinking about how Dapper would support these scenarios across providers as well and just trying to gain an idea of expected use.

roji · 2016-01-23T11:14:55Z

That's a very good question - there's definitely overlap between bulk insert and command batching. One key thing about batching is the ability to do anything, not just insert - I can batch DELETEs, UPDATEs, or even stored procedure calls. On the other hand, bulk insert is supposed to provide a way of inserting even more optimized than batching INSERT commands.

My main goal here is to make sure that ADO.NET's batching capabilities are more compatible with databases other than SqlServer. After all, a batching "API" already exists in ADO.NET (if you can call shoving a semicolon in the text an API); but while it works well for SqlServer, it's not well-suited for PostgreSQL, for example.

Does that make sense?

roji · 2016-04-24T07:27:46Z

Note: have opened npgsql/npgsql#1042 to handle this specifically in Npgsql, in case you guys don't get around to working on it.

roji · 2016-07-18T00:08:10Z

FYI, I'm working on implementing this for PostgreSQL, i.e. a structured statement API within DbCommand which doesn't depend on semicolon-delimited statements packed inside CommandText. The API looks like this:

using (var cmd = new NpgsqlCommand
{
    Connection = conn,
    Statements =
    {
        new NpgsqlStatement
        {
            SQL = "SELECT $1",
            InputParameters = { new NpgsqlParameter { Value = 8 } }
        },
        new NpgsqlStatement
        {
            SQL = "SELECT $1",
            InputParameters = { new NpgsqlParameter { Value = 9 } }
        }
    }
})

Note how each NpgsqlStatement has its own parameter list, this again corresponds to how the PostgreSQL protocol actually represents statements (parameter per statement rather than a shared parameter list for all statements of a command).

To decide which mode the command uses, i.e. whether it looks at CommandText or at the structured Statements list, it's simply a matter of which API is used last. Setting CommandText sets a flag one way, and modifying the Statements list in any way sets it the other way.

Any comments would be welcome, I'd love to see something like this in standard ADO.NET one day.

fs7744 · 2016-07-31T23:51:49Z

in abo.net , we can use System.Data.SqlClient.SqlCommandSet for batch insert

although SqlCommandSet actually internal

    SqlCommandSet commandSet = new SqlCommandSet();
    commandSet.Connection = conn;
    commandSet.Transaction = trans;

    for (int i = 0; i < data.Length; i++)
    {
        SqlCommand cmd = conn.CreateCommand();
        cmd.CommandText = @"insert into product(name, category, discontinued, id) values(@p1, @p2, @p3, @p4)";
        cmd.CommandType = System.Data.CommandType.Text;
        cmd.Parameters.AddWithValue("@p1", data[i][0]);
        cmd.Parameters.AddWithValue("@p2", data[i][1]);
        cmd.Parameters.AddWithValue("@p3", 0);
        cmd.Parameters.AddWithValue("@p4", Guid.NewGuid());
        commandSet.Append(cmd);
    }
    totalEffected += commandSet.ExecuteNonQuery();

roji · 2016-08-01T00:04:53Z

@fs7744 thanks for commenting, this is the first time I hear of SqlCommandSet! This could indeed be a way forward, but this class seems to be internal? How are you using it?

roji · 2016-08-01T00:05:28Z

Oh sorry, I missed that you actually said that. Any idea what this class is used for?

fs7744 · 2016-08-01T00:16:43Z

in abo.net's design, SqlCommandSet is actually for SqlDataAdapter, so it's internal

someone like use SqlCommandSet when without DataAdapters

like http://stackoverflow.com/questions/402630/batching-in-ado-net-without-dataadapters

You cane see SqlDataAdapter how to use SqlCommandSet in SqlDataAdapter.cs

roji · 2016-08-01T00:23:57Z

Well, it's true that this class is an API for performing batched operations without CommandText concatentation. However, it's not clear what this contributes to the overall problem - the class isn't a public API...

@YoungGah, do you have any guidance here? I'd hate to go and implement my API in Npgsql just to change things again later. Does the team have any direction in mind, or at least an idea whether this will be tackled at some point?

ngbrown · 2016-08-21T21:14:55Z

NHibernate, for example, got around SqlCommandSet being internal by using reflection to expose it: /src/NHibernate/AdoNet/SqlClientSqlCommandSet.cs.

roji · 2016-08-24T10:22:04Z

I wonder if there's really a performance benefit associated with using SqlCommandSet as opposed to simply appending multiple SQL statements into a single command text (i.e. INSERT...; INSERT)...

roji · 2016-09-14T17:29:07Z

@YoungGah, it's been a long while since someone from Microsoft commented on this. Is it possible to get some visibility on whether the ADO.NET API is going to get any attention soon?

I'm specifically looking for some guidance on what to do here, and I'd like to avoid going down a path that wouldn't be compatible with ADO.NET later. Specifically, SqlClient appears to support batching via two separate APIs - a single command with semicolons in its CommandText, and the internal SqlCommandSet. As I said above, I have a branch with a an NpgsqlRawCommand, which itself can contain multiple NpgsqlStatements - but I'd be happy to have a serious conversation about where this is going before merging this kind of thing.

bgrainger · 2017-11-06T05:47:44Z

MySQL has a similar problem/opportunity.

There is a text protocol that allows multiple commands to be sent, and returns multiple result sets. However, it doesn't support parameter binding, prepared statements, and transmits all data types in a textual format. Then there is a binary protocol that supports parameter binding, prepared statements, and an efficient binary resultset; however, it only supports one command at a time.

As a result, in order to use the binary protocol, the client has to be restricted to only executing a single command at a time (which is inefficient in terms of network round-trips), or the connector library has to parse the SQL and split it into multiple commands (which runs into all the problems @roji has mentioned with SQL parsing, parameter management, etc.).

roji · 2017-11-06T08:46:50Z

@bgrainger that's interesting, the MySQL protocol situation seems to be exactly identical to PostgreSQL's protocol situation... originally there was only a simple protocol which support multiple statements in the same protocol message, but no parameters, prepared statements or binary data representation. Then they added the "extended protocol", which supports all the above features but does one-statement-per-message. Statements can still easily be batched by packing several messages into the same packet (which is what Npgsql does), but ADO.NET forces Npgsql to parse since it doesn't has a proper batching API and forces users to shove multiple statements into the same CommandText.

I really hope you guys at Microsoft start looking at ADO.NET soon, these issues have been around for quite a while and the API has stagnated.

bgribaudo · 2017-11-06T14:08:47Z

There's actually a MS SQL Server use case for this feature request, as well.

With SQL Server and ADO.Net, as has already been pointed out, it's currently possible to build a single command consisting of multiple SQL statements.

However, the same doesn't hold true for stored procedures: It's currently not possible to invoke multiple stored procedures using a single CommandType = CommandType.StoredProcedure command. The wire protocol used for client-to-SQL Server communications (Tabular Data Stream [TDS]) supports this scenario (allowing a single execute request to invoke multiple stored procedures) but ADO.Net/SqlCommand's API doesn't expose the functionality to take advantage of it.

Implementing this feature request could make this use case possible.

Keep in mind that, if a query uses parameters, SqlCommand converts the query into a stored procedure call behind-the-scenes. So, supporting this use case would affect a lot more than just direct user invocations of stored procedures--it would allow more efficient transmission of sets of queries that use parameters.

dario-l · 2017-12-11T13:24:21Z

For batching I'm using solution from NHibernate.

Unfortunately on .net core is not implemented. SqlCommandSet type doesn't exist.

PS
Compiled on netstandard2 and run over framework 4.6.1 works.

roji · 2017-12-11T16:01:10Z

@dario-l SqlCommandSet is an SqlClient-specific API, and is not even exposed publicly. The point here would be to add a standard API in ADO.NET, which may or may not end up resembling SqlCommandSet.

dario-l · 2017-12-12T14:05:13Z

@roji That's the point. I must use hacky way to do things performant. There should be the way to do this with official standard API.

If batching is not implemented in .net core yet then, maybe, it is a chance to do this properly even if this requires some changes/extensions in API.

roji · 2017-12-13T03:15:21Z

@dario-l, unless I'm mistaken you can do batching with the official standard API, as described above - you just pack your multiple SQL statements into a single string and set CommandText to it. So the issue is not that an API is lacking, it's that it's unstructured and therefore problematic.

dario-l · 2017-12-13T15:47:04Z

@roji Yes, I can but this leads to rebuild query by database server every time it will change and will be not cached. I've tested that scenario.

roji · 2017-12-13T17:07:11Z

What do you mean when you say that the query "will change" and have to be rebuilt? If you properly use parameters why would the query have to change?

dario-l · 2017-12-14T09:22:37Z

"UPDATE X SET Y = 0 WHERE ID = @id;UPDATE X SET Y = 1; WHERE ID = @id;" != "UPDATE X SET Y = 0 WHERE ID = @id;"

I have better performance through SqlCommandSet and batching.

roji · 2017-12-14T19:20:50Z

@dario-l that's interesting. I wonder if you have the same speed difference if the queries are not identical between them (i.e. two different updates).

In any case, we both agree and are asking for the same thing - a standard, structured API in ADO.NET for batching.

roji · 2018-12-09T12:06:50Z

Useful article on SqlClient batching options: https://blogs.msdn.microsoft.com/dataaccess/2005/05/19/does-ado-net-update-batching-really-do-something/ (copied from https://github.com/dotnet/corefx/issues/31070#issuecomment-405393102).

roji · 2018-12-09T12:14:48Z

It's important to note that batching via CommandText with semicolons also seems to have some pretty serious drawbacks even for SqlServer, as described here: https://blogs.msdn.microsoft.com/dataaccess/2005/05/19/does-ado-net-update-batching-really-do-something/ (also see dotnet/SqlClient#19).

An ADO.NET batching API would permit exposing SqlCommandSet in a portable way and boost performance there as well.

fubar-coder · 2019-01-30T11:28:45Z

The SQL Anywhere ADO.NET driver solved this problem quite elegantly:

Create a single DbCommand
Add n*m parameters (with n=number of parameters for a single insert and m=number of rows)
Execute

Example:

cmd.CommandText = "UPDATE message SET text = @text WHERE id = @id";
cmd.Parameters.Add(new SqlParameter("text", "text1");
cmd.Parameters.Add(new SqlParameter("id", 1);
cmd.Parameters.Add(new SqlParameter("text", "text2");
cmd.Parameters.Add(new SqlParameter("id", 2);

EDIT: Added example

roji · 2019-01-30T14:27:41Z

One problematic issue with this approach is that it doesn't allow you to batch different statements, only reexecute the same statement with different parameter sets (IIRC the JDBC batching API has the same limitation). One of the goals here is to allow batching of arbitrary statements together (different types of inserts, updates and even selects).

Another limitation is that it's difficult to imagine how information on a specific statement in the batch would be returned. For example, it's important to be able to return the records affected for each individual statement in the batch, but this doesn't seem possible without a more structured API that has an explicit idea of a batch and a contained statement.

Nevertheless the approach above is elegant within the confines of the current API.

roji · 2019-02-06T16:45:05Z

Superceded by dotnet/corefx#35135

joshfree assigned saurabh500 Oct 7, 2015

kkurni assigned kkurni and unassigned saurabh500 Oct 29, 2015

kkurni assigned YoungGah and unassigned kkurni Nov 4, 2015

saurabh500 unassigned YoungGah May 6, 2017

roji closed this as completed Feb 6, 2019

msftgits transferred this issue from dotnet/corefx Jan 31, 2020

msftgits added this to the 3.0 milestone Jan 31, 2020

This was referenced Jan 31, 2020

Retrieving records affected by statement #17445

Closed

New System.Data.Common batching API #28633

Closed

ghost locked as resolved and limited conversation to collaborators Jan 4, 2021

Better API for ADO.NET command batching #15375

Better API for ADO.NET command batching #15375

Comments

roji commented Oct 7, 2015

kkurni commented Oct 30, 2015

roji commented Oct 30, 2015 • edited Loading

kkurni commented Nov 3, 2015

roji commented Nov 3, 2015

kkurni commented Nov 3, 2015

roji commented Nov 3, 2015

kkurni commented Nov 3, 2015

roji commented Nov 3, 2015

roji commented Dec 1, 2015

YoungGah commented Dec 1, 2015

roji commented Dec 2, 2015

roji commented Dec 25, 2015

NickCraver commented Jan 22, 2016

roji commented Jan 23, 2016

roji commented Apr 24, 2016

roji commented Jul 18, 2016

fs7744 commented Jul 31, 2016

roji commented Aug 1, 2016

roji commented Aug 1, 2016

fs7744 commented Aug 1, 2016

roji commented Aug 1, 2016

ngbrown commented Aug 21, 2016

roji commented Aug 24, 2016

roji commented Sep 14, 2016

bgrainger commented Nov 6, 2017

roji commented Nov 6, 2017

bgribaudo commented Nov 6, 2017 • edited Loading

dario-l commented Dec 11, 2017

roji commented Dec 11, 2017

dario-l commented Dec 12, 2017

roji commented Dec 13, 2017

dario-l commented Dec 13, 2017

roji commented Dec 13, 2017

dario-l commented Dec 14, 2017

roji commented Dec 14, 2017

roji commented Dec 9, 2018

roji commented Dec 9, 2018

fubar-coder commented Jan 30, 2019 • edited Loading

roji commented Jan 30, 2019

roji commented Feb 6, 2019

roji commented Oct 30, 2015 •

edited

Loading

bgribaudo commented Nov 6, 2017 •

edited

Loading

fubar-coder commented Jan 30, 2019 •

edited

Loading