Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Batch execution of multiple ExecuteUpdate / ExecuteDelete / ExecuteSql #33143

Closed
clement911 opened this issue Feb 22, 2024 · 8 comments
Closed

Comments

@clement911
Copy link
Contributor

Problem:

We try to use EF as much as possible for all interactions with SQL server, using strongly typed queries and the change tracker. Because of all the benefits that this provides in terms of maintenance, parameterization, etc...

However, there are still many places in our codebase where we have to resort to building sql commands manually by concatenating strings and issuing commands manually. For virtually all of these cases, this is because we need to do bulk deletes and updates.

EF 7 introduced ExecuteUpdate and ExecuteDelete and we have been trying to use these instead of building sql commands manually.
Unfortunately we have run into some performance concerns.

It's very often the case that we need to issue multiple commands at a time. For example, if a root object is deleted, we issue several bulk delete commands to delete all its related children objects. As a (fictitious) example, if a customer is deleted, we might issue commands to delete all of their orders, all their fulfilments, etc... (in this particular example, we'd probably want to keep these objects but in our domain, we don't). As single root object deletion can often result in 5+ separate bulk delete commands to delete all related objects.
For performance reasons, we issue a single batch of SQL which contain all the commands.
Unfortunately, batching doesn't seem to be possible with ExecuteUpdate / ExecuteDelete / ExecuteSql, hence why we can't use them for now.

See https://learn.microsoft.com/en-us/ef/core/saving/execute-insert-update-delete#change-tracking

Users familiar with SaveChanges are used to performing multiple changes, and then calling SaveChanges to apply all these changes to the database; this is made possible by EF's change tracker, which accumulates - or tracks - these changes.

ExecuteUpdate and ExecuteDelete work quite differently: they take effect immediately, at the point in which they are invoked. This means that while a single ExecuteUpdate or ExecuteDelete operation can affect many rows, it isn't possible to accumulate multiple such operations and apply them at once, e.g. when calling SaveChanges. In fact, the functions are completely unaware of EF's change tracker, and have no interaction with it whatsoever. This has several important consequences.

As of today, the recommend solution is to create a transaction to wrap all commands. Something like this:

using (context.Database.BeginTransaction())
{
     await context.Orders.ExecuteDeleteAsync(/* some delete */);
     await context.Fulfillments.ExecuteDeleteAsync(/* another delete*/);
     await context.SaveChangesAsync();
}

The problem is that a separate roundtrip is necessary for each ExecuteDelete/ExecuteUpdate. For us, this matters a lot.
My feature request would be to be able to batch multiple ExecuteUpdate / ExecuteDelete / ExecuteSql into a single SQL batch to sql server.

Proposed solution:

The idea is to be able to create delete/update/sql statements, without executing them immediately.

Example:

var deleteOrdersStatement = context.Orders.CreateDeleteStatement(/* some delete */);
var deleteFulfillmentsStatement = context.Fulfillments.CreateDeleteStatement(/* some delete */);
var updateStatement = context.Transactions.CreateUpdateStatement(/*some update*/);
var sqlStatement = context.Database.CreateSqlStatement(/*arbitrary sql statement*/)
...

using (context.Database.BeginTransaction())
{
     //All statements are concatenated and issued with one command to the database
     await context.Database.ExecuteSqlBatchAsync(deleteOrdersStatement, deleteFulfillmentsStatement, updateStatement, sqlStatement);
     await context.SaveChangesAsync();
}

Side notes:

  • There might be other benefits to have a representation of a statement object.
    For example, right now we can't use ToQueryString() with ExecuteDelete/ExecuteUpdate. If we can create statements as above, maybe they could also support ToQueryString, which we find very useful for logging, debugging and other diagnostics.
  • Even with the proposed API above, 2 round trips are still required. One for the batch and one for the SaveChanges. It would even better if we could run both in a single call such as await context.RunStatementsThenSaveChangesAsync(statement1, statement2, ....).

I would love to hear your thoughts.

@roji
Copy link
Member

roji commented Feb 22, 2024

Duplicate of #10879

@roji roji marked this as a duplicate of #10879 Feb 22, 2024
@roji
Copy link
Member

roji commented Feb 22, 2024

Although #10879 discusses allowing batching queries, ExecuteUpdate/ExecuteDelete are implemented via the EF query pipeilne and would very likely be part of that issue.

As a (fictitious) example, if a customer is deleted, we might issue commands to delete all of their orders, all their fulfilments, etc...

I imagine your actual scenario is more complex, but this sounds like something that's better handled by cascade deletes in the database. I agree that batching updates/deletes is definitely useful and should be implemented, but it's also better to let the database take care of cascades whenever that's possible.

For example, right now we can't use ToQueryString() with ExecuteDelete/ExecuteUpdate. If we can create statements as above, maybe they could also support ToQueryString, which we find very useful for logging, debugging and other diagnostics.

The problem here is mainly an API issue - ToQueryString() currently works over IQueryable, and ExecuteUpdate/Delete don't expose one, but rather execute. The same problem exists with terminating querying operators such as Max/Sum.

Having a ToQueryString variant which accepts the query as an argument (i.e. context.Blogs.ToQueryString(b => b....ExecuteUpdate()) could be a way around this, if it's considered important enough.

@clement911
Copy link
Contributor Author

The problem here is mainly an API issue - ToQueryString() currently works over IQueryable, and ExecuteUpdate/Delete don't expose one, but rather execute. The same problem exists with terminating querying operators such as Max/Sum.

I understand. That's why I suggested an API to create a Statement object instead, so that the statement can be represented abstractly without being executed immediately.

As far as #10879, I think it's different because it is about issuing multiple queries with readers. This is complicated because multiple results sets need to be handled.

The batching AP I'm proposing above only processes non-reader statements which are executed via ExecuteNonQuery. As you know ExecuteUpdate / ExecuteDelete / ExecuteSql cannot return results and only return an int with the number of rows affected.

Because of this difference, I think it would be much easier to implement because there are no results sets to manage.

@ajcvickers
Copy link
Contributor

@roji Keep in mind that cascade deletes in SQL Server are quite limited because of the aggressive cycle checks. Hence we have plans to use set-based updates to implement cascade deletes, at least optionally. See #18960.

@roji
Copy link
Member

roji commented Feb 23, 2024

Good point @ajcvickers

@roji
Copy link
Member

roji commented Feb 23, 2024

As far as #10879, I think it's different because it is about issuing multiple queries with readers. This is complicated because multiple results sets need to be handled.

Not really - both queries and ExecuteUpdate/Delete are actually quite similar under the hood. In any case, the important thing here isn't the implementation details, but rather the user API we'd expose for batching; and from a user perspective, there's very little different between e.g. ExecuteDelete() and Max() as terminating operators - both return a single int, so the API for batching them can be the same. The fact that under the hood the former is invoked via DbCommand.ExecuteNonQuery() and the latter via ExecuteReader() (or possibly via ExecuteScalar()) is unimportant for the user-facing API.

Because of this difference, I think it would be much easier to implement because there are no results sets to manage.

I'd suggest spending some time looking at the implementation - that should make you a bit more familiar with the internals.

@clement911
Copy link
Contributor Author

I was checking out SqlBatch in Microsoft.Data.SqlClient.
That looks awesome @roji !
The ability to reuse the same parameter name for multiple commands will really help reuse of plans in sql server plan cache.

@roji
Copy link
Member

roji commented Feb 27, 2024

Happy you like it! Yeah, it's an important step forward - we'll need to start using that in EF at some point too.

In any case, I'll go ahead and close this issue as a duplicate of #10879, as discussed above.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants