Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement bulk deletion #27550

Open
Tracked by #28479
roji opened this issue Mar 2, 2022 · 5 comments
Open
Tracked by #28479

Implement bulk deletion #27550

roji opened this issue Mar 2, 2022 · 5 comments

Comments

@roji
Copy link
Member

roji commented Mar 2, 2022

The SQL Server provider implements bulk insertion, i.e. collapsing multiple insertions into a single multi-row MERGE (or INSERT) statement. We could do the same with deletion, so instead of multiple DELETE statements, we'd have:

DELETE FROM Foo WHERE Id1 IN (1, 2, 3) RETURNING 1;

Using an array parameter in PostgreSQL (to preserve same SQL/prepared statement):

DELETE FROM Foo WHERE Id1 = ANY (@ids) RETURNING 1;

For multiple conditions (composite keys and/or concurrency token):

DELETE FROM Foo WHERE (Id1, Id2) IN ((1, 2), (3, 4)) RETURNING 1;

Or for SQL Server, which does not support row values here:

DELETE FROM Foo OUTPUT 1 WHERE Id1 = 1 AND Id2 = 2 OR Id1=3 AND Id2=4;

For SQL Server specifically, MERGE can also be used (but no advantage apparently):

MERGE [foo] USING (
    VALUES (1), (2)
) AS i (id) ON ([foo].id = i.id)
WHEN MATCHED THEN DELETE
OUTPUT 1;

Benchmark with one condition (non-composite key, no concurrency token):

Method DatabaseType NumRows Mean Error StdDev Median Ratio RatioSD
Separate_deletes_with_transaction_via_api SqlServer 2 1,179.8 us 9.04 us 7.55 us 1,178.2 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 2 523.6 us 8.67 us 7.69 us 522.7 us 0.44 0.01
Separate_deletes SqlServer 2 523.2 us 9.71 us 9.08 us 524.2 us 0.44 0.01
Single_delete_with_in SqlServer 2 506.2 us 9.81 us 11.30 us 505.3 us 0.43 0.01
Single_delete_with_in_with_transaction_via_raw_sql SqlServer 2 512.5 us 8.40 us 7.85 us 510.9 us 0.43 0.00
Delete_with_merge SqlServer 2 553.6 us 10.30 us 10.57 us 555.1 us 0.47 0.01
Separate_deletes_with_transaction_via_api SqlServer 4 1,219.3 us 15.55 us 13.79 us 1,219.1 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 4 550.7 us 10.76 us 14.37 us 550.1 us 0.45 0.01
Separate_deletes SqlServer 4 577.8 us 7.62 us 7.13 us 578.7 us 0.47 0.01
Single_delete_with_in SqlServer 4 502.3 us 10.02 us 13.02 us 505.0 us 0.41 0.01
Single_delete_with_in_with_transaction_via_raw_sql SqlServer 4 518.2 us 6.43 us 6.01 us 520.4 us 0.42 0.01
Delete_with_merge SqlServer 4 564.8 us 10.46 us 10.27 us 562.6 us 0.46 0.01
Separate_deletes_with_transaction_via_api SqlServer 6 1,256.5 us 18.59 us 17.39 us 1,254.4 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 6 573.3 us 9.15 us 8.11 us 575.6 us 0.46 0.01
Separate_deletes SqlServer 6 617.8 us 11.85 us 11.64 us 619.5 us 0.49 0.01
Single_delete_with_in SqlServer 6 511.3 us 8.03 us 7.51 us 509.8 us 0.41 0.01
Single_delete_with_in_with_transaction_via_raw_sql SqlServer 6 530.3 us 10.46 us 12.45 us 529.4 us 0.42 0.01
Delete_with_merge SqlServer 6 568.4 us 9.49 us 8.87 us 568.9 us 0.45 0.01
Separate_deletes_with_transaction_via_api SqlServer 10 1,315.1 us 14.11 us 13.20 us 1,318.4 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 10 638.3 us 12.70 us 24.47 us 641.5 us 0.48 0.03
Separate_deletes SqlServer 10 743.2 us 20.66 us 60.92 us 754.6 us 0.56 0.06
Single_delete_with_in SqlServer 10 525.8 us 10.36 us 15.82 us 528.4 us 0.40 0.01
Single_delete_with_in_with_transaction_via_raw_sql SqlServer 10 534.4 us 9.70 us 9.08 us 533.4 us 0.41 0.01
Delete_with_merge SqlServer 10 529.9 us 18.47 us 54.45 us 533.1 us 0.44 0.01
Separate_deletes_with_transaction_via_api SqlServer 20 1,477.9 us 29.29 us 84.99 us 1,503.2 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 20 848.9 us 29.25 us 86.23 us 863.8 us 0.58 0.07
Separate_deletes SqlServer 20 990.8 us 57.22 us 168.72 us 1,013.8 us 0.67 0.12
Single_delete_with_in SqlServer 20 554.5 us 10.99 us 10.28 us 556.3 us 0.42 0.02
Single_delete_with_in_with_transaction_via_raw_sql SqlServer 20 560.9 us 8.63 us 8.07 us 558.9 us 0.43 0.02
Delete_with_merge SqlServer 20 605.7 us 11.61 us 16.27 us 607.0 us 0.44 0.04
Benchmark results with composite key
Method DatabaseType NumRows Mean Error StdDev Median Ratio RatioSD
Separate_deletes_with_transaction_via_api SqlServer 2 1,193.6 us 21.43 us 20.04 us 1,198.7 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 2 536.1 us 10.62 us 11.80 us 536.2 us 0.45 0.01
Separate_deletes SqlServer 2 539.2 us 10.59 us 10.88 us 538.2 us 0.45 0.01
Single_delete_with_in_and_row_values SqlServer 2 NA NA NA NA ? ?
Single_delete_with_in_and_row_values_with_transaction_via_raw_sql SqlServer 2 NA NA NA NA ? ?
Single_delete_with_expanded_logic SqlServer 2 508.7 us 5.48 us 4.86 us 508.7 us 0.43 0.01
Single_delete_with_expanded_logic_with_transaction_via_raw_sql SqlServer 2 517.9 us 10.05 us 10.32 us 514.6 us 0.43 0.01
Separate_deletes_with_transaction_via_api SqlServer 4 1,216.6 us 15.73 us 12.28 us 1,219.6 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 4 562.6 us 11.05 us 16.19 us 565.9 us 0.47 0.01
Separate_deletes SqlServer 4 581.9 us 11.61 us 24.74 us 585.6 us 0.47 0.04
Single_delete_with_in_and_row_values SqlServer 4 NA NA NA NA ? ?
Single_delete_with_in_and_row_values_with_transaction_via_raw_sql SqlServer 4 NA NA NA NA ? ?
Single_delete_with_expanded_logic SqlServer 4 518.3 us 10.33 us 13.07 us 518.4 us 0.42 0.01
Single_delete_with_expanded_logic_with_transaction_via_raw_sql SqlServer 4 531.7 us 10.58 us 12.19 us 532.0 us 0.43 0.01
Separate_deletes_with_transaction_via_api SqlServer 6 1,280.8 us 25.28 us 23.65 us 1,274.4 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 6 595.0 us 11.89 us 27.79 us 601.4 us 0.46 0.03
Separate_deletes SqlServer 6 643.7 us 12.76 us 31.31 us 648.9 us 0.49 0.04
Single_delete_with_in_and_row_values SqlServer 6 NA NA NA NA ? ?
Single_delete_with_in_and_row_values_with_transaction_via_raw_sql SqlServer 6 NA NA NA NA ? ?
Single_delete_with_expanded_logic SqlServer 6 533.0 us 10.48 us 16.92 us 537.7 us 0.42 0.01
Single_delete_with_expanded_logic_with_transaction_via_raw_sql SqlServer 6 539.0 us 10.57 us 20.87 us 543.5 us 0.42 0.01
Separate_deletes_with_transaction_via_api SqlServer 10 1,359.0 us 21.81 us 20.40 us 1,357.8 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 10 683.0 us 13.77 us 40.59 us 688.7 us 0.50 0.03
Separate_deletes SqlServer 10 809.2 us 27.65 us 81.54 us 832.0 us 0.60 0.05
Single_delete_with_in_and_row_values SqlServer 10 NA NA NA NA ? ?
Single_delete_with_in_and_row_values_with_transaction_via_raw_sql SqlServer 10 NA NA NA NA ? ?
Single_delete_with_expanded_logic SqlServer 10 562.6 us 10.92 us 12.57 us 564.3 us 0.41 0.01
Single_delete_with_expanded_logic_with_transaction_via_raw_sql SqlServer 10 559.1 us 11.13 us 25.57 us 564.9 us 0.41 0.02
Separate_deletes_with_transaction_via_api SqlServer 20 1,673.7 us 33.18 us 36.88 us 1,666.2 us 1.00 0.00
Separate_deletes_with_transaction_via_raw_sql SqlServer 20 943.2 us 56.15 us 165.57 us 998.5 us 0.58 0.09
Separate_deletes SqlServer 20 1,103.5 us 76.15 us 224.53 us 1,184.0 us 0.66 0.13
Single_delete_with_in_and_row_values SqlServer 20 NA NA NA NA ? ?
Single_delete_with_in_and_row_values_with_transaction_via_raw_sql SqlServer 20 NA NA NA NA ? ?
Single_delete_with_expanded_logic SqlServer 20 632.5 us 12.60 us 29.95 us 640.5 us 0.38 0.01
Single_delete_with_expanded_logic_with_transaction_via_raw_sql SqlServer 20 643.1 us 13.41 us 39.33 us 652.2 us 0.37 0.04

Notes/caveats:

  • This allows removing the transaction if the bulk delete is the only thing being saved (single command).
  • For composite keys, row values can be used on MySQL and PostgreSQL:
    • DELETE FROM Foo WHERE (Id1, Id2) IN ((1, 2), (3, 4))
    • This isn't supported by SQL Server or SQLite (they could opt out of this optimization for composite keys).
  • At least in theory, multiple (identical) DELETEs could reuse the same query plan, where single multi-row DELETEs for varying row numbers would not.
Single-condition benchmark code
BenchmarkRunner.Run<Benchmark>();

public class Benchmark
{
    [Params(DatabaseType.Postgresql, DatabaseType.SqlServer)]
    public DatabaseType DatabaseType { get; set; }

    [Params(2, 4, 6, 10, 20)]
    public int NumRows { get; set; }

    private DbConnection _connection;
    private DbCommand _command;

    private async Task Setup()
    {
        _connection = DatabaseType switch
        {
            DatabaseType.Postgresql => new NpgsqlConnection("Host=localhost;Username=test;Password=test"),
            DatabaseType.SqlServer => new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Trust Server Certificate=true"),
            _ => throw new ArgumentOutOfRangeException()
        };

        await _connection.OpenAsync();

        await using var command = _connection.CreateCommand();
        command.CommandText = "DROP TABLE IF EXISTS foo; CREATE TABLE foo (id INT, data INT)";
        await command.ExecuteNonQueryAsync();
    }

    [GlobalSetup(Targets = new[] { nameof(Separate_deletes), nameof(Separate_deletes_with_transaction_via_api) })]
    public async Task Setup_Separate_deletes()
    {
        await Setup();

        _command = _connection.CreateCommand();
        _command.CommandText = new StringBuilder()
            .AppendJoin(" ", Enumerable.Range(0, NumRows).Select(i => $"DELETE FROM foo WHERE id = @p{i};"))
            .ToString();

        for (var i = 0; i < NumRows; i++)
        {
            var param = _command.CreateParameter();
            param.ParameterName = "p" + i;
            param.Value = i;
            _command.Parameters.Add(param);
        }
    }

    [GlobalSetup(Target = nameof(Separate_deletes_with_transaction_via_raw_sql))]
    public async Task Setup_Separate_deletes_with_transaction_via_raw_sql()
    {
        await Setup_Separate_deletes();

        _command.CommandText =
            (DatabaseType == DatabaseType.SqlServer ? "BEGIN TRAN;" : "BEGIN;")
            + _command.CommandText
            + "COMMIT";
    }

    [GlobalSetup(Target = nameof(Single_delete_with_in))]
    public async Task Setup_Single_delete_with_in()
    {
        await Setup();

        _command = _connection.CreateCommand();
        _command.CommandText = new StringBuilder()
            .Append("DELETE FROM foo WHERE id IN (")
            .AppendJoin(", ", Enumerable.Range(0, NumRows).Select(i => "@p" + i))
            .Append(")")
            .ToString();

        for (var i = 0; i < NumRows; i++)
        {
            var param = _command.CreateParameter();
            param.ParameterName = "p" + i;
            param.Value = i;
            _command.Parameters.Add(param);
        }
    }

    [GlobalSetup(Target = nameof(Single_delete_with_in_with_transaction_via_raw_sql))]
    public async Task Setup_Single_delete_with_in_with_transaction_via_raw_sql()
    {
        await Setup_Single_delete_with_in();

        _command.CommandText =
            (DatabaseType == DatabaseType.SqlServer ? "BEGIN TRAN;" : "BEGIN;")
            + _command.CommandText
            + "; COMMIT";
    }

    [GlobalSetup(Target = nameof(Delete_with_merge))]
    public async Task Setup_Delete_with_merge()
    {
        await Setup();

        if (DatabaseType != DatabaseType.SqlServer)
            throw new NotSupportedException();

        _command = _connection.CreateCommand();
        _command.CommandText = new StringBuilder()
            .AppendLine("MERGE [foo] USING (VALUES ")
            .AppendJoin(", ", Enumerable.Range(0, NumRows).Select(i => $"(@p{i})"))
            .AppendLine(")")
            .Append(@"AS i (id) ON ([foo].id = i.id)
WHEN MATCHED THEN DELETE
OUTPUT 1;")
            .ToString();

        for (var i = 0; i < NumRows; i++)
        {
            var param = _command.CreateParameter();
            param.ParameterName = "p" + i;
            param.Value = i;
            _command.Parameters.Add(param);
        }
    }

    [Benchmark(Baseline = true)]
    public async Task Separate_deletes_with_transaction_via_api()
    {
        var tx = _connection.BeginTransaction();
        _command.Transaction = tx;
        await _command.ExecuteNonQueryAsync();
        tx.Commit();
    }

    [Benchmark]
    public async Task Separate_deletes_with_transaction_via_raw_sql()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Separate_deletes()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Single_delete_with_in()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Single_delete_with_in_with_transaction_via_raw_sql()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Delete_with_merge()
        => await _command.ExecuteNonQueryAsync();
}

public enum DatabaseType
{
    Postgresql,
    SqlServer
}
Multiple-condition benchmark code
BenchmarkRunner.Run<Benchmark>();

public class Benchmark
{
    [Params(DatabaseType.Postgresql, DatabaseType.SqlServer)]
    public DatabaseType DatabaseType { get; set; }

    [Params(2, 4, 6, 10, 20)]
    public int NumRows { get; set; }

    private DbConnection _connection;
    private DbCommand _command;

    private async Task Setup()
    {
        _connection = DatabaseType switch
        {
            DatabaseType.Postgresql => new NpgsqlConnection("Host=localhost;Username=test;Password=test"),
            DatabaseType.SqlServer => new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Trust Server Certificate=true"),
            _ => throw new ArgumentOutOfRangeException()
        };

        await _connection.OpenAsync();

        await using var command = _connection.CreateCommand();
        command.CommandText = "DROP TABLE IF EXISTS foo; CREATE TABLE foo (id1 INT, id2 INT, data INT)";
        await command.ExecuteNonQueryAsync();

        _command = _connection.CreateCommand();
        for (var i = 0; i < NumRows; i++)
        {
            var param = _command.CreateParameter();
            param.ParameterName = $"p{i * 2}";
            param.Value = i * 2;
            _command.Parameters.Add(param);

            param = _command.CreateParameter();
            param.ParameterName = $"p{i * 2 + 1}";
            param.Value = i * 2 + 1;
            _command.Parameters.Add(param);
        }
    }

    [GlobalSetup(Targets = new[] { nameof(Separate_deletes), nameof(Separate_deletes_with_transaction_via_api) })]
    public async Task Setup_Separate_deletes()
    {
        await Setup();

        _command.CommandText = new StringBuilder()
            .AppendJoin(" ", Enumerable.Range(0, NumRows).Select(i => $"DELETE FROM foo WHERE id1 = @p{i * 2} AND id2 = @p{i * 2 + 1};"))
            .ToString();
    }

    [GlobalSetup(Target = nameof(Separate_deletes_with_transaction_via_raw_sql))]
    public async Task Setup_Separate_deletes_with_transaction_via_raw_sql()
    {
        await Setup_Separate_deletes();

        _command.CommandText =
            (DatabaseType == DatabaseType.SqlServer ? "BEGIN TRAN;" : "BEGIN;")
            + _command.CommandText
            + "COMMIT";
    }

    [GlobalSetup(Target = nameof(Single_delete_with_in_and_row_values))]
    public async Task Setup_Single_delete_with_in_and_row_values()
    {
        await Setup();

        _command.CommandText = new StringBuilder()
            .Append("DELETE FROM foo WHERE (id1, id2) IN (")
            .AppendJoin(", ", Enumerable.Range(0, NumRows).Select(i => $"(@p{i * 2}, @p{i * 2 + 1})"))
            .Append(")")
            .ToString();
    }

    [GlobalSetup(Target = nameof(Single_delete_with_in_and_row_values_with_transaction_via_raw_sql))]
    public async Task Setup_Single_delete_with_in_and_row_values_with_transaction_via_raw_sql()
    {
        await Setup_Single_delete_with_in_and_row_values();

        _command.CommandText =
            (DatabaseType == DatabaseType.SqlServer ? "BEGIN TRAN;" : "BEGIN;")
            + _command.CommandText
            + "; COMMIT";
    }

    [GlobalSetup(Target = nameof(Single_delete_with_expanded_logic))]
    public async Task Setup_Single_delete_with_expanded_logic()
    {
        await Setup();

        _command.CommandText = new StringBuilder()
            .Append("DELETE FROM foo WHERE ")
            .AppendJoin(" OR ", Enumerable.Range(0, NumRows).Select(i => $"id1 = @p{i * 2} AND id2 = @p{i * 2 + 1}"))
            .ToString();
    }

    [GlobalSetup(Target = nameof(Single_delete_with_expanded_logic_with_transaction_via_raw_sql))]
    public async Task Setup_Single_delete_with_expanded_logic_with_transaction_via_raw_sql()
    {
        await Setup_Single_delete_with_expanded_logic();

        _command.CommandText =
            (DatabaseType == DatabaseType.SqlServer ? "BEGIN TRAN;" : "BEGIN;")
            + _command.CommandText
            + "; COMMIT";
    }

    [Benchmark(Baseline = true)]
    public async Task Separate_deletes_with_transaction_via_api()
    {
        var tx = _connection.BeginTransaction();
        _command.Transaction = tx;
        await _command.ExecuteNonQueryAsync();
        tx.Commit();
    }

    [Benchmark]
    public async Task Separate_deletes_with_transaction_via_raw_sql()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Separate_deletes()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Single_delete_with_in_and_row_values()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Single_delete_with_in_and_row_values_with_transaction_via_raw_sql()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Single_delete_with_expanded_logic()
        => await _command.ExecuteNonQueryAsync();

    [Benchmark]
    public async Task Single_delete_with_expanded_logic_with_transaction_via_raw_sql()
        => await _command.ExecuteNonQueryAsync();
}

public enum DatabaseType
{
    Postgresql,
    SqlServer
}
@roji
Copy link
Member Author

roji commented Mar 3, 2022

Note: the problem here is concurrency detection, see #27562.

@roji roji added the blocked label Mar 3, 2022
@roji
Copy link
Member Author

roji commented Mar 10, 2022

NOTE: See below, the concurrency check can be partially solved.

If we do decide to keep concurrency checks, we could do a single query after multiple updates to get the number of rows with the IDs we wanted to delete (and expect zero to come back). However, if it's possible for keys to repeat (i.e. not IDENTITY/GUID), then it's possible for a concurrent insert to add the row we just deleted, causing a spurious concurrency exception. Note that this is a problem even if we're in a transaction, since newly-committed rows do appear from outside the transaction; the only way to protect against this AFAIK is via the serializable isolation level, which we definitely shouldn't do.

Bottom line, unless we decide to get rid of the concurrency checks, we probably shouldn't do anything here.

@roji
Copy link
Member Author

roji commented Mar 10, 2022

Thinking about it some more, it seems the the concurrency check isn't actually a problem here. One can do:

DELETE FROM Foo WHERE id IN (1, 2, 3) RETURNING 1;

This returns a row with 1 for each row deleted. This allows us to detect when a concurrency issue, but does not allow us to know exactly which entry was the problematic one.

The above assumes no concurrency token. When there is one, we can use row values for most databases (but not SQL Server):

DELETE FROM Foo WHERE (id, token) IN ((1, 2), (3, 4)) RETURNING 1;

@roji
Copy link
Member Author

roji commented Mar 16, 2022

Design discussion:

  • We can preserve full concurrency checking by returning the IDs of the deleted rows, instead of 1 (thanks @bricelam). Probably requires some client-side changes.
  • Re query plan fragmentation, we already have this problem with inserts, and one mitigation is the limitation of the batch size. No need to do anything specific here.
  • As the optimization isn't huge, we'll see whether we implement it for 7.0.

@roji
Copy link
Member Author

roji commented Apr 20, 2022

Note to self: when implementing this, note that a single bulk delete statement is not safe for running without transactions, as long as concurrency checking is enabled - for the same reasons as in #27532 (comment). In other words, if less rows were deleted than expected, we want to throw and roll back, but if there's no transaction the change has already been committed and there's nothing to roll back.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants