Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use INSERT w/OUTPUT Instead of INSERT+SELECT #7188

Closed
bgribaudo opened this issue Dec 5, 2016 · 11 comments
Closed

Use INSERT w/OUTPUT Instead of INSERT+SELECT #7188

bgribaudo opened this issue Dec 5, 2016 · 11 comments

Comments

@bgribaudo
Copy link

bgribaudo commented Dec 5, 2016

Entity Framework Core's SQL Server provider (at least, as of v. 1.1.0) follows each INSERT statement with a SELECT statement that fetches database-generated values and verifies the inserted row count.

Proposal

Eliminate the SELECT statement. Instead:

Examples

Currently (EF Core 1.1.0)

-- Table w/identity column:
SET NOCOUNT ON;
INSERT INTO [TestTable1] ([Data])
VALUES (@p0);
SELECT [Id]
FROM [TestTable1]
WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity();

-- Table w/identity and default value (`DEFAULT(GETDATE())`) columns:
SET NOCOUNT ON;
INSERT INTO [TestTable2] ([Data])
VALUES (@p0);
SELECT [Id], [HasDefaultValue]
FROM [TestTable2]
WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity();

Proposed Simplification

-- Table w/identity column:
SET NOCOUNT ON;
INSERT INTO [TestTable1] ([Data])
    OUTPUT INSERTED.[Id]
VALUES (@p0);

-- Table w/identity and default value (`DEFAULT(GETDATE())`) columns:
SET NOCOUNT ON;
INSERT INTO [TestTable2] ([Data])
    OUTPUT INSERTED.[Id], INSERTED.[HasDefaultValue]
VALUES (@p0);

Example Code

Example.zip

Further technical details

EF Core version: 1.1.0
Operating system: Windows 10
Visual Studio version: VS 2015

@rowanmiller
Copy link
Contributor

We used to take this approach, but it prevents you from using triggers (see #1441 for a complete history). We've profiled many approaches to this, and the performance difference between them all is negligible (a couple of percent either way).

@bgribaudo
Copy link
Author

Thanks, @rowanmiller . This makes sense. I forgot about triggers and OUTPUT being mutually incompatible. :-(

@Coder3333
Copy link

I am getting heavy deadlocking issues from the 2 step approach, and this is when I am only doing 2 simultaneous inserts into a table. If EF were using the output clause to get the identity, instead of a separate query, I do not think I would have the deadlock issues. Would it make sense to add an option for developers to choose how the insert is handled for folks that are not concerned about triggers?

@roji
Copy link
Member

roji commented Feb 4, 2022

Opening to revisit.

For tables without identity, we currently do INSERT ... OUTPUT INTO @tvp; SELECT ... FROM @tvp, since OUTPUT INTO doesn't have the restriction on triggers. However, it seems that this is quite bad for perf (see #27372).

Assuming we switch to using regular OUTPUT (without INTO) for non-IDENTITY tables, it may make sense to do the same here:

  • It would probably make the deadlock issues go away (though I haven't researched this)
  • It would simplify our implementation - we'd use the same logic regardless of IDENTITY.
  • It's about 4% faster:

BenchmarkDotNet=v0.13.0, OS=ubuntu 21.10
Intel Xeon W-2133 CPU 3.60GHz, 1 CPU, 12 logical and 6 physical cores
.NET SDK=6.0.101
[Host] : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT
DefaultJob : .NET 6.0.1 (6.0.121.56705), X64 RyuJIT

Method Mean Error StdDev Ratio RatioSD
NoOutput 2.085 ms 0.0413 ms 0.0863 ms 1.04 0.06
Output 1.912 ms 0.0378 ms 0.0641 ms 0.96 0.04
OutputInto 5.337 ms 0.1041 ms 0.1651 ms 2.67 0.11
AdditionalQuery 2.002 ms 0.0391 ms 0.0561 ms 1.00 0.00
Benchmark code
BenchmarkRunner.Run<IdentityBenchmark>();

public class IdentityBenchmark
{
    const string ConnectionString = "Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false";
    private SqlConnection _connection;

    [GlobalSetup]
    public async Task Setup()
    {
        _connection = new SqlConnection(ConnectionString);
        await _connection.OpenAsync();

        await using var cmd = new SqlCommand(@"
DROP TABLE IF EXISTS [Foo];
CREATE TABLE [Foo] ([Id] INT IDENTITY PRIMARY KEY, [Bar] INT)", _connection);
        await cmd.ExecuteNonQueryAsync();
    }

    [Benchmark]
    public async Task NoOutput()
    {
        await using var cmd = new SqlCommand("INSERT INTO [Foo] ([Bar]) VALUES (8)", _connection);
        _ = await cmd.ExecuteScalarAsync();
    }

    [Benchmark]
    public async Task Output()
    {
        await using var cmd = new SqlCommand("INSERT INTO [Foo] ([Bar]) OUTPUT INSERTED.[Id] VALUES (8)", _connection);
        _ = await cmd.ExecuteScalarAsync();
    }

    [Benchmark]
    public async Task OutputInto()
    {
        await using var cmd = new SqlCommand(@"DECLARE @inserted TABLE ([Id] int);
INSERT INTO [Foo] ([Bar]) OUTPUT INSERTED.[Id] INTO @inserted VALUES (8);
SELECT [i].[Id] FROM @inserted i;", _connection);
        _ = await cmd.ExecuteScalarAsync();
    }

    [Benchmark(Baseline = true)]
    public async Task AdditionalQuery()
    {
        await using var cmd = new SqlCommand(@"INSERT INTO [Foo] ([Bar]) VALUES (8);
SELECT [Id] FROM [Foo] WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity()", _connection);
        _ = await cmd.ExecuteScalarAsync();
    }

    [GlobalCleanup]
    public ValueTask Cleanup()
        => _connection.DisposeAsync();
}

@GSPP
Copy link

GSPP commented Feb 7, 2022

If I remember correctly, the MERGE statement has capabilities beyond the normal DML. I fail to recall what it was, though. So that's a potential avenue.

MERGE OUTPUT can reference the source table and output values from it. That can be used to input and get back out row IDs. Not sure if this could help, but I'm mentioning it.

MERGE should collapse to an essentially identical plan to normal DML if there's only one kind of write clause.

@roji
Copy link
Member

roji commented Feb 9, 2022

@GSPP yeah, we already use MERGE in many situations when inserting (specifically when we batch multiple insertions). I'm investigating switching to it entirely.

@ErikEJ
Copy link
Contributor

ErikEJ commented Feb 9, 2022

@roji Be careful out there: https://www.mssqltips.com/sqlservertip/3074/use-caution-with-sql-servers-merge-statement/

lets-be-careful-out-there-hill-street-blues

@roji
Copy link
Member

roji commented Feb 9, 2022

@ErikEJ I just read that post a couple days ago, when I found out that MERGE indeed isn't atomic... That's indeed a problem for using it to implement UPSERT.

However, in this context, note that EF Core already uses MERGE for insertions; I'd only be proposing changing when we use it (i.e. always instead of over 4), and our use of OUTPUT vs. OUTPUT INTO clauses. Will post a summary soon on #27372 and would very much appreciate any feedback!

@GSPP
Copy link

GSPP commented Feb 19, 2022

@roji I didn't realize that MERGE is so much in the radar. That's great to know!

With respect to atomicity: Merge is no different than any other SQL Server DML statement.

Merge is implemented in the following way: For each target row affected, a modification is computed. That modification is then applied, be it insert, update or delete. There really is no substantial difference to, say, an insert statement.

The updates are computed by first full-joining source and target, then by computing (using normal SQL logic) the modification type and parameters. All this is visible in the plan.

That full join and the logic are subject to normal optimizer behavior. This is why that full join can collapse to a simpler join type and then be optimized further.

I believe that not all of this machinery can "collapse". This might cause merge to remain a bit slower. But there should be no fundamental difference in behavior or performance.

That's indeed a problem for using it to implement UPSERT.

Could you elaborate on where the problem lies? MERGE is exactly as atomic as any other DML statement (on SQL Server).

@roji
Copy link
Member

roji commented Feb 19, 2022

@GSPP note that MERGE is already used today when inserting (and has been for a long time); this proposal (tracked mostly in #27372) simply extends using it in additonal scenarios. See #27372 (comment) for what we plan to change for 7.0. I've also done some extensive benchmarking comparing different techniques for inserting rows (including several variations with merge). Any further input you have here would be appreciated.

That's indeed a problem for using it to implement UPSERT.

Could you elaborate on where the problem lies? MERGE is exactly as atomic as any other DML statement (on SQL Server).

That is not my understanding... See this comment with links on the UPSERT issue - I'd be happy to continue the conversation there.

But as far as I'm aware, the atomicity issue isn't really relevant for our usage of MERGE for pure insertion (as opposed to UPSERT).

@roji
Copy link
Member

roji commented Feb 19, 2022

Closing this issue in favor of #27372, which tracks all the changes we intend to do for SQL Server insertion for 7.0.

@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

8 participants