Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SQL Server: When retrieving generated columns in insert, concurrency may cause us not to get the original generated value #27446

Closed
roji opened this issue Feb 15, 2022 · 9 comments
Assignees
Labels
area-save-changes area-sqlserver closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed.

Comments

@roji
Copy link
Member

roji commented Feb 15, 2022

When inserting a new entity instance which has both identity and other generated values, we current use the INSERT+SELECT pattern:

INSERT INTO [Blogs] ([Name])
VALUES (@p0);
SELECT [Id], [Foo]
FROM [Blogs]
WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity();

Although the two statements are wrapped in a transaction, we use the default isolation level, which is read committed; this means that if an update happens between the INSERT and the SELECT, the SELECT returns the updated value for Foo, and not the originally-generated value.

I'm not sure this is an actual problem, but it does seem like we should ideally propagate back the original generated values, and not be sensitive to a race condition such as this. Switching to a single-command method (SELECT ... OUTPUT) would remove this problem, but would degrade performance when triggers are present (SELECT ... OUTPUT INTO) - see #27372 for more info.

@roji roji changed the title When retrieving generated columns in insert, concurrency may cause us to get the original generated value SQL Server: When retrieving generated columns in insert, concurrency may cause us to get the original generated value Feb 15, 2022
@AndriySvyryd AndriySvyryd changed the title SQL Server: When retrieving generated columns in insert, concurrency may cause us to get the original generated value SQL Server: When retrieving generated columns in insert, concurrency may cause us not to get the original generated value Feb 16, 2022
@roji roji added this to the 7.0.0 milestone Feb 18, 2022
@roji roji self-assigned this Feb 18, 2022
@roji roji added the type-bug label Feb 18, 2022
@roji
Copy link
Member Author

roji commented Feb 18, 2022

Design decision: we will move away from doing INSERT+SELECT. See #27372 for further decisions on insertion behavior.

@GSPP
Copy link

GSPP commented Feb 21, 2022

On SQL Server, this should not be a problem because the INSERT X-locks the row.

The switch to another strategy might still be wise, just mentioning it.

@roji
Copy link
Member Author

roji commented Feb 21, 2022

@GSPP you're right...

In addition, any other transaction wouldn't even see the newly-inserted row before its SELECT completes, since the INSERT+SELECT hasn't been committed yet (unless that other transaction is using READ UNCOMMITTED). In fact, I could arrive at a deadlock by doing INSERT+SELECT in one transaction and UPDATE in another - even in Read Commit Snapshot Isolation mode, which is somewhat odd.

So I agree that there isn't a correctness issue here, and if we want to keep perf optimized for all scenarios (including trigger-enabled ones), we may want to keep this technique. @GSPP any other reason you see to switch away from this?

Note: in the context of avoiding transactions around single updates (#27439), this is a good example of a single modification command which is executed via 2 SQL commands, which do need to be wrapped in a transaction.

/cc @lauxjpn

@GSPP
Copy link

GSPP commented Feb 21, 2022

deadlock by doing INSERT+SELECT in one transaction and UPDATE in another

How do you see that deadlock happening? I can't see it.


Using a single DML statement that outputs the data that the client needs feels right to me. It can't get better than this. The only downside is that MERGE might not collapse to the exact same efficiency as an INSERT. It should be pretty close, though. I see no reason why there would be any difference at all, but you never know.

To me, these idiosyncrasies in SQL Server with the various strategies seem a bit unnecessary. They seem like unforced design errors in SQL Server. We have to make do with what we've got.

@roji
Copy link
Member Author

roji commented Feb 21, 2022

deadlock by doing INSERT+SELECT in one transaction and UPDATE in another

How do you see that deadlock happening? I can't see it.

I didn't give enough details above; I could reproduce a deadlock when the INSERT+SELECT transaction waits between the two commands on an unrelated UPDATE. So INSERT acquires the lock as you wrote above, then a separate UPDATE transaction blocks on that lock, as expected. The only weirdness IMHO is that this also happens in RSCI mode, where the separate UPDATE transaction isn't even supposed to see the inserted row, and so not block on it (this is the PG behavior).

But I'm not SQL Server expert, so any insights would be appreciated. Here's a code sample:

Deadlock INSERT+UPDATE
await using var conn1 = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false");
await conn1.OpenAsync();

await using var conn2 = new SqlConnection("Server=localhost;Database=test;User=SA;Password=Abcd5678;Connect Timeout=60;ConnectRetryCount=0;Encrypt=false");
await conn2.OpenAsync();

await using var cmd1 = new SqlCommand();
cmd1.Connection = conn1;
cmd1.CommandText = @"
ALTER DATABASE test SET READ_COMMITTED_SNAPSHOT ON;
DROP TABLE IF EXISTS Foo;
CREATE TABLE Foo (Id INT IDENTITY PRIMARY KEY, Bar INT)";
await cmd1.ExecuteNonQueryAsync();

await using var tx = (SqlTransaction)await conn1.BeginTransactionAsync();
cmd1.Transaction = tx;
cmd1.CommandText = @"INSERT INTO Foo (Bar) VALUES (8)";
await cmd1.ExecuteNonQueryAsync();

await using var cmd2 = new SqlCommand("UPDATE Foo SET Bar = 1000", conn2);
await cmd2.ExecuteNonQueryAsync();

cmd1.CommandText = @"SELECT [Bar] FROM [Foo] WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity()";
Console.WriteLine(await cmd1.ExecuteScalarAsync());

Using a single DML statement that outputs the data that the client needs feels right to me. It can't get better than this. The only downside is that MERGE might not collapse to the exact same efficiency as an INSERT. It should be pretty close, though. I see no reason why there would be any difference at all, but you never know.

I certainly agree, and what you write is true - except if there happen to be triggers on the table. This is because you can't use INSERT ... OUTPUT (or MERGE ... OUTPUT) without the INTO clause (using a TVP), and that is the source of considerable slowdown. So for the trigger case, doing INSERT+SELECT does make sense from a perf perspective, for a small number of rows (before MERGE ... OUTPUT INTO becomes faster).

@roji
Copy link
Member Author

roji commented Feb 21, 2022

To me, these idiosyncrasies in SQL Server with the various strategies seem a bit unnecessary. They seem like unforced design errors in SQL Server. We have to make do with what we've got.

Definitely. FWIW none of this kind of complexity is present in PG, though I admit I haven't done an exhaustive perf analysis there yet.

@lauxjpn
Copy link
Contributor

lauxjpn commented Feb 21, 2022

There are two snapshot related isolation levels for SQL Server:

  • READ COMMITTED SNAPSHOT ISOLATION (RCSI)
  • SNAPSHOT ISOLATION (SI).

Both modes come at the cost of a bit of tempdb overhead.

SI provides real lock-free concurrent operations for reads and writes, but comes with the risk of failing statements that modify the same row.
RCSI uses snapshots for lock-free concurrent operations for reads, but will still lock for writes.

For more information, see:


With the write lock in mind for RCSI, the code sample would need to be change to something like the following to work as expected:

Program.cs: RCSI
using System.Diagnostics;
using Microsoft.Data.SqlClient;

namespace MssqlSnapshotIsolation01;

internal static class Program
{
    private static async Task Main()
    {
        await using (var conn0 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False"))
        {
            await conn0.OpenAsync();
                
            await using var cmd0 = new SqlCommand();
            cmd0.Connection = conn0;
            cmd0.CommandText = @"
DROP DATABASE IF EXISTS [EFCoreIssue27446_01];
CREATE DATABASE [EFCoreIssue27446_01];
ALTER DATABASE [EFCoreIssue27446_01] SET READ_COMMITTED_SNAPSHOT ON;";
            await cmd0.ExecuteNonQueryAsync();
        }

        await using (var conn0 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_01"))
        {
            await conn0.OpenAsync();
                
            await using var cmd0 = new SqlCommand();
            cmd0.Connection = conn0;
            cmd0.CommandText = @"
CREATE TABLE [Foo] (Id INT IDENTITY PRIMARY KEY, Bar INT)";
            await cmd0.ExecuteNonQueryAsync();
        }

        await using var conn1 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_01");
        await conn1.OpenAsync();

        await using var conn2 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_01");
        await conn2.OpenAsync();

        await using var tx1 = (SqlTransaction)await conn1.BeginTransactionAsync();
        await using var cmd1 = new SqlCommand();
        cmd1.Connection = conn1;
        cmd1.Transaction = tx1;
        cmd1.CommandText = @"INSERT INTO Foo (Bar) VALUES (8)";
        await cmd1.ExecuteNonQueryAsync();

        await using var cmd2 = new SqlCommand("UPDATE Foo SET Bar = 1000", conn2);
        var startTime = DateTime.UtcNow;
        var task2 = cmd2.ExecuteNonQueryAsync();
        await Task.Delay(TimeSpan.FromSeconds(8));

        cmd1.CommandText = @"SELECT [Bar] FROM [Foo] WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity()";
        var selectResult1 = (int?)await cmd1.ExecuteScalarAsync();

        await tx1.CommitAsync();
        
        var changedCount2 = await task2;
        var cmd2Duration = DateTime.UtcNow - startTime;

        Trace.Assert(cmd2Duration.TotalSeconds > 6); // cmd2 had to wait for cmd1 to release its write lock 
        Trace.Assert(selectResult1 == 8); // this is 8, because cmd2 could only execute after cmd1 released its write lock
        Trace.Assert(changedCount2 == 1); // this is 1, not 0, because cmd2 could only execute after cmd1 released its write lock
    }
}

Using SI instead, the following simple code (that is very close to the initial sample code) will just work and will not contain any unexpected side-effects (SELECT returns 8 and no rows are changed by the UPDATE statement):

Program.cs: SI
using System.Data;
using System.Diagnostics;
using Microsoft.Data.SqlClient;

namespace MssqlSnapshotIsolation02;

internal static class Program
{
    private static async Task Main()
    {
        await using (var conn0 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False"))
        {
            await conn0.OpenAsync();
                
            await using var cmd0 = new SqlCommand();
            cmd0.Connection = conn0;
            cmd0.CommandText = @"
DROP DATABASE IF EXISTS [EFCoreIssue27446_02];
CREATE DATABASE [EFCoreIssue27446_02];
ALTER DATABASE [EFCoreIssue27446_02] SET ALLOW_SNAPSHOT_ISOLATION ON;";
            await cmd0.ExecuteNonQueryAsync();
        }

        await using (var conn0 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_02"))
        {
            await conn0.OpenAsync();
                
            await using var cmd0 = new SqlCommand();
            cmd0.Connection = conn0;
            cmd0.CommandText = @"
CREATE TABLE [Foo] (Id INT IDENTITY PRIMARY KEY, Bar INT)";
            await cmd0.ExecuteNonQueryAsync();
        }

        await using var conn1 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_02");
        await conn1.OpenAsync();
        await using var tx1 = (SqlTransaction)await conn1.BeginTransactionAsync(IsolationLevel.Snapshot);

        await using var conn2 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_02");
        await conn2.OpenAsync();
        await using var tx2 = (SqlTransaction)await conn2.BeginTransactionAsync(IsolationLevel.Snapshot);

        await using var cmd1 = new SqlCommand();
        cmd1.Connection = conn1;
        cmd1.Transaction = tx1;
        cmd1.CommandText = @"INSERT INTO Foo (Bar) VALUES (8)";
        await cmd1.ExecuteNonQueryAsync();

        await using var cmd2 = new SqlCommand("UPDATE Foo SET Bar = 1000");
        cmd2.Connection = conn2;
        cmd2.Transaction = tx2;
        var changedCount2 = await cmd2.ExecuteNonQueryAsync();
            
        cmd1.CommandText = @"SELECT [Bar] FROM [Foo] WHERE @@ROWCOUNT = 1 AND [Id] = scope_identity()";
        var selectResult1 = (int?)await cmd1.ExecuteScalarAsync();
        
        await tx1.CommitAsync();
        await tx2.CommitAsync();
        
        Trace.Assert(selectResult1 == 8);
        Trace.Assert(changedCount2 == 0);
    }
}

Finally, using no transaction to group the INSERT and SELECT statements together, the UPDATE statement will be able run in-between:

Program.cs: RCSI (but only statement wide implicit transactions)
using System.Diagnostics;
using Microsoft.Data.SqlClient;

namespace MssqlSnapshotIsolation03;

internal static class Program
{
    private static async Task Main()
    {
        await using (var conn0 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False"))
        {
            await conn0.OpenAsync();
                
            await using var cmd0 = new SqlCommand();
            cmd0.Connection = conn0;
            cmd0.CommandText = @"
DROP DATABASE IF EXISTS [EFCoreIssue27446_03];
CREATE DATABASE [EFCoreIssue27446_03];
ALTER DATABASE [EFCoreIssue27446_03] SET READ_COMMITTED_SNAPSHOT ON;";
            await cmd0.ExecuteNonQueryAsync();
        }

        await using (var conn0 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_03"))
        {
            await conn0.OpenAsync();
                
            await using var cmd0 = new SqlCommand();
            cmd0.Connection = conn0;
            cmd0.CommandText = @"
CREATE TABLE [Foo] (Id INT IDENTITY PRIMARY KEY, Bar INT)";
            await cmd0.ExecuteNonQueryAsync();
        }

        await using var conn1 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_03");
        await conn1.OpenAsync();

        await using var conn2 = new SqlConnection(@"Server=.\MSSQL14;Integrated Security=True;Encrypt=False;Database=EFCoreIssue27446_03");
        await conn2.OpenAsync();

        await using var cmd1 = new SqlCommand();
        cmd1.Connection = conn1;
        cmd1.CommandText = @"
INSERT INTO Foo (Bar) VALUES (8);
WAITFOR DELAY '00:00:08';
SELECT [Bar] FROM [Foo] WHERE [Id] = scope_identity();";
        var task1 = cmd1.ExecuteScalarAsync();
        await Task.Delay(TimeSpan.FromSeconds(3)); // ensure cmd1 has been sent

        await using var cmd2 = new SqlCommand();
        cmd2.CommandText = @"UPDATE Foo SET Bar = 1000;";
        cmd2.Connection = conn2;

        var startTime = DateTime.UtcNow;
        var changedCount2 = await cmd2.ExecuteNonQueryAsync();
        var cmd2Duration = DateTime.UtcNow - startTime;

        var selectResult1 = (int?)await task1;
        
        Trace.Assert(cmd2Duration.TotalSeconds < 6); // cmd2 was able to run between INSERT and SELECT of cmd1 
        Trace.Assert(selectResult1 == 1000); // this is 1000, not 8, because cmd2 was able to run between INSERT and SELECT of cmd1 
        Trace.Assert(changedCount2 == 1); // this is 1, not 0, because cmd2 was able to run between INSERT and SELECT of cmd1 
    }
}

Although the two statements are wrapped in a transaction, we use the default isolation level, which is read committed; this means that if an update happens between the INSERT and the SELECT, the SELECT returns the updated value for Foo, and not the originally-generated value.

Between an INSERT and SELECT statement of the current connection, it should not be possible to read a modified value as the result of an UPDATE statement from a different connection, as long as the INSERT and SELECT statement of the current connection are protected by the same RCSI transaction (see the Program.cs: RCSI sample code).

@roji
Copy link
Member Author

roji commented Feb 22, 2022

@lauxjpn thanks for looking into this!

Regarding locking and RCSI:

With the write lock in mind for RCSI, the code sample would need to be change to something like the following to work as expected [...]

I think I'm probably missing something... Compared to my code sample above, your code gives UPDATE 8 seconds of waiting before continuing to do the SELECT and commit the transaction. Unless I'm mistaken, that means that the UPDATE is still blocking, waiting for the X-Lock from the INSERT transaction to be released. After the 8 seconds wait, your code commits the INSERT transaction and so the UPDATE is released.

In other words, my point above was only to point out that UPDATE does see the row and lock it, although the transaction creating it hasn't committed yet - and all that with RCSI turned on, and with READ COMMITTED (not READ UNCOMMITTED). That's the weird part for me, and on PG it does not occur.

But this has no actual bearing on this discussion on whether INSERT+SELECT is safe: the only important point is that as long as INSERT+SELECT are wrapped in a transaction, an external update cannot interfere with the values the SELECT sees (on SQL Server).

@roji
Copy link
Member Author

roji commented Feb 25, 2022

Closing; as discussed above, there's no safety issue with INSERT+SELECT as long as they're wrapped in a transaction (thanks @GSPP). We'll keep this behavior for trigger mode.

@roji roji closed this as completed Feb 25, 2022
@roji roji removed this from the 7.0.0 milestone Feb 25, 2022
@ajcvickers ajcvickers added closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed. and removed closed-not-needed labels Mar 10, 2022
@ajcvickers ajcvickers reopened this Oct 16, 2022
@ajcvickers ajcvickers closed this as not planned Won't fix, can't repro, duplicate, stale Oct 16, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-save-changes area-sqlserver closed-out-of-scope This is not something that will be fixed/implemented and the issue is closed.
Projects
None yet
Development

No branches or pull requests

4 participants