Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Silent data loss when adding objects to an sqlite DB with PRAGMA journal_mode=DELETE in a multi-threaded environment. #30851

Closed
EdwardNickson opened this issue May 9, 2023 · 19 comments
Assignees
Labels
area-adonet-sqlite closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported Servicing-approved type-bug
Milestone

Comments

@EdwardNickson
Copy link

Hi,

I'm experiencing a data loss issue when using EF Core with sqlite in a multi threaded scenario. This bug seems to only happen if PRAGMA journal_mode=DELETE. I can't replicate it in WAL mode.
What's happening is that sometimes when a new object is added and SaveChanges is called on a context, the database insert appears to run successfully, and DatabaseGenerated primary keys are created by the DB and set against the C# object, but when the database is later queried for that primary key the record doesn't exist. What's more, the next time an object is created and saved to the database it gets assigned the same primary key as the previous object. It's almost like it rolled-back the first transaction without throwing an exception.

This issue does not exist in EF Core 6, it started in EF Core 7. We have found that adding
ReplaceService<IUpdateSqlGenerator, SqliteLegacyUpdateSqlGenerator>() to our options builder resolves this issue, so it seems to be a bug in the new SqliteUpdateSqlGenerator.

Sample Program

This program sets up a database with PRAGMA journal_mode=DELETE, creates two dBContexts for one sqlite database and then adds objects to the contexts in one-off tasks which run in parallel. After each addition it checks to ensure the primary keys set in each context's call to SaveChanges is unique. If the keys are not unique it throws an exception.

What I expect to happen:
The program runs to completion with no exceptions.

What actually happens:
The program throws an exception because the objects added to the two different contexts ended up with the same primary key.

<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net7.0</TargetFramework>
    <ImplicitUsings>enable</ImplicitUsings>
    <Nullable>enable</Nullable>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="Microsoft.EntityFrameworkCore.Sqlite" Version="7.0.5" />
  </ItemGroup>

</Project>
using System.ComponentModel.DataAnnotations;
using System.ComponentModel.DataAnnotations.Schema;
using Microsoft.EntityFrameworkCore;
using Microsoft.EntityFrameworkCore.Sqlite.Update.Internal;
using Microsoft.EntityFrameworkCore.Update;

namespace collisionTest;

public class LogEvent
{
    [Key]
    [DatabaseGenerated(DatabaseGeneratedOption.Identity)]
    public int Id { get; set; }
    public required string Message { get; set; }
}

public class LogDbContext : DbContext
{
    public DbSet<LogEvent> LogEvents => Set<LogEvent>();
    protected override void OnConfiguring(DbContextOptionsBuilder optionsBuilder)
    {
        optionsBuilder.UseSqlite("Data source=testDb.db");

        // Bug can be worked around by using legacy update SQL generator
        // .ReplaceService<IUpdateSqlGenerator, SqliteLegacyUpdateSqlGenerator>();
    }
}

public class Program
{
    static async Task Main(string[] _)
    {
        using (var contextToCreateDb = new LogDbContext())
        {
            contextToCreateDb.Database.EnsureDeleted();
            contextToCreateDb.Database.EnsureCreated();
            using var connection = contextToCreateDb.Database.GetDbConnection();
            connection.Open();
            using var command = connection.CreateCommand();
            command.CommandText = "PRAGMA journal_mode=DELETE;";
            command.ExecuteNonQuery();
        }

        using var contextOne = new LogDbContext();
        using var contextTwo = new LogDbContext();
        for (int i = 0; i < 500; i++)
        {
            var taskOne = Task.Run(() =>
            {
                var logEvent = new LogEvent(){ Message = "Log Message" };
                contextOne.Add(logEvent);
                contextOne.SaveChanges();
                return logEvent.Id;
            });
            var taskTwo = Task.Run(() =>
            {
                var logEvent = new LogEvent(){ Message = "Log Message" };
                contextTwo.Add(logEvent);
                contextTwo.SaveChanges();
                return logEvent.Id;
            });
            await taskOne;
            await taskTwo;
            if (taskOne.Result == taskTwo.Result)
            {
                throw new Exception($"Duplicate IDs Returned: {taskOne.Result}");
            }
        }
    }
}

Include provider and version information

EF Core version: 7.0.5
Database provider: Microsoft.EntityFrameworkCore.Sqlite
Target framework: .NET 7.0
Operating system: Ubuntu 18.04 & Ubuntu 20.04
IDE: VS Code

@EdwardNickson
Copy link
Author

EdwardNickson commented May 10, 2023

I can also trigger this with one reader and one writer.
Changing the loop above to this:

for (int i = 0; i < 5000; i++)
{
    var taskOne = Task.Run(() =>
    {
        var logEvent = new LogEvent(){ Message = "Log Message" };
        contextOne.LogEvents.Add(logEvent);
        contextOne.SaveChanges();
        return logEvent.Id;
    });
    var taskTwo = Task.Run(() =>
    {
        var a = contextTwo.LogEvents.ToList();
    });
    await taskOne;
    await taskTwo;
}

eventually yields this exception:

Unhandled exception. Microsoft.EntityFrameworkCore.DbUpdateException: An error occurred while saving the entity changes. See the inner exception for details.
 ---> System.InvalidOperationException: The instance of entity type 'LogEvent' cannot be tracked because another instance with the same key value for {'Id'} is already being tracked. When attaching existing entities, ensure that only one entity instance with a given key value is attached. Consider using 'DbContextOptionsBuilder.EnableSensitiveDataLogging' to see the conflicting key values.
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.IdentityMap`1.ThrowIdentityConflict(InternalEntityEntry entry)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.IdentityMap`1.Add(TKey key, InternalEntityEntry entry, Boolean updateDuplicate)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.NavigationFixer.KeyPropertyChanged(InternalEntityEntry entry, IProperty property, IEnumerable`1 containingPrincipalKeys, IEnumerable`1 containingForeignKeys, Object oldValue, Object newValue)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.ChangeDetector.DetectKeyChange(InternalEntityEntry entry, IProperty property)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.ChangeDetector.PropertyChanged(InternalEntityEntry entry, IPropertyBase propertyBase, Boolean setModified)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.InternalEntityEntry.SetProperty(IPropertyBase propertyBase, Object value, Boolean isMaterialization, Boolean setModified, Boolean isCascadeDelete, CurrentValueType valueType)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.InternalEntityEntry.SetStoreGeneratedValue(IProperty property, Object value, Boolean setModified)
   at Microsoft.EntityFrameworkCore.Update.ColumnModification.set_Value(Object value)
   at Microsoft.EntityFrameworkCore.Update.ModificationCommand.PropagateResults(RelationalDataReader relationalReader)
   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.ConsumeResultSet(Int32 startCommandIndex, RelationalDataReader reader)
   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.Consume(RelationalDataReader reader)
   --- End of inner exception stack trace ---
   at Microsoft.EntityFrameworkCore.Update.AffectedCountModificationCommandBatch.Consume(RelationalDataReader reader)
   at Microsoft.EntityFrameworkCore.Update.ReaderModificationCommandBatch.Execute(IRelationalConnection connection)
   at Microsoft.EntityFrameworkCore.Update.Internal.BatchExecutor.Execute(IEnumerable`1 commandBatches, IRelationalConnection connection)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(IList`1 entriesToSave)
   at Microsoft.EntityFrameworkCore.ChangeTracking.Internal.StateManager.SaveChanges(StateManager stateManager, Boolean acceptAllChangesOnSuccess)
   at Microsoft.EntityFrameworkCore.DbContext.SaveChanges(Boolean acceptAllChangesOnSuccess)
   at collisionTest.Program.<>c__DisplayClass0_0.<Main>b__0() in /work/code/packager/dbtest/Program.cs:line 52
   at System.Threading.Tasks.Task`1.InnerInvoke()
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
--- End of stack trace from previous location ---
   at System.Threading.ExecutionContext.RunFromThreadPoolDispatchLoop(Thread threadPoolThread, ExecutionContext executionContext, ContextCallback callback, Object state)
   at System.Threading.Tasks.Task.ExecuteWithThreadLocal(Task& currentTaskSlot, Thread threadPoolThread)
--- End of stack trace from previous location ---
   at collisionTest.Program.Main(String[] _) in /code/dbtest/Program.cs:line 59
   at collisionTest.Program.<Main>(String[] _)

In this case the exception happens because of this chain of events:

  1. Insert LogEvent 1 into the database, DB returns ID 1. EF Core starts tracking ID 1.
  2. Insert LogEvent 2 into the database, DB returns ID 2. EF Core starts tracking ID 2.
  3. Bug occurs and LogEvent 2 doesn’t actually get committed to the DB. DB sequence for LogEvent table doesn’t get incremented to 2.
  4. Insert LogEvent 3 into the database, DB returns ID 2. EF Core tries to track ID 2. It’s already tracked in step 2 so we get the exception.

@AndriySvyryd
Copy link
Member

cc @bricelam

@bricelam
Copy link
Contributor

This looks like an issue with SQLite itself. I'd follow up with them.

We have found that adding
ReplaceService<IUpdateSqlGenerator, SqliteLegacyUpdateSqlGenerator>() to our options builder resolves this issue

This causes EF to use both an INSERT and SELECT statement...

INSERT INTO "Table0" ("Column0", "Column1")
VALUES (@p0, @p1);

SELECT "Id"
FROM "Table0"
WHERE changes() = 1 AND "rowid" = last_insert_rowid();

...instead of an INSERT statement with a RETURNING clause.

INSERT INTO "Table0" ("Column0", "Column1")
VALUES (@p0, @p1)
RETURNING "Id";

Which leads me to believe that the issue probably has something to do with a race condition in SQLite's handling of the RETURNING clause.

@bricelam
Copy link
Contributor

bricelam commented May 10, 2023

I wonder if the AUTOINCREMENT feature is also playing into this--we know that RETURNING doesn't play well with triggers, and autoincrement has many of the same characteristics as a trigger.

@ajcvickers
Copy link
Contributor

In addition to changing the generated SQL, SqliteLegacyUpdateSqlGenerator also reverts to explicitly using a transaction again. Based on this, I tried leaving the SQL the same, but using an explicit transaction:

using (var transaction = contextOne.Database.BeginTransaction())
{
    contextOne.SaveChanges();
    transaction.Commit();
}

The bug no longer repros with this code. I wonder if SQLite needs an explicit transaction for this to be safe, despite it being a "single statement"? @roji?

@roji
Copy link
Member

roji commented May 11, 2023

Ouch :/ Is this something we can also reproduce without EF, i.e. directly with Microsoft.Data.Sqlite (just to get an even more self-contained repro)? If so, I'd definitely try reporting this to the SQlite maintainers to see what they say... If it's really needed, we can see about always having the transaction there based on the provider - but this seems like it would be a pretty serious bug...

@EdwardNickson
Copy link
Author

I'l create a bug report. I can repro with Microsoft.Data.Sqlite & System.Data.SQLite.
The code below doesn't use AUTOINCREMENT, it's just inserting the same data over and over but we still get the bug, so it's the RETURNING which is causing this.

using System.Diagnostics;
using Microsoft.Data.Sqlite;

namespace collisionTest;

public class Program
{
    public static string CreateDB()
    {
        string filePath = Path.GetTempFileName();
        var connectionString = $"Data Source={filePath}";
        using (var connection = new SqliteConnection(connectionString))
        {
            connection.Open();
            using var command = connection.CreateCommand();
            command.CommandText = @"
                CREATE TABLE IF NOT EXISTS 'LogEvents' (
                    'Id' INTEGER NOT NULL,
                    'Message' TEXT NOT NULL
                );
                PRAGMA journal_mode=DELETE;
            ";
            command.ExecuteNonQuery();
        }
        return connectionString;
    }

    public static long CountRecords(string connectionString)
    {
        using var connection = new SqliteConnection(connectionString);
        connection.Open();
        using var command = connection.CreateCommand();
        command.CommandText = "Select count(*) from 'LogEvents';";
        return (long) command.ExecuteScalar()!;
    }

    static async Task Main(string[] _)
    {
        var connectionString = CreateDB();

        string notReturning = @$"
            INSERT INTO 'LogEvents' ('ID', 'Message')
            VALUES (1, 'test message');
        ";
        string returning = @$"
            INSERT INTO 'LogEvents' ('ID', 'Message')
            VALUES (1, 'test message')
            returning ID;
        ";
        foreach (var statement in new string[]{notReturning, returning})
        {
            var startingCount = CountRecords(connectionString);
            int inserts = 5000;
            var stopWatch = Stopwatch.StartNew();
            for (int i = 0; i < inserts; i++)
            {
                var taskOne = Task.Run(() =>
                {
                    using var connection = new SqliteConnection(connectionString);
                    connection.Open();
                    using var command = connection.CreateCommand();
                    command.CommandText = statement;
                    command.ExecuteScalar();
                });
                var taskTwo = Task.Run(() =>
                {
                    CountRecords(connectionString);
                });
                await taskOne;
                await taskTwo;
            }
            stopWatch.Stop();
            var finalCount = CountRecords(connectionString);
            Debug.WriteLine($"SQL Statement: {statement}");
            Debug.WriteLine($"{inserts} Inserts");
            Debug.WriteLine($"{finalCount - startingCount} Records Created");
            Debug.WriteLine($"Time Taken: {stopWatch.ElapsedMilliseconds}");
        }
    }
}

The output of this is:

SQL Statement: 
            INSERT INTO 'LogEvents' ('ID', 'Message')
            VALUES (1, 'test message');
5000 Inserts
5000 Records Created
Time Taken: 2064

SQL Statement: 
            INSERT INTO 'LogEvents' ('ID', 'Message')
            VALUES (1, 'test message')
            returning ID;
5000 Inserts
4901 Records Created
Time Taken: 714

Interestingly the statement with the RETURNING is significantly faster than the one without, which seems wrong given it has to do more.

@EdwardNickson
Copy link
Author

Bug report in sqlite forum - https://sqlite.org/forum/forumpost/44aaf065a3

@EdwardNickson
Copy link
Author

The SQLite team don't think this is an issue with the core SQLite Lib and are viewing this as a bug in System.Data.SQLite. See details in forum post linked above. Assuming that's true then this bug will also need to be fixed in Microsoft.Data.Sqlite.

@bricelam
Copy link
Contributor

Ah yes, I think we also assume that we can stop stepping after a statement returns results.

@EdwardNickson
Copy link
Author

That being said, someone in the forum pointed out that the documentation for RETURNING states that all database modifications are made in the first step so the proposed fix shouldn't be needed.

From https://www.sqlite.org/lang_returning.html#processing_order

When a DELETE, INSERT, or UPDATE statement with a RETURNING clause is run, all of the database changes occur during the first invocation of sqlite3_step(). The RETURNING clause output is accumulated in memory. The first sqlite3_step() call returns one row of RETURNING output and subsequent rows of RETURNING output are returned by subsequent calls to sqlite3_step(). To put this another way, all RETURNING clause output is embargoed until after all database modification actions are completed

@roji
Copy link
Member

roji commented May 16, 2023

@EdwardNickson thanks for opening the issue and managing the back-and-forths! Have you also reported this to System.Data.Sqlite by any chance?

@mistachkin
Copy link

mistachkin commented May 16, 2023

@EdwardNickson thanks for opening the issue and managing the back-and-forths! Have you also reported this to System.Data.Sqlite by any chance?

Yes.

To clarify further: System.Data.SQLite is officially supported by the SQLite Development Team, primarily by me.

The issue itself proved to be somewhat more complex than I initially thought, here is a brief summary:

  1. For any SQL query, regardless of content, the sqlite3_step() function must be called until SQLITE_DONE is returned.
  2. If this rule is not followed, there is a risk that an (implicit) write transaction (containing any pending changes) will be rolled back, making it appear just like the SQL query was never executed.
  3. The bug impacted both the ExecuteScalar and ExecuteNonQuery methods of System.Data.SQLite.
  4. In addition to the bug fix itself, extra diagnostics and backward compatibility overrides have been added to help de-risk the pending 1.0.118.0 release.

Please feel free to reach out directly to me (via email or phone) if further issues are encountered.

@freeze-out
Copy link

freeze-out commented May 16, 2023

I can confirm the bug. I recently had in my logs that data sets were saved without any error and when I searched for some entries in my database they were not there. I have a huge data loss because of that. I just realized that my databases were set to Delete journal mode. Probably that was default setting in EF few versions ago? Those databases are pretty old and I haven't set them to Delete mode by myself. My newer databases are set to WAL.

P.S: I have multiple processes writing and reading from same database.

@freeze-out
Copy link

Any idea how many percentage of data can get lost because of the bug? Is the 4901 data set saved out of 5000 a realistic scenario or could it be even more under any circumstances?

@roji
Copy link
Member

roji commented May 19, 2023

@freeze-out as with most concurrency bugs, I don't think anyone can answer that reliably; it also shouldn't really matter - if even one row can be silently lost, that's already pretty critical.

@freeze-out
Copy link

freeze-out commented May 29, 2023

@freeze-out as with most concurrency bugs, I don't think anyone can answer that reliably; it also shouldn't really matter - if even one row can be silently lost, that's already pretty critical.

Just wanted to give the management a number how much data might be lost.

@bricelam
Copy link
Contributor

Looks like this can be mitigated in varuous ways:

  1. PRAGMA journal_mode=wal
  2. Cache=Shared in the connection string
  3. Draining the reader with while (reader.Read())

@bricelam
Copy link
Contributor

It's a shame we can't retry these. I tried, but SQLite resets the whole statement and starts returning duplicate rows.

This whole thing feels backwards to me. Why start returning "inserted" values before they've actually been inserted??

bricelam added a commit to bricelam/efcore that referenced this issue May 31, 2023
@bricelam bricelam added the closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. label May 31, 2023
@ajcvickers ajcvickers added this to the 7.0.x milestone Jun 7, 2023
bricelam added a commit to bricelam/efcore that referenced this issue Jun 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-adonet-sqlite closed-fixed The issue has been fixed and is/will be included in the release indicated by the issue milestone. customer-reported Servicing-approved type-bug
Projects
None yet
Development

No branches or pull requests

7 participants