Microsoft.Data.Sqlite: Cleanup when error occurs on dispose. #32605

ajcvickers · 2023-12-13T14:19:21Z

vonzshik · 2023-12-14T08:29:40Z

src/Microsoft.Data.Sqlite.Core/SqliteTransaction.cs

+            }
+            finally
+            {
+                Complete();


That's probably not a good thing to do. According to the documentation, COMMIT may fail with SQLITE_BUSY, which means that the transaction is not committed but can still be retried. Calling Complete anyway leaves the transaction stuck with no way to close it (other than executing COMMIT/ROLLBACK explicitly).

@vonzshik Thanks for looking at this. Do you think we should re-try for SQLITE_BUSY (as we do elsewhere) but call Complete immediately for other errors? Or do something different?

I think it should be retried at least a few times (or for a few seconds), and Complete should be called only if Commit completes successfully (otherwise it's going to be called while disposing the transaction, since the completed flag is not set). Now, the way to implement retry logic is... surprisingly complicated.
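To make the concern concrete, here is a minimal sketch of the two variants. It is not the real SqliteTransaction: `SketchTransaction`, `BusyException`, and the `tryCommit` delegate are hypothetical stand-ins, and a `false` return simulates COMMIT failing with SQLITE_BUSY.

```csharp
using System;

// Hypothetical exception standing in for a SQLITE_BUSY failure.
public sealed class BusyException : Exception
{
    public BusyException() : base("simulated SQLITE_BUSY") { }
}

// Hypothetical stand-in for a transaction; not the real SqliteTransaction.
public sealed class SketchTransaction
{
    private readonly Func<bool> _tryCommit; // returns false to simulate a busy database

    public bool Completed { get; private set; }

    public SketchTransaction(Func<bool> tryCommit) => _tryCommit = tryCommit;

    // Variant under review: the transaction is marked completed even when COMMIT fails.
    public void CommitWithFinally()
    {
        EnsureNotCompleted();
        try
        {
            ExecuteCommit();
        }
        finally
        {
            Completed = true;
        }
    }

    // Variant suggested here: the transaction is marked completed only after COMMIT succeeds.
    public void CommitOnSuccessOnly()
    {
        EnsureNotCompleted();
        ExecuteCommit();
        Completed = true;
    }

    private void ExecuteCommit()
    {
        if (!_tryCommit())
        {
            throw new BusyException();
        }
    }

    private void EnsureNotCompleted()
    {
        if (Completed)
        {
            throw new InvalidOperationException("transaction already completed");
        }
    }
}
```

With `CommitOnSuccessOnly`, a caller that catches `BusyException` can simply call it again. With `CommitWithFinally`, the second call throws `InvalidOperationException` even though nothing was committed.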

I'm no expert on SQLite, so I took the liberty of looking at how other providers handle the same problem. For example, neither sqlite-jdbc nor go-sqlite3 retries explicitly. Instead, they use SQLite's Busy Timeout, which does the same thing automatically (once it's set, SQLite installs a handler that retries on a busy error until the timeout is reached or the statement completes successfully).
There are 2 problems with using Busy Timeout:

MDS for some reason doesn't really support Busy Timeout out of the box (that is, there is no connection string parameter or other way to set it, short of users running a query themselves each time they get a connection from the pool).

MDS already has its own mechanism for retrying (in SqliteCommand and SqliteDataReader). Adding another retry mechanism on top of it might lead to some funny interactions, where CommandTimeout is not honored because Busy Timeout is higher than it. Removing the existing mechanism is also not that easy, since it additionally handles SQLITE_LOCKED and SQLITE_LOCKED_SHAREDCACHE.

efcore/src/Microsoft.Data.Sqlite.Core/SqliteDataReader.cs

Lines 210 to 211 in a7a1019

private static bool IsBusy(int rc)
    => rc is SQLITE_LOCKED or SQLITE_BUSY or SQLITE_LOCKED_SHAREDCACHE;
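As a rough illustration of what a driver-side retry built on such a check can look like, here is a sketch under stated assumptions. `RetryWhileBusy`, its parameters, and the `step` delegate are hypothetical and are not the actual Microsoft.Data.Sqlite code; the numeric values are the result codes SQLite defines.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class BusyRetrySketch
{
    // Result codes as defined by SQLite.
    public const int SQLITE_OK = 0;
    public const int SQLITE_BUSY = 5;
    public const int SQLITE_LOCKED = 6;
    public const int SQLITE_LOCKED_SHAREDCACHE = 262; // SQLITE_LOCKED | (1 << 8)

    public static bool IsBusy(int rc)
        => rc is SQLITE_LOCKED or SQLITE_BUSY or SQLITE_LOCKED_SHAREDCACHE;

    // Hypothetical helper: re-runs `step` while it reports a busy/locked code,
    // sleeping `delay` between attempts and giving up once `timeout` has elapsed.
    // Returns the last result code, which the caller still has to check.
    public static int RetryWhileBusy(Func<int> step, TimeSpan timeout, TimeSpan delay)
    {
        var timer = Stopwatch.StartNew();
        int rc;
        while (IsBusy(rc = step()) && timer.Elapsed < timeout)
        {
            Thread.Sleep(delay);
        }

        return rc;
    }
}
```

The timeout and delay are left as parameters on purpose: which values to use is exactly the open question raised later in this thread.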

I haven't looked through the MDS issues, but if no one is complaining about Commit throwing SQLITE_BUSY, I would recommend fixing it in a separate PR (probably while implementing Busy Timeout), since from the looks of things it has always been possible to hit it.

There is also #29514, which might have been fixed if Busy Timeout had been implemented, since by default it's 0.

cc @roji as it's mostly about driver design and I know how much he likes retrying in a driver 🐝

Thanks for all the insights @vonzshik :)

Yeah, retrying in a driver generally isn't great, but there seem to be SQLite-specific reasons why that may make sense... Databases (e.g. PG) don't generally tell you "busy, try again later", but SQLite does - so I guess it's justified here (but I know very little about it).

My 2c:

I definitely agree with you that changing transaction/connection state to completed when we don't know that the commit succeeded is a bad idea.

If the driver generally does a retry loop when it gets "busy" - in other places - then it makes sense for me to do that for commit as well; unless there's some specific reason why Commit is different?

The specific mechanism for retrying can be dealt with as an orthogonal question in another issue, i.e. we can basically do the same thing here in Commit that we do in other places, at least for now.

For any non-busy error, I don't think we can do anything but bubble it up to the user. The user may choose to attempt to re-commit, though I suspect there will be very little use in doing that (as non-busy errors are probably unlikely to be transient with SQLite?). Otherwise they can declare the transaction/connection as "doomed" (in an unknown/unrecoverable state), and open a different connection. From a (very) brief look at the code, Complete() doesn't seem to release any unmanaged resources etc., so not calling it shouldn't produce a leak?

But I really can't say I know much either about SQLite or about M.D.SQLite... How does the above sound?

I pretty much agree with all of the above, just to clarify:

I do think there should eventually be retrying, though I still think it's better to implement it in a separate PR.

The reason for that is, what value to use to terminate retrying? Is 5 seconds enough? Or maybe use CommandTimeout from connection string? How much to wait between retries?

Additionally, if nobody ever complained and it's not even blocking MDS users (they can call Commit themselves in a loop), then it's definitely not that urgent to implement and we can take time to do a bit of research.
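For reference, a caller-side loop of that kind could look roughly like the sketch below. `CommitWithRetry` and its parameters are hypothetical names for illustration, and the approach only works if a failed Commit leaves the transaction usable, which is the behavior argued for in this thread.

```csharp
using System;
using System.Threading;

public static class CommitRetrySketch
{
    // Hypothetical helper: runs `commit`, retrying when `isTransient` says the
    // failure is worth another attempt. The exception propagates once the
    // attempts are exhausted or the failure is not transient.
    // Returns the number of attempts it took.
    public static int CommitWithRetry(
        Action commit,
        Func<Exception, bool> isTransient,
        int maxAttempts,
        TimeSpan delay)
    {
        for (var attempt = 1; ; attempt++)
        {
            try
            {
                commit();
                return attempt;
            }
            catch (Exception ex) when (attempt < maxAttempts && isTransient(ex))
            {
                // Transient failure with attempts left: wait and try again.
                Thread.Sleep(delay);
            }
        }
    }
}
```

With Microsoft.Data.Sqlite, the call might be `CommitWithRetry(() => transaction.Commit(), ex => ex is SqliteException { SqliteErrorCode: 5 }, 5, TimeSpan.FromMilliseconds(150))`, where 5 is SQLITE_BUSY and the attempt count and delay are arbitrary example values.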

Using Busy Timeout instead of whatever MDS does now seems a bit better, in the sense that it aligns with other drivers and is more foolproof (no need to add an IsBusy check everywhere). But it mostly depends on determining why MDS does what it does and whether that can be safely removed.

Thanks for all the feedback...

I took another look at the code, and I think I was a bit confused here... Commit (and Rollback) do their thing by calling "ExecuteQuery", which internally already implements retrying - so if I understand the code correctly, retrying for busy is already implemented (via M.D.Sqlite's current mechanism) and there's nothing for us to do for that... Can you confirm that's your understanding as well?

I do agree it makes sense to rethink how retries are actually handled (e.g. use Busy Timeout instead of our own thing), but that seems completely orthogonal (we should open a separate issue).

If the above is right, then we should be able to just close this and #25119.

I took another look at the code, and I think I was a bit confused here... Commit (and Rollback) do their thing by calling "ExecuteQuery", which internally already implements retrying - so if I understand the code correctly, retrying for busy is already implemented (via M.D.Sqlite's current mechanism) and there's nothing for us to do for that... Can you confirm that's your understanding as well?

Indeed it does! Made a repro just in case; here's the stack trace we get (I can publish it on GitHub if you're interested):

// Ignore top exception, that's just to confirm how much time it takes to fail
Unhandled exception. System.Exception: Exception after 30141,8779ms
 ---> Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 5: 'cannot commit transaction - SQL statements in progress'.
   at Microsoft.Data.Sqlite.SqliteException.ThrowExceptionForRC(Int32 rc, sqlite3 db)
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()
   at Microsoft.Data.Sqlite.SqliteConnectionExtensions.ExecuteNonQuery(SqliteConnection connection, String commandText, SqliteParameter[] parameters)
   at Microsoft.Data.Sqlite.SqliteTransaction.Commit()

So yes, there is already a retry system working and it uses DefaultTimeout from connection string.

Great, thanks for checking it out!

Just to add to the above. There was an issue about problems with RETURNING (#30851), which was fixed by throwing an exception.

efcore/src/Microsoft.Data.Sqlite.Core/SqliteDataRecord.cs

Lines 441 to 445 in b5b1abf

var rc = sqlite3_reset(Handle);
if (!_alreadyThrown)
{
    SqliteException.ThrowExceptionForRC(rc, _connection.Handle);
}

Now, it doesn't account for the error being SQLITE_BUSY, in which case it just immediately exits instead of retrying.

Unhandled exception. System.Exception: Exception after 18,7732ms
 ---> Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 5: 'database is locked'.
   at Microsoft.Data.Sqlite.SqliteDataReader.Dispose(Boolean disposing)

Setting PRAGMA busy_timeout = 30000 beforehand does fix the issue.
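As a sketch of that workaround from the caller's side: the helper below opens a connection and sets the pragma before handing it out. `OpenWithBusyTimeout` is a hypothetical helper name; it assumes the Microsoft.Data.Sqlite package is referenced and uses only its regular connection and command APIs.

```csharp
using System.Globalization;
using Microsoft.Data.Sqlite;

public static class BusyTimeoutSketch
{
    // Hypothetical helper: opens a connection and asks SQLite itself to keep
    // retrying for up to `milliseconds` when the database is busy.
    // The caller owns (and must dispose) the returned connection.
    public static SqliteConnection OpenWithBusyTimeout(string connectionString, int milliseconds)
    {
        var connection = new SqliteConnection(connectionString);
        try
        {
            connection.Open();

            using var command = connection.CreateCommand();
            // PRAGMA values cannot be bound as parameters, so the integer is formatted inline.
            command.CommandText =
                "PRAGMA busy_timeout = " + milliseconds.ToString(CultureInfo.InvariantCulture) + ";";
            command.ExecuteNonQuery();

            return connection;
        }
        catch
        {
            // Do not leak the connection if opening or the pragma fails.
            connection.Dispose();
            throw;
        }
    }
}
```

The setting is per connection, so it has to be applied every time a connection is opened, which is the inconvenience noted earlier in the thread.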

@vonzshik thanks, can you open an issue for that please?

ajcvickers · 2024-01-08T18:55:46Z

So, as I understand it, there is nothing to do here?

vonzshik · 2024-01-08T19:11:01Z

So, as I understand it, there is nothing to do here?

Not exactly. Essentially, the way it should be is that both public Commit and Rollback do something like:

public override void Commit()
{
    //...
    sqlite3_rollback_hook(_connection.Handle, null, null);
    _connection.ExecuteNonQuery("COMMIT;");
    Complete();
}

No try-finally there, Complete is called only if there is no error. On the other hand, Dispose does rollback and always calls Complete.

Full listing of how I think everything should look (technically Dispose shouldn't throw an exception, but with ADO.NET it maybe makes sense):

public override void Commit()
{
	if (ExternalRollback
		|| _completed
		|| _connection!.State != ConnectionState.Open)
	{
		throw new InvalidOperationException(Resources.TransactionCompleted);
	}

	sqlite3_rollback_hook(_connection.Handle, null, null);
	_connection.ExecuteNonQuery("COMMIT;");
	Complete();
}

public override void Rollback()
{
	if (_completed || _connection!.State != ConnectionState.Open)
	{
		throw new InvalidOperationException(Resources.TransactionCompleted);
	}

	RollbackInternal();
	Complete();
}

protected override void Dispose(bool disposing)
{
	if (disposing
		&& !_completed
		&& _connection!.State == ConnectionState.Open)
	{
		try
		{
			RollbackInternal();
		}
		finally
		{
			Complete();
		}
	}
}

private void Complete()
{
	if (IsolationLevel == IsolationLevel.ReadUncommitted)
	{
		try
		{
			_connection!.ExecuteNonQuery("PRAGMA read_uncommitted = 0;");
		}
		catch
		{
			// Ignore failure attempting to clean up.
		}
	}

	_connection!.Transaction = null;
	_connection = null;
	_completed = true;
}
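The Dispose part of that listing can be exercised in isolation with a small stand-in. `DisposeSketch` and its `rollback` delegate are hypothetical and only mirror the try/finally shape above: the rollback failure still reaches the caller, but the object ends up completed either way.

```csharp
using System;

// Hypothetical stand-in mirroring the Dispose shape above; not the real SqliteTransaction.
public sealed class DisposeSketch : IDisposable
{
    private readonly Action _rollback;

    public bool Completed { get; private set; }

    public DisposeSketch(Action rollback) => _rollback = rollback;

    public void Dispose()
    {
        if (Completed)
        {
            return; // already committed, rolled back, or disposed
        }

        try
        {
            _rollback();
        }
        finally
        {
            // Cleanup runs whether or not the rollback threw.
            Completed = true;
        }
    }
}
```

Because `Completed` is set in the `finally` block, a second `Dispose` call is a no-op even when the first one threw.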

Fixes #25119

roji · 2024-01-09T15:28:34Z

technically Dispose shouldn't throw an exception, but with ADO.NET it maybe makes sense

Yeah, this is a long-standing discussion - I remember we decided to throw from NpgsqlDataReader.Dispose() in the end, and .NET's FileStream also does that... So I don't think it's an absolute rule etc.

ajcvickers requested a review from a team December 13, 2023 14:19

AndriySvyryd approved these changes Dec 14, 2023


vonzshik reviewed Dec 14, 2023


ajcvickers added the blocked label Dec 15, 2023

ajcvickers added 2 commits January 8, 2024 19:54

Microsoft.Data.Sqlite: Cleanup when error occurs on dispose.

b21f0f8

Fixes #25119

Make changes as suggested by vonzshik

d2f367d

ajcvickers force-pushed the 231213_DisposableNappy branch from c1b2fda to d2f367d Compare January 9, 2024 13:18

ajcvickers removed the blocked label Jan 9, 2024

ajcvickers merged commit 21a568d into main Jan 9, 2024
7 checks passed

ajcvickers deleted the 231213_DisposableNappy branch January 9, 2024 15:09


ajcvickers commented Dec 13, 2023
vonzshik Dec 14, 2023 (edited)
ajcvickers Jan 6, 2024
vonzshik Jan 6, 2024 (edited)
roji Jan 6, 2024 (edited)
vonzshik Jan 6, 2024
roji Jan 7, 2024
vonzshik Jan 7, 2024
roji Jan 7, 2024
vonzshik Jan 7, 2024 (edited)
roji Jan 8, 2024
ajcvickers commented Jan 8, 2024
vonzshik commented Jan 8, 2024
roji commented Jan 9, 2024
