Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Microsoft.Data.Sqlite: Cleanup when error occurs on dispose. #32605

Merged
merged 2 commits into from
Jan 9, 2024

Conversation

ajcvickers
Copy link
Contributor

Fixes #25119

@ajcvickers ajcvickers requested a review from a team December 13, 2023 14:19
}
finally
{
Complete();
Copy link
Contributor

@vonzshik vonzshik Dec 14, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's probably not a good thing to do. According to documentation, COMMIT may throw a SQLITE_BUSY error, which means that while the transaction is not committed, it still can be retried. Calling Complete anyway will make it so that transaction will be stuck with no way to close it (other than querying COMMIT/ROLLBACK explicitly).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vonzshik Thanks for looking at this. Do you think we should re-try for SQLITE_BUSY (as we do elsewhere) but call Complete immediately for other errors? Or do something different?

Copy link
Contributor

@vonzshik vonzshik Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it should be retried at least a few times (or for a few seconds) and Complete should be called only if Commit has completes successfully (otherwise it's gonna be called while disposing transaction since the completed flag is not set). Now, the way to implement retry logic is... surprisingly complicated.

I'm not expert on sqlite, so I took liberty to take a look at how other providers handle same problem. For example, both sqlite-jdbc and go-sqlite3 do not retry explicitly. Instead, they use sqlite's Busy Timeout, which already does the exact same thing automatically (after it's set sqlite adds a handler to retry on Busy error until the timeout is reached or the statement completes successfully).
There are 2 problems with using Busy Timeout:

  1. MDS for some reason doesn't really support Busy Timeout out of the box (that is, there is no connection string parameter or some other method to set it other than for users to write a query themselves each time they get a connection from the pool).
  2. MDS already has it's own mechanism for retrying (SqliteCommand and SqliteDataReader). Adding another retry mechanism on top of it might lead to some funny interactions, where CommandTimeout is not honored because Busy Timeout will be higher than it. Removing that thing is also not that easy since it additionally handles SQLITE_LOCKED and SQLITE_LOCKED_SHAREDCACHE.

private static bool IsBusy(int rc)
=> rc is SQLITE_LOCKED or SQLITE_BUSY or SQLITE_LOCKED_SHAREDCACHE;

I haven't looked through the issues regarding MDS, but if there is no one complaining about Commit throwing SQLITE_BUSY error, I would have recommended to fix it in a separate pr (probably while implementing Busy Timeout), since from the looks of things it has always been possible to get it.

There is also #29514 which might have been fixed if Busy Timeout has been implemented, since by default it's 0.

cc @roji as it's mostly about driver design and I know how much he likes retrying in a driver 🐝

Copy link
Member

@roji roji Jan 6, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the insights @vonzshik :)

Yeah, retrying in a driver generally isn't great, but there seem to be SQLite-specific reasons why that may make sense... Databases (e.g. PG) don't generally tell you "busy, try again later", but SQLite does - so I guess it's justified here (but I know very little about it).

My 2c:

  • I definitely agree with you that changing transaction/connection state to completed when we don't know that the commit succeeded is a bad idea.
  • If the driver generally does a retry loop when it gets "busy" - in other places - then it makes sense for me to do that for commit as well; unless there's some specific reason why Commit is different?
  • The specific mechanism for retrying can be dealt with as an orthogonal question in another issue, i.e. we can basically do the same thing here in Commit that we do in other places, at least for now.
  • For any non-busy error, I don't think we can do anything but bubble it up to the user. The user may chose to attempt to re-commit, though I suspect there will be very little use in doing that (as non-busy errors are probably unlikely to be transient with SQLite?). Otherwise they can declare the transaction/connection as "doomed" (in an unknown/unrecoverable state), and open a different connection. From a (very) brief look at the code, Complete() doesn't seem to release any unmanaged resources etc., so not calling it shouldn't produce a leak?

But I really can't say I know much either about SQLite or about M.D.SQLite... How does the above sound?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I pretty much agree with all of the above, just to clarify:

  • I do think eventually there should be retrying, though I still think it's better to implement in separate pr.
  • The reason for that is, what value to use to terminate retrying? Is 5 seconds enough? Or maybe use CommandTimeout from connection string? How much to wait between retries?
  • Additionally, if nobody ever complained and it's not even blocking MDS users (they can call Commit themselves in a loop), then it's definitely not that urgent to implement and we can take time to do a bit of research.
  • Using Busy Timeout instead of whatever MDS does now seems a bit better in a sense that it aligns with other drivers and a more foolproof (don't have to add IsBusy check everywhere). But it mostly depends on determining the why MDS does whatever it does and whether it can be safely removed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for all the feedback...

I took another look at the code, and I think I was a bit confused here... Commit (and Rollback) do their thing by calling "ExecuteQuery", which internally already implements retrying - so if I understand the code correctly, retrying for busy is already implemented (via M.D.Sqlite's current mechanism) and there's nothing for us to do for that... Can you confirm that's your understanding as well?

I do agree it makes sense to rethink how retries are actually handled (e.g. use Busy Timeout instead of our own thing), but that seems completely orthogonal (we should open a separate issue).

If the above is right, then we should be able to just close this and #25119.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I took another look at the code, and I think I was a bit confused here... Commit (and Rollback) do their thing by calling "ExecuteQuery", which internally already implements retrying - so if I understand the code correctly, retrying for busy is already implemented (via M.D.Sqlite's current mechanism) and there's nothing for us to do for that... Can you confirm that's your understanding as well?

Indeed it does! Made a repro just in case, here's the stacktrace we get (I can publish it on github it you're interested):

// Ignore top exception, that's just to confirm how much time it takes to fail
Unhandled exception. System.Exception: Exception after 30141,8779ms                                                                                                          
 ---> Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 5: 'cannot commit transaction - SQL statements in progress'.                                          
   at Microsoft.Data.Sqlite.SqliteException.ThrowExceptionForRC(Int32 rc, sqlite3 db)                                                                                        
   at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()                                                                                                                    
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)                                                                                            
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()                                                                                                                    
   at Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()                                                                                                                  
   at Microsoft.Data.Sqlite.SqliteConnectionExtensions.ExecuteNonQuery(SqliteConnection connection, String commandText, SqliteParameter[] parameters)                        
   at Microsoft.Data.Sqlite.SqliteTransaction.Commit()  

So yes, there is already a retry system working and it uses DefaultTimeout from connection string.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great, thanks for checking it out!

Copy link
Contributor

@vonzshik vonzshik Jan 7, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just to add to the above. There was an issue about problems with RETURNING (#30851), which was fixed by throwing an exception.

var rc = sqlite3_reset(Handle);
if (!_alreadyThrown)
{
SqliteException.ThrowExceptionForRC(rc, _connection.Handle);
}

Now, it doesn't account for the error being SQLITE_BUSY, in which case it just immediately exits instead of retrying.

Unhandled exception. System.Exception: Exception after 18,7732ms                                                                                                             
 ---> Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 5: 'database is locked'.                                                                              
   at Microsoft.Data.Sqlite.SqliteDataReader.Dispose(Boolean disposing)  

Setting PRAGMA busy_timeout = 30000 beforehand does fix the issue.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vonzshik thanks, can you open an issue for that please?

@ajcvickers
Copy link
Contributor Author

So, as I understand it, there is nothing to do here?

@vonzshik
Copy link
Contributor

vonzshik commented Jan 8, 2024

So, as I understand it, there is nothing to do here?

Not exactly. Essentially, the way it should be is that both public Commit and Rollback do something like:

public override void Commit()
{
    //...
    sqlite3_rollback_hook(_connection.Handle, null, null);
    _connection.ExecuteNonQuery("COMMIT;");
    Complete();
}

No try-finally there, Complete is called only if there is no error. On the other hand, Dispose does rollback and always calls Complete.

Full listing how I think everything should look (technically Dispose shouldn't throw an exception, but with ADO.NET it maybe makes sense):

public override void Commit()
{
	if (ExternalRollback
		|| _completed
		|| _connection!.State != ConnectionState.Open)
	{
		throw new InvalidOperationException(Resources.TransactionCompleted);
	}

	sqlite3_rollback_hook(_connection.Handle, null, null);
	_connection.ExecuteNonQuery("COMMIT;");
	Complete();
}

public override void Rollback()
{
	if (_completed || _connection!.State != ConnectionState.Open)
	{
		throw new InvalidOperationException(Resources.TransactionCompleted);
	}

	RollbackInternal();
	Complete();
}

protected override void Dispose(bool disposing)
{
	if (disposing
		&& !_completed
		&& _connection!.State == ConnectionState.Open)
	{
		try
		{
			RollbackInternal();
		}
		finally
		{
			Complete();
		}
	}
}

private void Complete()
{
	if (IsolationLevel == IsolationLevel.ReadUncommitted)
	{
		try
		{
			_connection!.ExecuteNonQuery("PRAGMA read_uncommitted = 0;");
		}
		catch
		{
			// Ignore failure attempting to clean up.
		}
	}

	_connection!.Transaction = null;
	_connection = null;
	_completed = true;
}

@ajcvickers ajcvickers force-pushed the 231213_DisposableNappy branch from c1b2fda to d2f367d Compare January 9, 2024 13:18
@ajcvickers ajcvickers removed the blocked label Jan 9, 2024
@ajcvickers ajcvickers merged commit 21a568d into main Jan 9, 2024
7 checks passed
@ajcvickers ajcvickers deleted the 231213_DisposableNappy branch January 9, 2024 15:09
@roji
Copy link
Member

roji commented Jan 9, 2024

technically Dispose shouldn't throw an exception, but with ADO.NET it maybe makes sense

Yeah, this is a long-standing discussion - remember decided to throw from NpgsqlDataReader.DIspose() in the end, and .NET's FileStream also does that... So I don't think it's an absolute rule etc.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
4 participants