-
Notifications
You must be signed in to change notification settings - Fork 3.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Microsoft.Data.Sqlite: Cleanup when error occurs on dispose. #32605
Conversation
} | ||
finally | ||
{ | ||
Complete(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's probably not a good thing to do. According to documentation, COMMIT
may throw a SQLITE_BUSY
error, which means that while the transaction is not committed, it still can be retried. Calling Complete
anyway will make it so that transaction will be stuck with no way to close it (other than querying COMMIT/ROLLBACK
explicitly).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vonzshik Thanks for looking at this. Do you think we should re-try for SQLITE_BUSY (as we do elsewhere) but call Complete immediately for other errors? Or do something different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it should be retried at least a few times (or for a few seconds) and Complete
should be called only if Commit
has completes successfully (otherwise it's gonna be called while disposing transaction since the completed
flag is not set). Now, the way to implement retry logic is... surprisingly complicated.
I'm not expert on sqlite, so I took liberty to take a look at how other providers handle same problem. For example, both sqlite-jdbc and go-sqlite3 do not retry explicitly. Instead, they use sqlite's Busy Timeout, which already does the exact same thing automatically (after it's set sqlite adds a handler to retry on Busy
error until the timeout is reached or the statement completes successfully).
There are 2 problems with using Busy Timeout
:
- MDS for some reason doesn't really support
Busy Timeout
out of the box (that is, there is no connection string parameter or some other method to set it other than for users to write a query themselves each time they get a connection from the pool). - MDS already has it's own mechanism for retrying (SqliteCommand and SqliteDataReader). Adding another retry mechanism on top of it might lead to some funny interactions, where
CommandTimeout
is not honored becauseBusy Timeout
will be higher than it. Removing that thing is also not that easy since it additionally handlesSQLITE_LOCKED
andSQLITE_LOCKED_SHAREDCACHE
.
efcore/src/Microsoft.Data.Sqlite.Core/SqliteDataReader.cs
Lines 210 to 211 in a7a1019
private static bool IsBusy(int rc) | |
=> rc is SQLITE_LOCKED or SQLITE_BUSY or SQLITE_LOCKED_SHAREDCACHE; |
I haven't looked through the issues regarding MDS, but if there is no one complaining about Commit
throwing SQLITE_BUSY
error, I would have recommended to fix it in a separate pr (probably while implementing Busy Timeout
), since from the looks of things it has always been possible to get it.
There is also #29514 which might have been fixed if Busy Timeout
has been implemented, since by default it's 0.
cc @roji as it's mostly about driver design and I know how much he likes retrying in a driver 🐝
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the insights @vonzshik :)
Yeah, retrying in a driver generally isn't great, but there seem to be SQLite-specific reasons why that may make sense... Databases (e.g. PG) don't generally tell you "busy, try again later", but SQLite does - so I guess it's justified here (but I know very little about it).
My 2c:
- I definitely agree with you that changing transaction/connection state to completed when we don't know that the commit succeeded is a bad idea.
- If the driver generally does a retry loop when it gets "busy" - in other places - then it makes sense for me to do that for commit as well; unless there's some specific reason why Commit is different?
- The specific mechanism for retrying can be dealt with as an orthogonal question in another issue, i.e. we can basically do the same thing here in Commit that we do in other places, at least for now.
- For any non-busy error, I don't think we can do anything but bubble it up to the user. The user may chose to attempt to re-commit, though I suspect there will be very little use in doing that (as non-busy errors are probably unlikely to be transient with SQLite?). Otherwise they can declare the transaction/connection as "doomed" (in an unknown/unrecoverable state), and open a different connection. From a (very) brief look at the code, Complete() doesn't seem to release any unmanaged resources etc., so not calling it shouldn't produce a leak?
But I really can't say I know much either about SQLite or about M.D.SQLite... How does the above sound?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I pretty much agree with all of the above, just to clarify:
- I do think eventually there should be retrying, though I still think it's better to implement in separate pr.
- The reason for that is, what value to use to terminate retrying? Is 5 seconds enough? Or maybe use
CommandTimeout
from connection string? How much to wait between retries? - Additionally, if nobody ever complained and it's not even blocking MDS users (they can call
Commit
themselves in a loop), then it's definitely not that urgent to implement and we can take time to do a bit of research. - Using
Busy Timeout
instead of whatever MDS does now seems a bit better in a sense that it aligns with other drivers and a more foolproof (don't have to addIsBusy
check everywhere). But it mostly depends on determining the why MDS does whatever it does and whether it can be safely removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the feedback...
I took another look at the code, and I think I was a bit confused here... Commit (and Rollback) do their thing by calling "ExecuteQuery", which internally already implements retrying - so if I understand the code correctly, retrying for busy is already implemented (via M.D.Sqlite's current mechanism) and there's nothing for us to do for that... Can you confirm that's your understanding as well?
I do agree it makes sense to rethink how retries are actually handled (e.g. use Busy Timeout instead of our own thing), but that seems completely orthogonal (we should open a separate issue).
If the above is right, then we should be able to just close this and #25119.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took another look at the code, and I think I was a bit confused here... Commit (and Rollback) do their thing by calling "ExecuteQuery", which internally already implements retrying - so if I understand the code correctly, retrying for busy is already implemented (via M.D.Sqlite's current mechanism) and there's nothing for us to do for that... Can you confirm that's your understanding as well?
Indeed it does! Made a repro just in case, here's the stacktrace we get (I can publish it on github it you're interested):
// Ignore top exception, that's just to confirm how much time it takes to fail
Unhandled exception. System.Exception: Exception after 30141,8779ms
---> Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 5: 'cannot commit transaction - SQL statements in progress'.
at Microsoft.Data.Sqlite.SqliteException.ThrowExceptionForRC(Int32 rc, sqlite3 db)
at Microsoft.Data.Sqlite.SqliteDataReader.NextResult()
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader(CommandBehavior behavior)
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteReader()
at Microsoft.Data.Sqlite.SqliteCommand.ExecuteNonQuery()
at Microsoft.Data.Sqlite.SqliteConnectionExtensions.ExecuteNonQuery(SqliteConnection connection, String commandText, SqliteParameter[] parameters)
at Microsoft.Data.Sqlite.SqliteTransaction.Commit()
So yes, there is already a retry system working and it uses DefaultTimeout
from connection string.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great, thanks for checking it out!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just to add to the above. There was an issue about problems with RETURNING
(#30851), which was fixed by throwing an exception.
efcore/src/Microsoft.Data.Sqlite.Core/SqliteDataRecord.cs
Lines 441 to 445 in b5b1abf
var rc = sqlite3_reset(Handle); | |
if (!_alreadyThrown) | |
{ | |
SqliteException.ThrowExceptionForRC(rc, _connection.Handle); | |
} |
Now, it doesn't account for the error being SQLITE_BUSY
, in which case it just immediately exits instead of retrying.
Unhandled exception. System.Exception: Exception after 18,7732ms
---> Microsoft.Data.Sqlite.SqliteException (0x80004005): SQLite Error 5: 'database is locked'.
at Microsoft.Data.Sqlite.SqliteDataReader.Dispose(Boolean disposing)
Setting PRAGMA busy_timeout = 30000
beforehand does fix the issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vonzshik thanks, can you open an issue for that please?
So, as I understand it, there is nothing to do here? |
Not exactly. Essentially, the way it should be is that both public public override void Commit()
{
//...
sqlite3_rollback_hook(_connection.Handle, null, null);
_connection.ExecuteNonQuery("COMMIT;");
Complete();
} No try-finally there, Full listing how I think everything should look (technically public override void Commit()
{
if (ExternalRollback
|| _completed
|| _connection!.State != ConnectionState.Open)
{
throw new InvalidOperationException(Resources.TransactionCompleted);
}
sqlite3_rollback_hook(_connection.Handle, null, null);
_connection.ExecuteNonQuery("COMMIT;");
Complete();
}
public override void Rollback()
{
if (_completed || _connection!.State != ConnectionState.Open)
{
throw new InvalidOperationException(Resources.TransactionCompleted);
}
RollbackInternal();
Complete();
}
protected override void Dispose(bool disposing)
{
if (disposing
&& !_completed
&& _connection!.State == ConnectionState.Open)
{
try
{
RollbackInternal();
}
finally
{
Complete();
}
}
}
private void Complete()
{
if (IsolationLevel == IsolationLevel.ReadUncommitted)
{
try
{
_connection!.ExecuteNonQuery("PRAGMA read_uncommitted = 0;");
}
catch
{
// Ignore failure attempting to clean up.
}
}
_connection!.Transaction = null;
_connection = null;
_completed = true;
} |
c1b2fda
to
d2f367d
Compare
Yeah, this is a long-standing discussion - remember decided to throw from NpgsqlDataReader.DIspose() in the end, and .NET's FileStream also does that... So I don't think it's an absolute rule etc. |
Fixes #25119