-
Notifications
You must be signed in to change notification settings - Fork 266
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Unclean close leaves Litestream in weird state #510
Comments
Do you know if it's your |
It's definitely the cli that's removing the WAL but Litestream not closing cleanly after that is slightly problematic because it's hard to know from the caller if the database has been closed or not as the function returns an error and can't be called again. Definitely not something that one would expect in normal circumstances but not being able to retry or "safely" ignore the error from Close() is what I'm looking at here. |
I'm currently testing a race where I start replicating and call Sync() once and immediately after I run sqlite3 once that writes and it somehow succeeds to roll in the WAL but that shouldn't be possible if there's a read lock by Litestream, right? I've verified the read lock is in fact acquired successfully before the CLI goes in and does damage. It doesn't always happen so there seems to be a race somewhere that I can't prevent by calling Sync() manually once on startup. My current theory is the read lock doesn't prevent rolling in the WAL file for whatever reason and is basically no-op for that sometimes but not always. |
After a bit of testing, it seems that to be safe one should always use To do this on startup, one can put the meta-command in a sqlite init script: Create a sqlite init script
sqlite3 --init sqlite3.init database.sqlite In my case, I created a wrapper script called I'm pretty satisfied with this solution for now. Thanks for raising this issue and the response! |
If Litestream is used as a library and it can be closed and re-opened against the same database this leaking fd can break SQLite database file locking. During normal cli operation this race cannot happen as the whole process will exit after closing databases. Closing Litestream and opening again right after will leave the unclosed file handle out-of-scope and free to garbage collect. Go tries to be helpful and closes the file handle during GC to prevent it leaking permanently but that will cause a race with POSIX advisory locks. This order of operations will release the _real_ lock the embedded SQLite of Litestream is holding while leaving it clueless. That will allow the main application to take a new lock on the file and eventually both may be checkpointing the same WAL into the main database file causing permanent corruption. https://www.sqlite.org/howtocorrupt.html#_posix_advisory_locks_canceled_by_a_separate_thread_doing_close_ Fixes benbjohnson#510
This was actually worse than I thought and eventually lead to database corruption, opened #519 with a fix. |
I've been using Litestream as a library and tried a couple of things to abuse it since the usage pattern is a bit different than when using it as a separate process.
What I'm seeing is that the final sync at the beginning of Close() will fail due to this because I had sqlite3 cli just writing while closing (and rolling in the WAL):
cannot verify wal state: stat foo/bar.db-wal: no such file or directory
. This might be fine to ignore but the caller doesn't know it and calling Close() again will fail because the underlying database connection has already been closed and will always return an error at that point.This is very much an edge case of an edge case and it shouldn't ever really happen but the way database Close() is structured it's designed to ignore all errors and just continue but it might be better to return early and allow calling it more than once to retry if it's a passing issue or handle the errors more gracefully. Now because it's designed to ignore errors the caller needs to assume the database is somewhat closed and ignore the returned error and at most log it.
The text was updated successfully, but these errors were encountered: