Database is locked -- Why do we inform people to enable shared cache mode? #632

Closed

bvinc opened this issue Sep 14, 2018 · 7 comments
bvinc commented Sep 14, 2018

I think this project is suffering from a misunderstanding of SQLITE_LOCKED vs SQLITE_BUSY. I'd like to start a discussion about this, and about why anyone has ever been encouraged to enable shared cache mode to solve a SQLITE_LOCKED error.

The SQLITE_BUSY result code indicates that the database file could not be written (or in some cases read) because of concurrent activity by some other database connection, usually a database connection in a separate process.

The SQLITE_LOCKED result code indicates that a write operation could not continue because of a conflict within the same database connection or a conflict with a different database connection that uses a shared cache.

https://www.sqlite.org/cvstrac/wiki?p=DatabaseIsLocked

Without shared cache mode:

SQLITE_BUSY is expected to happen with any multithreaded application. This just means that two connections are conflicting with each other. You need to set up a busy timeout so that SQLite can automatically retry in this case. Problem solved.
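
With this driver the timeout can be set through the _busy_timeout DSN parameter, for example (a minimal sketch; the file name is made up):

// _busy_timeout is in milliseconds: SQLite keeps retrying a busy database
// for up to 5 seconds before giving up with SQLITE_BUSY.
db, err := sql.Open("sqlite3", "file:app.db?_busy_timeout=5000")
if err != nil {
    return err
}
defer db.Close()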

SQLITE_LOCKED only happens if a single SQLite connection is doing something that doesn't make sense, such as dropping a table while still being in the middle of a SELECT statement on it. If someone is getting this error, either they're doing something crazy, or a connection was returned to the pool before it was ready, with statements still executing that were never finalized.
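
A minimal sketch of that kind of misuse forced onto a single connection (the table name t is made up):

c, err := db.Conn(ctx)
if err != nil {
    return err
}
defer c.Close()

rows, err := c.QueryContext(ctx, "SELECT id FROM t")
if err != nil {
    return err
}
defer rows.Close()

for rows.Next() {
    // The SELECT on t is still in progress on this same connection, so
    // dropping t here fails with SQLITE_LOCKED ("database table is locked").
    if _, err := c.ExecContext(ctx, "DROP TABLE t"); err != nil {
        return err
    }
}
return rows.Err()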

Without shared cache mode, these are the only problems that you should expect.

But we've encouraged everyone to enable shared cache mode, so now the problem has changed. In shared cache mode, there is really only ever a single connection to the database at a time, and all "connections" are just sharing the same connection.

With shared cache mode:

SQLITE_BUSY pretty much doesn't happen. You can't conflict with other connections if there is only a single connection.

SQLITE_LOCKED will happen much more often. Now, normal, seemingly separate connections will get SQLITE_LOCKED if they happen to run conflicting statements at the same time. Worse, you can't set a busy timeout to fix it. If a lock error happens, the statement has simply failed. The only way around this is to support the unlock_notify API, which this driver currently does not support. So we dig our heels in even further and limit the connection pool to only a single connection.

So basically, it seems incorrect to me that you should ever suggest people turn on shared cache mode to fix SQLITE_LOCKED errors. It should only increase the frequency of SQLITE_LOCKED. Is there something that I'm misunderstanding? Is there some situation I'm not seeing in which these errors are unavoidable without shared cache mode?

rittneje (Collaborator) commented Dec 30, 2018

> So we dig our heels in even further and limit the connection pool to only a single connection.

Also of note is the fact that if you only have a single connection to a given database, using a shared cache is pointless and just adds unnecessary overhead.

No doubt adding to the confusion is the fact that when getting the error message, the SQLite library itself turns SQLITE_BUSY into "database is locked" and SQLITE_LOCKED into "database table is locked". It's very easy to see the former error message and mistake it for SQLITE_LOCKED.
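
In Go code the two are easier to tell apart by the error code than by the message (a minimal sketch, assuming the error came from this driver):

// assumes: import "github.com/mattn/go-sqlite3"
if sqliteErr, ok := err.(sqlite3.Error); ok {
    switch sqliteErr.Code {
    case sqlite3.ErrBusy:
        // SQLITE_BUSY, reported as "database is locked"
    case sqlite3.ErrLocked:
        // SQLITE_LOCKED, reported as "database table is locked"
    }
}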

As for why shared cache is recommended, one of the earliest bug reports about "database is locked" I can find is #39. The reason turning on the shared cache fixes the problem is:

  1. The Go standard library assumes you want isolation between the result fetching and the modification. Consequently, it tries to establish a second database connection in this case.
  2. Until you call sqlite3_reset on a prepared statement for which you are fetching results, that connection maintains a read lock on the database.
  3. When a connection tries to write to the database, it must first acquire the write lock. If there is at least one active reader, you get SQLITE_BUSY. (If you have a busy timeout, it will retry until the timeout before returning.)
  4. Since two connections that are using a shared cache work together as a single connection, SQLite will allow one connection to write in the middle of the other connection's read, as if they were both the same connection.

Taking these points all together, it seems like people were trying to do something that, while allowed by SQLite, is disallowed by the standard library. The standard library then opens a second connection behind the scenes, and the user gets a mysterious SQLITE_BUSY error.
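
A minimal sketch of that problematic pattern (table and column names are made up):

rows, err := db.Query("SELECT id FROM t")
if err != nil {
    return err
}
defer rows.Close()

for rows.Next() {
    var id int64
    if err := rows.Scan(&id); err != nil {
        return err
    }
    // db.Exec goes out over a second pooled connection while the first one
    // still holds a read lock, so with a private cache and a rollback
    // journal this eventually fails with SQLITE_BUSY ("database is locked").
    if _, err := db.Exec("UPDATE t SET seen = 1 WHERE id = ?", id); err != nil {
        return err
    }
}
return rows.Err()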

The driver does seem to support the unlock notify API now, provided you build with the sqlite_unlock_notify tag.
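(Building with that tag looks something like go build -tags sqlite_unlock_notify.)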

It seems to me that there are a few ways to use this library correctly:

  1. Force the connection pool to a single connection with a private cache. You can never try to modify the database while iterating through result rows, as doing so would result in a deadlock.
  2. Allow multiple connections in the pool with each having a private cache. This will require setting an appropriate busy timeout. You can never try to modify the database while iterating through result rows, as doing so would result in SQLITE_BUSY.
  3. Allow multiple connections in the pool with each having a shared cache. You can modify the database while iterating through result rows. However, you cannot modify the table(s) from which you are reading. (If using the unlock notify API, this is because the connection doing the modification will wait for the connection doing the reading to finish, which never happens, resulting in a deadlock. If the unlock notify API were not being used, this would instead result in SQLITE_LOCKED.)
  4. Allow multiple connections in the pool with each having a private cache, and use write-ahead logging. You can modify the database while iterating through result rows, including the table(s) from which you are reading. Note that this requires private caches; using a shared cache will result in the same issue mentioned in (3). If you will be reading and writing from within the same transaction, you MUST use BEGIN IMMEDIATE, as described in (5).
  5. Always use the same transaction for fetching result rows and modifying the database if you need to modify the database within a rows.Next() loop. In this case, it does not matter whether you use WAL or a rollback journal, nor does it matter whether you use a private or shared cache. However, you MUST use BEGIN IMMEDIATE (exposed as _txlock in the DSN) for any transactions that could modify the database. Consider making two pools: one with mode=ro&_txlock=deferred and another with mode=rw&_txlock=immediate (a sketch of this two-pool setup follows after the code examples below). The first pool can be arbitrarily large, but the second should be throttled to a single connection, as it is not possible for two "immediate" transactions to execute concurrently. (The second will wait for the first to complete.) For example:
tx, err := db.Begin()
if err != nil {
    return err
}
defer tx.Rollback()

rows, err := tx.Query(...)
if err != nil {
    return err
}
defer rows.Close()

for rows.Next() {
    if err := rows.Scan(...); err != nil {
        return err
    }
    if _, err := tx.Exec(...); err != nil {
        return err
    }
}

if err := rows.Err(); err != nil {
    return err
}

return tx.Commit()
  6. Same as (5), but use the new DB.Conn method instead of a transaction.
c, err := db.Conn(ctx)
if err != nil {
    return err
}
defer c.Close()

rows, err := c.QueryContext(ctx, ...)
if err != nil {
    return err
}
defer rows.Close()

for rows.Next() {
    if err := rows.Scan(...); err != nil {
        return err
    }
    if _, err := c.ExecContext(ctx, ...); err != nil {
        return err
    }
}

if err := rows.Err(); err != nil {
    return err
}

return c.Close()
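
For reference, a minimal sketch of the two-pool setup suggested in (5), with a made-up file name and busy timeout:

// Read-only pool: deferred transactions; can be arbitrarily large.
readDB, err := sql.Open("sqlite3", "file:app.db?mode=ro&_txlock=deferred&_busy_timeout=5000")
if err != nil {
    return err
}

// Read-write pool: immediate transactions, throttled to one connection,
// since two "immediate" transactions cannot run concurrently anyway.
writeDB, err := sql.Open("sqlite3", "file:app.db?mode=rw&_txlock=immediate&_busy_timeout=5000")
if err != nil {
    return err
}
writeDB.SetMaxOpenConns(1)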

gjrtimmer (Collaborator) commented

No activity / response; closed.

lstoll added a commit to lstoll/wherewasi that referenced this issue Jan 10, 2021
Turns out shared cache mode limits db connections, which ultimately triggers a lot of the locking issues:
mattn/go-sqlite3#632 (comment)

Stop using shared cache mode, in favour of a busy timeout. This lets us remove the mutexes, and
appears to solve the issue.
juagargi commented

I am using transactions to provide a consistent view and atomicity to an RPC handler. Each handler creates a transaction, which typically lives for a few milliseconds, and uses it to read and write several tables. This needs to happen in parallel, as handlers cannot block other clients in the meantime.
From @rittneje's great comment I understood that I could take his solution (4) to use the library correctly:

  4. Allow multiple connections in the pool with each having a private cache, and use write-ahead logging. You can modify the database while iterating through result rows, including the table(s) from which you are reading. Note that this requires private caches; using a shared cache will result in the same issue mentioned in (3).

But it doesn't seem to be working for me. This is the connection string: database_file.sqlite?cache=private&mode=rwc&_busy_timeout=10000&_journal_mode=WAL
I am always getting ErrBusy "database is locked" if two read transactions try to become write transactions in parallel AND read from any table. I guess the reason it works in that case is that the transaction itself is deferred until there is an operation on the DB.
The sample code (an adaptation of some sample code that I found in another issue) is here

Is there any solution available that allows several write transactions to coexist simultaneously? Or did I not implement correctly @rittneje 's solution?

rittneje (Collaborator) commented
@juagargi You need to use BEGIN IMMEDIATE (exposed as _txlock in the DSN) for two concurrent read-then-write transactions to work, since SQLite is trying to enforce transactional consistency. (If the database were modified by another transaction between your SELECT and INSERT then it can no longer guarantee that what you read is up-to-date.) Note that doing so means that your transactions will not truly happen in parallel, and you can still get SQLITE_BUSY if you hit your busy timeout while waiting for the database to be available. I will update my earlier comment to include this information.

Also, FYI you are missing the "file:" prefix on your call to sql.Open. It should be:

db, err := sql.Open("sqlite3", "file:database_file.sqlite?cache=private&mode=rwc&_busy_timeout=10000&_journal_mode=WAL")
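
With the _txlock parameter added as described above, that would become something like:

db, err := sql.Open("sqlite3", "file:database_file.sqlite?cache=private&mode=rwc&_busy_timeout=10000&_journal_mode=WAL&_txlock=immediate")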

juagargi commented

Many thanks @rittneje! Wouldn't IMMEDIATE be equivalent to just one connection? It looks to me as though the second thread won't ever be allowed to obtain a second read transaction while the first one exists, which breaks my scenario (I want transactions as a "snapshot" of the DB for the life of the handler). I tried with _txlock and the second transaction won't open if there is already one.
Is there any way in the go-sqlite3 library to mix read statements and write statements in the same transaction, so that I can read in parallel but write sequentially?

For now I have "overridden" the write functions of my DB handle backed by a Tx object, to check whether they return sqlite3.ErrBusy, and if so, sleep a bit, create a new transaction, and try again. It stores the old transactions so that it can roll back / commit all of them.
Still, this is not a super genius approach, as I could still get non-repeatable reads, so it is not a general solution. But in my particular case the handlers only attempt to write at the end of the RPC, so I'm safe for now ¯\_(ツ)_/¯
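
A much-simplified sketch of that retry idea (it just rolls back and retries in a fresh transaction rather than keeping the old ones around; assumes imports of database/sql, errors, time, and github.com/mattn/go-sqlite3):

func runWithBusyRetry(db *sql.DB, fn func(*sql.Tx) error) error {
    for attempt := 0; attempt < 5; attempt++ {
        tx, err := db.Begin()
        if err != nil {
            return err
        }
        err = fn(tx)
        if err == nil {
            return tx.Commit()
        }
        _ = tx.Rollback()
        var sqliteErr sqlite3.Error
        if errors.As(err, &sqliteErr) && sqliteErr.Code == sqlite3.ErrBusy {
            // Back off briefly, then retry everything in a brand-new transaction.
            time.Sleep(100 * time.Millisecond)
            continue
        }
        return err
    }
    return errors.New("database still busy after retries")
}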

rittneje (Collaborator) commented
Fundamentally, SQLite itself does not really support two concurrent write transactions. You can kind of do it with shared cache mode, but there are some caveats, and the SQLite maintainers discourage its use anyway.

If all your transactions are going to read and write, then yes, in essence BEGIN IMMEDIATE is equivalent to a single connection. (But you should do it anyway in case there are transactions from another application.) If however some of your transactions are read-only, then you can have a separate read-only pool (using BEGIN DEFERRED) that is arbitrarily large (within your system/resource limitations).

juagargi commented

Thank you again for all the insight, I appreciate it.

lstoll added a commit to lstoll/wherewasi that referenced this issue Oct 30, 2022
lstoll added a commit to lstoll/wherewasi that referenced this issue Nov 4, 2022
* Upgrade to 1.18, use newer docker build stuff

The new stuff does automatic tags and things like that

* Disable spatialite, WAL by default

spatialite has proven to be a continual pain in the ass for something we
don't actually use in the app, so drop it for now. Can continue to use it
in the CLI for ad-hoc if we want.

Enable WAL mode by default. This ruins the 'one file' thing, but also is
what should be used in a serving environment

* Use os/gh actions lint, normal sqlite build

* Drop cache mode, like we did in #30

mattn/go-sqlite3#632 (comment)

* Set a busy timeout

* Remove DB write mutex

we shouldn't actually need that, esp. with WAL mode + busy timeout etc.

* Upgrade go version / sqlite driver again

* Tests use real DB on disk too

rather than potentially masking over db config issues by using in-mem

* Drop transaction around venue sync

we ran that in one big tx, which took a couple of seconds and locked the DB. I
can't think of any real reason why it can't be an incremental update; the writes
are all upserts. So drop the TX

* upgrade actions to 1.19 too

* Fix lint errors