Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Creating state-sync snapshot can fail with SQLITE_BUSY #8523

Closed
mhofman opened this issue Nov 10, 2023 · 2 comments · Fixed by #8619
Closed

Creating state-sync snapshot can fail with SQLITE_BUSY #8523

mhofman opened this issue Nov 10, 2023 · 2 comments · Fixed by #8619
Assignees
Labels
bug Something isn't working cosmic-swingset package: cosmic-swingset swing-store

Comments

@mhofman
Copy link
Member

mhofman commented Nov 10, 2023

Describe the bug

During blocks with a lot of activity, or while catching-up a node, taking a state-sync snapshot can often fail with an error code SQLITE_BUSY. Since the export is starting a read-only transaction, and the DB is in WAL mode, there is no reason it shouldn't be able to start the transaction.

To Reproduce

Enable state-sync snapshots on a node. Start from state-sync or another old cosmos snapshot.

Expected behavior

No errors starting a snapshot

Platform Environment

N/A

Additional context

SQLite likely needs to acquire a lock on the DB to indicate it's starting a read transaction, avoiding the flushing of the WAL file. It's possibly that lock acquiring that fails in some circumstances

Screenshots

2023-11-10T00:34:18.910338376Z State-sync export failed for block 12418000 { code: 'SQLITE_BUSY' }
2023-11-10T00:46:35.968256538Z State-sync export failed for block 12420000 { code: 'SQLITE_BUSY' }
2023-11-10T00:56:40.470466414Z State-sync export failed for block 12422000 { code: 'SQLITE_BUSY' }

@mhofman mhofman added bug Something isn't working cosmic-swingset package: cosmic-swingset swing-store labels Nov 10, 2023
@mhofman
Copy link
Member Author

mhofman commented Nov 10, 2023

warner added a commit that referenced this issue Dec 5, 2023
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
@mhofman mhofman linked a pull request Dec 5, 2023 that will close this issue
warner added a commit that referenced this issue Dec 5, 2023
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We this this should fix the SQLITE_BUSY errors.

refs #8523
warner added a commit that referenced this issue Dec 5, 2023
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523
@mhofman mhofman linked a pull request Dec 5, 2023 that will close this issue
@mergify mergify bot closed this as completed in #8619 Dec 6, 2023
mergify bot pushed a commit that referenced this issue Dec 6, 2023
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523
@mhofman
Copy link
Member Author

mhofman commented Dec 6, 2023

Since the export is starting a read-only transaction, and the DB is in WAL mode, there is no reason it shouldn't be able to start the transaction.

For posterity, the problem was the openDB to get the block height, which started an immediate tx, and executed some statements that tried to kick the tx into write mode.

@mhofman mhofman removed a link to a pull request Dec 6, 2023
mhofman pushed a commit that referenced this issue Dec 6, 2023
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs #8523
mhofman pushed a commit that referenced this issue Dec 6, 2023
Exporting a state-sync snapshot is a read-only operation, and is
designed to run "in the background", i.e. in parallel with normal
mutating operations. It accomplishes this by opening a read-only
transaction right away, effectively capturing a snapshot of the SQLite
database state, to insulate the export process from ongoing writes by
the execution host.

The cosmic-swingset exporter starts with a query of `host.height`, to
confirm that the database has not already advanced to a new block
before this snapshot/read-transaction can be taken.

Previously, this query worked by using `openSwingStore`, and then
calling `hostStorage.hostKVStore.get('host.height')`. This had two
problems:

* TOCTTOU: the `hostKVStore.get` used a different DB connection (and
  different txn) than the exporter, so it might return a different
  height, negating the accuracy of the consistency check

* read-write txn: `openSwingStore` creates a read-*write* txn, even
  when merely opening the DB (because it might need to create the
  initial tables). This txn is closed right away, before
  `openSwingStore()` returns, so it did not present a threat to
  ongoing operations. But if the exporter was created while the
  ongoing execution side already had its own read-write txn
  open (e.g. while `controller.run()` was running), then it would
  fail, and `makeSwingStoreExporter` would fail with `SQLITE_BUSY`

Instead, we take advantage of the new `swingStoreExporter.getHostKV()`
API, and use *it* to fetch `host.height`. Unlike the normal
swingstore, the swingstore-exporter refrains from creating read-write
transactions entirely. So the cosmic-swingset export code can safely
query the height without fear of getting the wrong value or failing
because of an ongoing write transaction.

We think this should fix the SQLITE_BUSY errors.

refs #8523
anilhelvaci pushed a commit to Jorge-Lopes/agoric-sdk that referenced this issue Feb 16, 2024
Add a new API to the exporter, `exporter.getHostKV(key)`, so that the
cosmic-swingset state-sync exporter process can query `host.height`
without accidentally creating a read-write transaction too.

refs Agoric#8523
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working cosmic-swingset package: cosmic-swingset swing-store
Projects
None yet
3 participants