Skip to content
This repository was archived by the owner on Oct 18, 2023. It is now read-only.

bottomless: checkpoint before initializing bottomless #726

Merged
merged 2 commits into from
Oct 3, 2023

Conversation

psarna
Copy link
Contributor

@psarna psarna commented Oct 3, 2023

Due to a bug in wallog recovery, we need to checkpoint the database strictly before we initialize bottomless. A proper fix should be to use our virtual WAL methods for checkpointing, but there's an initialization cycle and resolving it will be a larger patch - a connection with WAL methods wants us to already have the replication logger created, and replication logger wants to perform a checkpoint on creation.
As a mid-term solution, we just perform the forbidden regular checkpoint before bottomless is ever launched. Combined with the fact that bottomless treats existing databases as the source of truth, it just creates
a new backup generation and continues working properly.

The following scenario was buggy before:

  1. We leave the db in the state where some WAL frames still exist in data-wal file
  2. We restart sqld
  3. bottomless is initialized, it reuses the existing db and WAL frames and uploads them to S3, to avoid creating a potentially costly snapshot
  4. ReplicationLogger::new() incorrectly calls sqlite3_wal_checkpoint which swipes data from under bottomless.
  5. Bottomless thinks it hasn't checkpointed and continues to write WAL frames. As a result, it writes garbage to S3, because the db was checkpointed outside of bottomless control

Due to a bug in wallog recovery, we need to checkpoint
the database *strictly before* we initialize bottomless.
A proper fix should be to use our virtual WAL methods
for checkpointing, but there's an initialization cycle
and resolving it will be a larger patch - a connection
with WAL methods wants us to already have the replication
logger created, and replication logger wants to perform
a checkpoint on creation.
As a mid-term solution, we just perform the forbidden
regular checkpoint before bottomless is ever launched.
Combined with the fact that bottomless treats existing
databases as the source of truth, it just creates
a new backup generation and continues working properly.

The following scenario was buggy before:
 1. We leave the db in the state where some WAL frames
   still exist in data-wal file
 2. We restart sqld
 3. bottomless is initialized, it reuses the existing db
    and WAL frames and uploads them to S3, to avoid
    creating a potentially costly snapshot
 4. ReplicationLogger::new() incorrectly calls
    sqlite3_wal_checkpoint which swipes data from under
    bottomless.
 5. Bottomless thinks it hasn't checkpointed and continues
    to write WAL frames. As a result, it writes garbage
    to S3, because the db was checkpointed outside
    of bottomless control
@psarna psarna requested review from penberg and MarinPostma October 3, 2023 14:29
@MarinPostma MarinPostma enabled auto-merge October 3, 2023 14:51
@MarinPostma MarinPostma added this pull request to the merge queue Oct 3, 2023
Merged via the queue into libsql:main with commit 40cb9be Oct 3, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants