Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

db: fix FormatPrePebblev1Marked migration #2020

Merged
merged 1 commit into from
Oct 20, 2022

Conversation

jbowens
Copy link
Collaborator

@jbowens jbowens commented Oct 20, 2022

Fix the FormatPrePebblev1Marked migration to tolerate concurrent file deletions by disabling physical deletion of files removed from the LSM until the migration completes.

Fix #2019.
Informs cockroachdb/cockroach#89755.
Informs cockroachdb/cockroach#83079.

@jbowens jbowens requested review from nicktrav and a team October 20, 2022 15:39
@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm: Thank you!

Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens)


format_major_version_test.go line 475 at r1 (raw file):

//
// Regression test for #2019.
func TestPebblev1MigrationRace(t *testing.T) {

I took this for spin locally. One thing I noticed was that panics in this test (without the fix) are on the compaction goroutines calling into the cache (presumably the ones that have been told to re-write a table that was then deleted). Whereas the panic in the original report was looking in the cache before in order to schedule the compactions.

It's slightly different, but I'm assuming the same patch would knock out that issue too - i.e. preventing files from being deleted keeps them around in the cache.

Copy link
Collaborator Author

@jbowens jbowens left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @nicktrav)


format_major_version_test.go line 475 at r1 (raw file):

Previously, nicktrav (Nick Travers) wrote…

I took this for spin locally. One thing I noticed was that panics in this test (without the fix) are on the compaction goroutines calling into the cache (presumably the ones that have been told to re-write a table that was then deleted). Whereas the panic in the original report was looking in the cache before in order to schedule the compactions.

It's slightly different, but I'm assuming the same patch would knock out that issue too - i.e. preventing files from being deleted keeps them around in the cache.

Yeah, there are two ways in which the race can manifest.

  1. If the compaction completes and deletes the file before the migration opens it, the migration panics with the panic in the original report.
  2. If the migration opens the file first before the compaction attempts to delete it, when the compaction goes to delete the file and evict it from the table cache, it sees a non-zero ref count which induces a panic.

This test reliably produces the 2nd case, but I haven't been able to produce the 1st case within a unit test.

Copy link
Contributor

@nicktrav nicktrav left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! all files reviewed, all discussions resolved (waiting on @jbowens)


format_major_version_test.go line 475 at r1 (raw file):

This test reliably produces the 2nd case, but I haven't been able to produce the 1st case within a unit test.

Understood. Figured it was tricky to tickle that part of it. LGTM.

Fix the FormatPrePebblev1Marked migration to tolerate concurrent file
deletions by disabling physical deletion of files removed from the LSM
until the migration completes.

Fix cockroachdb#2019.
Informs cockroachdb/cockroach#89755.
Informs cockroachdb/cockroach#83079.
@jbowens
Copy link
Collaborator Author

jbowens commented Oct 20, 2022

tftr!

@jbowens jbowens merged commit 24f67a4 into cockroachdb:master Oct 20, 2022
@jbowens jbowens deleted the fmv-fix branch October 20, 2022 18:19
jbowens added a commit to jbowens/pebble that referenced this pull request Dec 12, 2022
During a format major version, some version migrations must scan the
LSM. These migrations require that the files they examine are not
deleted before or while they're reading them. A previous attempt (cockroachdb#2020)
at avoiding these unexpected deletions accidentally left an
opening for the acquired versions' files to be deleted while waiting for
file deletions to be disabled (sleeping on a condition variable).

A different fix with less subtlety is applied here. The format major
version upgrade path acquires a readState, which refs the versions it
acquires, preventing deletion of the referenced files. This requires the
format major version upgrade in Open to be moved later within the Open
routine, after the current read state is initialized.

Fix cockroachdb#2143.
jbowens added a commit to jbowens/pebble that referenced this pull request Jan 12, 2023
During a format major version, some version migrations must scan the
LSM. These migrations require that the files they examine are not
deleted before or while they're reading them. A previous attempt (cockroachdb#2020)
at avoiding these unexpected deletions accidentally left an
opening for the acquired versions' files to be deleted while waiting for
file deletions to be disabled (sleeping on a condition variable).

A different fix with less subtlety is applied here. The format major
version upgrade path acquires a readState, which refs the versions it
acquires, preventing deletion of the referenced files. This requires the
format major version upgrade in Open to be moved later within the Open
routine, after the current read state is initialized.

Fix cockroachdb#2143.
jbowens added a commit that referenced this pull request Jan 12, 2023
During a format major version, some version migrations must scan the
LSM. These migrations require that the files they examine are not
deleted before or while they're reading them. A previous attempt (#2020)
at avoiding these unexpected deletions accidentally left an
opening for the acquired versions' files to be deleted while waiting for
file deletions to be disabled (sleeping on a condition variable).

A different fix with less subtlety is applied here. The format major
version upgrade path acquires a readState, which refs the versions it
acquires, preventing deletion of the referenced files. This requires the
format major version upgrade in Open to be moved later within the Open
routine, after the current read state is initialized.

Fix #2143.
jbowens added a commit to jbowens/pebble that referenced this pull request Jan 12, 2023
During a format major version, some version migrations must scan the
LSM. These migrations require that the files they examine are not
deleted before or while they're reading them. A previous attempt (cockroachdb#2020)
at avoiding these unexpected deletions accidentally left an
opening for the acquired versions' files to be deleted while waiting for
file deletions to be disabled (sleeping on a condition variable).

A different fix with less subtlety is applied here. The format major
version upgrade path acquires a readState, which refs the versions it
acquires, preventing deletion of the referenced files. This requires the
format major version upgrade in Open to be moved later within the Open
routine, after the current read state is initialized.

Fix cockroachdb#2143.
jbowens added a commit that referenced this pull request Jan 12, 2023
During a format major version, some version migrations must scan the
LSM. These migrations require that the files they examine are not
deleted before or while they're reading them. A previous attempt (#2020)
at avoiding these unexpected deletions accidentally left an
opening for the acquired versions' files to be deleted while waiting for
file deletions to be disabled (sleeping on a condition variable).

A different fix with less subtlety is applied here. The format major
version upgrade path acquires a readState, which refs the versions it
acquires, preventing deletion of the referenced files. This requires the
format major version upgrade in Open to be moved later within the Open
routine, after the current read state is initialized.

Fix #2143.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

db: FormatPrePebblev1Marked panics if sstable does not exist
3 participants