-
Notifications
You must be signed in to change notification settings - Fork 451
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
db: fix FormatPrePebblev1Marked migration #2020
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 3 of 3 files at r1, all commit messages.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @jbowens)
format_major_version_test.go
line 475 at r1 (raw file):
// // Regression test for #2019. func TestPebblev1MigrationRace(t *testing.T) {
I took this for spin locally. One thing I noticed was that panics in this test (without the fix) are on the compaction goroutines calling into the cache (presumably the ones that have been told to re-write a table that was then deleted). Whereas the panic in the original report was looking in the cache before in order to schedule the compactions.
It's slightly different, but I'm assuming the same patch would knock out that issue too - i.e. preventing files from being deleted keeps them around in the cache.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: all files reviewed, 1 unresolved discussion (waiting on @nicktrav)
format_major_version_test.go
line 475 at r1 (raw file):
Previously, nicktrav (Nick Travers) wrote…
I took this for spin locally. One thing I noticed was that panics in this test (without the fix) are on the compaction goroutines calling into the cache (presumably the ones that have been told to re-write a table that was then deleted). Whereas the panic in the original report was looking in the cache before in order to schedule the compactions.
It's slightly different, but I'm assuming the same patch would knock out that issue too - i.e. preventing files from being deleted keeps them around in the cache.
Yeah, there are two ways in which the race can manifest.
- If the compaction completes and deletes the file before the migration opens it, the migration panics with the panic in the original report.
- If the migration opens the file first before the compaction attempts to delete it, when the compaction goes to delete the file and evict it from the table cache, it sees a non-zero ref count which induces a panic.
This test reliably produces the 2nd case, but I haven't been able to produce the 1st case within a unit test.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! all files reviewed, all discussions resolved (waiting on @jbowens)
format_major_version_test.go
line 475 at r1 (raw file):
This test reliably produces the 2nd case, but I haven't been able to produce the 1st case within a unit test.
Understood. Figured it was tricky to tickle that part of it. LGTM.
Fix the FormatPrePebblev1Marked migration to tolerate concurrent file deletions by disabling physical deletion of files removed from the LSM until the migration completes. Fix cockroachdb#2019. Informs cockroachdb/cockroach#89755. Informs cockroachdb/cockroach#83079.
tftr! |
During a format major version, some version migrations must scan the LSM. These migrations require that the files they examine are not deleted before or while they're reading them. A previous attempt (cockroachdb#2020) at avoiding these unexpected deletions accidentally left an opening for the acquired versions' files to be deleted while waiting for file deletions to be disabled (sleeping on a condition variable). A different fix with less subtlety is applied here. The format major version upgrade path acquires a readState, which refs the versions it acquires, preventing deletion of the referenced files. This requires the format major version upgrade in Open to be moved later within the Open routine, after the current read state is initialized. Fix cockroachdb#2143.
During a format major version, some version migrations must scan the LSM. These migrations require that the files they examine are not deleted before or while they're reading them. A previous attempt (cockroachdb#2020) at avoiding these unexpected deletions accidentally left an opening for the acquired versions' files to be deleted while waiting for file deletions to be disabled (sleeping on a condition variable). A different fix with less subtlety is applied here. The format major version upgrade path acquires a readState, which refs the versions it acquires, preventing deletion of the referenced files. This requires the format major version upgrade in Open to be moved later within the Open routine, after the current read state is initialized. Fix cockroachdb#2143.
During a format major version, some version migrations must scan the LSM. These migrations require that the files they examine are not deleted before or while they're reading them. A previous attempt (#2020) at avoiding these unexpected deletions accidentally left an opening for the acquired versions' files to be deleted while waiting for file deletions to be disabled (sleeping on a condition variable). A different fix with less subtlety is applied here. The format major version upgrade path acquires a readState, which refs the versions it acquires, preventing deletion of the referenced files. This requires the format major version upgrade in Open to be moved later within the Open routine, after the current read state is initialized. Fix #2143.
During a format major version, some version migrations must scan the LSM. These migrations require that the files they examine are not deleted before or while they're reading them. A previous attempt (cockroachdb#2020) at avoiding these unexpected deletions accidentally left an opening for the acquired versions' files to be deleted while waiting for file deletions to be disabled (sleeping on a condition variable). A different fix with less subtlety is applied here. The format major version upgrade path acquires a readState, which refs the versions it acquires, preventing deletion of the referenced files. This requires the format major version upgrade in Open to be moved later within the Open routine, after the current read state is initialized. Fix cockroachdb#2143.
During a format major version, some version migrations must scan the LSM. These migrations require that the files they examine are not deleted before or while they're reading them. A previous attempt (#2020) at avoiding these unexpected deletions accidentally left an opening for the acquired versions' files to be deleted while waiting for file deletions to be disabled (sleeping on a condition variable). A different fix with less subtlety is applied here. The format major version upgrade path acquires a readState, which refs the versions it acquires, preventing deletion of the referenced files. This requires the format major version upgrade in Open to be moved later within the Open routine, after the current read state is initialized. Fix #2143.
Fix the FormatPrePebblev1Marked migration to tolerate concurrent file deletions by disabling physical deletion of files removed from the LSM until the migration completes.
Fix #2019.
Informs cockroachdb/cockroach#89755.
Informs cockroachdb/cockroach#83079.