Update VersionSet last seqno after LogAndApply #10051
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
LGTM, thanks for showing a simple solution can work!
I thought for a while about what we can do to increase concurrency of live writes with file ingestion. Nothing to do now; recording my notes here for future reference.
- There is a way we can start working on the live writes earlier (cockroachdb/pebble does it) by decoupling the seqno allocated to writers from the seqno visible to readers, but still we would have to wait for the ingestion to finish before parallel live writes can be made visible to readers.
- We could try only making writes that overlap with an ingested file wait, or return TryAgain if a write overlapping with an ingested file happened during the flushing stage. Currently we block all writes from even starting while we flush memtables and write to manifest which feels excessive.
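The first idea above, separating the seqno handed to writers from the seqno readers may observe, can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToySeqnoState` and all of its members are hypothetical names invented here, and it does not reproduce how Pebble or RocksDB implement sequence numbers.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy model, not RocksDB or Pebble code.
struct ToySeqnoState {
  uint64_t allocated = 0;  // highest seqno handed out to any writer
  uint64_t visible = 0;    // highest seqno readers are allowed to observe
  bool ingestion_in_progress = false;

  // A live write reserves a seqno. This is allowed even during ingestion.
  uint64_t Allocate() { return ++allocated; }

  // The ingestion reserves its own seqno and blocks publication.
  uint64_t BeginIngestion() {
    ingestion_in_progress = true;
    return ++allocated;
  }

  // Writers call this when done. Nothing becomes visible while an
  // ingestion is still running, so readers never see a later write
  // without also seeing the ingested file.
  void TryPublish() {
    if (!ingestion_in_progress) visible = allocated;
  }

  // Once the ingestion finishes, everything allocated so far is published.
  void FinishIngestion() {
    assert(ingestion_in_progress);
    ingestion_in_progress = false;
    TryPublish();
  }
};
```

In this model writers make progress during the ingestion, but, as the note says, their writes stay invisible to readers until the ingestion completes.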
@cbi42 has updated the pull request. You must reimport the pull request before landing.
@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.
One more, just because it's an interesting problem to think about:
Summary: Thanks to #9919 and #10051 the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can try again to enable file ingestion in crash test.
Pull Request resolved: #9357
Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152`
Reviewed By: riversand963
Differential Revision: D33410746
Pulled By: ajkr
fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280

…ifest (#10066)
Summary: Fix the unittest `ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest` introduced in #10051 that is failing.
Pull Request resolved: #10066
Test Plan: CI
Reviewed By: ajkr
Differential Revision: D36720669
Pulled By: cbi42
fbshipit-source-id: 47a6d2c161f27b605ede5c62d1776eecaf0d5363
This PR fixes the issue of unstable snapshots during external SST file ingestion. Credit to @ajkr for the following walkthrough. Consider these relevant steps of IngestExternalFile():
(1) increase seqno while holding mutex -- rocksdb/db/db_impl/db_impl.cc, line 4768 in 677d2b4
(2) LogAndApply() -- rocksdb/db/db_impl/db_impl.cc, lines 4797 to 4798 in 677d2b4
(a) write to MANIFEST with mutex released -- rocksdb/db/version_set.cc, line 4407 in a96a4a2
(b) apply to in-memory state with mutex held
A snapshot taken during (2a) will be unstable. In particular, queries against that snapshot will not include data from the ingested file before (2b), and will include data from the ingested file after (2b).
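The race can be reproduced in a small single-threaded toy model. This is a sketch only: `ToyDB`, `ToyEntry`, `Observation` and `SnapshotDuringManifestWrite` are hypothetical names made up for illustration and are not RocksDB classes. The `publish_seqno_after_apply` flag models the ordering suggested by the PR title (update the last seqno after LogAndApply) rather than the actual patch.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>

// Hypothetical toy model, not RocksDB code.
struct ToyEntry {
  uint64_t seqno;
  std::string value;
};

struct ToyDB {
  uint64_t last_seqno = 0;                       // seqno a new snapshot pins
  std::multimap<std::string, ToyEntry> applied;  // in-memory state after (2b)

  uint64_t TakeSnapshot() const { return last_seqno; }

  // Newest value for `key` with seqno <= snapshot; "" if none is visible.
  std::string Get(const std::string& key, uint64_t snapshot) const {
    std::string result;
    uint64_t best = 0;
    auto range = applied.equal_range(key);
    for (auto it = range.first; it != range.second; ++it) {
      const ToyEntry& e = it->second;
      if (e.seqno <= snapshot && e.seqno >= best) {
        best = e.seqno;
        result = e.value;
      }
    }
    return result;
  }
};

struct Observation {
  std::string before_apply;  // read through the snapshot before step (2b)
  std::string after_apply;   // same snapshot, read again after step (2b)
};

Observation SnapshotDuringManifestWrite(bool publish_seqno_after_apply) {
  ToyDB db;
  db.applied.emplace("k", ToyEntry{1, "old"});
  db.last_seqno = 1;

  // Step (1): pick the seqno for the ingested file.
  const uint64_t ingest_seqno = db.last_seqno + 1;
  if (!publish_seqno_after_apply) db.last_seqno = ingest_seqno;  // old order

  // Step (2a): MANIFEST write with the mutex released; a snapshot sneaks in.
  const uint64_t snapshot = db.TakeSnapshot();
  const std::string before = db.Get("k", snapshot);

  // Step (2b): the ingested file becomes part of the in-memory state.
  db.applied.emplace("k", ToyEntry{ingest_seqno, "ingested"});
  if (publish_seqno_after_apply) db.last_seqno = ingest_seqno;  // new order

  assert(db.last_seqno == ingest_seqno);  // both orders end in the same state
  return Observation{before, db.Get("k", snapshot)};
}
```

With `publish_seqno_after_apply == false` the same snapshot first reads the old value and then the ingested one. With `true` the snapshot pins the old seqno and both reads agree.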
Test Plan:
Added a new unit test: `ExternalSSTFileBasicTest.WriteAfterReopenStableSnapshotWhileLoggingToManifest`.