Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update VersionSet last seqno after LogAndApply #10051

Closed
wants to merge 3 commits into from

Conversation

cbi42
Copy link
Member

@cbi42 cbi42 commented May 25, 2022

This PR fixes the issue of unstable snapshot during external SST file ingestion. Credit @ajkr for the following walk through: consider these relevant steps for of IngestExternalFile():

(1) increase seqno while holding mutex --

versions_->SetLastSequence(last_seqno + consumed_seqno_count);

(2) LogAndApply() --

rocksdb/db/db_impl/db_impl.cc

Lines 4797 to 4798 in 677d2b4

versions_->LogAndApply(cfds_to_commit, mutable_cf_options_list,
edit_lists, &mutex_, directories_.GetDbDir());

(a) write to MANIFEST with mutex released
mu->Unlock();

(b) apply to in-memory state with mutex held

A snapshot taken during (2a) will be unstable. In particular, queries against that snapshot will not include data from the ingested file before (2b), and will include data from the ingested file after (2b).

Test Plan:
Added a new unit test: ExternalSSTFileBasicTest.WriteAfterReopenStableSnapshotWhileLoggingToManifest.

make external_sst_file_basic_test
./external_sst_file_basic_test

@cbi42 cbi42 changed the title Set VersionSet last seqno after LogAndApply Update VersionSet last seqno after LogAndApply May 25, 2022
@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

Copy link
Contributor

@ajkr ajkr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for showing a simple solution can work!

I thought for a while about what we can do to increase concurrency of live writes with file ingestion. Nothing to do now; recording my notes here for future reference.

  • There is a way we can start working on the live writes earlier (cockroachdb/pebble does it) by decoupling the seqno allocated to writers from the seqno visible to readers, but still we would have to wait for the ingestion to finish before parallel live writes can be made visible to readers.
  • We could try only making writes that overlap with an ingested file wait, or return TryAgain if a write overlapping with an ingested file happened during the flushing stage. Currently we block all writes from even starting while we flush memtables and write to manifest which feels excessive.

@facebook-github-bot
Copy link
Contributor

@cbi42 has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot
Copy link
Contributor

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

@ajkr
Copy link
Contributor

ajkr commented May 25, 2022

I thought for a while about what we can do to increase concurrency of live writes with file ingestion. Nothing to do now; recording my notes here for future reference.

One more, just because it's an interesting problem to think about:

  • The ingestion path could simply insert a special range key [ingested file smallest key, ingested file largest key] -> ingested filename
    • When readers need data for such a range key in memtable, they would look into ingested filename
    • When background flush encounters such a range key, it can do the work of assigning the ingested file to a level and committing it to the LSM

facebook-github-bot pushed a commit that referenced this pull request May 26, 2022
Summary:
Thanks to #9919 and #10051 the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can  try again to enable file ingestion in crash test.

Pull Request resolved: #9357

Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152`

Reviewed By: riversand963

Differential Revision: D33410746

Pulled By: ajkr

fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280
facebook-github-bot pushed a commit that referenced this pull request May 31, 2022
…ifest (#10066)

Summary:
Fix the unittest `ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest` introduced in #10051 that is failing.

Pull Request resolved: #10066

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D36720669

Pulled By: cbi42

fbshipit-source-id: 47a6d2c161f27b605ede5c62d1776eecaf0d5363
@cbi42 cbi42 mentioned this pull request Jun 1, 2022
facebook-github-bot pushed a commit that referenced this pull request Jun 1, 2022
Summary:
Add to HISTORY.md the bug fixed in #10051

Pull Request resolved: #10091

Reviewed By: ajkr

Differential Revision: D36821861

Pulled By: cbi42

fbshipit-source-id: 598812fab88f65c0147ece53cff55cf4ea73aac6
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants