Update VersionSet last seqno after LogAndApply #10051

cbi42 · 2022-05-25T01:15:19Z

This PR fixes unstable snapshots during external SST file ingestion. Credit to @ajkr for the following walkthrough. Consider these relevant steps of IngestExternalFile():

(1) increase seqno while holding mutex --

rocksdb/db/db_impl/db_impl.cc

Line 4768 in 677d2b4

versions_->SetLastSequence(last_seqno + consumed_seqno_count);

(2) LogAndApply() --

rocksdb/db/db_impl/db_impl.cc

Lines 4797 to 4798 in 677d2b4

    
versions_->LogAndApply(cfds_to_commit, mutable_cf_options_list,
                       edit_lists, &mutex_, directories_.GetDbDir());

(a) write to MANIFEST with mutex released

rocksdb/db/version_set.cc

Line 4407 in a96a4a2

mu->Unlock();

(b) apply to in-memory state with mutex held

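A snapshot taken during (2a) will be unstable: step (1) has already advanced the last seqno, so the snapshot's seqno covers the ingested file's data, but that data is not yet part of the in-memory state. In particular, queries against that snapshot will not include data from the ingested file before (2b), and will include data from the ingested file after (2b). This PR avoids that by updating the VersionSet's last seqno only after LogAndApply() returns.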
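The race above can be reproduced in a toy model. The following is a minimal single-threaded sketch, not RocksDB code: `ToyDb`, `ReadPair`, and `IngestWithSnapshotDuringManifestWrite` are hypothetical names, and the "(2a) window" is simulated by taking the snapshot between the two steps rather than by actually releasing a mutex.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Toy stand-in for the DB state. All names here are hypothetical.
struct ToyDb {
  uint64_t last_sequence = 0;  // the seqno a new snapshot captures
  // key -> (seqno, value) for data present in the in-memory version
  std::map<std::string, std::pair<uint64_t, std::string>> version;

  uint64_t TakeSnapshot() const { return last_sequence; }

  // A read at `snapshot` sees a key only if the key is in the current
  // version and its seqno is <= snapshot.
  bool Get(const std::string& key, uint64_t snapshot,
           std::string* value) const {
    assert(value != nullptr);
    auto it = version.find(key);
    if (it == version.end() || it->second.first > snapshot) return false;
    *value = it->second.second;
    return true;
  }
};

// Whether the ingested key was visible to the snapshot before and after
// the in-memory apply step (2b).
struct ReadPair {
  bool before_apply;
  bool after_apply;
};

// Simulates an ingestion with a snapshot taken in the (2a) window.
// publish_seqno_after_apply == false models the old ordering (seqno bumped
// in step (1)); true models bumping it only after the apply step.
ReadPair IngestWithSnapshotDuringManifestWrite(bool publish_seqno_after_apply) {
  ToyDb db;
  const uint64_t file_seqno = db.last_sequence + 1;
  std::string value;

  if (!publish_seqno_after_apply) db.last_sequence = file_seqno;  // step (1)

  // Step (2a): the MANIFEST write would happen here; a snapshot sneaks in.
  const uint64_t snapshot = db.TakeSnapshot();
  const bool before = db.Get("k", snapshot, &value);

  // Step (2b): the ingested file becomes part of the in-memory state.
  db.version["k"] = std::make_pair(file_seqno, std::string("v"));
  if (publish_seqno_after_apply) db.last_sequence = file_seqno;

  const bool after = db.Get("k", snapshot, &value);
  return {before, after};
}
```

With the old ordering the two reads through the same snapshot disagree; with the seqno published after the apply step, both reads miss the ingested key, so the snapshot is stable.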

Test Plan:
Added a new unit test: ExternalSSTFileBasicTest.WriteAfterReopenStableSnapshotWhileLoggingToManifest.

make external_sst_file_basic_test
./external_sst_file_basic_test

facebook-github-bot · 2022-05-25T03:08:33Z

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr

LGTM, thanks for showing a simple solution can work!

I thought for a while about what we can do to increase concurrency of live writes with file ingestion. Nothing to do now; recording my notes here for future reference.

There is a way we can start working on the live writes earlier (cockroachdb/pebble does it) by decoupling the seqno allocated to writers from the seqno visible to readers, but still we would have to wait for the ingestion to finish before parallel live writes can be made visible to readers.
We could try making only the writes that overlap with an ingested file wait, or returning TryAgain if a write overlapping with an ingested file happened during the flushing stage. Currently we block all writes from even starting while we flush memtables and write to the manifest, which feels excessive.
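To make the first idea concrete, here is a minimal sketch of keeping the seqno allocated to writers separate from the seqno visible to readers. It is an illustration under assumptions, not how RocksDB or Pebble implement it: `SeqnoState` and its members are hypothetical, and it is single-threaded for clarity.

```cpp
#include <cassert>
#include <cstdint>
#include <set>

// Hypothetical bookkeeping: writers draw from `allocated`, readers
// take snapshots at `visible`.
struct SeqnoState {
  uint64_t allocated = 0;         // highest seqno handed to any writer
  uint64_t visible = 0;           // highest seqno readers may observe
  std::set<uint64_t> completed;   // finished operations not yet published

  // Hand out the next seqno to a writer or an ingestion.
  uint64_t Allocate() { return ++allocated; }

  // Mark an operation as finished. `visible` only advances across a
  // contiguous run of finished seqnos, so a later write cannot become
  // visible before an earlier, still-running ingestion.
  void Complete(uint64_t seq) {
    assert(seq > visible && seq <= allocated);
    completed.insert(seq);
    while (!completed.empty() && *completed.begin() == visible + 1) {
      visible = *completed.begin();
      completed.erase(completed.begin());
    }
  }
};
```

If an ingestion allocates seqno 1 and a live write allocates seqno 2, the write can start and finish immediately, but `visible` stays at 0 until the ingestion completes, at which point it jumps to 2. That is the limitation noted above: the write proceeds earlier, yet readers still wait on the ingestion.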

facebook-github-bot · 2022-05-25T04:30:01Z

@cbi42 has updated the pull request. You must reimport the pull request before landing.

facebook-github-bot · 2022-05-25T04:32:13Z

@cbi42 has imported this pull request. If you are a Meta employee, you can view this diff on Phabricator.

ajkr · 2022-05-25T17:34:15Z

> I thought for a while about what we can do to increase concurrency of live writes with file ingestion. Nothing to do now; recording my notes here for future reference.

One more, just because it's an interesting problem to think about:

The ingestion path could simply insert a special range key [ingested file smallest key, ingested file largest key] -> ingested filename
- When readers need data for such a range key in memtable, they would look into ingested filename
- When background flush encounters such a range key, it can do the work of assigning the ingested file to a level and committing it to the LSM
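A rough sketch of the range-key idea follows. Everything in it is hypothetical (`ToyLsm`, `RangeEntry`, the in-memory "files"), and it ignores seqnos, point writes in the memtable, and level selection; it only shows reads being redirected through a range entry until flush commits the file.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// An "SST file" is modeled as a sorted key -> value map.
using File = std::map<std::string, std::string>;

// The special range key: [smallest, largest] -> ingested filename.
struct RangeEntry {
  std::string smallest;
  std::string largest;
  std::string filename;
};

struct ToyLsm {
  std::map<std::string, File> files_on_disk;  // filename -> contents
  std::vector<RangeEntry> memtable_ranges;    // range keys in the memtable
  std::vector<std::string> level_files;       // files committed to the LSM

  // Ingestion only records the file and inserts a range key.
  void Ingest(const std::string& filename, const File& contents) {
    assert(!contents.empty());
    files_on_disk[filename] = contents;
    memtable_ranges.push_back(
        {contents.begin()->first, contents.rbegin()->first, filename});
  }

  bool LookupInFile(const std::string& filename, const std::string& key,
                    std::string* value) const {
    auto file = files_on_disk.find(filename);
    if (file == files_on_disk.end()) return false;
    auto it = file->second.find(key);
    if (it == file->second.end()) return false;
    *value = it->second;
    return true;
  }

  // Readers consult range keys in the memtable first (newest first), then
  // files already committed to the LSM.
  bool Get(const std::string& key, std::string* value) const {
    for (auto r = memtable_ranges.rbegin(); r != memtable_ranges.rend(); ++r) {
      if (key >= r->smallest && key <= r->largest &&
          LookupInFile(r->filename, key, value)) {
        return true;
      }
    }
    for (auto f = level_files.rbegin(); f != level_files.rend(); ++f) {
      if (LookupInFile(*f, key, value)) return true;
    }
    return false;
  }

  // Background flush does the deferred work of committing ingested files.
  void Flush() {
    for (const auto& r : memtable_ranges) level_files.push_back(r.filename);
    memtable_ranges.clear();
  }
};
```

In this model the ingestion itself never has to flush memtables or block writers; the cost moves to readers (an extra indirection) and to the flush job.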

Summary: Thanks to #9919 and #10051 the known bugs in file ingestion (besides mmap read + file checksum) are fixed. Now we can try again to enable file ingestion in crash test.

Pull Request resolved: #9357

Test Plan: stress file ingestion heavily for an hour: `$ TEST_TMPDIR=/dev/shm python3 tools/db_crashtest.py blackbox --max_key=1000000 --ingest_external_file_one_in=100 --duration=3600 --interval=20 --write_buffer_size=524288 --target_file_size_base=524288 --max_bytes_for_level_base=2097152`

Reviewed By: riversand963

Differential Revision: D33410746

Pulled By: ajkr

fbshipit-source-id: d276431390995a67f68390d61c06a40945fdd280

…ifest (#10066)

Summary: Fix the unittest `ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest` introduced in #10051 that is failing.

Pull Request resolved: #10066

Test Plan: CI

Reviewed By: ajkr

Differential Revision: D36720669

Pulled By: cbi42

fbshipit-source-id: 47a6d2c161f27b605ede5c62d1776eecaf0d5363

Summary: Add to HISTORY.md the bug fixed in #10051

Pull Request resolved: #10091

Reviewed By: ajkr

Differential Revision: D36821861

Pulled By: cbi42

fbshipit-source-id: 598812fab88f65c0147ece53cff55cf4ea73aac6

Set VersionSet last seqno after LogAndApply

0c250fd

facebook-github-bot added the CLA Signed label May 25, 2022

cbi42 changed the title ~~Set VersionSet last seqno after LogAndApply~~ Update VersionSet last seqno after LogAndApply May 25, 2022

Fix memory leak

c22025b

ajkr approved these changes May 25, 2022


typo

a94d6e0

facebook-github-bot closed this in b0e1906 May 25, 2022

ajkr mentioned this pull request May 25, 2022

Enable IngestExternalFile() in crash test #9357

Closed

cbi42 mentioned this pull request May 26, 2022

Fix unittest ExternalSSTFileBasicTest.StableSnapshotWhileLoggingToManifest #10066

Closed

cbi42 mentioned this pull request Jun 1, 2022

Add bug fix to HISTORY.md #10091

Closed
