Skip to content

Commit

Permalink
Sanitize min_write_buffer_number_to_merge to 1 with atomic_flush (#10773
Browse files Browse the repository at this point in the history
)

Summary:
With current implementation, within the same RocksDB instance, all column families with non-empty memtables will be scheduled for flush if RocksDB determines that any column family needs to be flushed, e.g. memtable full, write buffer manager, etc., if atomic flush is enabled. Not doing so can lead to data loss and inconsistency when WAL is disabled, which is a common setting when atomic flush is enabled. Therefore, setting a per-column-family knob, min_write_buffer_number_to_merge to a value greater than 1 is not compatible with atomic flush, and should be sanitized during column family creation and db open.

Pull Request resolved: #10773

Test Plan:
Reproduce: D39993203 has detailed steps.
Run the test with and without the fix.

Reviewed By: cbi42

Differential Revision: D40077955

Pulled By: cbi42

fbshipit-source-id: 451a9179eb531ac42eaccf40b451b9dec4085240
  • Loading branch information
riversand963 authored and facebook-github-bot committed Oct 5, 2022
1 parent eca47fb commit 4d82b94
Show file tree
Hide file tree
Showing 3 changed files with 22 additions and 1 deletion.
3 changes: 3 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -19,6 +19,9 @@
### New Features
* Add a new option IOOptions.do_not_recurse that can be used by underlying file systems to skip recursing through sub directories and list only files in GetChildren API.

### Behavior Changes
* Sanitize min_write_buffer_number_to_merge to 1 if atomic flush is enabled to prevent unexpected data loss when WAL is disabled in a multi-column-family setting (#10773).

## 7.7.0 (09/18/2022)
### Bug Fixes
* Fixed a hang when an operation such as `GetLiveFiles` or `CreateNewBackup` is asked to trigger and wait for memtable flush on a read-only DB. Such indirect requests for memtable flush are now ignored on a read-only DB.
Expand Down
15 changes: 15 additions & 0 deletions db/column_family.cc
Original file line number Diff line number Diff line change
Expand Up @@ -233,6 +233,21 @@ ColumnFamilyOptions SanitizeOptions(const ImmutableDBOptions& db_options,
result.min_write_buffer_number_to_merge = 1;
}

if (db_options.atomic_flush && result.min_write_buffer_number_to_merge > 1) {
ROCKS_LOG_WARN(
db_options.logger,
"Currently, if atomic_flush is true, then triggering flush for any "
"column family internally (non-manual flush) will trigger flushing "
"all column families even if the number of memtables is smaller "
"min_write_buffer_number_to_merge. Therefore, configuring "
"min_write_buffer_number_to_merge > 1 is not compatible and should "
"be satinized to 1. Not doing so will lead to data loss and "
"inconsistent state across multiple column families when WAL is "
"disabled, which is a common setting for atomic flush");

result.min_write_buffer_number_to_merge = 1;
}

if (result.num_levels < 1) {
result.num_levels = 1;
}
Expand Down
5 changes: 4 additions & 1 deletion include/rocksdb/advanced_options.h
Original file line number Diff line number Diff line change
Expand Up @@ -271,7 +271,10 @@ struct AdvancedColumnFamilyOptions {
// read amplification because a get request has to check in all of these
// files. Also, an in-memory merge may result in writing lesser
// data to storage if there are duplicate records in each of these
// individual write buffers. Default: 1
// individual write buffers.
// If atomic flush is enabled (options.atomic_flush == true), then this
// option will be sanitized to 1.
// Default: 1
int min_write_buffer_number_to_merge = 1;

// DEPRECATED
Expand Down

0 comments on commit 4d82b94

Please sign in to comment.