Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Slowdown writes based on WBM's dirty memory usage #164

Closed
wants to merge 229 commits into from

Conversation

udi-speedb
Copy link
Contributor

No description provided.

hx235 and others added 30 commits April 19, 2022 09:32
… (#9854)

Summary:
Context:
As mentioned in facebook/rocksdb#9701, we have the following in LITE=1 make static_lib for v7.0.2
```
  CC       file/sequence_file_reader.o
  CC       file/sst_file_manager_impl.o
  CC       file/writable_file_writer.o
In file included from file/writable_file_writer.cc:10:
./file/writable_file_writer.h:163:15: error: private field 'temperature_' is not used [-Werror,-Wunused-private-field]
  Temperature temperature_;
              ^
1 error generated.
make: *** [file/writable_file_writer.o] Error 1
```

 as titled

Pull Request resolved: facebook/rocksdb#9854

Test Plan:
- Local `LITE=1 make static_lib` reveals the same error and error is gone after this fix
- CI

Reviewed By: ajkr, jay-zhuang

Differential Revision: D35706585

Pulled By: hx235

fbshipit-source-id: 7743310298231ad6866304ffa2225c8abdc91d9a
Summary:
We don't really have a mechanism for internal-only release
notes, so adding this to the standard release notes. For picking into
7.2 release.

Pull Request resolved: facebook/rocksdb#9872

Test Plan: release note only

Reviewed By: jay-zhuang

Differential Revision: D35761307

Pulled By: pdillinger

fbshipit-source-id: 5d1932767fff48456323df948604dbb956ac27b2
Summary:
Add release note for CompressedSecondaryCache and the update of SecondaryCache::Lookup().

Pull Request resolved: facebook/rocksdb#9874

Reviewed By: jay-zhuang

Differential Revision: D35765973

Pulled By: gitbw95

fbshipit-source-id: 98232508c4f2047216def9c11a038cfb98709690
…Test.CapMemoryUsageUnderCacheCapacity (#9869)

Summary:
**Context:**
Unit test for facebook/rocksdb#9748 keeps opening new files to see whether the new feature is able to correctly constrain the opening based on block cache capacity.

However, the unit test has two places written inefficiently that can lead to opening too many new files relative to underlying operating system/file system constraint, even before hitting the block cache capacity:
(1) [opened_table_reader_num < 2 * max_table_reader_num](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR438), which can leads to 1200 + open files because of (2) below
(2) NewLRUCache(6 * CacheReservationManagerImpl<CacheEntryRole::kBlockBasedTableReader>::GetDummyEntrySize()) in [here](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR364)

Therefore we see CI failures like this on machine with a strict open file limit ~1000 (see the "table_1021" naming in following error msg)
https://app.circleci.com/pipelines/github/facebook/rocksdb/12886/workflows/75524682-3fa4-41ee-9a61-81827b51d99b/jobs/345270
```
fs_->NewWritableFile(path, foptions, &file, nullptr)
IO error: While open a file for appending: /dev/shm/rocksdb.Jedwt/run-block_based_table_reader_test-CapMemoryUsageUnderCacheCapacity-BlockBasedTableReaderCapMemoryTest.CapMemoryUsageUnderCacheCapacity-0/block_based_table_reader_test_1668910_829492452552920927/**table_1021**: Too many open files
```

**Summary:**
- Revised the test more efficiently on the above 2 places,  including using 1.1 instead 2 in the threshold and lowering down the block cache capacity a bit
- Renamed some variables for clarity

Pull Request resolved: facebook/rocksdb#9869

Test Plan:
- Manual inspection of max opened table reader in all test case, which is around ~389
- Circle CI to see if error is gone

Reviewed By: ajkr

Differential Revision: D35752655

Pulled By: hx235

fbshipit-source-id: 8a0953d39d561babfa4257b8ed8550bb21b04839
Summary:
Add experimental new API AbortIO in FileSystem to abort the
read requests submitted asynchronously through ReadAsync API.

Pull Request resolved: facebook/rocksdb#9901

Test Plan: Existing tests

Reviewed By: anand1976

Differential Revision: D35885591

Pulled By: akankshamahajan15

fbshipit-source-id: df3944e6e9e6e487af1fa688376b4abb6837fb02
Summary:
Add stats PREFETCHED_BYTES_DISCARDED and POLL_WAIT_MICROS.
PREFETCHED_BYTES_DISCARDED records number of prefetched bytes discarded by
FilePrefetchBuffer. POLL_WAIT_MICROS records the time taken by underling
file_system Poll API.

Pull Request resolved: facebook/rocksdb#9845

Test Plan: Update existing tests

Reviewed By: anand1976

Differential Revision: D35909694

Pulled By: akankshamahajan15

fbshipit-source-id: e009ef940bb9ed72c9446f5529095caabb8a1e36
Summary:
Left HISTORY.md and unit tests.
Added a new unit test to repro the corruption scenario that this PR fixes, and HISTORY.md line for that.

Pull Request resolved: facebook/rocksdb#9906

Reviewed By: riversand963

Differential Revision: D35940093

Pulled By: ajkr

fbshipit-source-id: 9816f99e1ce405ba36f316beb4f6378c37c8c86b
Summary:
In FilePrefetchBuffer, in case data is overlapping between two
buffers and more data is required to read and copy that to third buffer,
incorrect length was updated resulting in
```
Iterator diverged from control iterator which has value 00000000000310C3000000000000012B0000000000000274 total_order_seek: 1 auto_prefix_mode: 0 S 000000000002C37F000000000000012B000000000000001C NNNPPPPPNN; total_order_seek: 1 auto_prefix_mode: 0 S 000000000002F10B00000000000000BF78787878787878 NNNPNNNNPN; total_order_seek: 1 auto_prefix_mode: 0 S 00000000000310C3000000000000012B000000000000026B
iterator is not valid
Control CF default
db_stress: db_stress_tool/db_stress_test_base.cc:1388: void rocksdb::StressTest::VerifyIterator(rocksdb::ThreadState*, rocksdb::ColumnFamilyHandle*, const rocksdb::ReadOptions&, rocksdb::Iterator*, rocksdb::Iterator*, rocksdb::StressTest::LastIterateOp, const rocksdb::Slice&, const string&, bool*): Assertion `false' failed.
Aborted (core dumped)
```

Pull Request resolved: facebook/rocksdb#9916

Test Plan:
```
- CircleCI jobs
- Ran db_stress with OPTIONS file which caught the bug
 ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=42.26248932628998 --bottommost_compression_type=disable --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compression_max_dict_buffer_bytes=1073741823 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/ --db_write_buffer_size=134217728 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_blob_files=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --fail_if_options_file_error=0 --file_checksum_impl=none --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=12 --index_type=2 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=0 --mark_for_compaction_one_file_in=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_prefix_bloom_size_ratio=0.001 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --read_only=0 --readpercent=50 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_catch_up_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=3 --unpartitioned_pinning=3 --use_blob_db=0 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=0 --use_txn=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --writepercent=35 --options_file=/home/akankshamahajan/OPTIONS.orig -column_families=1

db_bench with async_io enabled to make sure db_bench completes successfully without any failure.
- ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1
```

crash_test in progress

Reviewed By: anand1976

Differential Revision: D35985789

Pulled By: akankshamahajan15

fbshipit-source-id: 5abe185f34caa99ca587d4bdc8954bd0802b1bf9
RocksDB currently fails to build with clang 12 due to newly added
diagnostics. This is problematic for fuzz testing in CI and for developers
building locally with clang, and thankfully these are easily fixable.
Currently the CMake config only looks for ccache, so developers using
sccache need to manually set the launch commands. Add a check for sccache
to spare them the trouble.

While at it, clean up the CMake config and remove ccache as the link launch
wrapper since it will just forward to the underlying linker anyway, and
also regenerate the buck TARGETS file (we might want to drop support for
it at some point, but for now let's keep it consistent).
This commit originally imported fuzzing tools from rocksdb.
However, after the rebase on 6.21 (from 6.11.4) only the compilation
fixes are left.

Original commit message:

Building them is a bit of a pain, but the OSS-Fuzz build scripts are
a good reference on how to do that:
https://github.com/google/oss-fuzz/tree/master/projects/rocksdb

This commit also fixes some clang compilation errors.
…Data()

When there's no data in the buffer, there's nothing to drop anyway, and
providing 0 to the rand->Uniform() function results in a division by zero
error (SIGFPE), so just check and do nothing in that case.
When fixing the clang compilation errors (offsetof() being applied
to non standard layout types), the pointer flags were mistakenly set
to `kNone` from `kAllowNone`. This didn't cause an issue with the tests
on 6.22.1, but it does after the rebase on 7.2.2. Add the missing flags
back so that the test passes.
isaac-io and others added 6 commits October 25, 2022 17:37
This includes both the CMake and Makefile configurations, various scripts
that are invoked during the build or during test runs, as well as the
Java native library loader code.
This includes simple renaming from `rocksdb` to `speedb` as well as replacing
URLs that are pointing to the RocksDB repository with the URL of the Speedb
repository.
This includes references in statuses as well as tools output.
This is done across all tools except trace_replay, which specifically
looks for the RocksDB version for database compatibility checks.
zlib 1.12.2 was pulled due to https://nvd.nist.gov/vuln/detail/CVE-2022-37434
and is no longer available. This breaks the build, and prevents version
release. Update to 1.12.3 to allow the build to work.
@isaac-io isaac-io force-pushed the main branch 2 times, most recently from 35fba0e to be2764e Compare October 26, 2022 12:18
The release pipeline is currently editing the RocksDB version and in a
broken manner. Fix it and use a proper trigger for the artefacts release.
GitHub Runner Bot and others added 4 commits October 26, 2022 13:40
In the case where we create the test path, we should clean up after the
run finishes as done in the Makefile, due to garbage that is left behind
by tests and accumulates.

While at it, randomise test scheduling in order to increase the likelihood
of surfacing latent bugs that are hidden by the execution order.
@udi-speedb udi-speedb force-pushed the 114-writes-slowdown-based-on-dirty-memory-usage branch from 15a5f50 to c4b0bee Compare November 3, 2022 05:59
@udi-speedb
Copy link
Contributor Author

udi-speedb commented Nov 3, 2022

@isaac-io Please review the updated commit. Thanks
But I found a minor issue so please wait until I update (force push) it. Thanks

@udi-speedb udi-speedb force-pushed the 114-writes-slowdown-based-on-dirty-memory-usage branch from c4b0bee to d3415be Compare November 3, 2022 09:02
@udi-speedb
Copy link
Contributor Author

@isaac-io - I have pushed the minor fix (forced). Please review. Thanks

Copy link
Contributor

@isaac-io isaac-io left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some minor comments, but overall LGTM.

@@ -739,19 +739,26 @@ const double kDelayRecoverSlowdownRatio = 1.4;

namespace {
// If penalize_stop is true, we further reduce slowdown rate.
// being one of them. The columen families have their own delay / stop logic
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: this comment seems to be missing a line that somehow got pushed down to line 746 below. Also, columen -> column.

@@ -21,11 +25,18 @@ class WriteControllerToken;
// All of the methods here (including WriteControllerToken's destructors) need
// to be called while holding DB mutex
class WriteController {
public:
enum class DelaySource { kCF = 0, kWBM = 1, kNumSources };
static constexpr unsigned int DelaySourceValue(DelaySource source) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please use size_t for the return value, as it's used for indexing into an array.

public:
enum class DelaySource { kCF = 0, kWBM = 1, kNumSources };
static constexpr unsigned int DelaySourceValue(DelaySource source) {
return static_cast<int>(source);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note: casting to int, while return value is unsigned int.

// When an actor (column family) requests a moderate token, compaction
// threads will be increased
std::unique_ptr<WriteControllerToken> GetCompactionPressureToken();

// these three metods are querying the state of the WriteController
bool IsStopped() const;
bool NeedsDelay() const { return total_delayed_.load() > 0; }
bool NeedsDelay() const { return (TotalDelayed() > 0); }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: the enclosing round brackets are redundant here and hurt readability.

Comment on lines 829 to 834
ROCKS_LOG_HEADER(log, " Options.write_buffer_manager: %p",
write_buffer_manager.get());
if (write_buffer_manager.get()) {
ROCKS_LOG_HEADER(log, " write_buffer_manager_options:");
ROCKS_LOG_HEADER(log, "%s",
write_buffer_manager->GetPrintableOptions().c_str());
}
ROCKS_LOG_HEADER(
log, " Options.write_buffer_manager: %p%s%s",
write_buffer_manager.get(), (write_buffer_manager.get() ? "\n" : ""),
(write_buffer_manager.get()
? write_buffer_manager->GetPrintableOptions().c_str()
: ""));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This change doesn't change the functionality, nor make the code more readable. Why is it needed?

@udi-speedb
Copy link
Contributor Author

@isaac-io Thanks for the review.
Have you finished it?
If not, please let me know once you are done so I can fix in a single iteration.
Thanks

@isaac-io
Copy link
Contributor

isaac-io commented Nov 3, 2022

@isaac-io Thanks for the review. Have you finished it? If not, please let me know once you are done so I can fix in a single iteration. Thanks

I did. I don't have any more comments, which is why I approved.

@udi-speedb
Copy link
Contributor Author

@Yuval-Ariel This is the PR that was opened for #114. However, this is no longer an issue to be inserted into main as is. Please decide what to do with this PR (probably close it). Thanks

@udi-speedb
Copy link
Contributor Author

@Yuval-Ariel, @Guyme - Is this PR applicable? Shouldn't it be closed in favour of the PR that will include all of @Yuval-Ariel's work, including whatever was used from this PR?

@Yuval-Ariel
Copy link
Contributor

yes it will be closed once i open a new one.

@Yuval-Ariel
Copy link
Contributor

will be handled by #423

@Yuval-Ariel Yuval-Ariel deleted the 114-writes-slowdown-based-on-dirty-memory-usage branch May 11, 2023 08:33
@Yuval-Ariel Yuval-Ariel restored the 114-writes-slowdown-based-on-dirty-memory-usage branch May 11, 2023 08:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

WriteBufferManager: allow slowing down writes based on total memory usage