
SPDB-702: Avoid unnecessary blocking io #5

Closed
wants to merge 230 commits into from

Conversation

AmnonHanuhov
Contributor

avoid_unnecessary_blocking_io is an option in RocksDB that, when set to true, allows worker threads
to avoid performing unnecessary, long-latency operations on the “hot path” and instead schedule
background jobs to do them. We have seen improvements in compaction latency when this option
was set to true, because obsolete files were deleted in a separate background job instead of
as part of the compaction job.

Signed-off-by: Amnon Hanuhov <AmnonSWE@gmail.com>
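
For context, a minimal sketch (not part of this PR; the database path is a placeholder) of how the option is enabled through the public Options API:

```
#include <cassert>

#include "rocksdb/db.h"
#include "rocksdb/options.h"

int main() {
  rocksdb::Options options;
  options.create_if_missing = true;
  // Defer long-latency work (e.g. deleting obsolete files after a compaction)
  // to background jobs instead of doing it on the calling thread.
  options.avoid_unnecessary_blocking_io = true;

  rocksdb::DB* db = nullptr;
  rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/spdb702_example", &db);
  assert(s.ok());
  delete db;  // Closing the handle releases the DB's resources.
  return 0;
}
```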

hx235 and others added 11 commits April 19, 2022 09:32
… (#9854)

Summary:
Context:
As mentioned in facebook/rocksdb#9701, `LITE=1 make static_lib` for v7.0.2 fails with:
```
  CC       file/sequence_file_reader.o
  CC       file/sst_file_manager_impl.o
  CC       file/writable_file_writer.o
In file included from file/writable_file_writer.cc:10:
./file/writable_file_writer.h:163:15: error: private field 'temperature_' is not used [-Werror,-Wunused-private-field]
  Temperature temperature_;
              ^
1 error generated.
make: *** [file/writable_file_writer.o] Error 1
```

As titled.
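
For illustration only (this is not the actual fix in facebook/rocksdb#9854), a minimal reproduction of the diagnostic, plus one common way to silence it when a field is only used in some build configurations:

```
// Compile with: clang++ -std=c++17 -Wall -Werror -c repro.cc
class WriterLikeBroken {
 private:
  int temperature_;  // error: private field 'temperature_' is not used
};

class WriterLikeFixed {
 private:
  // [[maybe_unused]] tells the compiler the field may legitimately go unused,
  // which suppresses -Wunused-private-field.
  [[maybe_unused]] int temperature_;
};

int main() { return 0; }
```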

Pull Request resolved: facebook/rocksdb#9854

Test Plan:
- Local `LITE=1 make static_lib` reproduces the same error, and the error is gone after this fix
- CI

Reviewed By: ajkr, jay-zhuang

Differential Revision: D35706585

Pulled By: hx235

fbshipit-source-id: 7743310298231ad6866304ffa2225c8abdc91d9a
Summary:
We don't really have a mechanism for internal-only release
notes, so adding this to the standard release notes. For picking into
7.2 release.

Pull Request resolved: facebook/rocksdb#9872

Test Plan: release note only

Reviewed By: jay-zhuang

Differential Revision: D35761307

Pulled By: pdillinger

fbshipit-source-id: 5d1932767fff48456323df948604dbb956ac27b2
Summary:
Add release note for CompressedSecondaryCache and the update of SecondaryCache::Lookup().

Pull Request resolved: facebook/rocksdb#9874

Reviewed By: jay-zhuang

Differential Revision: D35765973

Pulled By: gitbw95

fbshipit-source-id: 98232508c4f2047216def9c11a038cfb98709690
…Test.CapMemoryUsageUnderCacheCapacity (#9869)

Summary:
**Context:**
Unit test for facebook/rocksdb#9748 keeps opening new files to see whether the new feature is able to correctly constrain the opening based on block cache capacity.

However, the unit test has two inefficiently written places that can lead to opening too many new files relative to the underlying operating system/file system limit, even before hitting the block cache capacity:
(1) [opened_table_reader_num < 2 * max_table_reader_num](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR438), which can lead to 1200+ open files because of (2) below
(2) NewLRUCache(6 * CacheReservationManagerImpl<CacheEntryRole::kBlockBasedTableReader>::GetDummyEntrySize()) in [here](https://github.com/facebook/rocksdb/pull/9748/files?show-viewed-files=true&file-filters%5B%5D=#diff-ec9f5353e317df71093094734ba29193b94a998f0f9c9af924e4c99692195eeaR364)

Therefore we see CI failures like this on machines with a strict open-file limit of ~1000 (note the "table_1021" naming in the following error message):
https://app.circleci.com/pipelines/github/facebook/rocksdb/12886/workflows/75524682-3fa4-41ee-9a61-81827b51d99b/jobs/345270
```
fs_->NewWritableFile(path, foptions, &file, nullptr)
IO error: While open a file for appending: /dev/shm/rocksdb.Jedwt/run-block_based_table_reader_test-CapMemoryUsageUnderCacheCapacity-BlockBasedTableReaderCapMemoryTest.CapMemoryUsageUnderCacheCapacity-0/block_based_table_reader_test_1668910_829492452552920927/**table_1021**: Too many open files
```

**Summary:**
- Revised the test to address the two places above more efficiently, using 1.1 instead of 2 as the threshold multiplier and lowering the block cache capacity a bit (see the sketch below)
- Renamed some variables for clarity
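
For orientation, a rough sketch (with a placeholder size, not the test's actual values) of the kind of cache wiring the test exercises: a deliberately small block cache whose capacity bounds how many table readers can be opened:

```
#include <memory>

#include "rocksdb/cache.h"
#include "rocksdb/options.h"
#include "rocksdb/table.h"

int main() {
  // Placeholder standing in for the dummy-entry granularity used by the test.
  const size_t kApproxDummyEntrySize = 256 * 1024;

  rocksdb::BlockBasedTableOptions table_options;
  // A tiny cache: once reservations reach this capacity, further table-reader
  // opens are expected to be constrained.
  table_options.block_cache = rocksdb::NewLRUCache(6 * kApproxDummyEntrySize);

  rocksdb::Options options;
  options.table_factory.reset(
      rocksdb::NewBlockBasedTableFactory(table_options));
  return 0;
}
```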

Pull Request resolved: facebook/rocksdb#9869

Test Plan:
- Manual inspection of the maximum number of opened table readers across all test cases, which is around 389
- CircleCI, to see whether the error is gone

Reviewed By: ajkr

Differential Revision: D35752655

Pulled By: hx235

fbshipit-source-id: 8a0953d39d561babfa4257b8ed8550bb21b04839
Summary:
Add a new experimental API, AbortIO, to FileSystem to abort read requests
submitted asynchronously through the ReadAsync API.

Pull Request resolved: facebook/rocksdb#9901

Test Plan: Existing tests

Reviewed By: anand1976

Differential Revision: D35885591

Pulled By: akankshamahajan15

fbshipit-source-id: df3944e6e9e6e487af1fa688376b4abb6837fb02
Summary:
Add the stats PREFETCHED_BYTES_DISCARDED and POLL_WAIT_MICROS.
PREFETCHED_BYTES_DISCARDED records the number of prefetched bytes discarded by
FilePrefetchBuffer. POLL_WAIT_MICROS records the time taken by the underlying
file system's Poll API.
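
A hedged sketch of reading these stats through the public statistics interface, assuming PREFETCHED_BYTES_DISCARDED is exposed as a ticker and POLL_WAIT_MICROS as a histogram, as the description suggests:

```
#include <cstdint>
#include <iostream>

#include "rocksdb/options.h"
#include "rocksdb/statistics.h"

int main() {
  rocksdb::Options options;
  options.statistics = rocksdb::CreateDBStatistics();

  // ... open a DB with `options` and run a prefetch-heavy async_io workload ...

  uint64_t discarded =
      options.statistics->getTickerCount(rocksdb::PREFETCHED_BYTES_DISCARDED);
  rocksdb::HistogramData poll_wait;
  options.statistics->histogramData(rocksdb::POLL_WAIT_MICROS, &poll_wait);

  std::cout << "prefetched bytes discarded: " << discarded
            << ", poll wait p99 (us): " << poll_wait.percentile99 << std::endl;
  return 0;
}
```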

Pull Request resolved: facebook/rocksdb#9845

Test Plan: Update existing tests

Reviewed By: anand1976

Differential Revision: D35909694

Pulled By: akankshamahajan15

fbshipit-source-id: e009ef940bb9ed72c9446f5529095caabb8a1e36
Summary:
Left HISTORY.md and unit tests.
Added a new unit test to reproduce the corruption scenario that this PR fixes, and a HISTORY.md line for it.

Pull Request resolved: facebook/rocksdb#9906

Reviewed By: riversand963

Differential Revision: D35940093

Pulled By: ajkr

fbshipit-source-id: 9816f99e1ce405ba36f316beb4f6378c37c8c86b
Summary:
In FilePrefetchBuffer, when data overlaps between two buffers and more data
needs to be read and copied into a third buffer, an incorrect length was
updated, resulting in:
```
Iterator diverged from control iterator which has value 00000000000310C3000000000000012B0000000000000274 total_order_seek: 1 auto_prefix_mode: 0 S 000000000002C37F000000000000012B000000000000001C NNNPPPPPNN; total_order_seek: 1 auto_prefix_mode: 0 S 000000000002F10B00000000000000BF78787878787878 NNNPNNNNPN; total_order_seek: 1 auto_prefix_mode: 0 S 00000000000310C3000000000000012B000000000000026B
iterator is not valid
Control CF default
db_stress: db_stress_tool/db_stress_test_base.cc:1388: void rocksdb::StressTest::VerifyIterator(rocksdb::ThreadState*, rocksdb::ColumnFamilyHandle*, const rocksdb::ReadOptions&, rocksdb::Iterator*, rocksdb::Iterator*, rocksdb::StressTest::LastIterateOp, const rocksdb::Slice&, const string&, bool*): Assertion `false' failed.
Aborted (core dumped)
```

Pull Request resolved: facebook/rocksdb#9916

Test Plan:
```
- CircleCI jobs
- Ran db_stress with OPTIONS file which caught the bug
 ./db_stress --acquire_snapshot_one_in=10000 --adaptive_readahead=0 --async_io=1 --avoid_flush_during_recovery=0 --avoid_unnecessary_blocking_io=1 --backup_max_size=104857600 --backup_one_in=0 --batch_protection_bytes_per_key=0 --block_size=16384 --bloom_bits=42.26248932628998 --bottommost_compression_type=disable --cache_index_and_filter_blocks=0 --cache_size=8388608 --checkpoint_one_in=0 --checksum_type=kxxHash --clear_column_family_one_in=0 --compact_files_one_in=1000000 --compact_range_one_in=1000000 --compaction_ttl=0 --compression_max_dict_buffer_bytes=1073741823 --compression_max_dict_bytes=16384 --compression_parallel_threads=1 --compression_type=zstd --compression_zstd_max_train_bytes=65536 --continuous_verification_interval=0 --db=/dev/shm/rocksdb/ --db_write_buffer_size=134217728 --delpercent=5 --delrangepercent=0 --destroy_db_initially=0 --detect_filter_construct_corruption=0 --disable_wal=0 --enable_blob_files=0 --enable_compaction_filter=0 --enable_pipelined_write=0 --fail_if_options_file_error=0 --file_checksum_impl=none --flush_one_in=1000000 --format_version=4 --get_current_wal_file_one_in=0 --get_live_files_one_in=1000000 --get_property_one_in=1000000 --get_sorted_wal_files_one_in=0 --index_block_restart_interval=12 --index_type=2 --ingest_external_file_one_in=0 --iterpercent=10 --key_len_percent_dist=1,30,69 --level_compaction_dynamic_level_bytes=True --long_running_snapshots=0 --mark_for_compaction_one_file_in=0 --max_background_compactions=20 --max_bytes_for_level_base=10485760 --max_key=25000000 --max_key_len=3 --max_manifest_file_size=1073741824 --max_write_batch_group_size_bytes=1048576 --max_write_buffer_number=3 --max_write_buffer_size_to_maintain=8388608 --memtable_prefix_bloom_size_ratio=0.001 --memtable_whole_key_filtering=1 --memtablerep=skip_list --mmap_read=0 --mock_direct_io=False --nooverwritepercent=1 --open_files=100 --open_metadata_write_fault_one_in=0 --open_read_fault_one_in=0 --open_write_fault_one_in=16 --ops_per_thread=100000000 --optimize_filters_for_memory=0 --paranoid_file_checks=1 --partition_filters=0 --partition_pinning=1 --pause_background_one_in=1000000 --periodic_compaction_seconds=0 --prefix_size=-1 --prefixpercent=0 --prepopulate_block_cache=0 --progress_reports=0 --read_fault_one_in=0 --read_only=0 --readpercent=50 --recycle_log_file_num=1 --reopen=0 --reserve_table_reader_memory=0 --ribbon_starting_level=999 --secondary_cache_fault_one_in=0 --secondary_catch_up_one_in=0 --set_options_one_in=10000 --snapshot_hold_ops=100000 --sst_file_manager_bytes_per_sec=104857600 --sst_file_manager_bytes_per_truncate=0 --subcompactions=2 --sync=0 --sync_fault_injection=False --target_file_size_base=2097152 --target_file_size_multiplier=2 --test_batches_snapshots=0 --test_cf_consistency=0 --top_level_index_pinning=3 --unpartitioned_pinning=3 --use_blob_db=0 --use_block_based_filter=0 --use_clock_cache=0 --use_direct_io_for_flush_and_compaction=1 --use_direct_reads=0 --use_full_merge_v1=0 --use_merge=0 --use_multiget=0 --use_txn=0 --user_timestamp_size=0 --value_size_mult=32 --verify_checksum=1 --verify_checksum_one_in=1000000 --verify_db_one_in=100000 --wal_compression=zstd --write_buffer_size=4194304 --write_dbid_to_manifest=0 --writepercent=35 --options_file=/home/akankshamahajan/OPTIONS.orig -column_families=1

db_bench with async_io enabled to make sure db_bench completes successfully without any failure.
- ./db_bench -use_existing_db=true -db=/tmp/prefix_scan_prefetch_main -benchmarks="seekrandom" -key_size=32 -value_size=512 -num=5000000 -use_direct_reads=true -seek_nexts=327680 -duration=120 -ops_between_duration_checks=1 -async_io=1
```

crash_test in progress

Reviewed By: anand1976

Differential Revision: D35985789

Pulled By: akankshamahajan15

fbshipit-source-id: 5abe185f34caa99ca587d4bdc8954bd0802b1bf9
@AmnonHanuhov AmnonHanuhov requested a review from hilikspdb June 7, 2022 18:10
@AmnonHanuhov AmnonHanuhov self-assigned this Jun 7, 2022
@AmnonHanuhov AmnonHanuhov force-pushed the amnon/SPDB-702-avoid-unnecessary-blocking-io branch from 7a22f60 to 5abf9a2 on June 7, 2022 18:12
@Yuval-Ariel Yuval-Ariel requested review from Yuval-Ariel and removed request for hilikspdb June 20, 2022 11:27
@Yuval-Ariel
Contributor

assigned to @assaf-speedb for QA.
need to decide on the performance tests for this @erez-speedb @hilikspdb

isaac-io and others added 9 commits June 22, 2022 19:38
RocksDB currently fails to build with clang 12 due to newly added
diagnostics. This is problematic for fuzz testing in CI and for developers
building locally with clang, and thankfully these are easily fixable.
Currently the CMake config only looks for ccache, so developers using
sccache need to manually set the launch commands. Add a check for sccache
to spare them the trouble.

While at it, clean up the CMake config and remove ccache as the link launch
wrapper since it will just forward to the underlying linker anyway, and
also regenerate the buck TARGETS file (we might want to drop support for
it at some point, but for now let's keep it consistent).
isaac-io and others added 2 commits October 26, 2022 12:34
zlib 1.2.12 was pulled due to https://nvd.nist.gov/vuln/detail/CVE-2022-37434
and is no longer available. This breaks the build and prevents a version
release. Update to 1.2.13 to allow the build to work.
@isaac-io isaac-io force-pushed the main branch 2 times, most recently from 35fba0e to be2764e on October 26, 2022 12:18
@isaac-io isaac-io modified the milestones: v2.1.0, v2.2.0 Oct 26, 2022
The release pipeline currently edits the RocksDB version, and does so in a
broken manner. Fix it and use a proper trigger for the artefacts release.
GitHub Runner Bot and others added 6 commits October 26, 2022 13:40
When we create the test path, we should clean up after the run finishes, as
done in the Makefile, because tests leave behind garbage that accumulates.

While at it, randomise test scheduling in order to increase the likelihood
of surfacing latent bugs that are hidden by the execution order.
`avoid_unnecessary_blocking_io` is an option in RocksDB that, when set to `true`, allows worker threads
to avoid performing unnecessary, long-latency operations on the “hot path” and instead schedule
background jobs to do them. We have seen improvements in compaction latency when this option
was set to `true`, because obsolete files were deleted in a separate background job instead of
as part of the compaction job.

Signed-off-by: Amnon Hanuhov <AmnonSWE@gmail.com>
When we need to purge a large number of files, or large files whose blocks are not
allocated contiguously, or files on bitmap-based filesystems (even when each file has
a contiguous range of blocks), deleting obsolete files may place a high load on the disk
that eats into the bandwidth available to user operations and compactions.
To avoid this, this change truncates files in modestly sized chunks and then deletes them,
instead of deleting each entire file in one go.

Signed-off-by: Amnon Hanuhov <AmnonSWE@gmail.com>
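
An illustrative sketch of the chunked-deletion idea, using POSIX calls and a hypothetical helper (not the actual Speedb/RocksDB code):

```
#include <sys/stat.h>
#include <unistd.h>

#include <string>

// Shrink the file in fixed-size steps before unlinking it, so the filesystem
// frees blocks gradually instead of in one large burst.
bool PurgeFileInChunks(const std::string& path,
                       off_t chunk_bytes = 64 * 1024 * 1024) {
  struct stat st;
  if (stat(path.c_str(), &st) != 0) {
    return false;
  }
  off_t remaining = st.st_size;
  while (remaining > chunk_bytes) {
    remaining -= chunk_bytes;
    if (truncate(path.c_str(), remaining) != 0) {
      return false;
    }
    // A real implementation would rate-limit here so the freed bandwidth does
    // not compete with user operations and compactions.
  }
  return unlink(path.c_str()) == 0;  // delete whatever is left
}
```

For comparison, stock RocksDB exposes a related knob through the bytes_max_delete_chunk parameter of NewSstFileManager.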
…or read only databases

Inside SanitizeOptions() we set this option to false if read_only is set to true.
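
A minimal sketch of the described sanitization step, using hypothetical stand-in types (the real logic lives in RocksDB's SanitizeOptions()):

```
// Hypothetical stand-in for the relevant slice of DBOptions.
struct DBOptionsLike {
  bool avoid_unnecessary_blocking_io = false;
};

// Force the option off when the database is opened read-only, as described above.
DBOptionsLike SanitizeLike(DBOptionsLike options, bool read_only) {
  if (read_only) {
    options.avoid_unnecessary_blocking_io = false;
  }
  return options;
}
```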
@erez-speedb

I don't see any gain in either the small obj test

[Screenshot 2022-11-16 at 14:18:53]

or the large obj test

[Screenshot 2022-11-16 at 14:15:55]

@AmnonHanuhov
Contributor Author

I don't see any gain in either the small obj test

[Screenshot 2022-11-16 at 14:18:53]

or the large obj test

[Screenshot 2022-11-16 at 14:15:55]

@hilikspdb thoughts?

@Guyme

Guyme commented Dec 27, 2022

Shows no value, abandoned.

@Guyme Guyme closed this Dec 27, 2022
Yuval-Ariel added a commit that referenced this pull request May 8, 2023
@Yuval-Ariel Yuval-Ariel deleted the amnon/SPDB-702-avoid-unnecessary-blocking-io branch May 11, 2023 08:46
Development

Successfully merging this pull request may close these issues.

Avoid flooding disk I/O on file purges set purges as Pri::LOW instead of HIGH