Avoid recompressing cold block in CompressedSecondaryCache #10527

Closed · wants to merge 56 commits
Commits (56)
88c4053  Avoid recompressing cold block (Aug 14, 2022)
437d31e  update (Aug 14, 2022)
98b36e8  update (Aug 15, 2022)
0c5ff08  remove comments. (Aug 15, 2022)
8994688  update (Aug 15, 2022)
63f8bb8  This is a commit without bugs reported by db_stress. (Aug 15, 2022)
920d991  polishing. (Aug 15, 2022)
9d3a72c  update (Aug 15, 2022)
a900e7f  workable commit. (Aug 15, 2022)
ac32842  update unit tests. (Aug 16, 2022)
48b6135  update unit tests. (Aug 16, 2022)
1cfc7ce  Add use_compressed_secondary_cache_ , standalone_pool_ratio, etc. (Aug 17, 2022)
da46184  update Lookup and Promote. (Aug 17, 2022)
d71a192  update (Aug 17, 2022)
b715163  update (Aug 17, 2022)
087f7f1  update (Aug 17, 2022)
597b180  update (Aug 17, 2022)
07e45eb  update (Aug 18, 2022)
59ced4b  update (Aug 18, 2022)
ded5759  update (Aug 19, 2022)
b9bde8a  Update Lookup in fault_injection_secondary_cache.cc (Aug 19, 2022)
c14a653  fix one Release issue in CompressedSecondaryCache::Lookup (Aug 21, 2022)
9d87893  this is a workable commit with extra logs. (Aug 21, 2022)
8c34ff0  remove some extra comments. (Aug 21, 2022)
3069c5d  update (Aug 21, 2022)
941eacc  remove cout. (Aug 21, 2022)
7501a84  use mutext for updating standalone_pool_usage_ (Aug 22, 2022)
b94da9b  update (Aug 22, 2022)
5bf17e3  update the mutex use. (Aug 22, 2022)
0073eee  update (Aug 22, 2022)
7368125  fix a double to size_t conversion warning. (Aug 22, 2022)
5d4763e  update parameter comments. (Aug 22, 2022)
e363895  check Status. (Aug 22, 2022)
3331694  fix lint issues. (Aug 22, 2022)
b79c6d5  fix a lint issue. (Aug 22, 2022)
d8d2fda  Avoid insert a block into sec cache if it is evicted for the first time. (Aug 27, 2022)
88cc137  update 0.3 to 0.2. (Aug 28, 2022)
aaff78b  update (Aug 30, 2022)
b66459b  update unit tests. (Aug 31, 2022)
6f7a477  update blob_source_test. (Aug 31, 2022)
e0ed88f  update db_blob_basic_test (Aug 31, 2022)
9556d50  remove standalone pool. (Sep 2, 2022)
befc26d  update (Sep 2, 2022)
9a1b93b  update blob_source_test (Sep 2, 2022)
d1621ab  update malloc_bin_sizes_. (Sep 3, 2022)
17f2eaa  fix unit tests. (Sep 4, 2022)
cf9c869  avoid split and merge. (Sep 4, 2022)
0b136a7  add HISTORY.md (Sep 4, 2022)
778717f  address the comments. (Sep 7, 2022)
22170f3  fix a crash issue. (Sep 7, 2022)
92e5a50  merge main. (Sep 7, 2022)
b6543b2  update (Sep 7, 2022)
4ed4448  rename erase_handle to advise_erase. (Sep 7, 2022)
f7dca22  address comments. (Sep 7, 2022)
d39ae15  fix a character issue. (Sep 7, 2022)
3c82ce3  address a comment. (Sep 7, 2022)
2 changes: 2 additions & 0 deletions HISTORY.md
@@ -18,6 +18,8 @@

### Behavior Change
* Right now, when the option migration tool (OptionChangeMigration()) migrates to FIFO compaction, it compacts all the data into one single SST file and moves it to L0. This might create a problem for some users: the giant file may soon be deleted to satisfy max_table_files_size, and might cause the DB to be almost empty. We change the behavior so that the files are cut to be smaller, but these files might not follow the data insertion order. With the change, after the migration, migrated data might not be dropped by insertion order by FIFO compaction.
+ * When a block is first found in `CompressedSecondaryCache`, we insert only a dummy block into the primary cache and do not erase the block from `CompressedSecondaryCache`; a standalone handle is returned to the caller. Only if the block is found in `CompressedSecondaryCache` again before the dummy block is evicted do we erase it from `CompressedSecondaryCache` and insert it into the primary cache.
+ * When a block is first evicted from the primary cache to `CompressedSecondaryCache`, we insert only a dummy block in `CompressedSecondaryCache`. Only if it is evicted again before the dummy block is evicted is it treated as a hot block and actually inserted into `CompressedSecondaryCache`.

### New Features
* RocksDB does internal auto prefetching if it notices 2 sequential reads when readahead_size is not specified. A new option `num_file_reads_for_auto_readahead` is added in BlockBasedTableOptions, indicating after how many sequential reads internal auto prefetching should start (default is 2).
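Taken together, the two behavior-change entries above implement a two-touch admission policy on both sides of the cache boundary: a block must be touched twice before the expensive action (promotion to the primary cache, or compression into the secondary cache) is taken. Below is a minimal self-contained sketch of the idea; the type and method names (TwoTouchAdmission, Admit) are the editor's invention, not code from this PR:

#include <string>
#include <unordered_set>

// Editor's sketch of the two-touch admission policy described above: a key
// must be seen twice before its block is actually admitted. In RocksDB the
// "dummy" entries live inside the LRU cache itself and can be evicted, which
// bounds their footprint; this sketch omits that detail.
struct TwoTouchAdmission {
  std::unordered_set<std::string> seen_once_;  // stands in for dummy blocks

  // Returns true when the block should really be admitted.
  bool Admit(const std::string& key) {
    if (seen_once_.erase(key) > 0) {
      return true;  // second touch: admit (compress/promote) the real block
    }
    seen_once_.insert(key);  // first touch: remember via a zero-charge dummy
    return false;
  }
};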
93 changes: 55 additions & 38 deletions cache/compressed_secondary_cache.cc
@@ -22,9 +22,9 @@ CompressedSecondaryCache::CompressedSecondaryCache(
CacheMetadataChargePolicy metadata_charge_policy,
CompressionType compression_type, uint32_t compress_format_version)
: cache_options_(capacity, num_shard_bits, strict_capacity_limit,
- high_pri_pool_ratio, memory_allocator, use_adaptive_mutex,
- metadata_charge_policy, compression_type,
- compress_format_version, low_pri_pool_ratio) {
+ high_pri_pool_ratio, low_pri_pool_ratio, memory_allocator,
+ use_adaptive_mutex, metadata_charge_policy,
+ compression_type, compress_format_version) {
cache_ =
NewLRUCache(capacity, num_shard_bits, strict_capacity_limit,
high_pri_pool_ratio, memory_allocator, use_adaptive_mutex,
@@ -35,58 +35,79 @@ CompressedSecondaryCache::~CompressedSecondaryCache() { cache_.reset(); }

std::unique_ptr<SecondaryCacheResultHandle> CompressedSecondaryCache::Lookup(
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/,
- bool& is_in_sec_cache) {
+ bool advise_erase, bool& is_in_sec_cache) {
std::unique_ptr<SecondaryCacheResultHandle> handle;
is_in_sec_cache = false;
Cache::Handle* lru_handle = cache_->Lookup(key);
if (lru_handle == nullptr) {
- return handle;
+ return nullptr;
}

- CacheValueChunk* handle_value =
-     reinterpret_cast<CacheValueChunk*>(cache_->Value(lru_handle));
- size_t handle_value_charge{0};
- CacheAllocationPtr merged_value =
-     MergeChunksIntoValue(handle_value, handle_value_charge);
[Review comment from a Contributor]
Is the change of removing MergeChunksIntoValue() related?

[Reply from the Author]
Yes. The myshadow AB tests showed consistently better metrics (cache hit rate, CPU, and mem_rss) without the current split and merge functions, so I removed them for now.

+ void* handle_value = cache_->Value(lru_handle);
+ if (handle_value == nullptr) {
+   cache_->Release(lru_handle, /*erase_if_last_ref=*/false);
+   return nullptr;
[Review comment from a Contributor]
Should we release the handle in this case?
+ }

+ CacheAllocationPtr* ptr = reinterpret_cast<CacheAllocationPtr*>(handle_value);

Status s;
void* value{nullptr};
size_t charge{0};
if (cache_options_.compression_type == kNoCompression) {
- s = create_cb(merged_value.get(), handle_value_charge, &value, &charge);
+ s = create_cb(ptr->get(), cache_->GetCharge(lru_handle), &value, &charge);
} else {
UncompressionContext uncompression_context(cache_options_.compression_type);
UncompressionInfo uncompression_info(uncompression_context,
UncompressionDict::GetEmptyDict(),
cache_options_.compression_type);

size_t uncompressed_size{0};
- CacheAllocationPtr uncompressed;
- uncompressed = UncompressData(uncompression_info, (char*)merged_value.get(),
-                               handle_value_charge, &uncompressed_size,
-                               cache_options_.compress_format_version,
-                               cache_options_.memory_allocator.get());
+ CacheAllocationPtr uncompressed = UncompressData(
+     uncompression_info, (char*)ptr->get(), cache_->GetCharge(lru_handle),
+     &uncompressed_size, cache_options_.compress_format_version,
+     cache_options_.memory_allocator.get());

if (!uncompressed) {
- cache_->Release(lru_handle, /* erase_if_last_ref */ true);
- return handle;
+ cache_->Release(lru_handle, /*erase_if_last_ref=*/true);
+ return nullptr;
}
s = create_cb(uncompressed.get(), uncompressed_size, &value, &charge);
}

if (!s.ok()) {
- cache_->Release(lru_handle, /* erase_if_last_ref */ true);
- return handle;
+ cache_->Release(lru_handle, /*erase_if_last_ref=*/true);
+ return nullptr;
}

- cache_->Release(lru_handle, /* erase_if_last_ref */ true);
+ if (advise_erase) {
+   cache_->Release(lru_handle, /*erase_if_last_ref=*/true);
+   // Insert a dummy handle.
+   cache_->Insert(key, /*value=*/nullptr, /*charge=*/0, DeletionCallback)
+       .PermitUncheckedError();
+ } else {
+   is_in_sec_cache = true;
+   cache_->Release(lru_handle, /*erase_if_last_ref=*/false);
+ }
handle.reset(new CompressedSecondaryCacheResultHandle(value, charge));

return handle;
}
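For orientation, here is a hedged sketch of how the primary-cache side is expected to drive the new Lookup() signature. The caller shown is illustrative (PrimaryHasDummy is a hypothetical helper; the real logic lives in the primary-cache tier), not code from this PR:

// Illustrative caller, not the PR's actual code. `sec_cache`, `key`, and
// `create_cb` are assumed to exist in the surrounding context.
bool is_in_sec_cache = false;
const bool advise_erase = PrimaryHasDummy(key);  // hypothetical helper
std::unique_ptr<rocksdb::SecondaryCacheResultHandle> h =
    sec_cache->Lookup(key, create_cb, /*wait=*/true, advise_erase,
                      is_in_sec_cache);
if (h != nullptr) {
  if (is_in_sec_cache) {
    // First hit: the compressed copy stays in the secondary cache and a
    // standalone handle came back; the caller inserts only a dummy into the
    // primary cache so the next hit can promote for real.
  } else {
    // Second hit (advise_erase was true): the secondary cache released its
    // copy and left a dummy behind; insert the full block upstream.
  }
}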

Status CompressedSecondaryCache::Insert(const Slice& key, void* value,
const Cache::CacheItemHelper* helper) {
+ if (value == nullptr) {
+   return Status::InvalidArgument();
+ }
+
+ Cache::Handle* lru_handle = cache_->Lookup(key);
+ if (lru_handle == nullptr) {
+   // Insert a dummy handle if the handle is evicted for the first time.
+   return cache_->Insert(key, /*value=*/nullptr, /*charge=*/0,
+                         DeletionCallback);
+ } else {
+   cache_->Release(lru_handle, /*erase_if_last_ref=*/false);
+ }

size_t size = (*helper->size_cb)(value);
CacheAllocationPtr ptr =
AllocateBlock(size, cache_options_.memory_allocator.get());
@@ -115,12 +136,14 @@ Status CompressedSecondaryCache::Insert(const Slice& key, void* value,
}

val = Slice(compressed_val);
+ size = compressed_val.size();
+ ptr = AllocateBlock(size, cache_options_.memory_allocator.get());
+ memcpy(ptr.get(), compressed_val.data(), size);
}

- size_t charge{0};
- CacheValueChunk* value_chunks_head =
-     SplitValueIntoChunks(val, cache_options_.compression_type, charge);
- return cache_->Insert(key, value_chunks_head, charge, DeletionCallback);
+ CacheAllocationPtr* buf = new CacheAllocationPtr(std::move(ptr));
+
+ return cache_->Insert(key, buf, size, DeletionCallback);
}

void CompressedSecondaryCache::Erase(const Slice& key) { cache_->Erase(key); }
@@ -212,22 +235,16 @@ CacheAllocationPtr CompressedSecondaryCache::MergeChunksIntoValue(

void CompressedSecondaryCache::DeletionCallback(const Slice& /*key*/,
void* obj) {
- CacheValueChunk* chunks_head = reinterpret_cast<CacheValueChunk*>(obj);
- while (chunks_head != nullptr) {
-   CacheValueChunk* tmp_chunk = chunks_head;
-   chunks_head = chunks_head->next;
-   tmp_chunk->Free();
- }
+ delete reinterpret_cast<CacheAllocationPtr*>(obj);
obj = nullptr;
}
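The new Insert()/DeletionCallback() pair above forms one ownership handshake: Insert() moves the buffer into a heap-allocated CacheAllocationPtr and stores the raw wrapper pointer as the cache value, and DeletionCallback() reclaims it. A condensed, editor-annotated view of the two sides (re-stating the diff, not new behavior):

// In Insert(): hand the LRU cache a heap-allocated smart-pointer wrapper.
CacheAllocationPtr* buf = new CacheAllocationPtr(std::move(ptr));
return cache_->Insert(key, buf, size, DeletionCallback);

// In DeletionCallback(): deleting the wrapper runs ~CacheAllocationPtr,
// which returns the block to the configured allocator. For the zero-charge
// dummy entries the stored value is nullptr, and `delete` on a null pointer
// is a no-op, so the same callback safely serves real and dummy handles.
delete reinterpret_cast<CacheAllocationPtr*>(obj);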

std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache(
size_t capacity, int num_shard_bits, bool strict_capacity_limit,
- double high_pri_pool_ratio,
+ double high_pri_pool_ratio, double low_pri_pool_ratio,
std::shared_ptr<MemoryAllocator> memory_allocator, bool use_adaptive_mutex,
CacheMetadataChargePolicy metadata_charge_policy,
- CompressionType compression_type, uint32_t compress_format_version,
- double low_pri_pool_ratio) {
+ CompressionType compression_type, uint32_t compress_format_version) {
return std::make_shared<CompressedSecondaryCache>(
capacity, num_shard_bits, strict_capacity_limit, high_pri_pool_ratio,
low_pri_pool_ratio, memory_allocator, use_adaptive_mutex,
@@ -240,9 +257,9 @@ std::shared_ptr<SecondaryCache> NewCompressedSecondaryCache(
assert(opts.secondary_cache == nullptr);
return NewCompressedSecondaryCache(
opts.capacity, opts.num_shard_bits, opts.strict_capacity_limit,
- opts.high_pri_pool_ratio, opts.memory_allocator, opts.use_adaptive_mutex,
- opts.metadata_charge_policy, opts.compression_type,
- opts.compress_format_version, opts.low_pri_pool_ratio);
+ opts.high_pri_pool_ratio, opts.low_pri_pool_ratio, opts.memory_allocator,
+ opts.use_adaptive_mutex, opts.metadata_charge_policy,
+ opts.compression_type, opts.compress_format_version);
}
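A hedged wiring example for the reordered parameters, using the options-struct overload shown just above. The capacity and compression values are placeholders; the types and factory functions are from the public RocksDB API:

#include <memory>

#include "rocksdb/cache.h"

int main() {
  // Configure the compressed secondary cache; values are illustrative.
  rocksdb::CompressedSecondaryCacheOptions sec_opts;
  sec_opts.capacity = 64 << 20;  // 64 MiB
  sec_opts.compression_type = rocksdb::kLZ4Compression;

  // Attach it to a primary LRU block cache.
  rocksdb::LRUCacheOptions pri_opts;
  pri_opts.capacity = 32 << 20;  // 32 MiB
  pri_opts.secondary_cache = rocksdb::NewCompressedSecondaryCache(sec_opts);
  std::shared_ptr<rocksdb::Cache> block_cache = rocksdb::NewLRUCache(pri_opts);
  return block_cache != nullptr ? 0 : 1;
}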

} // namespace ROCKSDB_NAMESPACE
20 changes: 18 additions & 2 deletions cache/compressed_secondary_cache.h
@@ -45,7 +45,21 @@ class CompressedSecondaryCacheResultHandle : public SecondaryCacheResultHandle {
// The CompressedSecondaryCache is a concrete implementation of
// rocksdb::SecondaryCache.
//
- // Users can also cast a pointer to it and call methods on
+ // When a block is found from CompressedSecondaryCache::Lookup, we check whether
+ // there is a dummy block with the same key in the primary cache.
+ // 1. If the dummy block exists, we erase the block from
+ //    CompressedSecondaryCache and insert it into the primary cache.
+ // 2. If not, we just insert a dummy block into the primary cache
+ //    (charging the actual size of the block) and do not erase the block from
+ //    CompressedSecondaryCache. A standalone handle is returned to the caller.
+ //
+ // When a block is evicted from the primary cache, we check whether
+ // there is a dummy block with the same key in CompressedSecondaryCache.
+ // 1. If the dummy block exists, the block is inserted into
+ //    CompressedSecondaryCache.
+ // 2. If not, we just insert a dummy block (size 0) in CompressedSecondaryCache.
+ //
+ // Users can also cast a pointer to CompressedSecondaryCache and call methods on
// it directly, especially custom methods that may be added
// in the future. For example -
// std::unique_ptr<rocksdb::SecondaryCache> cache =
@@ -72,7 +86,9 @@ class CompressedSecondaryCache : public SecondaryCache {

std::unique_ptr<SecondaryCacheResultHandle> Lookup(
const Slice& key, const Cache::CreateCallback& create_cb, bool /*wait*/,
- bool& is_in_sec_cache) override;
+ bool advise_erase, bool& is_in_sec_cache) override;
[Review comment from a Contributor]
Need a comment here to explain the requirement placed on the Lookup() implementation by advise_erase.

+ bool SupportForceErase() const override { return true; }

void Erase(const Slice& key) override;
