Merge branch 'unstable' into dev_bloom
zncleon authored Sep 18, 2023
2 parents a728353 + d42a89f commit 5bd0f53
Showing 9 changed files with 147 additions and 98 deletions.
2 changes: 2 additions & 0 deletions CMakeLists.txt
Expand Up @@ -132,6 +132,7 @@ include(cmake/libevent.cmake)
include(cmake/fmt.cmake)
include(cmake/jsoncons.cmake)
include(cmake/xxhash.cmake)
include(cmake/span.cmake)

if (ENABLE_LUAJIT)
include(cmake/luajit.cmake)
Expand Down Expand Up @@ -162,6 +163,7 @@ list(APPEND EXTERNAL_LIBS jsoncons)
list(APPEND EXTERNAL_LIBS Threads::Threads)
list(APPEND EXTERNAL_LIBS ${Backtrace_LIBRARY})
list(APPEND EXTERNAL_LIBS xxhash)
list(APPEND EXTERNAL_LIBS span-lite)

# Add git sha to version.h
find_package(Git REQUIRED)
Expand Down
5 changes: 4 additions & 1 deletion NOTICE
Expand Up @@ -26,6 +26,7 @@ This product uses it under the Apache 2.0 License.
* oneTBB(https://github.com/oneapi-src/oneTBB)

Files src/common/rocksdb_crc32c.h and src/storage/batch_debugger.h are modified from RocksDB.
Files src/types/bloom_filter.* are modified from Apache Arrow.
The text of the license is the standard Apache 2.0 license.

================================================================
Expand All @@ -35,6 +36,7 @@ The following components are provided under the BSD-2-Clause License. See projec
The text of each license is also included in licenses/LICENSE-[project].txt.

* lz4(https://github.com/lz4/lz4)
* xxHash(https://github.com/Cyan4973/xxHash)

NB: This product only uses the source code in `lib` directory which is under the BSD 2-Clause.

Expand Down Expand Up @@ -69,7 +71,8 @@ Boost Software License Version 1.0
The following components are provided under the Boost Software License Version 1.0. See project link for details.
The text of each license is also included in licenses/LICENSE-[project].txt

* jsoncons (https://github.com/danielaparker/jsoncons)
* jsoncons(https://github.com/danielaparker/jsoncons)
* span-lite(https://github.com/martinmoene/span-lite)

================================================================
zlib/libpng licenses
Expand Down
27 changes: 27 additions & 0 deletions cmake/span.cmake
@@ -0,0 +1,27 @@
# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements. See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership. The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License. You may obtain a copy of the License at
#
# http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied. See the License for the
# specific language governing permissions and limitations
# under the License.

include_guard()

include(cmake/utils.cmake)

FetchContent_DeclareGitHubWithMirror(span
martinmoene/span-lite v0.10.3
MD5=ee5c6721d4f4f56a6e6f250c68ad4132
)

FetchContent_MakeAvailableWithArgs(span)
23 changes: 23 additions & 0 deletions licenses/LICENSE-span-lite.txt
@@ -0,0 +1,23 @@
Boost Software License - Version 1.0 - August 17th, 2003

Permission is hereby granted, free of charge, to any person or organization
obtaining a copy of the software and accompanying documentation covered by
this license (the "Software") to use, reproduce, display, distribute,
execute, and transmit the Software, and to prepare derivative works of the
Software, and to permit third-parties to whom the Software is furnished to
do so, all subject to the following:

The copyright notices in the Software and this entire statement, including
the above license grant, this restriction and the following disclaimer,
must be included in all copies of the Software, in whole or in part, and
all derivative works of the Software, unless such copies or derivative
works are solely in the form of machine-executable object code generated by
a source language processor.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE, TITLE AND NON-INFRINGEMENT. IN NO EVENT
SHALL THE COPYRIGHT HOLDERS OR ANYONE DISTRIBUTING THE SOFTWARE BE LIABLE
FOR ANY DAMAGES OR OTHER LIABILITY, WHETHER IN CONTRACT, TORT OR OTHERWISE,
ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER
DEALINGS IN THE SOFTWARE.
26 changes: 26 additions & 0 deletions licenses/LICENSE-xxhash.txt
@@ -0,0 +1,26 @@
xxHash Library
Copyright (c) 2012-2021 Yann Collet
All rights reserved.

BSD 2-Clause License (https://www.opensource.org/licenses/bsd-license.php)

Redistribution and use in source and binary forms, with or without modification,
are permitted provided that the following conditions are met:

* Redistributions of source code must retain the above copyright notice, this
list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above copyright notice, this
list of conditions and the following disclaimer in the documentation and/or
other materials provided with the distribution.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND
ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED
WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE
DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR
ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES
(INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES;
LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON
ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
37 changes: 11 additions & 26 deletions src/types/bloom_filter.cc
Expand Up @@ -22,9 +22,7 @@

#include "xxh3.h"

BlockSplitBloomFilter::BlockSplitBloomFilter() = default;

void BlockSplitBloomFilter::Init(uint32_t num_bytes) {
OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes) {
if (num_bytes < kMinimumBloomFilterBytes) {
num_bytes = kMinimumBloomFilterBytes;
}
Expand All @@ -38,43 +36,30 @@ void BlockSplitBloomFilter::Init(uint32_t num_bytes) {
num_bytes = kMaximumBloomFilterBytes;
}

num_bytes_ = num_bytes;
data_.resize(num_bytes_, 0);
data_view_ = data_;
std::string data(num_bytes, 0);
return {BlockSplitBloomFilter(data), std::move(data)};
}

bool BlockSplitBloomFilter::Init(const uint8_t* bitset, uint32_t num_bytes) {
StatusOr<BlockSplitBloomFilter> CreateBlockSplitBloomFilter(uint8_t* bitset, uint32_t num_bytes) {
if (num_bytes < kMinimumBloomFilterBytes || num_bytes > kMaximumBloomFilterBytes ||
(num_bytes & (num_bytes - 1)) != 0) {
return false;
return {Status::NotOK, "invalid input bitset length"};
}

num_bytes_ = num_bytes;
data_ = {reinterpret_cast<const char*>(bitset), num_bytes};
data_view_ = data_;
return true;
return BlockSplitBloomFilter({reinterpret_cast<char*>(bitset), num_bytes});
}

bool BlockSplitBloomFilter::Init(std::string bitset) {
StatusOr<BlockSplitBloomFilter> CreateBlockSplitBloomFilter(std::string& bitset) {
if (bitset.size() < kMinimumBloomFilterBytes || bitset.size() > kMaximumBloomFilterBytes ||
(bitset.size() & (bitset.size() - 1)) != 0) {
return false;
return {Status::NotOK, "invalid input bitset length"};
}

num_bytes_ = bitset.size();
data_ = std::move(bitset);
data_view_ = data_;
return true;
}

std::unique_ptr<const BlockSplitBloomFilter> BlockSplitBloomFilter::CreateReadOnlyBloomFilter(const std::string& bitset) {
return std::unique_ptr<const BlockSplitBloomFilter>(new BlockSplitBloomFilter(bitset));
return BlockSplitBloomFilter(bitset);
}

static constexpr uint32_t kBloomFilterHeaderSizeGuess = 256;

bool BlockSplitBloomFilter::FindHash(uint64_t hash) const {
const auto bucket_index = static_cast<uint32_t>(((hash >> 32) * (num_bytes_ / kBytesPerFilterBlock)) >> 32);
const auto bucket_index = static_cast<uint32_t>(((hash >> 32) * (data_.size() / kBytesPerFilterBlock)) >> 32);
const auto key = static_cast<uint32_t>(hash);
const auto* bitset32 = reinterpret_cast<const uint32_t*>(data_view_.data());

Expand All @@ -89,7 +74,7 @@ bool BlockSplitBloomFilter::FindHash(uint64_t hash) const {
}

void BlockSplitBloomFilter::InsertHash(uint64_t hash) {
const auto bucket_index = static_cast<uint32_t>(((hash >> 32) * (num_bytes_ / kBytesPerFilterBlock)) >> 32);
const auto bucket_index = static_cast<uint32_t>(((hash >> 32) * (data_.size() / kBytesPerFilterBlock)) >> 32);
const auto key = static_cast<uint32_t>(hash);
auto* bitset32 = reinterpret_cast<uint32_t*>(data_.data());

Expand Down
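
The bucket selection used by `FindHash` and `InsertHash` above can be read in isolation as a multiply-and-shift range reduction: the upper 32 bits of the hash are scaled by the number of blocks and the product is shifted back down, which yields an index in `[0, num_blocks)` without a modulo. The following is a minimal sketch of that arithmetic, not the project's code. `kAssumedBytesPerBlock`, `IsPowerOfTwo`, and `BucketIndex` are hypothetical names, and the 32-byte block size is an assumption, since the hunks reference `kBytesPerFilterBlock` without showing its value.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 32 bytes per block. The diff refers to kBytesPerFilterBlock
// but does not show its value, so this constant is illustrative only.
constexpr uint32_t kAssumedBytesPerBlock = 32;

// True when n is a non-zero power of two; the same bit trick the diff
// applies to bitset lengths via (n & (n - 1)).
constexpr bool IsPowerOfTwo(uint32_t n) { return n != 0 && (n & (n - 1)) == 0; }

// Map the upper 32 bits of the hash onto [0, num_blocks) using a
// multiplication followed by a 32-bit right shift.
inline uint32_t BucketIndex(uint64_t hash, uint32_t num_bytes) {
  assert(IsPowerOfTwo(num_bytes) && num_bytes >= kAssumedBytesPerBlock);
  const uint64_t num_blocks = num_bytes / kAssumedBytesPerBlock;
  return static_cast<uint32_t>(((hash >> 32) * num_blocks) >> 32);
}
```

With a 1024-byte bitset there are 32 blocks under this assumption, so a hash whose upper half is all ones lands in block 31 and a hash of zero lands in block 0.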
90 changes: 41 additions & 49 deletions src/types/bloom_filter.h
Expand Up @@ -20,8 +20,11 @@
#include <cmath>
#include <cstdint>
#include <memory>
#include <nonstd/span.hpp>
#include <string>

#include "status.h"

// Returns the smallest power of two that contains v. If v is already a
// power of two, it is returned as is.
static inline int64_t NextPower2(int64_t n) {
Expand All @@ -46,6 +49,40 @@ constexpr bool IsMultipleOf8(int64_t n) { return (n & 7) == 0; }
// This value will be reconsidered when implementing Bloom filter producer.
static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;

/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
static constexpr uint32_t kMinimumBloomFilterBytes = 32;

class BlockSplitBloomFilter;

using OwnedBlockSplitBloomFilter = std::tuple<BlockSplitBloomFilter, std::string>;

/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
/// rounded up/down to lower/upper bound if num_bytes is out of range and also
/// will be rounded up to a power of 2.
///
/// @param num_bytes The number of bytes to store Bloom filter bitset.
OwnedBlockSplitBloomFilter CreateBlockSplitBloomFilter(uint32_t num_bytes);

/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
/// bitset when the given bitset may not satisfy the 32-byte alignment requirement
/// which may lead to segfault when performing SIMD instructions. It is the caller's
/// responsibility to free the bitset passed in.
///
/// @param bitset The given bitset to initialize the Bloom filter.
/// @param num_bytes The number of bytes of given bitset.
/// @return false if the number of bytes of Bloom filter bitset is not a power of 2, and true means successfully init
StatusOr<BlockSplitBloomFilter> CreateBlockSplitBloomFilter(uint8_t* bitset, uint32_t num_bytes);

/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
/// bitset because the given bitset may not satisfy the 32-byte alignment requirement
/// which may lead to segfault when performing SIMD instructions. It is the caller's
/// responsibility to free the bitset passed in.
///
/// @param bitset The given bitset to initialize the Bloom filter.
/// @return false if the number of bytes of Bloom filter bitset is not a power of 2, and true means successfully init
StatusOr<BlockSplitBloomFilter> CreateBlockSplitBloomFilter(std::string& bitset);

/// The BlockSplitBloomFilter is implemented using block-based Bloom filters from
/// Putze et al.'s "Cache-,Hash- and Space-Efficient Bloom filters". The basic idea is to
/// hash the item to a tiny Bloom filter which size fit a single cache line or smaller.
Expand All @@ -55,44 +92,7 @@ static constexpr uint32_t kMaximumBloomFilterBytes = 128 * 1024 * 1024;
class BlockSplitBloomFilter {
public:
/// The constructor of BlockSplitBloomFilter. It uses XXH64 as hash function.
BlockSplitBloomFilter();

/// Initialize the BlockSplitBloomFilter. The range of num_bytes should be within
/// [kMinimumBloomFilterBytes, kMaximumBloomFilterBytes], it will be
/// rounded up/down to lower/upper bound if num_bytes is out of range and also
/// will be rounded up to a power of 2.
///
/// @param num_bytes The number of bytes to store Bloom filter bitset.
void Init(uint32_t num_bytes);

/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
/// bitset because the given bitset may not satisfy the 32-byte alignment requirement
/// which may lead to segfault when performing SIMD instructions. It is the caller's
/// responsibility to free the bitset passed in.
///
/// @param bitset The given bitset to initialize the Bloom filter.
/// @param num_bytes The number of bytes of given bitset.
/// @return false if the number of bytes of Bloom filter bitset is not a power of 2, and true means successfully init
bool Init(const uint8_t* bitset, uint32_t num_bytes);

/// Initialize the BlockSplitBloomFilter. It copies the bitset as underlying
/// bitset because the given bitset may not satisfy the 32-byte alignment requirement
/// which may lead to segfault when performing SIMD instructions. It is the caller's
/// responsibility to free the bitset passed in.
///
/// @param bitset The given bitset to initialize the Bloom filter.
/// @return false if the number of bytes of Bloom filter bitset is not a power of 2, and true means successfully init
bool Init(std::string bitset);

/// Create the read-only BlockSplitBloomFilter. It use the caller's bitset as underlying bitset. It is the caller's
/// responsibility to ensure the bitset would not to change.
///
/// @param bitset The given bitset for the Bloom filter underlying bitset.
/// @return the unique_ptr of the const non-owned BlockSplitBloomFilter
static std::unique_ptr<const BlockSplitBloomFilter> CreateReadOnlyBloomFilter(const std::string& bitset);

/// Minimum Bloom filter size, it sets to 32 bytes to fit a tiny Bloom filter.
static constexpr uint32_t kMinimumBloomFilterBytes = 32;
explicit BlockSplitBloomFilter(nonstd::span<char> data) : data_(data){};

/// Calculate optimal size according to the number of distinct values and false
/// positive probability.
Expand Down Expand Up @@ -156,14 +156,12 @@ class BlockSplitBloomFilter {
/// @param hash the hash of value to insert into Bloom filter.
void InsertHash(uint64_t hash);

uint32_t GetBitsetSize() const { return num_bytes_; }
uint32_t GetBitsetSize() const { return data_.size(); }

/// Get the plain bitset value from the Bloom filter bitset.
///
/// @return bitset value;
const std::string& GetData() const& { return data_; }

std::string&& GetData() && { return std::move(data_); }
std::string_view GetData() const { return {data_.data(), data_.size()}; }

/// Compute hash for string value by using its plain encoding result.
///
Expand All @@ -188,11 +186,5 @@ class BlockSplitBloomFilter {
0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

// The underlying buffer of bitset.
std::string data_;

// The view of data_
std::string_view data_view_;

// The number of bytes of Bloom filter bitset.
uint32_t num_bytes_;
nonstd::span<char> data_;
};
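
The header change above turns the filter into a non-owning view over a byte buffer and returns the owning `std::string` alongside it in a tuple. A simplified, self-contained sketch of that ownership split is shown here. `ByteView`, `ViewFilter`, and `MakeOwnedFilter` are hypothetical names invented for illustration; `ByteView` merely stands in for `nonstd::span<char>`, and the bit operations are placeholders, not the block-split algorithm.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <tuple>
#include <utility>

// Hypothetical stand-in for nonstd::span<char>: a pointer plus a length,
// with no ownership of the bytes it refers to.
struct ByteView {
  char* ptr;
  std::size_t len;
};

// Hypothetical, much-simplified filter that only views external storage.
class ViewFilter {
 public:
  explicit ViewFilter(ByteView data) : data_(data) {}

  std::size_t Size() const { return data_.len; }

  // Set bit i in the viewed buffer.
  void SetBit(std::size_t i) {
    assert(i / 8 < data_.len);
    unsigned char byte = static_cast<unsigned char>(data_.ptr[i / 8]);
    byte = static_cast<unsigned char>(byte | (1u << (i % 8)));
    data_.ptr[i / 8] = static_cast<char>(byte);
  }

  // Read bit i from the viewed buffer.
  bool TestBit(std::size_t i) const {
    assert(i / 8 < data_.len);
    return ((static_cast<unsigned char>(data_.ptr[i / 8]) >> (i % 8)) & 1u) != 0;
  }

 private:
  ByteView data_;
};

// Return the view together with the buffer it points into. Assumption:
// num_bytes exceeds the implementation's small-string buffer, so moving the
// string keeps its heap allocation (and therefore the view) valid.
std::tuple<ViewFilter, std::string> MakeOwnedFilter(std::size_t num_bytes) {
  std::string storage(num_bytes, '\0');
  ViewFilter view(ByteView{&storage[0], storage.size()});
  return std::make_tuple(view, std::move(storage));
}
```

The caller has to keep the string alive for as long as the view is used. Very short strings are typically stored inline, in which case a move copies the bytes and a view created beforehand would dangle; the sketch sidesteps this only by assuming a sufficiently large buffer.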
14 changes: 6 additions & 8 deletions src/types/redis_bloom_chain.cc
Expand Up @@ -20,6 +20,8 @@

#include "redis_bloom_chain.h"

#include "types/bloom_filter.h"

namespace redis {

std::string BloomChain::getBFKey(const Slice &ns_key, const BloomChainMetadata &metadata, uint16_t filters_index) {
Expand Down Expand Up @@ -52,8 +54,7 @@ rocksdb::Status BloomChain::createBloomChain(const Slice &ns_key, double error_r
metadata->base_capacity = capacity;
metadata->bloom_bytes = BlockSplitBloomFilter::OptimalNumOfBytes(capacity, error_rate);

BlockSplitBloomFilter block_split_bloom_filter;
block_split_bloom_filter.Init(metadata->bloom_bytes);
auto [block_split_bloom_filter, _] = CreateBlockSplitBloomFilter(metadata->bloom_bytes);

auto batch = storage_->GetWriteBatchBase();
WriteBatchLogData log_data(kRedisBloomFilter, {"createBloomChain"});
Expand All @@ -77,9 +78,7 @@ void BloomChain::createBloomFilterInBatch(const Slice &ns_key, BloomChainMetadat
metadata->n_filters += 1;
metadata->bloom_bytes += bloom_filter_bytes;

BlockSplitBloomFilter block_split_bloom_filter;
block_split_bloom_filter.Init(bloom_filter_bytes);
*bf_data = std::move(block_split_bloom_filter).GetData();
std::tie(std::ignore, *bf_data) = CreateBlockSplitBloomFilter(bloom_filter_bytes);

std::string bloom_chain_meta_bytes;
metadata->Encode(&bloom_chain_meta_bytes);
Expand All @@ -103,14 +102,13 @@ rocksdb::Status BloomChain::getBFDataList(const std::vector<std::string> &bf_key
}

void BloomChain::bloomAdd(const Slice &item, std::string *bf_data) {
BlockSplitBloomFilter block_split_bloom_filter;
block_split_bloom_filter.Init(std::move(*bf_data));
BlockSplitBloomFilter block_split_bloom_filter(*bf_data);

uint64_t h = BlockSplitBloomFilter::Hash(item.data(), item.size());
block_split_bloom_filter.InsertHash(h);
*bf_data = std::move(block_split_bloom_filter).GetData();
}


bool BloomChain::bloomCheck(const Slice &item, std::string &bf_data) {
std::unique_ptr<const BlockSplitBloomFilter> bloom_filter_read_only =
BlockSplitBloomFilter::CreateReadOnlyBloomFilter(bf_data);
Expand Down
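
The call sites above unpack the returned tuple in two ways: a structured binding when both parts are wanted, and `std::tie` with `std::ignore` when only the buffer is kept. A small sketch of the second form using only the standard library follows; `MakePair` and `KeepSecond` are hypothetical helpers, not functions from this commit.

```cpp
#include <string>
#include <tuple>

// Hypothetical producer returning a value together with a buffer.
std::tuple<int, std::string> MakePair() {
  return std::make_tuple(7, std::string(4, 'x'));
}

// Keep only the second element; std::ignore discards the first.
std::string KeepSecond() {
  std::string out;
  std::tie(std::ignore, out) = MakePair();
  return out;
}
```

Because the right-hand side is a temporary, the assignment through `std::tie` moves the string into `out` instead of copying it.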