Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Paired Block Bloom Filter Algorithm #29

Closed
udi-speedb opened this issue Jun 26, 2022 · 14 comments · Fixed by #54
Closed

Paired Block Bloom Filter Algorithm #29

udi-speedb opened this issue Jun 26, 2022 · 14 comments · Fixed by #54
Assignees
Labels
enhancement New feature or request
Milestone

Comments

@udi-speedb
Copy link
Contributor

udi-speedb commented Jun 26, 2022

Why :

Reduce false positives rate while using the same amount of memory.

What:

Develop a filter which is fast and low on CPU consumption on the one hand, but with a better memory footprint- FPR trade-off on the other hand.

Technical detail:

In the traditional bloom filter there is a tradeoff between memory usage and performance. Rocksdb blocked bloom filter takes less time but consumes extra memory.

Ribbon filter, on the other hand, takes ~30% less memory but is much slower than the bloom filter (factor of 4).

The idea is to improve bloom filter in both memory consumption and keep it high performant.

Who:

The proposed filter should be most beneficial when there is a need for a very small FPR. Typically this happens when the penalty of a false positive is very big compared to the filter test time (database on the disk), and when true positives are rare.

Integrate a new type of filter policy: Paired Block Bloom Filter

@udi-speedb udi-speedb self-assigned this Jun 26, 2022
@udi-speedb udi-speedb added the enhancement New feature or request label Jun 26, 2022
@udi-speedb udi-speedb linked a pull request Jul 13, 2022 that will close this issue
@erez-speedb
Copy link

Specific tests:

  1. Standard tests with 24BPK on both main and the branch.
  2. Filters fast and ribbon.
  3. Paired compare to the best from before.
  4. Compare 8 BPK to 24 on main.

Action for now, create the baseline on main for 1,2,4

@erez-speedb erez-speedb self-assigned this Jul 13, 2022
@erez-speedb
Copy link

Additional tests
bloom + pair
Worse case scenario,
DB in cache, all keys exists.
Test: Fillup sequential, random reads.
Best case,
DB not in cache, get before write
Test TBD
Fillup n keys, random reads 10000X keys. -- small obj.
Overwrite without filllup?

@udi-speedb
Copy link
Contributor Author

udi-speedb commented Jul 17, 2022

The flag that sets the filter type in db_bench is filter_uri.
Paired bloom (new): -filter_uri spdb.PairedBloomFilter:BPK (e.g., -filter_uri spdb.PairedBloomFilter:23.4)
Fast Local Bloom: -filter_uri rocksdb.internal.FastLocalBloomFilter:BPK
Ribbon: -filter_uri rocksdb.internal.Standard128RibbonFilter:BPK

@udi-speedb
Copy link
Contributor Author

@erez-speedb I have pushed the branch rebased on latest main.
Please go ahead with the basic performance tests we have agreed upon. Thanks

@erez-speedb
Copy link

./db_bench --compression_type=None -db=/data/ -num=80000000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffe
r_number=0 -histogram -duration=900 --use_existing_db -threads=50 -seek_nexts=100 -report_file=seekrandomwriterandom.csv -benchmark_read_rate_limit=0 -benchmark_write_rate_limit=0 --benchmarks=seekrandomwriterandom -filter_uri=spdb.PairedBloomFilter::23.4 -readwritepercent=95

failure creating filter policy[spdb.PairedBloomFilter::23.4]: Not implemented: Could not load FilterPolicy: spdb.PairedBloomFilter::23.4

@udi-speedb
Copy link
Contributor Author

@erez-speedb - Sorry, my mistake in the example. There should be a single ':' not '::'
-filter_uri=spdb.PairedBloomFilter:23.4

@erez-speedb
Copy link

Rerunning tests

@udi-speedb
Copy link
Contributor Author

One thing that needs attention:
The performance of the filter is heavily affected by the availability of AVX2 support in the processor.

  1. We must make sure to compare rocksdb / us on platforms that have the same AVX2 support.
  2. We may wish to compare rocksdb / us with AVX2 and without AVX2.
  3. The beneficial use case is when using AVX2 so this is a requirement for the best case scenario.

@isaac-io
Copy link
Contributor

isaac-io commented Aug 2, 2022

Blocked by #101.

@isaac-io
Copy link
Contributor

Didn't show an improvement with #101, so we need to define a good test to show the value of the feature.

@erez-speedb
Copy link

Running the test on a single HDD (simulating disk as bottleneck) and with DB size larger than RAM
Showed clear benefit

Image

With no additional memory usage, compare to the default bloom with the same BPK

@isaac-io
Copy link
Contributor

Depends on #71 and on #123.

@isaac-io isaac-io added this to the v2.1.0 milestone Sep 21, 2022
@Yuval-Ariel
Copy link
Contributor

QA passed on 4cf14cb

@erez-speedb
Copy link

Pass performance tests.

Image

Yuval-Ariel added a commit that referenced this issue May 3, 2023
as part of - Speedb's Paired Block Bloom (#29)
Yuval-Ariel pushed a commit that referenced this issue May 4, 2023
Yuval-Ariel added a commit that referenced this issue May 4, 2023
as part of - Speedb's Paired Block Bloom (#29)
udi-speedb added a commit that referenced this issue Nov 13, 2023
udi-speedb pushed a commit that referenced this issue Nov 14, 2023
as part of - Speedb's Paired Block Bloom (#29)
udi-speedb pushed a commit that referenced this issue Nov 15, 2023
as part of - Speedb's Paired Block Bloom (#29)
udi-speedb added a commit that referenced this issue Dec 3, 2023
udi-speedb pushed a commit that referenced this issue Dec 5, 2023
as part of - Speedb's Paired Block Bloom (#29)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
Status: ✅ Shipped
Development

Successfully merging a pull request may close this issue.

5 participants