Paired Block Bloom Filter Algorithm #29
Comments
Specific tests:
Action for now: create the baseline on main for tests 1, 2, and 4.
Additional tests
The flag that sets the filter type in db_bench is filter_uri.
@erez-speedb I have pushed the branch, rebased on the latest main.
./db_bench --compression_type=None -db=/data/ -num=80000000 -value_size=1000 -key_size=16 --delayed_write_rate=536870912 -report_interval_seconds=1 -max_write_buffe
failure creating filter policy[spdb.PairedBloomFilter::23.4]: Not implemented: Could not load FilterPolicy: spdb.PairedBloomFilter::23.4
@erez-speedb - Sorry, my mistake in the example. There should be a single ':' rather than '::', i.e. spdb.PairedBloomFilter:23.4
Rerunning tests
One thing that needs attention:
Blocked by #101.
Didn't show an improvement with #101, so we need to define a good test to show the value of the feature.
QA passed on 4cf14cb |
as part of - Speedb's Paired Block Bloom (#29)
Why:
Reduce the false-positive rate (FPR) while using the same amount of memory.
What:
Develop a filter that is fast and light on CPU, yet offers a better trade-off between memory footprint and FPR.
Technical detail:
A traditional Bloom filter trades memory usage against performance. RocksDB's blocked Bloom filter is faster to query but consumes extra memory.
The Ribbon filter, on the other hand, takes ~30% less memory but is much slower than the Bloom filter (by a factor of 4).
The idea is to reduce the Bloom filter's memory consumption while keeping it highly performant.
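To illustrate why a blocked Bloom filter is fast, here is a minimal sketch in which every key touches exactly one 64-byte block, so a query costs at most one cache miss. This is not Speedb's paired-block algorithm and not RocksDB's implementation; the class name `BlockedBloomSketch`, the parameters `num_blocks` and `num_probes`, and the choice of a splitmix64-style mixer are all assumptions made for the example.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

// Illustrative blocked Bloom filter (hypothetical, for explanation only).
// All probes for a key land in a single 512-bit block.
class BlockedBloomSketch {
 public:
  BlockedBloomSketch(std::size_t num_blocks, int num_probes)
      : blocks_(num_blocks), num_probes_(num_probes) {
    assert(num_blocks > 0 && num_probes > 0);
  }

  void Add(const std::string& key) {
    std::uint64_t h = Mix(std::hash<std::string>{}(key));
    Block& block = blocks_[h % blocks_.size()];
    for (int i = 0; i < num_probes_; ++i) {
      h = Mix(h);                              // derive the next probe
      const std::uint32_t bit = h >> 55;       // top 9 bits: 0..511
      block[bit >> 6] |= std::uint64_t{1} << (bit & 63);
    }
  }

  // Returns false only if the key was definitely never added.
  bool MayContain(const std::string& key) const {
    std::uint64_t h = Mix(std::hash<std::string>{}(key));
    const Block& block = blocks_[h % blocks_.size()];
    for (int i = 0; i < num_probes_; ++i) {
      h = Mix(h);
      const std::uint32_t bit = h >> 55;
      if ((block[bit >> 6] & (std::uint64_t{1} << (bit & 63))) == 0) {
        return false;
      }
    }
    return true;
  }

 private:
  // 8 x 64 bits = 512 bits; a production filter would also align each
  // block to a cache-line boundary.
  using Block = std::array<std::uint64_t, 8>;

  // splitmix64-style finalizer, used here only to spread hash bits.
  static std::uint64_t Mix(std::uint64_t x) {
    x += 0x9e3779b97f4a7c15ULL;
    x = (x ^ (x >> 30)) * 0xbf58476d1ce4e5b9ULL;
    x = (x ^ (x >> 27)) * 0x94d049bb133111ebULL;
    return x ^ (x >> 31);
  }

  std::vector<Block> blocks_;  // value-initialized, so all bits start at 0
  int num_probes_;
};
```

Confining all probes to one block is what makes the query cheap, but it also means keys are spread unevenly across blocks, which is one reason a blocked filter needs more bits per key than an unblocked one to reach the same FPR.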
Who:
The proposed filter should be most beneficial when a very small FPR is needed. Typically this happens when the penalty of a false positive is very large compared to the filter test time (for example, when the database is on disk), and when true positives are rare.
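The condition above can be made concrete with a back-of-the-envelope cost model. The sketch below is an assumption for illustration, not a measurement from this issue: `IdealBloomFpr` uses the textbook approximation for an unblocked Bloom filter with an optimal number of hash functions, and `ExpectedNegativeLookupCost` charges the filter probe on every lookup and the storage read only on a false positive. Both function names and any cost values passed in are hypothetical.

```cpp
#include <cassert>
#include <cmath>

// Textbook approximation of the FPR of an ideal (unblocked) Bloom filter
// using the optimal number of hash functions: (1/2)^(bits_per_key * ln 2).
// Blocked filters generally do somewhat worse at the same bits per key.
double IdealBloomFpr(double bits_per_key) {
  assert(bits_per_key > 0.0);
  return std::pow(0.5, bits_per_key * std::log(2.0));
}

// Expected cost of looking up a key that is NOT in the database:
// the filter is always probed, the read happens only on a false positive.
// Costs are in arbitrary but consistent units (hypothetical values).
double ExpectedNegativeLookupCost(double filter_cost, double fpr,
                                  double read_cost) {
  assert(fpr >= 0.0 && fpr <= 1.0);
  return filter_cost + fpr * read_cost;
}
```

When `read_cost` is several orders of magnitude larger than `filter_cost`, the `fpr * read_cost` term dominates, so lowering the FPR at a fixed memory budget pays off more than shaving time off the filter probe itself.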
Integrate a new type of filter policy: Paired Block Bloom Filter