Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bloom crate is buggy and not maintained #1

Open
mfil opened this issue Jul 24, 2019 · 6 comments
Open

Bloom crate is buggy and not maintained #1

mfil opened this issue Jul 24, 2019 · 6 comments
Assignees
Labels
bug Something isn't working

Comments

@mfil
Copy link

mfil commented Jul 24, 2019

The bloom crate has a bug which was reported in 2017. There were no actions in the repo for three years.

Briefly, the bug is as follows: one needs to compute k hash values for an element added to the filter. In order to cut down the number of hash function calls, the crate computes two independent hashes and uses them as a seed for a RNG and then samples k values from it. I think that the basic idea is sound, but they botched the implementation of the RNG. See the next method for HashIter in src/hashing.rs. All RNG outputs after the first are a multiple of h2 modulo 2^64.

I'm sorry to say that I don't know a good alternative for the crate. I've been looking at several other bloom filter implementations in Rust for work and none of them inspired much confidence, so we ultimately decided to roll our own.

@david415
Copy link
Member

Thanks for the info @mfil Did you open source your company's rust bloom filter? The other possibility is to use a cuckoo filter even though I do not require deletions.

@david415 david415 self-assigned this Jul 25, 2019
@david415 david415 added the bug Something isn't working label Jul 25, 2019
@mfil
Copy link
Author

mfil commented Jul 25, 2019

We're just getting started on writing our implementation. I hope that our employer will let us open-source it when it's done but no promises.

I read a little bit about cuckoo filters and they claim better performance than bloom filters. At my job, we specifically need bloom filters for backwards compatibility, but if you have no such restrictions, cuckoo filters seem like a good choice even if you don't need deletion.

@david415
Copy link
Member

@burdges so there is a reason to use a cuckoo filter. @mfil company's who open source software after it is written do not understand open source. :( thanks for the advice. i haven't actually tried to write a mixnet in rust yet so this library isn't used given that our golang implementation works fine: https://github.com/katzenpost/server

@burdges
Copy link

burdges commented Jul 25, 2019

I believe cuckoo tables are much more cache efficient than bloom filters, so they often make sense when the table gets large.

And they're even better if you employ the overlap flag trick from 3.5-Way Cuckoo Hashing for the Price of 2-and-a-Bit by Eric Lehman and Rina Panigrahy. See also Load Thresholds for Cuckoo Hashing with Overlapping Blocks by Stefan Walzer.

I suspect cuckoo filters also reduce the "great entry api problem" rust-lang/rfcs#1769 along the lines of the indexing crate.

In essence, a cuckoo filter can likely expose the mapping from keys to some opaque Index<??> type that provides (a) an actual index into the table, (b) the bitmask XORed with the index to find the other index, (c) the overlap flag when known, and (d) any additional key bits. This works because indexes into a cuckoo table remain semi-stable.

I'd maybe want roughly

enum IndexState {
    NotFound,
    FoundFirst,
    FoundNext,
}
pub struct Index {
    /// Index of the bucket where the entry resides or else of either bucket where the entry could be placed.
    index: u32,
    /// The entry's first index xored with its second index
    mask: u32,
    /// Along with mask these bits identify the entry with an error probability at least 2^{-32}
    extra: u32,
    /// If the entry exists at index and if so if it was shifted into the next bucket.
    found: IndexState,
}

except if someone gets two for the same key then found might wind up set wrong. You'd might omit it or make it only a hint, and just check all four buckets each time.

If one resolves the found problem, then it's still an imperfect interface however because rust lacks true dependent types, so options go roughly:

  • use a closure like the scope fn in the indexing crate,
  • check Rc/Arcs at runtime using ptr_eq,
  • introduce some secondary immutable borrow along side the mutable borrow of the table,
  • trust users not to confuse indexes between tables, but maybe provide the entry API inside the index one .

Anyways, I never implemented this due to perfectionism, but actually the final option might give a really beautiful low-level interface.

As for Go, there were PIR projects presented at PETS 2018 that used cuckoo tables and maybe used Go. I know they never knew about the overlapping flag trick, but that's premature optimization really, like everything I wrote above. ;)

@david415
Copy link
Member

my approach is of course to take some existing rust library, probably a cuckoo filter and just use that. the only requirement is that it can be tuned such that the probability of a false positive is negligible for the mix network key rotation time window.... obviously. however at this time there's no particular reason to do this in rust given that the golang implementation works fine. i will however endeavor to write a rust client for the mix network... and in that case a rust sphinx replay cache is not needed. :)

@burdges
Copy link

burdges commented Jul 26, 2019

There are fewer compromises on tuning the false positive probability on a cuckoo filter than on a bloom filter because you can increase the extra bits, which increases size but in a fairly controlled way, while a bloom filter requires either adding hash functions or increasing size.

It's not actually "negligible" you want but just very small. I'd think even 2^{-32} sounds sufficient, but surely 2^{-64} works

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working
Projects
None yet
Development

No branches or pull requests

3 participants