Skip to content
This repository has been archived by the owner on Jul 7, 2020. It is now read-only.

BloomFilter doesn't work when amount of bits > 2^32 #163

Open
ajtkulov opened this issue May 7, 2020 · 4 comments
Open

BloomFilter doesn't work when amount of bits > 2^32 #163

ajtkulov opened this issue May 7, 2020 · 4 comments

Comments

@ajtkulov
Copy link

ajtkulov commented May 7, 2020

In my live case, we need BloomFilter for a bigger amount (about 4-5Gb ram, >32B bits)

The code is dependent on java.util.BitSet with .ctor

public BitSet(int nbits) with a lot of getter/setter by int index.

@abramsm
Copy link
Contributor

abramsm commented May 8, 2020

Can you partition your data in some way so that you can than use a list of n smaller BloomFilters to hold each chunk of data?

@ajtkulov
Copy link
Author

ajtkulov commented May 8, 2020

it would work, but as a drawback, you have to query each chunk for contains method. So, complexity increases by N (chunk amount). Add method also becomes more sophisticated.

Actually, I have a "solution" (I'm not a java native, so excuse me in advance). There is a fork, there is a lot of copy-paste code: own BitSet with long type and BloomFilter hierarchy.
https://github.com/ajtkulov/stream-lib
It works, but I'm not sure that this is idiomatic for java and enough for PR.

@vitusya
Copy link

vitusya commented May 9, 2020

why complexity increase by N chunks? you need partition your data by hash, index_of_bloom = long_hash % number_of_chunks, complexity is same, memory complexity is num_chunks * bloomfilter_memory

@ajtkulov
Copy link
Author

ajtkulov commented May 9, 2020

@vitusya agree, thank you, it's much much easier.

The only theoretical issue is that (in case you build Bloom from dynamic data) some malicious person could generate items related to the same bucket. But it's related to all hashes structures, and it's a theory, not a practical thing.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants