Skip to content

A binary based, very space efficient Bloom filter implementation for PHP.

Notifications You must be signed in to change notification settings

deceze/BloomFilter

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

5 Commits
 
 
 
 

Repository files navigation

Space efficient binary Bloom filter

A really space efficient Bloom filter implementation. The bit field is stored in a string and operations are done bitwise. This has a slight performance penalty when compared to other array-based implementations, but saves a lot of space.

Initialization (adding values to the set) takes roughly twice as long as array based BloomFilter implementations, but lookup times are virtually identical. Memory usage differs by several orders of magnitude, peak memory usage differs by about one order of magnitude. The serialized size differs by several orders of magnitude.

When serializing an instance using serialize(), the resulting string will contain non-printable characters! The serialized representation must be handled using binary safe functions. The serialized size is roughly $m/8 bytes.

An alternative 7-bit ASCII-safe representation can be obtained by casting the object to a string and unserialized using BloomFilter::unserializeFromStringRepresentation. This ASCII-safe representation takes about 33% more space than the straight serialized version.

Usage

$bf = new BloomFilter(40000, 14);
$bf->add('foo');
$bf->add('bar');
$bf->add('baz');

if ($bf->maybeInSet('foo')) …

Instead of guesstimating the second constructor argument (k, number of hash functions) manually, use BloomFilter::constructForTypicalSize by specifying a bit field size and the expected number of values:

// a 5 KiB instance for ~2000 values
$bf = BloomFilter::constructForTypicalSize(40000, 2000);

The main advantage of this BloomFilter class is that it can be efficiently serialized, for example to be used for pre-computed sets. Two options for serialization:

$serialized = serialize($bf);
$bf = unserialize($serialized);

Using the standard PHP serialization facilities will result in a non-ASCII-safe binary representation which can quickly be dumped to and read from a file, for example.

If a 7-bit ASCII-safe representation is necessary, cast the object to a string and unserialize it using BloomFilter::unserializeFromStringRepresentation:

$serialized = (string)$bf;
$bf = BloomFilter::unserializeFromStringRepresentation($serialized);

Requirements

  • PHP 5.3+

No special extensions necessary.

Licence

Provided as is without any guarantees, do with it whatever you like.

About

A binary based, very space efficient Bloom filter implementation for PHP.

Resources

Stars

Watchers

Forks

Releases

No releases published

Packages

No packages published

Languages