Fast Java cuckoo filter implementation that utilizes sun.misc.Unsafe for native memory access.
Probabilistic Filters By Example
<dependency>
<groupId>net.cinnom</groupId>
<artifactId>nano-cuckoo</artifactId>
<version>3.0.2</version>
</dependency>
import net.cinnom.nanocuckoo.NanoCuckooFilter;
public class CuckooTest {
public void generalUsageTest() {
long capacity = 32;
// Use Builder to create a NanoCuckooFilter. Only required parameter is capacity.
final NanoCuckooFilter cuckooFilter = new NanoCuckooFilter.Builder( capacity )
.withCountingEnabled( true ) // Enable counting
.build();
String testValue = "test value";
// Insert a value into the filter
cuckooFilter.insert( testValue );
// Check that the value is in the filter
boolean isValueInFilter = cuckooFilter.contains( testValue ); // Returns true
// Check that some other value is in the filter
isValueInFilter = cuckooFilter.contains( "some other value" ); // Should return false, probably
// Insert the same value a couple more times
cuckooFilter.insert( testValue );
cuckooFilter.insert( testValue );
// Get a count of how many times the value is in the filter
int insertedCount = cuckooFilter.count( testValue ); // Returns 3 since we inserted three times with counting enabled
// Delete value from the filter once
boolean wasDeleted = cuckooFilter.delete( testValue ); // Returns true since a value was deleted
// Try to delete the value up to six more times
int deletedCount = cuckooFilter.delete( testValue, 6 ); // Returns 2 since only two copies of the value were left
isValueInFilter = cuckooFilter.contains( testValue ); // Returns false since all copies of the value were deleted
// Double filter capacity by doubling entries per bucket. However, this also roughly doubles max FPP.
cuckooFilter.expand();
// Close the filter when finished with it. This is optional to immediately free memory used by the filter.
// Otherwise, the memory will be freed when the GC gets around to it.
cuckooFilter.close();
}
}
Standard Java serialization is supported. Any custom implementations for StringEncoder, BucketHasher, FingerprintHasher, or RandomInt should be serializable, or filter serialization will fail.
In addition, readMemory and writeMemory can be used to directly dump and replace filter memory. The source and destination filters for readMemory and writeMemory should be built with the same parameters.
See NanoCuckooFilter.Builder documentation for various settings.
Currently, only String, byte[], and long (pre-hashed value) types can be inserted into the filter. Generic type support may be added later, but I generally recommend serializing data yourself for optimal performance.
Multithreaded insert/delete/contains/count are supported. Bucket expansion and serialization (including readMemory and writeMemory) will lock all buckets until they are finished.