-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: add cuckoofilter
to nippy-jar
#4533
Conversation
impl PartialEq for Cuckoo { | ||
fn eq(&self, _other: &Self) -> bool { | ||
self.remaining == _other.remaining && | ||
{ | ||
#[cfg(not(test))] | ||
{ | ||
unimplemented!("No way to figure it out without exporting (expensive), so only allow direct comparison on a test") | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just wondering if this is something where we could put the PartialEq impl to be just the remaining check, and do a conditional compilation of Eq? maybe not
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hm, i feel like that's unsafe behaviour..
#[derive(Debug, Serialize, Deserialize, PartialEq)] | ||
pub enum Filters { | ||
Cuckoo(Cuckoo), | ||
// Avoids irrefutable let errors. Remove this after adding another one. | ||
Unused, | ||
} | ||
|
||
impl Filter for Filters { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what's the goal of this given we have the trait? do you think we'll use cuckoo somewhere and bloom or no filter elsewhere?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
might add bloom later on to benchmark against
vec![3, 17], | ||
vec![3, 18], | ||
]; | ||
let (col1, col2) = test_data(); | ||
let num_rows = col1.len() as u64; | ||
let _num_columns = 2; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit
let _num_columns = 2; | |
let num_columns = 2; |
phf: bool, | ||
} | ||
|
||
impl NippyJar { | ||
/// Creates new [`NippyJar`]. | ||
pub fn new(columns: usize, bloom_filter: bool, phf: bool, path: &Path) -> Self { | ||
pub fn new(columns: usize, phf: bool, path: &Path) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit would make this with_phf(bool)
if we expect it to be false most of the time
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
overall (parent PR included) it's still a draft/wip, it will be removed eventually and replaced with the with_phf(....)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm
@@ -207,6 +227,14 @@ pub enum NippyJarError { | |||
ColumnLenMismatch(usize, usize), | |||
#[error("UnexpectedMissingValue row: {0} col:{1}")] | |||
UnexpectedMissingValue(u64, u64), | |||
#[error("err")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we want to do #[error(transparent)]
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, one nit
PR into #4512
Adds cuckoofilter. There's some docs in the link on why it might be better than bloom filter.
This will be used to try and check what transaction or block hashes are included in the snapshot file (there is a 3% false positive rate), without it being necessarily saved in the data.