Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: add cuckoofilter to nippy-jar #4533

Merged
merged 9 commits into from
Sep 11, 2023
Merged

Conversation

joshieDo
Copy link
Collaborator

@joshieDo joshieDo commented Sep 8, 2023

PR into #4512

Adds cuckoofilter. There's some docs in the link on why it might be better than bloom filter.

This will be used to try and check what transaction or block hashes are included in the snapshot file (there is a 3% false positive rate), without it being necessarily saved in the data.

@joshieDo joshieDo added C-enhancement New feature or request A-static-files Related to static files labels Sep 8, 2023
@joshieDo joshieDo requested a review from rakita as a code owner September 8, 2023 14:32
@joshieDo joshieDo mentioned this pull request Sep 8, 2023
Comment on lines +51 to +58
impl PartialEq for Cuckoo {
fn eq(&self, _other: &Self) -> bool {
self.remaining == _other.remaining &&
{
#[cfg(not(test))]
{
unimplemented!("No way to figure it out without exporting (expensive), so only allow direct comparison on a test")
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

just wondering if this is something where we could put the PartialEq impl to be just the remaining check, and do a conditional compilation of Eq? maybe not

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hm, i feel like that's unsafe behaviour..

crates/storage/nippy-jar/src/filter/cuckoo.rs Outdated Show resolved Hide resolved
Comment on lines +15 to +22
#[derive(Debug, Serialize, Deserialize, PartialEq)]
pub enum Filters {
Cuckoo(Cuckoo),
// Avoids irrefutable let errors. Remove this after adding another one.
Unused,
}

impl Filter for Filters {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what's the goal of this given we have the trait? do you think we'll use cuckoo somewhere and bloom or no filter elsewhere?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

might add bloom later on to benchmark against

vec![3, 17],
vec![3, 18],
];
let (col1, col2) = test_data();
let num_rows = col1.len() as u64;
let _num_columns = 2;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit

Suggested change
let _num_columns = 2;
let num_columns = 2;

phf: bool,
}

impl NippyJar {
/// Creates new [`NippyJar`].
pub fn new(columns: usize, bloom_filter: bool, phf: bool, path: &Path) -> Self {
pub fn new(columns: usize, phf: bool, path: &Path) -> Self {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit would make this with_phf(bool) if we expect it to be false most of the time

Copy link
Collaborator Author

@joshieDo joshieDo Sep 8, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

overall (parent PR included) it's still a draft/wip, it will be removed eventually and replaced with the with_phf(....)

Copy link
Member

@rkrasiuk rkrasiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm

crates/storage/nippy-jar/src/filter/cuckoo.rs Outdated Show resolved Hide resolved
@@ -207,6 +227,14 @@ pub enum NippyJarError {
ColumnLenMismatch(usize, usize),
#[error("UnexpectedMissingValue row: {0} col:{1}")]
UnexpectedMissingValue(u64, u64),
#[error("err")]
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we want to do #[error(transparent)] here?

@joshieDo joshieDo merged commit c8f6307 into joshie/nippy-jar Sep 11, 2023
20 checks passed
@joshieDo joshieDo deleted the joshie/nippy-jar-bf branch September 11, 2023 11:30
Copy link
Collaborator

@shekhirin shekhirin left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, one nit

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-static-files Related to static files C-enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants