Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PFADD, PFCOUNT, PFMERGE commands #211

Merged
merged 10 commits into from
Jun 16, 2021
Merged

Conversation

ilbaktin
Copy link
Contributor

Hi there! We're using your awesome miniredis library for redis-related unit tests in the company I'm working at. Once we started using HLL-related commands of redis, we also definitely wanted to cover the corresponding logic with unit tests but unfortunately found out that HLL-related commands are not supported in miniredis.

I'm adding HLL-related commands to miniredis with this PR using one quite popular implementation of HLL in golang https://github.com/axiomhq/hyperloglog but not reimplementing the actual C++ implementation of HLL https://download.redis.io/redis-stable/src/hyperloglog.c. Why I did this choice - in fact the miniredis is mainly used for testing purposes and hence it's not necessary to have the 100% compatibility of all the internal details, it's enough to have the compatibility of the protocol and the functionality. Basically the implementation of HLL used in this PR behaves almost the same as redis' variant of HLL. One potential difference I could think of is that the set cardinality estimation accuracy could slightly differ on big sets (containing over 1M of elements), but anyway I'm not sure that somebody plans to have sets of over 1M cardinality in unit tests. Please let me know what you think about it.

Also one arguable question - not sure if it's OK for you that I'm adding a new dependency. I could have added the implementation of HLL explicitly as it's been done for geohash library.

Thanks, please let's discuss all the things. If you have any concern about what might be improved, please let me know.

@alicebob
Copy link
Owner

alicebob commented May 31, 2021 via email

@alicebob
Copy link
Owner

alicebob commented Jun 2, 2021

Thanks, I had a look and the code looks nice and simple. I never used the PF* redis commands, but they are one of the few which do not have a ton of options and arguments and special cases :)

slightly differ on big sets

That's absolutely totally fine :)

About the included hyperloglog library: it's MIT licensed, so how about we add the relevant files to a subdir in miniredis? (hyperloglog.go, registers.go, LICENSE ?). It's straightforward code which I don't expect to change a lot, and saves a dependency.

I assume we can't avoid the murmur dependency. What would happen if we use another hashing function?

Can you add some tests to the ./integration/ subdir? If the numbers don't match up 100% you can use c.DoLoosly() which still compares structures (or maybe c.DoRounded()). I'm happy to help there if you want.

@ilbaktin
Copy link
Contributor Author

ilbaktin commented Jun 3, 2021

Thanks for looking into the PR!

No problem, I can add relevant files from hyperloglog lib to miniredis subdir.

Regarding a hash function - before using Redis HLL commands we tried using some golang HLL lib (several options) and store these HLLs in a database. Trying golang HLL libs I found out that the most of them have better accuracy if I use murmur as a hash function. But again in tests we don't need better accuracy (and moreover HLL lib that I used in this PR has a default hash func - metro hash which is still not bad) so we can get rid of murmur I think.

And regarding integration tests - no problem, I will take a look and try to handle.

Will handle these changes when I have a chance, presumably on the weekend.

@alicebob
Copy link
Owner

alicebob commented Jun 3, 2021 via email

@ilbaktin
Copy link
Contributor Author

ilbaktin commented Jun 3, 2021

Update: metro lib is a dependency of hyperloglog lib, but metro lib is under MIT license, so we probably can take the necessary hash function right from metro lib to miniredis.

@ilbaktin
Copy link
Contributor Author

@alicebob Hi again!

Added a few changes as discussed before:

  1. Got rid of hyperloglog and murmur dependencies. Instead I added hyperloglog's and metro's hash source code directly into miniredis repo since they're under MIT license.
  2. Added integration tests for HLL related commands (and found a small bug in response message meanwhile). I handled unstable approximate nature of HLLs in integration tests with a new function DoApprox() comparing ints with a threshold.

Again, let me know what you think!

@@ -481,6 +482,48 @@ func (c *client) DoRounded(rounded int, cmd string, args ...string) {
}
}

// result must match, with floats rounded
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a bit outdated :)

@alicebob
Copy link
Owner

Looks great! I'll have a more proper look in a few days, but I don't expect any issues. Really nice it worked out without adding dependencies.

@alicebob alicebob merged commit 1cd723a into alicebob:master Jun 16, 2021
@alicebob
Copy link
Owner

Thanks for the very easy PR :)

I merged it to master, will make it a proper release in a few days or so.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants