Question related to union performance #326
Comments
@Cheappie However, this particular characterization test held the number of sketches to be merged constant (32) and varied the total number of uniques, distributed equally across the 32 sketches. There are numerous other ways to construct a merge speed test. I would suggest constructing a test that most closely resembles how merges are actually performed in your application.

That being said, your particular test includes the time to create the union in addition to the merges, and it executes only two merges. To reduce the influence of the union construction, I would recommend doing a lot more than just 2 update operations. In our back-end systems, a single union might be merging thousands to millions of sketches.

Also, it is not typical for all sketches to have the same cardinality, which we assumed in our test. Your test used two different cardinalities. In the environments we encounter, the sketch cardinalities tend to vary with a power-law distribution, where only a few sketches are very large and millions of sketches have very small cardinalities. Unfortunately, constructing meaningful tests for this kind of distribution is challenging because there are so many variables.

Nonetheless, the reason this is interesting is that the Theta union operation has an "early-stop" feature that should speed up the unioning process considerably as the number of sketches in the merge loop gets large. This might change the ordering of the performance ranking. Constructing such a test is something we have been wanting to do but haven't yet gotten around to. :)
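To make the "early-stop" idea concrete, here is a minimal toy sketch in Python. It is not the DataSketches implementation or its API: `hash64`, `make_toy_sketch`, `ToyUnion`, and `run_demo` are hypothetical names invented for illustration, and the sketch is a simple KMV-style structure that keeps the k smallest hashes in ascending order. The assumption it illustrates is that, once the union's theta is small, merging a sorted input can stop at the first hash at or above theta, so most entries of the many small sketches are never inserted.

```python
import hashlib

MAX_HASH = 1 << 64  # hashes are treated as fractions of this range


def hash64(item):
    """Deterministic 64-bit hash (toy stand-in, not the library's hash function)."""
    digest = hashlib.blake2b(str(item).encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def make_toy_sketch(items, k):
    """Return (theta, ascending list of retained hashes) for a toy KMV-style sketch."""
    hashes = sorted({hash64(x) for x in items})
    if len(hashes) <= k:
        return MAX_HASH, hashes      # exact mode: every hash is retained
    return hashes[k], hashes[:k]     # theta is the (k+1)-th smallest hash


class ToyUnion:
    """Hypothetical union that merges sorted toy sketches with an early stop."""

    def __init__(self, k):
        self.k = k
        self.theta = MAX_HASH
        self.hashes = set()
        self.visited = 0   # input entries inserted into the union
        self.skipped = 0   # input entries left out because the loop stopped early

    def update(self, sketch):
        theta_in, hashes_in = sketch
        if theta_in < self.theta:
            # Adopt the smaller theta and drop entries that no longer qualify.
            self.theta = theta_in
            self.hashes = {h for h in self.hashes if h < self.theta}
        for i, h in enumerate(hashes_in):
            if h >= self.theta:
                # Input is sorted, so every remaining hash is also >= theta.
                self.skipped += len(hashes_in) - i
                break
            self.visited += 1
            self.hashes.add(h)
        if len(self.hashes) > self.k:
            # Trim back to the k smallest hashes and lower theta accordingly.
            ordered = sorted(self.hashes)
            self.theta = ordered[self.k]
            self.hashes = set(ordered[: self.k])

    def estimate(self):
        return len(self.hashes) * MAX_HASH / self.theta


def run_demo(k=256, big=20_000, n_small=200, small=50):
    """One large sketch followed by many small ones, all with disjoint items."""
    union = ToyUnion(k)
    union.update(make_toy_sketch(range(big), k))
    for s in range(n_small):
        start = big + s * small
        union.update(make_toy_sketch(range(start, start + small), k))
    return union.estimate(), union.visited, union.skipped
```

Calling `run_demo()` returns the estimate together with the `visited` and `skipped` counters, which gives a rough feel for how much work the early stop avoids under a skewed size distribution. A real benchmark should use the actual library and a sketch-size distribution taken from the application, as suggested above.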
@leerho
Hi, I just wanted to ask whether CpcSketch is the fastest sketch for performing unions. Also, is there any way to speed up unions other than decreasing the logK parameter?