Simple alternate algorithm using maxima, ranks and fixed cumulative weighting #71
Comments
I haven't looked at this enough yet, but here are some thoughts:
Thanks for your time and the pointers, since I am not familiar with the literature.

min/max vs centroids
I don't follow your point, but in any case the min values are not used and the max values are interpolated. I think this makes it much closer to t-Digest than to Greenwald-Khanna, since it only uses a single interpolated value. It interpolates the rank instead of the value. I also have a hunch that the data is better used by increasing the number of buckets and the overall resolution instead of keeping min values.

weighting
The relative target weighting of the buckets is fixed at the outset. Any distribution can be used and this is the current implementation: Assuming that the bucket error is proportional to the bucket weight and that we want every bucket to have a similar relative error:
This relative error approach is actually much closer to t-Digest than to Greenwald-Khanna, where the latter uses absolute error instead.

insertion
Since the relative target bucket weights are fixed at the outset, there is relatively little cost to merging values and no buffer is required. Although the target weights need to be stored, they can be shared across many instances of the same size.
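To make the idea of fixed, shared target weights concrete, here is a minimal sketch. The function name `targetWeights`, the cache, and the sine-shaped curve are all assumptions for illustration: the curve is borrowed from the inverse of t-Digest's arcsine scale function and is not necessarily the weighting used in the actual project.

```javascript
// Hypothetical sketch: cumulative target weights for a fixed number of buckets.
// ASSUMPTION: the sine curve mirrors t-Digest's scale function; the real
// project may use a different distribution.
const tableCache = new Map()

function targetWeights(size) {
  let table = tableCache.get(size)
  if (!table) {
    table = new Float64Array(size)
    for (let i = 0; i < size; ++i) {
      // k runs over (-PI/2, PI/2], so buckets are narrow near both tails
      const k = Math.PI * ((i + 1) / size - 0.5)
      table[i] = (1 + Math.sin(k)) / 2
    }
    table[size - 1] = 1 // guard against rounding on the last entry
    tableCache.set(size, table) // computed once, shared by all instances of this size
  }
  return table
}
```

Because the table depends only on the bucket count, every instance of the same size can hold a reference to the same array, and the weight of bucket `i` is simply `table[i] - table[i - 1]`.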
I guess I could take some of the unit-test cases and check where/when the error profile passes or fails.
I'm closing this since it is not an issue and was simply opened to share ideas.
One last comment is that inverting the axes might be a very clever thought. The point is that a rational approximation for sin(k) is super easy while
On Tue, Sep 20, 2016 at 1:45 AM, hugo notifications@github.com wrote:
Went down a rabbit hole and came out with this. It is a small part of a bigger project in JavaScript, but some ideas might be of use to others. Sorry, no Java.
In short, it flips around the value-rank compression approach. Instead of having approximated weighted-value centroids for a given rank, this algorithm uses actual values as-is but with approximated ranks.
maximum value and rank instead of weighted value centroids:
F(x) = P(X <= x)
fixed size, simple array, no tree structure,
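As a rough illustration of the value-rank idea above, the sketch below estimates F(x) from two plain arrays by interpolating the rank between stored values. The names `cdf`, `values` and `ranks` are hypothetical, and the layout (ascending stored sample values paired with approximate cumulative counts) is an assumption, not the project's actual data structure.

```javascript
// Hypothetical sketch: estimate F(x) = P(X <= x) by interpolating ranks.
// ASSUMPTION: `values` holds actual sample values in strictly ascending order
// and `ranks[i]` is the approximate count of samples <= values[i].
function cdf(values, ranks, x) {
  const n = values.length
  const total = ranks[n - 1]
  if (x < values[0]) return 0
  if (x >= values[n - 1]) return 1
  // binary search for the first stored value strictly greater than x
  let lo = 0
  let hi = n - 1
  while (lo < hi) {
    const mid = (lo + hi) >>> 1
    if (values[mid] > x) hi = mid
    else lo = mid + 1
  }
  // now values[lo - 1] <= x < values[lo]: interpolate the rank, not the value
  const v0 = values[lo - 1]
  const v1 = values[lo]
  const r0 = ranks[lo - 1]
  const r1 = ranks[lo]
  const rank = r0 + (r1 - r0) * (x - v0) / (v1 - v0)
  return rank / total
}
```

For example, with `values = [1, 2, 4, 8]` and `ranks = [1, 3, 6, 10]`, a query at `x = 3` falls halfway between the stored values 2 and 4, so the interpolated rank is 4.5 out of 10.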
There are already a number of tests and benchmarks, but it is still a proof of concept at this time since it is fairly new. I can elaborate more if there is any interest (unless it already exists somewhere?).