Inverse quantile algorithm is non-contiguous #72
I think that this algorithm doesn't account for situations with small […] Also, there are two approaches for thinking about quantiles in a q-digest […] The second approach would be to consider the samples to be uniform between […] Which approach is better is not clear to me.
On Mon, Sep 19, 2016 at 5:29 PM, Alexander Sedov notifications@github.com
That’s all fine and dandy, but your formulae don’t reflect your approach of it being uniform between centroids, as you still center it on the centroid.
On Tue, Sep 20, 2016 at 1:56 PM, Alexander Sedov notifications@github.com
Non-monotonic behavior is not the intent. I need to check that.
Alex, the latest implementation is considerably better behaved. I am working towards a release soon and will include a test for your pathology. The new quantile/cdf algorithm uses the following diagram as an intuitive basis: the solid red line represents our desired result. At the extremes, we interpolate between the first (or last) centroid and the recorded min and max values ever seen. Each centroid is assumed to be collocated with the median of the original data for that centroid, and the data is assumed to be uniformly distributed between centroids. By construction, this form should give monotonic functions for both x -> quantile and quantile -> x.
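A minimal Python sketch of the interpolation scheme described above, assuming the digest is given as parallel lists of centroid means (sorted ascending) and positive weights, plus the recorded min and max. The function names `quantile` and `cdf` are hypothetical and are not the t-digest library's API; this is an illustration of the idea, not the Java implementation. Each centroid is placed at the middle of its own rank interval (the "centroid is the median of its data" assumption), min and max are pinned to ranks 0 and N, and values are interpolated linearly in between.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


def _knots(means: Sequence[float], weights: Sequence[float],
           lo: float, hi: float) -> Tuple[List[float], List[float]]:
    """Build (rank, value) knots: min at rank 0, each centroid at the
    middle of its rank interval, max at rank N (total weight)."""
    if not means or len(means) != len(weights):
        raise ValueError("need at least one centroid and one weight per centroid")
    ranks, values = [0.0], [float(lo)]
    cum = 0.0
    for m, w in zip(means, weights):
        ranks.append(cum + w / 2.0)   # centroid treated as the median of its points
        values.append(float(m))
        cum += w
    ranks.append(cum)                 # cum == total weight N
    values.append(float(hi))
    return ranks, values


def _interp(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear interpolation over non-decreasing knots xs."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1       # xs[i] <= x < xs[i + 1], so xs[i + 1] > xs[i]
    t = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + t * (ys[i + 1] - ys[i])


def quantile(q: float, means, weights, lo: float, hi: float) -> float:
    """Hypothetical quantile -> x: exact min at q = 0 and exact max at q = 1."""
    if not 0.0 <= q <= 1.0:
        raise ValueError("q must be in [0, 1]")
    ranks, values = _knots(means, weights, lo, hi)
    return _interp(q * ranks[-1], ranks, values)


def cdf(x: float, means, weights, lo: float, hi: float) -> float:
    """Hypothetical x -> quantile: interpolates over the same knots in reverse."""
    ranks, values = _knots(means, weights, lo, hi)
    return _interp(x, values, ranks) / ranks[-1]
```

Because both directions interpolate over the same monotone knots, each is non-decreasing, the quantile ranges covered as q increases are contiguous, and the two functions invert each other wherever the means are strictly increasing. For example, with means `[1, 2, 3]`, weights `[2, 2, 2]`, min 0 and max 4, `quantile(0.25, ...)` gives 1.25 and `cdf(1.25, ...)` gives 0.25.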
Not only does it give strange results at the ends (and indeed, the Python implementation fixes that; sorry, I'm not fluent in Java), but the ranges covered as q goes up are also non-contiguous.
I propose the following changes:
This has the property of being contiguous and returning precise values for $q = 0$ and $q = 1$.