r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108
I tested using the example in
Changes appear in the 5th and 6th decimal places. Is that OK, @metzm? After exporting the raster (as we still don't have rgrass for GRASS 8) and testing in R, I get:
but it's rounded to the 5th decimal by default. In any case, I'm in favor of consistency with other software packages, so +1 for this change.
Yes, changes are expected, particularly for low and high quantiles and for small samples. Notably, the 9 different algorithms listed in the literature and in software implementations produce different results in these cases. The 50th percentile should be identical, however. That was a real bug in
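To illustrate how two of the Hyndman and Fan definitions diverge in the tails of a small sample while agreeing at the median, here is a minimal Python sketch of type 6 and type 7 as described in R's `?quantile`. The function names are hypothetical, and clamping out-of-range positions to the first and last values is an assumption modelled on R's behaviour, not code taken from GRASS.

```python
import math

def _interpolate(sorted_vals, h):
    """Linear interpolation at 1-based position h, clamped to [1, n]."""
    n = len(sorted_vals)
    h = min(max(h, 1.0), float(n))  # assumption: clamp like R does
    lo = math.floor(h)              # 1-based lower rank
    frac = h - lo
    if lo >= n:                     # at (or clamped to) the maximum
        return float(sorted_vals[n - 1])
    a, b = sorted_vals[lo - 1], sorted_vals[lo]
    return a + frac * (b - a)

def quantile_type6(sorted_vals, p):
    # Type 6: position h = (n + 1) * p
    return _interpolate(sorted_vals, (len(sorted_vals) + 1) * p)

def quantile_type7(sorted_vals, p):
    # Type 7 (R and numpy default): position h = (n - 1) * p + 1
    return _interpolate(sorted_vals, (len(sorted_vals) - 1) * p + 1)

values = list(range(1, 11))
for p in (0.05, 0.5, 0.95):
    print(p, quantile_type6(values, p), quantile_type7(values, p))
```

On the ten values 1 to 10, both give 5.5 at p = 0.5, but at p = 0.05 type 6 clamps to 1 while type 7 gives 1.45, and at p = 0.95 the results are 10 versus 9.55.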
A simple example with a list of 10 sorted values and their indices (zero-based ranks):
The 50th percentile splits the 5 lowest values 1, 2, 3, 4, 5 from the 5 highest values 6, 7, 8, 9, 10, so the correct result is 5.5.
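A minimal Python sketch of the type 7 rule on these ten values, using zero-based ranks as in the example. The function name is hypothetical; the cross-check uses the standard library's `statistics.quantiles` with `method="inclusive"`, which treats the minimum and maximum as the 0th and 100th percentiles in the same way as type 7.

```python
import statistics

def quantile_type7(sorted_vals, p):
    """Type 7 quantile using a zero-based fractional rank h = (n - 1) * p."""
    h = (len(sorted_vals) - 1) * p
    lo = int(h)                    # floor, since h >= 0
    frac = h - lo
    if lo + 1 >= len(sorted_vals):  # p == 1: the maximum itself
        return float(sorted_vals[-1])
    return sorted_vals[lo] + frac * (sorted_vals[lo + 1] - sorted_vals[lo])

values = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
for idx, v in enumerate(values):   # zero-based rank -> value
    print(idx, v)

# h = 4.5: halfway between rank 4 (value 5) and rank 5 (value 6)
print(quantile_type7(values, 0.5))                              # 5.5
print(statistics.quantiles(values, n=2, method="inclusive"))    # [5.5]
```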
One test is failing; I fixed it with the attached diff (I don't know how to commit against your PR):
I am busy updating the tests for
TODO: backport c700a8a to GRASS 8.0.1
* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy
* update manuals for `r.quantile`, `r.stats.quantile`
* sync `r.stats.quantile` to `r.quantile`
* update test results for `r.neighbors`
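For reference, the type 7 definition as given in Hyndman and Fan (1996) and in R's `?quantile`, written with 1-based order statistics $x_{(1)} \le \dots \le x_{(n)}$ and probability $p$ (with $Q_7(1) = x_{(n)}$):

```math
h = (n - 1)\,p + 1, \qquad
Q_7(p) = x_{(\lfloor h \rfloor)} + \bigl(h - \lfloor h \rfloor\bigr)\bigl(x_{(\lfloor h \rfloor + 1)} - x_{(\lfloor h \rfloor)}\bigr)
```

For the values 1 to 10 and $p = 0.5$: $h = 5.5$, so $Q_7(0.5) = x_{(5)} + 0.5\,(x_{(6)} - x_{(5)}) = 5 + 0.5 = 5.5$.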
Backport done.
Hyndman and Fan (1996) (https://doi.org/10.2307/2684934) list 9 different algorithms for finding the rank in a sorted list that corresponds to a given quantile. The algorithm used in GRASS is not among them. Therefore I decided to use the type 7 algorithm, which is also used by R and numpy. In R, see `?quantile` for a description of the different algorithms.

There was an independent bug in `r.quantile` and `r.stats.quantile`: if the corresponding rank belonged to the last entry of a slot, the result was wrong, because the first value of the next slot is needed to calculate the correct value for the given quantile.
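The slot boundary issue can be modelled with a small Python sketch. The slot layout here (a list of sorted sub-lists covering consecutive, non-overlapping value ranges) and the function names are hypothetical and do not reproduce the C code in `r.quantile`; the point is only that the interpolation partner at rank `lo + 1` must be looked up across slots, so it may be the first entry of the next slot.

```python
import math

def value_at_rank(slots, rank):
    """Return the value at zero-based global `rank` across consecutive sorted slots."""
    for slot in slots:
        if rank < len(slot):
            return slot[rank]
        rank -= len(slot)          # skip this slot (empty slots are skipped too)
    raise IndexError("rank out of range")

def slotted_quantile_type7(slots, p):
    """Type 7 quantile over values stored in hypothetical slots."""
    n = sum(len(s) for s in slots)
    h = (n - 1) * p                # zero-based fractional rank
    lo = math.floor(h)
    frac = h - lo
    lower = value_at_rank(slots, lo)
    if frac == 0 or lo + 1 >= n:
        return float(lower)
    # If rank lo is the last entry of its slot, rank lo + 1 lives in a later slot;
    # looking it up globally (not within the same slot) avoids the boundary error.
    upper = value_at_rank(slots, lo + 1)
    return lower + frac * (upper - lower)
```

With slots `[[1, 2, 3, 4, 5], [6, 7, 8, 9, 10]]`, the median has lower rank 4 (the last entry of the first slot) and upper rank 5 (the first entry of the second slot), which gives 5.5.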