r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108

metzm · 2022-01-19T17:35:30Z

Hyndman and Fan (1996) (https://doi.org/10.2307/2684934) list 9 different algorithms to the corresponding rank of a sorted list for a given quantile. The algorithm used in GRASS is not listed. Therefore I decided to use the algorithm type 7 also used by R and numpy. In R, see ?quantile for a description of the different algorithms.

There was an independent bug in r.quantile and r.stats.quantile: if the corresponding rank belonged to the last entry of a slot, the result was wrong because the first value of the next slot needs to be used to calculate the correct value for the given quantile.

veroandreo · 2022-01-20T14:26:24Z

I tested using the example in r.quantile manual:

# grass8.1dev
r.quantile input=elevation percentiles=0.1,1,10,25,50,75,90,99,99.9
Computing quantiles
0:0.100000:59.870384
1:1.000000:68.065155
2:10.000000:84.523346
3:25.000000:94.789986
4:50.000000:108.879921
5:75.000000:126.791985
6:90.000000:138.660385
7:99.000000:149.824585
8:99.900000:153.375854

# mm-branch
Computing quantiles
0:0.100000:59.870381
1:1.000000:68.065153
2:10.000000:84.523341
3:25.000000:94.789986
4:50.000000:108.879906
5:75.000000:126.791973
6:90.000000:138.660303
7:99.000000:149.824419
8:99.900000:153.375565

Changes appear in the 5th and 6th decimal positions. Is that ok, @metzm ?

After exporting the raster (as we still don't have rgrass for grass8) and testing in R, I get:

quantile(elev, probs=c(0.1,1,10,25,50,75,90,99,99.9)/100, type=7)
     0.1%        1%       10%       25%       50%       75%       90%       99% 
 59.87038  68.06515  84.52334  94.78999 108.87991 126.79197 138.66030 149.82442 
    99.9% 
153.37556

but it's rounded to the 5th decimal by default. In any case, I'm in favor of consistency with other software packages, so +1 for this change.

metzm · 2022-01-20T17:47:30Z

Yes, changes are expected, particularly for low and high quantiles, and particularly for small samples. Importantly, the 9 different algorithms listed in literature and software implementations produce different results in these cases. The 50% percentile should be identical, however. That was a real bug in lib/stats/c_percentile.c.

* adjust test for r.neighbors * sync r.stats quantile to r.quantile

metzm · 2022-01-20T18:38:00Z

A simple example with a list of 10 sorted values and their indices (zero-based ranks):

value:  1  2  3  4  5  6  7  8  9  10
index:  0  1  2  3  4  5  6  7  8   9

The 50% percentile would produce a split between the 5 lowest values 1, 2, 3, 4, 5 and the 5 highest values 6, 7, 8, 9, 10, the correct result is 5 * 0.5 + 6 * 0.5 = 5.5.
The previous code with n * quant, thus the index 10 * 0.5 = 5 would return the value 6 for the 50% percentile.
The new code with quant * (n - 1) correctly uses the index 4.5 to return the value 5.5 for the 50% percentile.

neteler · 2022-01-20T20:40:53Z

One test is failing, fixed with attached diff (I don't know how to commit against your PR?):
test_r_neighbors.zip

Running ./raster/r.neighbors/testsuite/test_r_neighbors.py...
========================================================================
...F...
======================================================================
FAIL: test_standard_options_quantile (__main__.TestNeighbors)
Test output with standard options (size=3, quadratic moving window,
----------------------------------------------------------------------
Traceback (most recent call last):
  File "raster/r.neighbors/testsuite/test_r_neighbors.py", line 480, in test_standard_options_quantile
    self.check_univar(test_case)
  File "raster/r.neighbors/testsuite/test_r_neighbors.py", line 429, in check_univar
    precision=1e-5,
  File "etc/python/grass/gunittest/case.py", line 308, in assertRasterFitsUnivar
    precision=precision,
  File "etc/python/grass/gunittest/case.py", line 282, in assertModuleKeyValue
    self.fail(self._formatMessage(msg, stdMsg))
AssertionError: r.univar map=test_standard_options_quantile_raster_0.03 percentile=90.0 separator== -g difference:
mismatch values (key, reference, actual): [('coeff_var', 18.5370455239192, 18.5383334276472), ('max', 156.208636016846, 156.20787109375), ('mean', 109.552850010781, 109.544974331288), ('mean_of_abs', 109.552850010781, 109.544974331288), ('min', 55.600062789917, 55.5965177536011), ('range', 100.608573226929, 100.611353340149), ('stddev', 20.3078616792495, 20.3078125947647), ('sum', 221844521.271832, 221828573.020859), ('variance', 412.409245983529, 412.407252384084)]
command: r.univar map=test_standard_options_quantile_raster_0.03 percentile=90.0 separator== -g {'map': 'test_standard_options_quantile_raster_0.03', 'separator': '=', 'flags': 'g'}

----------------------------------------------------------------------
Ran 7 tests in 24.097s
FAILED (failures=1)
========================================================================
FAILED ./raster/r.neighbors/testsuite/test_r_neighbors.py (1 test failed)

metzm · 2022-01-20T21:28:05Z

I am busy updating the tests for r.neighbors, WIP.

* update test results

metzm · 2022-01-21T18:45:57Z

TODO: backport c700a8a to G 8.0.1

* use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy * update manuals for `r.quantile`, `r.stats.quantile` * sync `r.stats quantile` to `r.quantile` * update test results for `r.neighbors`

neteler · 2022-01-29T10:15:20Z

TODO: backport c700a8a to G 8.0.1

Backport done.

ecodiv · 2022-04-24T21:43:29Z

@metzm , did this perhaps cause the bug #2259 ?

…o#2108) * use type 7 algorithm of Hyndman and Fan (1996) for quantiles, as is the default in R and numpy * update manuals for `r.quantile`, `r.stats.quantile` * sync `r.stats quantile` to `r.quantile` * update test results for `r.neighbors`

fix quantile calculation, update manuals

6e4acf6

metzm added bug Something isn't working backport_needed C Related code is in C labels Jan 19, 2022

metzm added this to the 8.0.1 milestone Jan 19, 2022

metzm requested review from marisn and veroandreo January 19, 2022 17:35

quantiles

8df3509

* adjust test for r.neighbors * sync r.stats quantile to r.quantile

quantiles: adjust r.neighbors testsuite

c9082e7

quantiles

f6d4fdd

* update test results

metzm merged commit c700a8a into OSGeo:main Jan 21, 2022

metzm deleted the quantiles branch January 21, 2022 18:44

neteler removed the backport_needed label Jan 29, 2022

wenzeslaus mentioned this pull request Apr 25, 2022

[Bug] r.stats.quantile produces segfault #2259

Closed

wenzeslaus mentioned this pull request Jun 1, 2022

modules: Add JSON output for db.univar and v.db.univar #2386

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108

r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108

metzm commented Jan 19, 2022

veroandreo commented Jan 20, 2022

metzm commented Jan 20, 2022

metzm commented Jan 20, 2022

neteler commented Jan 20, 2022

metzm commented Jan 20, 2022

metzm commented Jan 21, 2022

neteler commented Jan 29, 2022

ecodiv commented Apr 24, 2022 •

edited

Loading

r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108

r.quantile/r.stats.quantile/libstats: fix quantile algorithm #2108

Conversation

metzm commented Jan 19, 2022

veroandreo commented Jan 20, 2022

metzm commented Jan 20, 2022

metzm commented Jan 20, 2022

neteler commented Jan 20, 2022

metzm commented Jan 20, 2022

metzm commented Jan 21, 2022

neteler commented Jan 29, 2022

ecodiv commented Apr 24, 2022 • edited Loading

ecodiv commented Apr 24, 2022 •

edited

Loading