Possible incorrect argument used in test of inc_beta_ddb #2417

Open
ricardoV94 opened this issue Mar 9, 2021 · 32 comments
@ricardoV94

ricardoV94 commented Mar 9, 2021

I think an incorrect argument is being passed to the function calls in inc_beta_ddb_test:

EXPECT_FLOAT_EQ(3.2996082e-05,
                inc_beta_ddb(small_a, small_b, small_z, digamma(small_a),
                             digamma(small_a + small_b)))
    << "reasonable values for a, b, x";

Shouldn't the fourth argument be digamma(small_b) instead of digamma(small_a) (and similarly in the other tests)?

Obviously this does not affect the function itself, but it makes it difficult to check independently what the expected value should be.

Apologies if I misunderstood something :)

@bbbales2
Member

bbbales2 commented Mar 9, 2021

Hmm, I don't really know about this function. I agree it looks strange.

Since you're asking the question I'm hoping maybe you can help me figure it out :D.

Do you know how to go about computing these things in R by any chance?

library(zipfR)

a = 1.5
b = 1.25
z = 0.5

dx = 1e-5

(zipfR::Ibeta(z, a, b + dx) - zipfR::Ibeta(z, a, b - dx)) / (2 * dx)

I thought this should be equivalent to inc_beta_ddb(1.5, 1.25, 0.5, digamma(1.25), digamma(2.75)), but it doesn't appear to be. Would there be different parameterizations of this type of function?

@ricardoV94
Author

ricardoV94 commented Mar 10, 2021

Looking at the source code here, it looks like it should be the digamma of beta:

+= inc_beta_ddb(alpha_dbl, beta_dbl, y_dbl, digamma_beta[n],

I actually compared it with your other function which suggests the same: https://github.com/stan-dev/math/blob/master/stan/math/prim/fun/grad_reg_inc_beta.hpp

I am more used to Python, so I also tested via:

from scipy.special import betainc  # regularized incomplete beta I_z(a, b)

eps = 1e-5
a = 2
b = 1
z = 0.5

# central finite difference of I_z(a, b) with respect to b
grad = (betainc(a, b + eps, z) - betainc(a, b - eps, z)) / (2 * eps)
print(grad)

The gradient from inc_beta_ddb is much closer to this when passing the digamma of beta.

@ricardoV94
Author

ricardoV94 commented Mar 10, 2021

(zipfR::Ibeta(z, a, b + dx) - zipfR::Ibeta(z, a, b - dx)) / (2 * dx)

It should be zipfR::Rbeta (the regularized incomplete beta):

* Returns the partial derivative of the regularized
* incomplete beta function, I_{z}(a, b) with respect to b.
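A minimal scipy sketch of the regularized vs. unregularized distinction (scipy's betainc is the regularized I_z(a, b); multiplying by the complete beta function gives the unregularized form that zipfR::Ibeta computes):

from scipy.special import betainc, beta  # betainc is the *regularized* I_z(a, b)

a, b, z, dx = 1.5, 1.25, 0.5, 1e-5

# finite difference wrt b of the regularized incomplete beta I_z(a, b)
d_reg = (betainc(a, b + dx, z) - betainc(a, b - dx, z)) / (2 * dx)

# finite difference wrt b of the unregularized B_z(a, b) = I_z(a, b) * B(a, b),
# which is what zipfR::Ibeta computes; the two derivatives differ
d_unreg = (betainc(a, b + dx, z) * beta(a, b + dx)
           - betainc(a, b - dx, z) * beta(a, b - dx)) / (2 * dx)

print(d_reg, d_unreg)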

@bbbales2
Member

@ricardoV94 thanks for putting the time into investigating this. I'll have a look at this hopefully later today.

@bbbales2
Member

bbbales2 commented Mar 10, 2021

Hmm, the code here seems to work for me for small values, but for the large-value tests something might be going on. It might be that this function just isn't valid out to extreme parts of parameter space (in which case we won't test it out there), or it could be a bug for large values (in which case we will need to fix this).

I can run this code on Wolfram Cloud and get a different result than what the code gives (but it agrees with the finite differences):

a = 15000
b = 1.25
z = 0.999
((Gamma[b] Gamma[a + b])/Gamma[a]) (1 - z)^b HypergeometricPFQRegularized[ {b, b, 1 - a}, {1 + b, 1 + b}, 1 - z] + BetaRegularized[1 - z, b, a] (PolyGamma[b] - PolyGamma[a + b] - Log[1 - z])

That code comes from: https://functions.wolfram.com/GammaBetaErf/BetaRegularized/20/01/03/0001/

R finite differences give: 2.00849792806346e-06
Wolfram gives: 2.0085e-6
Stan gives: -2.1355524

Will have to continue the investigation later.

R code is:

(zipfR::Rbeta(z, a, b + dx) - zipfR::Rbeta(z, a, b - dx)) / (2 * dx)

Stan code is:

inc_beta_ddb(large_a, small_b, large_z, digamma(small_b),
             digamma(large_a + small_b));
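For an independent check of this large-value case, a high-precision finite difference is also possible, e.g. a sketch with mpmath (mpmath.betainc with regularized=True is I_z(a, b)):

from mpmath import mp, betainc

mp.dps = 50  # 50 significant digits so the finite difference is trustworthy

a, b, z = 15000, mp.mpf("1.25"), mp.mpf("0.999")
db = mp.mpf("1e-10")

# central finite difference of I_z(a, b) with respect to b
ddb = (betainc(a, b + db, 0, z, regularized=True)
       - betainc(a, b - db, 0, z, regularized=True)) / (2 * db)
print(ddb)  # the R and Wolfram references above give roughly 2.0085e-6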

@ricardoV94
Author

ricardoV94 commented Mar 11, 2021

I noticed that when adapting your code to PyMC3. The difference seems to go away if you make the threshold in the Stan functions stricter, from 1e-10 to 1e-15:

double threshold = 1e-10;

And the same line in the dda version.

Here is the python output of the same algorithm with more stringent thresholds:

# threshold = 1e-10
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# -2.1355523032324224

# threshold = 1e-12
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# -2.3924212087640326e-07

# threshold = 1e-15
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# 2.0060173144203154e-06

# threshold = 1e-18
_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
# 2.008495010807732e-06

I assumed that was a speed tradeoff on your side. Or is it a red flag that the original difference was so large?

@ricardoV94
Author

ricardoV94 commented Mar 11, 2021

Something looks a bit funny with the summand variable in that extreme case. It starts by growing from the initial value and only later shrinks. From the "looks" of the algorithm I expected it to be non-increasing:

_betainc_ddb(15000, 1.25, 0.999, digamma(1.25), digamma(15000+1.25))
threshold=1e-12
summand=2.2496250468697938e-11  # initial summand is smaller than old threshold of 1e-10
summand=1.0384442264431505e-10
summand=3.665891374207815e-10
summand=1.047694298123003e-09
summand=2.5153463787056327e-09
summand=5.206333323510543e-09
summand=9.470635850818627e-09
summand=1.536623464951879e-08
summand=2.2501039747638058e-08
summand=3.002188727750973e-08
summand=3.6789065746491285e-08
summand=4.168201149077467e-08
summand=4.391455852728936e-08  # only here does it start decreasing
summand=4.323568264710523e-08
summand=3.995043593027368e-08
summand=3.477714397321581e-08
summand=2.8616825497631537e-08
summand=2.232595530024962e-08
summand=1.655896810584071e-08
summand=1.1704463091376664e-08
summand=7.90182769489177e-09
summand=5.1055153156812025e-09
summand=3.162945689332276e-09
summand=1.8820153177406136e-09
summand=1.0772476439193211e-09
summand=5.940180700712923e-10
summand=3.1598081569836583e-10
summand=1.6234662200064858e-10
summand=8.065943698192561e-11
summand=3.8794608086322824e-11
summand=1.8081595435582655e-11
summand=8.174648510873313e-12
summand=3.5880740066010608e-12
summand=1.5303262874820438e-12
summand=6.347265741755701e-13
Out[2]: -2.3924212087640326e-07  # Output
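A toy illustration of why this matters (not the actual Stan series, just a series whose terms first grow and then decay like the trace above): an absolute cutoff on the current term stops the summation immediately when the very first term is already below the threshold:

def sum_with_cutoff(terms, threshold):
    total = 0.0
    for t in terms:
        total += t
        if abs(t) < threshold:  # absolute cutoff on the current term
            break
    return total

# terms grow from ~1e-11 up to ~5e-6 and then decay again
terms = [1e-11 * 2.0 ** k for k in range(20)] + \
        [1e-11 * 2.0 ** (19 - k) for k in range(1, 40)]

print(sum_with_cutoff(terms, 1e-10))  # stops after the very first term (~1e-11)
print(sum_with_cutoff(terms, 1e-18))  # continues through the peak and sums the full series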

@bbbales2
Member

Oof, yeah, I'm not sure where this power series came from or if someone derived it by hand, but I wasn't able to quickly identify the algorithm or anything here.

I wonder if the convergence criterion could be written in terms of sum_numer / sum_denom to make it more reliable? I'm not sure.

It seems like there's a problem here with the code, but I'm not quite sure what this is affecting yet. I went grepping in the code and I found grad_reg_inc_beta, which from the comments seems like the same function but is computed a different way lol. I'm not sure what's going on. Anyway, at this point this needs a more careful inspection of the code. If you figure out what's going on let me know.

Searching around I found this: https://core.ac.uk/download/pdf/82094101.pdf which might be something more concrete to base an implementation on :P.

@ricardoV94
Author

ricardoV94 commented Mar 11, 2021

grad_reg_inc_beta is also different in that it computes the derivatives of a and b simultaneously.

In any case, the current implementation (_dd*) seems to give the right results for non-extreme values, and also for extreme values when using a smaller threshold. So it's definitely not completely out there. I have no intuition about what could be used as a better convergence criterion, however. I thought of something silly like checking for the threshold but also making sure the summand has decreased since the last iteration.

I found that reference you linked to, but I am not sure how useful it is, since its goal seems to be extending the derivatives to negative values of the incomplete beta. I found this older reference (https://core.ac.uk/download/pdf/6303153.pdf), with a C implementation here (https://github.com/canerturkmen/betaincder/blob/master/betaincder/c/beta.c), more promising as a benchmark.

The current algorithm in Stan gives the right results for the tabulated values in Table 1 of that reference.

Edit: It seems the current functions were added by @betanalpha in here: b465d12 and tweaked later a couple of times (without a substantial change in their logic), possibly because they were deemed more robust than the grad_reg_inc_beta?

@bbbales2
Member

bbbales2 commented Mar 11, 2021

I found this older reference

Woops, yeah, that's the one I meant to link. Thanks for finding that C implementation.

seems to give the right results for non extreme values and also for extreme values when using a smaller threshold. So it's definitely not completely out there

Yeah I'm just not sure how to think about this. It seems like this function gets used in a lot of places, so I'd need to look at those places to get a sense of how important the large values are.

possibly because they were deemed more robust

Oh oh nice, so maybe we should be replacing grad_reg_inc_beta with this. It looks like the negative binomial cdfs and lcdfs are doing different things, which doesn't seem great. (Edit: and it does seem like inc_beta_dd* replaced grad_reg_inc_beta)

@ricardoV94
Author

Hi @bbbales2

I am just wondering if you have had any insights about the "issues" above in the meantime. I still couldn't quite figure out the source of the algorithm, but I am happy to help if you have any ideas on how to proceed :)

@bbbales2
Member

@ricardoV94 thanks for the ping. I don't have any particular insights :P.

From what you've dug up, I'm thinking:

  1. If there's no simple way to make these functions converge for the test cases, make the test cases easier and fix the bugs in them
  2. grad_reg_inc_beta should probably be replaced everywhere in our code with these functions if it was worth doing in negative binomial
  3. It would be nice to pin down the algorithm so that future people digging in the code know where to start looking but if it's not easy, as long as the results match the Mathematica reference in a useful range of values I'm happy :D.

but I am happy to help

If that's an offer to make a PR, by all means :D.

1 definitely needs to be done and should be pretty straightforward.

2 would be more work -- it looks like this grad_reg_inc_beta gets used a lot -- would need to check the new algorithm works numerically everywhere and also doesn't cause big slowdowns.

@sakrejda
Contributor

A couple of comments:

  1. the function states that it's only tested up to a/b = 12500, so 15000 is beyond that range (esp. with the large z). It's not surprising you'd see lower accuracy outside that range
  2. it makes sense that the initial summand might be small and then get larger, since the initial value and the following steps come from different formulas; I think overall that's ok (unless there's a specific error)
  3. when I've changed related functions in the past it's definitely a lot of work to track down all the places they cause problems... so excellent warning :)
    I've also found it useful to think of these as functions that always fail beyond some range, so one useful question is whether we can write a specific test that fails to produce values matching some credible reference (finite diff, Mathematica, etc.). That would make for a more specific target for improvement.

@ricardoV94
Author

I've also found it useful to think of these as functions that always fail beyond some range, so one useful question is whether we can write a specific test that fails to produce values matching some credible reference (finite diff, Mathematica, etc.). That would make for a more specific target for improvement.

@sakrejda Could you expand on this? We found by accident that the discussed test point (even if given the right arguments) fails to match other implementations (i.e., finite-differences, the previous grad_reg_inc_beta and the automatic differentiation obtained in PyMC3, which all agree on the same value). Do you suggest this test condition should be kept to highlight a known convergence failure beyond the "working domain"?

@ricardoV94
Author

ricardoV94 commented Mar 24, 2021

@bbbales2

From what you've dug up, I'm thinking:

  1. If there's no simple way to make these functions converge for the test cases, make the test cases easier and fix the bugs in them

It actually seems pretty straightforward to make them converge by making the threshold stricter. But I am completely unable to assess whether this is a worthwhile trade-off, since there is no clear justification for the initial threshold in the documentation. Changing the tests sounds fine, although @sakrejda seems to argue such "failing" tests may still be useful to keep around.

  2. grad_reg_inc_beta should probably be replaced everywhere in our code with these functions if it was worth doing in negative binomial

I don't have a good intuition. I can do some benchmarks to see if there is an obvious difference in computational cost for "usual values". One possible advantage of the old function is that it computes the derivatives of a and b simultaneously which might be faster than calling the separate methods one at a time.

  3. It would be nice to pin down the algorithm so that future people digging in the code know where to start looking but if it's not easy, as long as the results match the Mathematica reference in a useful range of values I'm happy :D.

I will have a look at this, together with point 1 above. Perhaps a simple numerical analysis to measure the largest error / disagreement observed with this function in a given a, b, z range would be useful to add to the inline documentation of the functions.

but I am happy to help

If that's an offer to make a PR, by all means :D.

1 is a definite need to do and should be pretty straightforward.

I can give it a try.

2 would be more work -- it looks like this grad_reg_inc_beta gets used a lot -- would need to check the new algorithm works numerically everywhere and also doesn't cause big slowdowns.

We can discuss this after collecting more info perhaps.

@sakrejda
Contributor

Changing the tests sounds fine, although @sakrejda seems to argue such "failing" tests may still be useful to keep around.

Since there's almost always a range beyond which these functions fail, it seems like the standard is to document the range in the function doc (it's a soft failure: they get less accurate, there's error in everything else, it often doesn't matter much, and it can often be avoided by rescaling). I would modify the test range so the function passes and leave an issue stating that beyond a certain point the function loses accuracy. If somebody needs that region we can add a PR to fix that region.

We found by accident that the discussed test point (even if given the right arguments) fails to match other implementations (i.e., finite-differences, the previous grad_reg_inc_beta and the automatic differentiation obtained in PyMC3, which all agree on the same value).

My suggestion is: 1) update the main post on the issue to document the reference values used (with code) and where you think the range should be extended; 2) make a branch that documents the failure; and 3) create a PR that fixes the branch to pass the new tests. So far I see mentions of various reference values and I'd be happy to help fix issues with the function but I'm not currently up to documenting the issue b/c you guys seem to have a handle on that already.

I don't have a good intuition. I can do some benchmarks to see if there is an obvious difference in computational cost for "usual values". One possible advantage of the old function is that it computes the derivatives of a and b simultaneously which might be faster than calling the separate methods one at a time.

Yes, there's usually some efficiency to be gained by computing both at the same time, and sometimes you can get better stability but you'd have to compare the code between the different functions (I can help with that once it gets there if you want).

I will have a look at this, together with point 1 above. Perhaps a simple numerical analysis to measure the largest error / disagreement observed with this function in a given a, b, z range would be useful to add to the inline documentation of the functions.

The algorithm may well be that someone took the power series used for the more general function and simplified the calculation where possible. I don't know what the current math lib standards are, but from my point of view getting closer to known errors for specific domains would help the Stan math library.

@sakrejda Could you expand on this?

My main point was that all the other implementations fail beyond some domain too. They just do it more or less gracefully and some of them cover a larger domain before failing.

@sakrejda
Contributor

Some history: so I looked at this function (https://github.com/stan-dev/math/blob/9895be8ee0a27a9c7935cf19345d76fcbc85a69d/stan/math/prim/fun/inc_beta_ddb.hpp) and at grad_inc_beta, and I think the original code was Michael Betancourt's, and that I modified some of the current code to improve accuracy in parts of the domain (that's those if statements at the head of the function which use standard identities). grad_inc_beta currently defers to grad_2F1, which avoids duplicating this kind of code. I wrote that version of the algorithm and we (me and @bbbales2, I think...) updated the convergence conditions there, so it might make sense to look at that version of the calculation if we want to improve the convergence checks.

So tl;dr: @ricardoV94 if you're interested in improving this algorithm in a specific direction I think that would make the lib better and I'm happy to help with the implementation.

@ricardoV94
Author

ricardoV94 commented Mar 24, 2021

@sakrejda Improving the convergence checks (assuming it can be done) sounds good. But does it make sense to maintain the two distinct approaches in the first place?

@bbbales2
Member

I would modify the test range so the function passes and leave an issue stating that beyond a certain point the function loses accuracy. If somebody needs that region we can add a PR to fix that region.

+1

finite-differences, the previous grad_reg_inc_beta and the automatic differentiation obtained in PyMC3, which all agree on the same value)

grad_inc_beta currently defers to grad_2F1 and that avoids duplicating this kind of code.. I wrote that version of the algorithm

Yeah I was assuming that inc_beta_ddb was just better than grad_reg_inc_beta always. It sounds like maybe this is not true or has changed since the inc_beta_ddb pull if grad_reg_inc_beta has been improved.

But does it make sense to maintain the two distinct approaches at all?

I guess if one is better in different places then two makes sense, but it might be that one of these is just better than the other everywhere.

It seems wrong that we're using one for neg_binomial_cdf and another for neg_binomial_lcdf for sure.

@sakrejda
Contributor

That's a good question. To answer it we would need to compare inc_beta_ddb to grad_inc_beta over the same domain, both in terms of accuracy and runtime. Is that something you are interested in trying? That seems to be where Ben is going with this as well. Either we get a clear answer about the need for a better inc_beta_ddb or we get to remove extra code. Ultimately it's going to be a PR with a pretty significant testing burden.

@bbbales2
Member

It looks like inc_beta_dda is used in neg_binomial_cdf, neg_binomial_2_cdf and beta_cdf. So it seems to be the function with the smaller reach. If grad_inc_beta covers that domain, then yeah, I think we can just use grad_inc_beta.

@ricardoV94
Author

On the other hand the inc_beta_dd* seem to be tested in more extreme scenarios. Not sure whether grad_inc_beta has been applied to the same cases to see how it fares.

@bbbales2
Member

Yeah I'm definitely into an ez cleanup pull to fix tests and adjust docs if necessary.

significant testing burden

I guess the things we'd need to check are:

  1. The problem inc_beta_ddb was introduced for is solved by current grad_reg_inc_beta. This is probably a combo of sleuthing the old pull + just running all our unit tests (a unit test should have been added to catch this, but it doesn't always happen)
  2. Check performance. There's a benchmark script now so doing that should be as simple as:
benchmarks/benchmark.py --overloads Rev --skip_similar_signatures neg_binomial_lcdf

Example output looks like:

----------------------------------------------------------------------------------------------------------------
Benchmark                                                                      Time             CPU   Iterations
----------------------------------------------------------------------------------------------------------------
neg_binomial_lcdf_Prim_int_Rev_real_Rev_real/1/manual_time                 23838 ns        24529 ns        28707
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/1/manual_time               28881 ns        30994 ns        25135
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/4/manual_time               94405 ns        96667 ns         7047
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/16/manual_time             357939 ns       361341 ns         1897
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/64/manual_time            1438082 ns      1445610 ns          484
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/256/manual_time           5635943 ns      5659356 ns          113
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/1024/manual_time         22753745 ns     22839340 ns           31
neg_binomial_lcdf_Prim_int_Rev_real_Rev_vector/4096/manual_time         89693252 ns     89981340 ns            7
...

@sakrejda
Contributor

I think this issue is drifting so I wanted to address the original question: it does look like the wrong argument is used in the test.

@ricardoV94
Author

ricardoV94 commented Mar 24, 2021

I just did a very rough comparison (in Python) and the inc_beta_dd* seems to be a couple of orders of magnitude faster than grad_reg_inc_beta (even when calling both _dda and _ddb separately):

import scipy.special as sp

# grad_reg_inc_beta and inc_beta_ddab are my Python ports of the corresponding Stan functions
a, b, z = 150, 1.25, 0.99
digamma_alpha = sp.digamma(a)
digamma_beta = sp.digamma(b)
digamma_sum = sp.digamma(a + b)

grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.002718021453572821, 0.3368083597855926)

inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# (-0.0027178389107593466, 0.3368034464501962)

%timeit grad_reg_inc_beta(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 59.9 ms ± 806 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

%timeit inc_beta_ddab(a, b, z, digamma_alpha, digamma_beta, digamma_sum)
# 43.6 µs ± 5.05 µs per loop (mean ± std. dev. of 7 runs, 10000 loops each)

Of course this may not be the case for other values, and there are still the accuracy doubts raised above.

@bbbales2
Member

Did you try the implementation here: https://github.com/canerturkmen/betaincder/blob/master/betaincder/c/beta.c ? It looks like it comes with a Python library: https://github.com/canerturkmen/betaincder/

Maybe if this is both fast and covers the parameter space we should switch to a version of it instead of inc_beta_dda or grad_reg_inc_beta?

I plugged grad_reg_inc_beta into the benchmarks in neg_binomial_cdf and a grad_reg_inc_beta solution was also quite a bit slower (on whatever generic values it's being tested with here):

inc_beta_dda
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
neg_binomial_cdf_Prim_int_Rev_real_Rev_real/1/manual_time                          2971 ns         3013 ns       233350
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1/manual_time                        3070 ns         3133 ns       227581
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/4/manual_time                       11831 ns        11901 ns        57954
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/16/manual_time                      46401 ns        46484 ns        14662
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/64/manual_time                     185235 ns       185327 ns         3707
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/256/manual_time                    742690 ns       742856 ns          913
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1024/manual_time                  2966910 ns      2965076 ns          236

grad_inc_beta
-----------------------------------------------------------------------------------------------------------------------
Benchmark                                                                             Time             CPU   Iterations
-----------------------------------------------------------------------------------------------------------------------
neg_binomial_cdf_Prim_int_Rev_real_Rev_real/1/manual_time                          9408 ns         9447 ns        72263
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1/manual_time                        9548 ns         9617 ns        73250
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/4/manual_time                       39634 ns        39687 ns        17772
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/16/manual_time                     156383 ns       156474 ns         4386
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/64/manual_time                     610616 ns       610688 ns         1135
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/256/manual_time                   2516604 ns      2516469 ns          280
neg_binomial_cdf_Prim_int_Rev_real_Rev_vector/1024/manual_time                 10101337 ns     10096834 ns           61

@sakrejda
Contributor

Benchmarks on a limited set of values are not going to be very informative...

@bbbales2
Member

not going to be very informative...

I feel informed though! I now know that in two situations grad_inc_beta was slower.

But these are all iterative algorithms, so are you saying it is likely grad_reg_inc_beta is faster in other areas? Or just that differences this radical seem fishy to you?

I don't see tolerance arguments here so it's not even clear we're computing to similar precisions or anything.

@ricardoV94
Author

Obviously we also need to check for the accuracy of the computations, but those speed benchmarks look promising.

I'll do a more careful test soon, taking into consideration whether/where the outputs agree or diverge, also against finite differences and autodiff (I don't have Mathematica to compare with, but maybe someone here does?). I can also check that other implementation in that repo.

@sakrejda
Contributor

I feel informed though! I now know that in two situations grad_inc_beta was slower.

I'm saying the number of iterations it takes the iterative algorithm to arrive at a specific accuracy depends heavily on the point in parameter space you are running the calculation for. So every time I modified these functions I ended up having to run a grid or random sample with over-sampling at the boundaries to feel confident the code changes weren't just thrashing around. It's very common to have different implementations for different parts of the domain too, so it may make sense to keep both implementations, just re-organize when they're called!
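For example, a sketch of that kind of scan, where candidate_ddb is a hypothetical stand-in for whichever implementation is under test (e.g. a Python port of inc_beta_ddb) and the reference is a high-precision mpmath finite difference:

import numpy as np
from mpmath import mp, betainc

mp.dps = 50

def reference_ddb(a, b, z, h=mp.mpf("1e-12")):
    # high-precision central finite difference of I_z(a, b) with respect to b
    return (betainc(a, b + h, 0, z, regularized=True)
            - betainc(a, b - h, 0, z, regularized=True)) / (2 * h)

def scan(candidate_ddb, n=1000, seed=0):
    # candidate_ddb: hypothetical callable (a, b, z) -> d/db I_z(a, b)
    rng = np.random.default_rng(seed)
    a = 10 ** rng.uniform(0, 4, n)                      # log-uniform a in [1, 1e4]
    b = 10 ** rng.uniform(0, 4, n)                      # log-uniform b in [1, 1e4]
    z = np.clip(rng.beta(0.5, 0.5, n), 1e-6, 1 - 1e-6)  # over-samples z near 0 and 1
    errors = []
    for ai, bi, zi in zip(a, b, z):
        ref = float(reference_ddb(ai, bi, zi))
        errors.append((abs(ref - candidate_ddb(ai, bi, zi)), (ai, bi, zi)))
    return max(errors)  # (largest absolute error, parameter point where it occurs)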

Anyway, I wasn't trying to be negative about the work you are doing, just cautioning against making decisions that turn into thrashing.

@bbbales2
Member

bbbales2 commented Mar 25, 2021

I don't have Mathematica to compare, but maybe someone here has?

I used this to compute the earlier functions: https://www.wolframcloud.com/

Seems free to use.

Edit: And derivatives copy-pasted from: https://functions.wolfram.com/GammaBetaErf/BetaRegularized/20/01/03/0001/

every time I modified these functions I ended up having to run a grid or random sample with over-sampling at the boundaries to feel confident the code changes weren't just thrashing around

Yeah makes sense. Hopefully we added unit tests for all the numeric weirdnesses we've hit along the way so we don't accidentally shrink our function domains or anything :/.

I wasn't trying to be negative about the work you are doing

Yeah no prob

@ricardoV94
Author

ricardoV94 commented May 11, 2021

I finally got some time to do some comparisons. You can check my repo here and the comparison notebook here:

I compared the following algorithms:

  1. grad_reg_inc_beta (Stan)
  2. inc_beta (Stan)
  3. inc_beta_strict (same as inc_beta with a threshold of 1e-18 instead of the now default 1e-10)
  4. inbeder (described in https://core.ac.uk/download/pdf/6303153.pdf)

All algorithms were implemented in vanilla, non-vectorized Python, which hopefully makes for a fair comparison.

I evaluated the derivative w.r.t. a on the following grid:

search_grid = dict(
    z = (0.001, 0.25, 0.5, 0.75, 0.999),
    a = (1.5, 15, 150, 1500, 15000),
    b = (1.25, 12.5, 125, 1250, 12500),
)

And used the following code to obtain reference values from Wolfram Cloud:

NumberForm[
  Table[
    (
      -((Gamma[a] Gamma[a + b]) / Gamma[b])) z^a HypergeometricPFQRegularized[{a, a, 1 - b}, {1 + a, 1 + a}, z] 
      + BetaRegularized[z, a, b] (Log[z] - PolyGamma[a] + PolyGamma[a + b]
    ),
    {z, { 0.001,0.25,0.5, 0.75, 0.999}},
    {a, {1.5, 15, 150, 1500, 15000}},
    {b, {1.25, 12.5, 125, 1250, 12500}}
  ], 
  16
]
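As a rough double-precision cross-check of the same grid (not the Wolfram reference above, and expectedly noisy for the extreme parameter values), one could also run a scipy finite difference over it, reusing the search_grid defined above:

from itertools import product
from scipy.special import betainc

eps = 1e-6
# crude finite-difference values of d/da I_z(a, b) over the 5 x 5 x 5 grid above
fd = {
    (z, a, b): (betainc(a + eps, b, z) - betainc(a - eps, b, z)) / (2 * eps)
    for z, a, b in product(search_grid["z"], search_grid["a"], search_grid["b"])
}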

Most of the time the 4 algorithms agree closely with each other. The results of the current inc_beta tend to be a bit less precise, but changing the threshold to 1e-18 (i.e., inc_beta_strict) fixes these differences.

The grad_reg_inc_beta fails to converge more often than the alternatives (see the NaNs in the first plot), and tends to be considerably slower. It also takes "ages" when z=0.999 and b in (1.25, 12.5) (see second plot), but maybe this would be aborted early by this line, which I did not implement:

check_2F1_converges("grad_2F1", a1, a2, b1, z);

The outputs of inc_beta_strict and inbeder tend to be very similar (more so than those of inc_beta_strict and grad_reg_inc_beta). In terms of performance, inc_beta_strict seems slightly faster for extreme z in (0.001, 0.999), while inbeder is faster for intermediate z in (0.25, 0.5, 0.75).

Log absolute error of derivative wrt a (using Wolfram as a reference)
[figure]

Log of average execution time in seconds (out of 100 reps; computing both derivatives wrt a and b simultaneously for fairness):
[figure]


You may also notice large absolute errors relative to the Wolfram reference that are shared by all implementations in the first plot (bright yellow squares). Inspecting these, it seems to be instability on the side of Wolfram. For instance, for z = 0.25, a = 150, b = 1250, Wolfram reports a crazily large derivative of -9.0079425013435e+215, while the 4 algorithms return a derivative close to zero, which I think is correct? Here is the plot of betainc for z=0.25, b=1250, with a on the x-axis and betainc on the y-axis:

                  wolfram: -9.0079425013435e+215
   stan_grad_reg_inc_beta: 5.64240006350661e-13
            stan_inc_beta: -6.875931379629293e-42
     stan_inc_beta_strict: -6.8759313797203e-42
                  inbeder: -6.875906440587697e-42

[figure]

In addition, whenever grad_reg_inc_beta disagrees with the alternatives and is closer to the Wolfram reference, it seems both it and Wolfram are biased away from the correct answer (e.g., z=0.25, a=15, b=12500):

                  wolfram: 5.374002228495603
   stan_grad_reg_inc_beta: 5.307335561828936
            stan_inc_beta: -0.0
     stan_inc_beta_strict: -0.0
                  inbeder: -0.0

[figure]


Overall, I would say the inc_beta algorithm looks preferable to grad_reg_inc_beta, but it is perhaps justified to tighten the threshold from 1e-10 toward 1e-18 for more stable results.

This change also sounds less disruptive than replacing it with the alternative inbeder, even though that one has the potential to be slightly faster in some scenarios (plus its derivation is clearly documented in the reference above).

Let me know if you think there is something else / more I should have done :)
