Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Is Mean for NegativeBinomial distribution correct? #455

Open
dsyme opened this issue Dec 5, 2016 · 6 comments
Open

Is Mean for NegativeBinomial distribution correct? #455

dsyme opened this issue Dec 5, 2016 · 6 comments
Assignees

Comments

@dsyme
Copy link

dsyme commented Dec 5, 2016

Playing around with distributions and it seems that the Mean of the NegativeBinomial distribution doesn't match the mean of samples.

open MathNet.Numerics
let b7 = Distributions.NegativeBinomial(2.0, 0.5) 

/// This gives about 4.0
[ for i in 0 .. 100 -> b7.Sample() ]  |> List.averageBy float

//This says 2.0
b7.Mean
@cdrnet
Copy link
Member

cdrnet commented Dec 5, 2016

That looks suspicious, thanks for reporting!

@cdrnet cdrnet self-assigned this Dec 5, 2016
@cdrnet
Copy link
Member

cdrnet commented Dec 7, 2016

It turns out the mean is correct, but the samples return the number of trials (for reaching r successes) instead of the number of failures (until r successes are reached). Both are common definitions of the distribution, but they of course should be consistent within the implementation. We chose the second definition since it can be extended to all positive real numbers of r instead of just integers.

Quick workaround until it is fixed: subtract r (here: 2.0) from all the generated samples.

Unfortunately since r is real, we cannot actually subtract it from our samples since the discrete distribution interface assumes samples are integers. We may need to split this distribution if one accepting only integer r and thus also samples integers, and an extended one for real r with another interface.

@cdrnet cdrnet modified the milestones: Numerics v4.0, Numerics v4.x Feb 8, 2018
@thamilto
Copy link

thamilto commented Jul 2, 2018

I'd like to point out that the quick work-around doesn't seem to work. In the example given by dsyme if I run this on my PC I get positive integer and 0 samples. I don't follow how this can be interpreted as "the number of trials (for reaching r successes)". Knocking off 2.0 from a 0 sample, regardless of types, doesn't seem to make sense.

Can someone elaborate?

@vermorel
Copy link

vermorel commented Mar 8, 2019

I might have pinpointed the problem in NegativeBinomial, the code is:

        static int SampleUnchecked(System.Random rnd, double r, double p)
        {
            var lambda = Gamma.SampleUnchecked(rnd, r, p);
            var c = Math.Exp(-lambda);
            var p1 = 1.0;
            var k = 0;
            do
            {
                k = k + 1;
                p1 = p1*rnd.NextDouble();
            }
            while (p1 >= c);
            return k - 1;
        }

The first code line should be replaced by

var lambda = Gamma.SampleUnchecked(rnd, r, (1 - p)/p);

Also, it would be nice to point-out in the code that the second part is actually drawing a Poisson sample (which could benefit from the optimized implementation in PoissonDistribution.cs which distinguish small lambdas from big lambdas).

@cdrnet cdrnet removed this from the Numerics v4.x milestone Jul 24, 2021
@tlatarche
Copy link

tlatarche commented Aug 17, 2021

Is there a plan to implement the fix suggested by @vermorel above? This would solve a similar issue I'm seeing.

EDIT: Correction - having given further thought it should be p/(1-p) not (1-p)/p

@Arlofin
Copy link
Contributor

Arlofin commented Sep 29, 2022

The sample method is simply incorrect, @tlatarche's proposal is correct.
Just to recall the relationships: With the chosen definition (r is number of successes, p is probability of success), the mean is given by (1-p)/p*r and the NB distribution can be expressed as Poisson distribution with parameter being a sample from the Gamma distribution with parameters r and p/(1-p).
The current sampling implementation can be interpreted as using instead a p', which is implicitly defined by p = p'/(1-p'). With a little bit of calculus, it can be shown that this leads to a sampling mean which differs by the true mean by a factor of 1-p, which is consistent with @dsyme's observation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants