Skip to content

Conversation

@MattStephanson
Copy link
Contributor

Fixes #1530.

  • For small mean, replaces Poisson approximation with exact waiting time method based on cumulative sum of exponential variates.
  • Uses cheap but rigorous approximation of exit condition for fast return on first iteration.

 - For small mean, replaces Poisson approximation with exact waiting time method based on cumulative sum of exponential variates.
 - Uses cheap but rigorous approximation of exit condition for fast return on first iteration.
@MattStephanson MattStephanson requested a review from a team as a code owner December 14, 2020 07:28
@StephanTLavavej StephanTLavavej added the bug Something isn't working label Dec 15, 2020
@MattStephanson
Copy link
Contributor Author

BTW, for reference, the underlying algorithm is from "Non-Uniform Random Variate Generation", Luc Devroye, section X.4, "Second waiting time method".

@MattStephanson MattStephanson marked this pull request as draft December 27, 2020 19:06
@MattStephanson
Copy link
Contributor Author

There is an inconsistency wrt Rand and _Rand

I think I'm going to restore the original version anyway, because 1.0 - _Par0._Mean makes it impossible to return anything other than 0 when _Par0._Mean < DBL_EPSILON.

@MattStephanson MattStephanson marked this pull request as ready for review December 27, 2020 19:41
Copy link
Member

@StephanTLavavej StephanTLavavej left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me - the logic here is significantly beyond my experience so I really appreciate the PR. 😻 I will push changes for 2 trivial things I noticed.

Copy link
Contributor

@statementreply statementreply left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The proposed change doesn't correctly handle the case that n * (1-p) < 1.

When p > 0.5, except for the small n case, the original algorithm generates B(n, p) as n - B(n, 1-p). See binomial_distribution::param_type::_Init():

_Pp    = _Px < 0.5 ? _Px : (1.0 - _Px);
_Mean  = _Tx * _Pp;

and the return below:

return _Par0._Px == _Par0._Pp ? _Res : static_cast<_Ty>(_Par0._Tx - _Res);

Test case:

#include <cmath>
#include <iostream>
#include <random>
#include <vector>

using namespace std;

int main() {
    constexpr int it_max  = 10'000'000;
    constexpr int n       = 25;
    constexpr double mean = n - 0.99;
    constexpr double p    = mean / n;
    constexpr double var  = n * p * (1.0 - p);

    mt19937 gen;
    binomial_distribution<> dist(n, p);

    vector<int> counts(n + 1);
    for (int i = 0; i < it_max; ++i) {
        ++counts[static_cast<size_t>(dist(gen))];
    }

    double sample_mean = 0.0;
    for (size_t i = 1; i < counts.size(); ++i) {
        sample_mean += static_cast<double>(i) * counts[i];
    }
    sample_mean /= it_max;

    cout << "Expected mean: " << mean << endl
         << "Observed mean: " << sample_mean << endl;

    double sample_var = 0.0;
    for (size_t i = 0; i < counts.size(); ++i) {
        sample_var += counts[i] * pow(i - sample_mean, 2);
    }
    sample_var /= it_max - 1;

    cout << "Expected variance: " << var << endl
         << "Observed variance: " << sample_var << endl;

    return 0;
}

Pre-PR output:

Expected mean: 24.01
Observed mean: 24.0104
Expected variance: 0.950796
Observed variance: 0.988557

Post-PR output:

Expected mean: 24.01
Observed mean: 0.990254
Expected variance: 0.950796
Observed variance: 0.950525

(Sorry I missed this in my previous review)

@MattStephanson
Copy link
Contributor Author

@StephanTLavavej I've pushed a bugfix based on @statementreply's feedback after your approval.

Base automatically changed from master to main January 28, 2021 00:35
Copy link
Contributor

@CaseyCarter CaseyCarter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

FYI: I've pushed a merge with main, and updated the list of libcxx test skips to indicate that my guess that #1530 caused std/numerics/rand/rand.dis/rand.dist.bern/rand.dist.bern.bin/eval.PR44847.pass.cpp to fail was wrong.

@StephanTLavavej StephanTLavavej merged commit 03c79d7 into microsoft:main Feb 12, 2021
@StephanTLavavej
Copy link
Member

Thanks for noticing and fixing this accuracy bug! 🎯 🎯 🎯

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

<random>: Poisson approximation for binomial_distribution is not accurate

6 participants