<random>: Improve binomial_distribution accuracy #1531

MattStephanson · 2020-12-14T07:28:11Z

For small mean, replaces Poisson approximation with exact waiting time method based on cumulative sum of exponential variates.
Uses cheap but rigorous approximation of exit condition for fast return on first iteration.


MattStephanson · 2020-12-21T16:05:21Z

BTW, for reference, the underlying algorithm is from "Non-Uniform Random Variate Generation", Luc Devroye, section X.4, "Second waiting time method".
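For readers without the book at hand, here is a minimal sketch of that method as it is commonly stated. It is not the code from this PR. It relies on the fact that the k-th smallest of n i.i.d. unit exponentials can be built as a running sum of E_i / (n - i + 1), and it counts how many of those order statistics fall at or below -log(1 - p). The name `binomial_waiting_time` and its signature are illustrative assumptions, and the sketch leaves out the fast-exit approximation mentioned in the description.

```cpp
#include <cmath>
#include <random>

// Illustrative sketch only; the name and signature are not part of <random>.
// Returns a Binomial(n, p) variate using about n * p + 1 exponential draws on average.
template <class Urng>
int binomial_waiting_time(int n, double p, Urng& gen) {
    // An Exp(1) variate is <= q with probability 1 - exp(-q) = p.
    const double q = -std::log1p(-p);
    std::exponential_distribution<double> expo(1.0);

    double sum = 0.0; // value of the next exponential order statistic
    int x      = 0; // number of order statistics found at or below q
    while (x < n) {
        sum += expo(gen) / (n - x);
        if (sum > q) {
            break; // the (x + 1)-th smallest exponential exceeds q
        }
        ++x;
    }
    return x;
}
```

Calling `binomial_waiting_time(25, 0.0396, gen)` with a `std::mt19937 gen` should give samples whose mean is close to 0.99. The loop cost grows with n * p, which is why a method like this is only attractive for small mean.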

stl/inc/random

MattStephanson · 2020-12-27T19:08:03Z

There is an inconsistency between Rand and _Rand.

I think I'm going to restore the original version anyway, because 1.0 - _Par0._Mean makes it impossible to return anything other than 0 when _Par0._Mean < DBL_EPSILON.
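The rounding effect behind this remark can be shown in isolation. The sketch below is not the expression from the PR (which is not reproduced in this thread); `mean_is_absorbed` is a hypothetical helper that only demonstrates how a sufficiently small mean disappears when subtracted from 1.0 in double precision, while `std::log1p` still sees it.

```cpp
#include <cmath>

// Hypothetical helper for illustration: true when `mean` is so small that
// 1.0 - mean rounds back to exactly 1.0 in double precision.
inline bool mean_is_absorbed(double mean) {
    const double one_minus_mean = 1.0 - mean;
    return one_minus_mean == 1.0;
}

// For comparison: log(1 - mean) computed without forming 1 - mean first.
// For tiny positive `mean` this is about -mean rather than exactly 0.
inline double log_one_minus(double mean) {
    return std::log1p(-mean);
}
```

With IEEE 754 doubles and default rounding, `mean_is_absorbed(1e-17)` is true, so any test that compares against `1.0 - mean` cannot distinguish that mean from zero.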

StephanTLavavej

Looks good to me - the logic here is significantly beyond my experience so I really appreciate the PR. 😻 I will push changes for 2 trivial things I noticed.

stl/inc/random

tests/std/tests/GH_001530_binomial_accuracy/test.cpp

statementreply

The proposed change doesn't correctly handle the case where n * (1 - p) < 1.

When p > 0.5, except for the small n case, the original algorithm generates B(n, p) as n - B(n, 1-p). See binomial_distribution::param_type::_Init():

_Pp    = _Px < 0.5 ? _Px : (1.0 - _Px);
_Mean  = _Tx * _Pp;

and the return below:

return _Par0._Px == _Par0._Pp ? _Res : static_cast<_Ty>(_Par0._Tx - _Res);
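The reflection those two snippets implement can be sketched on its own. In the sketch below, `sample_small_p` is a hypothetical stand-in for whatever sampler handles p < 0.5 (here a plain sum of Bernoulli trials, not the library's algorithm), and `sample_binomial` is an assumed wrapper that mirrors the `_Pp` selection and the final `_Tx - _Res` step.

```cpp
#include <random>

// Hypothetical stand-in for a sampler that is only used with p <= 0.5:
// counts successes in n independent Bernoulli(p) trials.
template <class Urng>
int sample_small_p(int n, double p, Urng& gen) {
    std::bernoulli_distribution trial(p);
    int successes = 0;
    for (int i = 0; i < n; ++i) {
        successes += trial(gen) ? 1 : 0;
    }
    return successes;
}

// Assumed wrapper: draws B(n, p) as n - B(n, 1 - p) whenever p >= 0.5.
template <class Urng>
int sample_binomial(int n, double p, Urng& gen) {
    const double pp = p < 0.5 ? p : (1.0 - p); // same selection as _Pp
    const int res   = sample_small_p(n, pp, gen);
    return p == pp ? res : n - res; // undo the reflection if it was applied
}
```

Any code path that returns `res` directly, without the final subtraction, yields samples centered on n * (1 - p) instead of n * p.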

Test case:

#include <cmath>
#include <iostream>
#include <random>
#include <vector>

using namespace std;

int main() {
    constexpr int it_max  = 10'000'000;
    constexpr int n       = 25;
    constexpr double mean = n - 0.99;
    constexpr double p    = mean / n;
    constexpr double var  = n * p * (1.0 - p);

    mt19937 gen;
    binomial_distribution<> dist(n, p);

    vector<int> counts(n + 1);
    for (int i = 0; i < it_max; ++i) {
        ++counts[static_cast<size_t>(dist(gen))];
    }

    double sample_mean = 0.0;
    for (size_t i = 1; i < counts.size(); ++i) {
        sample_mean += static_cast<double>(i) * counts[i];
    }
    sample_mean /= it_max;

    cout << "Expected mean: " << mean << endl
         << "Observed mean: " << sample_mean << endl;

    double sample_var = 0.0;
    for (size_t i = 0; i < counts.size(); ++i) {
        sample_var += counts[i] * pow(i - sample_mean, 2);
    }
    sample_var /= it_max - 1;

    cout << "Expected variance: " << var << endl
         << "Observed variance: " << sample_var << endl;

    return 0;
}

Pre-PR output:

Expected mean: 24.01
Observed mean: 24.0104
Expected variance: 0.950796
Observed variance: 0.988557

Post-PR output:

Expected mean: 24.01
Observed mean: 0.990254
Expected variance: 0.950796
Observed variance: 0.950525
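As a quick check on these numbers, the expected values for n = 25 and p = 24.01 / 25 work out as follows:

```latex
\begin{aligned}
p &= \frac{24.01}{25} = 0.9604, \qquad 1 - p = 0.0396,\\
\mathbb{E}[X] &= np = 24.01,\\
\operatorname{Var}[X] &= np(1-p) = 24.01 \times 0.0396 = 0.950796,\\
n(1-p) &= 25 \times 0.0396 = 0.99.
\end{aligned}
```

The post-PR observed mean of 0.990254 matches n(1 - p) rather than np, which is what one would see if B(n, 1 - p) were returned without being subtracted from n. The variance is unchanged by that reflection, so it still matches. The pre-PR observed variance of 0.988557 is close to 0.99, the variance a Poisson approximation with mean 0.99 would give.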

(Sorry I missed this in my previous review)

MattStephanson · 2021-01-14T06:10:39Z

@StephanTLavavej I've pushed a bugfix based on @statementreply's feedback after your approval.

tests/std/tests/GH_001530_binomial_accuracy/test.cpp

CaseyCarter

FYI: I've pushed a merge with main and updated the list of libcxx test skips. My guess that #1530 caused std/numerics/rand/rand.dis/rand.dist.bern/rand.dist.bern.bin/eval.PR44847.pass.cpp to fail was wrong, and the skip list now says so.

StephanTLavavej · 2021-02-12T22:37:41Z

Thanks for noticing and fixing this accuracy bug! 🎯 🎯 🎯

Fixes inaccurate binomial_distribution for certain parameters

c3e29e1


MattStephanson requested a review from a team as a code owner December 14, 2020 07:28

buildfix

8ab30a0

StephanTLavavej added the bug Something isn't working label Dec 15, 2020

CaseyCarter assigned mnatsuhara Dec 16, 2020

statementreply reviewed Dec 27, 2020

MattStephanson added 3 commits December 27, 2020 08:36

code review

ff17926

code review: simplify control flow

8dfeba4

buildfix

a107fd4

MattStephanson marked this pull request as draft December 27, 2020 19:06

MattStephanson added 2 commits December 27, 2020 11:14

Fix copy-paste mistake

498c024

Make comment consistent with code

225f510

MattStephanson marked this pull request as ready for review December 27, 2020 19:41

StephanTLavavej reviewed Jan 9, 2021

Code review feedback.

9203baf

StephanTLavavej approved these changes Jan 9, 2021

statementreply suggested changes Jan 13, 2021

Handle p ~1 case.

8bad736

statementreply reviewed Jan 16, 2021

MattStephanson added 2 commits January 15, 2021 23:58

typo

5182599

resolve merge conflict

694498b

Base automatically changed from master to main January 28, 2021 00:35

StephanTLavavej approved these changes Feb 6, 2021

CaseyCarter added 2 commits February 8, 2021 13:20

Merge branch 'main' into microsoftgh-1530

22a044f

Casey's guess that microsoft#1530 caused these libcxx failures was wrong

f4807a5

CaseyCarter approved these changes Feb 8, 2021

StephanTLavavej approved these changes Feb 8, 2021

StephanTLavavej assigned StephanTLavavej and unassigned mnatsuhara Feb 8, 2021

StephanTLavavej merged commit 03c79d7 into microsoft:main Feb 12, 2021

MattStephanson deleted the gh-1530 branch February 14, 2021 01:06

StephanTLavavej mentioned this pull request Apr 28, 2025

<random>: Remove one meaningless _Small_poisson_distribution::_Init call #5442

Closed
