-
Notifications
You must be signed in to change notification settings - Fork 1.6k
<random>: Improve binomial_distribution accuracy #1531
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
- For small mean, replaces Poisson approximation with exact waiting time method based on cumulative sum of exponential variates. - Uses cheap but rigorous approximation of exit condition for fast return on first iteration.
|
BTW, for reference, the underlying algorithm is from "Non-Uniform Random Variate Generation", Luc Devroye, section X.4, "Second waiting time method". |
I think I'm going to restore the original version anyway, because |
StephanTLavavej
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me - the logic here is significantly beyond my experience so I really appreciate the PR. 😻 I will push changes for 2 trivial things I noticed.
statementreply
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The proposed change doesn't correctly handle the case that n * (1-p) < 1.
When p > 0.5, except for the small n case, the original algorithm generates B(n, p) as n - B(n, 1-p). See binomial_distribution::param_type::_Init():
_Pp = _Px < 0.5 ? _Px : (1.0 - _Px);
_Mean = _Tx * _Pp;and the return below:
return _Par0._Px == _Par0._Pp ? _Res : static_cast<_Ty>(_Par0._Tx - _Res);Test case:
#include <cmath>
#include <iostream>
#include <random>
#include <vector>
using namespace std;
int main() {
constexpr int it_max = 10'000'000;
constexpr int n = 25;
constexpr double mean = n - 0.99;
constexpr double p = mean / n;
constexpr double var = n * p * (1.0 - p);
mt19937 gen;
binomial_distribution<> dist(n, p);
vector<int> counts(n + 1);
for (int i = 0; i < it_max; ++i) {
++counts[static_cast<size_t>(dist(gen))];
}
double sample_mean = 0.0;
for (size_t i = 1; i < counts.size(); ++i) {
sample_mean += static_cast<double>(i) * counts[i];
}
sample_mean /= it_max;
cout << "Expected mean: " << mean << endl
<< "Observed mean: " << sample_mean << endl;
double sample_var = 0.0;
for (size_t i = 0; i < counts.size(); ++i) {
sample_var += counts[i] * pow(i - sample_mean, 2);
}
sample_var /= it_max - 1;
cout << "Expected variance: " << var << endl
<< "Observed variance: " << sample_var << endl;
return 0;
}Pre-PR output:
Expected mean: 24.01
Observed mean: 24.0104
Expected variance: 0.950796
Observed variance: 0.988557
Post-PR output:
Expected mean: 24.01
Observed mean: 0.990254
Expected variance: 0.950796
Observed variance: 0.950525
(Sorry I missed this in my previous review)
|
@StephanTLavavej I've pushed a bugfix based on @statementreply's feedback after your approval. |
CaseyCarter
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FYI: I've pushed a merge with main, and updated the list of libcxx test skips to indicate that my guess that #1530 caused std/numerics/rand/rand.dis/rand.dist.bern/rand.dist.bern.bin/eval.PR44847.pass.cpp to fail was wrong.
|
Thanks for noticing and fixing this accuracy bug! 🎯 🎯 🎯 |
Fixes #1530.