Uniform sampling: use Canon's method #1287

Merged (10 commits) on Mar 24, 2023

dhardy (Member) commented Feb 17, 2023

Closes #570, #1145, #1154, #1196, #1286. See also #1172 (TODO: SIMD), #494 (here we add "unbiased" feature flag).

Also implements PartialEq for all our Uniform impls and Eq for all but FP. See #1217.


Yet another PR to finally update Uniform integer sampling (maybe):

  • Uses Canon's method (up to two RNG samples) for distribution and single sampling
  • Adds an "unbiased" feature flag, which instead uses Lemire's method for distributions and Canon's methods with unlimited samples for single-sampling
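
As a rough illustration of the "up to two RNG samples" variant of Canon's method, here is a minimal sketch for `u64` ranges. It is not the code from this PR: `wmul`, `sample_canon` and the `next_u64` closure (a stand-in for any source of uniform 64-bit words) are names assumed for the example.

```rust
/// Widening multiply: the (high, low) 64-bit halves of `a * b`.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = u128::from(a) * u128::from(b);
    ((p >> 64) as u64, p as u64)
}

/// Sample from `low..=high` with at most two draws from `next_u64`
/// (hypothetical stand-in for an RNG; not part of the rand API).
fn sample_canon(mut next_u64: impl FnMut() -> u64, low: u64, high: u64) -> u64 {
    assert!(low <= high, "empty range");
    // Number of values in the range; wraps to 0 for the full u64 range.
    let range = high.wrapping_sub(low).wrapping_add(1);
    if range == 0 {
        return next_u64();
    }
    // First draw: the high half is the candidate result, the low half is
    // the fractional part that decides whether more bits could matter.
    let (mut result, lo_order) = wmul(next_u64(), range);
    // A second draw can only carry into `result` if lo_order > 2^64 - range.
    if lo_order > range.wrapping_neg() {
        let (new_hi_order, _) = wmul(next_u64(), range);
        // Apply the carry, if any. Any bias left after this step is ignored.
        if lo_order.checked_add(new_hi_order).is_none() {
            result += 1;
        }
    }
    low.wrapping_add(result)
}
```

The second draw is taken only when the low half of the first product is within `range` of wrapping, so for small ranges the common path costs one RNG call and one multiplication.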

Based on canon-uniform-benches branch, revised
This is a small tweak unsupported by evidence, but brings
SIMD in line with unbiased integer range sampling.
Note: unbiased does pass current value-stability tests,
but could fail extra ones in the future.
dhardy (Member, Author) commented Feb 17, 2023

Baseline results (new benchmark on master)

samplei8/SmallRng/single
time: [1.9184 ns 1.9187 ns 1.9190 ns]
Found 20223 outliers among 100000 measurements (20.22%)
1018 (1.02%) low severe
16 (0.02%) low mild
158 (0.16%) high mild
19031 (19.03%) high severe
Benchmarking samplei8/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 52520.
samplei8/SmallRng/distr time: [1.1100 ns 1.1103 ns 1.1107 ns]
Found 13145 outliers among 100000 measurements (13.14%)
5635 (5.63%) low severe
423 (0.42%) low mild
2051 (2.05%) high mild
5036 (5.04%) high severe
samplei8/ChaCha8Rng/single
time: [2.3286 ns 2.3305 ns 2.3324 ns]
Found 5234 outliers among 100000 measurements (5.23%)
1877 (1.88%) high mild
3357 (3.36%) high severe
samplei8/ChaCha8Rng/distr
time: [1.7106 ns 1.7107 ns 1.7109 ns]
Found 4087 outliers among 100000 measurements (4.09%)
444 (0.44%) low mild
2617 (2.62%) high mild
1026 (1.03%) high severe
samplei8/Pcg32/single time: [1.6857 ns 1.6865 ns 1.6873 ns]
Found 6777 outliers among 100000 measurements (6.78%)
510 (0.51%) high mild
6267 (6.27%) high severe
samplei8/Pcg32/distr time: [1.2538 ns 1.2539 ns 1.2540 ns]
Found 27845 outliers among 100000 measurements (27.84%)
22197 (22.20%) low severe
47 (0.05%) low mild
7 (0.01%) high mild
5594 (5.59%) high severe
samplei8/Pcg64/single time: [2.1280 ns 2.1290 ns 2.1301 ns]
Found 22971 outliers among 100000 measurements (22.97%)
13429 (13.43%) low severe
5441 (5.44%) low mild
558 (0.56%) high mild
3543 (3.54%) high severe
samplei8/Pcg64/distr time: [1.4348 ns 1.4349 ns 1.4350 ns]
Found 40705 outliers among 100000 measurements (40.70%)
16283 (16.28%) low severe
24422 (24.42%) high severe

samplei16/SmallRng/single
time: [1.9058 ns 1.9059 ns 1.9060 ns]
Found 2986 outliers among 100000 measurements (2.99%)
2986 (2.99%) high severe
Benchmarking samplei16/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.4s, enable flat sampling, or reduce sample count to 52870.
samplei16/SmallRng/distr
time: [1.0703 ns 1.0706 ns 1.0709 ns]
Found 986 outliers among 100000 measurements (0.99%)
444 (0.44%) high mild
542 (0.54%) high severe
samplei16/ChaCha8Rng/single
time: [2.0399 ns 2.0404 ns 2.0409 ns]
Found 4574 outliers among 100000 measurements (4.57%)
138 (0.14%) low mild
65 (0.07%) high mild
4371 (4.37%) high severe
samplei16/ChaCha8Rng/distr
time: [1.7870 ns 1.7874 ns 1.7877 ns]
Found 4997 outliers among 100000 measurements (5.00%)
1024 (1.02%) low mild
2289 (2.29%) high mild
1684 (1.68%) high severe
samplei16/Pcg32/single time: [1.6959 ns 1.6967 ns 1.6975 ns]
Found 2849 outliers among 100000 measurements (2.85%)
554 (0.55%) high mild
2295 (2.29%) high severe
samplei16/Pcg32/distr time: [1.2458 ns 1.2460 ns 1.2461 ns]
Found 4853 outliers among 100000 measurements (4.85%)
453 (0.45%) high mild
4400 (4.40%) high severe
samplei16/Pcg64/single time: [1.9074 ns 1.9076 ns 1.9078 ns]
Found 3419 outliers among 100000 measurements (3.42%)
14 (0.01%) low mild
1315 (1.31%) high mild
2090 (2.09%) high severe
samplei16/Pcg64/distr time: [1.4340 ns 1.4342 ns 1.4345 ns]
Found 34978 outliers among 100000 measurements (34.98%)
22432 (22.43%) low severe
12546 (12.55%) high severe

samplei32/SmallRng/single
time: [4.9445 ns 4.9550 ns 4.9655 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high severe
Benchmarking samplei32/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 6.0s, enable flat sampling, or reduce sample count to 50180.
samplei32/SmallRng/distr
time: [1.8612 ns 1.8700 ns 1.8791 ns]
Found 8951 outliers among 100000 measurements (8.95%)
5188 (5.19%) high mild
3763 (3.76%) high severe
samplei32/ChaCha8Rng/single
time: [5.8213 ns 5.8339 ns 5.8463 ns]
samplei32/ChaCha8Rng/distr
time: [2.3517 ns 2.3605 ns 2.3697 ns]
Found 8845 outliers among 100000 measurements (8.85%)
5780 (5.78%) high mild
3065 (3.06%) high severe
samplei32/Pcg32/single time: [4.7795 ns 4.7899 ns 4.8005 ns]
Found 3 outliers among 100000 measurements (0.00%)
3 (0.00%) high severe
samplei32/Pcg32/distr time: [2.0956 ns 2.1029 ns 2.1099 ns]
Found 9127 outliers among 100000 measurements (9.13%)
5384 (5.38%) high mild
3743 (3.74%) high severe
samplei32/Pcg64/single time: [5.4698 ns 5.4821 ns 5.4942 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei32/Pcg64/distr time: [2.5071 ns 2.5155 ns 2.5239 ns]
Found 8743 outliers among 100000 measurements (8.74%)
5745 (5.75%) high mild
2998 (3.00%) high severe

samplei64/SmallRng/single
time: [5.9268 ns 5.9361 ns 5.9454 ns]
Found 2 outliers among 100000 measurements (0.00%)
2 (0.00%) high mild
samplei64/SmallRng/distr
time: [1.7516 ns 1.7579 ns 1.7644 ns]
Found 9262 outliers among 100000 measurements (9.26%)
5356 (5.36%) high mild
3906 (3.91%) high severe
samplei64/ChaCha8Rng/single
time: [7.8579 ns 7.8709 ns 7.8840 ns]
Found 3 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
2 (0.00%) high severe
samplei64/ChaCha8Rng/distr
time: [3.5666 ns 3.5778 ns 3.5892 ns]
Found 8734 outliers among 100000 measurements (8.73%)
6057 (6.06%) high mild
2677 (2.68%) high severe
samplei64/Pcg32/single time: [7.1110 ns 7.1241 ns 7.1368 ns]
samplei64/Pcg32/distr time: [2.9155 ns 2.9241 ns 2.9327 ns]
Found 9162 outliers among 100000 measurements (9.16%)
5522 (5.52%) high mild
3640 (3.64%) high severe
samplei64/Pcg64/single time: [6.6004 ns 6.6123 ns 6.6247 ns]
Found 62 outliers among 100000 measurements (0.06%)
62 (0.06%) high mild
samplei64/Pcg64/distr time: [2.5028 ns 2.5110 ns 2.5195 ns]
Found 9183 outliers among 100000 measurements (9.18%)
4994 (4.99%) high mild
4189 (4.19%) high severe

samplei128/SmallRng/single
time: [11.482 ns 11.496 ns 11.510 ns]
Found 185 outliers among 100000 measurements (0.18%)
185 (0.18%) high mild
samplei128/SmallRng/distr
time: [5.6678 ns 5.6780 ns 5.6879 ns]
Found 8484 outliers among 100000 measurements (8.48%)
7430 (7.43%) high mild
1054 (1.05%) high severe
samplei128/ChaCha8Rng/single
time: [14.400 ns 14.419 ns 14.439 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei128/ChaCha8Rng/distr
time: [7.8090 ns 7.8233 ns 7.8374 ns]
Found 8322 outliers among 100000 measurements (8.32%)
6595 (6.59%) high mild
1727 (1.73%) high severe
samplei128/Pcg32/single time: [13.412 ns 13.430 ns 13.448 ns]
Found 15 outliers among 100000 measurements (0.01%)
15 (0.01%) high mild
samplei128/Pcg32/distr time: [7.1153 ns 7.1296 ns 7.1429 ns]
Found 8659 outliers among 100000 measurements (8.66%)
6130 (6.13%) high mild
2529 (2.53%) high severe
samplei128/Pcg64/single time: [12.365 ns 12.382 ns 12.399 ns]
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei128/Pcg64/distr time: [6.5060 ns 6.5183 ns 6.5302 ns]
Found 8991 outliers among 100000 measurements (8.99%)
5736 (5.74%) high mild
3255 (3.25%) high severe

New results (compared to baseline)

samplei8/SmallRng/single
time: [1.4978 ns 1.4982 ns 1.4986 ns]
change: [-21.946% -21.917% -21.892%] (p = 0.00 < 0.05)
Performance has improved.
Found 17805 outliers among 100000 measurements (17.80%)
1817 (1.82%) low severe
18 (0.02%) low mild
530 (0.53%) high mild
15440 (15.44%) high severe
samplei8/SmallRng/distr time: [1.8966 ns 1.8971 ns 1.8977 ns]
change: [+70.464% +70.566% +70.650%] (p = 0.00 < 0.05)
Performance has regressed.
Found 15393 outliers among 100000 measurements (15.39%)
10589 (10.59%) low severe
33 (0.03%) low mild
6 (0.01%) high mild
4765 (4.76%) high severe
samplei8/ChaCha8Rng/single
time: [2.0854 ns 2.0858 ns 2.0862 ns]
change: [-10.575% -10.497% -10.423%] (p = 0.00 < 0.05)
Performance has improved.
Found 593 outliers among 100000 measurements (0.59%)
4 (0.00%) low mild
474 (0.47%) high mild
115 (0.12%) high severe
samplei8/ChaCha8Rng/distr
time: [2.6350 ns 2.6357 ns 2.6364 ns]
change: [+54.015% +54.066% +54.109%] (p = 0.00 < 0.05)
Performance has regressed.
Found 16017 outliers among 100000 measurements (16.02%)
1602 (1.60%) low mild
10054 (10.05%) high mild
4361 (4.36%) high severe
samplei8/Pcg32/single time: [1.4994 ns 1.5000 ns 1.5005 ns]
change: [-11.112% -11.060% -11.008%] (p = 0.00 < 0.05)
Performance has improved.
Found 34661 outliers among 100000 measurements (34.66%)
1486 (1.49%) low severe
20299 (20.30%) low mild
68 (0.07%) high mild
12808 (12.81%) high severe
Benchmarking samplei8/Pcg32/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.3s, enable flat sampling, or reduce sample count to 53100.
samplei8/Pcg32/distr time: [1.0578 ns 1.0580 ns 1.0582 ns]
change: [-15.545% -15.496% -15.430%] (p = 0.00 < 0.05)
Performance has improved.
Found 25037 outliers among 100000 measurements (25.04%)
5278 (5.28%) low severe
113 (0.11%) low mild
2061 (2.06%) high mild
17585 (17.59%) high severe
samplei8/Pcg64/single time: [1.9000 ns 1.9004 ns 1.9009 ns]
change: [-10.787% -10.738% -10.688%] (p = 0.00 < 0.05)
Performance has improved.
Found 4558 outliers among 100000 measurements (4.56%)
343 (0.34%) high mild
4215 (4.21%) high severe
samplei8/Pcg64/distr time: [1.4648 ns 1.4649 ns 1.4651 ns]
change: [+2.0833% +2.0954% +2.1074%] (p = 0.00 < 0.05)
Performance has regressed.
Found 12686 outliers among 100000 measurements (12.69%)
7611 (7.61%) low severe
50 (0.05%) low mild
10 (0.01%) high mild
5015 (5.01%) high severe

samplei16/SmallRng/single
time: [1.6958 ns 1.6961 ns 1.6963 ns]
change: [-11.022% -11.008% -10.994%] (p = 0.00 < 0.05)
Performance has improved.
Found 3109 outliers among 100000 measurements (3.11%)
2 (0.00%) high mild
3107 (3.11%) high severe
samplei16/SmallRng/distr
time: [1.9426 ns 1.9432 ns 1.9437 ns]
change: [+81.022% +81.160% +81.268%] (p = 0.00 < 0.05)
Performance has regressed.
Found 30150 outliers among 100000 measurements (30.15%)
12940 (12.94%) low severe
17210 (17.21%) high severe
samplei16/ChaCha8Rng/single
time: [1.8547 ns 1.8553 ns 1.8559 ns]
change: [-9.1088% -9.0724% -9.0349%] (p = 0.00 < 0.05)
Performance has improved.
Found 14926 outliers among 100000 measurements (14.93%)
8618 (8.62%) low mild
2771 (2.77%) high mild
3537 (3.54%) high severe
samplei16/ChaCha8Rng/distr
time: [2.6924 ns 2.6935 ns 2.6947 ns]
change: [+50.625% +50.699% +50.754%] (p = 0.00 < 0.05)
Performance has regressed.
Found 4189 outliers among 100000 measurements (4.19%)
974 (0.97%) high mild
3215 (3.21%) high severe
samplei16/Pcg32/single time: [1.5690 ns 1.5693 ns 1.5696 ns]
change: [-7.5591% -7.5113% -7.4643%] (p = 0.00 < 0.05)
Performance has improved.
Found 10941 outliers among 100000 measurements (10.94%)
1206 (1.21%) low severe
15 (0.01%) low mild
91 (0.09%) high mild
9629 (9.63%) high severe
Benchmarking samplei16/Pcg32/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.2s, enable flat sampling, or reduce sample count to 53730.
samplei16/Pcg32/distr time: [1.0367 ns 1.0369 ns 1.0370 ns]
change: [-16.514% -16.471% -16.428%] (p = 0.00 < 0.05)
Performance has improved.
Found 7284 outliers among 100000 measurements (7.28%)
9 (0.01%) low severe
2537 (2.54%) high mild
4738 (4.74%) high severe
samplei16/Pcg64/single time: [1.7850 ns 1.7854 ns 1.7857 ns]
change: [-6.4245% -6.4057% -6.3846%] (p = 0.00 < 0.05)
Performance has improved.
Found 5112 outliers among 100000 measurements (5.11%)
38 (0.04%) high mild
5074 (5.07%) high severe
samplei16/Pcg64/distr time: [1.4695 ns 1.4695 ns 1.4696 ns]
change: [+2.4432% +2.4613% +2.4796%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3202 outliers among 100000 measurements (3.20%)
12 (0.01%) high mild
3190 (3.19%) high severe

samplei32/SmallRng/single
time: [2.9643 ns 2.9717 ns 2.9790 ns]
change: [-40.218% -40.027% -39.812%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei32/SmallRng/distr
time: [2.1418 ns 2.1468 ns 2.1520 ns]
change: [+14.158% +14.625% +15.127%] (p = 0.00 < 0.05)
Performance has regressed.
samplei32/ChaCha8Rng/single
time: [3.4354 ns 3.4436 ns 3.4519 ns]
change: [-41.166% -40.973% -40.797%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/ChaCha8Rng/distr
time: [4.3137 ns 4.3205 ns 4.3273 ns]
change: [+82.252% +83.029% +83.739%] (p = 0.00 < 0.05)
Performance has regressed.
Found 17 outliers among 100000 measurements (0.02%)
17 (0.02%) high mild
samplei32/Pcg32/single time: [2.8065 ns 2.8139 ns 2.8213 ns]
change: [-41.459% -41.254% -41.074%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg32/distr time: [2.2714 ns 2.2769 ns 2.2826 ns]
change: [+7.8178% +8.2767% +8.7563%] (p = 0.00 < 0.05)
Performance has regressed.
samplei32/Pcg64/single time: [3.5373 ns 3.5460 ns 3.5546 ns]
change: [-35.538% -35.317% -35.115%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg64/distr time: [2.8174 ns 2.8238 ns 2.8304 ns]
change: [+11.798% +12.255% +12.692%] (p = 0.00 < 0.05)
Performance has regressed.

samplei64/SmallRng/single
time: [4.4161 ns 4.4223 ns 4.4285 ns]
change: [-25.650% -25.501% -25.338%] (p = 0.00 < 0.05)
Performance has improved.
Found 8 outliers among 100000 measurements (0.01%)
8 (0.01%) high mild
samplei64/SmallRng/distr
time: [1.9359 ns 1.9407 ns 1.9454 ns]
change: [+9.9079% +10.397% +10.890%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high severe
samplei64/ChaCha8Rng/single
time: [5.7185 ns 5.7264 ns 5.7344 ns]
change: [-27.418% -27.246% -27.100%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei64/ChaCha8Rng/distr
time: [4.0937 ns 4.1020 ns 4.1103 ns]
change: [+14.257% +14.652% +15.085%] (p = 0.00 < 0.05)
Performance has regressed.
samplei64/Pcg32/single time: [4.9361 ns 4.9447 ns 4.9533 ns]
change: [-30.763% -30.592% -30.414%] (p = 0.00 < 0.05)
Performance has improved.
Found 48 outliers among 100000 measurements (0.05%)
44 (0.04%) high mild
4 (0.00%) high severe
samplei64/Pcg32/distr time: [3.3706 ns 3.3777 ns 3.3847 ns]
change: [+15.104% +15.511% +15.898%] (p = 0.00 < 0.05)
Performance has regressed.
samplei64/Pcg64/single time: [4.7009 ns 4.7079 ns 4.7150 ns]
change: [-28.960% -28.801% -28.651%] (p = 0.00 < 0.05)
Performance has improved.
samplei64/Pcg64/distr time: [2.8317 ns 2.8380 ns 2.8442 ns]
change: [+12.584% +13.020% +13.467%] (p = 0.00 < 0.05)
Performance has regressed.
Found 3 outliers among 100000 measurements (0.00%)
3 (0.00%) high severe

samplei128/SmallRng/single
time: [9.6697 ns 9.6778 ns 9.6860 ns]
change: [-15.933% -15.813% -15.695%] (p = 0.00 < 0.05)
Performance has improved.
Found 20 outliers among 100000 measurements (0.02%)
20 (0.02%) high mild
samplei128/SmallRng/distr
time: [6.7277 ns 6.7370 ns 6.7460 ns]
change: [+18.371% +18.650% +18.908%] (p = 0.00 < 0.05)
Performance has regressed.
Found 95 outliers among 100000 measurements (0.10%)
95 (0.10%) high mild
samplei128/ChaCha8Rng/single
time: [12.092 ns 12.107 ns 12.121 ns]
change: [-16.186% -16.036% -15.883%] (p = 0.00 < 0.05)
Performance has improved.
Found 5 outliers among 100000 measurements (0.01%)
3 (0.00%) high mild
2 (0.00%) high severe
samplei128/ChaCha8Rng/distr
time: [8.8107 ns 8.8237 ns 8.8367 ns]
change: [+12.524% +12.787% +13.051%] (p = 0.00 < 0.05)
Performance has regressed.
samplei128/Pcg32/single time: [10.177 ns 10.190 ns 10.203 ns]
change: [-24.246% -24.126% -23.979%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei128/Pcg32/distr time: [8.3067 ns 8.3188 ns 8.3311 ns]
change: [+16.386% +16.681% +16.965%] (p = 0.00 < 0.05)
Performance has regressed.
Found 100 outliers among 100000 measurements (0.10%)
100 (0.10%) high mild
samplei128/Pcg64/single time: [10.039 ns 10.047 ns 10.056 ns]
change: [-18.984% -18.858% -18.732%] (p = 0.00 < 0.05)
Performance has improved.
Found 47 outliers among 100000 measurements (0.05%)
47 (0.05%) high mild
samplei128/Pcg64/distr time: [7.0425 ns 7.0534 ns 7.0643 ns]
change: [+7.9572% +8.2093% +8.4552%] (p = 0.00 < 0.05)
Performance has regressed.
Found 19 outliers among 100000 measurements (0.02%)
18 (0.02%) high mild
1 (0.00%) high severe

Looks like a decent improvement for single-sampling, but considerably worse for distribution sampling.

New results (unbiased feature)

samplei8/SmallRng/single
time: [1.9076 ns 1.9082 ns 1.9089 ns]
change: [-0.5836% -0.5448% -0.5079%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 22388 outliers among 100000 measurements (22.39%)
18016 (18.02%) low severe
44 (0.04%) low mild
22 (0.02%) high mild
4306 (4.31%) high severe
Benchmarking samplei8/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.6s, enable flat sampling, or reduce sample count to 51640.
samplei8/SmallRng/distr time: [1.1186 ns 1.1188 ns 1.1189 ns]
change: [+1.0264% +1.1032% +1.1788%] (p = 0.00 < 0.05)
Performance has regressed.
Found 7570 outliers among 100000 measurements (7.57%)
186 (0.19%) low severe
3 (0.00%) low mild
2649 (2.65%) high mild
4732 (4.73%) high severe
samplei8/ChaCha8Rng/single
time: [2.0471 ns 2.0477 ns 2.0483 ns]
change: [-12.214% -12.133% -12.059%] (p = 0.00 < 0.05)
Performance has improved.
Found 14935 outliers among 100000 measurements (14.94%)
2 (0.00%) low severe
642 (0.64%) low mild
5812 (5.81%) high mild
8479 (8.48%) high severe
samplei8/ChaCha8Rng/distr
time: [1.7169 ns 1.7173 ns 1.7176 ns]
change: [+0.3602% +0.3809% +0.4034%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4558 outliers among 100000 measurements (4.56%)
138 (0.14%) low mild
2210 (2.21%) high mild
2210 (2.21%) high severe
samplei8/Pcg32/single time: [1.7169 ns 1.7175 ns 1.7181 ns]
change: [+1.7780% +1.8401% +1.8967%] (p = 0.00 < 0.05)
Performance has regressed.
Found 30316 outliers among 100000 measurements (30.32%)
6836 (6.84%) low mild
920 (0.92%) high mild
22560 (22.56%) high severe
samplei8/Pcg32/distr time: [1.2623 ns 1.2624 ns 1.2626 ns]
change: [+0.6649% +0.6766% +0.6904%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 27891 outliers among 100000 measurements (27.89%)
19384 (19.38%) low severe
75 (0.07%) low mild
7 (0.01%) high mild
8425 (8.43%) high severe
samplei8/Pcg64/single time: [2.0078 ns 2.0088 ns 2.0097 ns]
change: [-5.7091% -5.6495% -5.5883%] (p = 0.00 < 0.05)
Performance has improved.
Found 16243 outliers among 100000 measurements (16.24%)
8554 (8.55%) high mild
7689 (7.69%) high severe
samplei8/Pcg64/distr time: [1.4221 ns 1.4222 ns 1.4224 ns]
change: [-0.8919% -0.8796% -0.8670%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6857 outliers among 100000 measurements (6.86%)
17 (0.02%) low severe
3 (0.00%) high mild
6837 (6.84%) high severe

samplei16/SmallRng/single
time: [1.7045 ns 1.7048 ns 1.7052 ns]
change: [-10.573% -10.550% -10.531%] (p = 0.00 < 0.05)
Performance has improved.
Found 45291 outliers among 100000 measurements (45.29%)
20430 (20.43%) low severe
24861 (24.86%) high severe
Benchmarking samplei16/SmallRng/distr: Warming up for 1.0000 s
Warning: Unable to complete 100000 samples in 3.0s. You may wish to increase target time to 5.5s, enable flat sampling, or reduce sample count to 52320.
samplei16/SmallRng/distr
time: [1.0812 ns 1.0815 ns 1.0819 ns]
change: [+0.5067% +0.6014% +0.6705%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1152 outliers among 100000 measurements (1.15%)
290 (0.29%) high mild
862 (0.86%) high severe
samplei16/ChaCha8Rng/single
time: [1.9882 ns 1.9886 ns 1.9890 ns]
change: [-2.5715% -2.5408% -2.5091%] (p = 0.00 < 0.05)
Performance has improved.
Found 8702 outliers among 100000 measurements (8.70%)
301 (0.30%) low mild
4494 (4.49%) high mild
3907 (3.91%) high severe
samplei16/ChaCha8Rng/distr
time: [1.7356 ns 1.7360 ns 1.7363 ns]
change: [-2.8998% -2.8740% -2.8467%] (p = 0.00 < 0.05)
Performance has improved.
Found 13632 outliers among 100000 measurements (13.63%)
2 (0.00%) low mild
12538 (12.54%) high mild
1092 (1.09%) high severe
samplei16/Pcg32/single time: [1.5160 ns 1.5162 ns 1.5164 ns]
change: [-10.682% -10.638% -10.594%] (p = 0.00 < 0.05)
Performance has improved.
Found 26466 outliers among 100000 measurements (26.47%)
13076 (13.08%) low severe
217 (0.22%) low mild
229 (0.23%) high mild
12944 (12.94%) high severe
samplei16/Pcg32/distr time: [1.2681 ns 1.2684 ns 1.2686 ns]
change: [+1.7735% +1.7986% +1.8233%] (p = 0.00 < 0.05)
Performance has regressed.
Found 23148 outliers among 100000 measurements (23.15%)
208 (0.21%) low severe
1 (0.00%) low mild
84 (0.08%) high mild
22855 (22.86%) high severe
samplei16/Pcg64/single time: [1.9071 ns 1.9077 ns 1.9083 ns]
change: [-0.0228% +0.0049% +0.0417%] (p = 0.74 > 0.05)
No change in performance detected.
Found 36106 outliers among 100000 measurements (36.11%)
18848 (18.85%) low severe
1174 (1.17%) low mild
88 (0.09%) high mild
15996 (16.00%) high severe
samplei16/Pcg64/distr time: [1.4509 ns 1.4511 ns 1.4514 ns]
change: [+1.1517% +1.1772% +1.2000%] (p = 0.00 < 0.05)
Performance has regressed.
Found 18304 outliers among 100000 measurements (18.30%)
5057 (5.06%) low severe
13 (0.01%) low mild
21 (0.02%) high mild
13213 (13.21%) high severe

samplei32/SmallRng/single
time: [3.6718 ns 3.6817 ns 3.6919 ns]
change: [-25.935% -25.697% -25.453%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/SmallRng/distr
time: [1.8915 ns 1.8981 ns 1.9048 ns]
change: [+0.8519% +1.3457% +1.7827%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8712 outliers among 100000 measurements (8.71%)
5349 (5.35%) high mild
3363 (3.36%) high severe
samplei32/ChaCha8Rng/single
time: [4.3550 ns 4.3663 ns 4.3772 ns]
change: [-25.412% -25.156% -24.906%] (p = 0.00 < 0.05)
Performance has improved.
Found 4 outliers among 100000 measurements (0.00%)
4 (0.00%) high mild
samplei32/ChaCha8Rng/distr
time: [2.7254 ns 2.7344 ns 2.7433 ns]
change: [+15.251% +15.837% +16.393%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8503 outliers among 100000 measurements (8.50%)
6055 (6.05%) high mild
2448 (2.45%) high severe
samplei32/Pcg32/single time: [3.6329 ns 3.6435 ns 3.6541 ns]
change: [-24.192% -23.935% -23.661%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg32/distr time: [2.1075 ns 2.1145 ns 2.1215 ns]
change: [+0.0819% +0.5507% +1.0484%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 9153 outliers among 100000 measurements (9.15%)
5330 (5.33%) high mild
3823 (3.82%) high severe
samplei32/Pcg64/single time: [4.2954 ns 4.3080 ns 4.3205 ns]
change: [-21.711% -21.416% -21.098%] (p = 0.00 < 0.05)
Performance has improved.
samplei32/Pcg64/distr time: [2.4684 ns 2.4771 ns 2.4858 ns]
change: [-1.9894% -1.5294% -1.0481%] (p = 0.00 < 0.05)
Performance has improved.
Found 9636 outliers among 100000 measurements (9.64%)
5199 (5.20%) high mild
4437 (4.44%) high severe

samplei64/SmallRng/single
time: [5.2532 ns 5.2629 ns 5.2723 ns]
change: [-11.567% -11.342% -11.128%] (p = 0.00 < 0.05)
Performance has improved.
samplei64/SmallRng/distr
time: [1.7670 ns 1.7737 ns 1.7806 ns]
change: [+0.3709% +0.8994% +1.5000%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 9659 outliers among 100000 measurements (9.66%)
5357 (5.36%) high mild
4302 (4.30%) high severe
samplei64/ChaCha8Rng/single
time: [6.7406 ns 6.7520 ns 6.7634 ns]
change: [-14.394% -14.216% -14.015%] (p = 0.00 < 0.05)
Performance has improved.
samplei64/ChaCha8Rng/distr
time: [3.5487 ns 3.5596 ns 3.5705 ns]
change: [-0.9413% -0.5079% -0.0649%] (p = 0.02 < 0.05)
Change within noise threshold.
Found 8781 outliers among 100000 measurements (8.78%)
5848 (5.85%) high mild
2933 (2.93%) high severe
samplei64/Pcg32/single time: [5.7866 ns 5.7992 ns 5.8118 ns]
change: [-18.870% -18.598% -18.380%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high mild
samplei64/Pcg32/distr time: [3.2564 ns 3.2654 ns 3.2744 ns]
change: [+11.243% +11.670% +12.117%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8838 outliers among 100000 measurements (8.84%)
5754 (5.75%) high mild
3084 (3.08%) high severe
samplei64/Pcg64/single time: [5.6715 ns 5.6825 ns 5.6933 ns]
change: [-14.281% -14.061% -13.872%] (p = 0.00 < 0.05)
Performance has improved.
Found 1 outliers among 100000 measurements (0.00%)
1 (0.00%) high severe
samplei64/Pcg64/distr time: [2.5092 ns 2.5177 ns 2.5262 ns]
change: [-0.1602% +0.2676% +0.7783%] (p = 0.26 > 0.05)
No change in performance detected.
Found 9574 outliers among 100000 measurements (9.57%)
5084 (5.08%) high mild
4490 (4.49%) high severe

samplei128/SmallRng/single
time: [10.410 ns 10.423 ns 10.436 ns]
change: [-9.4864% -9.3321% -9.1484%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100000 measurements (0.01%)
10 (0.01%) high mild
samplei128/SmallRng/distr
time: [5.6088 ns 5.6180 ns 5.6275 ns]
change: [-1.2694% -1.0568% -0.8129%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8739 outliers among 100000 measurements (8.74%)
5882 (5.88%) high mild
2857 (2.86%) high severe
samplei128/ChaCha8Rng/single
time: [12.332 ns 12.349 ns 12.366 ns]
change: [-14.509% -14.355% -14.176%] (p = 0.00 < 0.05)
Performance has improved.
Found 13 outliers among 100000 measurements (0.01%)
13 (0.01%) high mild
samplei128/ChaCha8Rng/distr
time: [7.9089 ns 7.9233 ns 7.9378 ns]
change: [+0.9915% +1.2787% +1.5530%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8295 outliers among 100000 measurements (8.29%)
6697 (6.70%) high mild
1598 (1.60%) high severe
samplei128/Pcg32/single time: [11.405 ns 11.422 ns 11.440 ns]
change: [-15.114% -14.949% -14.783%] (p = 0.00 < 0.05)
Performance has improved.
samplei128/Pcg32/distr time: [7.4301 ns 7.4443 ns 7.4582 ns]
change: [+4.1310% +4.4150% +4.7024%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8602 outliers among 100000 measurements (8.60%)
6161 (6.16%) high mild
2441 (2.44%) high severe
samplei128/Pcg64/single time: [10.496 ns 10.511 ns 10.526 ns]
change: [-15.273% -15.113% -14.953%] (p = 0.00 < 0.05)
Performance has improved.
samplei128/Pcg64/distr time: [6.3761 ns 6.3885 ns 6.4010 ns]
change: [-2.2301% -1.9907% -1.7150%] (p = 0.00 < 0.05)
Performance has improved.
Found 8753 outliers among 100000 measurements (8.75%)
5805 (5.80%) high mild
2948 (2.95%) high severe

These look not-quite-as-good for single-sampling (but still an improvement), and significantly better for distribution sampling...

...I hate micro-benchmarking (see results in #1286). Looks like we should just use Lemire's method for distribution sampling in all cases.
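
For context, a minimal sketch of Lemire's method used for distribution sampling, where the rejection threshold is computed once at construction and reused for every sample. The struct, its field names and the `next_u64` closure are assumptions for this example; how rand stores this state internally is not shown in this thread.

```rust
/// Widening multiply: the (high, low) 64-bit halves of `a * b`.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = u128::from(a) * u128::from(b);
    ((p >> 64) as u64, p as u64)
}

/// Precomputed state for repeated sampling from one range
/// (illustrative names, not the rand `Uniform` type).
struct UniformU64 {
    low: u64,
    range: u64,  // 0 encodes the full u64 range
    thresh: u64, // 2^64 mod range
}

impl UniformU64 {
    fn new_inclusive(low: u64, high: u64) -> Self {
        assert!(low <= high, "empty range");
        let range = high.wrapping_sub(low).wrapping_add(1);
        // The modulo is paid for once here rather than on every sample.
        let thresh = if range == 0 { 0 } else { range.wrapping_neg() % range };
        UniformU64 { low, range, thresh }
    }

    /// `next_u64` is a hypothetical stand-in for an RNG.
    fn sample(&self, mut next_u64: impl FnMut() -> u64) -> u64 {
        if self.range == 0 {
            return next_u64();
        }
        loop {
            let (hi, lo) = wmul(next_u64(), self.range);
            // Reject the `thresh` lowest fractional values, which would
            // otherwise over-represent some outputs.
            if lo >= self.thresh {
                return self.low.wrapping_add(hi);
            }
        }
    }
}
```

Because the threshold is cached, the per-sample cost is one multiplication and one comparison, with a retry only on rejection. That is one plausible reason it compares well in the `distr` benchmarks above.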

vks (Collaborator) commented Feb 17, 2023

> Looks like we should just use Lemire's method for distribution sampling in all cases.

Agreed, especially if it is less biased.

dhardy (Member, Author) commented Feb 18, 2023

Bench re-runs (lower clock speed, better formatted): results.ods

Highlights:

| Benchmark | biased vs base | unbiased vs base | unbiased vs biased |
|---|---|---|---|
| samplei8 ChaCha8Rng distr | 57.00% | 0.10% | -36.30% |
| samplei8 Pcg32 distr | -16.70% | 0.50% | 20.60% |
| samplei8 Pcg64 distr | 2.10% | 0.10% | -2.00% |
| samplei8 SmallRng distr | 70.60% | 1.60% | -40.50% |
| samplei16 ChaCha8Rng distr | 49.00% | -0.20% | -33.00% |
| samplei16 Pcg32 distr | -16.60% | 0.00% | 20.00% |
| samplei16 Pcg64 distr | 2.40% | 0.30% | -2.10% |
| samplei16 SmallRng distr | 74.20% | -2.90% | -44.30% |
| samplei32 ChaCha8Rng distr | 80.80% | 13.50% | -37.20% |
| samplei32 Pcg32 distr | 8.20% | 0.30% | -7.30% |
| samplei32 Pcg64 distr | 11.60% | -0.80% | -11.10% |
| samplei32 SmallRng distr | 12.50% | -0.10% | -11.20% |
| samplei64 ChaCha8Rng distr | 13.30% | -0.50% | -12.20% |
| samplei64 Pcg32 distr | 14.60% | 11.30% | -2.90% |
| samplei64 Pcg64 distr | 12.40% | -0.40% | -11.40% |
| samplei64 SmallRng distr | 11.70% | 0.40% | -10.10% |
| samplei128 ChaCha8Rng distr | 12.50% | -1.40% | -12.30% |
| samplei128 Pcg32 distr | 14.40% | 2.00% | -10.90% |
| samplei128 Pcg64 distr | 9.30% | -2.10% | -10.40% |
| samplei128 SmallRng distr | 20.00% | -0.80% | -17.30% |
| samplei8 ChaCha8Rng single | -9.70% | -11.10% | -1.60% |
| samplei8 Pcg32 single | -12.80% | -0.10% | 14.60% |
| samplei8 Pcg64 single | -8.80% | -5.60% | 3.60% |
| samplei8 SmallRng single | -21.50% | 0.00% | 27.40% |
| samplei16 ChaCha8Rng single | -9.40% | -0.80% | 9.50% |
| samplei16 Pcg32 single | -7.30% | -12.10% | -5.20% |
| samplei16 Pcg64 single | -7.80% | -1.80% | 6.50% |
| samplei16 SmallRng single | -11.00% | -11.10% | -0.10% |
| samplei32 ChaCha8Rng single | -41.70% | -24.90% | 28.90% |
| samplei32 Pcg32 single | -42.20% | -25.30% | 29.20% |
| samplei32 Pcg64 single | -36.50% | -22.20% | 22.60% |
| samplei32 SmallRng single | -40.20% | -25.10% | 25.30% |
| samplei64 ChaCha8Rng single | -26.70% | -12.70% | 19.00% |
| samplei64 Pcg32 single | -29.50% | -18.00% | 16.30% |
| samplei64 Pcg64 single | -28.80% | -15.60% | 18.60% |
| samplei64 SmallRng single | -26.20% | -12.40% | 18.80% |
| samplei128 ChaCha8Rng single | -17.50% | -14.80% | 3.20% |
| samplei128 Pcg32 single | -25.50% | -16.60% | 11.90% |
| samplei128 Pcg64 single | -20.40% | -16.00% | 5.60% |
| samplei128 SmallRng single | -14.60% | -9.90% | 5.50% |

So, yes, this supports the idea that we should always use Lemire's method for distribution sampling.
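
The "unbiased vs biased" figures appear consistent with combining the two "vs base" figures as a ratio of run times. That relationship is inferred from the numbers, not stated in the thread, and `relative_change` is a name made up for this sketch.

```rust
/// Change of B relative to A, given each one's fractional change
/// against a shared baseline (e.g. 0.57 for +57%).
/// Hypothetical helper for checking the highlights above by hand.
fn relative_change(a_vs_base: f64, b_vs_base: f64) -> f64 {
    (1.0 + b_vs_base) / (1.0 + a_vs_base) - 1.0
}
```

For the samplei8 ChaCha8Rng distr figures, `relative_change(0.57, 0.001)` gives roughly -0.362, against a reported -36.30%. Small differences are expected if the reported values were derived from unrounded timings.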

dhardy (Member, Author) commented Feb 18, 2023

Remaining question: whether to keep both biased and unbiased options for single-sampling (using a feature flag). See #494. I am inclined to keep this under the following conditions:

  • Biased is the default (otherwise it is an optimisation that will likely get little use, so why bother).
  • Only the default option is tested by value-stability tests. (Currently achieved by only build-testing with "unbiased" enabled.)

There is no strong rationale for this, however; we could reduce to just one implementation (either one).
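
To make the trade-off concrete, here is a sketch of how a compile-time feature could switch single-sampling between the capped and the unlimited variant of Canon's method. The feature name "unbiased" comes from this PR; the function names, the `unlimited` parameter and the use of `cfg!` are assumptions for illustration and may not match how the crate wires it up.

```rust
/// Widening multiply: the (high, low) 64-bit halves of `a * b`.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let p = u128::from(a) * u128::from(b);
    ((p >> 64) as u64, p as u64)
}

/// Sample from `0..range` (range > 0). With `unlimited == false` at most
/// two draws are used; otherwise draws continue until the carry is decided.
/// `next_u64` is a hypothetical stand-in for an RNG.
fn canon(mut next_u64: impl FnMut() -> u64, range: u64, unlimited: bool) -> u64 {
    assert!(range > 0);
    let (mut result, mut lo) = wmul(next_u64(), range);
    // Further draws matter only while a carry into `result` is possible.
    while lo > range.wrapping_neg() {
        let (new_hi, new_lo) = wmul(next_u64(), range);
        match lo.checked_add(new_hi) {
            // Overflow: the carry is certain.
            None => {
                result += 1;
                break;
            }
            // Below the maximum: no later digit can cause a carry.
            Some(sum) if sum < u64::MAX => break,
            // Exactly the maximum: still undecided.
            Some(_) => {
                if !unlimited {
                    break; // biased variant: accept the small remaining bias
                }
                lo = new_lo;
            }
        }
    }
    result
}

/// Selects the variant at compile time (assumed wiring).
fn sample_single(next_u64: impl FnMut() -> u64, range: u64) -> u64 {
    canon(next_u64, range, cfg!(feature = "unbiased"))
}
```

Since the two variants can return different values for the same RNG output, only one of them can match a fixed set of value-stability expectations. That fits the condition above that only the default option is covered by those tests.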

dhardy marked this pull request as ready for review February 20, 2023 10:07

dhardy (Member, Author) commented Feb 20, 2023

I'm inclined to merge this as-is. Review please, maybe @TheIronBorn or @vks?

dhardy (Member, Author) commented Feb 21, 2023

Thanks @TheIronBorn. Updated.

dhardy (Member, Author) commented Mar 23, 2023

I'd like to merge this but am still waiting for a reviewer to approve (policy requires that the review not be done by the author). @TheIronBorn you last reviewed this; would you mind revisiting?
