Sampling benches in rand* 0.9.0-alpha.0 #1409
Comments
It would be useful to know what CPU you ran this on, your RUSTFLAGS, etc. Also note, if you haven't already, that Canon's method is only used for |
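For context on what is being benchmarked: Canon's method, as it is usually described, multiplies a random word by the range size, keeps the high half as the result, and only draws more random bits when the low half is close enough to overflowing that a carry could change the answer. Below is a minimal sketch, assuming a 64-bit range and at most one extra draw. `SplitMix64`, `wmul` and `sample_below` are hypothetical names for illustration; this is not rand's API or its actual implementation.

```rust
/// Tiny SplitMix64 generator, included only to keep the sketch self-contained.
struct SplitMix64(u64);

impl SplitMix64 {
    fn next_u64(&mut self) -> u64 {
        self.0 = self.0.wrapping_add(0x9E37_79B9_7F4A_7C15);
        let mut z = self.0;
        z = (z ^ (z >> 30)).wrapping_mul(0xBF58_476D_1CE4_E5B9);
        z = (z ^ (z >> 27)).wrapping_mul(0x94D0_49BB_1331_11EB);
        z ^ (z >> 31)
    }
}

/// Widening multiply: returns the (high, low) 64-bit halves of a * b.
fn wmul(a: u64, b: u64) -> (u64, u64) {
    let m = (a as u128) * (b as u128);
    ((m >> 64) as u64, m as u64)
}

/// Sample a value in 0..range (range must be non-zero).
/// Assumption: a single extra draw is used, so a tiny residual bias remains.
fn sample_below(rng: &mut SplitMix64, range: u64) -> u64 {
    assert!(range > 0, "range must be non-zero");
    let (mut result, lo) = wmul(rng.next_u64(), range);
    // Extra bits can only matter if the low half is within `range` of overflowing.
    if lo > range.wrapping_neg() {
        let (new_hi, _) = wmul(rng.next_u64(), range);
        // A carry out of the low half bumps the result by one.
        if lo.checked_add(new_hi).is_none() {
            result += 1;
        }
    }
    result
}
```

With this sketch, `sample_below(&mut SplitMix64(42), 6)` returns a value in `0..6`. The second draw is rarely needed for small ranges, which is one reason micro-benchmark results can depend heavily on the range, the RNG, and the CPU.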
I did a lot of benchmarking on this, but it was a while back. From what I remember, in many cases there was not a single best option. Also, as @TheIronBorn says, results are likely dependent on your CPU architecture. See my merge PR with some benchmarks and links to others: #1287 |
$ rustc --print target-cpus | head -n2
Available CPUs for this target:
native - Select the CPU of the current host (currently znver3).
$ rustup show -v
Default host: x86_64-unknown-linux-gnu
rustup home: /home/thell/.rustup
installed toolchains
--------------------
stable-x86_64-pc-windows-gnu
(rustc does not exist)
stable-x86_64-unknown-linux-gnu (default)
rustc 1.76.0 (07dca489a 2024-02-04)
nightly-x86_64-unknown-linux-gnu
rustc 1.78.0-nightly (9c3ad802d 2024-03-07)
active toolchain
----------------
stable-x86_64-unknown-linux-gnu (default)
rustc 1.76.0 (07dca489a 2024-02-04)
$ git pull
$ git checkout dba696e9 -b pre-alpha
$ git reset --hard && git clean -fdx
$ RUSTFLAGS='-C target-cpu=native' rustup run nightly cargo bench --bench uniform --features small_rng -- --save-baseline pre-alpha
$ git reset --hard
$ git checkout -b alpha 0.9.0-alpha.0
$ ls ./target/c*
samplei128 samplei16 samplei32 samplei64 samplei8
$ RUSTFLAGS='-C target-cpu=native' rustup run nightly cargo bench --bench uniform --features small_rng -- --save-baseline alpha
$ RUSTFLAGS='-C target-cpu=native' rustup run nightly cargo bench --bench uniform --features small_rng -- --load-baseline pre-alpha --baseline alpha
I wasn't sure how to show the console output in color, so I just put all of it in here. Almost all of the i32 and i64 tests are within the default noise threshold, the exception being ChaCha8, which regressed. Perhaps something was changed at some earlier point and I should re-test using the current stable release tag?
Results
Running benches/uniform.rs (target/release/deps/uniform-1dbdf69be6706036)
samplei8/SmallRng/single
time: [1.9008 ns 1.9021 ns 1.9037 ns]
change: [+0.2405% +0.3151% +0.4014%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6317 outliers among 100000 measurements (6.32%)
3300 (3.30%) high mild
3017 (3.02%) high severe
samplei8/SmallRng/distr time: [1.0950 ns 1.0956 ns 1.0964 ns]
change: [-0.0325% +0.1498% +0.3289%] (p = 0.12 > 0.05)
No change in performance detected.
Found 16083 outliers among 100000 measurements (16.08%)
3394 (3.39%) high mild
12689 (12.69%) high severe
samplei8/ChaCha8Rng/single
time: [2.0377 ns 2.0382 ns 2.0386 ns]
change: [-1.4719% -1.4391% -1.4090%] (p = 0.00 < 0.05)
Performance has improved.
Found 3413 outliers among 100000 measurements (3.41%)
2265 (2.27%) high mild
1148 (1.15%) high severe
samplei8/ChaCha8Rng/distr
time: [1.7296 ns 1.7299 ns 1.7302 ns]
change: [-1.5488% -1.5209% -1.4961%] (p = 0.00 < 0.05)
Performance has improved.
Found 1695 outliers among 100000 measurements (1.70%)
22 (0.02%) low mild
380 (0.38%) high mild
1293 (1.29%) high severe
samplei8/Pcg32/single time: [1.6675 ns 1.6686 ns 1.6704 ns]
change: [+9.3987% +9.4737% +9.6010%] (p = 0.00 < 0.05)
Performance has regressed.
Found 1686 outliers among 100000 measurements (1.69%)
201 (0.20%) high mild
1485 (1.49%) high severe
samplei8/Pcg32/distr time: [1.0939 ns 1.0941 ns 1.0942 ns]
change: [+0.1939% +0.3701% +0.5310%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 3666 outliers among 100000 measurements (3.67%)
998 (1.00%) high mild
2668 (2.67%) high severe
samplei8/Pcg64/single time: [1.9474 ns 1.9489 ns 1.9506 ns]
change: [+1.4925% +1.5704% +1.6890%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8212 outliers among 100000 measurements (8.21%)
6587 (6.59%) high mild
1625 (1.62%) high severe
samplei8/Pcg64/distr time: [1.5320 ns 1.5324 ns 1.5328 ns]
change: [+0.2939% +0.3313% +0.3705%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 6196 outliers among 100000 measurements (6.20%)
3 (0.00%) high mild
6193 (6.19%) high severe
samplei16/SmallRng/single
time: [1.6374 ns 1.6379 ns 1.6384 ns]
change: [-11.384% -11.323% -11.275%] (p = 0.00 < 0.05)
Performance has improved.
Found 1746 outliers among 100000 measurements (1.75%)
136 (0.14%) high mild
1610 (1.61%) high severe
samplei16/SmallRng/distr
time: [1.1045 ns 1.1047 ns 1.1050 ns]
change: [+0.3058% +0.4961% +0.6822%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 4429 outliers among 100000 measurements (4.43%)
1074 (1.07%) high mild
3355 (3.35%) high severe
samplei16/ChaCha8Rng/single
time: [1.8959 ns 1.8963 ns 1.8966 ns]
change: [+1.1191% +1.1439% +1.1704%] (p = 0.00 < 0.05)
Performance has regressed.
Found 19919 outliers among 100000 measurements (19.92%)
18434 (18.43%) high mild
1485 (1.49%) high severe
samplei16/ChaCha8Rng/distr
time: [1.7354 ns 1.7367 ns 1.7386 ns]
change: [+0.1916% +0.2717% +0.3957%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 2411 outliers among 100000 measurements (2.41%)
15 (0.01%) low mild
685 (0.69%) high mild
1711 (1.71%) high severe
samplei16/Pcg32/single time: [1.3995 ns 1.3997 ns 1.4000 ns]
change: [-3.1237% -3.0960% -3.0672%] (p = 0.00 < 0.05)
Performance has improved.
Found 24995 outliers among 100000 measurements (25.00%)
623 (0.62%) low severe
247 (0.25%) low mild
2613 (2.61%) high mild
21512 (21.51%) high severe
samplei16/Pcg32/distr time: [1.0966 ns 1.0968 ns 1.0970 ns]
change: [+0.0429% +0.3030% +0.6548%] (p = 0.04 < 0.05)
Change within noise threshold.
Found 4469 outliers among 100000 measurements (4.47%)
1438 (1.44%) high mild
3031 (3.03%) high severe
samplei16/Pcg64/single time: [1.9126 ns 1.9130 ns 1.9135 ns]
change: [-0.4449% -0.4158% -0.3866%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 1734 outliers among 100000 measurements (1.73%)
202 (0.20%) high mild
1532 (1.53%) high severe
samplei16/Pcg64/distr time: [1.5354 ns 1.5359 ns 1.5364 ns]
change: [+0.9211% +0.9628% +1.0018%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 10652 outliers among 100000 measurements (10.65%)
1393 (1.39%) high mild
9259 (9.26%) high severe
samplei32/SmallRng/single
time: [3.9972 ns 4.0033 ns 4.0097 ns]
change: [-0.5468% -0.3320% -0.1152%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 15 outliers among 100000 measurements (0.01%)
15 (0.01%) high mild
samplei32/SmallRng/distr
time: [2.0955 ns 2.1032 ns 2.1107 ns]
change: [+0.1780% +0.6765% +1.1834%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 8459 outliers among 100000 measurements (8.46%)
6355 (6.36%) high mild
2104 (2.10%) high severe
samplei32/ChaCha8Rng/single
time: [3.6114 ns 3.6196 ns 3.6277 ns]
change: [+0.5737% +0.8999% +1.2095%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7 outliers among 100000 measurements (0.01%)
7 (0.01%) high mild
samplei32/ChaCha8Rng/distr
time: [2.6196 ns 2.6296 ns 2.6392 ns]
change: [+6.6041% +7.1811% +7.7045%] (p = 0.00 < 0.05)
Performance has regressed.
Found 8818 outliers among 100000 measurements (8.82%)
5672 (5.67%) high mild
3146 (3.15%) high severe
samplei32/Pcg32/single time: [3.0424 ns 3.0501 ns 3.0582 ns]
change: [-1.3766% -0.9890% -0.6601%] (p = 0.00 < 0.05)
Change within noise threshold.
samplei32/Pcg32/distr time: [2.1075 ns 2.1153 ns 2.1231 ns]
change: [-1.4113% -0.8950% -0.4133%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8456 outliers among 100000 measurements (8.46%)
6794 (6.79%) high mild
1662 (1.66%) high severe
samplei32/Pcg64/single time: [4.3433 ns 4.3531 ns 4.3632 ns]
change: [-0.1027% +0.2315% +0.5439%] (p = 0.15 > 0.05)
No change in performance detected.
Found 6 outliers among 100000 measurements (0.01%)
3 (0.00%) high mild
3 (0.00%) high severe
samplei32/Pcg64/distr time: [2.8218 ns 2.8318 ns 2.8423 ns]
change: [-0.1475% +0.3556% +0.8927%] (p = 0.18 > 0.05)
No change in performance detected.
Found 8683 outliers among 100000 measurements (8.68%)
5879 (5.88%) high mild
2804 (2.80%) high severe
samplei64/SmallRng/single
time: [4.7586 ns 4.7656 ns 4.7724 ns]
change: [-0.0581% +0.1328% +0.3318%] (p = 0.20 > 0.05)
No change in performance detected.
Found 22 outliers among 100000 measurements (0.02%)
22 (0.02%) high mild
samplei64/SmallRng/distr
time: [1.9820 ns 1.9894 ns 1.9969 ns]
change: [-0.2791% +0.2478% +0.8057%] (p = 0.36 > 0.05)
No change in performance detected.
Found 7729 outliers among 100000 measurements (7.73%)
7696 (7.70%) high mild
33 (0.03%) high severe
samplei64/ChaCha8Rng/single
time: [6.1432 ns 6.1521 ns 6.1609 ns]
change: [+2.1822% +2.3995% +2.5999%] (p = 0.00 < 0.05)
Performance has regressed.
Found 91 outliers among 100000 measurements (0.09%)
82 (0.08%) high mild
9 (0.01%) high severe
samplei64/ChaCha8Rng/distr
time: [3.3616 ns 3.3734 ns 3.3852 ns]
change: [+0.5329% +0.9489% +1.4765%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 8176 outliers among 100000 measurements (8.18%)
6897 (6.90%) high mild
1279 (1.28%) high severe
samplei64/Pcg32/single time: [5.1936 ns 5.2022 ns 5.2108 ns]
change: [-0.5869% -0.3398% -0.0981%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 37 outliers among 100000 measurements (0.04%)
37 (0.04%) high mild
samplei64/Pcg32/distr time: [3.1294 ns 3.1391 ns 3.1494 ns]
change: [-0.1468% +0.2478% +0.7038%] (p = 0.27 > 0.05)
No change in performance detected.
Found 8510 outliers among 100000 measurements (8.51%)
6389 (6.39%) high mild
2121 (2.12%) high severe
samplei64/Pcg64/single time: [5.2153 ns 5.2234 ns 5.2318 ns]
change: [+0.0131% +0.2179% +0.4687%] (p = 0.05 < 0.05)
Change within noise threshold.
Found 14 outliers among 100000 measurements (0.01%)
14 (0.01%) high mild
samplei64/Pcg64/distr time: [2.9215 ns 2.9317 ns 2.9420 ns]
change: [+0.1886% +0.7063% +1.2397%] (p = 0.01 < 0.05)
Change within noise threshold.
Found 8034 outliers among 100000 measurements (8.03%)
6741 (6.74%) high mild
1293 (1.29%) high severe
samplei128/SmallRng/single
time: [9.3967 ns 9.4054 ns 9.4141 ns]
change: [+0.5760% +0.7111% +0.8483%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 60 outliers among 100000 measurements (0.06%)
52 (0.05%) high mild
8 (0.01%) high severe
samplei128/SmallRng/distr
time: [4.0873 ns 4.0978 ns 4.1082 ns]
change: [-0.2938% +0.0917% +0.4264%] (p = 0.62 > 0.05)
No change in performance detected.
Found 8806 outliers among 100000 measurements (8.81%)
6068 (6.07%) high mild
2738 (2.74%) high severe
samplei128/ChaCha8Rng/single
time: [11.655 ns 11.669 ns 11.683 ns]
change: [-2.8671% -2.6973% -2.5594%] (p = 0.00 < 0.05)
Performance has improved.
Found 90 outliers among 100000 measurements (0.09%)
89 (0.09%) high mild
1 (0.00%) high severe
samplei128/ChaCha8Rng/distr
time: [6.3019 ns 6.3170 ns 6.3323 ns]
change: [-0.8811% -0.5615% -0.2195%] (p = 0.00 < 0.05)
Change within noise threshold.
Found 7936 outliers among 100000 measurements (7.94%)
7379 (7.38%) high mild
557 (0.56%) high severe
samplei128/Pcg32/single time: [9.9469 ns 9.9607 ns 9.9746 ns]
change: [-1.5890% -1.3931% -1.1947%] (p = 0.00 < 0.05)
Performance has improved.
Found 10 outliers among 100000 measurements (0.01%)
9 (0.01%) high mild
1 (0.00%) high severe
samplei128/Pcg32/distr time: [6.3234 ns 6.3372 ns 6.3508 ns]
change: [-0.4388% -0.1246% +0.1837%] (p = 0.43 > 0.05)
No change in performance detected.
Found 8658 outliers among 100000 measurements (8.66%)
5817 (5.82%) high mild
2841 (2.84%) high severe
samplei128/Pcg64/single time: [10.006 ns 10.017 ns 10.029 ns]
change: [-2.0306% -1.8609% -1.6901%] (p = 0.00 < 0.05)
Performance has improved.
Found 44 outliers among 100000 measurements (0.04%)
39 (0.04%) high mild
5 (0.01%) high severe
samplei128/Pcg64/distr time: [5.4523 ns 5.4664 ns 5.4804 ns]
change: [-0.0032% +0.3598% +0.7320%] (p = 0.06 > 0.05)
No change in performance detected.
Found 8410 outliers among 100000 measurements (8.41%)
6442 (6.44%) high mild
1968 (1.97%) high severe |
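For anyone reading the verdict lines in the log above: they appear to follow a simple rule based on the p-value and the confidence interval of the change. The sketch below reproduces that rule, assuming a significance level of 0.05 and a noise threshold of 1% (both are assumptions, although they are consistent with every entry in this log). `Verdict` and `classify` are hypothetical names, not part of Criterion's API.

```rust
/// Verdicts matching the wording of the report lines above.
#[derive(Debug, PartialEq)]
enum Verdict {
    NoChange,    // "No change in performance detected."
    WithinNoise, // "Change within noise threshold."
    Improved,    // "Performance has improved."
    Regressed,   // "Performance has regressed."
}

/// Classify a relative change from the bounds of its confidence interval
/// (given as fractions, so 1% is 0.01) and the reported p-value.
fn classify(lower: f64, upper: f64, p_value: f64) -> Verdict {
    const SIGNIFICANCE: f64 = 0.05; // assumed significance level
    const NOISE: f64 = 0.01; // assumed noise threshold (1%)

    if p_value >= SIGNIFICANCE {
        // The difference is not statistically significant.
        Verdict::NoChange
    } else if upper < -NOISE {
        // The whole interval lies below the negative threshold.
        Verdict::Improved
    } else if lower > NOISE {
        // The whole interval lies above the positive threshold.
        Verdict::Regressed
    } else {
        // Significant, but the interval overlaps the noise band.
        Verdict::WithinNoise
    }
}
```

For example, the `samplei16/Pcg64/distr` interval of `[+0.9211% +0.9628% +1.0018%]` is statistically significant but its lower bound sits under 1%, so it is reported as within the noise threshold. The printed p-values are rounded, which is why a line such as `p = 0.05 < 0.05` can show up.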
Yeah, those discussions and benches are why I was excited to give it a try with the uniform sample. |
Sorry, I should have said CPU micro-architecture. Not that I've seen enough data to draw any real conclusions about how the various methods perform on each. Try:
Also, that's a narrow range of commits you picked; there don't appear to be any code changes. Yes, micro-benchmarks can be this inconsistent, unfortunately. |
processor : 0
vendor_id : AuthenticAMD
cpu family : 25
model : 80
model name : AMD Ryzen 7 5700G with Radeon Graphics
stepping : 0
microcode : 0xffffffff
cpu MHz : 3792.776
cache size : 512 KB
physical id : 0
Regarding the range of commits... oh boy, thank you for pointing that out. I bet that's what I did wrong: I was thinking the prepare alpha was the merge (that's what I get for not looking closer). I'll find where the |
Thanks again. That's more like what I was hoping to see.
Since there isn't a uniform bench from prior to the 'canon' commit, I'll just assume it got better too. 😄 Now I'm really looking forward to 0.9.0. I'll close this as user error and, again, let me express my appreciation for your efforts. 👍 |
Hello! I was pretty excited to see the alpha release because of all the hype regarding Canon's method but either I did something wrong or, ..., well 🤷
The first bench is from dba696e on Feb 15 and then the alpha release.
The command used was
samplei32

| | single | distr |
|---|---|---|
| SmallRng | 3.30 ns (✅ 1.00x) | 2.01 ns (✅ 1.00x) |
| SmallRng | 3.31 ns (✅ 1.00x) | 2.07 ns (✅ 1.00x) |
| ChaCha8Rng | 3.59 ns (✅ 1.09x slower) | 2.48 ns (❌ 1.23x slower) |
| ChaCha8Rng | 3.62 ns (✅ 1.09x slower) | 2.66 ns (❌ 1.29x slower) |
| Pcg32 | 2.85 ns (✅ 1.16x faster) | 2.25 ns (❌ 1.12x slower) |
| Pcg32 | 2.93 ns (✅ 1.13x faster) | 2.28 ns (✅ 1.10x slower) |
| Pcg64 | 3.88 ns (❌ 1.18x slower) | 2.66 ns (❌ 1.32x slower) |
| Pcg64 | 3.86 ns (❌ 1.17x slower) | 2.72 ns (❌ 1.32x slower) |

samplei64

| | single | distr |
|---|---|---|
| SmallRng | 4.63 ns (✅ 1.00x) | 1.90 ns (✅ 1.00x) |
| SmallRng | 4.58 ns (✅ 1.00x) | 1.91 ns (✅ 1.00x) |
| ChaCha8Rng | 5.97 ns (❌ 1.29x slower) | 3.60 ns (❌ 1.89x slower) |
| ChaCha8Rng | 5.90 ns (❌ 1.29x slower) | 3.63 ns (❌ 1.90x slower) |
| Pcg32 | 5.29 ns (❌ 1.14x slower) | 3.12 ns (❌ 1.64x slower) |
| Pcg32 | 5.25 ns (❌ 1.15x slower) | 3.12 ns (❌ 1.63x slower) |
| Pcg64 | 5.00 ns (✅ 1.08x slower) | 2.68 ns (❌ 1.40x slower) |
| Pcg64 | 5.05 ns (✅ 1.10x slower) | 2.70 ns (❌ 1.41x slower) |

As you can see, no gains. At least things didn't get worse. I also tested with feature unbiased and I didn't see anything change, but if I understand correctly that shouldn't alter single samples.
So did I miss something?
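A note on reading the table above: the "1.09x slower" style labels seem to compare each RNG against the SmallRng entry of the same run, not one commit against the other. The sketch below shows that calculation, assuming the label is the ratio of the two times rounded to two decimals. `relative_speed` is a hypothetical helper, not taken from any tool used here.

```rust
/// Describe how `candidate_ns` compares to `baseline_ns` (times in nanoseconds).
/// Assumption: the label is the larger time divided by the smaller one,
/// rounded to two decimal places.
fn relative_speed(baseline_ns: f64, candidate_ns: f64) -> String {
    if candidate_ns > baseline_ns {
        format!("{:.2}x slower", candidate_ns / baseline_ns)
    } else if candidate_ns < baseline_ns {
        format!("{:.2}x faster", baseline_ns / candidate_ns)
    } else {
        // Identical times: the baseline row itself.
        "1.00x".to_string()
    }
}
```

For instance, `relative_speed(3.30, 3.59)` gives `"1.09x slower"`, matching the samplei32 ChaCha8Rng single entry. Because the table shows rounded times, recomputing a ratio from them can differ from the printed label by 0.01. To compare the two commits directly, look at adjacent rows for the same RNG (e.g. 3.30 ns vs 3.31 ns for SmallRng), assuming each pair of rows is the dba696e run followed by the alpha run.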