-
Notifications
You must be signed in to change notification settings - Fork 431
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clean up benchmarks #465
Clean up benchmarks #465
Conversation
benches/distributions.rs
Outdated
let mut accum = 0u32; | ||
for _ in 0..::RAND_BENCH_N { | ||
let x: char = distr.sample(&mut rng); | ||
accum = accum.wrapping_add(x as u32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this is the only difference, I think you could just do the same for booleans, though doesn't really matter
for _ in 0..::RAND_BENCH_N { | ||
black_box(rng.gen_bool(p)); | ||
accum ^= rng.gen_bool(p); | ||
p += 0.0001; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Adjusting p seems sensible, but it adds an extra op, making this incomparable with misc_gen_bool_const
. Was the old method not okay?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Although LLVM didn't move the range set-up code out of the inner loop, I see no reason why it couldn't. Changing p
makes sure it can not.
Funny thing: using black_box
before the for loop is slower than incrementing p
each round.
Updated to fold the |
for _ in 0..::RAND_BENCH_N { | ||
black_box(rng.sample(d)); | ||
let d = rand::distributions::Bernoulli::new(p); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I think the point was to check what affect using an unknown p
has over using a constant, not to adjust the distribution each loop and do a single call for each instance.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That is also possible to benchmark, but then we should expect the benchmark to give the same result as the const version. Does not seem like something all that useful to measure to me.
This includes the change as discussed for the
gen_bool
enBernoulli
benchmarks.Also it turns out almost all our uses of
black_box
were unnecessary, it is enough to return a variable or some accumulator at the end of the benchmark run.Results with
cargo benchcmp --threshold 2
:misc_gen_bool
andmisc_bernoulli
benchmarks are clearly different (in part because of usingStdRng
).bool
orchar
using an accumulator gives more realistic results.Hc128Rng
andStdRng
produce results somewhere between 1250 MB/s and 1800 MB/s, without recompilation. Still don't know what causes the variance.black_box
from thegen_bytes
benchmarks, it improves performance a bit more than I think is plausible, but I didn't want to figure out the assembly for those...