Improve shuffle performance #297
Nice to compare against Xorshift.
Wait, your code can shuffle a 1000-element list in 13% more time than it takes to generate a u32 with Xorshift? I don't believe that. Also, I stand by my previous comment that algorithms don't belong in the TODO: investigate why the benchmark is so fast. This doesn't need to land in the 0.5 release. |
I believe the benchmark is right. For a slice with only 1000 elements, two values can be swapped with every round of the RNG. The only extra work is keeping a loop counter, a multiply, and the swap itself. If you increase the benchmark to use slices of, for example, 40 MB, it takes about twice as long, as expected, because it mostly uses the slower loop.
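To illustrate the "two swaps per round of the RNG" point, here is a minimal sketch, not the code from this PR. It assumes a slice with fewer than 2^16 elements, splits each 32-bit output into two 16-bit halves, and maps each half to an index with a widening multiply (slightly biased, no rejection step). `XorShift32`, `index_below` and `shuffle_small` are hypothetical names introduced only for this example.

```rust
/// Tiny xorshift32 generator, included only to keep the sketch self-contained.
/// The seed must be non-zero.
struct XorShift32(u32);

impl XorShift32 {
    fn next_u32(&mut self) -> u32 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 17;
        x ^= x << 5;
        self.0 = x;
        x
    }
}

/// Map 16 random bits to an index in `0..range` with a widening multiply.
/// This is slightly biased unless `range` divides 2^16.
fn index_below(bits: u16, range: u16) -> usize {
    ((bits as u32 * range as u32) >> 16) as usize
}

/// Shuffle a slice of fewer than 2^16 elements, doing two swaps per
/// `next_u32` call (assumption: this small bias is acceptable).
fn shuffle_small<T>(values: &mut [T], rng: &mut XorShift32) {
    assert!(values.len() < (1 << 16));
    let mut i = values.len();
    while i >= 3 {
        let bits = rng.next_u32();
        // First swap uses the low 16 bits.
        i -= 1;
        values.swap(i, index_below(bits as u16, (i + 1) as u16));
        // Second swap uses the high 16 bits of the same output.
        i -= 1;
        values.swap(i, index_below((bits >> 16) as u16, (i + 1) as u16));
    }
    if i == 2 {
        // One position left to decide: index 1 swaps with 0 or stays.
        values.swap(1, index_below(rng.next_u32() as u16, 2));
    }
}
```

Calling `shuffle_small(&mut v, &mut XorShift32(0x9E37_79B9))` on a 1000-element vector makes roughly 500 calls to `next_u32`, which is why the per-element cost in the benchmark can look so low.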
Am I missing something? |
Ah, I suppose I have played too much with the RNG benchmarks. I just assumed the factor 1000 and didn't even think to mention it 😄. |
Other than the details mentioned about `i`, I'm happy with the rough shape of the algorithm.
I'm not really happy about putting a non-trivial algorithm within `Rng` though. Hopefully later we can find another place for this; in the meantime perhaps a private function would do.
I'm satisfied the biases would not be significant for most uses, but e.g. `std` has `sort` and `sort_unstable`, i.e. the default version is the safest version, so perhaps we should have `shuffle` and `shuffle_fast` (or `shuffle_biased`)?
src/lib.rs
Outdated
}

let mut i = i as u16;
while i >= 2 {
Shouldn't this be `while i >= 5` with special cases for the last choices? This algorithm could wrap `i`. That also implies the guards on the above while loops may need changing.
That is also possible. My thought was to handle the special cases before starting a loop. It may not matter either way.
You are getting creative with the labels 😄.
I agree. At the moment I am trying out a single trait that implements functions like dhardy#82 (comment). Curious to see if it works at all... Also can we ignore this PR for a while? I am getting less convinced by my own arguments that the bias is small 😄. The range is getting 1 step smaller every time. Is there another way to determine the acceptable |
Yes, sure, it doesn't really matter if this is left until after 0.5. Huh, I was trying to make labels actually convey useful information at a glance. Slightly useful maybe. |
Seems like a success to me |
I have done several tries with a So I have only kept the idea here to use smaller integers than Benchmark results:
Maybe moving the function to the What do you think? |
Also, if you want to ignore this until after 0.5, feel free to do so. |
I have some catching up to do, sorry. I'd like to get a 0.5 pre-release cut very soon; I guess non-ABI breaking changes like this could potentially still make the final 0.5 anyway. |
Not sure if you've got your range off by 1. Other than that, it looks okay. It's complicated and, like most things random, difficult to test conclusively, unfortunately.
let mut i = values.len() as u64;
while i > (1 << 31) {
    i -= 1;
    values.swap(i as usize, rng.gen_range(0, i + 1) as usize);
We allow sampling `i` here (no swap)
If `i` is the length of the slice, `i - 1` is the last index. We should generate the range over `(i - 1) + 1`, because the upper bound is exclusive, and to allow a case where `i` does not get swapped. So I believe this code is correct.
In a single line you have "swap i and gen_range(0, i+1)", i.e. the latter sample can be i, which I believe is correct.
Just confirming 😄
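The off-by-one question in this thread can be checked with a small standalone sketch. It is not the crate's implementation: `XorShift64` and its `gen_range(low, high)` are hypothetical stand-ins that only mirror the "upper bound exclusive" convention used in the diff. After `i -= 1`, `i` is the index being filled, and sampling from `0..i + 1` lets `i` itself be picked, so the element can stay in place.

```rust
/// Tiny xorshift64 generator, a stand-in for a real RNG (seed must be non-zero).
struct XorShift64(u64);

impl XorShift64 {
    fn next_u64(&mut self) -> u64 {
        let mut x = self.0;
        x ^= x << 13;
        x ^= x >> 7;
        x ^= x << 17;
        self.0 = x;
        x
    }

    /// Uniform sample from `low..high` (`high` is exclusive).
    fn gen_range(&mut self, low: u64, high: u64) -> u64 {
        assert!(low < high);
        let range = high - low;
        // Largest accepted value: rejects the final partial block so that
        // `v % range` has no modulo bias.
        let zone = u64::MAX - (u64::MAX - range + 1) % range;
        loop {
            let v = self.next_u64();
            if v <= zone {
                return low + v % range;
            }
        }
    }
}

/// Plain Fisher-Yates shuffle, written the same way as the loop in the diff.
fn shuffle<T>(values: &mut [T], rng: &mut XorShift64) {
    let mut i = values.len() as u64;
    while i >= 2 {
        i -= 1;
        // `i` is now the last index of the not-yet-fixed prefix. The exclusive
        // bound `i + 1` means `j == i` is possible (the "no swap" case).
        let j = rng.gen_range(0, i + 1);
        values.swap(i as usize, j as usize);
    }
}
```

If the bound were `i` instead of `i + 1`, the element at index `i` would always move, which gives Sattolo's algorithm (cyclic permutations only) rather than a uniform shuffle.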
src/seq.rs
Outdated
let val = value as u16;
value = value >> 16;
let (hi, lo) = val.wmul(i);
let zone = ::core::u16::MAX - (::core::u16::MAX - i + 1) % i;
But this code looks like it excludes `i`, compared to the range code. I think we should replace `i` by `range = i + 1` in this line and the one above (`wmul(range)`)?
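The suggestion above can be sketched as a standalone function. This is an illustration under assumptions, not the code in the PR: `wmul_u16` is a hypothetical replacement for the `wmul` helper seen in the diff, and `sample_below` is a made-up name. The caller passes `range = i + 1` so that `i` itself can be produced.

```rust
/// Widening multiply of two u16 values, returned as (high half, low half).
/// Hypothetical stand-in for the `wmul` helper used in the diff.
fn wmul_u16(a: u16, b: u16) -> (u16, u16) {
    let m = a as u32 * b as u32;
    ((m >> 16) as u16, m as u16)
}

/// Try to turn 16 random bits into a uniform value in `0..range`.
/// Returns `None` when `val` falls in the rejected zone and the caller
/// should draw fresh bits. Requires `range > 0`.
fn sample_below(val: u16, range: u16) -> Option<u16> {
    assert!(range > 0);
    // Same formula as in the diff, with `i` replaced by `range`:
    // zone = 2^16 - 1 - (2^16 mod range).
    let zone = u16::MAX - (u16::MAX - range + 1) % range;
    let (hi, lo) = wmul_u16(val, range);
    if lo <= zone {
        Some(hi)
    } else {
        None
    }
}
```

With `range = i + 1`, the result covers `0..=i`. Enumerating all 65536 inputs for a fixed `range` shows each output is accepted exactly `65536 / range` times, which is the property the zone check is there to guarantee.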
Not sure about the organisation. You could no-std enable seq or just put the private function in |
Updated. (just in case) |
A nice property of the changes here: previously |
I am going to close this PR for now. I have kept working in it, and some more of the things from dhardy#82, in https://github.com/pitdicker/rand/commits/slice_random. |
I found another easy performance win for shuffles: unchecked swaps:
(I used only Perhaps there is some way we could convince the compiler bound checks are not needed. Otherwise this will do:

#[inline]
unsafe fn swap_unchecked<T>(values: &mut [T], a: usize, b: usize) {
    let pa: *mut T = values.get_unchecked_mut(a);
    let pb: *mut T = values.get_unchecked_mut(b);
    ptr::swap(pa, pb);
}

(we could even only use
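As a usage sketch for the unchecked swap, the following shows one way it could be wired into a shuffle loop. It is a variant written for this example, not the code from the comment: it derives both pointers from a single `as_mut_ptr()` call (an assumption made here to avoid taking two mutable reborrows of the slice), and `shuffle_unchecked` with its `next_u32` closure parameter is a hypothetical name.

```rust
use std::ptr;

/// Swap two elements without bounds checks.
///
/// Safety: the caller must guarantee `a < values.len()` and `b < values.len()`.
#[inline]
unsafe fn swap_unchecked<T>(values: &mut [T], a: usize, b: usize) {
    debug_assert!(a < values.len() && b < values.len());
    let base: *mut T = values.as_mut_ptr();
    // `ptr::swap` allows the two locations to be the same, so `a == b` is fine.
    unsafe { ptr::swap(base.add(a), base.add(b)) };
}

/// Fisher-Yates shuffle using the unchecked swap. `next_u32` supplies random
/// bits. The index comes from a widening multiply, so it is slightly biased.
fn shuffle_unchecked<T>(values: &mut [T], mut next_u32: impl FnMut() -> u32) {
    assert!(values.len() as u64 <= u32::MAX as u64);
    let mut i = values.len();
    while i >= 2 {
        i -= 1;
        // (bits * (i + 1)) >> 32 is always <= i, because bits < 2^32.
        let j = ((next_u32() as u64 * (i as u64 + 1)) >> 32) as usize;
        // Safe to skip the checks: j <= i < values.len().
        unsafe { swap_unchecked(values, i, j) };
    }
}
```

The safety argument rests entirely on the index computation: the multiply-and-shift can never yield a value above `i`, so no index can fall outside the slice.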
Using two methods: not using perfectly unbiased ranges (see rationale in the comment), and making better use of the RNG (like not requesting 64 bits on a `usize` when there are fewer than 2^32 elements).
Before:
After:
Note that I didn't do endless tries optimizing this, I only implemented the basic ideas. It is a little surprising that x86 does not improve in a way similar to x86_64.