Improve ISAAC performance #36
This does not change benchmark results, just makes the code similar.
Interesting work.
With some well-known generators (e.g. MT19937 and ChaCha) I can see value in trying to produce the same output sequence as other implementations; for ISAAC I'm less sure. If we're okay with breaking away from the standard output for ISAAC, I'd consider reversing the buffer order (counting UP like ChaCha) too.
I think we now need "true values" tests for each output type! I wrote one:
#[test]
fn test_isaac64_true_bytes() {
    let seed: &[_] = &[1, 23, 456, 7890, 12345];
    let mut rng1 = Isaac64Rng::from_seed(seed);
    let mut buf = [0u8; 32];
    rng1.fill_bytes(&mut buf);
    assert_eq!(buf,
               [98, 205, 127, 160, 83, 98, 49, 17,
                141, 186, 192, 50, 116, 69, 205, 240,
                156, 242, 26, 63, 54, 166, 135, 199,
                140, 237, 103, 8, 93, 196, 151, 7]);
}
}

let buf = unsafe { &*(&mut self.buffer as *mut [w32; STATE_WORDS]
                      as *mut [u8; STATE_WORDS * 4]) };
You can replace `*mut` with `*const` here (with `&self.buffer`).
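The `*const` suggestion above can be sketched as follows. This is a minimal illustration, not the PR's code: `WORDS` and `as_bytes` are hypothetical names, and a plain `u32` stands in for the wrapped word type used in the diff.

```rust
/// Number of 32-bit words in the illustrative buffer (hypothetical value).
const WORDS: usize = 4;

/// View a word buffer as bytes through a shared reference.
/// Only `*const` casts are needed because nothing is written through the view.
fn as_bytes(words: &[u32; WORDS]) -> &[u8; WORDS * 4] {
    // SAFETY: both array types have the same size, `u8` has alignment 1,
    // every bit pattern is a valid `u8`, and the returned reference borrows
    // `words`, so it cannot outlive the buffer.
    unsafe { &*(words as *const [u32; WORDS] as *const [u8; WORDS * 4]) }
}
```

Because the function takes `&[u32; WORDS]` rather than `&mut`, the caller can keep other shared borrows of the buffer alive while reading the bytes.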
// convert to LE:
if cfg!(target_endian = "big") {
    for ref mut x in self.rsl[index_u32..(index_u32 + chunk_size_u32)].iter_mut() {
You don't need the `cfg!` part; the optimiser can figure this out.
I thought so too, but without it the benchmarks were slower.
I ran them and they were the same for me. Weird.
I will test it again. If it is not necessary, it is better without :-)
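For reference, the unguarded form discussed in this thread might look like the sketch below. `words_to_le` is a hypothetical helper, not a function from the PR. `u32::to_le` is the identity on little-endian targets and a byte swap on big-endian ones, which is why the loop is expected to compile away on little-endian; whether it does in practice is the benchmark question raised above.

```rust
/// Convert a run of words to little-endian byte order in place,
/// without an explicit `cfg!(target_endian = "big")` guard.
fn words_to_le(words: &mut [u32]) {
    for x in words.iter_mut() {
        // No-op on little-endian targets, byte swap on big-endian targets.
        *x = x.to_le();
    }
}
```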
let index_u8 = index_u64 * 8;

// convert to LE:
if cfg!(target_endian = "big") {
as above
}

let rsl = unsafe { &*(&mut self.rsl as *mut [u32; RAND_SIZE]
                      as *mut [u8; RAND_SIZE * 4]) };
again, *const
}

let rsl = unsafe { &*(&mut self.rsl as *mut [u64; RAND_SIZE]
                      as *mut [u8; RAND_SIZE * 8]) };
again, *const
}

let mut index_u64 = (self.cnt >> 1) as usize;
let available = index_u64 * 8;
Why not just use `index_u32` here and keep any extra half-word?
I tried implementing this, and ran into a complication: the Endian conversion still works on 64-bit words. Dealing with this the hard way (rounding up/down and making sure to discard any extra converted half-word) is possible, but adds complication. Benchmark shows no difference... well, it shouldn't really, especially if I don't try throwing away half-words.
So better just forget this idea!
Before I had the endian conversion, that was how it worked. With the conversion it just got a bit difficult. It only matters if you frequently call `fill_bytes` after `next_u32`, and I thought it was not worth the extra complexity.
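The trade-off in this thread comes down to a small piece of arithmetic, sketched below. Both functions are hypothetical, and the sketch assumes `cnt` counts the unread 32-bit halves of the 64-bit buffer, which is an interpretation of the diff rather than something the thread states.

```rust
/// Bytes available when a trailing half-word is discarded, mirroring
/// `(self.cnt >> 1) * 8` from the diff (assumes `cnt` counts unread u32 halves).
fn available_whole_words(cnt: u32) -> usize {
    let index_u64 = (cnt >> 1) as usize; // round down to whole 64-bit words
    index_u64 * 8
}

/// Bytes available if the extra half-word were kept instead.
fn available_with_half_word(cnt: u32) -> usize {
    cnt as usize * 4
}
```

The two results differ only when `cnt` is odd, that is, when `fill_bytes` follows an odd number of `next_u32` calls, and then by exactly 4 bytes.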
}

let rsl = unsafe { &*(&mut self.rsl as *mut [u64; RAND_SIZE]
                      as *mut [u32; RAND_SIZE * 2]) };
Once again, *const
Thanks for the review!
That would make things quite a bit easier. I currently have some testing code that compares For For those the current code was already optimal. And anything that has changed, was not available in the C code. So while I would make those changes, I would not reverse the buffer order. Hmmm, maybe we could change the order the values are written into the buffer. Then we can also change the order to read from it, and nothing would be different. Maybe ISAAC and ChaCha could then even share the
Yes, if we have no good reason to break away from the reference for If indeed your idea about writing into the buffer backwards works out, that would obviously be better!
// convert to LE:
for ref mut x in self.buffer[self.index..self.index+words].iter_mut() {
    **x = w((*x).0.to_le());
I don't think mutating the output buffer from ChaCha is correct... The output buffer is used for the next round. Have to think of something else.
I have tried to set up big-endian testing several times, no success yet. Do you happen to know an easy way (that works on Fedora)?
Oops, no it doesn't
I guess you need an emulator. If you like I can try to set something up.
https://stackoverflow.com/questions/3337896/imitate-emulate-a-big-endian-behavior-in-c
Do you think we could get CI? With trust we would get big-endian testing for free. I have not made a PR yet, but last I checked we didn't build on Windows (small fix though).
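One way to sidestep the question in this thread of whether the word buffer may be converted in place is to convert while copying. The sketch below is an illustration only: `fill_le` is a hypothetical helper, and it relies on `u32::to_le_bytes`, a standard-library method that is newer than this discussion.

```rust
/// Copy `words` into `dest` as little-endian bytes without modifying `words`.
/// Returns the number of bytes written.
fn fill_le(words: &[u32], dest: &mut [u8]) -> usize {
    let mut written = 0;
    for (chunk, word) in dest.chunks_mut(4).zip(words.iter()) {
        let bytes = word.to_le_bytes();
        // The final chunk may be shorter than 4 bytes; copy only what fits.
        let n = chunk.len();
        chunk.copy_from_slice(&bytes[..n]);
        written += n;
    }
    written
}
```

The output bytes are the same on little-endian and big-endian targets, and the source words are left untouched for any later use.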
Writing into the buffer backwards worked. I will clean it up over the weekend.
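The idea of writing into the buffer backwards can be illustrated with a toy model. This is not the PR's implementation: `N` and both functions are hypothetical, and a plain array stands in for the results buffer. It only shows that storing values in reverse and reading while counting up yields the same sequence as storing forwards and reading while counting down.

```rust
/// Size of the toy results buffer (hypothetical value).
const N: usize = 4;

/// Reference-style order: store values forwards, then read counting down.
fn forward_write_reverse_read(values: [u32; N]) -> Vec<u32> {
    let buf = values;
    (0..N).rev().map(|i| buf[i]).collect()
}

/// Alternative: store values backwards, then read counting up.
fn reverse_write_forward_read(values: [u32; N]) -> Vec<u32> {
    let mut buf = [0u32; N];
    for (i, v) in values.iter().enumerate() {
        buf[N - 1 - i] = *v;
    }
    buf.to_vec()
}
```

With the second layout, a reader that counts up sees the same output sequence, so the read path could follow the counting-up convention without changing the generated values.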
Copy non-controversial part of #36 Credit: Paul Dicker <pitdicker@gmail.com>
Includes both the values output now and the values which should be output by #36.
To be fair I am not sure about this PR.
It makes `isaac::fill_bytes`, `isaac64::fill_bytes`, and `isaac64::next_u32` much faster, all by 45%.
But it is also a little bit backwards-incompatible: the order of the results in `fill_bytes` is reversed, and `next_u32` no longer drops the first 32 bits of every result.
It has been sitting on my computer for 2+ weeks ;-)