
Improve ISAAC performance #36

Closed
wants to merge 4 commits

Conversation

@pitdicker (Author) commented:

To be fair I am not sure about this PR.

It makes isaac::fill_bytes, isaac64::fill_bytes, and isaac64::next_u32 much faster, all by about 45%.
But it is also slightly backwards-incompatible: the order of the results in fill_bytes is reversed, and next_u32 no longer drops the first 32 bits of every 64-bit result.
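The next_u32 change can be sketched independently of ISAAC itself: instead of generating a fresh 64-bit result and discarding half of it, the generator can buffer the unused high half for the next call. A minimal sketch (HalfWords and all names here are hypothetical, with a toy LCG standing in for the real ISAAC-64 core):

```rust
// Hypothetical sketch, not the PR's actual code.
struct HalfWords {
    state: u64,         // toy generator state
    spare: Option<u32>, // unused high half of the last 64-bit result
}

impl HalfWords {
    fn new(seed: u64) -> Self {
        HalfWords { state: seed, spare: None }
    }

    // Stand-in for the generator's real 64-bit output (a toy LCG here).
    fn next_u64(&mut self) -> u64 {
        self.state = self.state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }

    // Hand out the low half first, then the saved high half,
    // so no bits of any 64-bit result are discarded.
    fn next_u32(&mut self) -> u32 {
        if let Some(hi) = self.spare.take() {
            return hi;
        }
        let v = self.next_u64();
        self.spare = Some((v >> 32) as u32);
        v as u32
    }
}

fn main() {
    let mut a = HalfWords::new(1);
    let mut b = HalfWords::new(1);
    let v = b.next_u64();
    // two u32 calls reconstruct exactly one 64-bit result
    let lo = a.next_u32() as u64;
    let hi = a.next_u32() as u64;
    assert_eq!((hi << 32) | lo, v);
    println!("ok");
}
```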

It has been sitting on my computer for 2+ weeks ;-)

@dhardy (Owner) left a comment:

Interesting work.

With some well-known generators (e.g. MT19937 and ChaCha) I can see value in trying to produce the same output sequence as other implementations; for ISAAC I'm less sure. If we're okay with breaking away from the standard output for ISAAC, I'd consider reversing the buffer order (counting UP like ChaCha) too.

I think we now need "true values" tests for each output type! I wrote one:

    #[test]
    fn test_isaac64_true_bytes() {
        let seed: &[_] = &[1, 23, 456, 7890, 12345];
        let mut rng1 = Isaac64Rng::from_seed(seed);
        let mut buf = [0u8; 32];
        rng1.fill_bytes(&mut buf);
        assert_eq!(buf,
                   [98, 205, 127, 160, 83, 98, 49, 17,
                    141, 186, 192, 50, 116, 69, 205, 240,
                    156, 242, 26, 63, 54, 166, 135, 199,
                    140, 237, 103, 8, 93, 196, 151, 7]);
    }

    }

    let buf = unsafe { &*(&mut self.buffer as *mut [w32; STATE_WORDS]
                          as *mut [u8; STATE_WORDS * 4]) };
dhardy (Owner):

You can replace *mut with *const here (with &self.buffer).
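The suggested change can be sketched in isolation (using a plain [u32; N] instead of the crate's w32 newtype; as_bytes is a hypothetical helper): with a shared reference, a *const cast is enough to view the word buffer as bytes.

```rust
// Hypothetical, self-contained sketch of the *const variant.
const STATE_WORDS: usize = 4;

// Reinterpret the word buffer as a byte array without a mutable borrow.
fn as_bytes(buffer: &[u32; STATE_WORDS]) -> &[u8; STATE_WORDS * 4] {
    unsafe {
        &*(buffer as *const [u32; STATE_WORDS]
                  as *const [u8; STATE_WORDS * 4])
    }
}

fn main() {
    // to_le fixes the in-memory byte order on any target
    let buffer: [u32; STATE_WORDS] = [1u32.to_le(), 2u32.to_le(), 0, 0];
    let bytes = as_bytes(&buffer);
    assert_eq!(bytes.len(), 16);
    assert_eq!(bytes[0], 1); // low byte of the first word
    assert_eq!(bytes[4], 2); // low byte of the second word
    println!("ok");
}
```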


    // convert to LE:
    if cfg!(target_endian = "big") {
        for ref mut x in self.rsl[index_u32..(index_u32 + chunk_size_u32)].iter_mut() {
dhardy (Owner):

You don't need the cfg! part; the optimiser can figure this out.
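The point can be illustrated outside the crate: u32::to_le is the identity on little-endian targets, so an unconditional conversion loop should compile to nothing there even without the cfg! guard. A minimal sketch (convert_to_le is a hypothetical helper):

```rust
// Hypothetical sketch: unconditional little-endian normalization.
// On little-endian targets to_le is a no-op, so the optimizer can
// (in principle) remove this loop entirely there.
fn convert_to_le(words: &mut [u32]) {
    for x in words.iter_mut() {
        *x = x.to_le();
    }
}

fn main() {
    let mut words = [0x0102_0304u32, 0xAABB_CCDD];
    convert_to_le(&mut words);
    // After conversion the in-memory byte order is little-endian
    // on every target.
    assert_eq!(words[0].to_ne_bytes(), [0x04, 0x03, 0x02, 0x01]);
    println!("ok");
}
```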

pitdicker (Author):

I thought so too, but without it the benchmarks were slower.

dhardy (Owner):

I ran them and they were the same for me. Weird.

pitdicker (Author):

I will test it again. If it is not necessary, it is better without :-)

    let index_u8 = index_u64 * 8;

    // convert to LE:
    if cfg!(target_endian = "big") {
dhardy (Owner):

as above

    }

    let rsl = unsafe { &*(&mut self.rsl as *mut [u32; RAND_SIZE]
                          as *mut [u8; RAND_SIZE * 4]) };
dhardy (Owner):

again, *const

    }

    let rsl = unsafe { &*(&mut self.rsl as *mut [u64; RAND_SIZE]
                          as *mut [u8; RAND_SIZE * 8]) };
dhardy (Owner):

again, *const

    }

    let mut index_u64 = (self.cnt >> 1) as usize;
    let available = index_u64 * 8;
dhardy (Owner):

Why not just use index_u32 here and keep any extra half-word?

I tried implementing this, and ran into a complication: the endian conversion still works on 64-bit words. Dealing with this the hard way (rounding up/down and making sure to discard any extra converted half-word) is possible, but adds complication. Benchmarks show no difference... well, they shouldn't really, especially if I don't try throwing away half-words.

So better just forget this idea!

pitdicker (Author):

Before I had the endian conversion, that was how it worked. With the conversion it just got a bit difficult. It only matters if you frequently call fill_bytes after next_u32, and I thought it was not worth the extra complexity.

    }

    let rsl = unsafe { &*(&mut self.rsl as *mut [u64; RAND_SIZE]
                          as *mut [u32; RAND_SIZE * 2]) };
dhardy (Owner):

Once again, *const

@pitdicker (Author) commented:

Thanks for the review!

> If we're okay with breaking away from the standard output for ISAAC, I'd consider reversing the buffer order (counting UP like ChaCha) too.

That would make things quite a bit easier.

I currently have some testing code that compares IsaacRng against the reference implementation. I think matching that is somewhat useful, just so we can clearly communicate it works as it should.

For IsaacRng the next_u32 function produces exactly the same output as the reference implementation.
For Isaac64Rng the next_u64 function does that also.

For those, the current code was already optimal, and anything that has changed was not available in the C code. So while I would make those changes, I would not reverse the buffer order.

Hmmm, maybe we could change the order the values are written into the buffer. Then we can also change the order we read from it, and nothing would be different. Maybe ISAAC and ChaCha could then even share the fill_bytes code. I will give that a try!
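The idea can be sketched with plain integers (all values below are toy data, not ISAAC output): write the computed values into the buffer back-to-front, and an up-counting read, as in ChaCha's fill_bytes, then emits them in the reference order.

```rust
// Hypothetical sketch of writing the output buffer backwards.
fn main() {
    // values in the order the core round computes them (toy data)
    let computed = [10u32, 20, 30, 40];
    // the reference implementation emits the newest value first
    let reference: Vec<u32> = computed.iter().rev().cloned().collect();

    // write the results into the output buffer back-to-front...
    let mut buffer = [0u32; 4];
    let n = buffer.len();
    for (i, &v) in computed.iter().enumerate() {
        buffer[n - 1 - i] = v;
    }

    // ...so a plain up-counting read (as in ChaCha's fill_bytes)
    // matches the reference emission order
    assert_eq!(buffer.to_vec(), reference);
    println!("ok");
}
```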

@dhardy (Owner) commented Nov 9, 2017:

Yes, if we have no good reason to break away from the reference for next_u32 and next_u64 on the respective generators, it does make sense. It's just that the current compromise restricts what changes can be made in the future without further breakage. But it might still be worth it.

If indeed your idea about writing into the buffer backwards works out, that would obviously be better!


    // convert to LE:
    for ref mut x in self.buffer[self.index..self.index+words].iter_mut() {
        **x = w((*x).0.to_le());
pitdicker (Author):

I don't think mutating the output buffer from ChaCha is correct... The output buffer is used for the next round. Have to think of something else.
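One way around this, sketched with a hypothetical helper copy_le (not the PR's final code), is to perform the endian conversion while copying out of the buffer, leaving the buffer itself untouched for the next round:

```rust
// Hypothetical sketch: convert endianness on the way out instead of
// mutating the output buffer in place.
fn copy_le(words: &[u32], dest: &mut [u8]) {
    for (chunk, &w) in dest.chunks_exact_mut(4).zip(words.iter()) {
        chunk.copy_from_slice(&w.to_le_bytes());
    }
}

fn main() {
    let words = [0x0102_0304u32, 0x0506_0708];
    let mut out = [0u8; 8];
    copy_le(&words, &mut out);
    assert_eq!(out, [4, 3, 2, 1, 8, 7, 6, 5]);
    // the source words are unchanged, so the next round
    // still sees the same state
    assert_eq!(words[0], 0x0102_0304);
    println!("ok");
}
```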

I have tried to set up big-endian testing several times, no success yet. Do you happen to know an easy way (that works on Fedora)?

pitdicker (Author):

Oops, no it doesn't

dhardy (Owner):

I guess you need an emulator. If you like I can try to set something up.

https://stackoverflow.com/questions/3337896/imitate-emulate-a-big-endian-behavior-in-c

pitdicker (Author):

Do you think we could get CI set up? With trust we would get big-endian testing for free. I have not made a PR yet, but last I checked we didn't build on Windows (a small fix, though).

@pitdicker (Author) commented Nov 10, 2017:

Writing into the buffer backwards worked. I will clean it up over the weekend.

dhardy added a commit that referenced this pull request Nov 10, 2017
Copy non-controversial part of #36
Credit: Paul Dicker <pitdicker@gmail.com>
@dhardy dhardy mentioned this pull request Nov 10, 2017
dhardy added a commit that referenced this pull request Nov 10, 2017
Includes both the values output now and the values which should be output by #36.
@pitdicker pitdicker closed this Nov 11, 2017
@pitdicker pitdicker deleted the fill_bytes_impls branch November 11, 2017 07:12
dhardy added commits referencing this pull request on Dec 15, Dec 17, Dec 18, and Dec 27, 2017, and pitdicker pushed one to pitdicker/rand on Apr 4, 2018; each was cherry-picked from fd2660b and repeats the Nov 10 message: "Includes both the values output now and the values which should be output by #36."