
Improve ISAAC performance #36

Closed
wants to merge 4 commits

Conversation

@pitdicker (Author) commented:

To be fair I am not sure about this PR.

It makes isaac::fill_bytes, isaac64::fill_bytes, and isaac64::next_u32 much faster, all by about 45%.
But it is also slightly backwards-incompatible: the order of the results in fill_bytes is reversed, and next_u32 no longer drops the first 32 bits of every 64-bit result.
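The next_u32 change can be sketched independently of ISAAC itself: instead of generating a fresh 64-bit result and discarding half of it, the generator can buffer the unused high half for the next call. A minimal sketch (HalfWords and all names here are hypothetical, with a toy LCG standing in for the real ISAAC-64 core):

```rust
// Hypothetical sketch, not the PR's actual code.
struct HalfWords {
    state: u64,         // toy generator state
    spare: Option<u32>, // unused high half of the last 64-bit result
}

impl HalfWords {
    fn new(seed: u64) -> Self {
        HalfWords { state: seed, spare: None }
    }

    // Stand-in for the generator's real 64-bit output (a toy LCG here).
    fn next_u64(&mut self) -> u64 {
        self.state = self.state
            .wrapping_mul(6364136223846793005)
            .wrapping_add(1442695040888963407);
        self.state
    }

    // Hand out the low half first, then the saved high half,
    // so no bits of any 64-bit result are discarded.
    fn next_u32(&mut self) -> u32 {
        if let Some(hi) = self.spare.take() {
            return hi;
        }
        let v = self.next_u64();
        self.spare = Some((v >> 32) as u32);
        v as u32
    }
}

fn main() {
    let mut a = HalfWords::new(1);
    let mut b = HalfWords::new(1);
    let v = b.next_u64();
    // two u32 calls reconstruct exactly one 64-bit result
    let lo = a.next_u32() as u64;
    let hi = a.next_u32() as u64;
    assert_eq!((hi << 32) | lo, v);
    println!("ok");
}
```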

It has been sitting on my computer for 2+ weeks ;-)

@dhardy (Owner) left a comment:

Interesting work.

With some well-known generators (e.g. MT19937 and ChaCha) I can see value in trying to produce the same output sequence as other implementations; for ISAAC I'm less sure. If we're okay with breaking away from the standard output for ISAAC, I'd consider reversing the buffer order (counting UP like ChaCha) too.

I think we now need "true values" tests for each output type! I wrote one:

    #[test]
    fn test_isaac64_true_bytes() {
        let seed: &[_] = &[1, 23, 456, 7890, 12345];
        let mut rng1 = Isaac64Rng::from_seed(seed);
        let mut buf = [0u8; 32];
        rng1.fill_bytes(&mut buf);
        assert_eq!(buf,
                   [98, 205, 127, 160, 83, 98, 49, 17,
                    141, 186, 192, 50, 116, 69, 205, 240,
                    156, 242, 26, 63, 54, 166, 135, 199,
                    140, 237, 103, 8, 93, 196, 151, 7]);
    }

    }

    let buf = unsafe { &*(&mut self.buffer as *mut [w32; STATE_WORDS]
                          as *mut [u8; STATE_WORDS * 4]) };
dhardy (Owner):

You can replace *mut with *const here (with &self.buffer).
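The suggested change can be sketched in isolation (using a plain [u32; N] instead of the crate's w32 newtype; as_bytes is a hypothetical helper): with a shared reference, a *const cast is enough to view the word buffer as bytes.

```rust
// Hypothetical, self-contained sketch of the *const variant.
const STATE_WORDS: usize = 4;

// Reinterpret the word buffer as a byte array without a mutable borrow.
fn as_bytes(buffer: &[u32; STATE_WORDS]) -> &[u8; STATE_WORDS * 4] {
    unsafe {
        &*(buffer as *const [u32; STATE_WORDS]
                  as *const [u8; STATE_WORDS * 4])
    }
}

fn main() {
    // to_le fixes the in-memory byte order on any target
    let buffer: [u32; STATE_WORDS] = [1u32.to_le(), 2u32.to_le(), 0, 0];
    let bytes = as_bytes(&buffer);
    assert_eq!(bytes.len(), 16);
    assert_eq!(bytes[0], 1); // low byte of the first word
    assert_eq!(bytes[4], 2); // low byte of the second word
    println!("ok");
}
```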


    // convert to LE:
    if cfg!(target_endian = "big") {
        for ref mut x in self.rsl[index_u32..(index_u32 + chunk_size_u32)].iter_mut() {
dhardy (Owner):

You don't need the cfg! part; the optimiser can figure this out.
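The point can be illustrated outside the crate: u32::to_le is the identity on little-endian targets, so an unconditional conversion loop should compile to nothing there even without the cfg! guard. A minimal sketch (convert_to_le is a hypothetical helper):

```rust
// Hypothetical sketch: unconditional little-endian normalization.
// On little-endian targets to_le is a no-op, so the optimizer can
// (in principle) remove this loop entirely there.
fn convert_to_le(words: &mut [u32]) {
    for x in words.iter_mut() {
        *x = x.to_le();
    }
}

fn main() {
    let mut words = [0x0102_0304u32, 0xAABB_CCDD];
    convert_to_le(&mut words);
    // After conversion the in-memory byte order is little-endian
    // on every target.
    assert_eq!(words[0].to_ne_bytes(), [0x04, 0x03, 0x02, 0x01]);
    println!("ok");
}
```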

pitdicker (Author):

I thought so too, but without it the benchmarks were slower.

dhardy (Owner):

I ran them and they were the same for me. Weird.

pitdicker (Author):

I will test it again. If it is not necessary, it is better without :-)

    let index_u8 = index_u64 * 8;

    // convert to LE:
    if cfg!(target_endian = "big") {
dhardy (Owner):

as above

    }

    let rsl = unsafe { &*(&mut self.rsl as *mut [u32; RAND_SIZE]
                          as *mut [u8; RAND_SIZE * 4]) };
dhardy (Owner):

again, *const

    }

    let rsl = unsafe { &*(&mut self.rsl as *mut [u64; RAND_SIZE]
                          as *mut [u8; RAND_SIZE * 8]) };
dhardy (Owner):

again, *const

    }

    let mut index_u64 = (self.cnt >> 1) as usize;
    let available = index_u64 * 8;
dhardy (Owner):

Why not just use index_u32 here and keep any extra half-word?

I tried implementing this, and ran into a complication: the endian conversion still works on 64-bit words. Dealing with this the hard way (rounding up/down and making sure to discard any extra converted half-word) is possible, but adds complication. Benchmarks show no difference... well, they shouldn't really, especially if I don't try throwing away half-words.

So better just forget this idea!

pitdicker (Author):

Before I had the endian conversion, that was how it worked. With the conversion it just got a bit difficult. It only matters if you frequently call fill_bytes after next_u32, and I thought it was not worth the extra complexity.

    }

    let rsl = unsafe { &*(&mut self.rsl as *mut [u64; RAND_SIZE]
                          as *mut [u32; RAND_SIZE * 2]) };
dhardy (Owner):

Once again, *const

@pitdicker (Author) commented:

Thanks for the review!

> If we're okay with breaking away from the standard output for ISAAC, I'd consider reversing the buffer order (counting UP like ChaCha) too.

That would make things quite a bit easier.

I currently have some testing code that compares IsaacRng against the reference implementation. I think matching that is somewhat useful, just so we can clearly communicate it works as it should.

For IsaacRng the next_u32 function produces exactly the same output as the reference implementation.
For Isaac64Rng the next_u64 function does that also.

For those, the current code was already optimal, and anything that has changed was not available in the C code. So while I would make those changes, I would not reverse the buffer order.

Hmmm, maybe we could change the order the values are written into the buffer. Then we can also change the order we read from it, and nothing would be different. Maybe ISAAC and ChaCha could then even share the fill_bytes code. I will give that a try!
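The idea can be sketched with plain integers (all values below are toy data, not ISAAC output): write the computed values into the buffer back-to-front, and an up-counting read, as in ChaCha's fill_bytes, then emits them in the reference order.

```rust
// Hypothetical sketch of writing the output buffer backwards.
fn main() {
    // values in the order the core round computes them (toy data)
    let computed = [10u32, 20, 30, 40];
    // the reference implementation emits the newest value first
    let reference: Vec<u32> = computed.iter().rev().cloned().collect();

    // write the results into the output buffer back-to-front...
    let mut buffer = [0u32; 4];
    let n = buffer.len();
    for (i, &v) in computed.iter().enumerate() {
        buffer[n - 1 - i] = v;
    }

    // ...so a plain up-counting read (as in ChaCha's fill_bytes)
    // matches the reference emission order
    assert_eq!(buffer.to_vec(), reference);
    println!("ok");
}
```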

@dhardy (Owner) commented Nov 9, 2017:

Yes, if we have no good reason to break away from the reference for next_u32 and next_u64 on the respective generators, it does make sense. It's just that the current compromise restricts what changes can be made in the future without further breakage. But it might still be worth it.

If indeed your idea about writing into the buffer backwards works out, that would obviously be better!


    // convert to LE:
    for ref mut x in self.buffer[self.index..self.index+words].iter_mut() {
        **x = w((*x).0.to_le());
pitdicker (Author):

I don't think mutating the output buffer from ChaCha is correct... The output buffer is used for the next round. Have to think of something else.
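One way around this, sketched with a hypothetical helper copy_le (not the PR's final code), is to perform the endian conversion while copying out of the buffer, leaving the buffer itself untouched for the next round:

```rust
// Hypothetical sketch: convert endianness on the way out instead of
// mutating the output buffer in place.
fn copy_le(words: &[u32], dest: &mut [u8]) {
    for (chunk, &w) in dest.chunks_exact_mut(4).zip(words.iter()) {
        chunk.copy_from_slice(&w.to_le_bytes());
    }
}

fn main() {
    let words = [0x0102_0304u32, 0x0506_0708];
    let mut out = [0u8; 8];
    copy_le(&words, &mut out);
    assert_eq!(out, [4, 3, 2, 1, 8, 7, 6, 5]);
    // the source words are unchanged, so the next round
    // still sees the same state
    assert_eq!(words[0], 0x0102_0304);
    println!("ok");
}
```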

I have tried to set up big-endian testing several times, no success yet. Do you happen to know an easy way (that works on Fedora)?

pitdicker (Author):

Oops, no it doesn't

dhardy (Owner):

I guess you need an emulator. If you like I can try to set something up.

https://stackoverflow.com/questions/3337896/imitate-emulate-a-big-endian-behavior-in-c

pitdicker (Author):

Do you think we could get CI set up? With trust we would get big-endian testing for free. I have not made a PR yet, but last I checked we didn't build on Windows (a small fix, though).

@pitdicker (Author) commented Nov 10, 2017:

Writing into the buffer backwards worked. I will clean it up over the weekend.

dhardy added a commit that referenced this pull request Nov 10, 2017
Copy non-controversial part of #36
Credit: Paul Dicker <pitdicker@gmail.com>
@dhardy dhardy mentioned this pull request Nov 10, 2017
dhardy added a commit that referenced this pull request Nov 10, 2017
Includes both the values output now and the values which should be output by #36.
@pitdicker pitdicker closed this Nov 11, 2017
@pitdicker pitdicker deleted the fill_bytes_impls branch November 11, 2017 07:12
dhardy added commits referencing this pull request on Dec 15, Dec 17, Dec 18, and Dec 27, 2017, and pitdicker pushed one to pitdicker/rand on Apr 4, 2018; each was cherry-picked from fd2660b and repeats the Nov 10 message: "Includes both the values output now and the values which should be output by #36."