Allow sampling from a closed integer range #2

pitdicker · 2017-08-30T06:39:39Z

This does not compile, but I wanted to get your opinion.

Over the last few days I tried just about every trick to make sampling from a range of integers faster, such as making the zone a power of 2 (so mod becomes &) and multiplying with Mersenne numbers. In the end, you are just trading the modulus for more calls to the PRNG (18~25% more). Because PRNGs can be much slower than the Xorshift used in the benchmarks, this is not a promising route to take...

It also took me a lot of time to understand your method to calculate the zone. So I tried to come up with something myself, and it is exactly the same. I guess that proves there are no more off-by-one errors :-).

What I could do was add some extra comments to explain what is happening here. The modulus could also be optimised slightly thanks to this trick: skipping it in the unlikely case that v already falls in the target range, which makes it a few percent faster.

What I would like your opinion on: is it useful to expose an inclusive range? As per this comment. It fit a little better with my mental model when writing the code.
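
For readers following the zone discussion, here is a minimal sketch of rejection sampling over a closed range. It assumes u32 values and a caller-supplied closure `next` as a hypothetical stand-in for the PRNG; the name `sample_inclusive` is made up for illustration and this is not the code in the PR. It shows how a range size of `unsigned_max + 1` wraps to 0 and can be handled as a special case.

```rust
/// Sample uniformly from the closed range [low, high].
/// `next` is a hypothetical stand-in for a PRNG producing u32 values.
fn sample_inclusive<F: FnMut() -> u32>(low: u32, high: u32, mut next: F) -> u32 {
    assert!(low <= high);
    // Number of values in the range. Wraps to 0 when the range covers
    // every u32, because `u32::MAX + 1` cannot be represented.
    let range = high.wrapping_sub(low).wrapping_add(1);
    if range == 0 {
        // Full range: every random value is acceptable, no modulus needed.
        return next();
    }
    // Count the values to reject first: 2^32 mod range, computed as
    // (2^32 - range) mod range so that it fits in a u32.
    let ints_to_reject = (u32::MAX - range + 1) % range;
    // Values 0..=zone form a whole number of copies of the range.
    let zone = u32::MAX - ints_to_reject;
    loop {
        let v = next();
        if v <= zone {
            return low.wrapping_add(v % range);
        }
    }
}
```

Calling `sample_inclusive(0, u32::MAX, ...)` takes the special-case branch, which is the situation a half-open range cannot express.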

dhardy

Looks like most of the changes are to handle the case where unsigned_max is within the range, and yet it's still not possible to construct such a range.

I like the code but won't merge yet.

What do you think? We could add a Range::closed or new_range_inclusive constructor maybe?

dhardy · 2017-08-30T11:30:14Z

src/distributions/range2.rs

+                // the type, it has to store `unsigned_max + 1`, which can't be
+                // represented. But a range of size 0 can't exist, and a
+                // modulus op `unsigned_max + 1` is a no-op. So we treat this as
+                // a special case. Wrapping arithmetic makes representing


Grammar: "even" before "makes", not "simple"

dhardy · 2017-08-30T11:30:45Z

src/distributions/range2.rs

+                // `unsigned_max + 1` as 0 even simple.
+                //
+                // We don't calculate zone directly, but first calculate the
+                // number of integers to reject first. With a wrikle to handle


What's a "wrikle"?

pitdicker · 2017-08-30T14:24:04Z

I fixed the spelling in the comments, thank you.

About how or where to add a constructor: there are so many designs and questions in the RFC thread, I don't know where to start. And I know almost nothing about API design...
Probably best to leave some comments on the RFC thread first.

dhardy · 2017-08-30T14:50:49Z

I'd rather address other stuff (the traits and crate split) on the RFC thread first; there's too much going on there at the moment.

Using inclusive ranges seems sensible to me but would be a breaking change, so maybe a second constructor/function.
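
A second constructor could look roughly like the sketch below. The type `IntRange` and its fields are hypothetical and only illustrate the idea of keeping the half-open constructor unchanged while adding a closed one; they are not the API of this crate.

```rust
/// Hypothetical range type; names are illustrative only.
#[derive(Debug, PartialEq)]
pub struct IntRange {
    low: i32,
    high: i32, // inclusive upper bound
}

impl IntRange {
    /// Existing-style constructor: half-open `[low, high)`.
    pub fn new(low: i32, high: i32) -> IntRange {
        assert!(low < high, "IntRange::new called with low >= high");
        // `high - 1` cannot overflow because `high > low >= i32::MIN`.
        IntRange::new_inclusive(low, high - 1)
    }

    /// Added constructor: closed `[low, high]`.
    pub fn new_inclusive(low: i32, high: i32) -> IntRange {
        assert!(low <= high, "IntRange::new_inclusive called with low > high");
        IntRange { low, high }
    }

    /// Number of values covered. Returned as u64 because the full
    /// i32 range holds 2^32 values, which does not fit in a u32.
    pub fn len(&self) -> u64 {
        (self.high as i64 - self.low as i64 + 1) as u64
    }
}
```

Existing callers of `new` keep their behaviour, and only `new_inclusive` can describe a range that reaches `i32::MAX`.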

pitdicker · 2017-08-30T18:44:40Z

Sorry for derailing the discussion on the RFC thread, I saw your comment too late.

I sure don't think changing the default from open to closed ranges is a good idea! It would be very easy to silently break the code of others, even with good documentation.

pitdicker · 2017-09-01T13:22:45Z

I had two ideas that should help make the range code a bit faster in some cases:

If the type of the range is u8 or u16, we should pick a zone that fits in a u32. This greatly reduces the number of random integers that have to be rejected. The zone can still be stored in the RangeInt struct, thanks to a trick: the bits of the u32 that get truncated happen to all be 1's.
There are two good techniques for this range code. The current one creates a zone as large as possible and reduces it with a modulus. The other first reduces the random integers to the power of two that is greater than the range, and then rejects everything greater than the range. Which is optimal depends on how fast generating random numbers is compared to the modulus operation, and on how close a range is to its next power of two. We can just implement both techniques, and then pick the one that is probably best based on the size of the range. I think a good cutoff is when the range is more than 3/4 of the next power of two. Depending on the range, 0~25% of the numbers have to be rejected, 12.5% on average. But maybe 7/8 is better.
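
The second technique above (mask to the next power of two, then reject) can be sketched as follows. This is only an illustration under assumptions: `sample_masked` and `prefers_mask` are hypothetical names, `next` stands in for the PRNG, and the 3/4 cutoff is simply the value suggested in this comment.

```rust
/// Sample uniformly from [low, high] by masking to a power of two
/// and rejecting values above the span. No modulus is involved.
fn sample_masked<F: FnMut() -> u32>(low: u32, high: u32, mut next: F) -> u32 {
    assert!(low <= high);
    // Range size minus one; this never overflows, even for the full range.
    let span = high - low;
    // Smallest all-ones bit mask that covers `span`.
    let mask = u32::MAX >> (span | 1).leading_zeros();
    loop {
        let v = next() & mask;
        if v <= span {
            return low + v;
        }
    }
}

/// Hypothetical heuristic: prefer masking when the range size is more
/// than 3/4 of the next power of two, so at most 25% is rejected.
fn prefers_mask(span: u32) -> bool {
    let mask = u32::MAX >> (span | 1).leading_zeros();
    // Use u64 so that `span + 1` and `mask + 1` cannot overflow.
    4 * (span as u64 + 1) > 3 * (mask as u64 + 1)
}
```

For a range of 7 values the mask covers 8, so one in eight draws is rejected; for a range of 5 values three in eight are rejected, which is where the modulus-based zone is likely the better choice.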

I'll report back when I have something.

pitdicker · 2017-09-02T17:48:17Z

I spent a few more hours trying to figure out why the code runs slower than it should. Without much success...

It turns out the modulus operation is slow, but not the real problem. Apparently Rust adds a check for a zero divisor before the modulus, and possibly panics. This slowed down the function by about 5%. If I leave out the loop from sample, it is 40% faster, even though the loop almost always (99+%) runs just once.
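
As an aside on the zero check: in current Rust the divisor can carry the non-zero guarantee in its type, so the remainder operation cannot panic. The sketch below assumes a toolchain where `u32 % NonZeroU32` is available (it was not when this comment was written) and is not what the PR does; the function names are hypothetical.

```rust
use std::num::NonZeroU32;

/// Plain remainder: panics at run time if `range` is 0, so the
/// compiler has to emit a check for that case.
fn reduce_checked(v: u32, range: u32) -> u32 {
    v % range
}

/// Remainder with a divisor that is non-zero by construction.
/// This operation cannot panic.
fn reduce_nonzero(v: u32, range: NonZeroU32) -> u32 {
    v % range
}
```

The zero case is then handled once, when the `NonZeroU32` is constructed, rather than on every sample.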

The idea to pick a larger zone for u8 and u16 worked out nicely. But it got a little complicated with the rules for casting. The idea of providing two different techniques produced some nice and complicated code, but was not any faster. Benchmarks:
Before:

test distr_range2_i16        ... bench:       2,513 ns/iter (+/- 51) = 397 MB/s
test distr_range2_i8         ... bench:       2,941 ns/iter (+/- 30) = 340 MB/s
test distr_range2_int        ... bench:       3,104 ns/iter (+/- 37) = 2577 MB/s
test distr_range_int         ... bench:       3,102 ns/iter (+/- 29) = 2578 MB/s

After:

test distr_range2_i16        ... bench:       1,238 ns/iter (+/- 24) = 807 MB/s
test distr_range2_i8         ... bench:       1,231 ns/iter (+/- 34) = 812 MB/s
test distr_range2_int        ... bench:       2,961 ns/iter (+/- 55) = 2701 MB/s
test distr_range_int         ... bench:       3,120 ns/iter (+/- 43) = 2564 MB/s

I have added a function new_inclusive to RangeImpl, but not exposed the closed range methods further.

pitdicker · 2017-09-07T09:39:12Z

Rebased.
You found the strange performance difference I have been searching for for hours: the compiler became too smart with the benchmarks :-). Now it all looks a lot less nice...

Before:

test distr_range2_i8         ... bench:       5,706 ns/iter (+/- 52) = 175 MB/s
test distr_range2_i16        ... bench:       4,699 ns/iter (+/- 33) = 425 MB/s
test distr_range2_i32        ... bench:       5,325 ns/iter (+/- 46) = 751 MB/s
test distr_range2_i64        ... bench:      11,080 ns/iter (+/- 26) = 722 MB/s

After:

test distr_range2_i8         ... bench:       4,863 ns/iter (+/- 38) = 205 MB/s
test distr_range2_i16        ... bench:       4,861 ns/iter (+/- 38) = 411 MB/s
test distr_range2_i32        ... bench:       5,426 ns/iter (+/- 50) = 737 MB/s
test distr_range2_i64        ... bench:      10,925 ns/iter (+/- 24) = 732 MB/s

Some improvements, some losses. But it all depends very much on the size of the range, at least for i32 and i64.

I still think this PR is useful, as it adds support for closed ranges (e.g. handling ranges that can cover the entire range of the type), as well as the optimisation for small integers and extra comments.

dhardy · 2017-09-07T11:02:06Z

Yes, I was scratching my head over why moving benchmarks from one module to another made a big difference, until I realised one used black_box. Micro-benchmarks are tricky.
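
To illustrate why this matters, here is a small sketch using `std::hint::black_box`, the stable counterpart in current Rust of the benchmark helper mentioned above. The function `sum_mod` is a made-up workload, not one of the benchmarks in this PR.

```rust
use std::hint::black_box;

/// Hypothetical benchmark body: sums `i % range` for `i` in `0..n`.
/// `black_box` hides the inputs and the result from the optimiser, so
/// the loop cannot be folded into a constant or removed as dead code.
fn sum_mod(n: u32, range: u32) -> u32 {
    let mut acc = 0u32;
    for i in 0..n {
        acc = acc.wrapping_add(black_box(i) % black_box(range));
    }
    black_box(acc)
}
```

Without the `black_box` calls, a compiler is free to precompute the result when `n` and `range` are constants, which makes the timing meaningless.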

Can you add a Range::new_inclusive constructor?

pitdicker · 2017-09-28T19:32:24Z

Finally finished this. Sorry for taking so long.

Would it be okay if I make a PR that removes Range and replaces it with Range2?

These changes make it possible to sample from closed ranges, not only from open. Included is a small optimisation for the modulus operator, and an optimisation for the types i8/u8 and i16/u16.
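
The small-integer optimisation can be sketched as below, assuming u8 values sampled from u32 PRNG output. The struct `RangeU8` is a hypothetical stand-in, not the PR's `RangeInt`. The point is that the zone is computed at u32 width but stored in a u8: the truncated upper bits are all 1's, so they can be restored with a bitwise OR. As a worked example, a range of 200 values rejects 256 mod 200 = 56 of every 256 draws at u8 width, but only 2^32 mod 200 = 96 of every 2^32 draws at u32 width.

```rust
/// Hypothetical sampler state for a closed u8 range.
struct RangeU8 {
    low: u8,
    range: u8, // number of values; 0 means all 256
    zone: u8,  // low byte of the u32 zone; the upper 24 bits are all 1's
}

impl RangeU8 {
    fn new_inclusive(low: u8, high: u8) -> RangeU8 {
        assert!(low <= high);
        let range = high.wrapping_sub(low).wrapping_add(1);
        let zone = if range == 0 {
            u8::MAX // 2^32 is a multiple of 256, so nothing is rejected
        } else {
            let r = range as u32;
            // 2^32 mod r, always smaller than r <= 255.
            let ints_to_reject = (u32::MAX - r + 1) % r;
            // Truncation only drops bits that are 1.
            (u32::MAX - ints_to_reject) as u8
        };
        RangeU8 { low, range, zone }
    }

    /// `next` is a hypothetical stand-in for a PRNG producing u32 values.
    fn sample<F: FnMut() -> u32>(&self, mut next: F) -> u8 {
        if self.range == 0 {
            return next() as u8;
        }
        // Restore the truncated upper bits of the zone.
        let zone = 0xFFFF_FF00u32 | self.zone as u32;
        loop {
            let v = next();
            if v <= zone {
                return self.low.wrapping_add((v % self.range as u32) as u8);
            }
        }
    }
}
```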

dhardy · 2017-09-29T12:45:57Z

No problem.

Yes, I was planning on removing the original range; actually this PR is the reason I didn't yet. Go ahead.

pitdicker · 2017-09-30T17:43:21Z

Thank you. Removed the original range.

pitdicker · 2017-09-30T17:44:53Z

src/lib.rs

@@ -263,6 +264,7 @@ pub use thread_local::{ThreadRng, thread_rng, set_thread_rng, set_new_thread_rng
        random, random_with};

 use prng::IsaacWordRng;
+use distributions::range::Range;


Note: there is a second import around line 472: use distributions::range::SampleRange;.
I left it there to avoid a rebase.

pitdicker · 2017-10-14T14:47:15Z

Hi @dhardy. Is there a chance that you could merge this and the other PRs, or that we can move them along?

dhardy · 2017-10-14T16:28:16Z

Yeah, I guess. Sorry, I've been busy with some other work and travel the last couple of weeks, should have more time now.

pitdicker · 2017-10-16T04:39:36Z

Thank you!

dhardy reviewed Aug 30, 2017

pitdicker force-pushed the range_int branch from 188307a to c21dd5c Compare August 30, 2017 13:56

pitdicker force-pushed the range_int branch 3 times, most recently from d8facb2 to fc0a6e5 Compare September 2, 2017 17:32

pitdicker force-pushed the range_int branch from fc0a6e5 to 8412fe7 Compare September 7, 2017 09:08

Add benchmarks for ranges of i8, i16 and i32

31f964e

pitdicker force-pushed the range_int branch 2 times, most recently from 57b5c5b to 272b385 Compare September 7, 2017 09:32

pitdicker force-pushed the range_int branch from 272b385 to a755d89 Compare September 28, 2017 19:30

Allow sampling from a closed integer range

d8b8474

These changes make it possible to sample from closed ranges, not only from open. Included is a small optimisation for the modulus operator, and an optimisation for the types i8/u8 and i16/u16.

pitdicker force-pushed the range_int branch from a755d89 to d8b8474 Compare September 28, 2017 19:35

Remove range.rs

7edd06b

pitdicker force-pushed the range_int branch 3 times, most recently from b15a99d to fdf5141 Compare September 30, 2017 17:41

Replace range with range2

96503f7

pitdicker force-pushed the range_int branch from fdf5141 to 96503f7 Compare September 30, 2017 17:42

dhardy merged commit 97ab178 into dhardy:master Oct 14, 2017

pitdicker deleted the range_int branch October 16, 2017 04:39

pitdicker mentioned this pull request Dec 12, 2017

Speed up range sampling. rust-random/rand#115

Closed
