- Sponsor
-
Notifications
You must be signed in to change notification settings - Fork 458
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement From<std::ops::Range> for Range #382
Conversation
Very nice!
|
src/distributions/range.rs
Outdated
@@ -314,6 +314,19 @@ macro_rules! range_int_impl { | |||
} | |||
} | |||
} | |||
|
|||
impl From<::core::ops::Range<$ty>> for RangeInt<$ty> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you also implement From<::core::ops::Range<$ty>>
for RangeFloat
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure this is a good idea, since we don't have an implementation for closed ranges. On the other hand, this also affects Range::new
and Range::new_inclusive
, which are currently identical for floats.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
And here it's annoying that the a..b
syntax is half-open — if it were closed it wouldn't be confusing using the syntax for sampling both float and int ranges. As it is though I don't know.
src/distributions/range.rs
Outdated
} | ||
} | ||
|
||
impl From<::core::ops::Range<$ty>> for Range<$ty> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe make this a generic implementation for all RangImpl
s?
impl<X> From<::core::ops::Range<X>> for Range<X>
where X: SampleRange,
<X as SampleRange>::Impl: From<::core::ops::Range<X>>
{
fn from(r: ::core::ops::Range<X>) -> Range<X> {
Range { inner: <X as SampleRange>::Impl::from(r) }
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thinking about it a bit more, a simple generic implementation is all that is needed, no implementation on RangeImpl
is necessary:
impl<X: SampleRange> From<::core::ops::Range<X>> for Range<X> {
fn from(r: ::core::ops::Range<X>) -> Range<X> {
Range::new(r.start, r.end)
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure!
src/distributions/range.rs
Outdated
// Doing multiply before addition allows some architectures to | ||
// use a single instruction. | ||
value1_2 * self.scale + self.offset | ||
value1_2.mul_add(self.scale, self.offset) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mul_add
is much slower than a separate multiply and addition, because it tries to minimize the rounding error. LLVM can turn the two instructions into one for the platforms that support it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mul_add is much slower
Not according to the documentation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I read that and tried it before. Can you benchmark it otherwise, because I believe that documentation is wrong.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before:
test distr_range_f32 ... bench: 1,517 ns/iter (+/- 31) = 2636 MB/s
test distr_range_f64 ... bench: 3,854 ns/iter (+/- 80) = 2075 MB/s
After:
test distr_range_f32 ... bench: 2,329 ns/iter (+/- 12) = 1717 MB/s
test distr_range_f64 ... bench: 3,875 ns/iter (+/- 83) = 2064 MB/s
So it seems only slower for f32
. But certainly not faster.
Looking at the assembly, mul_add
generates vfmadd213ss xmm0, xmm1, xmm2
while a * b + c
generates vmulss xmm0, xmm0, xmm1; vaddss xmm0, xmm0, xmm2
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Another problem: it does not work with no_std
(CI fails).
B.t.w. if you rebase on master the CI should work again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, when using target-cpu=native
, mul_add
is faster for f32
:
test distr_range_f32 ... bench: 1,163 ns/iter (+/- 17) = 3439 MB/s
test distr_range_f64 ... bench: 3,852 ns/iter (+/- 6) = 2076 MB/s
How would that work? |
I edited the comment, I'm afraid I'll never learn... |
It doesn't work with `no_std` and it is sometimes slower. This was documented in a comment.
I removed |
What did you think of my comment to not implement range for the |
Sorry, I missed that comment. I like it! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@vks Sorry for making things difficult, but two more comments
@@ -454,11 +466,18 @@ macro_rules! range_float_impl { | |||
// Generate a value in the range [1, 2) | |||
let value1_2 = (rng.$next_u() >> $bits_to_discard) | |||
.into_float_with_exponent(0); | |||
// Doing multiply before addition allows some architectures to | |||
// use a single instruction. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Before #274 we would do r.low + r.range * rng.gen::<$ty>()
. I think it is nice to preserve this comment that explains why we choose a different order.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Could you please be more specific? I'm not sure the previous comment was correct. Without fast-math
it is not correct if the compiler uses FMA.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, there is an instruction that does a true FMA, with the benefit of extra precision, and a simple instruction that produuces exactly the same results as multiply and addition. At least on ARM.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks, I'll add a comment!
src/distributions/range.rs
Outdated
value1_2 * self.scale + self.offset | ||
} | ||
} | ||
|
||
impl From<::core::ops::Range<$ty>> for RangeFloat<$ty> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't see a point in keeping these when there also is the more generic implementation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removing this would break RangeFloat::from(range)
, wouldn't it? I would rather keep it, if RangeFloat
stays public.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
RangeFloat
is supposed to be an implementation detail, only there to make it easier to implement Range
for user types. No one is supposed to use it directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Then it should be private, right? If we make it private (or hide the docs) I would agree that the other impls should be removed.
Edit: Can't be private, because we are leaking the type via the RangeImpl
trait.
This allows the following syntax: