Add reductions #83
Conversation
Force-pushed from 3a925a0 to b642dd0
These will be complicated further by rust-lang/rust#83984
This reverts commit 1ea2f12.
Force-pushed from b642dd0 to e2fa502
All ready to go with #80 merged.
Are these reducing ops horizontal ops? This needs more docs on the methods to explain what's happening.
Yes they are, though I thought my documentation was pretty decent. Do you have any specific examples of what you mean?
I would include the word "horizontal" so that the difference is more noticeable at a glance, and so that search results in rustdoc let you see it sooner (since they show item path and first line of docs but not type).
I'd also suggest that horizontal should be in the name, so that it can be more easily grouped when describing in e.g. whatever guide we end up writing on how to write good/efficient SIMD code. (E.g. one of the guidelines here would be to, where possible, avoid these in batch processing by deferring them until the end, because they're comparatively inefficient on most hardware.) Also, it makes it easy to search for if you know what you want.
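To make the guideline above concrete, here is a minimal sketch (plain arrays standing in for 4-lane SIMD registers, since the crate's API is still being discussed): the per-chunk work stays "vertical" (lanewise), and the single horizontal reduction happens only once at the end of the batch.

```rust
// Sketch of "defer the horizontal reduction to the end":
// accumulate lanewise across chunks, then reduce across lanes once.
fn sum_batch(chunks: &[[f32; 4]]) -> f32 {
    let mut acc = [0.0f32; 4];
    for chunk in chunks {
        // vertical add: one SIMD add per chunk, no cross-lane traffic
        for lane in 0..4 {
            acc[lane] += chunk[lane];
        }
    }
    // one horizontal reduction at the very end
    acc.iter().sum()
}

fn main() {
    let chunks = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]];
    println!("{}", sum_batch(&chunks));
}
```

The alternative (a horizontal sum per chunk) would issue a cross-lane reduction on every iteration, which is the comparatively expensive pattern the comment warns about.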
I kind of didn't want to use horizontal because to me it's lingo (intel specific maybe?) and not really descriptive, and I think any documentation using it would need to define what that means anyway. I just looked at the documentation for arm
I think reduce is potentially a more universal term (but
The term horizontal predates Intel's stuff certainly. It is lingo/jargon, but so is stuff like saturating. Reduce can mean a lot of different things, and seems way more overloaded to me. I could imagine someone expecting reducing a vector to mean reducing the number of lanes, as just an example. Horizontal, even if it is lingo/jargon, is pretty specific to SIMD, is defined in our guide (briefly) and the guide that was written for packed_simd (in depth), and is something of a critical concept to understand when writing SIMD — I suspect we have to come up with a name for it and provide a definition somewhere (and the docs can say "for more information on horizontal operations, see
Do we maybe want all of the reductions to be named
while that's certainly possible, consider that:
That said, i got up from a nap like, after you posted your most recent reply 14 minutes ago, so i could be kinda half-asleep right now.
Updated the function names and docs, I think the new names are more consistent and the docs a little clearer.
PPC errors are spurious.
diff --git a/crates/core_simd/src/reduction.rs b/crates/core_simd/src/reduction.rs
index e728f8a..118d1d8 100644
--- a/crates/core_simd/src/reduction.rs
+++ b/crates/core_simd/src/reduction.rs
@@ -4,46 +4,46 @@ macro_rules! impl_integer_reductions {
where
Self: crate::LanesAtMost32
{
- /// Horizontal wrapping add. Computes the sum of the lanes of the vector, with wrapping addition.
+ /// Horizontal wrapping add. Returns the sum of the lanes of the vector, with wrapping addition.
#[inline]
pub fn wrapping_sum(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_add_ordered(self, 0) }
}
- /// Horizontal wrapping multiply. Computes the product of the lanes of the vector, with wrapping multiplication.
+ /// Horizontal wrapping multiply. Returns the product of the lanes of the vector, with wrapping multiplication.
#[inline]
pub fn wrapping_product(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_mul_ordered(self, 1) }
}
- /// Horizontal bitwise "and". Computes the cumulative bitwise "and" across the lanes of
+ /// Horizontal bitwise "and". Returns the cumulative bitwise "and" across the lanes of
/// the vector.
#[inline]
pub fn horizontal_and(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_and(self) }
}
- /// Horizontal bitwise "or". Computes the cumulative bitwise "or" across the lanes of
+ /// Horizontal bitwise "or". Returns the cumulative bitwise "or" across the lanes of
/// the vector.
#[inline]
pub fn horizontal_or(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_or(self) }
}
- /// Horizontal bitwise "xor". Computes the cumulative bitwise "xor" across the lanes of
+ /// Horizontal bitwise "xor". Returns the cumulative bitwise "xor" across the lanes of
/// the vector.
#[inline]
pub fn horizontal_xor(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_xor(self) }
}
- /// Horizontal maximum. Computes the maximum lane in the vector.
+ /// Horizontal maximum. Returns the maximum lane in the vector.
#[inline]
pub fn horizontal_max(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_max(self) }
}
- /// Horizontal minimum. Computes the minimum lane in the vector.
+ /// Horizontal minimum. Returns the minimum lane in the vector.
#[inline]
pub fn horizontal_min(self) -> $scalar {
unsafe { crate::intrinsics::simd_reduce_min(self) }
@@ -59,7 +59,7 @@ macro_rules! impl_float_reductions {
Self: crate::LanesAtMost32
{
- /// Horizontal add. Computes the sum of the lanes of the vector.
+ /// Horizontal add. Returns the sum of the lanes of the vector.
#[inline]
pub fn sum(self) -> $scalar {
// LLVM sum is inaccurate on i586
@@ -70,7 +70,7 @@ macro_rules! impl_float_reductions {
}
}
- /// Horizontal multiply. Computes the sum of the lanes of the vector.
+ /// Horizontal multiply. Returns the product of the lanes of the vector.
#[inline]
pub fn product(self) -> $scalar {
// LLVM product is inaccurate on i586
@@ -81,7 +81,7 @@ macro_rules! impl_float_reductions {
}
}
- /// Horizontal maximum. Computes the maximum lane in the vector.
+ /// Horizontal maximum. Returns the maximum lane in the vector.
///
/// Returns values based on equality, so a vector containing both `0.` and `-0.` may
/// return either. This function will not return `NaN` unless all lanes are `NaN`.
@@ -90,7 +90,7 @@ macro_rules! impl_float_reductions {
unsafe { crate::intrinsics::simd_reduce_max(self) }
}
- /// Horizontal minimum. Computes the minimum lane in the vector.
+ /// Horizontal minimum. Returns the minimum lane in the vector.
///
/// Returns values based on equality, so a vector containing both `0.` and `-0.` may
/// return either. This function will not return `NaN` unless all lanes are `NaN`.
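For readers skimming the diff, the semantics of the integer reductions can be modeled in scalar code. This is an illustrative sketch only; the real methods lower to LLVM's `simd_reduce_*` intrinsics rather than a scalar fold.

```rust
// Scalar model of two of the reductions in the diff above (u8 lanes for brevity).
fn wrapping_sum(lanes: &[u8]) -> u8 {
    // ordered add with identity 0, wrapping on overflow
    lanes.iter().fold(0u8, |acc, &x| acc.wrapping_add(x))
}

fn horizontal_and(lanes: &[u8]) -> u8 {
    // the identity for bitwise "and" is all-ones
    lanes.iter().fold(u8::MAX, |acc, &x| acc & x)
}

fn main() {
    println!("{}", wrapping_sum(&[200, 100])); // 300 wraps mod 256
    println!("{:#b}", horizontal_and(&[0b1100, 0b1010]));
}
```

The other reductions follow the same shape with their own identities (1 for `wrapping_product`, 0 for `horizontal_or`/`horizontal_xor`).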
Re: nomenclature, vertical:lanewise::horizontal:vectorwise?
Do we think this is good to go?
Other than the possible name changes, looks solid.
crates/core_simd/src/reduction.rs (Outdated)
{
    /// Horizontal wrapping add. Returns the sum of the lanes of the vector, with wrapping addition.
    #[inline]
    pub fn wrapping_sum(self) -> $scalar {
should this say `horizontal_` at the front of the name like the others?
I left it out of sum and product because I think the names are more self-explanatory, since there's no "vertical" sum (and they mirror the ones in std::iter a bit, too). Kind of like how it wouldn't really make sense to make it Mask{N}::horizontal_all.
well, these types should implement the Sum and Product traits: https://doc.rust-lang.org/std/iter/trait.Sum.html, but if you think it won't be confusing we can skip it for now
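What the wide crate linked below does amounts to implementing `Sum` for the vector type itself, with the vector as the iterator `Item`, so the fold stays vertical and the horizontal step remains a separate call. A minimal sketch with a toy 4-lane type (the `F32x4` name here is hypothetical, chosen for illustration):

```rust
use std::iter::Sum;
use std::ops::Add;

// Toy stand-in for a 4-lane f32 vector.
#[derive(Clone, Copy, Debug, PartialEq)]
struct F32x4([f32; 4]);

impl Add for F32x4 {
    type Output = Self;
    fn add(self, rhs: Self) -> Self {
        let mut out = [0.0; 4];
        for i in 0..4 {
            out[i] = self.0[i] + rhs.0[i]; // lanewise (vertical) add
        }
        F32x4(out)
    }
}

// Summing an iterator of vectors yields a vector: only vertical adds,
// no horizontal reduction is implied by this impl.
impl Sum for F32x4 {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        iter.fold(F32x4([0.0; 4]), Add::add)
    }
}

fn main() {
    let v: F32x4 = vec![F32x4([1.0, 2.0, 3.0, 4.0]), F32x4([4.0, 3.0, 2.0, 1.0])]
        .into_iter()
        .sum();
    println!("{:?}", v);
}
```

Note this is `impl Sum<F32x4> for F32x4`, not `impl Sum<F32x4> for f32`, which is the distinction the next comment raises: the scalar-producing version would need the trait implemented on the element type.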
Unfortunately we can't implement those traits since they only work on iterators. They get implemented on the iterator Item, not the iterator itself. More specifically you can already do vector.to_array().sum() but it won't be SIMD.
https://docs.rs/wide/0.6.4/wide/struct.f32x4.html#impl-Sum%3CRHS%3E you just do it like this?
Mmm, I think there's ways to get around that actually, where we can use our ability to interact with the underlying values of the iterator and transform calls into horizontal reductions.
To elaborate on how that would look:
An iterator over a 16-byte vector (yon typical SSE/Neon register) that iterated one step (doing whatever) and then had product() called on it could have the first lane set to 1 and then issue a horizontal mul instruction. Likewise, sum, but the additive identity is 0. This works because we can extract an ExactSizeIterator from a SIMD vector.
Well, first of all that would require different iterators from Sum and Product, which doesn't really make sense. But I think that's also just not a very good iterator for normal iteration, compared to an array.
how about having the iterator be something like:
// Sketch only: this relies on unstable specialization (`default fn`) and on
// hypothetical trait methods (`AsElements`, `get_element_unchecked`,
// `fill_outside_range`); `impl Sum for i32` could only live in core itself.
use core::iter::Sum;
use core::ops::{Add, Mul, Range};

struct Iter<SimdType: AsElements> {
    value: SimdType,
    range: Range<usize>,
}

impl<SimdType: AsElements> Iterator for Iter<SimdType> {
    type Item = SimdType::Element;
    fn next(&mut self) -> Option<Self::Item> {
        let index = self.range.next()?;
        unsafe { Some(self.value.get_element_unchecked(index)) }
    }
}

impl<SimdType: AsElements> DoubleEndedIterator for Iter<SimdType> {
    fn next_back(&mut self) -> Option<Self::Item> {
        let index = self.range.next_back()?;
        unsafe { Some(self.value.get_element_unchecked(index)) }
    }
}

trait SumProductHelper<I: Iterator<Item = Self>> {
    fn sum(iter: I) -> Self;
    fn product(iter: I) -> Self;
}

impl<I: Iterator<Item = Self>> SumProductHelper<I> for i32 {
    // scalar fallback for arbitrary iterators
    default fn sum(iter: I) -> Self {
        iter.fold(0i32, Add::add)
    }
    default fn product(iter: I) -> Self {
        iter.fold(1i32, Mul::mul)
    }
}

impl<SimdType: AsElements<Element = Self>> SumProductHelper<Iter<SimdType>> for i32 {
    // specialization: lanes already consumed are overwritten with the
    // identity, then one horizontal reduction finishes the job
    fn sum(iter: Iter<SimdType>) -> Self {
        iter.value.fill_outside_range(iter.range, 0i32).sum()
    }
    fn product(iter: Iter<SimdType>) -> Self {
        iter.value.fill_outside_range(iter.range, 1i32).product()
    }
}

impl Sum for i32 {
    fn sum<I: Iterator<Item = Self>>(iter: I) -> Self {
        SumProductHelper::sum(iter)
    }
}
I suppose this would work, but it makes the iterator pretty big and wordy and I'm not convinced this is any better than just doing as_array or as_slice. Especially since accessing a slice is just getelementptr, which can optimize to insertelement/extractelement.
Updated all of those fn names
@workingjubilee okay with naming etc?
Looks good!
Fixes #71.