Add reductions #83

Merged: 17 commits into master, Apr 22, 2021

Conversation

@calebzulawski (Member) commented Mar 8, 2021

Fixes #71.

@calebzulawski calebzulawski changed the title Reductions Add reductions Mar 8, 2021
@workingjubilee (Member)

These will be complicated further by rust-lang/rust#83984

@calebzulawski calebzulawski marked this pull request as ready for review April 9, 2021 20:33
@calebzulawski (Member, Author)

All ready to go with #80 merged.

@Lokathor (Contributor) commented Apr 9, 2021

Are these reducing ops horizontal ops? This needs more docs on the methods to explain what's happening.

@calebzulawski (Member, Author)

Yes they are, though I thought my documentation was pretty decent. Do you have any specific examples of what you mean?

@calebzulawski calebzulawski mentioned this pull request Apr 9, 2021
@Lokathor (Contributor) commented Apr 9, 2021

I would include the word "horizontal" so that the difference is more noticeable at a glance, and so that search results in rustdoc let you see it sooner (since they show the item path and the first line of docs, but not the type).

@thomcc (Member) commented Apr 9, 2021

I'd also suggest that horizontal should be in the name, so that it can be more easily grouped when describing in e.g. whatever guide we end up writing on how to write good/efficient SIMD code.

(E.g. One of the guidelines here would be to, where possible, try to avoid these in batch processing by deferring them until the end, because they're comparatively inefficient on most hardware — often O(log lanes) or O(lanes))

Also, it makes it easy to search for if you know what you want.
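The guideline above can be sketched with plain arrays standing in for SIMD registers (a hypothetical illustration, not this crate's API): keep a lanewise accumulator inside the loop, and perform the single horizontal reduction only once, at the end.

```rust
// Sketch of deferring horizontal reductions. `sum_deferred` is a made-up name
// for this illustration; arrays stand in for 4-lane SIMD registers.
fn sum_deferred(chunks: &[[f32; 4]]) -> f32 {
    let mut acc = [0.0f32; 4];
    for chunk in chunks {
        for i in 0..4 {
            acc[i] += chunk[i]; // vertical (lanewise) adds: cheap, per-lane
        }
    }
    acc.iter().sum() // one horizontal reduction at the very end
}

fn main() {
    let data = [[1.0, 2.0, 3.0, 4.0], [5.0, 6.0, 7.0, 8.0]];
    assert_eq!(sum_deferred(&data), 36.0);
}
```

With real vectors the inner loop would be a single lanewise add per chunk, so the O(log lanes) horizontal cost is paid once rather than per iteration.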

@calebzulawski (Member, Author)

I kind of didn't want to use horizontal because to me it's lingo (intel specific maybe?) and not really descriptive, and I think any documentation using it would need to define what that means anyway. I just looked at the documentation for arm vaddv and it doesn't use the word horizontal, it just defines it as "add across vector".

@calebzulawski (Member, Author)

I think reduce is potentially a more universal term (but sum might still be more obvious than something like reduce_add?)

@thomcc (Member) commented Apr 10, 2021

The term horizontal predates Intel's stuff certainly. It is lingo/jargon, but so is stuff like saturating.

Reduce can mean a lot of different things, and seems way more overloaded to me. I could imagine someone expecting reducing a vector to mean reducing the number of lanes, as just an example.

Horizontal, even if it is lingo/jargon, is pretty specific to SIMD, is defined in our guide (briefly) and the guide that was written for packed_simd (in depth), and is something of a critical concept to understand when writing SIMD — I suspect we have to come up with a name for it and provide a definition somewhere (and the docs can say "for more information on horizontal operations, see [the simd guide's entry on the topic](good link)), so I feel like we should use the common terminology for that, even if it's not perfect.

@calebzulawski (Member, Author)

Do we maybe want all of the reductions to be named horizontal_sum, horizontal_xor, horizontal_min, etc and leave the docs mostly as-is?

@Lokathor (Contributor)

while that's certainly possible, consider that:

  1. there's really no value in under-documenting a function. In core, even the most obvious math functions have a full explanation of what they do along with a rustdoc example. Having fewer lines of rustdoc doesn't make your program run faster.
  2. you yourself pointed out that "horizontal" is a jargon-y term just a few posts ago. If we take that as true (and honestly I think we can) then simply putting a jargony term at the front of the method names does not do a service to our beginner users.

That said, I got up from a nap like, right after you posted your most recent reply 14 minutes ago, so I could be kinda half-asleep right now.

@calebzulawski (Member, Author)

Updated the function names and docs; I think the new names are more consistent and the docs are a little clearer.

@workingjubilee (Member) commented Apr 12, 2021

PPC errors are spurious.
Blocked, functionally, on a decision regarding wasm testing (issue #92).

@workingjubilee (Member)

Computes -> Returns?

Applicable Patch
diff --git a/crates/core_simd/src/reduction.rs b/crates/core_simd/src/reduction.rs
index e728f8a..118d1d8 100644
--- a/crates/core_simd/src/reduction.rs
+++ b/crates/core_simd/src/reduction.rs
@@ -4,46 +4,46 @@ macro_rules! impl_integer_reductions {
         where
             Self: crate::LanesAtMost32
         {
-            /// Horizontal wrapping add.  Computes the sum of the lanes of the vector, with wrapping addition.
+            /// Horizontal wrapping add.  Returns the sum of the lanes of the vector, with wrapping addition.
             #[inline]
             pub fn wrapping_sum(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_add_ordered(self, 0) }
             }
 
-            /// Horizontal wrapping multiply.  Computes the product of the lanes of the vector, with wrapping multiplication.
+            /// Horizontal wrapping multiply.  Returns the product of the lanes of the vector, with wrapping multiplication.
             #[inline]
             pub fn wrapping_product(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_mul_ordered(self, 1) }
             }
 
-            /// Horizontal bitwise "and".  Computes the cumulative bitwise "and" across the lanes of
+            /// Horizontal bitwise "and".  Returns the cumulative bitwise "and" across the lanes of
             /// the vector.
             #[inline]
             pub fn horizontal_and(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_and(self) }
             }
 
-            /// Horizontal bitwise "or".  Computes the cumulative bitwise "or" across the lanes of
+            /// Horizontal bitwise "or".  Returns the cumulative bitwise "or" across the lanes of
             /// the vector.
             #[inline]
             pub fn horizontal_or(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_or(self) }
             }
 
-            /// Horizontal bitwise "xor".  Computes the cumulative bitwise "xor" across the lanes of
+            /// Horizontal bitwise "xor".  Returns the cumulative bitwise "xor" across the lanes of
             /// the vector.
             #[inline]
             pub fn horizontal_xor(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_xor(self) }
             }
 
-            /// Horizontal maximum.  Computes the maximum lane in the vector.
+            /// Horizontal maximum.  Returns the maximum lane in the vector.
             #[inline]
             pub fn horizontal_max(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_max(self) }
             }
 
-            /// Horizontal minimum.  Computes the minimum lane in the vector.
+            /// Horizontal minimum.  Returns the minimum lane in the vector.
             #[inline]
             pub fn horizontal_min(self) -> $scalar {
                 unsafe { crate::intrinsics::simd_reduce_min(self) }
@@ -59,7 +59,7 @@ macro_rules! impl_float_reductions {
             Self: crate::LanesAtMost32
         {
 
-            /// Horizontal add.  Computes the sum of the lanes of the vector.
+            /// Horizontal add.  Returns the sum of the lanes of the vector.
             #[inline]
             pub fn sum(self) -> $scalar {
                 // LLVM sum is inaccurate on i586
@@ -70,7 +70,7 @@ macro_rules! impl_float_reductions {
                 }
             }
 
-            /// Horizontal multiply.  Computes the sum of the lanes of the vector.
+            /// Horizontal multiply.  Returns the product of the lanes of the vector.
             #[inline]
             pub fn product(self) -> $scalar {
                 // LLVM product is inaccurate on i586
@@ -81,7 +81,7 @@ macro_rules! impl_float_reductions {
                 }
             }
 
-            /// Horizontal maximum.  Computes the maximum lane in the vector.
+            /// Horizontal maximum.  Returns the maximum lane in the vector.
             ///
             /// Returns values based on equality, so a vector containing both `0.` and `-0.` may
             /// return either.  This function will not return `NaN` unless all lanes are `NaN`.
@@ -90,7 +90,7 @@ macro_rules! impl_float_reductions {
                 unsafe { crate::intrinsics::simd_reduce_max(self) }
             }
 
-            /// Horizontal minimum.  Computes the minimum lane in the vector.
+            /// Horizontal minimum.  Returns the minimum lane in the vector.
             ///
             /// Returns values based on equality, so a vector containing both `0.` and `-0.` may
             /// return either.  This function will not return `NaN` unless all lanes are `NaN`.

@workingjubilee (Member) commented Apr 15, 2021

Re: nomenclature, vertical:lanewise::horizontal:vectorwise?
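A rough illustration of that analogy, with plain arrays standing in for vectors and made-up function names: a vertical (lanewise) op combines corresponding lanes of two vectors, while a horizontal (vectorwise) op collapses one vector's lanes into a scalar.

```rust
// "Vertical"/lanewise: one output lane per pair of input lanes.
fn lanewise_add(a: [i32; 4], b: [i32; 4]) -> [i32; 4] {
    let mut out = [0; 4];
    for i in 0..4 {
        out[i] = a[i] + b[i];
    }
    out
}

// "Horizontal"/vectorwise: all lanes collapse into a single scalar.
fn horizontal_add(v: [i32; 4]) -> i32 {
    v.iter().sum()
}

fn main() {
    assert_eq!(lanewise_add([1, 2, 3, 4], [10, 20, 30, 40]), [11, 22, 33, 44]);
    assert_eq!(horizontal_add([1, 2, 3, 4]), 10);
}
```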

@calebzulawski (Member, Author)

Do we think this is good to go?

@Lokathor (Contributor) left a comment

Other than the possible name changes, looks solid.

{
    /// Horizontal wrapping add.  Returns the sum of the lanes of the vector, with wrapping addition.
    #[inline]
    pub fn wrapping_sum(self) -> $scalar {
@Lokathor (Contributor)

should this say horizontal_ at the front of the name like the others?

@calebzulawski (Member, Author) commented Apr 19, 2021

I left it out of sum and product because I think the names are more self explanatory, since there's no "vertical" sum (and mirror the ones in std::iter a bit, too).

Kind of like how it wouldn't really make sense to make it Mask{N}::horizontal_all.

@Lokathor (Contributor)

Well, these types should implement the Sum and Product traits: https://doc.rust-lang.org/std/iter/trait.Sum.html. But if you think it won't be confusing, we can skip it for now.

@calebzulawski (Member, Author)

Unfortunately we can't implement those traits since they only work on iterators. They get implemented on the iterator Item, not the iterator itself.

More specifically you can already do vector.to_array().sum() but it won't be SIMD.
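For reference, std::iter::Sum is defined on the element type and consumes any iterator over that type, which is why the scalar path already works today; it just folds lane by lane instead of issuing a SIMD reduction. A minimal sketch:

```rust
// `Sum` is implemented for the element type (e.g. f32), not for any particular
// iterator, so any iterator of f32 can be summed -- scalar-wise.
fn main() {
    let lanes = [1.0f32, 2.0, 3.0, 4.0]; // stand-in for vector.to_array()
    let total: f32 = lanes.iter().sum(); // a plain scalar fold, not SIMD
    assert_eq!(total, 10.0);
}
```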

@workingjubilee (Member) commented Apr 21, 2021

Mmm, I think there's ways to get around that actually, where we can use our ability to interact with the underlying values of the iterator and transform calls into horizontal reductions.

@workingjubilee (Member)

To elaborate on how that would look:

An iterator over a 16 byte vector (yon typical SSE/Neon register) that iterated one step (doing whatever) and then had product() called on it could have the first lane set to 1 and then issue a horizontal mul instruction. Likewise, sum, but the additive identity is 0. This works because we can extract an ExactSizeIterator from a SIMD vector.
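A rough sketch of that idea, with a plain array standing in for the register and a made-up function name: lanes the iterator has already consumed are overwritten with the operation's identity, so a single horizontal reduction over the whole register still gives the right answer.

```rust
// Hypothetical sketch: after the iterator has consumed `consumed` leading
// lanes, overwrite them with the identity and reduce the whole register,
// instead of folding the remaining elements one at a time.
fn product_after_skip(mut lanes: [i32; 4], consumed: usize) -> i32 {
    for lane in &mut lanes[..consumed] {
        *lane = 1; // multiplicative identity; a sum would use 0 here
    }
    // Stand-in for one horizontal multiply over the whole register.
    lanes.iter().product()
}

fn main() {
    let v = [2, 3, 4, 5];
    // One step of iteration consumed lane 0; the reduction ignores it.
    assert_eq!(product_after_skip(v, 1), 60);
    assert_eq!(product_after_skip(v, 0), 120);
}
```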

@calebzulawski (Member, Author)

Well, first of all that would require different iterators from Sum and Product, which doesn't really make sense. But I think that's also just not a very good iterator for normal iteration, compared to an array.

@workingjubilee (Member)

how about having the iterator be something like:

// Sketch only: `AsElements`, `get_element_unchecked`, and `fill_outside_range`
// are hypothetical, and the `default fn` impls below rely on specialization.
struct Iter<SimdType: AsElements> {
    value: SimdType,
    range: Range<usize>,
}

impl<SimdType: AsElements> Iterator for Iter<SimdType> {
    type Item = SimdType::Element;
    fn next(&mut self) -> Option<Self::Item> {
        let index = self.range.next()?;
        unsafe { Some(self.value.get_element_unchecked(index)) }
    }
}

impl<SimdType: AsElements> DoubleEndedIterator for Iter<SimdType> {
    fn next_back(&mut self) -> Option<Self::Item> {
        let index = self.range.next_back()?;
        unsafe { Some(self.value.get_element_unchecked(index)) }
    }
}

trait SumProductHelper<I: Iterator<Item=Self>> {
    fn sum(iter: I) -> Self;
    fn product(iter: I) -> Self;
}

impl<I: Iterator<Item=Self>> SumProductHelper<I> for i32 {
    default fn sum(iter: I) -> Self {
        iter.fold(0i32, Add::add)
    }
    default fn product(iter: I) -> Self {
        iter.fold(1i32, Mul::mul)
    }
}

impl<SimdType: AsElements<Element = Self>> SumProductHelper<Iter<SimdType>> for i32 {
    fn sum(iter: Iter<SimdType>) -> Self {
        iter.value.fill_outside_range(iter.range, 0i32).sum()
    }
    fn product(iter: Iter<SimdType>) -> Self {
        iter.value.fill_outside_range(iter.range, 1i32).product()
    }
}

impl Sum for i32 {
    fn sum<I: Iterator<Item=Self>>(iter: I) -> Self {
        SumProductHelper::sum(iter)
    }
}

@calebzulawski (Member, Author) commented Apr 22, 2021

I suppose this would work, but it makes the iterator pretty big and wordy, and I'm not convinced it's any better than just doing as_array or as_slice. Especially since accessing a slice is just getelementptr, which can optimize to insertelement/extractelement.

@calebzulawski (Member, Author)

Updated all of those fn names

@calebzulawski (Member, Author)

@workingjubilee okay with naming etc?

@workingjubilee (Member)

Looks good!

@workingjubilee workingjubilee merged commit 24ebae8 into master Apr 22, 2021
@calebzulawski calebzulawski deleted the feature/reductions branch August 7, 2021 18:19

Successfully merging this pull request may close these issues.

any() and all() methods for mask types
5 participants