Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add modulus ops into ArrowNativeTypeOp #2756

Merged
merged 7 commits into from
Oct 3, 2022
Merged
Show file tree
Hide file tree
Changes from 2 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
62 changes: 45 additions & 17 deletions arrow/src/compute/kernels/arithmetic.rs
Original file line number Diff line number Diff line change
Expand Up @@ -22,7 +22,7 @@
//! `RUSTFLAGS="-C target-feature=+avx2"` for example. See the documentation
//! [here](https://doc.rust-lang.org/stable/core/arch/) for more information.

use std::ops::{Div, Neg, Rem};
use std::ops::{Div, Neg};

use num::{One, Zero};

Expand Down Expand Up @@ -168,7 +168,7 @@ fn simd_checked_modulus<T: ArrowNumericType>(
right: T::Simd,
) -> Result<T::Simd>
where
T::Native: One + Zero,
T::Native: ArrowNativeTypeOp + One,
{
let zero = T::init(T::Native::zero());
let one = T::init(T::Native::one());
Expand Down Expand Up @@ -291,7 +291,7 @@ fn simd_checked_divide_op<T, SI, SC>(
) -> Result<PrimitiveArray<T>>
where
T: ArrowNumericType,
T::Native: One + Zero,
T::Native: ArrowNativeTypeOp,
SI: Fn(Option<u64>, T::Simd, T::Simd) -> Result<T::Simd>,
SC: Fn(T::Native, T::Native) -> T::Native,
{
Expand Down Expand Up @@ -1075,20 +1075,14 @@ pub fn modulus<T>(
) -> Result<PrimitiveArray<T>>
where
T: ArrowNumericType,
T::Native: Rem<Output = T::Native> + Zero + One,
T::Native: ArrowNativeTypeOp + One,
{
#[cfg(feature = "simd")]
return simd_checked_divide_op(&left, &right, simd_checked_modulus::<T>, |a, b| {
a % b
a.mod_wrapping(b)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIMD op doesn't support checking/wrapping variants. You just change its scalar op. We should leave it as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a % b will panic at overflow, but a.wrapping_rem(b) won't:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6d04230ac4454c895c2fe9e417c13938

If we don't want simd_checked_divide_op to panic, I guess we should use mod_wrapping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simd_checked_divide_op has two parts, one is SIMD op, one is scalar op. SIMD op doesn't support checking/wrapping variants, and I don't think we should change only scalar op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should change only scalar op.

Thank you for explanation. This makes sense. I will update the code and add tests to freeze the behaviour.

#[cfg(not(feature = "simd"))]
return try_binary(left, right, |a, b| {
if b.is_zero() {
Err(ArrowError::DivideByZero)
} else {
Ok(a % b)
}
});
return try_binary(left, right, |a, b| a.mod_checked_divide_by_zero(b));
}

/// Perform `left / right` operation on two arrays. If either left or right value is null
Expand Down Expand Up @@ -1191,13 +1185,13 @@ pub fn modulus_scalar<T>(
) -> Result<PrimitiveArray<T>>
where
T: ArrowNumericType,
T::Native: Rem<Output = T::Native> + Zero,
T::Native: ArrowNativeTypeOp,
{
if modulo.is_zero() {
return Err(ArrowError::DivideByZero);
}

Ok(unary(array, |a| a % modulo))
Ok(unary(array, |a| a.mod_wrapping(modulo)))
}

/// Divide every value in an array by a scalar. If any value in the array is null then the
Expand Down Expand Up @@ -1769,7 +1763,7 @@ mod tests {
}

#[test]
fn test_primitive_array_modulus() {
fn test_int_array_modulus() {
let a = Int32Array::from(vec![15, 15, 8, 1, 9]);
let b = Int32Array::from(vec![5, 6, 8, 9, 1]);
let c = modulus(&a, &b).unwrap();
Expand All @@ -1780,6 +1774,24 @@ mod tests {
assert_eq!(0, c.value(4));
}

#[test]
#[should_panic(
expected = "called `Result::unwrap()` on an `Err` value: DivideByZero"
)]
fn test_int_array_modulus_divide_by_zero() {
let a = Int32Array::from(vec![1]);
let b = Int32Array::from(vec![0]);
modulus(&a, &b).unwrap();
}

#[test]
fn test_int_array_modulus_overflow_wrapping() {
let a = Int32Array::from(vec![i32::MIN]);
let b = Int32Array::from(vec![-1]);
let result = modulus(&a, &b).unwrap();
assert_eq!(0, result.value(0))
}

#[test]
fn test_primitive_array_divide_scalar() {
let a = Int32Array::from(vec![15, 14, 9, 8, 1]);
Expand Down Expand Up @@ -1842,7 +1854,7 @@ mod tests {
}

#[test]
fn test_primitive_array_modulus_scalar() {
fn test_int_array_modulus_scalar() {
let a = Int32Array::from(vec![15, 14, 9, 8, 1]);
let b = 3;
let c = modulus_scalar(&a, b).unwrap();
Expand All @@ -1851,7 +1863,7 @@ mod tests {
}

#[test]
fn test_primitive_array_modulus_scalar_sliced() {
fn test_int_array_modulus_scalar_sliced() {
let a = Int32Array::from(vec![Some(15), None, Some(9), Some(8), None]);
let a = a.slice(1, 4);
let a = as_primitive_array(&a);
Expand All @@ -1860,6 +1872,22 @@ mod tests {
assert_eq!(actual, expected);
}

#[test]
#[should_panic(
expected = "called `Result::unwrap()` on an `Err` value: DivideByZero"
)]
fn test_int_array_modulus_scalar_divide_by_zero() {
let a = Int32Array::from(vec![1]);
modulus_scalar(&a, 0).unwrap();
}

#[test]
fn test_int_array_modulus_scalar_overflow_wrapping() {
let a = Int32Array::from(vec![i32::MIN]);
let result = modulus_scalar(&a, -1).unwrap();
assert_eq!(0, result.value(0))
}

#[test]
fn test_primitive_array_divide_sliced() {
let a = Int32Array::from(vec![0, 0, 0, 15, 15, 8, 1, 9, 0]);
Expand Down
42 changes: 41 additions & 1 deletion arrow/src/datatypes/native.rs
Original file line number Diff line number Diff line change
Expand Up @@ -47,7 +47,7 @@ pub(crate) mod native_op {
use super::ArrowNativeType;
use crate::error::{ArrowError, Result};
use num::Zero;
use std::ops::{Add, Div, Mul, Sub};
use std::ops::{Add, Div, Mul, Rem, Sub};

/// Trait for ArrowNativeType to provide overflow-checking and non-overflow-checking
/// variants for arithmetic operations. For floating point types, this provides some
Expand All @@ -65,6 +65,7 @@ pub(crate) mod native_op {
+ Sub<Output = Self>
+ Mul<Output = Self>
+ Div<Output = Self>
+ Rem<Output = Self>
+ Zero
{
fn add_checked(self, rhs: Self) -> Result<Self> {
Expand Down Expand Up @@ -102,6 +103,28 @@ pub(crate) mod native_op {
fn div_wrapping(self, rhs: Self) -> Self {
self / rhs
}

/// Check `DivideByZero` error and `Overflow` error.
fn mod_fully_checked(self, rhs: Self) -> Result<Self> {
if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
Ok(self.mod_wrapping(rhs))
}
}

/// Only check `DivideByZero` error
fn mod_checked_divide_by_zero(self, rhs: Self) -> Result<Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like to pollute the API arbitrarily. Could you keep two APIs (_wrapping, _checked) for mod?

Copy link
Contributor Author

@HaoYang670 HaoYang670 Sep 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I add 3 APIs here is that mod and div is different from other ops.
For add, sub, mul ..., we only need to consider the Overflow error, so two APIs (_wrapping, _checked) are enough.
But for mod and div, there are 2 kinds of errors Overflow and DivideByZero, which means we need to add new APIs if we only want to check one of them. And in the arithmetic kernel, we have 3 different divide APIs (mod is similar ):

  • divide: doesn't check DivideByZero or Overflow (This uses _wrapping)
  • divide_opt: only checks the DivideByZero. (This is similar to the _checked_divided_by_zero API I add, except that it returns an Option but not `Result)
  • divide_checked: checks both DivideByZero error and Overflow error. (this uses _checked)

An alternative I think is that we could make mod_checked_divide_by_zero return an Option to match the behavior of div_opt / mod_opt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see, it is very easy to do checking of division by zero only check (as did in divide_opt). Just need to do additional is_zero check along with calling wrapping API. Not to mention that this is special case (only divide and modulus). I don't see there is a need to add additional API there. I think it is better to keep a simple API instead of adding new ones when you can use existing APIs to do the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep a simple API instead of adding new ones when you can use existing APIs to do the same thing.

This makes sense. I will update the PR.

if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
Ok(self.mod_wrapping(rhs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ok(self.mod_wrapping(rhs))
Ok(self % rhs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, % is a fallible function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is default implementation for floating point values...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the panic only occurs on signed integers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not changed yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

}
}

fn mod_wrapping(self, rhs: Self) -> Self {
self % rhs
}
}
}

Expand Down Expand Up @@ -163,6 +186,23 @@ macro_rules! native_type_op {
fn div_wrapping(self, rhs: Self) -> Self {
self.wrapping_div(rhs)
}

fn mod_fully_checked(self, rhs: Self) -> Result<Self> {
if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
self.checked_rem(rhs).ok_or_else(|| {
ArrowError::ComputeError(format!(
"Overflow happened on: {:?} % {:?}",
self, rhs
))
})
}
}

fn mod_wrapping(self, rhs: Self) -> Self {
self.wrapping_rem(rhs)
}
}
};
}
Expand Down