Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add modulus ops into ArrowNativeTypeOp #2756

Merged
merged 7 commits into from
Oct 3, 2022

Conversation

HaoYang670
Copy link
Contributor

@HaoYang670 HaoYang670 commented Sep 17, 2022

Signed-off-by: remzi 13716567376yh@gmail.com

Which issue does this PR close?

Closes #2753.

Rationale for this change

What changes are included in this PR?

  1. Add 2 kinds of modulus ops:
  • mod_wrapping: panics when divisor is 0 and returns 0 when overflow occurs (signed_int::MIN % -1)
  • mod_checked: checks the DivideByZero error and Overflow error, never panic.
  1. Add tests to check the behaviour of n % 0 and overflow.

Are there any user-facing changes?

Signed-off-by: remzi <13716567376yh@gmail.com>
@github-actions github-actions bot added the arrow Changes to the arrow crate label Sep 17, 2022
Signed-off-by: remzi <13716567376yh@gmail.com>
}

/// Only check `DivideByZero` error
fn mod_checked_divide_by_zero(self, rhs: Self) -> Result<Self> {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really like to pollute the API arbitrarily. Could you keep two APIs (_wrapping, _checked) for mod?

Copy link
Contributor Author

@HaoYang670 HaoYang670 Sep 18, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I add 3 APIs here is that mod and div is different from other ops.
For add, sub, mul ..., we only need to consider the Overflow error, so two APIs (_wrapping, _checked) are enough.
But for mod and div, there are 2 kinds of errors Overflow and DivideByZero, which means we need to add new APIs if we only want to check one of them. And in the arithmetic kernel, we have 3 different divide APIs (mod is similar ):

  • divide: doesn't check DivideByZero or Overflow (This uses _wrapping)
  • divide_opt: only checks the DivideByZero. (This is similar to the _checked_divided_by_zero API I add, except that it returns an Option but not `Result)
  • divide_checked: checks both DivideByZero error and Overflow error. (this uses _checked)

An alternative I think is that we could make mod_checked_divide_by_zero return an Option to match the behavior of div_opt / mod_opt

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you can see, it is very easy to do checking of division by zero only check (as did in divide_opt). Just need to do additional is_zero check along with calling wrapping API. Not to mention that this is special case (only divide and modulus). I don't see there is a need to add additional API there. I think it is better to keep a simple API instead of adding new ones when you can use existing APIs to do the same thing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is better to keep a simple API instead of adding new ones when you can use existing APIs to do the same thing.

This makes sense. I will update the PR.

Signed-off-by: remzi <13716567376yh@gmail.com>
@HaoYang670
Copy link
Contributor Author

Hi @viirya, could we merge this?

if rhs.is_zero() {
Err(ArrowError::DivideByZero)
} else {
Ok(self.mod_wrapping(rhs))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
Ok(self.mod_wrapping(rhs))
Ok(self % rhs)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ditto, % is a fallible function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is default implementation for floating point values...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the panic only occurs on signed integers

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not changed yet.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Comment on lines 1081 to 1083
return simd_checked_divide_op(&left, &right, simd_checked_modulus::<T>, |a, b| {
a % b
a.mod_wrapping(b)
});
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIMD op doesn't support checking/wrapping variants. You just change its scalar op. We should leave it as it is.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

a % b will panic at overflow, but a.wrapping_rem(b) won't:
https://play.rust-lang.org/?version=stable&mode=debug&edition=2021&gist=6d04230ac4454c895c2fe9e417c13938

If we don't want simd_checked_divide_op to panic, I guess we should use mod_wrapping?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

simd_checked_divide_op has two parts, one is SIMD op, one is scalar op. SIMD op doesn't support checking/wrapping variants, and I don't think we should change only scalar op.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think we should change only scalar op.

Thank you for explanation. This makes sense. I will update the code and add tests to freeze the behaviour.

@viirya
Copy link
Member

viirya commented Sep 22, 2022

I have two comments. Otherwise looks good for me.

@tustvold tustvold changed the title Add modulus ops into ArrawNativeTypeOp Add modulus ops into ArrowNativeTypeOp Sep 23, 2022
@alamb
Copy link
Contributor

alamb commented Oct 3, 2022

@HaoYang670 is this PR ready to merge?

@HaoYang670 HaoYang670 merged commit 191eaef into apache:master Oct 3, 2022
@HaoYang670
Copy link
Contributor Author

Thank you, @viirya @alamb for your review 😁

@HaoYang670 HaoYang670 deleted the 2753_add_mod branch October 3, 2022 23:26
@ursabot
Copy link

ursabot commented Oct 3, 2022

Benchmark runs are scheduled for baseline = 35c313b and contender = 191eaef. 191eaef is a master commit associated with this PR. Results will be available as each benchmark for each run completes.
Conbench compare runs links:
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ec2-t3-xlarge-us-east-2] ec2-t3-xlarge-us-east-2
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on test-mac-arm] test-mac-arm
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-i9-9960x] ursa-i9-9960x
[Skipped ⚠️ Benchmarking of arrow-rs-commits is not supported on ursa-thinkcentre-m75q] ursa-thinkcentre-m75q
Buildkite builds:
Supported benchmarks:
ec2-t3-xlarge-us-east-2: Supported benchmark langs: Python, R. Runs only benchmarks with cloud = True
test-mac-arm: Supported benchmark langs: C++, Python, R
ursa-i9-9960x: Supported benchmark langs: Python, R, JavaScript
ursa-thinkcentre-m75q: Supported benchmark langs: C++, Java

@alamb
Copy link
Contributor

alamb commented Oct 4, 2022

lol I didn't do anything except generate some alert spam :)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
arrow Changes to the arrow crate
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add modulus op into ArrowNativeTypeOp
4 participants