Add BasicDecimal256 Multiplication Support (PR for decimal256 branch, not master) #8344

Merged
merged 23 commits into from
Oct 12, 2020

Luminarys
Contributor

No description provided.

@github-actions
github-actions bot commented Oct 5, 2020

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}

@Luminarys Luminarys changed the title Add BasicDecimal256 Multiplication Support Add BasicDecimal256 Multiplication Support (PR for decimal256 branch, not master) Oct 5, 2020
@Luminarys
Contributor Author

Added a benchmark; multiplication takes ~21 ns.

@emkornfield
Contributor

@Luminarys have you looked at the CI errors (I think there might be a few flaky things going on but wanted to check that you were ok merging)?

@Luminarys
Contributor Author

I'll take a closer look tomorrow, but we should also wait for feedback from @MingyuZhong before proceeding.

@Luminarys
Contributor Author

I've looked through the CI failures; there seem to be a few kinds:

  1. aws connector failure (I think this isn't our issue)
  2. a python lint error (this should be fixed, but maybe not in this PR)
  3. Arrow Gandiva compile error (same as above)
  4. Some issue around the new constructor I defined (I'll investigate this)
  5. MinGW SDK not found (I think this isn't our issue)

cpp/src/arrow/util/basic_decimal.cc (outdated)
// Multiply two N bit word components into a 2*N bit result, with high bits
// stored in hi and low bits in lo.
template <typename Word>
void ExtendAndMultiplyUint(Word x, Word y, Word* hi, Word* lo) {
Contributor

I think it's simpler if this method handles only uint64_t, and there is another method that takes std::array<uint64_t, n> and uses for loops like https://github.com/google/zetasql/blob/master/zetasql/common/multiprecision_int.h#L723. This way, ExtendAndMultiplyUint128 doesn't need to repeat the similar pattern.

Contributor Author

Done. This saves a lot of code, though multiplication now takes 60 ns as opposed to 20 ns before.

Contributor

Can you try making ExtendAndMultiplyUint inline?

Contributor Author

I realized that I wasn't using the native path previously, which is why the benchmark was so slow. Updated; the new results are 32 ns when __uint128_t is used and 65 ns when uint64_t is used, which I think is more reasonable.
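For readers following this thread, the two paths being benchmarked can be sketched roughly as below. This is a minimal illustration, not the code from the PR: the names `MulU64Portable` and `MulU64Native` are hypothetical, and the native variant assumes a compiler that provides `unsigned __int128` (GCC and Clang do, which is what the `__SIZEOF_INT128__` check tests for).

```cpp
#include <cstdint>

// Portable path (hypothetical name): split each 64-bit operand into 32-bit
// halves and combine the four partial products into a 128-bit result.
inline void MulU64Portable(uint64_t x, uint64_t y, uint64_t* hi, uint64_t* lo) {
  const uint64_t mask = 0xFFFFFFFFULL;
  const uint64_t x_lo = x & mask, x_hi = x >> 32;
  const uint64_t y_lo = y & mask, y_hi = y >> 32;

  const uint64_t ll = x_lo * y_lo;
  const uint64_t lh = x_lo * y_hi;
  const uint64_t hl = x_hi * y_lo;
  const uint64_t hh = x_hi * y_hi;

  // Terms that land in bits 32..63 of the product. Each term is below 2^32,
  // so the sum fits comfortably in 64 bits.
  const uint64_t mid = (ll >> 32) + (lh & mask) + (hl & mask);

  *lo = (mid << 32) | (ll & mask);
  *hi = hh + (lh >> 32) + (hl >> 32) + (mid >> 32);
}

#if defined(__SIZEOF_INT128__)
// Native path (hypothetical name): let the compiler's 128-bit type do the
// widening multiply. Assumes GCC/Clang-style unsigned __int128.
inline void MulU64Native(uint64_t x, uint64_t y, uint64_t* hi, uint64_t* lo) {
  const unsigned __int128 p = static_cast<unsigned __int128>(x) * y;
  *hi = static_cast<uint64_t>(p >> 64);
  *lo = static_cast<uint64_t>(p);
}
#endif
```

Both functions compute the same 128-bit product; the portable one needs four 64-bit multiplies plus carry handling, which is one plausible reason the fallback measures slower in the numbers quoted above.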

cpp/src/arrow/util/basic_decimal.cc

// Multiply two N bit word components into a 2*N bit result, with high bits
// stored in hi and low bits in lo.
template <typename Word>
inline void ExtendAndMultiplyUint(Word x, Word y, Word* hi, Word* lo) {
Contributor

Now this method only needs to handle uint64 inputs, and it only needs to be defined in the #else block, right?

Contributor Author

Done.

#endif

// Multiplies two N * 64 bit unsigned integer types, represented by a uint64_t
// array into a same sized output. Overflow in multiplication is considered UB
Contributor

What does UB mean?

Contributor Author

Undefined Behavior, clarified in comments.

#endif

// Multiplies two N * 64 bit unsigned integer types, represented by a uint64_t
Contributor

Please comment that the elements in the array inputs and output have little-endian order.

Contributor Author

Done.
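To make the little-endian convention concrete, here is a rough sketch of an N-word multiply over `std::array<uint64_t, N>` using plain for loops. It is an illustration under assumptions, not the function from the PR: the name `MultiplyLittleEndian` is hypothetical, and the inner step relies on `unsigned __int128` (available in GCC and Clang) rather than a portable word multiply.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical helper. Element 0 is the least significant 64-bit word of each
// operand and of the result (little-endian word order). Only the low N words
// of the product are kept.
template <std::size_t N>
std::array<uint64_t, N> MultiplyLittleEndian(const std::array<uint64_t, N>& a,
                                             const std::array<uint64_t, N>& b) {
  std::array<uint64_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    uint64_t carry = 0;
    for (std::size_t j = 0; i + j < N; ++j) {
      // a[i] * b[j] + out[i + j] + carry is at most 2^128 - 1, so it fits.
      const unsigned __int128 t =
          static_cast<unsigned __int128>(a[i]) * b[j] + out[i + j] + carry;
      out[i + j] = static_cast<uint64_t>(t);
      carry = static_cast<uint64_t>(t >> 64);
    }
    // A carry left over here belongs to word N or above and is dropped.
  }
  return out;
}
```

For example, with N = 2 the value 2^64 - 1 is written as `{0xFFFFFFFFFFFFFFFF, 0}`; multiplying it by `{2, 0}` gives `{0xFFFFFFFFFFFFFFFE, 1}`, i.e. 2^65 - 2 with the low word first.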

__uint128_t val_;
};

uint128_t operator*(const uint128_t& left, const uint128_t& right) {
Contributor

Please try defining operator*= instead of operator*. Maybe this can help the compiler generate more efficient code.

Contributor Author

This (or perhaps some other change I made) seems to have improved performance significantly: it now takes ~13 ns with native int128 and ~40 ns with the uint64 fallback.
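The shape of the suggested change might look something like the sketch below. The wrapper type `U128` and its accessors are hypothetical, loosely modeled on the `val_` member quoted above, and the code assumes a compiler with `unsigned __int128`. Whether `operator*=` yields better generated code than `operator*` depends on the compiler and optimization level; the thread reports only a measured improvement whose cause was not isolated.

```cpp
#include <cstdint>

// Hypothetical wrapper around the native 128-bit type (GCC/Clang only).
struct U128 {
  U128(uint64_t hi, uint64_t lo)
      : val_((static_cast<unsigned __int128>(hi) << 64) | lo) {}

  // Compound assignment updates the object in place and returns it.
  U128& operator*=(const U128& other) {
    val_ *= other.val_;
    return *this;
  }

  uint64_t hi() const { return static_cast<uint64_t>(val_ >> 64); }
  uint64_t lo() const { return static_cast<uint64_t>(val_); }

  unsigned __int128 val_;
};

// If a binary operator is still wanted, it can be built on operator*=.
inline U128 operator*(U128 left, const U128& right) {
  left *= right;
  return left;
}
```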

@Luminarys
Contributor Author

Luminarys commented Oct 12, 2020

It turns out one of the check failures is due to a compiler bug in Clang. I've tweaked the definition structure of the BasicDecimal256 header to handle this.

Contributor

@MingyuZhong MingyuZhong left a comment

LGTM.

// Multiplies two N * 64 bit unsigned integer types, represented by a uint64_t
// array into a same sized output. Elements in the array should be in
// little endian order, and output will be the same. Overflow in multiplication
// is considered undefined behavior and will not be reported.
Contributor

Is it really undefined? Isn't the output the lower N * 64 bits of the actual result?

Contributor Author

When I say undefined here, I mean the value should not be relied on and is an implementation detail, i.e. people should only be calling this if they know the result will not overflow or do not care what happens if it does. Undefined Behavior maybe isn't correct because it implies the same kind of UB you get when you dereference a nullptr, etc.

I've tweaked the documentation, though, to reflect what actually happens, since this file is the only consumer of the function anyway.
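The distinction discussed here can be seen in the single-word case. In C++, unsigned integer arithmetic wraps modulo 2^N, so an overflowing unsigned multiply is well defined and returns the low bits of the true product. The sketch below is an illustration only: both helper names are hypothetical, and `__builtin_mul_overflow` is a GCC/Clang builtin rather than standard C++.

```cpp
#include <cstdint>

// Hypothetical helper: returns the low 64 bits of x * y. Unsigned overflow
// wraps modulo 2^64, so the result is well defined even when the full
// product does not fit.
inline uint64_t MulLow64(uint64_t x, uint64_t y) { return x * y; }

// Hypothetical helper: reports whether the full product exceeds 64 bits.
// Uses a GCC/Clang builtin; other compilers need a different check.
inline bool MulOverflows64(uint64_t x, uint64_t y) {
  uint64_t ignored = 0;
  return __builtin_mul_overflow(x, y, &ignored);
}
```

So 2^63 times 2 yields 0 from `MulLow64`, while `MulOverflows64` reports the overflow. A caller that cares about overflow has to check separately, which matches the intent described above: the truncated value is deterministic, but callers should not depend on it.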

@emkornfield emkornfield merged commit ccd88e2 into apache:decimal256 Oct 12, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 15, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 17, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 19, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 21, 2020
emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 23, 2020
3 participants