Add BasicDecimal256 Multiplication Support (PR for decimal256 branch, not master) #8344

Luminarys · 2020-10-05T18:02:07Z

No description provided.

github-actions · 2020-10-05T18:14:39Z

Thanks for opening a pull request!

Could you open an issue for this pull request on JIRA?
https://issues.apache.org/jira/browse/ARROW

Then could you also rename pull request title in the following format?

ARROW-${JIRA_ID}: [${COMPONENT}] ${SUMMARY}


Luminarys · 2020-10-05T21:08:57Z

Added a benchmark; multiplication takes ~21 ns.

emkornfield · 2020-10-06T04:49:25Z

@Luminarys have you looked at the CI errors? I think a few flaky things might be going on, but I wanted to check that you are OK with merging.

Luminarys · 2020-10-06T06:35:26Z

I'll take a closer look tomorrow, but we should also wait for feedback from @MingyuZhong before proceeding.

Luminarys · 2020-10-06T18:14:53Z

I've looked through the CI failures. There seem to be a few kinds:

An AWS connector failure (I think this isn't our issue)
A Python lint error (this should be fixed, but maybe not in this PR)
An Arrow Gandiva compile error (same as above)
Some issue around the new constructor I defined (I'll investigate this)
MinGW SDK not found (I think this isn't our issue)


MingyuZhong · 2020-10-06T20:07:15Z

cpp/src/arrow/util/basic_decimal.cc

+// Multiply two N bit word components into a 2*N bit result, with high bits
+// stored in hi and low bits in lo.
+template <typename Word>
+void ExtendAndMultiplyUint(Word x, Word y, Word* hi, Word* lo) {


I think it's simpler if this method handles only uint64_t, and there is another method that takes std::array<uint64_t, n> and uses for loops like https://github.com/google/zetasql/blob/master/zetasql/common/multiprecision_int.h#L723. This way, ExtendAndMultiplyUint128 doesn't need to repeat the similar pattern.

Done. This saves a lot of code, though multiplication now takes 60 ns as opposed to 20 ns before.

Can you try making ExtendAndMultiplyUint inline?

I realized that I wasn't using the native path previously, which is why the benchmark was so slow. Updated; the new results are 32 ns when __uint128_t is used and 65 ns when uint64_t is used, which I think is more reasonable.
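
For readers following the thread, here is a minimal sketch of the two paths being compared: a 64 x 64 -> 128 bit multiply that uses the compiler's __uint128_t extension where available, and a portable fallback built from 32-bit halves. The function names MultiplyUint64Native and MultiplyUint64Portable are hypothetical and are not taken from the PR; the actual ExtendAndMultiplyUint in basic_decimal.cc may be structured differently.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical name (not from the PR): portable 64 x 64 -> 128 bit multiply.
// Splits each operand into 32-bit halves and sums the four partial products.
inline void MultiplyUint64Portable(uint64_t x, uint64_t y, uint64_t* hi,
                                   uint64_t* lo) {
  const uint64_t mask = 0xFFFFFFFFULL;
  const uint64_t x_lo = x & mask;
  const uint64_t x_hi = x >> 32;
  const uint64_t y_lo = y & mask;
  const uint64_t y_hi = y >> 32;

  const uint64_t lo_lo = x_lo * y_lo;
  const uint64_t hi_lo = x_hi * y_lo;
  const uint64_t lo_hi = x_lo * y_hi;
  const uint64_t hi_hi = x_hi * y_hi;

  // Middle column: three terms below 2^32 each, so the sum fits in 64 bits.
  const uint64_t mid = (lo_lo >> 32) + (hi_lo & mask) + (lo_hi & mask);

  *lo = (mid << 32) | (lo_lo & mask);
  *hi = hi_hi + (hi_lo >> 32) + (lo_hi >> 32) + (mid >> 32);
}

#if defined(__SIZEOF_INT128__)
// Hypothetical name (not from the PR): the same operation using the
// __uint128_t extension provided by GCC and Clang on 64-bit targets.
inline void MultiplyUint64Native(uint64_t x, uint64_t y, uint64_t* hi,
                                 uint64_t* lo) {
  const __uint128_t product = static_cast<__uint128_t>(x) * y;
  *hi = static_cast<uint64_t>(product >> 64);
  *lo = static_cast<uint64_t>(product);
}

// Cross-checks the two paths for one pair of inputs.
inline void CheckPathsAgree(uint64_t x, uint64_t y) {
  uint64_t hi_a = 0, lo_a = 0, hi_b = 0, lo_b = 0;
  MultiplyUint64Portable(x, y, &hi_a, &lo_a);
  MultiplyUint64Native(x, y, &hi_b, &lo_b);
  assert(hi_a == hi_b && lo_a == lo_b);
}
#endif
```

The native path typically compiles down to a single widening multiply instruction, which is consistent with the gap between the two timings reported above, although the exact numbers depend on the compiler and machine.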


MingyuZhong · 2020-10-12T18:57:00Z

cpp/src/arrow/util/basic_decimal.cc

+// Multiply two N bit word components into a 2*N bit result, with high bits
+// stored in hi and low bits in lo.
+template <typename Word>
+void ExtendAndMultiplyUint(Word x, Word y, Word* hi, Word* lo) {


Can you try making ExtendAndMultiplyUint inline?

MingyuZhong · 2020-10-12T20:17:09Z

cpp/src/arrow/util/basic_decimal.cc

+// Multiply two N bit word components into a 2*N bit result, with high bits
+// stored in hi and low bits in lo.
+template <typename Word>
+inline void ExtendAndMultiplyUint(Word x, Word y, Word* hi, Word* lo) {


Now this method only needs to handle uint64 inputs, and it only needs to be defined in the #else block, right?

MingyuZhong · 2020-10-12T20:19:06Z

cpp/src/arrow/util/basic_decimal.cc

 #endif
+
+// Multiplies two N * 64 bit unsigned integer types, represented by a uint64_t
+// array into a same sized output. Overflow in multiplication is considered UB


What does UB mean?

Undefined Behavior, clarified in comments.

MingyuZhong · 2020-10-12T20:19:45Z

cpp/src/arrow/util/basic_decimal.cc

 #endif
+
+// Multiplies two N * 64 bit unsigned integer types, represented by a uint64_t


Please comment that the elements in the array inputs and output have little-endian order.

MingyuZhong · 2020-10-12T20:21:02Z

cpp/src/arrow/util/basic_decimal.cc

+  __uint128_t val_;
+};
+
+uint128_t operator*(const uint128_t& left, const uint128_t& right) {


Please try defining operator*= instead of operator*. Maybe this can help the compiler generate more efficient code.

This (or perhaps some other change I made) seems to have improved performance significantly: it now takes ~13 ns with native int128 and ~40 ns with the uint64 fallback.
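
A minimal sketch of the pattern suggested here: define the compound operator*= as the primitive and, if a binary operator* is still wanted, derive it from the compound form. WrappedUint is a hypothetical stand-in for the wrapper class in the diff; it is templated on the underlying word type (an assumption made so the sketch compiles without the __uint128_t extension), whereas the class in the PR holds a __uint128_t member directly.

```cpp
#include <cstdint>

// Hypothetical wrapper (not the class from the PR). T is assumed to be an
// unsigned type at least as wide as unsigned int, e.g. uint64_t.
template <typename T>
class WrappedUint {
 public:
  explicit WrappedUint(T value) : val_(value) {}

  // In-place multiply: updates this object and creates no temporary wrapper.
  WrappedUint& operator*=(const WrappedUint& other) {
    val_ *= other.val_;
    return *this;
  }

  T value() const { return val_; }

 private:
  T val_;
};

// Binary form derived from the compound form. The left operand is taken by
// value so the copy serves as the result object.
template <typename T>
WrappedUint<T> operator*(WrappedUint<T> left, const WrappedUint<T>& right) {
  left *= right;
  return left;
}
```

Whether the in-place form produces faster code is compiler dependent; as noted in the reply above, the measured improvement may also have come from other changes in the same commit.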

Luminarys · 2020-10-12T21:22:01Z

It turns out one of the check failures is due to a compiler bug in Clang. I've tweaked the definition structure of the BasicDecimal256 header to handle this.

MingyuZhong

LGTM.

MingyuZhong · 2020-10-12T21:29:21Z

cpp/src/arrow/util/basic_decimal.cc

+// Multiplies two N * 64 bit unsigned integer types, represented by a uint64_t
+// array into a same sized output. Elements in the array should be in
+// little endian order, and output will be the same. Overflow in multiplication
+// is considered undefined behavior and will not be reported.


Is it really undefined? Isn't the output the lower N * 64 bits of the actual result?

When I say undefined here, I mean the value should not be relied on and is an implementation detail, i.e. people should only be calling this if they know the result will not overflow or do not care what happens if it does. Undefined Behavior maybe isn't correct because it implies the same kind of UB you get when you dereference a nullptr, etc.

I've tweaked the documentation though to reflect what actually happens since this file is the only consumer of the function anyways.
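
To make the point about overflow concrete, here is a minimal sketch of a loop-based multiply over little-endian uint64_t arrays that keeps only the low N * 64 bits of the product. The name MultiplyTruncating is hypothetical, and the sketch deliberately swaps in 32-bit digits so that every partial product fits in a uint64_t; the MultiplyUnsignedArray function in the PR works on 64-bit words and may differ in detail.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical sketch (not the PR's MultiplyUnsignedArray). Inputs and output
// are little-endian: element 0 holds the least significant 64 bits.
// The result is the product modulo 2^(64 * N); higher bits are discarded.
template <std::size_t N>
std::array<uint64_t, N> MultiplyTruncating(const std::array<uint64_t, N>& a,
                                           const std::array<uint64_t, N>& b) {
  constexpr std::size_t kDigits = 2 * N;
  std::array<uint32_t, kDigits> x{};
  std::array<uint32_t, kDigits> y{};
  std::array<uint32_t, kDigits> r{};

  // Split each 64-bit word into two 32-bit digits, still little-endian.
  for (std::size_t i = 0; i < N; ++i) {
    x[2 * i] = static_cast<uint32_t>(a[i]);
    x[2 * i + 1] = static_cast<uint32_t>(a[i] >> 32);
    y[2 * i] = static_cast<uint32_t>(b[i]);
    y[2 * i + 1] = static_cast<uint32_t>(b[i] >> 32);
  }

  // Schoolbook multiplication. Digit positions at or beyond kDigits are never
  // computed, which is what truncates the result.
  for (std::size_t i = 0; i < kDigits; ++i) {
    uint64_t carry = 0;
    for (std::size_t j = 0; i + j < kDigits; ++j) {
      // Maximum value is (2^32-1)^2 + 2*(2^32-1) = 2^64 - 1, so no overflow.
      const uint64_t t = static_cast<uint64_t>(x[i]) * y[j] + r[i + j] + carry;
      r[i + j] = static_cast<uint32_t>(t);
      carry = t >> 32;
    }
    // Any carry left here belongs to a discarded high digit.
  }

  // Reassemble 32-bit digits into 64-bit words.
  std::array<uint64_t, N> out{};
  for (std::size_t i = 0; i < N; ++i) {
    out[i] = static_cast<uint64_t>(r[2 * i]) |
             (static_cast<uint64_t>(r[2 * i + 1]) << 32);
  }
  return out;
}
```

With N = 2, multiplying 2^64 by 2^64 yields all zeros, because the true product 2^128 lies entirely above the retained 128 bits. The result is deterministic, which is why "the lower N * 64 bits of the actual result" describes it more accurately than "undefined behavior".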


Luminarys added 15 commits October 2, 2020 16:31

Refactor BasicDecimal128 Multiplication to use unsigned helper

825f24e

Fix typo

43cec0e

Remove unused variable

7f22b19

Fix formatting

0725cf6

Make all variables const where possible

6c1b7b1

Rename helper methods

4db42dd

Include native 128 bit integer versions of multiplication

2dac4d6

Add initial Decimal256 multiplication support

8b272a9

Format

cd50114

Bug fixes

ac700d9

Convert some methods

3d5f74b

Cleanup test code

a21131b

Cleanup and refactor

cdacfaa

Add some docs

a5140d8

Improve Decimal256 Multiplication test

23abc2a

Luminarys changed the title ~~Add BasicDecimal256 Multiplication Support~~ Add BasicDecimal256 Multiplication Support (PR for decimal256 branch, not master) Oct 5, 2020

Luminarys added 2 commits October 5, 2020 13:46

Update docs

dbc7266

Add benchmark

40b4503

MingyuZhong suggested changes Oct 6, 2020


Use loop based multiplication

8774ef2

MingyuZhong suggested changes Oct 12, 2020


Ensure native __uint128_t is used in multiplication

0c6ab8e

MingyuZhong suggested changes Oct 12, 2020


Switch to *= multiplication, fix header to account for clang bug

0f0c907

Remove undef

1d3d624

MingyuZhong approved these changes Oct 12, 2020


Luminarys added 2 commits October 12, 2020 14:35

Update MultiplyUnsignedArray documentation

f2854a6

Update MultiplyUnsignedArray documentation again

41431e0

emkornfield merged commit ccd88e2 into apache:decimal256 Oct 12, 2020

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 15, 2020

Add BasicDecimal256 Multiplication Support (PR for decimal256 branch,…

4f01cc2

… not master) (apache#8344)

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 17, 2020

Add BasicDecimal256 Multiplication Support (PR for decimal256 branch,…

d2f06a7

… not master) (apache#8344)

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 19, 2020

Add BasicDecimal256 Multiplication Support (PR for decimal256 branch,…

9c4ef3d

… not master) (apache#8344)

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 21, 2020

Add BasicDecimal256 Multiplication Support (PR for decimal256 branch,…

04d3ac9

… not master) (apache#8344)

emkornfield pushed a commit to emkornfield/arrow that referenced this pull request Oct 23, 2020

Add BasicDecimal256 Multiplication Support (PR for decimal256 branch,…

4cfba5d

… not master) (apache#8344)
