Closes #2156: Bigint stream benchmark #2157

stress-tess · 2023-02-17T23:14:10Z

This PR (closes #2156 and closes #2165) adds a bigint stream benchmark. The bigint stream does not scale well, and I couldn't find any obvious optimizations.

bigint_stream after this PR:

% ./benchmarks/run_benchmarks.py bigint_stream -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 22.1851 sec
Average bigint stream rate = 0.10 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_stream -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 20.6594 sec
Average bigint stream rate = 0.11 GiB/sec

bigint_stream before this PR:

% ./benchmarks/run_benchmarks.py bigint_stream -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 57.1895 sec
Average bigint stream rate = 0.04 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_stream -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 58.1247 sec
Average bigint stream rate = 0.04 GiB/sec

Compared to non-bigint stream:

% ./benchmarks/run_benchmarks.py stream
array size = 100,000,000
number of trials =  6
>>> arkouda float64 stream
numLocales = 1, N = 100,000,000
Average time = 0.2566 sec
Average rate = 8.71 GiB/sec
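
The reported rates look consistent with counting three arrays of 8-byte elements per trial. That byte count is an inference from the float64 figures above, not something taken from the benchmark source, and `stream_rate_gib` is a hypothetical helper used only for this check:

```python
# Assumption: each stream trial touches three arrays of N elements at
# 8 bytes per element. This is inferred from the numbers above, not read
# from the benchmark script.
GIB = 2**30

def stream_rate_gib(n, seconds, itemsize=8, arrays=3):
    """Return the transfer rate in GiB/sec for one stream trial."""
    return arrays * n * itemsize / GIB / seconds

N = 100_000_000
float64_rate = stream_rate_gib(N, 0.2566)     # float64 stream
bigint_after = stream_rate_gib(N, 22.1851)    # bigint stream after this PR
bigint_before = stream_rate_gib(N, 57.1895)   # bigint stream before this PR

print(f"float64:       {float64_rate:.2f} GiB/sec")
print(f"bigint after:  {bigint_after:.2f} GiB/sec")
print(f"bigint before: {bigint_before:.2f} GiB/sec")
print(f"speedup from this PR: {57.1895 / 22.1851:.2f}x")
```

Under that assumption the helper reproduces the 8.71, 0.10 and 0.04 GiB/sec figures, and the default bigint stream run comes out roughly 2.6x faster after this PR.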

bigint bitwise binops after this PR:

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 9.9440 sec
Average bigint AND rate = 0.30 GiB/sec

Average bigint OR time = 14.2256 sec
Average bigint OR rate = 0.21 GiB/sec

Average bigint SHIFT time = 23.6632 sec
Average bigint SHIFT rate = 0.06 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 9.2267 sec
Average bigint AND rate = 0.32 GiB/sec

Average bigint OR time = 14.2185 sec
Average bigint OR rate = 0.21 GiB/sec

Average bigint SHIFT time = 22.9780 sec
Average bigint SHIFT rate = 0.06 GiB/sec

bigint bitwise binops before this PR:

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 21.7934 sec
Average bigint AND rate = 0.14 GiB/sec

Average bigint OR time = 26.4953 sec
Average bigint OR rate = 0.11 GiB/sec

Average bigint SHIFT time = 39.9405 sec
Average bigint SHIFT rate = 0.04 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 22.2118 sec
Average bigint AND rate = 0.13 GiB/sec

Average bigint OR time = 25.9856 sec
Average bigint OR rate = 0.11 GiB/sec

Average bigint SHIFT time = 39.0686 sec
Average bigint SHIFT rate = 0.04 GiB/sec
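
For a quick read of the before/after numbers, the per-operation speedups can be computed directly from the timings. This is a small sketch using the default (no `--max-bits`) runs copied from above:

```python
# Timings in seconds copied from the default (no --max-bits) runs above.
before = {"AND": 21.7934, "OR": 26.4953, "SHIFT": 39.9405}
after = {"AND": 9.9440, "OR": 14.2256, "SHIFT": 23.6632}

# Speedup = old time / new time for each operation.
speedups = {op: before[op] / after[op] for op in before}

for op, factor in speedups.items():
    print(f"{op}: {factor:.2f}x faster")
```

By these single-trial figures AND improves by about 2.2x, OR by about 1.9x, and SHIFT by about 1.7x.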

src/BinOp.chpl

stress-tess · 2023-02-21T00:45:42Z

I found some optimizations by refactoring the code to favor in-place ops. So:

// instead of
tmp = la * ra;
// do
tmp = la;
tmp *= ra;
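
A rough Python analogy of why the in-place form can help. The `Big` class is a hypothetical toy that only counts object creations; it is not how Chapel's bigint is implemented, and the assumption that the out-of-place form costs an extra temporary is mine, not something this PR states:

```python
class Big:
    """Toy stand-in for an arbitrary-precision value that counts allocations."""
    allocations = 0

    def __init__(self, value):
        Big.allocations += 1
        self.value = value

    def __mul__(self, other):
        # Out-of-place: builds a brand-new object to hold the product.
        return Big(self.value * other.value)

    def __imul__(self, other):
        # In-place: reuses the existing object.
        self.value *= other.value
        return self

    def assign(self, other):
        # Copy a value into already-allocated storage.
        self.value = other.value


la, ra, tmp = Big(6), Big(7), Big(0)

Big.allocations = 0
tmp.assign(la * ra)          # like `tmp = la * ra`: one temporary is created
out_of_place = Big.allocations

Big.allocations = 0
tmp.assign(la)               # like `tmp = la`
tmp *= ra                    # like `tmp *= ra`: no temporary
in_place = Big.allocations

print(out_of_place, in_place)
```

In this toy model the out-of-place form creates one temporary per element and the in-place form creates none, which over 100,000,000 elements would add up.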

The performance is better but still not great. I need to update doBigIntBinOpsv and the bool return methods. I also think I'm gonna go ahead and add the logical/arithmetic benchmark in this PR to get a better idea of the perf boost these changes provide. Converting to a draft until I wrap that up.

ronawho · 2023-02-23T18:58:07Z

I also see fairly poor performance with 16-node-cs-hdr, but I see much better performance when enabling parallel array deinit (chapel-lang/chapel#21670), which we saw benefit other bigint operations:

chapel 1.29:

>>> arkouda float64 stream
Average time = 0.0597 sec
Average rate = 599.50 GiB/sec

>>> arkouda bigint stream
Average bigint stream time = 14.5178 sec
Average bigint stream rate = 2.46 GiB/sec

chapel main w/ chapel-lang/chapel#21670 (parallel deinit):

>>> arkouda float64 stream
numLocales = 16, N = 1,600,000,000
Average time = 0.0589 sec
Average rate = 607.65 GiB/sec

>>> arkouda bigint stream
Average bigint stream time = 0.8836 sec
Average bigint stream rate = 40.47 GiB/sec

So ~20x improvement from parallel deinit. Still ~15x off from int/float stream, which is a little higher than I might expect, but not terrible.
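
The two ratios can be rechecked from the figures reported above. This is only arithmetic on the quoted numbers; by these timings the parallel-deinit gain works out closer to 16x than the rounded ~20x:

```python
# Figures copied from the two runs above.
bigint_time_129 = 14.5178      # sec, chapel 1.29
bigint_time_main = 0.8836      # sec, chapel main with parallel deinit
float_rate_main = 607.65       # GiB/sec, float64 stream
bigint_rate_main = 40.47       # GiB/sec, bigint stream

deinit_gain = bigint_time_129 / bigint_time_main
gap_to_float = float_rate_main / bigint_rate_main

print(f"parallel deinit gain: {deinit_gain:.1f}x")
print(f"remaining gap to float64: {gap_to_float:.1f}x")
```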

stress-tess · 2023-02-23T19:03:44Z

> So ~20x improvement from parallel deinit. Still ~15x off from int/float stream, which is a little higher than I might expect, but not terrible.

This is great news!!! So the bigint code will be much more performant in 1.30! Thanks for looking into this @ronawho!

ronawho · 2023-02-23T19:10:31Z

> So the bigint code will be much more performant in 1.30!

Yeah, I think we'll highly highly recommend 1.30 for anybody using bigints when it's released (given the performance improvements, bug fixes, and implementation cleanup)

stress-tess · 2023-02-23T21:28:39Z

This code should be ready to review. I will put up my most updated perf comparison either tonight or tomorrow but it's too pretty not to be outside rn

This PR (closes Bears-R-Us#2156 and closes Bears-R-Us#2165) adds a bigint stream and bigint bitwise binops benchmark

Ethan-DeBandi99

Nothing jumps out as an issue to me.

joshmarshall1

Looks good to me

ronawho · 2023-02-27T20:39:25Z

src/BinOp.chpl

@@ -1148,7 +1174,12 @@ module BinOp
            var divideBy = makeDistArray(la.size, bigint);
            divideBy = 1:bigint;


@pierce314159 we're seeing bigint_bitwise_binops.py time out during nightly testing for >>. I think the problem is probably here, with assigning all elements to a bigint 1 that lives on locale 0. I think you could get rid of the tmp array and just do:

forall t in tmp with (var dB = (1:bigint) << val, var local_max_size = max_size) {

Whoops, I meant this as a comment for the binopVS case. For the binopVV cases you should be able to do something like:

forall (t, ri) in zip(tmp, ra) with (var local_max_size = max_size) {
  var dB = 1:bigint;
  dB <<= ri;
  t /= dB;
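
The arithmetic behind that suggestion (a right shift done as division by a per-element power of two) can be sketched in Python, whose integers are arbitrary precision. `shift_right_by_division` is a hypothetical name, the loop is sequential rather than a parallel `forall`, and the check is limited to non-negative values so that rounding mode is not a factor:

```python
def shift_right_by_division(values, shifts):
    """Right-shift each value by dividing by 2**shift, one divisor per element."""
    result = []
    for t, ri in zip(values, shifts):
        dB = 1          # fresh local divisor, mirroring `var dB = 1:bigint`
        dB <<= ri       # dB is now 2**ri
        result.append(t // dB)
    return result

values = [2**100 + 12345, 2**70, 255]
shifts = [3, 65, 4]
shifted = shift_right_by_division(values, shifts)

# Dividing by 2**ri matches a plain right shift for non-negative inputs.
assert shifted == [v >> s for v, s in zip(values, shifts)]
```

Building the divisor inside the loop body is what removes the need for a separate `divideBy` array initialised from a single value.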

Opened #2178

Now that bigint uses the indexing code shared with the other types (implemented in Bears-R-Us#2081), a bug was uncovered in the bigint module where remote assignments fail; it will be fixed in 1.30. For 1.29, using task-private copies is a suitable workaround that avoids the remote assignments. This also improves bigint indexing performance, as discussed in Bears-R-Us#2157 (comment).

stress-tess requested review from ronawho, Ethan-DeBandi99, joshmarshall1 and jaketrookman February 17, 2023 23:14

stress-tess marked this pull request as draft February 21, 2023 00:42

stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 42adc22 to 976458d Compare February 23, 2023 21:02

stress-tess marked this pull request as ready for review February 23, 2023 21:28

stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 976458d to 7a46057 Compare February 24, 2023 16:35

Closes Bears-R-Us#2156: Bigint stream benchmark

b8f7f86

This PR (closes Bears-R-Us#2156 and closes Bears-R-Us#2165) adds a bigint stream and bigint bitwise binops benchmark

stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 7a46057 to b8f7f86 Compare February 24, 2023 18:38

restructure to do if has_max_bits inside forall

e792ec0

stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 55a8b7f to e792ec0 Compare February 24, 2023 20:08

Ethan-DeBandi99 approved these changes Feb 24, 2023

jaketrookman approved these changes Feb 24, 2023

joshmarshall1 approved these changes Feb 24, 2023

ronawho approved these changes Feb 25, 2023

Ethan-DeBandi99 added this pull request to the merge queue Feb 27, 2023

Merged via the queue into Bears-R-Us:master with commit ff291e3 Feb 27, 2023

stress-tess deleted the 2156_bigint_stream_benchmark branch February 27, 2023 14:59

ronawho reviewed Feb 27, 2023

bmcdonald3 mentioned this pull request Mar 10, 2023

Create task private copies of values for pdarray=value #2207

Closed

bmcdonald3 mentioned this pull request Mar 10, 2023

Closes #2207: Create task private copies of values for pdarray=value #2208

Merged
