Closes #2156: Bigint stream benchmark #2157

Merged

Conversation

stress-tess
Member

@stress-tess stress-tess commented Feb 17, 2023

This PR (closes #2156 and closes #2165) adds a bigint stream benchmark. The bigint stream does not scale well, and I couldn't find any obvious optimizations.

bigint_stream after this PR:

% ./benchmarks/run_benchmarks.py bigint_stream -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 22.1851 sec
Average bigint stream rate = 0.10 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_stream -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 20.6594 sec
Average bigint stream rate = 0.11 GiB/sec

bigint_stream before this PR:

% ./benchmarks/run_benchmarks.py bigint_stream -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 57.1895 sec
Average bigint stream rate = 0.04 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_stream -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint stream
numLocales = 1, N = 100,000,000
Average bigint stream time = 58.1247 sec
Average bigint stream rate = 0.04 GiB/sec

Compared to non-bigint stream:

% ./benchmarks/run_benchmarks.py stream
array size = 100,000,000
number of trials =  6
>>> arkouda float64 stream
numLocales = 1, N = 100,000,000
Average time = 0.2566 sec
Average rate = 8.71 GiB/sec
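
The reported rates look consistent with the stream kernel touching three arrays of 8-byte elements per trial. That is inferred from the numbers above rather than taken from the benchmark source, so treat the formula below as an assumption. A minimal sketch (`stream_rate_gib` is a hypothetical helper, not part of the benchmark scripts):

```python
GIB = 2**30  # bytes per GiB


def stream_rate_gib(n, seconds, arrays=3, bytes_per_elem=8):
    """Assumed rate formula: total bytes touched / elapsed time, in GiB/sec."""
    return n * arrays * bytes_per_elem / seconds / GIB


n = 100_000_000  # array size used in the runs above

# Times copied from the float64 stream run and the bigint stream run (after this PR).
float_rate = stream_rate_gib(n, 0.2566)
bigint_rate = stream_rate_gib(n, 22.1851)

print(f"float64 stream: {float_rate:.2f} GiB/sec")
print(f"bigint stream:  {bigint_rate:.2f} GiB/sec")
print(f"bigint is about {22.1851 / 0.2566:.0f}x slower on one locale")
```

Under that assumption the recomputed rates match the reported 8.71 and 0.10 GiB/sec.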

bigint bitwise binops after this PR:

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 9.9440 sec
Average bigint AND rate = 0.30 GiB/sec

Average bigint OR time = 14.2256 sec
Average bigint OR rate = 0.21 GiB/sec

Average bigint SHIFT time = 23.6632 sec
Average bigint SHIFT rate = 0.06 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 9.2267 sec
Average bigint AND rate = 0.32 GiB/sec

Average bigint OR time = 14.2185 sec
Average bigint OR rate = 0.21 GiB/sec

Average bigint SHIFT time = 22.9780 sec
Average bigint SHIFT rate = 0.06 GiB/sec

bigint bitwise binops before this PR:

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 21.7934 sec
Average bigint AND rate = 0.14 GiB/sec

Average bigint OR time = 26.4953 sec
Average bigint OR rate = 0.11 GiB/sec

Average bigint SHIFT time = 39.9405 sec
Average bigint SHIFT rate = 0.04 GiB/sec

% ./benchmarks/run_benchmarks.py bigint_bitwise_binops -t 1 --max-bits=64
array size = 100,000,000
number of trials =  1
>>> arkouda bigint bitwise binops
numLocales = 1, N = 100,000,000
Average bigint AND time = 22.2118 sec
Average bigint AND rate = 0.13 GiB/sec

Average bigint OR time = 25.9856 sec
Average bigint OR rate = 0.11 GiB/sec

Average bigint SHIFT time = 39.0686 sec
Average bigint SHIFT rate = 0.04 GiB/sec
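
To make the before/after comparison easier to read, the speedups can be computed directly from the average times reported above (default max bits, single trial each, so the figures are indicative only). The dictionary below simply copies those timings:

```python
# (before, after) average times in seconds, copied from the runs above.
timings = {
    "stream": (57.1895, 22.1851),
    "AND": (21.7934, 9.9440),
    "OR": (26.4953, 14.2256),
    "SHIFT": (39.9405, 23.6632),
}

# Speedup = time before this PR / time after this PR.
speedups = {op: before / after for op, (before, after) in timings.items()}

for op, factor in speedups.items():
    print(f"{op:>6}: {factor:.2f}x faster")
```

The improvements land between roughly 1.7x (SHIFT) and 2.6x (stream).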

Review thread on src/BinOp.chpl (outdated, resolved)
@stress-tess stress-tess marked this pull request as draft February 21, 2023 00:42
@stress-tess
Member Author

stress-tess commented Feb 21, 2023

I found some optimizations by refactoring the code to favor in-place ops. So:

// instead of
tmp = la * ra;
// do
tmp = la;
tmp *= ra;

The performance is better but still not great. I still need to update doBigIntBinOpsv and the bool return methods. I also think I'm gonna go ahead and add the logical/arithmetic benchmark in this PR to get a better idea of the perf boost these changes provide. Converting to a draft until I wrap that up.

@ronawho
Contributor

ronawho commented Feb 23, 2023

I also see fairly poor performance with 16-node-cs-hdr, but I see much better performance when enabling parallel array deinit (chapel-lang/chapel#21670), which we saw benefit other bigint operations:

chapel 1.29:

>>> arkouda float64 stream
Average time = 0.0597 sec
Average rate = 599.50 GiB/sec

>>> arkouda bigint stream
Average bigint stream time = 14.5178 sec
Average bigint stream rate = 2.46 GiB/sec

chapel main w/ chapel-lang/chapel#21670 (parallel deinit):

>>> arkouda float64 stream
numLocales = 16, N = 1,600,000,000
Average time = 0.0589 sec
Average rate = 607.65 GiB/sec

>>> arkouda bigint stream
Average bigint stream time = 0.8836 sec
Average bigint stream rate = 40.47 GiB/sec

So ~20x improvement from parallel deinit. Still ~15x off from int/float stream, which is a little higher than I might expect, but not terrible.
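
For reference, the ratios behind those estimates can be recomputed from the 16-locale numbers quoted above. This is just arithmetic on the reported values:

```python
# bigint stream time on chapel 1.29 vs. chapel main with parallel deinit (seconds).
before_sec, after_sec = 14.5178, 0.8836

# Stream rates with parallel deinit enabled (GiB/sec).
float_rate, bigint_rate = 607.65, 40.47

deinit_speedup = before_sec / after_sec
remaining_gap = float_rate / bigint_rate

print(f"parallel deinit speedup: {deinit_speedup:.1f}x")
print(f"gap to float64 stream:   {remaining_gap:.1f}x")
```

That works out to a bit over 16x from parallel deinit, which the comment rounds to ~20x, and about 15x still separating bigint from float64.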

@stress-tess
Member Author

So ~20x improvement from parallel deinit. Still ~15x off from int/float stream, which is a little higher than I might expect, but not terrible.

This is great news!!! So the bigint code will be much more performant in 1.30! Thanks for looking into this @ronawho!

@ronawho
Contributor

ronawho commented Feb 23, 2023

So the bigint code will be much more performant in 1.30!

Yeah, I think we'll highly highly recommend 1.30 for anybody using bigints when it's released (given the performance improvements, bug fixes, and implementation cleanup)

@stress-tess stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 42adc22 to 976458d Compare February 23, 2023 21:02
@stress-tess
Member Author

This code should be ready to review. I will put up my most updated perf comparison either tonight or tomorrow but it's too pretty not to be outside rn

@stress-tess stress-tess marked this pull request as ready for review February 23, 2023 21:28
@stress-tess stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 976458d to 7a46057 Compare February 24, 2023 16:35
This PR (closes Bears-R-Us#2156 and closes Bears-R-Us#2165) adds a bigint stream and bigint bitwise binops benchmark
@stress-tess stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 7a46057 to b8f7f86 Compare February 24, 2023 18:38
@stress-tess stress-tess force-pushed the 2156_bigint_stream_benchmark branch from 55a8b7f to e792ec0 Compare February 24, 2023 20:08
Contributor

@Ethan-DeBandi99 Ethan-DeBandi99 left a comment

Nothing jumps out as an issue to me.

Contributor

@joshmarshall1 joshmarshall1 left a comment

Looks good to me

@Ethan-DeBandi99 Ethan-DeBandi99 added this pull request to the merge queue Feb 27, 2023
Merged via the queue into Bears-R-Us:master with commit ff291e3 Feb 27, 2023
@stress-tess stress-tess deleted the 2156_bigint_stream_benchmark branch February 27, 2023 14:59
@@ -1148,7 +1174,12 @@ module BinOp
var divideBy = makeDistArray(la.size, bigint);
divideBy = 1:bigint;
Contributor

@pierce314159 we're seeing bigint_bitwise_binops.py time out during nightly testing for >>. I think the problem is probably here, with every element being assigned from a bigint 1 that lives on locale 0. I think you could get rid of the tmp array and just do:

forall t in tmp with (var dB = (1:bigint) << val, var local_max_size = max_size) {

Contributor

Whoops, I meant this as a comment for the binopVS case. For the binopVV cases you should be able to do something like:

forall (t, ri) in zip(tmp, ra) with (var local_max_size = max_size) {
  var dB = 1:bigint;
  dB <<= ri;
  t /= dB;
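
The idea in both suggestions is that a right shift by k is a division by 2^k, and that the divisor should be built locally instead of being read from a shared array. A rough Python sketch of that equivalence, with plain Python ints standing in for Chapel bigints (no locales or tasks are modeled, values are assumed non-negative, and both function names are hypothetical):

```python
def shift_right_vs(values, shift):
    """Vector-scalar case: build the divisor once and reuse it for every element."""
    divisor = 1 << shift  # plays the role of the task-private dB
    return [v // divisor for v in values]


def shift_right_vv(values, shifts):
    """Vector-vector case: each element gets its own divisor, built on the spot."""
    result = []
    for v, s in zip(values, shifts):
        divisor = 1 << s
        result.append(v // divisor)
    return result


values = [2**100 + 12345, 2**70, 7]
shifts = [64, 3, 1]

vs_result = shift_right_vs(values, 3)
vv_result = shift_right_vv(values, shifts)

# Dividing by a power of two agrees with the built-in shift for non-negative ints.
print(vs_result == [v >> 3 for v in values])
print(vv_result == [v >> s for v, s in zip(values, shifts)])
```

In the vector-scalar case the divisor is computed once per task rather than once per element, which is what the `with (var dB = ...)` clause achieves in the Chapel suggestion.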

Contributor

Opened #2178

bmcdonald3 added a commit to bmcdonald3/arkouda that referenced this pull request Mar 10, 2023
Now that bigint is using the shared code with the other types for
indexing implemented in Bears-R-Us#2081, there was a bug uncovered in the
bigint module failing with remote assignments that will be fixed
in 1.30. For 1.29, using task-private copies is a suitable
workaround to avoid the remote assignments.

Also, this will result in a performance improvement for bigint
indexing, as discussed in
Bears-R-Us#2157 (comment).
stress-tess pushed a commit that referenced this pull request Mar 10, 2023
Now that bigint is using the shared code with the other types for
indexing implemented in #2081, there was a bug uncovered in the
bigint module failing with remote assignments that will be fixed
in 1.30. For 1.29, using task-private copies is a suitable
workaround to avoid the remote assignments.

Also, this will result in a performance improvement for bigint
indexing, as discussed in
#2157 (comment).
Development

Successfully merging this pull request may close these issues.

bigint bitwise binops benchmark
bigint stream benchmark
5 participants