Closes #2156: Bigint stream benchmark #2157
Conversation
I found some optimizations by refactoring the code to favor in-place ops. So instead of

```chapel
tmp = la * ra;
```

do

```chapel
tmp = la;
tmp *= ra;
```

The performance is better but still not great. I need to update …
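To make the pattern concrete, here is a minimal standalone sketch of the same idea applied element-wise in a parallel loop; the array names and sizes are placeholders for illustration, not the actual Arkouda code:

```chapel
use BigInteger;

// Sketch only: `la`, `ra`, and `tmp` stand in for the benchmark's operand arrays.
var la, ra, tmp: [0..#1_000] bigint;

forall (t, l, r) in zip(tmp, la, ra) {
  // `t = l * r;` would materialize a separate temporary bigint for each product;
  // copying and then multiplying in place reuses `t` instead.
  t = l;
  t *= r;
}
```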
I also see fairly poor performance with 16-node-cs-hdr, but I see much better performance when enabling parallel array deinit (chapel-lang/chapel#21670), which we saw benefit other bigint operations:

chapel 1.29:
chapel main w/ chapel-lang/chapel#21670 (parallel deinit):

So ~20x improvement from parallel deinit. Still ~15x off from int/float stream, which is a little higher than I might expect, but not terrible.
This is great news!!! So the bigint code will be much more performant in 1.30! Thanks for looking into this @ronawho!

Yeah, I think we'll highly, highly recommend 1.30 for anybody using bigints when it's released (given the performance improvements, bug fixes, and implementation cleanup).
Force-pushed from 42adc22 to 976458d
This code should be ready to review. I will put up my most updated perf comparison either tonight or tomorrow but it's too pretty not to be outside rn
Force-pushed from 976458d to 7a46057
This PR (closes Bears-R-Us#2156 and closes Bears-R-Us#2165) adds bigint stream and bigint bitwise binops benchmarks.
Force-pushed from 7a46057 to b8f7f86
Force-pushed from 55a8b7f to e792ec0
Nothing jumps out as an issue to me.
Looks good to me
```diff
@@ -1148,7 +1174,12 @@ module BinOp
 var divideBy = makeDistArray(la.size, bigint);
 divideBy = 1:bigint;
```
@pierce314159 we're seeing `bigint_bitwise_binops.py` timeout during nightly testing for `>>`. I think the problem is probably here with assigning all elements to a bigint `1` that lives on locale 0. I think you could get rid of the tmp array and just do:

```chapel
forall t in tmp with (var dB = (1:bigint) << val, var local_max_size = max_size) {
```
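A possible completion of that suggestion is sketched below; the `t /= dB;` body mirrors the binopVV snippet later in the thread, and the surrounding declarations are placeholders, so treat this as an assumption rather than the exact Arkouda change:

```chapel
use BigInteger;

// Placeholder declarations standing in for the benchmark's state.
config const val = 3;                       // scalar shift amount
var tmp: [0..#1_000] bigint;                // assumed to already hold the left operand
const max_size = ((1:bigint) << 64) - 1;    // placeholder bit-width mask

// Each task gets its own divisor and mask, so no task repeatedly reads a bigint
// that lives on locale 0.
forall t in tmp with (var dB = (1:bigint) << val, var local_max_size = max_size) {
  t /= dB;  // `a >> val` implemented as division by 2**val
  // the real code presumably also applies `local_max_size` to bound the result's
  // bit width; that step is omitted from this sketch
}
```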
Whoops, I meant this as a comment for the binopVS case. For the binopVV cases you should be able to do something like:

```chapel
forall (t, ri) in zip(tmp, ra) with (var local_max_size = max_size) {
  var dB = 1:bigint;
  dB <<= ri;
  t /= dB;
}
```
Opened #2178
Now that bigint is using the shared indexing code with the other types, implemented in #2081, a bug was uncovered in the bigint module where it fails with remote assignments; that will be fixed in 1.30. For 1.29, using task-private copies is a suitable workaround to avoid the remote assignments. This will also result in a performance improvement for bigint indexing, as discussed in #2157 (comment).
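As a rough illustration of that workaround (the array name, size, and distribution setup below are invented for the example, not taken from the Arkouda source):

```chapel
use BlockDist, BigInteger;

config const n = 1_000_000;
const D = {0..#n} dmapped Block(boundingBox={0..#n});
var distArr: [D] bigint;

// The kind of pattern that can hit the 1.29 bug: every element is assigned from
// `one`, which lives on locale 0, so most assignments are remote bigint assignments.
var one = 1:bigint;
forall x in distArr do
  x = one;

// Workaround: a task-private copy keeps each assignment local to the task's locale.
forall x in distArr with (var localOne = 1:bigint) do
  x = localOne;
```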
This PR (closes #2156 and closes #2165) adds a bigint stream benchmark. The bigint stream does not scale well; I couldn't find any obvious optimizations.

bigint_stream after this PR:
bigint_stream before this PR:
Compared to non-bigint stream:
bigint bitwise binops after this PR:
bigint bitwise binops before this PR:
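For context on what the stream benchmark exercises, here is a standalone Chapel sketch of a bigint stream triad; the sizes, initial values, and setup are assumptions for illustration and are not taken from the Arkouda benchmark script:

```chapel
use BlockDist, BigInteger;

config const n = 10_000;
const D = {0..#n} dmapped Block(boundingBox={0..#n});
var A, B, C: [D] bigint;

// Initialize the operands in parallel so the values are written locally.
forall (b, c) in zip(B, C) {
  b = 2:bigint;
  c = 3:bigint;
}
const alpha = 5:bigint;

// Stream triad A = B + alpha * C, written with in-place bigint ops and a
// task-private copy of `alpha`, following the patterns discussed in this thread.
forall (a, b, c) in zip(A, B, C) with (var localAlpha = alpha) {
  a = c;
  a *= localAlpha;
  a += b;
}
```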