Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Parallelize array deinitialization #21670

Merged
merged 1 commit into from
Feb 27, 2023

Conversation

ronawho
Copy link
Contributor

@ronawho ronawho commented Feb 23, 2023

Historically, we have deintialized array elements serially. For many codes this isn't an issue because POD types don't need any element deinit, but for types with non-trivial deinit (typically things that deallocate) like arrays-of-strings or arrays-of-bigints this can cause non-trivial slowdowns.

This changes array deinit to be parallel using the same heuristics as parallel init. This ensures we don't create a lot of tasks for small arrays and that affinity will be similar between init and deinit.

This speeds up deinit for most arrays whose elements require deinit, but slows down array-of-arrays since the performance hit from contentions on reference counters exceeds the speedup for parallel deallocation. Future improvements to speed up reference counting and do bulk counting for AoA is captured in Cray/chapel-private#1378 and Cray/chapel-private#4362. And while it is a regression, it just makes AoA deinit as slow as init.

Performance for arrayInitDeinitPerf on chapcs:

config before after
AoA init 12.56s 12.54s
AoA deinit 2.86s 10.35s
AoR init 0.28s 0.28s
AoR deinit 0.99s 0.04s

For an arkouda bigint_conversion test (which most recently motivated this change) on 16-node-cs-hdr:

config before after
bigint_from_uint 2.19s 0.18s
bigint_to_uint 2.63s 0.27s

Resolves #15215

Historically, we have deintialized array elements serially. For many
codes this isn't an issue because POD types don't need any element
deinit, but for types with non-trivial deinit (typically things that
deallocate) like arrays-of-strings or arrays-of-bigints this can cause
non-trivial slowdowns.

This changes array deinit to be parallel using the same heuristics as
parallel init. This ensures we don't create a lot of tasks for small
arrays and that affinity will be similar between init and deinit.

This speeds up deinit for most arrays whose elements require deinit, but
slows down array-of-arrays since the performance hit from contentions on
reference counters exceeds the speedup for parallel deallocation. Future
improvements to speed up reference counting and do bulk counting for AoA
is captured in Cray/chapel-private 1378 and Cray/chapel-private 4362.
And while it is a regression, it just makes AoA deinit as slow as init.

Performance for arrayInitDeinitPerf on chapcs:

| config     | before | after  |
| ---------- | -----: | -----: |
| AoA   init | 12.56s | 12.54s |
| AoA deinit |  2.86s | 10.35s |
| AoR   init |  0.28s |  0.28s |
| AoR deinit |  0.99s |  0.04s |

For an arkouda bigint_conversion test (which most recently motivated
this change) on 16-node-cs-hdr:

| config           | before | after  |
| ---------------- | -----: | -----: |
| bigint_from_uint | 2.19s  | 0.18s  |
| bigint_to_uint   | 2.63s  | 0.27s  |

Resolves 15215

Signed-off-by: Elliot Ronaghan <ronawho@gmail.com>
@ronawho ronawho merged commit bcd8b26 into chapel-lang:main Feb 27, 2023
@ronawho ronawho deleted the parallel-array-deinit branch February 27, 2023 14:47
@bradcray
Copy link
Member

Nice to see this go in!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Should/can array element deinitialization be parallelized?
4 participants