Parallelize array deinitialization #21670
Merged
Historically, we have deinitialized array elements serially. For many codes this isn't an issue because POD types need no per-element deinit, but for types with non-trivial deinit (typically ones that deallocate), such as arrays of strings or arrays of bigints, serial deinit can cause significant slowdowns.
This changes array deinit to be parallel, using the same heuristics as parallel init. This avoids creating a lot of tasks for small arrays and keeps task affinity similar between init and deinit.
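To illustrate the shape of this approach (this is a hedged C++ sketch, not Chapel's actual runtime code; the threshold constant, function name, and use of raw `std::thread` are all assumptions for illustration), the key ideas are: skip deinit entirely for trivially-destructible element types, fall back to a serial loop below a size threshold so small arrays don't pay task-creation overhead, and otherwise split the element range into contiguous chunks per worker, mirroring how a chunked parallel init would partition the same array:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical cutoff: below this, serial deinit is assumed cheaper
// than spawning tasks. The real heuristic would match parallel init's.
constexpr std::size_t kParallelDeinitThreshold = 100000;

// Destroy elems[0..n) in place, in parallel when it is worthwhile.
template <typename T>
void parallelDeinit(T* elems, std::size_t n) {
  if constexpr (std::is_trivially_destructible_v<T>) {
    return;  // POD-like element types need no per-element deinit
  }
  std::size_t nWorkers = std::thread::hardware_concurrency();
  if (n < kParallelDeinitThreshold || nWorkers <= 1) {
    // Serial path: small arrays don't amortize task-creation cost.
    for (std::size_t i = 0; i < n; i++) elems[i].~T();
    return;
  }
  // Parallel path: one contiguous chunk per worker, so each worker
  // touches the same region it would have initialized.
  std::vector<std::thread> workers;
  std::size_t chunk = (n + nWorkers - 1) / nWorkers;
  for (std::size_t w = 0; w < nWorkers; w++) {
    std::size_t lo = w * chunk;
    std::size_t hi = std::min(n, lo + chunk);
    if (lo >= hi) break;
    workers.emplace_back([elems, lo, hi] {
      for (std::size_t i = lo; i < hi; i++) elems[i].~T();
    });
  }
  for (auto& t : workers) t.join();
}
```

Note that chunking by contiguous ranges (rather than striding) is what preserves init/deinit affinity: the worker that deinitializes a chunk is likely to run where that chunk's pages were first touched.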
This speeds up deinit for most arrays whose elements require deinit, but slows down arrays-of-arrays (AoA), since the cost of contention on reference counters exceeds the benefit of parallel deallocation. Future improvements to speed up reference counting and to do bulk counting for AoA are captured in Cray/chapel-private#1378 and Cray/chapel-private#4362. And while this is a regression, it only makes AoA deinit as slow as AoA init.
Performance for arrayInitDeinitPerf on chapcs:
For an arkouda bigint_conversion test (which most recently motivated this change) on 16-node-cs-hdr:
Resolves #15215