Parallelize array deinitialization #21670
Merged
Historically, we have deinitialized array elements serially. For many codes this isn't an issue because POD types need no per-element deinit, but for types with non-trivial deinit (typically ones that deallocate), such as arrays of strings or arrays of bigints, serial deinit can cause significant slowdowns.
This changes array deinit to be parallel, using the same heuristics as parallel init. This avoids creating a lot of tasks for small arrays and keeps task affinity similar between init and deinit.
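To illustrate the shape of this approach (this is a hedged C++ sketch, not Chapel's actual runtime code; the threshold constant, function name, and use of raw `std::thread` are all assumptions for illustration), the key ideas are: skip deinit entirely for trivially-destructible element types, fall back to a serial loop below a size threshold so small arrays don't pay task-creation overhead, and otherwise split the element range into contiguous chunks per worker, mirroring how a chunked parallel init would partition the same array:

```cpp
#include <algorithm>
#include <cstddef>
#include <thread>
#include <type_traits>
#include <vector>

// Hypothetical cutoff: below this, serial deinit is assumed cheaper
// than spawning tasks. The real heuristic would match parallel init's.
constexpr std::size_t kParallelDeinitThreshold = 100000;

// Destroy elems[0..n) in place, in parallel when it is worthwhile.
template <typename T>
void parallelDeinit(T* elems, std::size_t n) {
  if constexpr (std::is_trivially_destructible_v<T>) {
    return;  // POD-like element types need no per-element deinit
  }
  std::size_t nWorkers = std::thread::hardware_concurrency();
  if (n < kParallelDeinitThreshold || nWorkers <= 1) {
    // Serial path: small arrays don't amortize task-creation cost.
    for (std::size_t i = 0; i < n; i++) elems[i].~T();
    return;
  }
  // Parallel path: one contiguous chunk per worker, so each worker
  // touches the same region it would have initialized.
  std::vector<std::thread> workers;
  std::size_t chunk = (n + nWorkers - 1) / nWorkers;
  for (std::size_t w = 0; w < nWorkers; w++) {
    std::size_t lo = w * chunk;
    std::size_t hi = std::min(n, lo + chunk);
    if (lo >= hi) break;
    workers.emplace_back([elems, lo, hi] {
      for (std::size_t i = lo; i < hi; i++) elems[i].~T();
    });
  }
  for (auto& t : workers) t.join();
}
```

Note that chunking by contiguous ranges (rather than striding) is what preserves init/deinit affinity: the worker that deinitializes a chunk is likely to run where that chunk's pages were first touched.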
This speeds up deinit for most arrays whose elements require deinit, but slows down arrays-of-arrays (AoA), since the cost of contention on reference counters exceeds the benefit of parallel deallocation. Future improvements to speed up reference counting and to do bulk counting for AoA are captured in Cray/chapel-private#1378 and Cray/chapel-private#4362. And while this is a regression, it only makes AoA deinit as slow as AoA init.
Performance for arrayInitDeinitPerf on chapcs:
For an arkouda bigint_conversion test (which most recently motivated this change) on 16-node-cs-hdr:
Resolves #15215