Greatly improve performance of sorting dictionaries #5168

tgoyne · 2022-01-13T16:19:32Z

I noticed in #5163 that dictionary sorting is rather slow and took a stab at improving it.

Dictionary lookup by index is slow, so read all of the keys/values from the dictionary into a vector and sort that instead. Using a dictionary iterator to read the values replaces O(N log N) ClusterTree lookups with O(log N), so this is asymptotically faster. The benchmark is sped up from 3.77 seconds to 21.62 milliseconds, but more reasonably sized Dictionaries will see proportionally smaller benefits.

The downside of this is that the array of Mixed requires 24 bytes of scratch space per element in the dictionary. We already require 8 bytes per element to store the results, so this is just a constant factor increase rather than an asymptotic change in memory use and it's probably not a problem.
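For readers who want the shape of the approach described above, here is a minimal sketch. It is not the code from this PR: `ToyDictionary` is a hypothetical stand-in built on `std::map`, plain `int` values stand in for Mixed, and `sorted_positions_by_value` is an illustrative name. It only shows the general idea of copying every value out in one iterator pass and then sorting positions against that scratch copy, rather than looking each element up by index during the sort.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <numeric>
#include <string>
#include <vector>

// Hypothetical stand-in for a dictionary; this is NOT Realm's Dictionary class.
using ToyDictionary = std::map<std::string, int>;

// Returns the positions of the entries (in iteration order), ordered by
// ascending value. Assumption: values are plain ints rather than Mixed.
std::vector<std::size_t> sorted_positions_by_value(const ToyDictionary& dict)
{
    // A single sequential pass with an iterator copies every value into
    // scratch space, so the comparator never goes back to the dictionary.
    std::vector<int> scratch;
    scratch.reserve(dict.size());
    for (const auto& entry : dict) {
        scratch.push_back(entry.second);
    }

    // The result holds one position per element, initially 0, 1, 2, ...
    std::vector<std::size_t> positions(scratch.size());
    std::iota(positions.begin(), positions.end(), std::size_t{0});

    // Sort the positions by comparing the cached values only.
    std::stable_sort(positions.begin(), positions.end(),
                     [&scratch](std::size_t a, std::size_t b) {
                         return scratch[a] < scratch[b];
                     });
    return positions;
}
```

The scratch vector is the extra memory cost mentioned above: it lives only for the duration of the sort, alongside the vector of positions that is returned.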

bmunkholm · 2022-09-19T15:45:46Z

@tgoyne @jedelbo Any downsides to getting this reviewed and merged?

tgoyne · 2022-09-19T16:04:27Z

#5780 might make this unnecessary as it significantly changes the performance characteristics of dictionaries. If this is still a meaningful gain then there's probably no downside; the increased scratch memory usage really isn't significant.

tgoyne · 2022-09-19T21:15:31Z

Still seems worth doing:

next-major:
Req runs:    5  SortIntList (MemOnly, EncryptionOff):         min 179.36ms     max 182.32ms     median 180.32ms     avg 180.77ms     stddev   1.46ms
Req runs:    5  SortIntDictionary (MemOnly, EncryptionOff):   min 687.99ms     max 702.65ms     median 692.61ms     avg 693.63ms     stddev   5.79ms

tg/dict-sort:
Req runs:    5  SortIntList (MemOnly, EncryptionOff):         min 190.65ms     max 202.14ms     median 201.05ms     avg 197.78ms     stddev   5.46ms
Req runs:   10  SortIntDictionary (MemOnly, EncryptionOff):   min  46.27ms     max  48.36ms     median  47.08ms     avg  47.26ms     stddev    762us

next-major + tg/dict-sort:
Req runs:    5  SortIntList (MemOnly, EncryptionOff):         min 182.83ms     max 194.63ms     median 191.08ms     avg 189.38ms     stddev   4.59ms
Req runs:   14  SortIntDictionary (MemOnly, EncryptionOff):   min  33.83ms     max  37.43ms     median  34.86ms     avg  35.24ms     stddev   1.13ms

jedelbo

Great! Let's do it.

tgoyne self-assigned this Jan 13, 2022

cla-bot bot added the cla: yes label Jan 13, 2022

Base automatically changed from tg/sort to master January 14, 2022 16:49

tgoyne force-pushed the tg/dict-sort branch from 49b5f47 to b68dff4 Compare January 15, 2022 00:04

tgoyne force-pushed the tg/dict-sort branch from b68dff4 to 3f23468 Compare July 14, 2022 17:54

bmunkholm requested a review from jedelbo September 19, 2022 15:45

tgoyne force-pushed the tg/dict-sort branch from 3f23468 to df83018 Compare September 19, 2022 21:15

jedelbo approved these changes Sep 20, 2022

tgoyne force-pushed the tg/dict-sort branch from df83018 to 364bc5d Compare September 27, 2022 15:09

tgoyne merged commit f81b3ce into master Sep 27, 2022

tgoyne deleted the tg/dict-sort branch September 27, 2022 17:53

github-actions bot locked as resolved and limited conversation to collaborators Mar 21, 2024
