Mas d34 ms.i446 plusplus #448

Merged
martinsumner merged 11 commits into develop-3.4 from mas-d34-ms.i446-plusplus
Sep 5, 2024

martinsumner
Owner

Within leveled_sst, a previous attempt to avoid the inefficiency of using [relatively large list] ++ [relatively small list] led to lots of reversing and re-reversing of lists.

This refactoring does a better job of enforcing the efficient appending of lists (as measured by perf_SUITE), and is potentially simpler to follow.

Try to reduce the calls to ++, and ensure that, where possible, the shorter list is the one being copied.
Accumulate the list efficiently, without an additional copy to add back the accumulator at the end.
Code review to make sure we prepend to accumulators everywhere, to reduce the copying involved.
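
For context on why prepending matters: `++` copies its left-hand operand, so `Acc ++ [X]` copies the whole accumulator on every step, whereas `[X | Acc]` allocates a single cons cell. The following is a minimal sketch of the two patterns, not code from leveled_sst; the module name `acc_sketch` and both function names are hypothetical.

```erlang
-module(acc_sketch).
-export([double_all_append/1, double_all_prepend/1]).

%% Hypothetical example: appends each new element to the end of the
%% accumulator. Acc is the left operand of ++, so it is copied on every
%% iteration and the total work grows quadratically with the input length.
double_all_append(List) ->
    lists:foldl(fun(X, Acc) -> Acc ++ [X * 2] end, [], List).

%% Hypothetical example: prepends each new element to the accumulator,
%% then restores the original order with a single lists:reverse/1.
double_all_prepend(List) ->
    Reversed = lists:foldl(fun(X, Acc) -> [X * 2 | Acc] end, [], List),
    lists:reverse(Reversed).
```

Both functions return the same list; only the amount of copying differs.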

Attempt to further optimise in leveled_sst (where the most expensive ++ occurs). This optimises for the case when Acc is [], and enforces that a series of '++' starts from the right, prepending in turn. Some shell testing indicated that this was not necessarily the case (although this doesn't seem to be consistently reproducible).

```
6> element(1, timer:tc(fun() -> KL1 ++ KL2 ++ KL3 ++ KL4 end)).
28
7> element(1, timer:tc(fun() -> KL1 ++ KL2 ++ KL3 ++ KL4 end)).
174
8> element(1, timer:tc(fun() -> KL1 ++ KL2 ++ KL3 ++ KL4 end)).
96
9> element(1, timer:tc(fun() -> KL1 ++ KL2 ++ KL3 ++ KL4 end)).
106
10> element(1, timer:tc(fun() -> KL1 ++ KL2 ++ KL3 ++ KL4 end)).
112

17> element(1, timer:tc(fun() -> lists:foldr(fun(KL0, KLAcc) -> KL0 ++ KLAcc end, [], [KL1, KL2, KL3, KL4]) end)).
21
18> element(1, timer:tc(fun() -> lists:foldr(fun(KL0, KLAcc) -> KL0 ++ KLAcc end, [], [KL1, KL2, KL3, KL4]) end)).
17
19> element(1, timer:tc(fun() -> lists:foldr(fun(KL0, KLAcc) -> KL0 ++ KLAcc end, [], [KL1, KL2, KL3, KL4]) end)).
12
20> element(1, timer:tc(fun() -> lists:foldr(fun(KL0, KLAcc) -> KL0 ++ KLAcc end, [], [KL1, KL2, KL3, KL4]) end)).
11
```
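
The approach described above (start from the right, prepend each list in turn, and treat an empty accumulator as a special case) might be sketched as below. This is an illustration under assumptions rather than the code merged in this PR: the module `append_sketch` and the functions `prepend_batch/2` and `append_all/1` are hypothetical names.

```erlang
-module(append_sketch).
-export([append_all/1]).

%% Hypothetical helper: when the accumulator is empty there is nothing to
%% join, so the batch is returned as-is and ++ is never called.
prepend_batch(Batch, []) ->
    Batch;
%% Otherwise the batch is the left operand of ++, so only the batch is
%% copied and the accumulator is shared as the tail of the result.
prepend_batch(Batch, Acc) ->
    Batch ++ Acc.

%% Joins a list of lists, working from the rightmost list to the leftmost
%% so that each list is copied at most once.
append_all(Batches) ->
    lists:foldr(fun prepend_batch/2, [], Batches).
```

For example, `append_sketch:append_all([KL1, KL2, KL3, KL4])` would build the same list as `KL1 ++ KL2 ++ KL3 ++ KL4`.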

Running eprof indicates that '++' and lists:reverse have been reduced (however, their impact had previously been only 1-2%).
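
For readers unfamiliar with eprof, a profile of a single function can be collected roughly as follows. This is a generic sketch using the `eprof` module from the OTP `tools` application, not the harness used by perf_SUITE; the module name `eprof_sketch` and the workload being profiled are assumptions made for illustration.

```erlang
-module(eprof_sketch).
-export([profile_append/0]).

%% Hypothetical example: profiles the joining of four 1000-element lists
%% and prints a per-function table of call counts and time to stdout.
profile_append() ->
    Lists = [lists:seq(1, 1000) || _ <- lists:seq(1, 4)],
    %% Start the eprof server (requires the tools application).
    {ok, _Pid} = eprof:start(),
    %% Run the fun under call-time tracing; its return value is passed back.
    {ok, Joined} = eprof:profile(fun() -> lists:append(Lists) end),
    %% Print totals across all profiled processes.
    eprof:analyze(total),
    eprof:stop(),
    length(Joined).
```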
@martinsumner
Owner Author

Profile following changes (comparison to #445 (comment))

Profile load:

FUNCTION                                    CALLS        %      TIME  [uS / CALLS]
--------                                    -----  -------      ----  [----------]
dict:fold_seg/4                           2040000     0.45    162230  [      0.08]
leveled_sst:build_all_slots/7               54954     0.46    166511  [      3.03]
lists:umerge3_1/7                         4095163     0.48    173991  [      0.04]
leveled_codec:strip_to_indexdetails/1     3253654     0.48    174186  [      0.05]
lists:umerge3_2/7                         4058837     0.48    174968  [      0.04]
lists:rumerge3_1/6                        4129569     0.48    174995  [      0.04]
lists:rumerge3_2/7                        4089654     0.49    176760  [      0.04]
testutil:new_v1/2                          120000     0.51    183486  [      1.53]
erlang:setelement/3                       3210092     0.66    237682  [      0.07]
leveled_ebloom:add_hashlist/5             5071868     0.70    253723  [      0.05]
leveled_sst:update_buildtimings/2          213773     0.71    256419  [      1.20]
leveled_codec:to_lookup/1                 7531694     0.81    294948  [      0.04]
ets:match_object/2                            231     0.84    302548  [   1309.73]
prim_file:sync_nif/2                          218     1.00    361881  [   1660.00]
leveled_codec:count_tombs/2              10131238     1.11    401185  [      0.04]
gen:do_call/4                              365463     1.26    458313  [      1.25]
lists:split/3                            10484477     1.34    485634  [      0.05]
leveled_util:hash1/2                     12240000     1.39    504271  [      0.04]
lists:ukeymerge2_1/7                      4283889     1.51    548187  [      0.13]
leveled_ebloom:map_hashes/3               2684548     1.68    607763  [      0.23]
prim_file:pwrite_nif/3                     120000     1.74    629537  [      5.25]
leveled_sst:accumulate_positions/2        3279090     2.19    792239  [      0.24]
ets:insert/2                               240000     2.67    966294  [      4.03]
leveled_sst:deserialise_checkedblock/2     260834     2.72    984879  [      3.78]
leveled_sst:serialise_block/2              273810     2.73    989347  [      3.61]
leveled_sst:form_slot/7                  10250538     2.98   1080289  [      0.11]
leveled_sst:key_dominates/3              10204554     3.10   1124462  [      0.11]
erlang:term_to_binary/2                    240177     4.39   1591935  [      6.63]
zstd:quick_decompress/1                    234165     5.00   1810079  [      7.73]
erlang:term_to_binary/1                    633851     7.43   2693130  [      4.25]
erlang:binary_to_term/1                    381009     8.84   3203675  [      8.41]
zstd:quick_compress/2                      367141    18.23   6606272  [     17.99]
--------------------------------------  ---------  -------  --------  [----------]
Total:                                  174700285  100.00%  36231772  [      0.21]


Profile head:

FUNCTION                                    CALLS        %      TIME  [uS / CALLS]
--------                                    -----  -------      ----  [----------]
gen_server:handle_msg/6                   1206158     0.41    161313  [      0.13]
leveled_penciller:timed_sst_get/4          632797     0.41    163105  [      0.26]
erlang:put/2                              3038682     0.42    165220  [      0.05]
leveled_monitor:maybe_time/1              1838517     0.45    178242  [      0.10]
leveled_head:riak_metadata_to_binary/2     600000     0.47    184775  [      0.31]
leveled_sst:check_blocks/8                 600952     0.48    190715  [      0.32]
leveled_sst:extract_header/2               640764     0.48    191595  [      0.30]
leveled_pmanifest:key_lookup/3            2496641     0.50    198475  [      0.08]
erlang:term_to_binary/1                     20274     0.55    216482  [     10.68]
leveled_bookie:fetch_head/4                600000     0.55    216732  [      0.36]
erlang:monitor/2                          1845268     0.61    243768  [      0.13]
rand:seed_put/1                           3038639     0.63    248499  [      0.08]
os:timestamp/0                            3497072     0.63    249623  [      0.07]
array:get_1/3                             5614118     0.64    255710  [      0.05]
leveled_pmanifest:key_lookup_level/3      3704870     0.65    258224  [      0.07]
leveled_ebloom:match_hash/3               1916015     0.69    274942  [      0.14]
leveled_penciller:fetch/5                 2496641     0.71    280014  [      0.11]
zstd:quick_compress/2                       20267     0.71    283085  [     13.97]
leveled_sst:spawn_check_block/3            600853     0.73    287833  [      0.48]
dict:find_val/2                           7307553     0.75    295500  [      0.04]
gen_statem:loop_receive/3                  638980     0.80    315671  [      0.49]
prim_file:read_nif/2                        10687     0.82    323619  [     30.28]
gen_server:loop/7                         1206158     0.86    341319  [      0.28]
crypto:hash_nif/2                          600000     0.88    348074  [      0.58]
leveled_bookie:handle_call/3               600000     0.88    350109  [      0.58]
ets:lookup/2                               600000     0.89    351046  [      0.59]
erlang:demonitor/2                        1845282     0.89    354237  [      0.19]
erlang:spawn_link/3                        600853     1.00    395808  [      0.66]
leveled_tree:search/3                     1929145     1.22    483488  [      0.25]
leveled_tree:iterator_from/3              7262521     1.22    483734  [      0.07]
leveled_sst:fetch/12                       632504     1.23    486279  [      0.77]
rand:uniform/1                            3038603     1.24    493348  [      0.16]
gen:reply/2                               1844840     1.33    528745  [      0.29]
leveled_tree:lookup_best/2               10130842     2.08    825669  [      0.08]
gen:do_call/4                             1845268     2.56   1013346  [      0.55]
prim_file:pread_nif/3                      601121     3.61   1432317  [      2.38]
leveled_sst:find_posint/4                80598090     7.67   3042136  [      0.04]
zstd:quick_decompress/1                    620052    18.09   7172742  [     11.57]
erlang:binary_to_term/1                    621896    18.49   7330591  [     11.79]
--------------------------------------  ---------  -------  --------  [----------]
Total:                                  293410693  100.00%  39651813  [      0.14]


Profile get:

FUNCTION                          CALLS        %      TIME  [uS / CALLS]
--------                          -----  -------      ----  [----------]
leveled_cdb:calc_crc/2           300000     0.43    165752  [      0.55]
prim_file:position_1/3          1220893     0.43    166462  [      0.14]
os:timestamp/0                  2437017     0.43    167392  [      0.07]
crypto:hash_nif/2                300000     0.44    170998  [      0.57]
ets:lookup/2                     300000     0.44    171984  [      0.57]
erlang:spawn_link/3              300438     0.46    178853  [      0.60]
leveled_tree:search/3            964947     0.62    242506  [      0.25]
leveled_sst:fetch/12             316430     0.63    245374  [      0.78]
leveled_tree:iterator_from/3    3620205     0.64    247537  [      0.07]
prim_file:read/2                1566530     0.64    248731  [      0.16]
rand:uniform/1                  1516454     0.67    258616  [      0.17]
gen_server:loop/7                900048     0.68    262185  [      0.29]
gen_statem:loop_receive/3        616461     0.73    282520  [      0.46]
erlang:demonitor/2              1516431     0.75    289253  [      0.19]
gen:reply/2                     1516430     1.03    400709  [      0.26]
leveled_tree:lookup_best/2      5050187     1.15    446052  [      0.09]
prim_file:pread_nif/3            300438     1.88    728757  [      2.43]
gen:do_call/4                   1516431     2.16    837834  [      0.55]
prim_file:seek_nif/3            1220893     2.42    940353  [      0.77]
leveled_util:hash1/2           30599967     2.93   1137679  [      0.04]
leveled_sst:find_posint/4      40313509     3.93   1524284  [      0.04]
erlang:binary_to_term/1          646075     9.22   3579378  [      5.54]
zstd:quick_decompress/1          600438    13.10   5083753  [      8.47]
prim_file:read_nif/2            1566530    31.31  12150544  [      7.76]
----------------------------  ---------  -------  --------  [----------]
Total:                        218927518  100.00%  38813154  [      0.18]


Profile query:

FUNCTION                                          CALLS        %     TIME  [uS / CALLS]
--------                                          -----  -------     ----  [----------]
leveled_penciller:'-find_nextkeys/6-fun-1-'/1   4668566     1.64   163250  [      0.03]
perf_SUITE:'-random_queries/6-fun-0-'/3         4800134     1.72   171177  [      0.04]
leveled_sst:'-in_range/3-fun-1-'/2              2036140     1.80   179093  [      0.09]
leveled_codec:'-accumulate_index/2-fun-1-'/4    4800134     1.80   179405  [      0.04]
lists:takewhile_1/2                             1920108     2.13   212520  [      0.11]
gen:do_call/4                                    340784     2.23   221778  [      0.65]
prim_file:pread_nif/3                             73654     2.87   285591  [      3.88]
lists:last/2                                    9200943     3.44   342238  [      0.04]
leveled_codec:endkey_passed/2                   2805993     3.60   358309  [      0.13]
leveled_codec:maybe_accumulate/5                4970397     3.79   377324  [      0.08]
maps:update_with/3                              4838829     4.02   400541  [      0.08]
zstd:quick_decompress/1                          253698     6.35   632766  [      2.49]
leveled_penciller:find_nextkeys/6              10149760     6.70   667514  [      0.07]
erlang:binary_to_term/1                          253698    26.48  2638521  [     10.40]
---------------------------------------------  --------  -------  -------  [----------]
Total:                                         88593906  100.00%  9963076  [      0.11]


Profile mini_query:

FUNCTION                                               CALLS        %      TIME  [uS / CALLS]
--------                                               -----  -------      ----  [----------]
leveled_tree:iterator_from/3                         1744551     1.03    165699  [      0.09]
leveled_penciller:find_nextkeys/6                    1967985     1.06    170508  [      0.09]
leveled_tree:'-idxtlookup_range_start/4-fun-0-'/2    2475924     1.42    227331  [      0.09]
leveled_sst:'-in_range/3-fun-0-'/2                   2960742     1.48    237003  [      0.08]
erlang:demonitor/2                                    977405     1.53    244777  [      0.25]
lists:splitwith_1/3                                  3365784     1.92    308007  [      0.09]
lists:takewhile_1/2                                  2961153     2.07    331207  [      0.11]
prim_file:pread_nif/3                                 189506     2.15    345063  [      1.82]
gen:reply/2                                           877672     2.39    383895  [      0.44]
lists:dropwhile_1/2                                  5398647     2.46    394203  [      0.07]
leveled_sst:'-in_range/3-fun-1-'/2                   5207524     2.90    464509  [      0.09]
gen:do_call/4                                         877673     3.44    551911  [      0.63]
leveled_codec:endkey_passed/2                        5059009     3.93    630759  [      0.12]
lists:last/2                                        16925531     3.98    638470  [      0.04]
zstd:quick_decompress/1                               422920     6.77   1085777  [      2.57]
erlang:binary_to_term/1                               422920    26.68   4277948  [     10.12]
-------------------------------------------------  ---------  -------  --------  [----------]
Total:                                             125970149  100.00%  16034005  [      0.13]


Profile regex_query:

FUNCTION                                           CALLS        %      TIME  [uS / CALLS]
--------                                           -----  -------      ----  [----------]
prim_file:pread_nif/3                              11259     0.37    160753  [     14.28]
leveled_sst:deserialise_checkedblock/2            206548     0.56    239585  [      1.16]
leveled_penciller:'-find_nextkeys/6-fun-1-'/1   10020889     0.91    390655  [      0.04]
leveled_codec:'-accumulate_index/2-fun-2-'/6    10343771     1.86    800282  [      0.08]
leveled_codec:maybe_accumulate/5                10667414     1.89    813243  [      0.08]
maps:update_with/3                              10344532     2.24    960453  [      0.09]
zstd:quick_decompress/1                           206548     4.59   1970956  [      9.54]
erlang:binary_to_term/1                           206548     8.22   3527347  [     17.08]
leveled_penciller:find_nextkeys/6               44846310     9.93   4263838  [      0.10]
re:run/2                                        10343771    67.62  29027710  [      2.81]
---------------------------------------------  ---------  -------  --------  [----------]
Total:                                         103928132  100.00%  42930560  [      0.41]


Profile full:

FUNCTION                                             CALLS        %      TIME  [uS / CALLS]
--------                                             -----  -------      ----  [----------]
leveled_penciller:keyfolder/6                       832540     0.23    193209  [      0.23]
erlang:crc32/1                                     1248720     0.34    286944  [      0.23]
prim_file:pread_nif/3                                52180     0.45    380554  [      7.29]
gen:do_call/4                                        52961     0.61    517360  [      9.77]
perf_SUITE:'-counter/2-fun-0-'/4                  26640000     1.11    942645  [      0.04]
leveled_head:build_head/2                         26640000     1.13    957690  [      0.04]
leveled_codec:striphead_to_v1details/1            26640000     1.15    976709  [      0.04]
leveled_codec:from_ledgerkey/1                    26640000     1.22   1035438  [      0.04]
leveled_penciller:'-find_nextkeys/6-fun-1-'/1     25807500     1.22   1038884  [      0.04]
leveled_head:get_size/2                           26640000     1.48   1258681  [      0.05]
leveled_codec:to_ledgerkey/3                      26640040     1.53   1299416  [      0.05]
leveled_codec:maybe_accumulate/5                  27472520     2.72   2305554  [      0.08]
maps:update_with/3                                26640020     3.12   2644553  [      0.10]
leveled_runner:'-accumulate_objects/4-fun-0-'/7   26640000     5.55   4709991  [      0.18]
leveled_sst:deserialise_checkedblock/2             1040500     5.73   4859224  [      4.67]
leveled_head:riak_metadata_to_binary/2            26640000     6.69   5672519  [      0.21]
leveled_codec:return_proxy/4                      26640000     6.91   5863320  [      0.22]
erlang:term_to_binary/1                           26640000    12.73  10797138  [      0.41]
erlang:binary_to_term/1                            1040500    13.08  11095902  [     10.66]
zstd:quick_decompress/1                            1040500    14.98  12706854  [     12.21]
leveled_penciller:find_nextkeys/6                148457420    16.01  13583667  [      0.09]
-----------------------------------------------  ---------  -------  --------  [----------]
Total:                                           518222998  100.00%  84831315  [      0.16]


Profile guess:

FUNCTION                                  CALLS        %      TIME  [uS / CALLS]
--------                                  -----  -------      ----  [----------]
leveled_sst:'-read_slots/5-fun-0-'/8    1041100     1.74    212499  [      0.20]
lists:map_1/2                           2088700     1.87    228696  [      0.11]
gen:do_call/4                            264801     2.21    270887  [      1.02]
prim_file:pread_nif/3                    130146     2.44    298983  [      2.30]
zstd:quick_decompress/1                  130146    12.21   1494909  [     11.49]
erlang:binary_to_term/1                  130146    12.56   1536960  [     11.81]
leveled_sst:find_posmlt/6             134200900    43.23   5291718  [      0.04]
------------------------------------  ---------  -------  --------  [----------]
Total:                                176600043  100.00%  12240574  [      0.07]


Profile estimate:

FUNCTION                               CALLS        %      TIME  [uS / CALLS]
--------                               -----  -------      ----  [----------]
leveled_penciller:find_nextkeys/6    1535589     1.51    187090  [      0.12]
gen:do_call/4                         132401     1.57    193970  [      1.47]
prim_file:pread_nif/3                 260808     4.62    571863  [      2.19]
leveled_sst:find_posmlt/6           67100450    21.85   2705928  [      0.04]
zstd:quick_decompress/1               260808    24.24   3002052  [     11.51]
erlang:binary_to_term/1               260808    24.62   3048774  [     11.69]
---------------------------------  ---------  -------  --------  [----------]
Total:                             101501275  100.00%  12383396  [      0.12]


Profile update:

FUNCTION                                    CALLS        %      TIME  [uS / CALLS]
--------                                    -----  -------      ----  [----------]
lists:keyfind/3                           3393307     0.82    160781  [      0.05]
lists:split/3                             4047833     0.95    187372  [      0.05]
lists:ukeymerge2_1/7                      1906662     0.98    193297  [      0.10]
io_lib_format:collect_cseq/2               720205     1.06    208367  [      0.29]
io_lib:get_option/3                       2880792     1.15    226552  [      0.08]
leveled_sst:find_posint/4                 5918930     1.17    230965  [      0.04]
leveled_ebloom:map_hashes/3                828060     1.33    262694  [      0.32]
leveled_sst:accumulate_positions/2        1042950     1.34    263056  [      0.25]
lists:do_flatten/2                        4271930     1.35    265726  [      0.06]
gen:do_call/4                              348765     1.36    268672  [      0.77]
leveled_sst:serialise_block/2              103034     1.57    309691  [      3.01]
prim_file:pwrite_nif/3                      60000     1.59    314064  [      5.23]
leveled_sst:deserialise_checkedblock/2     144998     1.60    314894  [      2.17]
leveled_sst:key_dominates/3               4087054     2.02    397962  [      0.10]
ets:insert/2                               120000     2.03    400103  [      3.33]
leveled_sst:form_slot/7                   4104281     2.23    438959  [      0.11]
erlang:term_to_binary/2                    120545     3.37    664805  [      5.51]
zstd:quick_decompress/1                    130807     5.05    994054  [      7.60]
erlang:term_to_binary/1                    283047     5.21   1026705  [      3.63]
erlang:binary_to_term/1                    301064     9.06   1784587  [      5.93]
zstd:quick_compress/2                      152745    11.84   2331531  [     15.26]
--------------------------------------  ---------  -------  --------  [----------]
Total:                                  134328130  100.00%  19700075  [      0.15]

No difference in unit test with/without inline compilation, so this has been removed
martinsumner and others added 7 commits September 4, 2024 17:38
These functions had previously used inline compilation - but this didn't appear to improve performance

Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
Also fix code coverage issues
Co-authored-by: Thomas Arts <thomas.arts@quviq.com>
@martinsumner martinsumner merged commit 5db277b into develop-3.4 Sep 5, 2024
2 checks passed
@martinsumner martinsumner deleted the mas-d34-ms.i446-plusplus branch September 5, 2024 14:08