Optimize Baseline interpreter by using padded code #315

chfast · 2021-05-04T13:40:46Z

During code analysis the code is copied and padded with 33 zero bytes. This guarantees that push data is always available in the code buffer and the code ends with STOP. This allows optimizing the interpreter loop and PUSH instructions.
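A minimal sketch of the padding step, not the actual evmone code. The names `pad_code`, `padding_size` and `OP_STOP` are made up for illustration. The 33 bytes are read here as 32 bytes of data for a PUSH32 sitting at the very last code byte, plus one more zero byte that decodes as STOP.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical names, chosen for illustration only.
constexpr uint8_t OP_STOP = 0x00;
constexpr size_t padding_size = 33;  // 32 bytes of PUSH32 data + 1 terminating STOP

// Returns a copy of `code` followed by `padding_size` zero bytes.
std::unique_ptr<uint8_t[]> pad_code(const uint8_t* code, size_t code_size)
{
    assert(code != nullptr || code_size == 0);
    // make_unique value-initializes the array, so the padding is already zero.
    auto padded = std::make_unique<uint8_t[]>(code_size + padding_size);
    if (code_size != 0)
        std::memcpy(padded.get(), code, code_size);
    return padded;
}
```

With this layout, even a truncated PUSH32 as the last original byte finds its 32 data bytes inside the buffer, and the byte after them is still a STOP.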

Performance gains are up to ~5%.

This is also needed to enable other optimizations, such as implementing the interpreter with "computed goto" or "tail calls".

codecov · 2021-05-31T21:19:28Z

Codecov Report

Merging #315 (5bf1e76) into master (3a2dbeb) will increase coverage by 0.00%.
The diff coverage is 100.00%.

❗ Current head 5bf1e76 differs from pull request most recent head 6538e6a. Consider uploading reports for the commit 6538e6a to get more accurate results

@@           Coverage Diff           @@
##           master     #315   +/-   ##
=======================================
  Coverage   99.78%   99.78%           
=======================================
  Files          29       29           
  Lines        4108     4112    +4     
=======================================
+ Hits         4099     4103    +4     
  Misses          9        9

| Flag | Coverage Δ | |
|---|---|---|
| consensus | `91.08% <93.02%> (-0.12%)` | ⬇️ |
| unittests | `99.78% <100.00%> (+<0.01%)` | ⬆️ |

| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/evmone/baseline.cpp | `99.82% <100.00%> (+<0.01%)` | ⬆️ |

chfast · 2021-06-01T08:03:37Z

Haswell 4.4 GHz, clang-12

Comparing o/b-master to o/b-padded
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
baseline/analyse/main/blake2b_huff_mean                               +0.0094         +0.0094             5             5             5             5
baseline/execute/main/blake2b_huff/empty_mean                         +0.0067         +0.0067            21            21            21            21
baseline/execute/main/blake2b_huff/2805nulls_mean                     -0.0306         -0.0306           342           332           342           332
baseline/execute/main/blake2b_huff/5610nulls_mean                     -0.0339         -0.0339           664           642           664           642
baseline/execute/main/blake2b_huff/8415nulls_mean                     -0.0301         -0.0302           969           940           969           940
baseline/execute/main/blake2b_huff/65536nulls_mean                    -0.0342         -0.0342          7501          7245          7501          7245
baseline/analyse/main/blake2b_shifts_mean                             +0.0557         +0.0557             3             3             3             3
baseline/execute/main/blake2b_shifts/2805nulls_mean                   -0.0475         -0.0475          3504          3337          3504          3337
baseline/execute/main/blake2b_shifts/5610nulls_mean                   -0.0473         -0.0473          7181          6841          7181          6841
baseline/execute/main/blake2b_shifts/8415nulls_mean                   -0.0521         -0.0522         10804         10241         10804         10241
baseline/execute/main/blake2b_shifts/65536nulls_mean                  -0.0534         -0.0533         86116         81517         86110         81516
baseline/analyse/main/sha1_divs_mean                                  +0.1272         +0.1272             0             1             0             1
baseline/execute/main/sha1_divs/empty_mean                            -0.0270         -0.0270            57            55            57            55
baseline/execute/main/sha1_divs/1351_mean                             -0.0322         -0.0322          1169          1131          1169          1131
baseline/execute/main/sha1_divs/2737_mean                             -0.0274         -0.0274          2272          2210          2272          2210
baseline/execute/main/sha1_divs/5311_mean                             -0.0274         -0.0274          4436          4314          4436          4314
baseline/execute/main/sha1_divs/65536_mean                            -0.0324         -0.0324         54213         52456         54214         52456
baseline/analyse/main/sha1_shifts_mean                                +0.1276         +0.1276             0             1             0             1
baseline/execute/main/sha1_shifts/empty_mean                          -0.0464         -0.0464            35            34            35            34
baseline/execute/main/sha1_shifts/1351_mean                           -0.0423         -0.0423           729           698           729           698
baseline/execute/main/sha1_shifts/2737_mean                           -0.0463         -0.0463          1425          1359          1425          1359
baseline/execute/main/sha1_shifts/5311_mean                           -0.0429         -0.0429          2778          2659          2778          2659
baseline/execute/main/sha1_shifts/65536_mean                          -0.0432         -0.0432         33867         32403         33867         32403
baseline/analyse/main/weierstrudel_mean                               +0.0547         +0.0547             7             7             7             7
baseline/execute/main/weierstrudel/0_mean                             -0.0259         -0.0259           184           179           184           179
baseline/execute/main/weierstrudel/1_mean                             -0.0490         -0.0490           411           391           411           391
baseline/execute/main/weierstrudel/3_mean                             -0.0525         -0.0525           641           607           641           607
baseline/execute/main/weierstrudel/9_mean                             -0.0589         -0.0589          1324          1246          1324          1246
baseline/execute/main/weierstrudel/14_mean                            -0.0598         -0.0598          1898          1785          1898          1785
baseline/analyse/micro/beginsub_push1s_0xffff_mean                    +0.0612         +0.0612            57            60            57            60
baseline/execute/micro/beginsub_push1s_0xffff_mean                    +0.0660         +0.0660            57            61            57            61
baseline/analyse/micro/beginsubs_0xffff_mean                          +0.1499         +0.1499            23            26            23            26
baseline/execute/micro/beginsubs_0xffff_mean                          +0.1569         +0.1570            23            27            23            27
baseline/analyse/micro/jumpdests_0xffff_mean                          +0.0934         +0.0934            79            87            79            87
baseline/execute/micro/jumpdests_0xffff_mean                          -0.0085         -0.0085           197           196           197           196
baseline/analyse/micro/loop_with_many_jumpdests_mean                  +0.0916         +0.0916            30            32            30            32
baseline/execute/micro/loop_with_many_jumpdests_mean                  -0.0336         -0.0336         13879         13413         13879         13413
baseline/analyse/micro/push1s_0xffff_mean                             +0.0561         +0.0561            60            63            60            63
baseline/execute/micro/push1s_0xffff_mean                             +0.0617         +0.0617            61            64            61            64
baseline/analyse/micro/push32s_0xffff_mean                            +0.8915         +0.8915             4             7             4             7
baseline/execute/micro/push32s_0xffff_mean                            +0.8083         +0.8083             5             8             5             8
baseline/analyse/micro/zeros_0xffff_mean                              +0.1475         +0.1475            23            26            23            26
baseline/execute/micro/zeros_0xffff_mean                              +0.1590         +0.1590            23            27            23            27

chfast · 2021-06-01T08:11:23Z

AMD Zen3, GCC-9

Comparing o/b-master to o/b-padded
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
baseline/execute/main/blake2b_huff/empty_mean                         +0.0166         +0.0165            21            22            21            22
baseline/execute/main/blake2b_huff/2805nulls_mean                     -0.0229         -0.0229           364           356           364           356
baseline/execute/main/blake2b_huff/5610nulls_mean                     +0.0003         +0.0003           702           702           701           702
baseline/execute/main/blake2b_huff/8415nulls_mean                     +0.0165         +0.0165          1017          1034          1017          1033
baseline/execute/main/blake2b_huff/65536nulls_mean                    -0.0313         -0.0313          7964          7714          7962          7713
baseline/analyse/main/blake2b_shifts_mean                             +0.0203         +0.0204             3             3             3             3
baseline/execute/main/blake2b_shifts/2805nulls_mean                   -0.0331         -0.0331          3464          3350          3463          3349
baseline/execute/main/blake2b_shifts/5610nulls_mean                   -0.0173         -0.0173          6988          6867          6986          6865
baseline/execute/main/blake2b_shifts/8415nulls_mean                   +0.0079         +0.0080         10440         10522         10436         10520
baseline/execute/main/blake2b_shifts/65536nulls_mean                  +0.0137         +0.0136         81730         82851         81702         82814
baseline/analyse/main/sha1_divs_mean                                  +0.1506         +0.1506             0             0             0             0
baseline/execute/main/sha1_divs/empty_mean                            +0.0048         +0.0048            62            62            62            62
baseline/execute/main/sha1_divs/1351_mean                             +0.0503         +0.0503          1225          1287          1225          1287
baseline/execute/main/sha1_divs/2737_mean                             -0.0202         -0.0202          2528          2477          2527          2476
baseline/execute/main/sha1_divs/5311_mean                             +0.0218         +0.0218          4763          4867          4762          4865
baseline/execute/main/sha1_divs/65536_mean                            +0.0189         +0.0189         57662         58750         57648         58736
baseline/analyse/main/sha1_shifts_mean                                +0.0789         +0.0789             0             0             0             0
baseline/execute/main/sha1_shifts/empty_mean                          -0.0820         -0.0820            42            39            42            39
baseline/execute/main/sha1_shifts/1351_mean                           -0.0306         -0.0306           816           791           816           791
baseline/execute/main/sha1_shifts/2737_mean                           -0.0421         -0.0422          1636          1567          1636          1567
baseline/execute/main/sha1_shifts/5311_mean                           -0.0703         -0.0703          3263          3033          3262          3033
baseline/execute/main/sha1_shifts/65536_mean                          -0.0854         -0.0854         39815         36414         39805         36406
baseline/analyse/main/weierstrudel_mean                               +0.0393         +0.0393             6             6             6             6
baseline/execute/main/weierstrudel/0_mean                             +0.0061         +0.0061           183           184           183           184
baseline/execute/main/weierstrudel/1_mean                             -0.0151         -0.0150           384           378           383           378
baseline/execute/main/weierstrudel/3_mean                             -0.0270         -0.0270           599           582           599           582
baseline/execute/main/weierstrudel/9_mean                             -0.0310         -0.0310          1265          1226          1265          1225
baseline/execute/main/weierstrudel/14_mean                            +0.0151         +0.0150          1740          1766          1739          1766
baseline/analyse/micro/beginsub_push1s_0xffff_mean                    +0.0484         +0.0483            47            49            47            49
baseline/execute/micro/beginsub_push1s_0xffff_mean                    +0.0317         +0.0317            47            49            47            49
baseline/analyse/micro/beginsubs_0xffff_mean                          -0.1700         -0.1700            29            24            29            24
baseline/execute/micro/beginsubs_0xffff_mean                          -0.1863         -0.1864            29            23            29            23
baseline/analyse/micro/jumpdests_0xffff_mean                          +0.1298         +0.1298            52            58            52            58
baseline/execute/micro/jumpdests_0xffff_mean                          +0.3098         +0.3098           136           178           136           178
baseline/analyse/micro/loop_with_many_jumpdests_mean                  +0.1358         +0.1358            20            22            20            22
baseline/execute/micro/loop_with_many_jumpdests_mean                  -0.0678         -0.0679         15723         14656         15720         14653
baseline/analyse/micro/push1s_0xffff_mean                             +0.0289         +0.0289            50            51            50            51
baseline/execute/micro/push1s_0xffff_mean                             +0.0382         +0.0382            50            52            50            52
baseline/analyse/micro/push32s_0xffff_mean                            +0.5021         +0.5021             3             5             3             5
baseline/execute/micro/push32s_0xffff_mean                            +0.5170         +0.5169             3             5             3             5
baseline/analyse/micro/zeros_0xffff_mean                              -0.1850         -0.1851            29            23            29            23
baseline/execute/micro/zeros_0xffff_mean                              -0.1842         -0.1842            29            24            29            24

chfast · 2021-06-01T08:21:20Z

AMD Zen3, clang-12

Comparing o/b-master to o/b-padded
Benchmark                                                                Time             CPU      Time Old      Time New       CPU Old       CPU New
-----------------------------------------------------------------------------------------------------------------------------------------------------
baseline/execute/main/blake2b_huff/empty_mean                         -0.0370         -0.0370            19            19            19            19
baseline/execute/main/blake2b_huff/2805nulls_mean                     -0.0683         -0.0683           302           281           302           281
baseline/execute/main/blake2b_huff/5610nulls_mean                     -0.0627         -0.0627           589           552           589           552
baseline/execute/main/blake2b_huff/8415nulls_mean                     -0.0531         -0.0531           849           804           849           804
baseline/execute/main/blake2b_huff/65536nulls_mean                    -0.0865         -0.0866          6762          6176          6760          6175
baseline/analyse/main/blake2b_shifts_mean                             +0.1486         +0.1485             3             3             3             3
baseline/execute/main/blake2b_shifts/2805nulls_mean                   -0.0525         -0.0525          3040          2880          3039          2879
baseline/execute/main/blake2b_shifts/5610nulls_mean                   -0.0184         -0.0184          5966          5856          5964          5854
baseline/execute/main/blake2b_shifts/8415nulls_mean                   -0.0462         -0.0462          9185          8761          9182          8758
baseline/execute/main/blake2b_shifts/65536nulls_mean                  -0.0633         -0.0633         71635         67103         71606         67076
baseline/analyse/main/sha1_divs_mean                                  +0.0992         +0.0993             0             0             0             0
baseline/execute/main/sha1_divs/empty_mean                            -0.0591         -0.0591            52            49            52            49
baseline/execute/main/sha1_divs/1351_mean                             -0.0138         -0.0138          1053          1038          1052          1038
baseline/execute/main/sha1_divs/2737_mean                             -0.0530         -0.0530          2053          1945          2053          1944
baseline/execute/main/sha1_divs/5311_mean                             -0.0611         -0.0611          4058          3810          4057          3809
baseline/execute/main/sha1_divs/65536_mean                            -0.0602         -0.0601         49490         46512         49475         46500
baseline/analyse/main/sha1_shifts_mean                                +0.1668         +0.1668             0             0             0             0
baseline/execute/main/sha1_shifts/empty_mean                          -0.0005         -0.0005            30            30            30            30
baseline/execute/main/sha1_shifts/1351_mean                           -0.0641         -0.0641           653           611           653           611
baseline/execute/main/sha1_shifts/2737_mean                           -0.0561         -0.0560          1264          1193          1263          1193
baseline/execute/main/sha1_shifts/5311_mean                           -0.0489         -0.0489          2438          2318          2437          2318
baseline/execute/main/sha1_shifts/65536_mean                          +0.0386         +0.0386         28738         29848         28730         29840
baseline/analyse/main/weierstrudel_mean                               +0.2038         +0.2038             5             6             5             6
baseline/execute/main/weierstrudel/0_mean                             +0.0210         +0.0209           147           150           147           150
baseline/execute/main/weierstrudel/1_mean                             -0.0305         -0.0305           340           330           340           330
baseline/execute/main/weierstrudel/3_mean                             -0.0107         -0.0107           505           500           505           500
baseline/execute/main/weierstrudel/9_mean                             -0.0164         -0.0164          1032          1015          1032          1015
baseline/execute/main/weierstrudel/14_mean                            -0.0370         -0.0370          1508          1452          1507          1451
baseline/analyse/micro/beginsub_push1s_0xffff_mean                    +0.0256         +0.0256            47            49            47            49
baseline/execute/micro/beginsub_push1s_0xffff_mean                    +0.1753         +0.1752            48            56            48            56
baseline/analyse/micro/beginsubs_0xffff_mean                          +0.3575         +0.3576            22            30            22            30
baseline/execute/micro/beginsubs_0xffff_mean                          +0.0418         +0.0418            29            31            29            31
baseline/analyse/micro/jumpdests_0xffff_mean                          +0.0521         +0.0521            89            94            89            94
baseline/execute/micro/jumpdests_0xffff_mean                          +0.0916         +0.0916           182           199           182           198
baseline/analyse/micro/loop_with_many_jumpdests_mean                  +0.0563         +0.0563            34            36            34            36
baseline/execute/micro/loop_with_many_jumpdests_mean                  -0.0137         -0.0137         12623         12450         12620         12447
baseline/analyse/micro/push1s_0xffff_mean                             +0.0290         +0.0291            50            51            50            51
baseline/execute/micro/push1s_0xffff_mean                             +0.1634         +0.1635            51            59            51            59
baseline/analyse/micro/push32s_0xffff_mean                            +0.4993         +0.4993             3             5             3             5
baseline/execute/micro/push32s_0xffff_mean                            +0.5775         +0.5776             4             6             4             6
baseline/analyse/micro/zeros_0xffff_mean                              +0.3677         +0.3678            22            30            22            30
baseline/execute/micro/zeros_0xffff_mean                              +0.0356         +0.0357            30            31            30            31

gumb0 · 2021-06-01T15:25:55Z

lib/evmone/baseline.cpp

    uint8_t buffer[Len];
+    // This cannot overflow code buffer because code ends with valid STOP instruction.


I think here it's not relevant that it's a STOP?

Suggested change

-    // This cannot overflow code buffer because code ends with valid STOP instruction.
+    // This cannot overflow code buffer because code is padded with 0s.

I think the original comment is more correct, as the loop is looking for STOP

I thought the comment here is supposed to explain why memcpy cannot overflow.
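To make the point about `memcpy` concrete, here is a hedged sketch of an unchecked push-data load. `load_push_data` is a hypothetical helper, not the function from the PR. It is only safe when the caller guarantees that `Len` bytes are readable after the opcode, which is what the zero padding provides.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical helper: copies the Len bytes of push data that follow the opcode at `pc`.
// There is no bounds check. The caller must guarantee Len readable bytes after the opcode.
template <size_t Len>
const uint8_t* load_push_data(const uint8_t* pc, uint8_t (&out)[Len])
{
    static_assert(Len >= 1 && Len <= 32, "PUSH1..PUSH32 carry 1 to 32 bytes of data");
    assert(pc != nullptr);
    std::memcpy(out, pc + 1, Len);  // reads bytes pc[1] .. pc[Len]
    return pc + 1 + Len;            // position of the next instruction
}
```

In this sketch the safety of the copy depends only on the number of padding bytes. That the byte after the push data is a STOP matters for the interpreter loop, not for the copy itself.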

gumb0 · 2021-06-01T15:26:24Z

lib/evmone/baseline.cpp

    auto pc = code;
-    while (pc != code_end)
+    while (true)  // Guaranteed to terminate because code must end with STOP.


Suggested change

-    while (true)  // Guaranteed to terminate because code must end with STOP.
+    while (true)  // Guaranteed to terminate because padded code ends with STOP.
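The termination argument can be illustrated with a toy loop. This is a sketch under simplifying assumptions: only STOP and PUSH1..PUSH32 are recognized, every other opcode is a no-op, there are no jumps, and the function name `count_instructions` is invented for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical opcode subset for illustration.
constexpr uint8_t OP_STOP = 0x00;
constexpr uint8_t OP_PUSH1 = 0x60;
constexpr uint8_t OP_PUSH32 = 0x7f;

// Counts executed instructions in code that is padded with at least 33 zero bytes.
// Every opcode other than STOP and PUSHn is treated as a no-op.
size_t count_instructions(const uint8_t* padded_code)
{
    assert(padded_code != nullptr);
    size_t count = 0;
    auto pc = padded_code;
    while (true)  // Terminates because the padding guarantees a reachable STOP.
    {
        const auto op = *pc;
        if (op == OP_STOP)
            return count;
        ++count;
        if (op >= OP_PUSH1 && op <= OP_PUSH32)
            pc += op - OP_PUSH1 + 1;  // skip the push data
        ++pc;
    }
}
```

The loop no longer compares `pc` against `code_end` on every iteration. In the worst case a PUSH32 on the last original byte skips 32 padding bytes and lands on the 33rd, which is STOP.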

test/bench/helpers.hpp

gumb0 · 2021-06-03T16:37:37Z

lib/evmone/baseline.cpp

-    return CodeAnalysis{std::move(map)};
+
+    // i is the needed code size including the last push data (can be bigger than code_size).
+    std::unique_ptr<uint8_t[]> padded_code{new uint8_t[i + 1]};


make_unique would zero-initialize, as opposed to this?

I would maybe add a comment that this leaves it uninitialized.
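For reference, a small sketch of the difference being discussed. `copy_with_zero_tail` is a hypothetical helper and not the code from the PR. It allocates with `new[]`, which leaves the bytes indeterminate, and then writes every byte exactly once.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical helper: allocates without zeroing, then initializes every byte exactly once.
std::unique_ptr<uint8_t[]> copy_with_zero_tail(
    const uint8_t* code, size_t code_size, size_t padded_size)
{
    assert(padded_size >= code_size);
    // new uint8_t[n] default-initializes: the bytes are indeterminate until written.
    // std::make_unique<uint8_t[]>(n) would value-initialize (zero) all n bytes first.
    std::unique_ptr<uint8_t[]> padded{new uint8_t[padded_size]};
    if (code_size != 0)
        std::memcpy(padded.get(), code, code_size);
    std::memset(padded.get() + code_size, 0, padded_size - code_size);
    return padded;
}
```

The uninitialized allocation avoids zeroing the part of the buffer that is about to be overwritten by the copy. In C++20, `std::make_unique_for_overwrite` expresses the same intent without a raw `new`.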

gumb0

Just a random side-thought: but with a static +33 bytes always allocated it potentially could be optimized to do a single allocation in analyze for both JumpdestMap and padded code together.

chfast · 2021-06-04T12:30:31Z

Just a random side-thought: but with a static +33 bytes always allocated it potentially could be optimized to do a single allocation in analyze for both JumpdestMap and padded code together.

Yes, this was supposed to be a TODO, so I have added it now.

This was my original plan, but there are additional complications:

We need a custom bitmap implementation or a custom allocator for std::vector<bool>.
The data for the bitmap must additionally be 4- or 8-byte aligned. Now I can see it may be easier to put the jumpdest bitmap at the front of the allocated buffer.
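One way the single-allocation idea could look, as a rough sketch only. It takes the "custom bitmap" route with a hand-rolled bitmap of `uint64_t` words placed at the front of the buffer, so the allocation's own alignment covers the bitmap and the byte-aligned code follows it. `AnalysisBuffer` and `make_buffer` are hypothetical names and nothing here is taken from evmone.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <memory>

// Hypothetical layout: [ bitmap words | padded code bytes ], all in one allocation.
struct AnalysisBuffer
{
    std::unique_ptr<uint64_t[]> storage;  // uint64_t elements give 8-byte alignment at the front
    size_t bitmap_words = 0;

    uint64_t* bitmap() noexcept { return storage.get(); }
    uint8_t* code() noexcept { return reinterpret_cast<uint8_t*>(storage.get() + bitmap_words); }

    void set_jumpdest(size_t pos) noexcept { bitmap()[pos / 64] |= uint64_t{1} << (pos % 64); }
    bool is_jumpdest(size_t pos) noexcept { return ((bitmap()[pos / 64] >> (pos % 64)) & 1) != 0; }
};

AnalysisBuffer make_buffer(const uint8_t* code, size_t code_size)
{
    assert(code != nullptr || code_size == 0);
    constexpr size_t padding_size = 33;
    const size_t bitmap_words = (code_size + 63) / 64;             // one bit per code byte
    const size_t code_words = (code_size + padding_size + 7) / 8;  // code bytes rounded up to words

    AnalysisBuffer buf;
    buf.bitmap_words = bitmap_words;
    // make_unique zero-initializes both the bitmap and the padding.
    buf.storage = std::make_unique<uint64_t[]>(bitmap_words + code_words);
    if (code_size != 0)
        std::memcpy(buf.code(), code, code_size);
    return buf;
}
```

Placing the bitmap first means no manual alignment arithmetic is needed. Placing it after the code would require rounding the code size up to the bitmap's alignment.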

chfast marked this pull request as draft May 4, 2021 13:40

chfast force-pushed the baseline_api branch from d174afc to 77ec2ea Compare May 4, 2021 15:14

Base automatically changed from baseline_api to master May 4, 2021 15:33

chfast force-pushed the baseline_padded_code branch 3 times, most recently from 1085606 to 915e7b5 Compare May 31, 2021 21:11

chfast marked this pull request as ready for review May 31, 2021 21:14

chfast requested review from axic and gumb0 May 31, 2021 21:14

chfast force-pushed the baseline_padded_code branch from 915e7b5 to fbf2d80 Compare June 1, 2021 07:30

chfast mentioned this pull request Jun 1, 2021

PoC: assume code ends with STOP #295

Closed

gumb0 reviewed Jun 1, 2021

chfast force-pushed the baseline_padded_code branch 3 times, most recently from ea36730 to 76a3fb9 Compare June 2, 2021 14:26

gumb0 reviewed Jun 3, 2021

gumb0 approved these changes Jun 3, 2021

chfast added 4 commits June 4, 2021 14:25

baseline: Copy and pad the code before execution

8c6c3cd

baseline: Use padded code for execution

dd43701

baseline: Use "infinite" interpreter loop

5d7cade

Guaranteed to terminate because padded code ends with STOP.

baseline: Do not check for out-of-buffer push load

6538e6a

This cannot overflow code buffer because code ends with valid STOP instruction.

chfast force-pushed the baseline_padded_code branch from 76a3fb9 to 6538e6a Compare June 4, 2021 12:25

chfast merged commit 9182e3d into master Jun 4, 2021

chfast deleted the baseline_padded_code branch June 4, 2021 12:49
