Call optimization #554

chfast · 2020-09-25T14:54:16Z

~~Results may be wrong as some benchmarks don't use calls.~~ Confirmed for GCC10/LTO.

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.1128         -0.1128            87            78            87            78                                                                                 
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.1149         -0.1148          1329          1176          1329          1176                                                                                 
fizzy/execute/ecpairing/onepoint_mean                             -0.0918         -0.0918        411770        373976        411774        373980                                                                                 
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0856         -0.0856           105            96           105            96                                                                                 
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0999         -0.0999          1548          1393          1548          1393                                                                                 
fizzy/execute/memset/256_bytes_mean                               -0.1481         -0.1481             7             6             7             6                                                                                 
fizzy/execute/memset/60000_bytes_mean                             -0.1510         -0.1510          1623          1378          1623          1378                                                                                 
fizzy/execute/mul256_opt0/input0_mean                             -0.1334         -0.1334            29            25            29            25                                                                                 
fizzy/execute/mul256_opt0/input1_mean                             -0.1338         -0.1338            29            25            29            25                                                                                 
fizzy/execute/ramanujan_pi/33_runs_mean                           -0.1255         -0.1255           138           120           138           120                                                                                 
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.1141         -0.1141            94            84            94            84                                                                                 
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.1163         -0.1163          1318          1164          1318          1164                                                                                 
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.1193         -0.1193            96            84            96            84                                                                                 
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.1228         -0.1228          1326          1163          1326          1163                                                                                 
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0392         -0.0392         41668         40036         41669         40036                                                                                 
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.1528         -0.1528             0             0             0             0                                                                                 
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.1269         -0.1269             5             4             5             4                                                                                 
fizzy/execute/micro/factorial/10_mean                             -0.1033         -0.1032             0             0             0             0                                                                                 
fizzy/execute/micro/factorial/20_mean                             -0.1066         -0.1066             1             0             1             0                                                                                 
fizzy/execute/micro/fibonacci/24_mean                             -0.0947         -0.0947          5230          4735          5230          4735                                                                                 
fizzy/execute/micro/host_adler32/1_mean                           -0.0603         -0.0603             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         -0.0753         -0.0753             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0549         -0.0549            31            29            31            29
fizzy/execute/micro/spinner/1_mean                                -0.1555         -0.1555             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.1487         -0.1487            10             9            10             9

axic · 2020-09-30T23:52:23Z

Rebased this locally on #562; it is much easier to read when grouped with execute.

axic · 2020-10-05T14:58:27Z

lib/fizzy/execute.cpp

+    assert(stack.size() >= num_args);
+    span<const Value> call_args{stack.rend() - num_args, num_args};
+
+    const auto ret = execute(instance, func_idx, call_args.begin(), depth + 1);


Actually, why do we have both .data() and .begin() when they point to the same thing?

Not in std::span.
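
For context on the exchange above: in `std::span`, `data()` returns a raw pointer while `begin()` returns an iterator whose type is implementation-defined, so the two are only interchangeable in a span whose iterator is a plain pointer. The sketch below illustrates that case and the idea in the diff of passing the top `num_args` stack items to the callee without copying. It is not fizzy's code: `ArgsView`, `top_args`, `sum_args`, and modelling the stack as a `std::vector` are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for the interpreter's value type (not fizzy's Value).
using Value = std::uint64_t;

// Minimal non-owning view whose iterator is a raw pointer,
// so data() and begin() return the same address by construction.
struct ArgsView
{
    const Value* ptr;
    std::size_t count;

    const Value* data() const noexcept { return ptr; }
    const Value* begin() const noexcept { return ptr; }
    const Value* end() const noexcept { return ptr + count; }
    std::size_t size() const noexcept { return count; }
};

// Returns a view over the top num_args items of the stack without copying them.
// Assumption: the stack is modelled as a vector whose back() is the top item.
inline ArgsView top_args(const std::vector<Value>& stack, std::size_t num_args)
{
    assert(stack.size() >= num_args);
    return ArgsView{stack.data() + (stack.size() - num_args), num_args};
}

// Hypothetical callee that receives its arguments as const Value*;
// it sums them purely so the result can be checked.
inline Value sum_args(const Value* args, std::size_t num_args) noexcept
{
    Value sum = 0;
    for (std::size_t i = 0; i < num_args; ++i)
        sum += args[i];
    return sum;
}
```

With a pointer-based view like this, `args.data()` and `args.begin()` can be passed interchangeably to a function taking `const Value*`; with `std::span`, only `data()` is guaranteed to compile in that position.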

codecov · 2020-10-05T15:21:39Z

Codecov Report

Merging #554 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #554   +/-   ##
=======================================
  Coverage   98.25%   98.25%           
=======================================
  Files          63       63           
  Lines        9224     9231    +7     
=======================================
+ Hits         9063     9070    +7     
  Misses        161      161

chfast · 2020-10-05T15:24:20Z

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     +0.1395         +0.1395            77            88            77            88
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.1386         +0.1386          1164          1325          1164          1325
fizzy/execute/ecpairing/onepoint_mean                             +0.0746         +0.0746        384289        412951        384291        412955
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.1103         +0.1103            94           105            94           105
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0938         +0.0938          1387          1517          1387          1517
fizzy/execute/memset/256_bytes_mean                               +0.1888         +0.1888             6             7             6             7
fizzy/execute/memset/60000_bytes_mean                             +0.1996         +0.1996          1374          1649          1374          1649
fizzy/execute/mul256_opt0/input1_mean                             +0.1312         +0.1312            25            29            25            29
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.1020         +0.1020           118           131           118           131
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.1375         +0.1375            84            95            84            95
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.1394         +0.1394          1163          1325          1163          1325
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.1316         +0.1316            84            96            84            96
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.1378         +0.1378          1164          1325          1164          1325
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      +0.0312         +0.0312         40031         41279         40032         41280
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.1739         +0.1739             4             5             4             5
fizzy/execute/micro/factorial/20_mean                             +0.0374         +0.0374             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0843         +0.0843          4957          5375          4957          5375
fizzy/execute/micro/host_adler32/1_mean                           +0.1161         +0.1161             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                        +0.1519         +0.1519            29            34            29            34
fizzy/execute/micro/spinner/1_mean                                +0.0550         +0.0550             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.1180         +0.1180             9            10             9            10

I think we were just lucky with #552, because now whatever I change around invoke, I get a 10% regression. The same happens in #574.

chfast · 2020-10-20T09:57:48Z

Replaced by #602. The remaining code copy has no effect.

axic mentioned this pull request Oct 1, 2020

Pass arguments as const Value* #552

Merged

chfast mentioned this pull request Oct 1, 2020

Calls optimization plan #563

Open

axic force-pushed the call_optimization branch from c2d6235 to e70c686 Compare October 5, 2020 14:56

axic reviewed Oct 5, 2020

axic force-pushed the call_optimization branch from e70c686 to e0d1d78 Compare October 5, 2020 15:01

axic added the optimization Performance optimization label Oct 9, 2020

chfast and others added 3 commits October 13, 2020 19:07

Unwrap invoke_function

11a3ee4

Optimize result push

f9d4dbf

Further optimisation

9eaffe6

chfast force-pushed the call_optimization branch from 5c601fc to 9eaffe6 Compare October 13, 2020 17:07

axic mentioned this pull request Oct 13, 2020

Optimize call result push to the stack #602

Open

chfast closed this Oct 20, 2020

axic deleted the call_optimization branch November 6, 2020 17:21
