Pass function args by reference using span<> #359

Merged
merged 4 commits into from
Jul 14, 2020

@chfast chfast commented May 28, 2020

This passes function arguments by reference (as span<>) instead of by value (as std::vector). This eliminates one allocation per call: previously, one allocation was done to create the vector and then another to create the locals.

Changing std::vector to const std::vector& may have been enough, but span is more generic and easier to refactor with.
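
As a rough illustration of the difference (a sketch only: `ArgsView`, `sum_by_value` and `sum_by_view` are hypothetical names, not Fizzy's code), a non-owning view lets the callee read the caller's buffer without copying it:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical minimal read-only view, loosely modelled on C++20 std::span<const T>.
// It stores only a pointer and a length; it never owns or allocates memory.
class ArgsView
{
    const uint64_t* m_data = nullptr;
    std::size_t m_size = 0;

public:
    constexpr ArgsView(const uint64_t* data, std::size_t size) noexcept
      : m_data{data}, m_size{size}
    {}

    // Implicit conversion from an existing vector; the vector must outlive the view.
    ArgsView(const std::vector<uint64_t>& v) noexcept : m_data{v.data()}, m_size{v.size()} {}

    constexpr const uint64_t* begin() const noexcept { return m_data; }
    constexpr const uint64_t* end() const noexcept { return m_data + m_size; }
    constexpr std::size_t size() const noexcept { return m_size; }
    constexpr const uint64_t& operator[](std::size_t i) const noexcept { return m_data[i]; }
};

// By value: calling this with an lvalue vector copies it (a heap allocation for non-empty input).
inline uint64_t sum_by_value(std::vector<uint64_t> args)
{
    uint64_t sum = 0;
    for (const auto a : args)
        sum += a;
    return sum;
}

// By view: only a pointer and a size are passed; nothing is allocated or copied.
inline uint64_t sum_by_view(ArgsView args)
{
    uint64_t sum = 0;
    for (const auto a : args)
        sum += a;
    return sum;
}
```

A view can also refer to a sub-range of a larger buffer, e.g. `ArgsView{&v[1], 2}`, which a `const std::vector&` parameter cannot express without a copy.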

The span does not have unit tests, as it is trivial and not all of its methods are currently used after the full transition.

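Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0834         -0.0835            85            78            85            78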
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0758         -0.0758          1262          1166          1262          1166
fizzy/execute/ecpairing/onepoint_mean                             -0.1312         -0.1312        482229        418964        482231        418967
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0796         -0.0796            97            90            97            90
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0627         -0.0627          1391          1304          1391          1304
fizzy/execute/memset/256_bytes_mean                               -0.0782         -0.0786             8             7             8             7
fizzy/execute/memset/60000_bytes_mean                             -0.0921         -0.0921          1582          1436          1582          1436
fizzy/execute/mul256_opt0/input0_mean                             -0.1303         -0.1302            28            24            28            24
fizzy/execute/mul256_opt0/input1_mean                             -0.1234         -0.1234            28            24            28            25
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.1152         -0.1152            89            79            89            79
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.1149         -0.1149          1220          1080          1220          1080
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.1615         -0.1615            86            72            86            72
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.1501         -0.1501          1152           979          1152           979
fizzy/execute/micro/factorial/10_mean                             -0.0534         -0.0531             1             1             1             1
fizzy/execute/micro/factorial/20_mean                             -0.0798         -0.0801             2             2             2             2
fizzy/execute/micro/fibonacci/24_mean                             -0.0374         -0.0374         14579         14033         14579         14033
fizzy/execute/micro/host_adler32/1_mean                           -0.0425         -0.0435             1             1             1             1
fizzy/execute/micro/host_adler32/100_mean                         -0.2825         -0.2827             7             5             7             5
fizzy/execute/micro/host_adler32/1000_mean                        -0.3044         -0.3044            64            44            64            44
fizzy/execute/micro/spinner/1_mean                                -0.0070         -0.0074             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0017         -0.0020            11            11            11            11
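
For reading these rows: assuming the column order of the headed table later in this thread (Time, CPU, Time Old, Time New, CPU Old, CPU New), the first two columns appear to be relative changes, so a negative value means the new build is faster. The helper below is a hypothetical sketch of that calculation, not the benchmark tool's own code:

```cpp
#include <cassert>
#include <cmath>

// Relative change between two timings (hypothetical helper).
// Negative result: the new timing is smaller, i.e. faster.
// Assumes old_time is non-zero.
inline double relative_change(double old_time, double new_time)
{
    return (new_time - old_time) / std::abs(old_time);
}
```

For the ecpairing row, (418964 - 482229) / 482229 is about -0.1312, which matches the first column. Rows with small values (such as 8 vs 7) cannot be recomputed this way because the displayed timings are rounded.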

@chfast chfast force-pushed the args_span branch 2 times, most recently from d8fe1cd to dcf6c97 Compare May 28, 2020 18:24

codecov bot commented May 28, 2020

Codecov Report

Merging #359 into master will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #359   +/-   ##
=======================================
  Coverage   99.17%   99.17%           
=======================================
  Files          49       49           
  Lines       13232    13236    +4     
=======================================
+ Hits        13123    13127    +4     
  Misses        109      109           

@chfast chfast marked this pull request as ready for review May 28, 2020 18:59
@chfast chfast requested review from gumb0 and axic May 28, 2020 18:59
Member

@axic axic left a comment

I think this looks good, but please add a basic unit test for span.

Perhaps also two test cases: one using std::vector as input and one using fizzy::stack.

- std::vector<uint64_t> locals = std::move(args);
+ std::vector<uint64_t> locals;
+ locals.reserve(args.size() + code.local_count);
+ std::copy_n(args.begin(), args.size(), std::back_inserter(locals));
+ locals.resize(locals.size() + code.local_count);
Member

Do we need reserve + resize or can we stay with one?

Member

Also this seems to now expand it to args.size() + local_count + local_count.

Collaborator

We could first resize here, then just copy_n without back_inserter

Collaborator Author

This is the "minimal work" version.

  1. Firstly we allocate memory for both args and locals (locals.size() is still 0 at this point).
  2. We copy args.
  3. We fill locals with zero.
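
The three steps above, next to the resize-first alternative suggested earlier in this thread, could be sketched as follows. The function names and the free-standing `local_count` parameter are assumptions for illustration; the PR's actual `init_locals_*` benchmark variants are not shown in this conversation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <vector>

// Variant A ("minimal work"): reserve once, append the args, then zero-fill the rest.
inline std::vector<uint64_t> init_locals_reserve(
    const uint64_t* args, std::size_t num_args, std::size_t local_count)
{
    std::vector<uint64_t> locals;
    locals.reserve(num_args + local_count);                   // 1. one allocation; size() stays 0
    std::copy_n(args, num_args, std::back_inserter(locals));  // 2. size() becomes num_args
    locals.resize(locals.size() + local_count);               // 3. appends local_count zeros
    return locals;
}

// Variant B: allocate and zero-fill everything first, then overwrite the front with the args.
inline std::vector<uint64_t> init_locals_resize_first(
    const uint64_t* args, std::size_t num_args, std::size_t local_count)
{
    std::vector<uint64_t> locals(num_args + local_count);  // value-initialized to zero
    std::copy_n(args, num_args, locals.begin());           // no back_inserter needed
    return locals;
}
```

Both variants end with `num_args + local_count` elements: `reserve` only changes the capacity, so the later `resize` does not add `local_count` a second time. Variant B writes the first `num_args` slots twice, which is the extra work the reserve-based version tries to avoid.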

Collaborator Author

In the end I used what @gumb0 suggested. Micro-benchmarks included.

/// https://en.cppreference.com/w/cpp/container/span
/// Only `const T` is supported.
template <typename T, typename = typename std::enable_if<std::is_const_v<T>>::type>
struct span
Collaborator

any reason why it's a struct and not class?

Collaborator

As it's not a standard method, I think it should be called Span, like our other classes such as Stack and OperandStack.

Collaborator Author

This is from C++20: https://en.cppreference.com/w/cpp/container/span.
But it is a class so I will change to class.

Collaborator

It's not, because it's not in std.

But it's just a nitpick, I don't care much.

Collaborator Author

I will leave it as is to be a drop-in replacement for standardized "container".


chfast commented Jun 3, 2020

New benchmarks

Haswell / clang-10

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     +0.0344         +0.0344            79            82            79            82
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.0373         +0.0373          1174          1218          1174          1218
fizzy/execute/ecpairing/onepoint_mean                             -0.0805         -0.0805        466278        428763        466281        428766
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0141         +0.0141            92            93            92            93
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0052         +0.0052          1341          1347          1341          1348
fizzy/execute/memset/256_bytes_mean                               +0.0126         +0.0128             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             +0.0330         +0.0330          1418          1465          1418          1465
fizzy/execute/mul256_opt0/input0_mean                             -0.0495         -0.0495            25            24            25            24
fizzy/execute/mul256_opt0/input1_mean                             -0.0476         -0.0478            25            23            25            23
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.0170         +0.0170            85            87            85            87
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0326         +0.0326          1164          1202          1164          1202
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.0343         +0.0343            76            79            76            79
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.0403         +0.0403          1033          1075          1033          1075
fizzy/execute/micro/factorial/10_mean                             -0.0724         -0.0720             1             1             1             1
fizzy/execute/micro/factorial/20_mean                             -0.0966         -0.0966             2             2             2             2
fizzy/execute/micro/fibonacci/24_mean                             -0.0492         -0.0492         11653         11080         11654         11080
fizzy/execute/micro/host_adler32/1_mean                           -0.0815         -0.0821             1             1             1             1
fizzy/execute/micro/host_adler32/100_mean                         -0.3114         -0.3112             6             4             6             4
fizzy/execute/micro/host_adler32/1000_mean                        -0.3286         -0.3285            58            39            58            39
fizzy/execute/micro/spinner/1_mean                                -0.0242         -0.0227             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0251         -0.0253            10             9            10             9

Cloud / gcc-9

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     +0.0836         +0.0835           104           113           104           113
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.0600         +0.0600          1572          1666          1571          1666
fizzy/execute/ecpairing/onepoint_mean                             -0.0321         -0.0322        628236        608044        628141        607938
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0622         +0.0621           128           136           128           136
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0510         +0.0510          1873          1969          1873          1969
fizzy/execute/memset/256_bytes_mean                               +0.1120         +0.1125             9            10             9            10
fizzy/execute/memset/60000_bytes_mean                             +0.1190         +0.1189          1816          2032          1816          2032
fizzy/execute/mul256_opt0/input0_mean                             +0.0534         +0.0532            34            35            34            35
fizzy/execute/mul256_opt0/input1_mean                             +0.0290         +0.0286            34            35            34            35
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.1258         +0.1255           107           121           107           121
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.1098         +0.1098          1534          1702          1534          1702
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.1132         +0.1128           111           123           111           123
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.1491         +0.1492          1500          1724          1500          1724
fizzy/execute/micro/factorial/10_mean                             +0.0141         +0.0136             2             2             2             2
fizzy/execute/micro/factorial/20_mean                             -0.0369         -0.0366             3             3             3             3
fizzy/execute/micro/fibonacci/24_mean                             -0.0212         -0.0210         16086         15746         16082         15744
fizzy/execute/micro/host_adler32/1_mean                           -0.0668         -0.0654             1             1             1             1
fizzy/execute/micro/host_adler32/100_mean                         -0.2675         -0.2685             9             6             9             6
fizzy/execute/micro/host_adler32/1000_mean                        -0.2713         -0.2711            80            58            80            59
fizzy/execute/micro/spinner/1_mean                                +0.0079         +0.0081             1             1             1             1
fizzy/execute/micro/spinner/1000_mean                             +0.1207         +0.1192            13            15            14            15

TEST(span, vector)
{
std::vector<uint64_t> vec{1, 2, 3, 4, 5, 6};
span<const uint64_t> s(&vec[1], 3);
Collaborator

Suggested change
- span<const uint64_t> s(&vec[1], 3);
+ const span<const uint64_t> s(&vec[1], 3);

stack.push(13);

constexpr auto num_items = 2;
span<const uint64_t> s(stack.rend() - num_items, num_items);
Collaborator

Suggested change
- span<const uint64_t> s(stack.rend() - num_items, num_items);
+ const span<const uint64_t> s(stack.rend() - num_items, num_items);


axic commented Jun 3, 2020

Not happy about regressions.

Can you also benchmark the bench-bls12 and ewasm-bench-bls12 branches?

@chfast chfast mentioned this pull request Jun 3, 2020
@chfast chfast marked this pull request as draft June 3, 2020 15:23
@chfast chfast force-pushed the args_span branch 4 times, most recently from 13f3967 to 0a2cdfd Compare June 9, 2020 19:35

chfast commented Jun 9, 2020

Final benchmarks:

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0954         -0.0954            92            83            92            83
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0970         -0.0970          1378          1245          1378          1245
fizzy/execute/ecpairing/onepoint_mean                             -0.1328         -0.1328        495862        430033        495865        430037
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0645         -0.0645           109           102           109           102
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0712         -0.0712          1605          1490          1605          1490
fizzy/execute/memset/256_bytes_mean                               -0.1045         -0.1042             8             7             8             7
fizzy/execute/memset/60000_bytes_mean                             -0.1149         -0.1149          1682          1489          1682          1489
fizzy/execute/mul256_opt0/input0_mean                             -0.0214         -0.0213            29            28            29            28
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.1034         -0.1034            99            89            99            89
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0949         -0.0949          1360          1231          1360          1231
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0683         -0.0683            98            92            98            92
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0719         -0.0719          1351          1254          1351          1254
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.0441         -0.0421             1             1             1             1
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.1081         -0.1085             6             5             6             5
fizzy/execute/micro/factorial/10_mean                             -0.0329         -0.0330             1             1             1             1
fizzy/execute/micro/factorial/20_mean                             -0.0484         -0.0486             2             2             2             2
fizzy/execute/micro/fibonacci/24_mean                             -0.0624         -0.0624         10202          9565         10202          9565
fizzy/execute/micro/host_adler32/1_mean                           -0.0664         -0.0683             1             1             1             1
fizzy/execute/micro/host_adler32/100_mean                         -0.2509         -0.2502             7             5             7             5
fizzy/execute/micro/host_adler32/1000_mean                        -0.2698         -0.2698            63            46            63            46
fizzy/execute/micro/spinner/1_mean                                -0.0064         -0.0055             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.1412         -0.1413            11            10            11            10

The last commit alone (locals initialization micro-optimization):

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0444         -0.0444            87            83            87            83
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0442         -0.0442          1302          1245          1302          1245
fizzy/execute/ecpairing/onepoint_mean                             -0.0412         -0.0412        448503        430033        448508        430037
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0485         -0.0485           108           102           108           102
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0530         -0.0530          1574          1490          1574          1490
fizzy/execute/memset/256_bytes_mean                               -0.0603         -0.0608             8             7             8             7
fizzy/execute/memset/60000_bytes_mean                             -0.0671         -0.0671          1596          1489          1596          1489
fizzy/execute/mul256_opt0/input0_mean                             +0.0232         +0.0229            28            28            28            28
fizzy/execute/mul256_opt0/input1_mean                             +0.0464         +0.0461            28            29            28            29
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0432         -0.0433            93            89            93            89
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0438         -0.0438          1288          1231          1288          1231
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0382         -0.0382            95            92            95            92
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0389         -0.0389          1305          1254          1305          1254
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.0120         -0.0106             1             1             1             1
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.0928         -0.0929             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             -0.0217         -0.0218             1             1             1             1
fizzy/execute/micro/factorial/20_mean                             -0.0297         -0.0296             2             2             2             2
fizzy/execute/micro/fibonacci/24_mean                             -0.0322         -0.0322          9883          9565          9883          9565
fizzy/execute/micro/host_adler32/1_mean                           -0.0065         -0.0073             1             1             1             1
fizzy/execute/micro/host_adler32/100_mean                         +0.0168         +0.0181             5             5             5             5
fizzy/execute/micro/host_adler32/1000_mean                        +0.0018         +0.0016            46            46            46            46
fizzy/execute/micro/spinner/1_mean                                -0.0055         -0.0051             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.1121         -0.1127            11            10            11            10

And the init_locals micro benchmark. I used init_locals_2 version in the end.

2020-06-09 22:07:02
Running bin/fizzy-bench-internal
Run on (8 X 4400 MHz CPU s)
CPU Caches:
  L1 Data 32K (x4)
  L1 Instruction 32K (x4)
  L2 Unified 256K (x4)
  L3 Unified 8192K (x1)
Load Average: 0.37, 0.21, 0.29
---------------------------------------------------------------------------------------------------------
Benchmark                                                               Time             CPU   Iterations
---------------------------------------------------------------------------------------------------------
init_locals<std::vector<uint64_t>, init_locals_1>/0/0                3.50 ns         3.50 ns    399855907
init_locals<std::vector<uint64_t>, init_locals_1>/2/4                20.3 ns         20.3 ns     68914016
init_locals<std::vector<uint64_t>, init_locals_1>/2/38               21.7 ns         21.7 ns     65088967
init_locals<std::vector<uint64_t>, init_locals_1>/3/4                20.8 ns         20.8 ns     67351818
init_locals<std::vector<uint64_t>, init_locals_1>/3/8                20.8 ns         20.8 ns     67335267
init_locals<std::vector<uint64_t>, init_locals_1>/3/13               21.1 ns         21.1 ns     66367891
init_locals<std::vector<uint64_t>, init_locals_1>/5/30               23.0 ns         23.0 ns     60847178
init_locals<std::vector<uint64_t>, init_locals_1>/10/100             30.0 ns         30.0 ns     46648840
init_locals<std::vector<uint64_t>, init_locals_2>/0/0                3.02 ns         3.02 ns    466498082
init_locals<std::vector<uint64_t>, init_locals_2>/2/4                16.0 ns         16.0 ns     87704726
init_locals<std::vector<uint64_t>, init_locals_2>/2/38               17.3 ns         17.3 ns     81128366
init_locals<std::vector<uint64_t>, init_locals_2>/3/4                16.0 ns         16.0 ns     87719616
init_locals<std::vector<uint64_t>, init_locals_2>/3/8                26.3 ns         26.3 ns     53315818
init_locals<std::vector<uint64_t>, init_locals_2>/3/13               16.2 ns         16.2 ns     86340827
init_locals<std::vector<uint64_t>, init_locals_2>/5/30               17.3 ns         17.3 ns     81127571
init_locals<std::vector<uint64_t>, init_locals_2>/10/100             31.3 ns         31.3 ns     44782443
init_locals<std::vector<uint64_t>, init_locals_3>/0/0                5.11 ns         5.11 ns    274121448
init_locals<std::vector<uint64_t>, init_locals_3>/2/4                16.9 ns         16.9 ns     82546816
init_locals<std::vector<uint64_t>, init_locals_3>/2/38               18.0 ns         18.0 ns     78514857
init_locals<std::vector<uint64_t>, init_locals_3>/3/4                16.8 ns         16.8 ns     83104246
init_locals<std::vector<uint64_t>, init_locals_3>/3/8                20.1 ns         20.1 ns     69973124
init_locals<std::vector<uint64_t>, init_locals_3>/3/13               17.1 ns         17.1 ns     81763828
init_locals<std::vector<uint64_t>, init_locals_3>/5/30               18.3 ns         18.3 ns     76449607
init_locals<std::vector<uint64_t>, init_locals_3>/10/100             23.5 ns         23.5 ns     59551870
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/0/0          14.0 ns         14.0 ns     99962961
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/2/4          19.7 ns         19.7 ns     71189979
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/2/38         21.8 ns         21.8 ns     64390511
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/3/4          19.7 ns         19.7 ns     71192263
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/3/8          28.5 ns         28.5 ns     49105661
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/3/13         20.0 ns         20.0 ns     70165413
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/5/30         21.9 ns         21.9 ns     64009928
init_locals<std::unique_ptr<uint64_t[]>, init_locals_4>/10/100       42.0 ns         42.0 ns     33371801
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/0/0          18.1 ns         18.1 ns     77590018
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/2/4          20.4 ns         20.4 ns     68694953
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/2/38         20.1 ns         20.1 ns     69414188
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/3/4          20.4 ns         20.4 ns     68624295
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/3/8          22.3 ns         22.3 ns     62897744
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/3/13         20.2 ns         20.2 ns     69367344
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/5/30         21.5 ns         21.5 ns     65248845
init_locals<std::unique_ptr<uint64_t[]>, init_locals_5>/10/100       26.3 ns         26.3 ns     53309696

Maybe someone wants to explain why 3/8 case is outstanding?

@chfast chfast marked this pull request as ready for review June 9, 2020 20:09
@chfast chfast requested review from axic and gumb0 June 9, 2020 20:12
Collaborator

@gumb0 gumb0 left a comment

Looks good.


gumb0 commented Jun 10, 2020

Maybe someone wants to explain why 3/8 case is outstanding?

3/8 is the smallest one that doesn't fit into the cache line, right? That would explain why it's much slower than 3/4 and 2/4. But no idea how it's slower than 3/13 😕

@axic axic changed the base branch from master to execute_helpers July 8, 2020 23:18
@@ -7,6 +7,7 @@
#include <gtest/gtest.h>
#include <lib/fizzy/limits.hpp>
#include <test/utils/asserts.hpp>
#include <test/utils/execute_helpers.hpp>
Member

Does this file need it?

Collaborator Author

It has the overload for execute(..., {...}).

Member

But no new test was added, why wasn't this needed before?

Collaborator Author

Previously, the args were passed as std::vector, which has a constructor for {}.

@@ -33,7 +33,8 @@ TEST(end_to_end, milestone1)
const auto module = parse(wasm);
auto instance = instantiate(module);

EXPECT_THAT(execute(*instance, 0, {20, 22}), Result(20 + 22 + 20));
Member

This wouldn't work with span?

Collaborator Author

No, because std::span is a reference type and {20, 22} is a temporary object.

Member

So in the "external API" we should have a wrapper then.

Collaborator Author

Yes, we can have this in the public API. But I'm not convinced this is super useful. In normal usage you rarely pass constant args to wasm. They usually come from other sources like the CLI, etc.

For now I placed this in fizzy::test.
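
A minimal sketch of such a wrapper, assuming a pointer-plus-size entry point as a stand-in for the span-taking one (`execute_view` and `execute_helper` are hypothetical names, not Fizzy's API):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a span-taking entry point: it only reads [data, data + size).
// Here it just sums the arguments so the result is easy to check.
inline uint64_t execute_view(const uint64_t* data, std::size_t size)
{
    uint64_t sum = 0;
    for (std::size_t i = 0; i < size; ++i)
        sum += data[i];
    return sum;
}

// Hypothetical test-only convenience overload: the by-value vector owns the elements
// built from a braced list such as {20, 22} and keeps them alive for the whole call.
inline uint64_t execute_helper(std::vector<uint64_t> args)
{
    return execute_view(args.data(), args.size());
}
```

With this, a test can keep writing `execute_helper({20, 22})`. The cost is one vector allocation per call, which is why a helper like this fits test code better than the hot execution path.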

Base automatically changed from execute_helpers to master July 9, 2020 08:40
@chfast chfast force-pushed the args_span branch 3 times, most recently from 27b8366 to 70a4507 Compare July 10, 2020 13:17

chfast commented Jul 10, 2020

The latest benchmark with GCC10, Haswell, 4 GHz.

Comparing master to span
Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
-------------------------------------------------------------------------------------------------------------------------------------------------
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0503         -0.0503            91            87            91            87
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0581         -0.0581          1389          1309          1389          1309
fizzy/execute/ecpairing/onepoint_mean                             -0.0644         -0.0644        470893        440582        470896        440584
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0076         +0.0076           102           103           102           103
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0027         +0.0027          1497          1501          1497          1501
fizzy/execute/memset/256_bytes_mean                               -0.0282         -0.0282             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             -0.0230         -0.0230          1607          1570          1607          1570
fizzy/execute/mul256_opt0/input0_mean                             +0.0469         +0.0469            27            28            27            28
fizzy/execute/mul256_opt0/input1_mean                             +0.0464         +0.0464            27            28            27            28
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0143         -0.0143            94            93            94            93
fizzy/execute/sha1/512_bytes_rounds_16_mean                       -0.0253         -0.0253          1314          1281          1314          1281
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0324         -0.0324            98            95            98            95
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0419         -0.0419          1345          1288          1345          1288
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.2509         -0.2509             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.0378         -0.0378             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             -0.0397         -0.0397             1             1             1             1
fizzy/execute/micro/factorial/20_mean                             -0.0289         -0.0289             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             -0.0384         -0.0384         10311          9916         10312          9916
fizzy/execute/micro/host_adler32/1_mean                           -0.3352         -0.3352             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         -0.2988         -0.2988             6             5             6             5
fizzy/execute/micro/host_adler32/1000_mean                        -0.2877         -0.2877            64            46            64            46
fizzy/execute/micro/spinner/1_mean                                -0.0380         -0.0380             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0243         -0.0243            10            10            10            10
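
For readers of the table above: the "Time" and "CPU" columns appear to be relative changes, computed as (new - old) / |old|, so a negative value means the span branch is faster. That formula is an assumption based on how the rows line up, and `relative_change` is a hypothetical helper name. The "Old"/"New" columns are rounded, so rows with small values (e.g. 7 vs 7) cannot be reproduced from the printed numbers. A minimal sketch:

```cpp
#include <cmath>

// Assumed formula behind the "Time" and "CPU" columns:
// (new - old) / |old|. Negative means the new build is faster.
double relative_change(double old_value, double new_value)
{
    return (new_value - old_value) / std::fabs(old_value);
}
```

For the ecpairing row, `relative_change(470893, 440582)` is roughly -0.0644, which matches the reported value.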


namespace fizzy::test
{
inline execution_result execute(const Module& module, FuncIdx func_idx, std::vector<uint64_t> args)
inline execution_result execute(
Instance& instance, FuncIdx func_idx, std::initializer_list<uint64_t> args)
Collaborator

Does it not help in tests where you declare args as arrays?

Collaborator Author

I did not include this overload in the end_to_end tests, which have only a small number of cases that could be converted to std::array.

Collaborator

It's 4 cases, I'd still use it.

@@ -613,8 +611,8 @@ execution_result execute(
const auto& code = instance.module.codesec[code_idx];
auto* const memory = instance.memory.get();

-std::vector<uint64_t> locals = std::move(args);
-locals.resize(locals.size() + code.local_count);
+std::vector<uint64_t> locals(args.size() + code.local_count, 0);
Member

Is this zeroing it out?

Collaborator Author

Yes. First we allocate and zero-initialize storage for args + locals. Then we overwrite the args part with the actual values.
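
A minimal sketch of the two steps described here, not the code from this PR. `ArgsView` is a hypothetical stand-in for `fizzy::span<const uint64_t>`, and `init_locals` is an illustrative name:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for fizzy::span<const uint64_t>: a non-owning view.
struct ArgsView
{
    const uint64_t* data;
    std::size_t size;
};

std::vector<uint64_t> init_locals(ArgsView args, uint32_t local_count)
{
    // Step 1: a single allocation, every element starts as 0.
    std::vector<uint64_t> locals(args.size + local_count, 0);
    // Step 2: overwrite the leading slots with the argument values.
    std::copy_n(args.data, args.size, locals.begin());
    return locals;
}
```

With args {20, 3} and `local_count` 3, the result holds {20, 3, 0, 0, 0}. The cost is that the argument slots are written twice (zeroed, then overwritten), in exchange for a single allocation.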

@@ -1003,9 +1004,10 @@ TEST(execute, reuse_args)
"0061736d01000000010b0260027e7e017e6000017e03030200010a19020e002000200180210120002001820b08"
"004217420510000b");

-const std::vector<uint64_t> args{20, 3};
+const std::array<uint64_t, 2> args{20, 3};
Member

Why is this change needed?

Collaborator Author

Not needed, but there is no reason to allocate a std::vector just to hold 2 args.
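
To illustrate the point: a span-style parameter accepts either container, so the caller can pick `std::array` (elements stored inline, no heap allocation) over `std::vector` (which generally allocates). This is a sketch only; `ArgsView`, `view_of` and `sum` are hypothetical names, with `ArgsView` standing in for `fizzy::span<const uint64_t>`:

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for fizzy::span<const uint64_t>.
struct ArgsView
{
    const uint64_t* data;
    std::size_t size;
};

// View over a std::array: the elements live inside the array object itself.
template <std::size_t N>
ArgsView view_of(const std::array<uint64_t, N>& a)
{
    return {a.data(), a.size()};
}

// View over a std::vector: the elements live in the vector's heap buffer.
ArgsView view_of(const std::vector<uint64_t>& v)
{
    return {v.data(), v.size()};
}

// The callee only sees the view and does not care who owns the storage.
uint64_t sum(ArgsView args)
{
    uint64_t total = 0;
    for (std::size_t i = 0; i < args.size; ++i)
        total += args.data[i];
    return total;
}
```

Calling `sum(view_of(args))` gives the same result whether `args` is a `std::array<uint64_t, 2>` or a `std::vector<uint64_t>`.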


[[gnu::noinline]] auto init_locals_2(fizzy::span<const uint64_t> args, uint32_t local_count)
{
std::vector<uint64_t> locals(args.size() + local_count);
Member

Is this the same as locals(args.size() + local_count, 0), i.e. the one used in execute.cpp?

Collaborator Author

Yes. The elements are value-initialized, so 0 is the default.
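
A small check of that equivalence, using only the standard `std::vector` constructors. `same_zeroed_contents` is a hypothetical helper name used for illustration:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true when both construction forms yield identical contents.
bool same_zeroed_contents(std::size_t n)
{
    const std::vector<uint64_t> implicit_zero(n);     // elements value-initialized, i.e. 0
    const std::vector<uint64_t> explicit_zero(n, 0);  // elements copied from the given 0
    return implicit_zero == explicit_zero;
}
```

For an integer element type such as `uint64_t` this holds for any `n`.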

-Instance&, std::vector<uint64_t> args, int) -> execution_result {
-return fizzy::execute(*instance1, *func_idx, std::move(args));
+Instance&, span<const uint64_t> args, int) -> execution_result {
+return fizzy::execute(*instance1, *func_idx, std::vector(args.begin(), args.end()));
Collaborator

Why this?
It seems args passed to this lambda should live long enough on the caller side.

Collaborator Author

A mistake.
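
For context, a sketch of the difference being discussed, under the reviewer's assumption that the caller's args outlive the call. `ArgsView` stands in for `span<const uint64_t>` and `callee` for `fizzy::execute`; both, like the two forwarding functions, are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-in for span<const uint64_t>.
struct ArgsView
{
    const uint64_t* data;
    std::size_t size;
};

// Hypothetical stand-in for fizzy::execute: just sums the arguments.
uint64_t callee(ArgsView args)
{
    uint64_t total = 0;
    for (std::size_t i = 0; i < args.size; ++i)
        total += args.data[i];
    return total;
}

// Copies the arguments into a temporary vector first: one extra allocation.
uint64_t forward_with_copy(ArgsView args)
{
    const std::vector<uint64_t> copy(args.data, args.data + args.size);
    return callee({copy.data(), copy.size()});
}

// Forwards the view as-is: no allocation, valid while the caller's storage lives.
uint64_t forward_directly(ArgsView args)
{
    return callee(args);
}
```

Both functions return the same result; the second simply avoids the temporary vector.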

@chfast chfast merged commit 9ad07fa into master Jul 14, 2020
@chfast chfast deleted the args_span branch July 14, 2020 10:15