Change implementation of fnearest() #504

chfast · 2020-08-23T16:25:29Z

This implementation does not need to modify rounding direction to work
correctly under any rounding direction.
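For readers without the full diff at hand, here is a minimal sketch of the approach, built around the `std::trunc` / `diff` / `is_even` condition visible in the review snippet. The name `nearest_sketch`, the body of `is_even`, the NaN handling and the `std::copysign` step are assumptions made for illustration, not the merged code. It works under any rounding direction because every operation is exact: `std::trunc` does not round, `value - t` is a representable fraction, and `t ± 1` is a representable integer whenever a fraction exists.

```cpp
#include <cmath>
#include <limits>

// Hypothetical helper: true if the integral-valued float i is even.
// fmod is exact, so this also works for very large values.
inline bool is_even(float i) noexcept
{
    return std::fmod(i, 2.0f) == 0.0f;
}

// Sketch of round-to-nearest with ties to even that never touches
// the floating-point rounding direction.
inline float nearest_sketch(float value) noexcept
{
    if (std::isnan(value))
        return std::numeric_limits<float>::quiet_NaN();

    const float t = std::trunc(value);       // integer part, rounded toward zero
    const float diff = std::abs(value - t);  // fractional part, computed exactly
    if (diff > 0.5f || (diff == 0.5f && !is_even(t)))
        return t + std::copysign(1.0f, value);  // step one away from zero
    return t;  // also covers infinities, where diff is NaN and both tests fail
}
```

Ties go to the even neighbour, so `nearest_sketch(1.5f)` and `nearest_sketch(2.5f)` both give `2.0f`, and small negative inputs such as `-0.3f` keep their sign and return `-0.0f`.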

axic · 2020-08-23T16:47:32Z

lib/fizzy/execute.cpp

-
-    return result;
+    const auto t = std::trunc(value);
+    if (const auto diff = std::abs(value - t); diff > 0.5f || (diff == 0.5f && !is_even(t)))


Wouldn't it be less branching to have two if cases than using abs? Or is abs constexpr and the compiler can optimise all this out?

The abs (or fabs in C) is recognized by the compiler and replaced with a builtin (__builtin_fabs or similar). For a 32-bit float this just masks out the sign bit: x & 0x7fffffff.

codecov · 2020-08-23T16:55:37Z

Codecov Report

Merging #504 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
- Coverage   99.69%   99.69%   -0.01%     
==========================================
  Files          54       54              
  Lines       17198    17180      -18     
==========================================
- Hits        17146    17128      -18     
  Misses         52       52

chfast · 2020-08-23T17:50:27Z

I also have another implementation based on bit manipulation which looks to be faster, but I have not run proper benchmarks yet.
I decided to start with this one because it does not need special utils like class FP.

We may wait with this until the next release, once we have more FP test cases.

chfast · 2020-09-02T12:22:50Z

Tested with #511: https://app.circleci.com/pipelines/github/wasmx/fizzy/4401/workflows/c600c27b-1f5e-4d6e-a2db-71613aa947d5

chfast · 2020-09-02T13:05:09Z

I'm checking some additional options because there are some performance regressions.

chfast · 2020-09-02T14:56:41Z

GCC:

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0166         -0.0166            86            85            86            85
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0143         -0.0143          1308          1289          1308          1289
fizzy/execute/ecpairing/onepoint_mean                             -0.0481         -0.0481        439794        418619        439798        418623
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0126         -0.0126           104           102           104           102
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0132         -0.0132          1520          1500          1520          1500
fizzy/execute/memset/256_bytes_mean                               -0.0312         -0.0312             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             -0.0294         -0.0294          1576          1529          1576          1529
fizzy/execute/mul256_opt0/input0_mean                             -0.0820         -0.0820            28            26            28            26
fizzy/execute/mul256_opt0/input1_mean                             -0.0858         -0.0858            29            26            29            26
fizzy/execute/ramanujan_pi/33_runs_mean                           -0.0174         -0.0174           135           132           135           132
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0021         -0.0021            93            93            93            93
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0003         +0.0004          1297          1297          1297          1297
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0058         -0.0058            94            94            94            94
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0018         -0.0018          1297          1295          1297          1295
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0665         -0.0665         42825         39978         42825         39978
fizzy/execute/micro/eli_interpreter/halt_mean                     +0.0229         +0.0229             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0775         +0.0775             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             +0.0131         +0.0131             0             0             0             0
fizzy/execute/micro/factorial/20_mean                             +0.0096         +0.0097             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0060         +0.0060          7502          7547          7502          7547
fizzy/execute/micro/host_adler32/1_mean                           +0.0236         +0.0236             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         +0.0102         +0.0102             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0106         -0.0106            30            29            30            29
fizzy/execute/micro/spinner/1_mean                                +0.0115         +0.0115             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0243         -0.0243            10            10            10            10

GCC+LTO:

Benchmark                                                            Time             CPU      Time Old      Time New       CPU Old       CPU New
fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0146         -0.0146            84            83            84            83
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0174         -0.0174          1277          1255          1277          1255
fizzy/execute/ecpairing/onepoint_mean                             +0.0066         +0.0066        424685        427480        424689        427484
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0286         +0.0286            98           101            98           101
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0227         +0.0227          1431          1464          1431          1464
fizzy/execute/memset/256_bytes_mean                               +0.0057         +0.0057             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             +0.0091         +0.0091          1544          1558          1544          1558
fizzy/execute/mul256_opt0/input0_mean                             +0.0105         +0.0105            26            26            26            26
fizzy/execute/mul256_opt0/input1_mean                             +0.0134         +0.0134            26            26            26            26
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.0385         +0.0385           126           131           126           131
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.0085         +0.0085            90            91            90            91
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0118         +0.0118          1252          1267          1252          1267
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.0138         +0.0138            91            92            91            92
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.0116         +0.0116          1252          1267          1252          1267
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0223         -0.0223         42176         41237         42177         41238
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.0304         -0.0304             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0399         +0.0399             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             -0.0135         -0.0135             0             0             0             0
fizzy/execute/micro/factorial/20_mean                             -0.0191         -0.0191             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0026         +0.0026          7488          7507          7488          7507
fizzy/execute/micro/host_adler32/1_mean                           -0.0062         -0.0062             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         +0.0285         +0.0285             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0057         -0.0057            30            30            30            30
fizzy/execute/micro/spinner/1_mean                                +0.0172         +0.0172             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.0434         +0.0434            10            11            10            11

So there is no really significant difference (none of the benchmarks uses this instruction). The change is not driven by performance anyway.

Marking the function with the noinline and cold attributes gives some performance improvement, but I'm leaving that for another time.
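For reference, this is roughly how such attributes are spelled. It is a sketch that assumes GCC or Clang, since the `gnu::` attributes are compiler-specific; `rarely_used_op` and its body are hypothetical placeholders, not the function from this PR.

```cpp
#include <cmath>

// noinline keeps the body out of the hot caller; cold tells the compiler
// the function is unlikely to run, so it is optimized for size and placed
// away from hot code. Assumes GCC or Clang.
[[gnu::noinline, gnu::cold]] float rarely_used_op(float value) noexcept
{
    return std::trunc(value);  // placeholder body for illustration only
}
```

The idea is to keep a rarely executed instruction from bloating the interpreter's main loop. Whether it helps has to be measured per compiler and build configuration.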

axic reviewed Aug 23, 2020

axic reviewed Aug 23, 2020

chfast force-pushed the better_nearest branch from 991b4db to 05b277f Compare August 23, 2020 17:47

gumb0 reviewed Sep 1, 2020

chfast force-pushed the better_nearest branch from 05b277f to 2f9edfe Compare September 2, 2020 07:31

axic reviewed Sep 2, 2020

chfast force-pushed the better_nearest branch from 2f9edfe to db4421c Compare September 2, 2020 10:03

gumb0 approved these changes Sep 2, 2020

Change implementation of fnearest()

4e42e95

This implementation does not need to modify rounding direction to work correctly under any rounding direction.

chfast force-pushed the better_nearest branch from 4e072bd to c413a4c Compare September 2, 2020 14:04

chfast force-pushed the better_nearest branch from c413a4c to 4e42e95 Compare September 2, 2020 14:57

axic approved these changes Sep 2, 2020

chfast merged commit 3dbd671 into master Sep 2, 2020

chfast deleted the better_nearest branch September 2, 2020 16:32
