Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Change implementation of fnearest() #504

Merged
merged 1 commit into from
Sep 2, 2020
Merged

Change implementation of fnearest() #504

merged 1 commit into from
Sep 2, 2020

Conversation

chfast
Copy link
Collaborator

@chfast chfast commented Aug 23, 2020

This implementation does not need to modify rounding direction to work
correctly under any rounding direction.


return result;
const auto t = std::trunc(value);
if (const auto diff = std::abs(value - t); diff > 0.5f || (diff == 0.5f && !is_even(t)))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be less branching to have two if cases than using abs? Or is abs constexpr and the compiler can optimise all this out?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The abs (or fabs in C) is recognized and replaced with __builtin_abs or something. This just masks out the sign bit: x & 0x7fffffff.

@codecov
Copy link

codecov bot commented Aug 23, 2020

Codecov Report

Merging #504 into master will decrease coverage by 0.00%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #504      +/-   ##
==========================================
- Coverage   99.69%   99.69%   -0.01%     
==========================================
  Files          54       54              
  Lines       17198    17180      -18     
==========================================
- Hits        17146    17128      -18     
  Misses         52       52              

@chfast
Copy link
Collaborator Author

chfast commented Aug 23, 2020

I also have another implementation based on bit manipulation which looks to be faster but I did not perform proper benchmarks yet.
I decided to start with this one, because it does not need special utils like class FP.

We may wait with that for the next release until having more FP test cases.

lib/fizzy/execute.cpp Outdated Show resolved Hide resolved
lib/fizzy/execute.cpp Outdated Show resolved Hide resolved
@chfast
Copy link
Collaborator Author

chfast commented Sep 2, 2020

@chfast
Copy link
Collaborator Author

chfast commented Sep 2, 2020

I'm checking some additional options because there are some performance regressions.

This implementation does not need to modify rounding direction to work
correctly under any rounding direction.
@chfast
Copy link
Collaborator Author

chfast commented Sep 2, 2020

GCC:

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0166         -0.0166            86            85            86            85
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0143         -0.0143          1308          1289          1308          1289
fizzy/execute/ecpairing/onepoint_mean                             -0.0481         -0.0481        439794        418619        439798        418623
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   -0.0126         -0.0126           104           102           104           102
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  -0.0132         -0.0132          1520          1500          1520          1500
fizzy/execute/memset/256_bytes_mean                               -0.0312         -0.0312             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             -0.0294         -0.0294          1576          1529          1576          1529
fizzy/execute/mul256_opt0/input0_mean                             -0.0820         -0.0820            28            26            28            26
fizzy/execute/mul256_opt0/input1_mean                             -0.0858         -0.0858            29            26            29            26
fizzy/execute/ramanujan_pi/33_runs_mean                           -0.0174         -0.0174           135           132           135           132
fizzy/execute/sha1/512_bytes_rounds_1_mean                        -0.0021         -0.0021            93            93            93            93
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0003         +0.0004          1297          1297          1297          1297
fizzy/execute/sha256/512_bytes_rounds_1_mean                      -0.0058         -0.0058            94            94            94            94
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0018         -0.0018          1297          1295          1297          1295
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0665         -0.0665         42825         39978         42825         39978
fizzy/execute/micro/eli_interpreter/halt_mean                     +0.0229         +0.0229             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0775         +0.0775             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             +0.0131         +0.0131             0             0             0             0
fizzy/execute/micro/factorial/20_mean                             +0.0096         +0.0097             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0060         +0.0060          7502          7547          7502          7547
fizzy/execute/micro/host_adler32/1_mean                           +0.0236         +0.0236             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         +0.0102         +0.0102             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0106         -0.0106            30            29            30            29
fizzy/execute/micro/spinner/1_mean                                +0.0115         +0.0115             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             -0.0243         -0.0243            10            10            10            10

GCC+LTO:

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     -0.0146         -0.0146            84            83            84            83
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    -0.0174         -0.0174          1277          1255          1277          1255
fizzy/execute/ecpairing/onepoint_mean                             +0.0066         +0.0066        424685        427480        424689        427484
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.0286         +0.0286            98           101            98           101
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.0227         +0.0227          1431          1464          1431          1464
fizzy/execute/memset/256_bytes_mean                               +0.0057         +0.0057             7             7             7             7
fizzy/execute/memset/60000_bytes_mean                             +0.0091         +0.0091          1544          1558          1544          1558
fizzy/execute/mul256_opt0/input0_mean                             +0.0105         +0.0105            26            26            26            26
fizzy/execute/mul256_opt0/input1_mean                             +0.0134         +0.0134            26            26            26            26
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.0385         +0.0385           126           131           126           131
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.0085         +0.0085            90            91            90            91
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0118         +0.0118          1252          1267          1252          1267
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.0138         +0.0138            91            92            91            92
fizzy/execute/sha256/512_bytes_rounds_16_mean                     +0.0116         +0.0116          1252          1267          1252          1267
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      -0.0223         -0.0223         42176         41237         42177         41238
fizzy/execute/micro/eli_interpreter/halt_mean                     -0.0304         -0.0304             0             0             0             0
fizzy/execute/micro/eli_interpreter/exec105_mean                  +0.0399         +0.0399             5             5             5             5
fizzy/execute/micro/factorial/10_mean                             -0.0135         -0.0135             0             0             0             0
fizzy/execute/micro/factorial/20_mean                             -0.0191         -0.0191             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0026         +0.0026          7488          7507          7488          7507
fizzy/execute/micro/host_adler32/1_mean                           -0.0062         -0.0062             0             0             0             0
fizzy/execute/micro/host_adler32/100_mean                         +0.0285         +0.0285             3             3             3             3
fizzy/execute/micro/host_adler32/1000_mean                        -0.0057         -0.0057            30            30            30            30
fizzy/execute/micro/spinner/1_mean                                +0.0172         +0.0172             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.0434         +0.0434            10            11            10            11

So not really significant difference (none of the benchmarks uses this instruction). The change is not driven by performance anyway.

When marking the function with noinline and cold attributes, it gives some performance improvement, but I'm leaving that for some other time.

@chfast chfast merged commit 3dbd671 into master Sep 2, 2020
@chfast chfast deleted the better_nearest branch September 2, 2020 16:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants