Potential floating point optimizations #592

axic · 2020-10-09T11:43:58Z

No description provided.

axic · 2020-10-09T11:44:29Z

lib/fizzy/execute.cpp

+
+inline bool signbit(float value) noexcept
+{
+    return (bit_cast<uint32_t>(value) & F32SignMask) != 0;


This generates two very simple instructions with GCC, but clang detects this pattern and replaces it with the SSE instruction.

axic · 2020-10-09T11:45:06Z

lib/fizzy/execute.cpp

@@ -356,7 +388,7 @@ inline T ffloor(T value) noexcept
    // the __builtin_floor() outputs -0 where it should +0.
    // The following workarounds the issue by using the fact that the sign of
    // the output must always match the sign of the input value.
-    return std::copysign(result, value);
+    return fcopysign(result, value);


This and the other cases may be faster or may be slower if the rest of the function already uses SSE registers.

axic · 2020-10-09T11:45:58Z

lib/fizzy/execute.cpp

@@ -408,7 +440,7 @@ inline T fmin(T a, T b) noexcept
    if (std::isnan(a) || std::isnan(b))
        return std::numeric_limits<T>::quiet_NaN();  // Positive canonical NaN.

-    if (a == 0 && b == 0 && (std::signbit(a) == 1 || std::signbit(b) == 1))


@chfast std::signbit actually returns a bool, why did use integer literals here and below?

I don't remember. Probably it is more clear that signbit is 1, while the meaning of true bit is confusing.

lib/fizzy/execute.cpp

codecov · 2020-10-14T22:48:18Z

Codecov Report

Merging #592 (c5ae9aa) into master (556a526) will increase coverage by 0.00%.
The diff coverage is 100.00%.

@@           Coverage Diff           @@
##           master     #592   +/-   ##
=======================================
  Coverage   99.27%   99.27%           
=======================================
  Files          88       88           
  Lines       13154    13158    +4     
=======================================
+ Hits        13058    13062    +4     
  Misses         96       96

Flag	Coverage Δ
rust	`98.47% <ø> (ø)`
spectests	`90.00% <100.00%> (+0.01%)`	⬆️
unittests	`99.21% <100.00%> (+<0.01%)`	⬆️
unittests-32	`99.31% <100.00%> (+<0.01%)`	⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Impacted Files	Coverage Δ
lib/fizzy/execute.cpp	`99.29% <100.00%> (+<0.01%)`	⬆️

axic · 2020-12-29T22:10:06Z

I think this is ready, just needs benchmarking to show which changes are useful.

axic · 2022-05-23T17:40:38Z

Rebased.

chfast

Benchmarking situation does not look good. I suspect the performance is related to the code layout in the binary. I can "stabilize" it in clang with -mbranches-within-32B-boundaries but not in GCC.

fizzy/execute/blake2b/512_bytes_rounds_1_mean                     +0.2276         +0.2276            68            83            68            83
fizzy/execute/blake2b/512_bytes_rounds_16_mean                    +0.2006         +0.2006          1022          1228          1022          1228
fizzy/execute/ecpairing/onepoint_mean                             +0.2205         +0.2205        328759        401248        328762        401251
fizzy/execute/keccak256/512_bytes_rounds_1_mean                   +0.4017         +0.4017            74           103            74           103
fizzy/execute/keccak256/512_bytes_rounds_16_mean                  +0.3864         +0.3864          1073          1488          1073          1488
fizzy/execute/memset/256_bytes_mean                               +0.0524         +0.0524             6             6             6             6
fizzy/execute/memset/60000_bytes_mean                             +0.0534         +0.0534          1279          1347          1279          1347
fizzy/execute/mul256_opt0/input1_mean                             +0.0047         +0.0047            25            25            25            25
fizzy/execute/ramanujan_pi/33_runs_mean                           +0.0093         +0.0093           101           102           101           102
fizzy/execute/sha1/512_bytes_rounds_1_mean                        +0.0147         +0.0147            76            77            76            77
fizzy/execute/sha1/512_bytes_rounds_16_mean                       +0.0149         +0.0149          1061          1076          1061          1076
fizzy/execute/sha256/512_bytes_rounds_1_mean                      +0.0016         +0.0016            73            73            73            73
fizzy/execute/sha256/512_bytes_rounds_16_mean                     -0.0024         -0.0024          1004          1002          1004          1002
fizzy/execute/taylor_pi/pi_1000000_runs_mean                      +0.0001         +0.0001         38213         38216         38213         38216
fizzy/execute/micro/eli_interpreter/exec105_mean                  -0.0059         -0.0059             4             4             4             4
fizzy/execute/micro/factorial/20_mean                             -0.0023         -0.0023             1             1             1             1
fizzy/execute/micro/fibonacci/24_mean                             +0.0091         +0.0091          4378          4417          4378          4417
fizzy/execute/micro/host_adler32/1_mean                           +0.0121         +0.0121             0             0             0             0
fizzy/execute/micro/host_adler32/1000_mean                        -0.0041         -0.0041            26            26            26            26
fizzy/execute/micro/icall_hash/1000_steps_mean                    +0.0374         +0.0374            61            63            61            63
fizzy/execute/micro/spinner/1_mean                                +0.0155         +0.0155             0             0             0             0
fizzy/execute/micro/spinner/1000_mean                             +0.0334         +0.0334             8             8             8             8
OVERALL_GEOMEAN                                                   +0.0691         +0.0691             0             0             0             0

chfast · 2022-05-24T12:57:17Z

lib/fizzy/execute.cpp

+template <typename T>
+T fcopysign(T a, T b) noexcept = delete;
+
+template <>


These should not be template specializations. Function overloading is good enough.

Didn't add this, just moved this function. Can remove.

Still good time to improve it.

These should not be template specializations. Function overloading is good enough.

Actually, I've learned this is not enough because of the way this is used. Ignore this request.

Yes, learned that yesterday too, but was lazy to close the PR 😓

chfast · 2022-05-24T12:57:56Z

lib/fizzy/execute.cpp

@@ -362,6 +375,25 @@ inline double fneg(double value) noexcept
    return bit_cast<double>(bit_cast<uint64_t>(value) ^ F64SignMask);
 }

+template <typename T>
+T fcopysign(T a, T b) noexcept = delete;


To make it more effective you should define it for two different types template <typename T, template U>.

Why? It should be the same type we're copying between?

If types are different the template will not match. E.g. fcopysign(0.0, 0.0f) will use fcopysign(double, double) but we want to be very restrictive to what overloads are provided.

fdiv, fmin, fmax have the same issue, plus all the basic add/mul/... ones. Should those be changed?

Moved these to #826 where we can clean all of them up.

chfast · 2022-05-24T19:42:57Z

lib/fizzy/execute.cpp

+T fcopysign(T a, T b) noexcept = delete;
+
+template <>
+inline float fcopysign(float a, float b) noexcept


Here we have small issue with the name: fcopysign vs std::copysign and signbit vs std::signbit.

Can rename fcopysign to copysign, it was an already existing feature and just moved. However if it is 100% compatible with std::copysign then should be renamed.

Hmm actually https://en.cppreference.com/w/cpp/numeric/math/copysign has some interesting comment:

float copysign ( float mag, float sgn );

If mag is NaN, then NaN with the sign of sgn is returned.
If sgn is -0, the result is only negative if the implementation supports the signed zero consistently in arithmetic operations.

I think it is fine to rename to copysign and follow C std names (which are bad by themselves).

The C standard does not require IEEE 754, but WebAssembly does. So we must use compilers that support IEEE 754. E.g. GCC supports signed zeros by default, but this can be disabled with -fno-signed-zeros. https://gcc.gnu.org/wiki/FloatingPointMath

chfast · 2022-05-28T13:57:08Z

lib/fizzy/execute.cpp

+template <typename T>
+T fcopysign(T a, T b) noexcept = delete;
+
+template <>


These should not be template specializations. Function overloading is good enough.

Actually, I've learned this is not enough because of the way this is used. Ignore this request.

axic · 2022-05-28T16:26:01Z

lib/fizzy/execute.cpp

+template <typename T>
+T signbit(T value) noexcept = delete;
+
+inline bool signbit(float value) noexcept


We may get a proper constexpr signbit in C++23: https://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p0533r5.pdf

axic commented Oct 9, 2020

View reviewed changes

chfast reviewed Oct 9, 2020

View reviewed changes

lib/fizzy/execute.cpp Outdated Show resolved Hide resolved

axic added the optimization Performance optimization label Oct 9, 2020

axic force-pushed the fp-optim branch 2 times, most recently from e574e0e to 532dbba Compare October 14, 2020 22:33

axic force-pushed the fp-optim branch from 532dbba to 847428b Compare November 3, 2020 13:23

axic marked this pull request as ready for review May 23, 2022 14:38

axic force-pushed the fp-optim branch from 847428b to a5690ef Compare May 23, 2022 17:39

chfast force-pushed the fp-optim branch from a5690ef to 73b46cf Compare May 24, 2022 12:20

chfast reviewed May 24, 2022

View reviewed changes

chfast force-pushed the fp-optim branch from 73b46cf to 0b03847 Compare May 24, 2022 19:58

axic force-pushed the fp-optim branch from 0b03847 to 5f25773 Compare May 28, 2022 00:34

chfast approved these changes May 28, 2022

View reviewed changes

axic added 4 commits May 28, 2022 17:48

Use the fcopysign helper in fceil and fnearest

c3379a9

Implement signbit using masks

34c91ba

Use fabs in fnearest

da3fa3c

Rename fcopysign to copysign

c5ae9aa

axic force-pushed the fp-optim branch from 5f25773 to c5ae9aa Compare May 28, 2022 15:48

axic commented May 28, 2022

View reviewed changes

axic merged commit 90ac6e0 into master May 28, 2022

axic deleted the fp-optim branch May 28, 2022 16:26

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Potential floating point optimizations #592

Potential floating point optimizations #592

axic commented Oct 9, 2020

axic Oct 9, 2020

axic Oct 9, 2020

axic Oct 9, 2020

chfast Oct 9, 2020

codecov bot commented Oct 14, 2020 •

edited

Loading

axic commented Dec 29, 2020

axic commented May 23, 2022

chfast left a comment

chfast May 24, 2022

axic May 24, 2022

chfast May 25, 2022

chfast May 28, 2022

axic May 28, 2022

chfast May 24, 2022

axic May 24, 2022

chfast May 25, 2022

axic May 28, 2022

axic May 28, 2022

chfast May 24, 2022

axic May 24, 2022

axic May 24, 2022

chfast May 25, 2022

chfast May 28, 2022

axic May 28, 2022

Potential floating point optimizations #592

Potential floating point optimizations #592

Conversation

axic commented Oct 9, 2020

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Oct 14, 2020 • edited Loading

Codecov Report

axic commented Dec 29, 2020

axic commented May 23, 2022

chfast left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

codecov bot commented Oct 14, 2020 •

edited

Loading