-
Notifications
You must be signed in to change notification settings - Fork 34
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential floating point optimizations #592
Conversation
|
||
inline bool signbit(float value) noexcept | ||
{ | ||
return (bit_cast<uint32_t>(value) & F32SignMask) != 0; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This generates two very simple instructions with GCC, but clang detects this pattern and replaces it with the SSE instruction.
lib/fizzy/execute.cpp
Outdated
@@ -356,7 +388,7 @@ inline T ffloor(T value) noexcept | |||
// the __builtin_floor() outputs -0 where it should +0. | |||
// The following workarounds the issue by using the fact that the sign of | |||
// the output must always match the sign of the input value. | |||
return std::copysign(result, value); | |||
return fcopysign(result, value); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This and the other cases may be faster or may be slower if the rest of the function already uses SSE registers.
@@ -408,7 +440,7 @@ inline T fmin(T a, T b) noexcept | |||
if (std::isnan(a) || std::isnan(b)) | |||
return std::numeric_limits<T>::quiet_NaN(); // Positive canonical NaN. | |||
|
|||
if (a == 0 && b == 0 && (std::signbit(a) == 1 || std::signbit(b) == 1)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chfast std::signbit
actually returns a bool, why did use integer literals here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't remember. Probably it is more clear that signbit is 1
, while the meaning of true
bit is confusing.
e574e0e
to
532dbba
Compare
Codecov Report
@@ Coverage Diff @@
## master #592 +/- ##
=======================================
Coverage 99.27% 99.27%
=======================================
Files 88 88
Lines 13154 13158 +4
=======================================
+ Hits 13058 13062 +4
Misses 96 96
Flags with carried forward coverage won't be shown. Click here to find out more.
|
I think this is ready, just needs benchmarking to show which changes are useful. |
Rebased. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarking situation does not look good. I suspect the performance is related to the code layout in the binary. I can "stabilize" it in clang with -mbranches-within-32B-boundaries
but not in GCC.
fizzy/execute/blake2b/512_bytes_rounds_1_mean +0.2276 +0.2276 68 83 68 83
fizzy/execute/blake2b/512_bytes_rounds_16_mean +0.2006 +0.2006 1022 1228 1022 1228
fizzy/execute/ecpairing/onepoint_mean +0.2205 +0.2205 328759 401248 328762 401251
fizzy/execute/keccak256/512_bytes_rounds_1_mean +0.4017 +0.4017 74 103 74 103
fizzy/execute/keccak256/512_bytes_rounds_16_mean +0.3864 +0.3864 1073 1488 1073 1488
fizzy/execute/memset/256_bytes_mean +0.0524 +0.0524 6 6 6 6
fizzy/execute/memset/60000_bytes_mean +0.0534 +0.0534 1279 1347 1279 1347
fizzy/execute/mul256_opt0/input1_mean +0.0047 +0.0047 25 25 25 25
fizzy/execute/ramanujan_pi/33_runs_mean +0.0093 +0.0093 101 102 101 102
fizzy/execute/sha1/512_bytes_rounds_1_mean +0.0147 +0.0147 76 77 76 77
fizzy/execute/sha1/512_bytes_rounds_16_mean +0.0149 +0.0149 1061 1076 1061 1076
fizzy/execute/sha256/512_bytes_rounds_1_mean +0.0016 +0.0016 73 73 73 73
fizzy/execute/sha256/512_bytes_rounds_16_mean -0.0024 -0.0024 1004 1002 1004 1002
fizzy/execute/taylor_pi/pi_1000000_runs_mean +0.0001 +0.0001 38213 38216 38213 38216
fizzy/execute/micro/eli_interpreter/exec105_mean -0.0059 -0.0059 4 4 4 4
fizzy/execute/micro/factorial/20_mean -0.0023 -0.0023 1 1 1 1
fizzy/execute/micro/fibonacci/24_mean +0.0091 +0.0091 4378 4417 4378 4417
fizzy/execute/micro/host_adler32/1_mean +0.0121 +0.0121 0 0 0 0
fizzy/execute/micro/host_adler32/1000_mean -0.0041 -0.0041 26 26 26 26
fizzy/execute/micro/icall_hash/1000_steps_mean +0.0374 +0.0374 61 63 61 63
fizzy/execute/micro/spinner/1_mean +0.0155 +0.0155 0 0 0 0
fizzy/execute/micro/spinner/1000_mean +0.0334 +0.0334 8 8 8 8
OVERALL_GEOMEAN +0.0691 +0.0691 0 0 0 0
template <typename T> | ||
T fcopysign(T a, T b) noexcept = delete; | ||
|
||
template <> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not be template specializations. Function overloading is good enough.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't add this, just moved this function. Can remove.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still good time to improve it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not be template specializations. Function overloading is good enough.
Actually, I've learned this is not enough because of the way this is used. Ignore this request.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, learned that yesterday too, but was lazy to close the PR 😓
lib/fizzy/execute.cpp
Outdated
@@ -362,6 +375,25 @@ inline double fneg(double value) noexcept | |||
return bit_cast<double>(bit_cast<uint64_t>(value) ^ F64SignMask); | |||
} | |||
|
|||
template <typename T> | |||
T fcopysign(T a, T b) noexcept = delete; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To make it more effective you should define it for two different types template <typename T, template U>
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why? It should be the same type we're copying between?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If types are different the template will not match. E.g. fcopysign(0.0, 0.0f)
will use fcopysign(double, double)
but we want to be very restrictive to what overloads are provided.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fdiv
, fmin
, fmax
have the same issue, plus all the basic add
/mul
/... ones. Should those be changed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Moved these to #826 where we can clean all of them up.
lib/fizzy/execute.cpp
Outdated
T fcopysign(T a, T b) noexcept = delete; | ||
|
||
template <> | ||
inline float fcopysign(float a, float b) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Here we have small issue with the name: fcopysign
vs std::copysign
and signbit
vs std::signbit
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can rename fcopysign
to copysign
, it was an already existing feature and just moved. However if it is 100% compatible with std::copysign
then should be renamed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm actually https://en.cppreference.com/w/cpp/numeric/math/copysign has some interesting comment:
float copysign ( float mag, float sgn );
If mag is NaN, then NaN with the sign of sgn is returned.
If sgn is -0, the result is only negative if the implementation supports the signed zero consistently in arithmetic operations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it is fine to rename to copysign
and follow C std names (which are bad by themselves).
The C standard does not require IEEE 754, but WebAssembly does. So we must use compilers that support IEEE 754. E.g. GCC supports signed zeros by default, but this can be disabled with -fno-signed-zeros
. https://gcc.gnu.org/wiki/FloatingPointMath
template <typename T> | ||
T fcopysign(T a, T b) noexcept = delete; | ||
|
||
template <> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should not be template specializations. Function overloading is good enough.
Actually, I've learned this is not enough because of the way this is used. Ignore this request.
template <typename T> | ||
T signbit(T value) noexcept = delete; | ||
|
||
inline bool signbit(float value) noexcept |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We may get a proper constexpr
signbit
in C++23: https://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p0533r5.pdf
No description provided.