-
Notifications
You must be signed in to change notification settings - Fork 10.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
SR-106: New floating-point description
implementation
#15474
SR-106: New floating-point description
implementation
#15474
Conversation
This replaces the current implementation of `description` and `debugDescription` for the standard floating-point types with a new formatting routine based on a variation of Florian Loitsch' Grisu2 algorithm with changes suggested by Andrysco, Jhala, and Lerner's 2016 paper describing Errol3. Unlike the earlier code based on `sprintf` with a fixed number of digits, this version always chooses the optimal number of digits. As such, we can now use the exact same output for both `description` and `debugDescription` (except of course that `debugDescription` provides full detail for NaNs). The implementation has been extensively commented; people familiar with Grisu-style algorithms should find the code easy to understand. This implementation is: * Fast. It uses only fixed-width integer arithmetic and has constant memory and time requirements. * Simple. It is only a little more complex than Loitsch' original implementation of Grisu2. The digit decomposition logic for double is less than 300 lines of standard C (half of which is common arithmetic support routines). * Always Accurate. Converting the decimal form back to binary (using an accurate algorithm such as Clinger's) will always yield exactly the original binary value. For the IEEE 754 formats, the round-trip will produce exactly the same bit pattern in memory. This is an essential requirement for JSON serialization, debugging, and logging. * Always Short. This always selects an accurate result with the minimum number of decimal digits. (So that `1.0 / 10.0` will always print `0.1`.) * Always Close. Among all accurate, short results, this always chooses the result that is closest to the exact floating-point value. (In case of an exact tie, it rounds the last digit even.) This resolves SR-106 and related issues that have complained about the floating-point `description` properties being inexact.
@swift-ci Please benchmark |
I just noticed the bottom label got cut off on the graph. The horizontal axis is |
Build comment file:Optimized (O)Regression (4)
Improvement (11)
No Changes (410)
Unoptimized (Onone)Regression (6)
Improvement (16)
No Changes (403)
Hardware Overview
|
} else if sign == .minus { | ||
return "-inf" | ||
} else { | ||
return "inf" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we short-circuit inf here, we probably don't need to worry about it at the next level down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Removed infinity handling in the next layer down.
@@ -83,15 +83,29 @@ extension ${Self} : CustomStringConvertible { | |||
/// A textual representation of the value. | |||
@_inlineable // FIXME(sil-serialize-all) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can anyone explain why this is @_inlineable
? This seems to me like a natural ABI boundary, so I'm curious if there was a specific reason for pushing it further down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@moiseev ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on the FIXME, Max added this mechanically, to smoothen the transition from previous versions where everything in the stdlib was implicitly inlinable. Attributes marked like this were added in bulk and they can (and should) be removed when we know better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll do some perf testing to see if removing this makes a real difference.
stdlib/public/core/Runtime.swift.gyb
Outdated
let significand = value.significandBitPattern | ||
if significand == 0 { | ||
// Infinity | ||
if value.isInfinite { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be omitted if inf
is handled at the next layer out.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed infinity handling here. Infinity is now handled in two places: In the outermost Swift code for description
and debugDescription
(which return constant strings) and in the actual C formatting code. (Because of the former, the latter is not technically necessary except to ensure that the C formatter is actually complete.)
@@ -408,37 +408,17 @@ internal func _float${bits}ToString( | |||
_ value: Float${bits}, debug: Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've kept the debug
flag here even though it's not used. The only difference now between debugDescription
and description
is how they print NaNs. That difference is now entirely encapsulated -- description
checks for and returns the static string immediately; the bottom C layer always produces the debug form, so debugDescription
can just pass NaNs down.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go ahead and get rid of the flag, but I'm OK with that happening in a follow-on cleanup patch.
stdlib/public/core/Runtime.swift.gyb
Outdated
let actualLength = _float${bits}ToStringImpl(bufferPtr, 32, value, debug) | ||
return String._fromWellFormedCodeUnitSequence( | ||
UTF8.self, | ||
input: UnsafeBufferPointer(start: bufferPtr, count: Int(actualLength))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does anything need to change here to take advantage of @milseman's recent Small String work?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small string support hasn't been merged and I haven't come up with a clean alternative to _fromWellFormedCodeUnitSequence
for them yet. When you merge, file a bug against me
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ideally, I'd like to have something like String._unsafeFromWellFormedASCII(buffer:)
to use here. If the buffer length is short, build and return a small ASCII string. Otherwise, allocate and memcpy
to build a long ASCII string. In either case, omitting the charset shenanigans would be a sizable win.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have just what you need in the small-string branch: https://github.com/milseman/swift/blob/83030fb4fb7b593c16a7382ced82cbecfe211a4f/stdlib/public/core/String.swift#L647
edit: permalink
@@ -408,37 +408,17 @@ internal func _float${bits}ToString( | |||
_ value: Float${bits}, debug: Bool |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should go ahead and get rid of the flag, but I'm OK with that happening in a follow-on cleanup patch.
stdlib/public/runtime/SwiftDtoa.c
Outdated
|
||
#include "swift/Runtime/SwiftDtoa.h" | ||
|
||
#if defined(__x86_64__) || defined(__aarch64__) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This could be #if defined __SIZEOF_INT128__
, which will let it work on any platform where clang supports the type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
stdlib/public/runtime/SwiftDtoa.c
Outdated
|
||
// Try to verify that the system floating-point types really are what we | ||
// expect. Note that the code below is specific to these exact | ||
// floating-point representations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All of these checks are assumptions baked into Swift/LLVM already. It's OK to have them for documentation purposes, but they're unnecessary.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are mostly intended as simple guards in case someone copies this code elsewhere. However, even within the Swift stdlib, we cannot safely assume that C long double
really is Float80, so that check at least is important to have here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yup, that's fine.
stdlib/public/runtime/SwiftDtoa.c
Outdated
#define multiply128xi32(lhs, rhs) (*(lhs) *= (rhs)) | ||
#define initialize128WithHigh64(dest, value) ((dest) = (__uint128_t)(value) << 64) | ||
#define extractHigh64From128(arg) ((uint64_t)((arg) >> 64)) | ||
static int extractIntegerPart128(__uint128_t *fixed128, int fractionBits) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please add a short comment documenting the behavior of this function, as it doesn't just produce the integer part of fixed128
, it also clears those bits in the source.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've split it into separate extract
and clear
operations to clarify.
stdlib/public/runtime/SwiftDtoa.c
Outdated
__uint128_t full = (__uint128_t)lhs * rhs; | ||
return (uint64_t)(full >> 32); | ||
#else | ||
static const uint64_t mask32 = ((uint64_t)1 << 32) - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would probably write this UINT32_MAX
. I might also pull it out of these functions, since it's used over and over again.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've changed the spelling to UINT32_MAX
. I'll see about consolidating it as part of a follow-on.
stdlib/public/runtime/SwiftDtoa.c
Outdated
uint64_t t = (lhs & mask32) * (rhs & mask32); | ||
t >>= 32; | ||
uint64_t a = (lhs >> 32) * (rhs & mask32); | ||
uint64_t b = (lhs & mask32) * (rhs >> 32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The next bit can be simplified just a little if you want.
t += a + (b & mask32); // cannot overflow because a <= (2^64 - 2^33 + 1) and t,b <= (2^32 - 1).
t >>= 32;
t += (b >> 32);
return t + (lhs >> 32) * (rhs >> 32);
This is pretty minor, but it does save one add with carry. It's a bigger deal in the operations further down this file, where it saves multiple carries. More generally, you can clean all this up a lot with a few small helper functions (note that I've just sketched these pretty quickly here in the browser without testing):
static uint64_t mul32x32(uint32_t lhs, uint32_t rhs) {
return (uint64_t)lhs*rhs;
}
static uint64_t mul32x32add1(uint32_t lhs, uint32_t rhs, uint32_t add) {
return (uint64_t)lhs*rhs + add;
}
static uint64_t mul32x32add2(uint32_t lhs, uint32_t rhs, uint32_t add0, uint32_t add1) {
return (uint64_t)lhs*rhs + add0 + add1;
}
static uint64_t multiply64x64RoundingDown(uint64_t lhs, uint64_t rhs) {
uint64_t t00 = mul32x32(lhs, rhs);
uint64_t t01 = mul32x32(lhs, rhs >> 32);
uint64_t t10 = mul32x32add2(lhs >> 32, rhs, t00 >> 32, t01);
return mul32x32add2(lhs >> 32, rhs >> 32, t01 >> 32, t10 >> 32);
}
static uint64_t multiply64x64RoundingUp(uint64_t lhs, uint64_t rhs) {
uint64_t t00 = mul32x32add1(lhs, rhs, mask32);
uint64_t t01 = mul32x32add1(lhs, rhs >> 32, mask32);
uint64_t t10 = mul32x32add2(lhs >> 32, rhs, t00 >> 32, t01);
return mul32x32add2(lhs >> 32, rhs >> 32, t01 >> 32, t10 >> 32);
}
....
I know that you've already run your tests on the existing 32b code, which is somewhat lengthy, so I'm OK with cleaning this up later as a minor optimization PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice ideas! I'll work through this in a follow-up PR.
test/stdlib/PrintFloat.swift.gyb
Outdated
#if !arch(i386) | ||
// 32-bit i386 lacks signaling Double nans | ||
expectNaN("snan", Double.signalingNaN) | ||
expectNaN("-snan", -Double.signalingNaN) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@rintaro : I'd like to add a test for Double(nan: 0x4000000000001, signaling: true)
, but I'm not sure how that should print. I think it should print as "snan(0x4000000000001)"
but I think the previous code would print it as "snan"
without the payload.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, Double.init(nan:signaling:)
limits the max payload value to 0x3_ffff_ffff_ffff
because bit50 is reserved for default payload for signaling NaN. And the Double.debugDescription
for NaN takes up to bit49 into account. #2494
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, the standard does not reserve any bit for signaling NaN. It only requires that the quiet bit is not set and the payload is not zero.
I would expect to reserve a single value as the default signaling payload (not a particular bit), so that Double(nan: 0x4_0000_0000_0000, signaling: true).debugDescription
produces "snan"
but Double(nan: 0x4_0000_0000_0001, signaling: true).debugDescription
produces "snan(0x4000000000001)"
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell, the standard does not reserve any bit for signaling NaN. It only requires that the quiet bit is not set and the payload is not zero.
It's weaker than that. There is no "quiet bit" (as a shall clause) in the standard. Some HW platforms use a "signaling bit" instead.
However, for platforms with the common-case behavior that the high-order significand bit is a quiet bit, Swift decided to canonicalize on b10xxx...xxx
for qNaN and b01xxx...xxx
for sNaN because this makes the same set of payloads encodable with each; otherwise quieting a sNaN is a lossy operation, which is undesirable.
(Note that we should actually switch to using b11xxx...xxx
for qNaNs, so that b00xxx...xxx
can serve as reserved bit patterns that are never generated by pure Swift operations on HW with the customary quiet bit. This is potentially useful for some boxing schemes.)
The previous code was unnecessarily obfuscated by the attempt to combine these two operations.
What significant outstanding issues block this proceeding at this point? |
@swift-ci Please test and merge |
@swift-ci please test |
Build failed |
Build failed |
@swift-ci Please smoke test |
Looks like you need to update To iterate quicker, you can run (build-script with
edit: more details |
@milseman: Thanks! It took a while to get the tests running, but I've verified the corrected results against the 32-bit i386 iPhone simulator. I'd like to verify it against a real 32-bit ARM device as well; any hints? |
@swift-ci Please smoke test |
@jckarter any idea why there would be the following failure on Linux?
|
@milseman You may need to put |
Adding a C source file somehow exposed an issue in an unrelated C++ file. Thanks to Joe Groff for the fix.
Interesting: Adding any C file to stdlib/public/runtime/CMakeLists.txt will unmask this. Some quick tests suggest that @jckarter's suggested change fixes it. |
@swift-ci Please smoke test |
Now the Linux build fails at a different point:
|
Having a C file in stdlib/public/runtime causes strange build failures on Linux in unrelated C++ files. As a workaround, rename SwiftDtoa.c to .cpp to see if that avoids the problems.
I've renamed SwiftDtoa.c to SwiftDtoa.cpp to see if that avoids the Linux problems. |
@swift-ci Please smoke test |
@swift-ci smoke test |
This reverts commit 6cd5c20.
I've backed out 6cd5c20, since it's unrelated. (I think it's probably a good change, but should be handled separately.) With that, as soon as the smoke tests look good, merge away! |
@swift-ci smoke test |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious
stdlib/public/runtime/SwiftDtoa.cpp
Outdated
// | ||
// Note: This is really a C file, but Swift's build system for Linux is | ||
// partially allergic to C, so it's being compiled as ".cpp" for now. Please | ||
// don't infect it with C++-isms. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"Don't infect with C++-isms": Because I expect to have some discussions soon about using this code in other projects that may not be able to easily use C++. Keeping this pure C (at least for now) will simplify possible reuse.
Plus, my test suites are predominantly pure C right now and I have a number of simplifications and/or performance improvements to the code that I'm actively testing.
@swift-ci please smoke test |
This was an oversight from PR swiftlang#15474. Most of the Float80 tests were correctly compiled only on `!Windows && (x86 || i386)` but I failed to mark some Float80 test data. Fixes: Radar 39246292
This replaces the current implementation of
description
anddebugDescription
for the standard floating-point types with a newformatting routine that provides both improved performance and
better results.
Unlike the earlier code based on
sprintf
with a fixed number ofdigits, this logic automatically chooses the optimal number of digits,
generating compact output where possible while ensuring round-trip
accuracy. As such, we can now use the exact same output for both
description
anddebugDescription
(except of course thatdebugDescription
providesfull detail for NaNs).
This resolves SR-106, SR-454, SR-491, SR-3131 and other related issues that have complained
about the floating-point
description
anddebugDescription
properties being inexactand/or inconsistent.
Ergonomics
With the new code, the REPL generally prints the values you would expect.
In comparison, the previous implementation routinely prints extraneous digits for
debugDescription
(used by the REPL) and omits significant digits indescription
(used byprint
):Of course, this only changes how the floating-point numbers are printed. The actual parsing, storage, and arithmetic operations are unaffected and are still subject to the same rounding issues common to all floating-point arithmetic.
About the Algorithm
The
SwiftDtoa.c
file here implements a variation of Florian Loitsch' Grisu2algorithm with changes suggested by Andrysco, Jhala, and Lerner's 2016
paper describing Errol3.
The implementation is:
Fast. It uses only fixed-width integer arithmetic and has constant
memory and time requirements.
Simple. It is only a little more complex than Loitsch' original
implementation of Grisu2. The digit decomposition logic for double is
less than 300 lines of standard C (half of which is common arithmetic
support routines).
Always Accurate. Converting the decimal form back to binary using an
accurate algorithm (such as Clinger's algorithm) will always yield exactly the
original binary value. For the IEEE 754 formats, the round-trip will
produce exactly the same bit pattern in memory. This is an essential
requirement for debugging, logging, and JSON serialization.
Always Short. This always selects an accurate result with the minimum
number of decimal digits. (So that
1.0 / 10.0
will always print0.1
.)Always Close. Among all accurate, short results, this always chooses
the result that is closest to the exact floating-point value. (In case
of an exact tie, it rounds the last digit even.)
Performance
The graph below compares the new code (in green) to the performance of three other
popular algorithms. Note that this graph benchmarks just the underlying C code; the Swift
description
logic must spend additional effort to allocate a returned String.printf
family of functions.SwiftDtoa
implementation (in green) is similar to the Errol algorithms but uses fixed-width integer arithmetic throughout. This gives it uniform fast performance regardless of the input value.