Float printing and/or parsing is inaccurate #24557

Closed
hanna-kruppe opened this issue Apr 18, 2015 · 23 comments · Fixed by #27307
Labels
P-medium Medium priority

Comments

@hanna-kruppe
Contributor

This example program produces a frighteningly large number of values that, without even leaving the safe haven of [0.0, 1.0), change when printed out and read back in:

#![feature(rand)]

use std::rand::{Rng, thread_rng};

const N: u32 = 30;

fn main() {
    let mut r = thread_rng();
    let mut errors = 0;
    for i in 0..N {
        let x0 = r.next_f64();
        let x1 = format!("{:.17}", x0).parse().unwrap();
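        // Format with 17 fractional digits, parse the result back, and compare below.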
        if x0 != x1 {
            //println!("{:.17} {:.17} {:e}", x0, x1, x0 - x1);
            errors += 1;
        }
    }
    println!("Found {} non-round trip-safe numbers among {} random numbers", errors, N);
}

Not all numbers fail to round trip, but in all my trials a majority did. The error is on the order of 1e-16 (much more, around 1e-6, for the default {} formatting) for all outputs I looked at, but that is more than IEEE 754 permits (it requires correct round trips when 17 decimal digits are printed) and certainly more than users should be forced to endure. Perfect round tripping is possible, useful, and important --- it's just a pain to implement.
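For reference, the property a correct implementation must satisfy looks like this (a sketch, assuming a correctly rounding formatter and parser; {:.16e} prints 17 significant digits):

fn main() {
    let x = 0.1f64 + 0.2;          // not exactly 0.3
    let s = format!("{:.16e}", x); // 17 significant digits, e.g. "3.0000000000000004e-1"
    let y: f64 = s.parse().unwrap();
    assert_eq!(x.to_bits(), y.to_bits()); // must recover exactly the same bits
}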

I have recently worked my way through the float formatting code, and at the very least it looks pretty naive (as in, the code and comments read as if the authors didn't care at all about rounding error, or were entirely unaware that it's a thing). I also skimmed the FromStr implementation; it looks a bit smarter, but since there is no reference to any paper and it doesn't use bignums, I have doubts about its accuracy as well.

Many numbers reach a fixed point after one round trip, but there are also numbers which take hundreds, thousands, or even more round trips before they settle down --- if they settle down at all (some cycle). The largest I found was almost nine million round trips, which is almost absurd. For obvious reasons, the error changing (presumably, growing) over time is even worse than a one-time inaccurate rounding. Here's the program I used to find those.
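(The program itself is linked above; the iteration it performs is roughly the following sketch, with an illustrative starting value.)

fn round_trips_until_stable(mut x: f64, limit: u64) -> Option<u64> {
    for i in 0..limit {
        let y: f64 = format!("{:.17}", x).parse().unwrap();
        if y.to_bits() == x.to_bits() {
            return Some(i); // reached a fixed point after i round trips
        }
        x = y;
    }
    None // still changing (or cycling) after `limit` round trips
}

fn main() {
    println!("{:?}", round_trips_until_stable(0.123456789012345678, 1_000_000));
}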

For the formatting side of things, I also filed #24556.

I know less about the parsing side of things. The topic seems to get less attention in general, but I found a paper by William Clinger which seems promising.

cc @rprichard

@pnkfelix
Member

Potential dupe of #7030 and/or #7648. I've been meaning to look at this for a long, long time; for the most part we've always just triaged these issues as "just a bug", but that's not a reason not to try to fix them, if someone has time.

I am very familiar with Will Clinger's paper, as he was my Ph.D. advisor -- though floating-point parsing was not my topic area -- and I have seen him present the material more than once. Also relevant are the other papers I listed in my comment here:

#7648 (comment)

and possibly also relevant is Steele and White, also from PLDI 1990, "How to Print Floating Point Numbers Accurately"

and possibly also this recent paper:

@hanna-kruppe
Contributor Author

Thanks for the dupes, I'm useless with GitHub's search.

If you check out the companion issue #24556 you'll see that @lifthrasiir has an implementation (combining Loitsch's algorithm and the algorithm of Steele and White) and, IIUC, is intending to contribute it. I'm very interested in seeing this fixed, so I'll try implementing Clinger's algorithm over the next few weeks.

@hanna-kruppe
Contributor Author

I only just realized that there are actually functions in the standard libraries that display in bases other than ten. They all seem to be deprecated, though. Is this a thing that should be supported indefinitely, or is this slated for removal in the near future? That would be a bummer, since (as discussed in one of the other issues) a lot of the research focuses on base 10.

@nagisa
Member

nagisa commented Apr 19, 2015

cc @lifthrasiir, because you were working on making conversions correct AFAIR.

@lifthrasiir
Contributor

I originally planned to work on dec2flt once my flt2dec branch (now #24612) was finished, but my schedule was too sloppy. I wouldn't mind if @rkruppe or others want to take this :)

@rkruppe Base 10 is not that special, but many practical algorithms rely on a precalculated table which is base-specific. It is not reasonable to ship tables for all possible bases.

@lilyball
Contributor

lilyball commented May 4, 2015

I just ran into this issue. Printing an f64 value with 17 decimal digits of precision doesn't seem to affect the point at which the value is rounded off; it simply prints more 0s. For example, 0.6f64 is not precisely representable, and printing it with 17 decimal digits of precision should yield 0.59999999999999998, but instead it yields 0.60000000000000000.
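A minimal reproduction (a sketch; the exact output depends on which formatting code is in use):

fn main() {
    // Reportedly prints 0.60000000000000000 with the formatting code described
    // above; a digit-exact formatter would print 0.59999999999999998.
    println!("{:.17}", 0.6f64);
}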

@pnkfelix
Member

pnkfelix commented May 4, 2015

@kballard That should be fixed by @lifthrasiir's PR #24612, at least according to my understanding of the algorithms and the comment here

@lilyball
Contributor

lilyball commented May 4, 2015

@pnkfelix It certainly sounds like it should.

bors added a commit that referenced this issue May 9, 2015
This is a direct port of my prior work on float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief,

* This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting".
* This contains both the Dragon4 algorithm (exact and complete, but slow) and the Grisu3 algorithm (exact and fast, but incomplete).
* The code is accompanied by a large number of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite.
* It is known that, in the best case, the entire formatting code has about 30 KB of binary overhead (judged from strconv experiments). Not too bad, but there might be room for improvement.

This is a rather large amount of code. I did my best to comment and annotate it, but you have been warned.

For maximal availability the original code was licensed under CC0, but I've also dual-licensed it under MIT/Apache, so there should be no licensing concern.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

## Known Issues

- [x] I've yet to finish `make check-stage1`. It does pass the main test suites, including `run-pass`, but there might be some unknown edge cases in the doctests.
- [ ] Figure out how this PR affects rustc.
- [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).
@huonw
Member

huonw commented May 31, 2015

More examples in #25895 and #7648.

@huonw
Member

huonw commented May 31, 2015

triage: P-medium

Adopting the priority from #7648.

@rust-highfive added the P-medium (Medium priority) label on May 31, 2015
@pnkfelix
Member

This remains an issue.

My current theory is that while we greatly improved our float2decimal conversion, we have not yet done anything to address our float-parsing routines (i.e. decimal2float conversion).

Here is some new demo code that I think makes this pretty clear. (I got rid of the random number generation because, frankly, I don't know how to do that on nightly anymore without using cargo -- and in any case, these deterministic tests should be easy to understand.)

https://play.rust-lang.org/?gist=7858323861f542d64e34
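(The gist is not reproduced here; a deterministic round-trip check in the same spirit looks roughly like the following sketch, with illustrative values rather than the gist's exact cases.)

fn main() {
    let cases = [0.1f64, 0.3, 1.0 / 3.0, 0.6];
    for &x in &cases {
        let y: f64 = format!("{:.17}", x).parse().unwrap();
        println!("{:.17} parses back equal: {}", x, x == y);
    }
}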

Update: I also need to review @kballard's argument about the printing of 0.6 under the {:.17} format; that may indicate that there remains something wrong in our printing routines even now.

@hanna-kruppe
Contributor Author

I am working on dec2flt. I have transcribed the algorithms (R and Bellerophon) from Clinger's paper and implemented most helper routines that were missing. It compiles and said helper routines pass some tests but the whole thing is still missing some pieces before I can start running it against well-behaved inputs. I hope to finish this part next weekend.

Before I file a PR I also want to implement correct handling of overflowing, almost-overflowing, underflowing, subnormal and almost-subnormal numbers. This may turn out to be more tricky (the paper mentions this more as an afterthought) but I have a pretty clear plan of attack.

Regarding 0.6: No, 0.60...0 is correct. It's not what Python outputs for %.17f (and I'm not sure whether that is done using David Gay's code or the system sprintf), but 0.60...0 is read back in correctly by Python. I can also promise that my dec2flt code will handle it as correctly as plain 0.6. It may not be more correct than 0.59... but it's equally correct.

@huonw
Member

huonw commented Jun 1, 2015

Before I file a PR I also want to implement correct handling of overflowing, almost-overflowing, underflowing, subnormal and almost-subnormal numbers.

I suspect that we don't handle any of these properly now, so I would be very happy with a PR where these weren't horribly-wrong (even if not fully correct).

E.g. subnormals:

fn main(){
    let x = 1e-308 / 1e15;
    println!("{}", x == 0.0);
    println!("{:e}", x);
}
false
thread '<main>' panicked at 'called `Option::unwrap()` on a `None` value', /home/rustbuild/src/rust-buildbot/slave/stable-dist-rustc-linux/build/src/libcore/option.rs:362
playpen: application terminated with error code 101

@hanna-kruppe
Contributor Author

To be clear, the current state of my code is that things go horribly wrong when fed such numbers. There may be a relatively easy fix that's somewhat correct, but I'm not sure and (now that I've spent quite a bit of time thinking about proper support) I don't think the bandaid will be significantly faster to implement than a proper solution. I'll just have to implement bignum long division and Algorithm M from the paper — of which I have already written a reference implementation in Python — with checks for the minimum and maximum exponent. Well, and set the right cutoff for when to use Algorithm M.
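For the curious, the core of Algorithm M is just "express the decimal as an exact fraction, then scale it by powers of two until the quotient lands in the significand range". A toy sketch, restricted to values whose scaled numerator and denominator fit in u128 (the real thing needs bignums and must also handle overflow, underflow and subnormals):

// Value = digits * 10^exp10, with digits != 0.
fn algorithm_m_toy(digits: u128, exp10: i32) -> f64 {
    assert!(digits != 0);
    let (mut num, mut den) = if exp10 >= 0 {
        (digits * 10u128.pow(exp10 as u32), 1u128)
    } else {
        (digits, 10u128.pow((-exp10) as u32))
    };
    // Scale num/den by powers of two until its integer part (the candidate
    // significand) lies in [2^52, 2^53).
    let mut k = 0i32;
    loop {
        let q = num / den;
        if q < (1u128 << 52) {
            num *= 2;
            k -= 1;
        } else if q >= (1u128 << 53) {
            den *= 2;
            k += 1;
        } else {
            // Round to nearest, ties to even.
            let r = num - q * den;
            let mut m = q as u64;
            if 2 * r > den || (2 * r == den && m % 2 == 1) {
                m += 1;
            }
            return m as f64 * 2f64.powi(k);
        }
    }
}

fn main() {
    let x = algorithm_m_toy(6, -1); // the decimal "0.6"
    assert_eq!(x, 0.6);
    println!("{:.17}", x);
}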

Aside: I'm sure you're aware but @lifthrasiir's code correctly handles subnormals, at least on this example. Only the old code has this problem and 1.0 stable (and thus play.rust-lang.org with default settings) still uses the old code.

@pnkfelix
Member

pnkfelix commented Jun 1, 2015

Executive summary: I mostly agree with @rkruppe about 0.6; much more detail below.


I wrote:

Update: I also need to review @kballard's argument about the printing of 0.6 under the {:.17} format; that may indicate that there remains something wrong in our printing routines even now.

@rkruppe responded:

Regarding 0.6: No, 0.60...0 is correct. It's not what Python outputs for %.17f (and I'm not sure whether that is done using David Gay's code or the system sprintf), but 0.60...0 is read back in correctly by Python. I can also promise that my dec2flt code will handle it as correctly as plain 0.6. It may not be more correct than 0.59... but it's equally correct.

I agree that printing 0.60...0 may be "as correct as" 0.59..98 for the floating point representation of 6/10, depending on what one's goals are...

First, I wrote this Rust code to print out some ranges of decimal fractions corresponding to low and high points for the floating point values around 0.6:

https://play.rust-lang.org/?gist=a50a1eea69cc97be1d45

[deleted paragraph with reported conclusions that were based on code with unquantified floating point error]

UPDATE: Argh, the technique in the Rust code above is flawed; casting the numerators and denominators to floats and then dividing is simply a bad way to try to get my hands on the exact ranges involved here... I will go back later and do it again with proper bignum-based fractions.


Now, as for the question of what should be printed (since we seem to have a number of options available to us)... obviously when you use {}, the shortest possible output is nice (perhaps even "required"). But when someone requests a certain number of decimals, I would usually opt to print a value as close as possible to the value that is actually represented by the floating point number.

Based on some interactions with a Scheme interpreter (my go-to choice for bignum stuff), the 64-bit float for 0.6 corresponds to the following base-10 decimal value:

0.59999999999999997779553950749686919152736663818359375

From this we can see why @kballard might expect to see 0.59999999999999998 from {:.17} -- that really is close to the represented value (assuming my Scheme code and runtime are accurate...)


So: it's possible that some people may be annoyed by our current flt2dec handling of specifications like {:.17}, and I may be curious enough to dive into the flt2dec code again to try to understand why we are not handling this as I would expect... but I would not call it a "bug" per se.

@pnkfelix
Member

pnkfelix commented Jun 2, 2015

P.S. @rkruppe thanks for looking into this; I was about to roll up my sleeves to work on it yesterday, but I am happy to see you taking care of it. Let me know if you want a second set of eyes during debugging.

@pnkfelix
Member

pnkfelix commented Jun 5, 2015

@rkruppe

I'll just have to implement bignum long division ...

Any reason we cannot just cut-and-paste the old (slow-ish) bignum code that we used to have in the Rust repo into a private mod for these bignum implementation issues? The path that utilizes bignums is only exercised in exceptional cases, right?

  • I mean, yes, there is the unfortunate subsequent bloating of libstd; but we can work on reducing the amount of code size it introduces later...

I'm referring to the libnum crate removed in fb169d5 that currently lives in https://github.com/rust-lang/num


Update: It's possible I'm missing the point and that the abstractions provided by libnum do not suffice; still, it seems like it might be a nice starting point...

@hanna-kruppe
Contributor Author

Decimal parsing and float formatting live in libcore, i.e., they can't use dynamic memory allocation. That alone already rules out using libnum verbatim. If I dug deeper, I'd probably find several other reasons, but it's a moot point since @lifthrasiir has already implemented almost all the functionality I need (and all that flt2dec needs).

FYI, while division is only needed in exceptional cases, most code paths do require a bignum to some degree (subtraction, comparison, multiplication by powers of two and ten). I realized just now how it can be avoided in some cases (fewer than 15 decimal digits and an absolute decimal exponent below 23), but it's still required even for some relatively common cases.
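For illustration only, the operations listed above are of roughly this flavour (a toy fixed-width bignum, nothing like the actual no-allocation bignum in core):

// Little-endian 64-bit limbs; enough for this illustration only.
struct Big([u64; 4]);

impl Big {
    fn from_u64(x: u64) -> Big {
        Big([x, 0, 0, 0])
    }

    // Multiply in place by a small factor (e.g. 2 or 10).
    fn mul_small(&mut self, factor: u64) {
        let mut carry: u128 = 0;
        for limb in self.0.iter_mut() {
            let t = *limb as u128 * factor as u128 + carry;
            *limb = t as u64;
            carry = t >> 64;
        }
        assert_eq!(carry, 0, "overflowed the fixed limb count");
    }

    // Compare by scanning limbs from most to least significant.
    fn compare(&self, other: &Big) -> std::cmp::Ordering {
        self.0.iter().rev().cmp(other.0.iter().rev())
    }
}

fn main() {
    let mut a = Big::from_u64(6);
    for _ in 0..53 {
        a.mul_small(2); // 6 * 2^53
    }
    let mut b = Big::from_u64(1);
    for _ in 0..17 {
        b.mul_small(10); // 10^17
    }
    println!("{:?}", a.compare(&b)); // Less: 6 * 2^53 < 10^17
}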

Maybe libnum's code for division is a good starting point algorithmically, and I'll make sure to look at it in more depth before I dive into implementing division, but from a cursory glance it seems like importing the code and whipping it into something working may be more trouble than writing division from scratch.

@lilyball
Contributor

lilyball commented Jun 5, 2015

Just to note, I think that {:.17} printing 0.60000000000000000 is explicitly wrong. Yes, typing that as a float literal will yield the exact same value that typing 0.59999999999999998 does, which is a perfectly fine rationale for why {} should print 0.6. But when there are multiple possible string representations at a given precision that evaluate to the same value, it should print the most accurate one. Printing something that's not strictly accurate is only a reasonable choice when it results in printing fewer digits.

Which is to say, printing a value that's not strictly accurate but evaluates to the same IEEE floating-point value when parsed is only appropriate when no precision has been specified. When an output precision has been specified, it should always print based on the precise floating-point representation.

@hanna-kruppe
Contributor Author

I hope to finish this part next weekend.

Famous last words, heh.

That said, I also did a lot of other cleanup, refactoring, and testing, confirmed a couple of bugs I'd long suspected, and addressed the underlying issues satisfactorily. For example, up-front input validation is now pretty comprehensive and prevents every way I can think of to cause integer (including bignum) overflow during the conversion.

As for actual functionality: the code paths for nice inputs work now and have passed a couple of simple tests. I'm not willing to declare it correct yet, but I'm hopeful. This is only for f64 as of now (for the moment I parse into f64 and cast that to f32, but I have plenty of test cases where that fails due to double rounding). Still missing, in order:

  • AlgorithmM. This is mostly blocked on bignum divmod. I know when to call it, I just have to write the damn thing.
  • Refactor everything so that f32 parsing works (correctly).
  • Many many more tests.
  • In particular, unit/regression tests in libcoretest. All tests so far are of the "take several minutes to try millions of inputs" kind. Those are necessary, but they can't reasonably be run by bors on every merge.
  • Get an idea of the performance, especially for common inputs. There are a few easy optimizations that I'm itching to try out. (Yeah, I also want to land this feature soon, but if I don't do this now, I risk putting it off forever.)
  • Actually integrating the code into libcore. It's currently an independent crate for much faster edit-compile-run cycles.

I won't make any predictions, but I feel I've solved all the hard problems and the rest is polishing and making sure I haven't overlooked any hard problems.

@pnkfelix
Member

@rkruppe can you share a pointer to your code? Maybe I can assist with some of the steps, like integration into libcore, or a libcore-based implementation of AlgorithmM/divmod?

@hanna-kruppe
Contributor Author

My code is now at https://github.com/rkruppe/rust-dec2flt (virtually no git history though). The modifications to @lifthrasiir's bignum code are in my rust fork in the dec2flt branch.

@pnkfelix Thanks for the offer, but splitting the work between two people seems tough. None of the steps are particularly hard in isolation and most depend on earlier steps. For example, half of the testing can't be done before f32 is properly supported, and integrating into libcore before the code is 99% done would just slow down the edit-compile-test cycle significantly.

That said, additional test cases are always good and the optimizations should be independent from everything else. If either interests you, take a look at what's there and submit a PR. Since you'll be the ideal reviewer for the PR, even reading through the code will be valuable. And if you notice a bug, a possible optimization, a way to simplify the code, etc. — great!

@hanna-kruppe
Contributor Author

Quick FYI, AlgorithmM is implemented and everything works with f32. I ran so many different tests and hammered out so many bugs that I'll go out on a limb and declare it functionally complete. Now to clean up the code some more, optimize it a little bit more, and write even more tests to ease my paranoia! Here's hoping this won't take another two weeks 😄

bors added a commit that referenced this issue Aug 13, 2015
Completely rewrite the conversion of decimal strings to `f64` and `f32`. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding.
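As a quick illustration of the half-to-even behaviour (a sketch, assuming a correctly rounding parser; not part of the actual test suite): the integer 2^53 + 1 is exactly halfway between two adjacent `f64`s, so it must parse to the neighbour with the even significand.

fn main() {
    // 9007199254740993 = 2^53 + 1 lies exactly between 2^53 and 2^53 + 2;
    // ties-to-even picks 2^53.
    let x: f64 = "9007199254740993".parse().unwrap();
    assert_eq!(x, 9007199254740992.0);
}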

This code implements the algorithms from the paper [How to Read Floating Point Numbers Accurately][paper] by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations.

# Correctness

With such a large amount of tricky code, many bugs are to be expected. Indeed, tracking down the obscure causes of various rounding errors accounted for the bulk of the development time. Extensive tests (taking on the order of hours to run to completion) are included in `src/etc/test-float-parse`: though exhaustively testing all possible inputs is impossible, I've had good success generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors, so contributors who touch this code need the discipline to run them. There are `#[test]`s, but they don't even cover every stupid mistake I made in the course of writing this.
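For a flavour of what such a test looks like (a sketch, not the actual contents of `src/etc/test-float-parse`), one "class" might be small integers scaled by powers of two, checked for bit-exact round trips through both the shortest and the exponential formatting:

fn check(x: f64) {
    let shortest: f64 = format!("{}", x).parse().unwrap();
    let exponential: f64 = format!("{:e}", x).parse().unwrap();
    assert_eq!(x.to_bits(), shortest.to_bits());
    assert_eq!(x.to_bits(), exponential.to_bits());
}

fn main() {
    for n in 1u32..10_000 {
        for p in 0..61 {
            check(n as f64 / 2f64.powi(p));
        }
    }
    println!("all round trips were bit-exact");
}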

Another aspect is *integer* overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) and in the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will *accidentally* cause this code to go out of range. Still, no guarantees.

# Limitations

Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., `digits * 10^abs(exp)` can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is `number_of_digits + abs(exp) > 375` but could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely.
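In code, the rejection rule amounts to roughly the following check (hypothetical helper name and plumbing; only the 375 constant is taken from the description above):

fn rejects(number_of_digits: u64, exp: i64) -> bool {
    // Give up rather than risk overflowing the fixed-size bignum when the
    // scaled value would need too many digits.
    number_of_digits as i64 + exp.abs() > 375
}

fn main() {
    assert!(!rejects(17, -300)); // ordinary inputs are fine
    assert!(rejects(400, -400)); // e.g. "0.333...3" with 400 threes
}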

This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions, in the sense that the old code (mostly) handled them, such as `0.333...333` with 400+ `3`s. Thus this might qualify as [breaking-change].

# Performance

Benchmark results are... tolerable. Short numbers that hit the fast paths (`f64` multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code (tens of nanoseconds). Numbers that are delegated to Algorithm Bellerophon (using floats with a 64-bit significand, implemented in software) are slower, but not drastically so (a couple hundred nanoseconds).
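For reference, the `f64`-multiplication fast path mentioned here is essentially Clinger's observation that if the digits fit exactly in the significand and the power of ten is itself exact, one multiplication or division is already correctly rounded. A sketch (not the actual libcore code):

fn fast_path(digits: u64, exp10: i32) -> Option<f64> {
    // 10^0 through 10^22 are exactly representable as f64; beyond that, or if
    // the digits don't fit in 53 bits, fall back to the slower algorithms.
    if digits >= (1u64 << 53) || exp10 < -22 || exp10 > 22 {
        return None;
    }
    let mut pow10 = 1f64;
    for _ in 0..exp10.abs() {
        pow10 *= 10.0; // each intermediate power up to 10^22 is exact
    }
    Some(if exp10 >= 0 {
        digits as f64 * pow10
    } else {
        digits as f64 / pow10
    })
}

fn main() {
    assert_eq!(fast_path(6, -1), Some(0.6));
    assert_eq!(fast_path(123456789, 30), None); // handled by the slower algorithms
}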

Numbers that need the AlgorithmM fallback (for `f64`, roughly everything below 1e-305 and above 1e305) take far, far longer, hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four divisions instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow.

All benchmarks were run on a mediocre laptop with an i5-4200U CPU under light load.

# Binary size

Unfortunately the implementation needs to duplicate almost all code: once for `f32` and once for `f64`. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes.

Running a stage1 `rustc` over a stand-alone program that simply parses pi to `f32` and `f64` and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised.

| rustc options             | old (bytes) | new (bytes) | delta     |
|----------------------------|-------------|-------------|-----------|
| [nothing]                  | 2588375     | 2633828     | 44.39 KiB |
| -O                         | 2585211     | 2630688     | 44.41 KiB |
| -O -C lto                  | 1026353     | 1054981     | 27.96 KiB |
| -O -C lto -C link-args=-s  | 414208      | 442368      | 27.5 KiB  |

# Future Work

## Directory layout

The `dec2flt` code uses some types embedded deeply in the `flt2dec` module hierarchy, even though nothing about them is formatting-specific. They should be moved to a more conversion-direction-agnostic location at some point.

## Performance

It could be much better, especially for large inputs. Some low-hanging fruit has been picked but much more work could be done. Some specific ideas are jotted down in `FIXME`s all over the code.

## Binary size

One could try to compress the table further, though I am skeptical. Another avenue would be reducing the code duplication from basically everything being generic over `T: RawFloat`. Perhaps one can reduce the magnitude of the duplication by pushing the parts that don't need to know the target type into separate functions, but this is finicky and probably makes some code read less naturally.

## Other bases

This PR leaves `f{32,64}::from_str_radix` alone. It only replaces `FromStr` (and thus `.parse()`). I am convinced that `from_str_radix` should not exist, and have proposed its [deprecation and speedy removal][deprecate-radix]. Whatever the outcome of that discussion, it is independent from, and out of scope for, this PR.

Fixes #24557
Fixes #14353

r? @pnkfelix

cc @lifthrasiir @huonw 

[paper]: http://citeseerx.ist.psu.edu/viewdoc/summary?doi=10.1.1.45.4152
[deprecate-radix]: https://internals.rust-lang.org/t/deprecate-f-32-64-from-str-radix/2405