Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Default float formatting is too narrow-minded #24556

Closed
hanna-kruppe opened this issue Apr 18, 2015 · 12 comments
Closed

Default float formatting is too narrow-minded #24556

hanna-kruppe opened this issue Apr 18, 2015 · 12 comments
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one.

Comments

@hanna-kruppe
Copy link
Contributor

Currently, {} and {:?} always prints at most 6 digits after the decimal point, and never uses scientific exponential notation (1.23e6) regardless of how large the number is.

This has several undesirable consequences:

  • It introduces significant error when serializing numbers and reading them back (even ignoring Float printing and/or parsing is inaccurate #24557), making it the wrong default for machine-readable output.
  • It can produce extremely long strings of digits (up to three hundred), which hampers its usefulness for quick listings to be scanned by humans.
  • Newcomers are not confronted with the fact that floating point numbers are not real/rational/decimal numbers. 0.1 + 0.1 + 0.1 is not the same float as 0.3, but the default option does not reflect that.
  • For Debug specifically, it hides important differences (like the above 0.1 + 0.1 + 0.1 vs 0.3) which can mislead even experienced programmers when debugging, or at least force them to use unsightly formatting codes like {.17} (which is harder to read).

There are smarter algorithms that do better on all accounts (and also fix the accuracy issues reported in #24557). See e.g. Python 3.1+ (last bullet point, also back-ported to 2.7). It would be a great boon for people whose debugging consists of staring at the results of float calculations if Rust did the same thing.

These algorithms search for, roughly, the shortest string that reproduces the number exactly (bit for bit identical) when read back with an accurate parser. They also use scientific exponential notation when appropriate. We probably don't have accurate parsing either (again, see #24557) but this issue is not about that.

I propose adopting such an algorithm. This would lead to the following differences:

  • It will sometimes include more than six decimal digits. However, it will not include more digits than necessary, so it's better than {:.17} (this makes round trip-safe outputs easier to eyeball).
  • It will include fewer, or no, decimal digits for very large numbers.
  • It will use scientific notation for very small and very large numbers, rather than attempting to give them with ludicrous precision.
  • This is not strictly a property of these algorithms, but other languages do it and I think Rust will want it too: It always includes a decimal point, even for floats that happen to have no fractional part.

The combination of these changes mean a good balance between accuracy and not overly burdening readers. Debug and Display can still differ on minor details like whether negative zero is printed with a minus sign (#20596), but the changes listed above should apply to both. Exponential scientific notation is by no means exclusive to programmers, it's used by many calculators (hardware and software) for example, and printing a hundred digits is hardly very user friendly either.

Implementation

Python uses Martin Gay's algorithm (and his C implementation of the same), which by all accounts is incredibly complicated and complex --- porting it won't be fun. It also needs memory allocation, which disqualifies it from core (I'm sure there is an upper bound on how many bits it needs, but identifying that bound would be yet another porting hurdle).

Florian Loitsch's Grisu algorithm(s) should be doable (with the caveat that I'm only halfway through the paper myself). Grisu3 "gives up" on about 0.5% of all possible floats, but IIUC Grisu and Grisu2 can handle those, they just doesn't guarantee finding the shorted possible string, an acceptable trade off IMHO. I'm not sure whether using only Grisu2 gives the same result as Grisu3 on the floats the latter does handle, but if so, that would simplify the implementation.

There is an existing implementation (that uses only core) of Grisu3 in rust-strconv by @lifthrasiir --- anyone interested in working on this should get in touch with the author. I'm not sure if we'd want to import that implementation wholesale (even assuming the author's cooperation) though: It falls back to Dragon4 for the numbers Grisu3 can't handle, so it's rather more code (and more complicated) than the Grisu2-only option.

There are some open question though: How should formatting options like .N and LowerExp and UpperExp be handled? Can Grisu handle these, or do we need to build on top of it? I don't care that much about perfect accuracy for these (since they round anyway), so this may be easier than we expect.

cc @rprichard

@lifthrasiir
Copy link
Contributor

FYI, I'm already working on the flt2dec patch (but was a bit busy for recent weeks).

I have all necessary tools for formatting, including precision customization (see src/flt2dec/mod.rs helper functions). I believe that this should make the resulting patch smaller, but I'm yet to do that.

@rprichard
Copy link
Contributor

This is not strictly a property of these algorithms, but other languages do it and I think Rust will want it too: It always includes a decimal point, even for floats that happen to have no fractional part.

I think this is a good idea for the default float->string behavior -- it has better interoperability properties. I opened issue #1075 for it in the rfcs repo.

@lilyball
Copy link
Contributor

It will sometimes include more than six decimal digits. However, it will not include more digits than necessary, so it's better than {:.17} (this makes round trip-safe outputs easier to eyeball).

Presumably saying {:.17} will still produce 17 decimal digits even if they're all 0, right?

I just suggested in a comment on #1074 that we might want to consider adding a format modifier :f that produces a "simpler" display, e.g. no - on -0.0 and trimming off all trailing zeroes (including the .0 if there's no fractional part). With that modifier, {:.6f} would only print up to 6 digits of precision, but would print fewer if it would otherwise print trailing zeroes.

@bombless
Copy link
Contributor

I suggest we use Grisu2 for Debug printing f32/f64.
We can add a {:g} format for 1.0, and let's do it right in 1.1
Here's the weird behavior we have today:

fn main() {
    let a = 0.57 * 100f64;
    let b = 57f64;
    assert_eq!(format!("{:?}", a), format!("{:?}", b));
    assert!(a != b)
}

@pnkfelix
Copy link
Member

@bombless we're already planning to land the Grisu3 PR (#24612 ), but it is not planned to be part of 1.0. I cannot tell if that matches with what you were promoting or not; I just thought I'd mention the plan-of-record.

Update: Fixed text above to say "Grisu3" instead of "Grisu2".

@hanna-kruppe
Copy link
Contributor Author

To be clear, the PR adds Grisu3 + Dragon4, not Grisu2. That is, it always finds the shortest representation at the cost of more code (and needing significantly more time and memory on about 0.5% of all possible floats). In the issue, I promoted Grisu2 in an attempt to keep implementation effort a bit lower, not knowing that @lifthrasiir already solved all problems and nobody worries about the extra code (and some of that code is necessary for future float parsing work anyway).

tl;dr Forget about Grisu2 and celebrate Grisu3!

@lilyball
Copy link
Contributor

How does Grisu3 + Dragon4 compare to Burger & Dybvig? This is basically the same thing as David Gay's algorithm, but was developed separately, and apparently differs slightly. More importantly, David Gay's C implementation is "mind-numblingly complex" and was written to prioritize speed over everything else, but the algorithm described in that paper is apparently significantly simpler (and presumably performs well, just isn't as fast as David Gay's C implementation). The paper also includes a pretty simple Scheme code implementation (of the floating-point algorithm; it describes changes to the algorithm for fixed-point high-precision that is faster but doesn't give Scheme code for it)

Quoted from this python-list email:

As the paper says at the end,

[David] Gay ... showed that floating-point arithmetic is sufficiently
accurate in most cases when the requested number of digits is small.
The fixed-format printing algorithm described in this paper is useful when
these heuristics fail.

IOW, the point of Burger & Dybvig was to run faster than the
algorithms in the earlier Steele & White paper, but David Gay's code
usually beats everything on speed (if you don't care about speed,
code for this task is quite simple; if you do care about speed, it's
mind-numbingly complicated), so there's little incentive to implement
this algorithm.

So David Gay's algorithm is generally faster than Burger & Dybvig but I guess is too complex to be worth reimplementing. But this doesn't tell me how Burger & Dybvig's algorithm compares to Grisu3 + Dragon4 (both of which I know nothing at all about).

I'm also not sure offhand whether Burger & Dybvig's algorithm requires heap allocation.

@lilyball
Copy link
Contributor

(And of course, how does it compare to Grisu2?)

@hanna-kruppe
Copy link
Contributor Author

Dragon4 is the algorithm of Burder and Dybvig. I didn't find the name in the paper but it's stated elsewhere that this is a semi-official name, even attributed to the authors.

As for comparisons:
All Grisu variants are, by the benchmarks in the paper, several times faster than Dragon4 and the sprintf of glibc (version 2.11): three to six seconds per thousand numbers, versus 22 to 61 seconds per thousand numbers for sprintf and Dragon4. While there could be doubts about how applicable and well-done these benchmarks are, some speed-up or at worst competitive performance is to be expected.
If the non-optimized algorithm from Gay's paper is similar to Dragon4 in principle (bignum arithmetic and order of magnitude of performance), then I would it to have a similar relationship to Grisu. I have no idea how Gay's optimized C code would fare, but porting that is a pipe dream anyway. Especially since there is already a perfectly good implementation submitted as PR. Even if Gay's C code is faster, and remains faster after being ported to Rust, the gains will probably not be worth the effort and maintenance cost.

Grisu3 is very clever, simple and fast. Read the paper, its notation is a mess but it's enlightening nevertheless.

@lifthrasiir
Copy link
Contributor

I think others already have explained the situation well, but to recap:

  • Dragon4 (aka Burger and Dybvig) is a bignum-based algorithm, is slow but always produces the shortest and accurate result. It is particularly slow when the absolute magnitude of the number printed is very large or very small.
  • David Gay's dtoa.c is a hybrid algorithm using bignum and various optimizations, and is known to produce the shortest and accurate result (I'm not sure this has been proved). To my knowledge, dtoa is faster than Dragon4 for typical workloads but ultimately has a similar worst-case behavior. It is also considerably complex, so porting it is infeasible.
  • Grisu2 is an approximate algorithm using an 1K-sized lookup table [1] and constant space. It produces the shortest and accurate result for more than 99% of input. It is very fast.
  • Grisu3 is an approximate algorithm using the same table and constant space. It is similar to Grisu2 but can fall back to the other accurate algorithm if needed. Therefore Grisu3 is best paired with Dragon4, and remains very fast for most inputs.

[1] A variant of this table is briefly mentioned in the Grisu paper: "We will base future evolutions of Grisu on digit-get-mix with α,γ = −59,−32." After careful consideration V8 and Go stdlib (and also my PR) uses α = −60 instead, but this doesn't change the table size much.

Performance-wise, Grisu2 is the best, with Grisu3 + Dragon4 closely following. Dragon4 is much slower but has a simpler and more verifiable implementation (both glibc and msvcrt uses it). The merit of dtoa.c has been decreased over time, and the recent benchmark confirms this. (For your information, in this benchmark doubleconv is V8's Grisu3 + Dragon4, and milo is a hand-tuned Grisu2.)

Space-wise, all algorithm can be implemented without dynamic memory allocation. The bignum is limited in size so a fixed-size array is enough (and this is how I've implemented Dragon4). The lookup table size of 1K in Grisus might seem a lot, but this is easily overshadowed by binary size (in my PR, about 30K).

bors added a commit that referenced this issue May 9, 2015
This is a direct port of my prior work on the float formatting. The detailed description is available [here](https://github.com/lifthrasiir/rust-strconv#flt2dec). In brief,

* This adds a new hidden module `core::num::flt2dec` for testing from `libcoretest`. Why is it in `core::num` instead of `core::fmt`? Because I envision that the table used by `flt2dec` is directly applicable to `dec2flt` (cf. #24557) as well, which exceeds the realm of "formatting".
* This contains both Dragon4 algorithm (exact, complete but slow) and Grisu3 algorithm (exact, fast but incomplete).
* The code is accompanied with a large amount of self-tests and some exhaustive tests. In particular, `libcoretest` gets a new dependency on `librand`. For the external interface it relies on the existing test suite.
* It is known that, in the best case, the entire formatting code has about 30 KBs of binary overhead (judged from strconv experiments). Not too bad but there might be a potential room for improvements.

This is rather large code. I did my best to comment and annotate the code, but you have been warned.

For the maximal availability the original code was licensed in CC0, but I've also dual-licensed it in MIT/Apache as well so there should be no licensing concern.

This is [breaking-change] as it changes the float output slightly (and it also affects the casing of `inf` and `nan`). I hope this is not a big deal though :)

Fixes #7030, #18038 and #24556. Also related to #6220 and #20870.

## Known Issues

- [x] I've yet to finish `make check-stage1`. It does pass main test suites including `run-pass` but there might be some unknown edges on the doctests.
- [ ] Figure out how this PR affects rustc.
- [ ] Determine which internal routine is mapped to the formatting specifier. Depending on the decision, some internal routine can be safely removed (for instance, currently `to_shortest_str` is unused).
@lifthrasiir
Copy link
Contributor

Now fixed by #24612.

@bstrie
Copy link
Contributor

bstrie commented May 10, 2015

Closed by #24612.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-enhancement Category: An issue proposing an enhancement or a PR with one.
Projects
None yet
Development

No branches or pull requests

8 participants