Pretty print isn't pretty for floats #29472

Closed
therustmonk opened this issue Oct 30, 2015 · 18 comments
Labels
T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@therustmonk
Contributor

The following code:

fn main() {
    println!("{}", 1.2999999999999999);
    println!("{:?}", 1.2999999999999999);
    println!("{:.3}", 1.2999999999999999);
    println!("{:.*}", 5, 1.2999999999999999);

    // these aren't pretty either
    println!("{:#}", 1.2999999999999999);
    println!("{:#?}", 1.2999999999999999);
}

prints:

1.2999999999999998
1.2999999999999998
1.300
1.30000
1.2999999999999998
1.2999999999999998

C behaviour:

#include <stdio.h>
int main(void) {
  double f = 1.2999999999999999999;
  /* %.14g: at most 14 significant digits, trailing zeros removed */
  printf("%.14g\n", f);
  return 0;
}

prints:

1.3

In Rust, every float format behaves like printf's f conversion, never like g.

Is there really no way to get behaviour like the g option of printf?
It's a highly important feature for serialization libs.

rustc 1.6.0-nightly (8ca0acc 2015-10-28)

therustmonk changed the title from "Pretty print isn't pretty as expected" to "Pretty print isn't pretty for floats" on Oct 30, 2015
@hanna-kruppe
Contributor

1.2999999999999999 != 1.3. Pretty printing should not be inaccurate by default. However, we've long wanted a mode that prints the shortest exact literal, see #24612 for some discussion. The difference from the default behavior would be that it would use exponential notation when that gives a shorter result. This mode should probably accept a precision .N and use at most this many digits, i.e. omit the pointless zeros that can be seen in your example. Then one can opt in to short, pretty, slightly inaccurate outputs.
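
To make the point above concrete, here is a minimal sketch of what the default formatting does: it prints the shortest decimal string that parses back to the same f64, so the literal from the report and 1.3 stay distinguishable. The helper name `shortest` is only for illustration and is not part of std.

```rust
/// Illustrative helper (not a std API): the string `{}` produces for an f64,
/// i.e. the shortest decimal that parses back to the same value.
fn shortest(x: f64) -> String {
    format!("{}", x)
}

fn main() {
    let x = 1.2999999999999999_f64;

    // The literal is stored as the f64 just below 1.3, so the two differ.
    assert!(x != 1.3);

    // Printing and parsing back yields the identical bit pattern.
    let text = shortest(x);
    let back: f64 = text.parse().unwrap();
    assert_eq!(back.to_bits(), x.to_bits());

    println!("{}", text);    // shortest round-trippable form
    println!("{:.3}", x);    // fixed precision keeps trailing zeros
    println!("{:e}", x);     // exponential notation must be requested explicitly
}
```

The trailing zeros in the `{:.3}` output are the "pointless zeros" a precision-aware pretty mode would drop.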

Aside:

It's a highly important feature for serialization libs.

Wouldn't serialization libraries want accuracy more than pleasing human readers?

@therustmonk
Contributor Author

Thank you for the explanation.

Wouldn't serialization libraries want accuracy more than pleasing human readers?

Not for human readers. But the problem goes a little deeper... I have some Lua code that contains parameters which are numeric constants. I used to use a pure-Lua JSON serializer, which calls Lua's tostring, which in turn calls sprintf internally. It serializes 0.1 as 0.1. But when I take the parameter directly from the Lua state as a number (= C double) and serialize it, I get 0.10000000000000001 in the JSON. I don't do any calculations with that constant in Lua or Rust, yet the values differ (maybe there is a lossy value transformation while the Lua code is interpreted, because the constant has an inaccurate value inside the Lua VM). sprintf with g solves it.

#24612 doesn't provide an equivalent of the g option. Is there any other activity in this direction?

@hanna-kruppe
Contributor

0.10000000000000001 equals 0.1, so accuracy is not a reason to print the former. I can only assume your JSON serializer is being stupid, possibly aided by Lua indeed mangling the value internally. In any case, Rust's float formatting will never print 0.10000000000000001, it will only use as many decimal digits as necessary, with the aforementioned caveat that it isn't allowed to use exponential notation. Of course the latter is still a big problem if you have numbers as large as 1e300.
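
A small sketch of both claims, assuming a current stable Rust toolchain: the two literals denote the same f64, `{}` prints it as 0.1, and a large value such as 1e300 is written out in full unless exponential notation is requested. The helper `display_len` is a hypothetical name used only here.

```rust
/// Hypothetical helper: number of characters `{}` uses for a value.
fn display_len(x: f64) -> usize {
    format!("{}", x).len()
}

fn main() {
    // Both literals round to the same f64, so they compare equal.
    assert_eq!(0.10000000000000001_f64, 0.1_f64);

    // Display uses only as many digits as needed to identify the value.
    assert_eq!(format!("{}", 0.10000000000000001_f64), "0.1");

    // Without exponential notation, 1e300 becomes a very long digit string.
    println!("plain: {} characters", display_len(1e300));
    println!("exponential: {:e}", 1e300_f64);
}
```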

No, AFAIK there hasn't been any activity on a %g-analogue. Several people including myself want it but nobody wants it enough to implement it themselves. Perhaps this issue will change that.

@therustmonk
Contributor Author

I've found a better example:

fn main() {
    let mut x = 0.1;
    for _ in 1..100 {
        x = x + 0.01;
        println!("{}", x);
    }
}

It prints (output truncated):

0.11
0.12
0.13
0.14
0.15000000000000002
0.16000000000000003
0.17000000000000004
0.18000000000000005
0.19000000000000006
0.20000000000000007
0.21000000000000008
0.22000000000000008

That's really terrible output :)
It seems other languages use g as their default printing strategy. What do you think about it?

@hanna-kruppe
Contributor

This is an example of floating point arithmetic being inaccurate. The number really is closer to 0.22000000000000008 than to 0.22 and trying to hide that doesn't do a service to anyone:

  • it misleads people who try to find out how two numbers could possibly be not equal
  • it is a hazard for serialization, because it rounds during an operation that one might not expect to round
  • it doesn't confront float newbies with the reality of float arithmetic (which they will run into)

Besides, I do not buy that other languages do round by default. Here's Java, JavaScript, Python, Ruby giving identical or very similar output. Either %g does not do what you think it does, or these languages do not use %g.

Also, changing the default is tricky because lots of programs rely on the exact current behavior. Rounding the output might not break any programs (though it will make some of them less accurate, if they read the output back in), but as explained above I am opposed to that anyway.
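
The serialization hazard can be sketched as follows. The helper `round_trips` is an illustrative name, not a std function; it checks whether a printed string parses back to exactly the original bits. The default output always does, while a rounded output silently changes the value.

```rust
/// Illustrative helper: does `text` parse back to exactly `original`?
fn round_trips(text: &str, original: f64) -> bool {
    text.parse::<f64>()
        .map(|v| v.to_bits() == original.to_bits())
        .unwrap_or(false)
}

fn main() {
    // The loop from the example above: every default-formatted value
    // survives a print/parse cycle, however "ugly" it looks.
    let mut x = 0.1_f64;
    for _ in 1..100 {
        x += 0.01;
        assert!(round_trips(&format!("{}", x), x));
    }

    // Rounding while printing loses information: 0.1 + 0.2 is not 0.3.
    let sum = 0.1_f64 + 0.2_f64;
    assert!(!round_trips(&format!("{:.2}", sum), sum));
    println!("{} vs {:.2}", sum, sum);
}
```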

@therustmonk
Contributor Author

I think you are right: 0.22000000000000008 is the value we have in memory, and it's a bad idea to change its representation. But having to use another formatting library just to print nice-looking values is inconvenient.

@hanna-kruppe
Contributor

I absolutely agree with you that there should be a formatting operator that is prettier and can also round to N decimal digits while remaining pretty. I just don't want it to round by default.

@pnkfelix
Member

cc me

@roquendm

roquendm commented Nov 2, 2015

If you want error-free load/store of floating point values via text files, across multiple languages... use hexadecimal. Using decimal is a recipe for pain.
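
As a rough sketch of the hexadecimal idea in Rust: std has no formatter for C99-style hex float literals (the `%a` form), so this example instead writes the raw IEEE 754 bit pattern as 16 hex digits. That is an assumption about what "hexadecimal" could mean here, not necessarily what the comment intends, and the function names are made up for illustration.

```rust
/// Illustrative: encode an f64 as the 16 hex digits of its IEEE 754 bits.
fn to_hex_bits(x: f64) -> String {
    format!("{:016x}", x.to_bits())
}

/// Illustrative: decode the format produced by `to_hex_bits`.
fn from_hex_bits(s: &str) -> Option<f64> {
    u64::from_str_radix(s, 16).ok().map(f64::from_bits)
}

fn main() {
    for &x in &[0.1_f64, -0.0, 1e300, 0.22000000000000008] {
        let text = to_hex_bits(x);
        let back = from_hex_bits(&text).expect("valid hex");
        // Bit-exact by construction: no decimal rounding is involved.
        assert_eq!(back.to_bits(), x.to_bits());
        println!("{} -> {}", x, text);
    }
}
```

Any language that can reinterpret a 64-bit integer as a double can read this format back without a decimal parser.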

@hanna-kruppe
Contributor

@roquendm

It's true that decimal representation of binary floats has many downsides, but unfortunately it is also very convenient for humans. Decimal output must exist, so we might as well make it as pretty as possible while remaining accurate, to say nothing of output for end users that is never read back in. And the really hard part of the implementation is already solved: loading and storing floating point numbers is already correct in Rust today. It may have to round when first reading decimal input, but every subsequent load-store cycle preserves the bits exactly.

@pnkfelix
Member

pnkfelix commented Nov 2, 2015

@rkruppe I had assumed that the suggestion to use hexadecimal was a response to the scenario given above where one is trying to serialize data between Lua and Rust. (But then again, it was unclear whom @roquendm's comment was actually directed at... the "and target multiple languages" part was what made me think the text was targeted at @deniskolodin.)

@pnkfelix
Member

pnkfelix commented Nov 2, 2015

#24612 doesn't provide an equivalent of the g option. Is there any other activity in this direction?

The discussion on #24612 did (in a comment) mention how we might want to make Display ({}) and Debug ({:?}) outputs differ in their semantics. Display would be a rough analogue to g and Debug would continue to mean f.

But it's not clear how far we would actually want to go with this. It would probably be better to allocate a separate format modifier for this purpose, e.g. {:g}. (What did g stand for, anyway? I've always assumed it was chosen just because it's the next letter after f, but that won't suffice for selecting a name for a std::fmt trait.)

@hanna-kruppe
Contributor

Since # is the "pretty print" operator for some kinds of output (e.g. {:#?}) and in all other cases at least selects an alternative format ({:#x} for example), I think {:#} and {:#?} would be an appropriate spelling. I do not know if this is what @deniskolodin wants, but in any case reading his posts gave me this thought.
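
For reference, a short sketch of what the # flag does today for the cases mentioned. For floats it has no effect, as the output in the original report shows, which is why it is free to be given a meaning.

```rust
fn main() {
    // For hex output, # adds the 0x prefix.
    assert_eq!(format!("{:x}", 255), "ff");
    assert_eq!(format!("{:#x}", 255), "0xff");

    // For Debug output, # switches to the multi-line "pretty" layout.
    let pair = (1, 2);
    println!("{:?}", pair);
    println!("{:#?}", pair);

    // For floats, # currently changes nothing.
    assert_eq!(format!("{:#}", 1.5_f64), format!("{}", 1.5_f64));
}
```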

@roquendm

roquendm commented Nov 2, 2015

Indeed, my comment was "if one wants to (as simply as possible) be bit-exact across all languages which might read/write the data in a text based format" then use hex floats.

huonw added the I-nominated and T-libs-api labels on Jan 6, 2016
@huonw
Member

huonw commented Jan 6, 2016

Nominating to discuss:

  1. making Display and Debug differ
  2. using # as the "pretty"/%g version
  3. adding {:g} for prettier output (the main benefit of this IMO, is automatically choosing between exponential and non-exponential notation, in addition to the already-discussed fact it doesn't try to round-trip)

I'm not a huge fan of 1, but 2 doesn't seem crazy... although 3 and/or people just using e.g. {:.3} seems less subtle.

@therustmonk
Contributor Author

I see in the docs that # selects an alternate form of printing the same value. Maybe we shouldn't attach a rounding surprise to the alternate (#) specifier. But g isn't a good fit for a format trait either, because the value changes.

What about using the - sign? It isn't used now, and its meaning hints at subtraction or shrinking.

println!("{:-}", 1.2999999999999999);
// 1.3
println!("{:-?}", 1.2999999999999999);
// 1.3

@hanna-kruppe
Contributor

Display and Debug already differ for signed zeros. However, that's the only difference, and I have already expressed my opposition to either {} or {:?} rounding.

(the main benefit of this IMO, is automatically choosing between exponential and non-exponential notation, in addition to the already-discussed fact it doesn't try to round-trip)

I'm not sure what you're saying here, can't the # route also take the liberty to switch between exponential and decimal notation?

@deniskolodin {:-} is already a valid format specifier, though it's currently "not used" (which seems to mean "ignored"). However, as + is currently used for something related to the sign of the output, using - for something completely unrelated is pretty confusing IMHO.


Another thing that bugs me about a direct %g-analogue is the question of how many digits it should round to by default. C specifies 6 according to cppreference.com, but that seems completely arbitrary and it seems unlikely that this is the ideal value for most use cases. I would rather have the option (whether it be :#, :g, or something else) by itself only enable the dynamic switch to exponential notation. If additionally a precision is specified via .N, then:

  • round to so many digits, obviously
  • strip trailing zeros, if any
  • still switch to exponential, only now the threshold depends on the precision

This is pretty much what C's %.Ng does. The default would still be accurate and thus use up to ~17 decimal digits --- if you want less, you can specify it and (unlike with {:.N}) still get decent output if rounding was unnecessary or the number is huge or tiny.

NB: I would also prefer ordinary {:.N} to strip trailing zeros, though I can see the guarantee of "exactly N decimal digits" be useful in some contexts.
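
The three rules above can be sketched in user code. This is only an illustration of the proposal, not an existing std API: `format_g` and `strip_trailing_zeros` are hypothetical names, the switch-over thresholds are borrowed from C's %g (exponent below -4 or at least the precision), and the exponent is printed in Rust's style (e.g. 1e300) rather than C's (1e+300).

```rust
/// Hypothetical helper: drop trailing zeros after a decimal point.
fn strip_trailing_zeros(s: &str) -> String {
    if s.contains('.') {
        s.trim_end_matches('0').trim_end_matches('.').to_string()
    } else {
        s.to_string()
    }
}

/// Hypothetical %g-like formatter: at most `precision` significant digits,
/// trailing zeros removed, exponential notation for huge or tiny values.
fn format_g(x: f64, precision: usize) -> String {
    if !x.is_finite() {
        return x.to_string();
    }
    let p = precision.max(1);

    // Round to p significant digits first; the exponent is read after
    // rounding, so 9.99 at p = 2 is treated as 1.0e1.
    let sci = format!("{:.*e}", p - 1, x);
    let (mantissa, exp) = sci.split_once('e').expect("exponent marker");
    let exp: i32 = exp.parse().expect("integer exponent");

    if exp >= -4 && exp < p as i32 {
        // Fixed notation with exactly enough decimals for p significant digits.
        let decimals = (p as i32 - 1 - exp) as usize;
        strip_trailing_zeros(&format!("{:.*}", decimals, x))
    } else {
        format!("{}e{}", strip_trailing_zeros(mantissa), exp)
    }
}

fn main() {
    println!("{}", format_g(1.2999999999999999, 14));
    println!("{}", format_g(1e300, 6));
    println!("{}", format_g(0.00001234, 3));
}
```

With no precision given, the proposal would keep the accurate default and only enable the switch to exponential notation; this sketch covers the case where a precision is supplied.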

aturon removed the I-nominated label on Jan 8, 2016
@aturon
Member

aturon commented Jan 8, 2016

Libs team consensus: any of the changes being discussed here should go through the RFC process. The team would be willing to consider an RFC for any of the options @huonw mentioned, but would be particularly keen for either option 2 or 3.

7 participants