Default floating-point formatting does not produce shortest outputs; mismatch with std::format #3649

Open
jk-jeon opened this issue Sep 18, 2023 · 8 comments

@jk-jeon
Contributor

jk-jeon commented Sep 18, 2023

As far as I understand, the default formatting option should produce the shortest output, not just in the number of significand digits, but also in the number of actual characters. At least that seems to be how std::format is specified, according to the std::to_chars specifications.

However, it seems currently fmt picks the fixed-point format whenever the exponent is between -4 and 16, regardless of the number of characters it will produce:

const int exp_lower = -4, exp_upper = 16;

Is this an intended divergence? Or maybe I misunderstood how std::format is specified?

For what it's worth, the MS STL implementation of std::format seems to do what I described.
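
To make the difference concrete, here is a minimal sketch of the two selection rules. It is not fmt's or any STL's actual code: `ShortestDecimal` and the helper names are hypothetical, the half-open range [-4, 16) is an assumed reading of `exp_lower`/`exp_upper`, and the exponent is assumed to be written with a sign and at least two digits, as printf's %e does. The tie in favor of fixed notation follows the std::to_chars wording.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical description of a shortest-digits result: the value is
// d.ddd... x 10^exp10, with num_digits significant digits in total.
struct ShortestDecimal {
    int num_digits;  // count of significant decimal digits (>= 1)
    int exp10;       // decimal exponent of the first digit
};

// Characters needed in fixed notation (sign of the value ignored).
std::size_t fixed_length(ShortestDecimal d) {
    if (d.exp10 < 0)  // "0." + leading zeros + digits, e.g. 0.0001
        return static_cast<std::size_t>(d.num_digits + 1 - d.exp10);
    if (d.exp10 >= d.num_digits - 1)  // integer padded with zeros, e.g. 100000
        return static_cast<std::size_t>(d.exp10 + 1);
    return static_cast<std::size_t>(d.num_digits + 1);  // digits plus '.'
}

// Characters needed in scientific notation, e.g. 1e+15 or 1.5e-07.
std::size_t scientific_length(ShortestDecimal d) {
    int exp_digits = 2;  // assumed minimum of two exponent digits
    for (int a = std::abs(d.exp10); a >= 100; a /= 10) ++exp_digits;
    int mantissa = d.num_digits + (d.num_digits > 1 ? 1 : 0);  // '.' if needed
    return static_cast<std::size_t>(mantissa + 2 + exp_digits);  // 'e' and sign
}

// Rule described in this issue for fmt (assumed range: -4 <= exp10 < 16).
bool threshold_picks_fixed(ShortestDecimal d) {
    return d.exp10 >= -4 && d.exp10 < 16;
}

// Shortest-characters rule; a tie goes to fixed notation.
bool shortest_picks_fixed(ShortestDecimal d) {
    return fixed_length(d) <= scientific_length(d);
}
```

For 1e15 (one digit, exponent 15) fixed notation needs 16 characters while 1e+15 needs 5, so the two rules disagree; for 123.456 both pick fixed notation.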

@vitaut
Contributor

vitaut commented Sep 24, 2023

fmt::format is modeled after Python's str.format where shortest refers to the precision, not the full output. std::format diverged a bit because it was specified in terms of to_chars.

@jk-jeon
Contributor Author

jk-jeon commented Sep 26, 2023

I honestly feel like the shortest string is what people may expect, but that's of course just a subjective opinion. If you are going to change the behavior (or accept a PR that does so) in the future, it would be great. If not, please feel free to close this, but I think this difference needs to be documented anyway in places like https://fmt.dev/dev/api.html#compatibility-with-c-20-std-format.

@vitaut
Contributor

vitaut commented Sep 30, 2023

I am open to PRs to address this backed by more analysis of the effects of the change and concrete examples.

@scurest

scurest commented Feb 10, 2024

Note that this also results in the rather surprising (to me) behavior that, e.g., 123456792.0f formats as "123456790", where the last digit appears to be wrong. Both strings round-trip to the same float, though, and 123456790 is shorter in the sense of having fewer significant figures.

std::to_chars formats it as 123456792.
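
A small check of the claim above, as a sketch using only the C standard library rather than fmt or std::to_chars. The helper names `same_float` and `significant_digits` are hypothetical, and `significant_digits` assumes a plain positive integer string with no leading zeros or decimal point.

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// True if both decimal strings parse to the same float value.
bool same_float(const std::string& a, const std::string& b) {
    return std::strtof(a.c_str(), nullptr) == std::strtof(b.c_str(), nullptr);
}

// Significant digits of a plain integer string: trailing zeros do not count.
std::size_t significant_digits(const std::string& s) {
    std::size_t last = s.find_last_not_of('0');
    return last == std::string::npos ? 1 : last + 1;
}
```

Floats near 1.2e8 are spaced 8 apart, so both "123456790" and "123456792" parse to 123456792.0f. They have the same number of characters (9) but 8 and 9 significant digits respectively.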

@vitaut
Contributor

vitaut commented Feb 10, 2024

This is unrelated and I am surprised that to_chars produces "garbage" digits in this case.

@jessey-git

Why is that "garbage" in this case? That value is perfectly representable as a float. Here's a nicely formatted sweep of some values for example: https://godbolt.org/z/a3Y8r1v6K

Is there a way to control how many digits are kept before rounding in this particular case, without switching to exponential notation, or should this be filed as a separate issue altogether?

@vitaut
Contributor

vitaut commented Feb 10, 2024

That's the term used in the Grisu paper. You can control the precision, so there is no issue here.
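
As a rough illustration of controlling precision, here is a sketch that uses printf-style general formatting as a stand-in. It assumes that an explicit precision in fmt (something like `{:.9g}`) behaves the same way, which is not verified here; `format_general` is a hypothetical helper.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Format with a given number of significant digits using printf's %g rules.
std::string format_general(float value, int precision) {
    char buf[64];
    int n = std::snprintf(buf, sizeof buf, "%.*g", precision,
                          static_cast<double>(value));
    if (n <= 0 || static_cast<std::size_t>(n) >= sizeof buf) return std::string();
    return std::string(buf, static_cast<std::size_t>(n));
}
```

With 9 significant digits, `format_general(123456792.0f, 9)` yields "123456792" with no exponent, because %g switches to exponential notation only when the decimal exponent reaches the precision.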

@jk-jeon
Contributor Author

jk-jeon commented Feb 12, 2024

So this seems to be because std::to_chars is specified in terms of the number of characters, not the number of decimal digits. 123456784 and 123456780 are both of the shortest length, but the former is closer to the true value, so an implementation that faithfully follows the standard must print the former.

So... this is interesting... we may need to look at what std::to_chars implementers have done if we ever want this behavior to be implemented in fmt.

EDIT:
Here is the relevant code from microsoft/STL:

https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1368
https://github.com/microsoft/STL/blob/192a84008a59ac4d2e55681e1ffac73535788674/stl/inc/xcharconv_ryu.h#L1406
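
The tie-breaking rule described in this comment (among equally short strings that round-trip, the closest to the true value wins) can be sketched as follows. This is only an illustration, not the microsoft/STL code linked above; `closest_candidate` is a hypothetical helper, and the candidates are assumed to be supplied by the caller with equal character counts.

```cpp
#include <cmath>
#include <cstdlib>
#include <string>
#include <vector>

// Among candidates that parse back to `value`, return the one whose exact
// decimal value is nearest to it. Returns an empty string if none round-trip.
std::string closest_candidate(float value,
                              const std::vector<std::string>& candidates) {
    std::string best;
    double best_err = HUGE_VAL;
    for (const std::string& c : candidates) {
        if (std::strtof(c.c_str(), nullptr) != value) continue;  // must round-trip
        // double holds these integers exactly, so the error is exact too.
        double err = std::fabs(std::strtod(c.c_str(), nullptr) -
                               static_cast<double>(value));
        if (err < best_err) {
            best_err = err;
            best = c;
        }
    }
    return best;
}
```

For 123456792.0f with candidates "123456790" and "123456792", both round-trip and have 9 characters, but the second has zero error, so it is selected.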

4 participants