Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

properly fix #13196: json serialization with option for lossless roundtrip #13364

Closed

Conversation

timotheecour
Copy link
Member

@timotheecour timotheecour commented Feb 8, 2020

this properly fixes #13196 without affecting how nim prints/stringifies floats, so that:
echo 0.6 is now 0.6 again, unlike after #13276

echo 0.6 # 0.6
import std/json
echo(%* 0.6) # 0.59999999999999998

future PR's

implement https://github.com/ulfjack/ryu to keep short representations eg for json => see #13365

@timotheecour timotheecour changed the title [pending #13363] properly fix #13196: json serialialization lossless roundtrip properly fix #13196: json serialialization lossless roundtrip Feb 8, 2020
Copy link
Contributor

@dom96 dom96 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

lib/pure/json.nim Outdated Show resolved Hide resolved
@Araq
Copy link
Member

Araq commented Feb 8, 2020

Let's instead use ryu asap and backport it.

@timotheecour
Copy link
Member Author

PTAL

I've turned precision from compiletime to runtime (compiletime being "sticky" did not lead to a good api)

relevant json apis now accept a precision parameter, with 2 obvious choices:

  • the default (same as prior to this PR) is precision = precisionDefault = -1, which keeps 0.4 as 0.4 (and is not roundtrip safe but will be after ryu)
  • and precision = precisionRoundtrip = 17 which is roundtrip safe (but not compact for 0.4)
  • or whatever other value is obviously fine too; which could be useful in some applications

so users can decide for their own application, and there is no breaking change.

Let's instead use ryu asap and backport it.

this could take an unspecified amount of time (ryu is not exactly simple, look at their source code); plus it should work for all supported targets (vm, c, js probably too); vm likely will use a vmcallback for sure.

in the meantime, this PR works right now for ppl needing roundtrip safety as an option, and it'll be by default too once ryu (#13365) is implemented

@Araq
Copy link
Member

Araq commented Feb 9, 2020

But this is not a quick fix. It's an API extension and you use c_sprintf twice, so everything runs a little slower.

@timotheecour
Copy link
Member Author

But this is not a quick fix. It's an API extension

reasonable IMO

and you use c_sprintf twice, so everything runs a little slower.

fixed, found a much better way

@krux02
Copy link
Contributor

krux02 commented Feb 9, 2020

Since you were so eager to revert my PR where your reasoning included this complaint:

this is bad for interop with other languages; eg it can blow up size of json messages, etc

May I complain here with "it can blow up size of json messages", too?

And no I disagree, this is not a proper fix #13196. Unlike your PR here, I analyzed what the root cause for #13196 is. And the root cause for it is lossy string conversion of floating point numbers. I fixed that. If the default stringification is lossy, then it is no surprise that lossy stringification is used where it shouldn't be used, for example in json. So I made sure that the default string representation is always lossless, and the json issue was fixed as a consequence unlike this fix which just fixes the symptom. Of course that breaks tests that expect a very specific string representation of a floating point number. But the string representation of a floating point number isn't unique, and people know this. If people still insist on wring tests that depend on a very specific string representation of a floating point number, then those are badly written tests that should be fixed. Breaking those tests is a price I am willing to pay.

Btw I am still in favor of #13276 over this and ryu. Why? I don't think that the performance aspect of ryu will really make a noticeable difference as long as we are still generating these intermediate string objects everywhere. Then ryu without any question much more code to maintain, than printf("%.17g", number). The only real disadvantage here is that number are sometimes, when they can't be represented pricesely a bit more verbose when printed. That doesn't justify for me to port and maintain a ryu implementation.

But whatever...

@timotheecour
Copy link
Member Author

timotheecour commented Feb 9, 2020

May I complain here with "it can blow up size of json messages", too?

with this PR, this is now up to the user, depending on precision param. by default, no blowup. As option, roundtrip safety.

And no I disagree [...]

Not a single language I've ever used prints float literal 0.6 as 0.59999999999999998: python, D, C, C++, go, node js, matlab, octave, java...
Don't use echo / $ for rountrip-safe serialization (or wait until ryu is implemented)

Btw, I don't think that the performance of ryu will really make a difference as long as we are still generating these intermediate string objects everywhere.

performance is a separate issue. And we already have means to avoid intermediate string objects today for performance sensitive code

@krux02
Copy link
Contributor

krux02 commented Feb 9, 2020

with this PR, this is now up to the user, depending on precision param. by default, no blowup. As option, roundtrip safety.

Do I get you right, by default json serialization isn't correct, but you provide an option to the user to enable correctness.

@timotheecour
Copy link
Member Author

I've explained this in detail in #13364 (comment)

@timotheecour timotheecour changed the title properly fix #13196: json serialialization lossless roundtrip properly fix #13196: json serialization with option for lossless roundtrip Feb 9, 2020
@Araq
Copy link
Member

Araq commented Feb 10, 2020

But the real issue is that 0.6 is now ugly but correct for JSON serialization only? That's terrible.

@disruptek
Copy link
Contributor

The idea that we add parsing modes for JSON is a good one. I'm not sure how the API works, but it lends the programmer a consistent tool to solve problems of nil, object refs, enums, maybe even circular references.

It can be argued that a better option is no option, but I don't think it can be argued that no option is a better option for Nim today given the complexities of the problem and the stability of the language.

@stale
Copy link

stale bot commented Feb 12, 2021

This pull request has been automatically marked as stale because it has not had recent activity. If you think it is still a valid PR, please rebase it on the latest devel; otherwise it will be closed. Thank you for your contributions.

@stale stale bot added the stale Staled PR/issues; remove the label after fixing them label Feb 12, 2021
@stale stale bot closed this Mar 14, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
stale Staled PR/issues; remove the label after fixing them
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Lossy json serialization/deserialization round-trip for floats
6 participants