Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove non-spec functionality from std.json by removing object field sorting in toJSON. #6084

Closed
wants to merge 1 commit into from

Conversation

marler8997
Copy link
Contributor

While attempting to enhance std.json to support ordering object fields, I noticed that it was sorting fields alphabetically when converting an object back to it's JSON string representation. This extra operation not only has a performance cost, but also imposes an ordering that is not defined by the spec. @quickfur has argued that such "non-spec" functionality should not exist in std.json so this PR has emerged as a result.

#6059 (comment)

...the underlying issue is that we're trying to impose order upon a structure that has been explicitly defined to be unordered. This, to me, is a code smell. In my experience, this style of coding often has or leads to fragility, unhandled corner cases, exceptional behaviours, and lots of little crufty things like that, that are symptomatic of an inherent impedance mismatch between what the code does and what the data is supposed to represent.

@dlang-bot
Copy link
Contributor

Thanks for your pull request, @marler8997! We are looking forward to reviewing it, and you should be hearing from a maintainer soon.

Some tips to help speed things up:

  • smaller, focused PRs are easier to review than big ones

  • try not to mix up refactoring or style changes with bug fixes or feature enhancements

  • provide helpful commit messages explaining the rationale behind each change

Bear in mind that large or tricky changes may require multiple rounds of review and revision.

Please see CONTRIBUTING.md for more information.

Bugzilla references

Your PR doesn't reference any Bugzilla issue.

If your PR contains non-trivial changes, please reference a Bugzilla issue or create a manual changelog.

@marler8997 marler8997 changed the title Remove non-spec functionality from std.json by remove object field sorting in toJSON. Remove non-spec functionality from std.json by removing object field sorting in toJSON. Jan 28, 2018
@marler8997 marler8997 force-pushed the jsonNoSort branch 2 times, most recently from c9c8449 to e4ebfe2 Compare January 28, 2018 10:32
@CyberShadow
Copy link
Member

Forgive the dumb question, but, would this field order sorting that's being removed here help with the initial problem that #6059 set out to solve (which I understand to be testing DMD's JSON output)? Simply sorting the fields to some stable order, incl. lexicographically, seems to satisfy the necessities of that particular use case.

the underlying issue is that we're trying to impose order upon a structure that has been explicitly defined to be unordered

While pedantically true, I wonder if we can eat the practical implications of this change. AA iteration order varies across D versions and (I think) the bitness of the target CPU architecture. Imposing a breaking change onto our users and then simply telling them that they were doing it wrong when they complain will not be doing them a service.

@marler8997
Copy link
Contributor Author

If we compared the JSON files by loading them in and comparing the data structures then it would work yes. However, it's more useful to show a diff at the file level since the file is what actually needs to be maintained, not the data structures that represent the files.

@CyberShadow
Copy link
Member

CyberShadow commented Jan 28, 2018

Well, why not sort the file before writing it? I understand that, given the present situation, it's as easy as fn.readText.parseJSON.toPrettyString().toFile(fn).

@marler8997
Copy link
Contributor Author

Well, why not sort the file before writing it? I understand that, given the present situation, it's as easy as fn.readText.parseJSON.toPrettyString().toFile(fn).

That would "hopefully" work:) Hopefully the ordering of the associative array key/values will be the same on different runs. But even if it did work, you end up with files where the once nicely ordered fields are no longer ordered logically, and now it's much harder to fix the expected file since the contents that were "diff'd" are in a completely different order.

@CyberShadow
Copy link
Member

and now it's much harder to fix the expected file since the contents that were "diff'd" are in a completely different order.

You can avoid that problem by also keeping the expected file sorted.

@marler8997
Copy link
Contributor Author

Since order isn't guaranteed, the order can change on each platform.

@marler8997
Copy link
Contributor Author

Oh sorry you were saying with the current state of things, yes you're right

@wilzbach
Copy link
Member

wilzbach commented Jan 29, 2018

My thoughts:

  • performance improvement: std.json is slow as snail anyways (std.data.json fixes that)
  • this was probably added to have a consistent toJSON output and could break code out there.
  • even though it's defined in the current ECMAScript script as not guaranteed, browser already preserve order, so people might have expected the same for std.json

@marler8997
Copy link
Contributor Author

Pinging @quickfur

No need to rush in getting a reply, but at some point I'd like to get your input on this PR. Thanks.

@quickfur
Copy link
Member

@wilzbach Browsers may have preserved insertion order (up to certain exceptions, as pointed out by the replies in the discussion you linked to), but that's not what std.json does today. So arguably, sorting keys for JSON output is something outside of the JSON spec.

I'd say:

  1. Going outside of the spec is risky behaviour; if browsers decide to change their current behaviour for whatever reason, all code that relies on the non-spec behaviour in std.json would suddenly become incompatible.

  2. OTOH, while the JSON spec says the order is not specified, it doesn't say that implementations cannot choose their own orders. All it says is that an implementation is free to choose whatever order it wants, and this includes being sorted, but user code should not rely on this order. So arguably, sorting output keys in std.json is within the realm of allowed implementation-specific behaviour.

  3. Of course, sorting keys does introduce a performance overhead, which may not be desirable.

  4. However, not sorting keys where we have previously sorted them may break some existing user code that has come to rely on the sorted behaviour. It's hard to say whether or not it's OK to do this. One can argue that the user code is wrong because it's relying on out-of-spec behaviour, so we're well within our "rights" to change the output order without notice. But OTOH, this breakage is going to be silent, which is not nice, whereas breaking existing code with a compile error, while still not very nice, is not as bad because at least the user is made aware of the problem and has an opportunity to fix it. Silent breakages, however, are very not nice -- the user upgrades the compiler and suddenly his program has a bug, but he doesn't know this until it causes something to explode on a client's production server.

So unless we can justify this breakage, e.g. if there is a huge performance win (which I doubt, given that this is std.json we're talking about -- but I could be wrong), I'm hesitant to go ahead with breaking existing code. If we were starting from scratch, then yeah, kill the sort. But unfortunately std.json has been around for a while, and we have to be very careful with breaking existing code.

If we do decide to go ahead with this, we'd better be sure to put up a big fat warning in the changelog to warn people of the potential breakage.

@marler8997
Copy link
Contributor Author

@quickfur Thanks for your comments. I think in this case, the potential for breaking others code is likely,

I am a bit confused though. The second point you made, an argument for not integrating this change is also an argument for integrating #6059 and contradicts what you said in criticism of that PR.

...the underlying issue is that we're trying to impose order upon a structure that has been explicitly defined to be unordered. This, to me, is a code smell. In my experience, this style of coding often has or leads to fragility, unhandled corner cases, exceptional behaviours, and lots of little crufty things like that, that are symptomatic of an inherent impedance mismatch between what the code does and what the data is supposed to represent.

OTOH, while the JSON spec says the order is not specified, it doesn't say that implementations cannot choose their own orders. All it says is that an implementation is free to choose whatever order it wants, and this includes being sorted, but user code should not rely on this order. So arguably, sorting output keys in std.json is within the realm of allowed implementation-specific behaviour.

Did you change your mind?

@quickfur
Copy link
Member

quickfur commented Jan 29, 2018

Well, that's why I said "arguably" in italics. In an ideal world, we'd only deal with strictly spec-only behaviour, so there would be no sorting / explicit order preservation anywhere. But we're dealing with a non-ideal situation where std.json has already had this extraneous sorting behaviour for a long time, and it would be bad to change it since it will break existing code. We'd like to change this, especially since it will improve performance, but our hands are tied by history (unless we can justify the breakage by a large performance gain).

OTOH, with PR #6059, there was no existing non-spec behaviour, and that PR is proposing to introduce such behaviour. Our hands are not tied in that case, and we can choose not to make the situation worse by adding more non-spec behaviour to std.json than it already has.

Taking a step back from these details, though: I think the amount of effort spent trying to improve std.json might be better spent working on getting a better JSON implementation, such as std.data.json, merged into Phobos as a replacement instead. There's just too much historical baggage and, in retrospect, suboptimal design decisions with std.json that limits how much we can improve it.

@marler8997
Copy link
Contributor Author

I agree that when you take all arguments into account the decision to not integrate this PR is clear. But the arguments you made in each PR stand on their own and very clearly contradict each other. In one case you say std.json should NOT SUPPORT any non-spec functionality which you interpreted to mean that an application should not support ordering the fields in any way. But in this PR you say it's OK to order the fields since the spec doesn't define an order, so the application is free to choose whatever order they require. These 2 statements cannot both be true. I personally agree with the second, which one do you agree with?

@quickfur
Copy link
Member

I really did not want to get into splitting hairs over this, but since you asked: There is a subtle, but important distinction between a spec allowing an implementation to choose some particular order in its output, vs. an implementation actively, or deliberately, choosing a particular order.

When a spec says something is implementation-defined or unspecified, the intent is usually that the implementation would choose whatever is most convenient for it -- since client code should not rely on unspecified behaviours, that means whatever order the implementation happens to produce should not cause any problems. So the implementor has more flexibility in how he implements the feature, since he's not bound by any requirements that the output must be in some particular order. Usually, this flexibility is used to make the implementation more performant and/or more memory-efficient, or perhaps whatever the implementor finds easiest to write.

Actively choosing a particular order, or even more overtly, giving client code control over the order (PR #6059) where a spec says the order is undefined, is something else. Explicitly sorting is something that introduces an extra cost, but is unnecessary because the spec doesn't require the output to be sorted. So if the implementor were given a choice, not sorting seems to be the better option, but neither violates the spec (the spec says the order is unspecified; sorted is included in "unspecified").

OTOH, as a library giving user code the option of specifying a sorting order, in other words, sorting order is now a feature of your library -- that's endorsing reliance on a particular order and encouraging user code to rely on it, whereas the spec clearly says the order is undefined. It's this endorsement that, even if it may not break the letter of the spec, clearly goes against the spirit of the spec, because now user code is encouraged to rely on something the spec says is undefined (otherwise why would the library implementor go out of his way to implement the possibility of choosing a sort order in the first place).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants