Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Validate that the new snake/kebab case naming policies match the Json.NET implementation. #77309

Closed
eiriktsarpalis opened this issue Oct 21, 2022 · 18 comments · Fixed by #90316
Closed
Assignees
Labels
area-System.Text.Json blocking-release enhancement Product code improvement that does NOT require public API changes/additions
Milestone

Comments

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Oct 21, 2022

We should probably add a few tests validating that the output matches that of the Json.NET implementation, particularly when it comes to nontrivial inputs. We have time until .NET 8 ships, so we can always introduce breaking changes if needed.

Originally posted by @eiriktsarpalis in #69613 (comment)

@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Oct 21, 2022
@eiriktsarpalis eiriktsarpalis added this to the 8.0.0 milestone Oct 21, 2022
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Oct 21, 2022
@eiriktsarpalis
Copy link
Member Author

cc @YohDeadfall

@ghost
Copy link

ghost commented Oct 21, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

We should probably add a few tests validating that the output matches that of the Json.NET implementation, particularly when it comes to nontrivial inputs. We have time until .NET 8 ships, so we can always introduce breaking changes if needed.

Originally posted by @eiriktsarpalis in #69613 (comment)

Author: eiriktsarpalis
Assignees: krwq
Labels:

area-System.Text.Json

Milestone: 8.0.0

@eiriktsarpalis eiriktsarpalis added the enhancement Product code improvement that does NOT require public API changes/additions label Oct 21, 2022
@gregsdennis
Copy link
Contributor

Is matching JSON.Net really a concern anymore? It'll be 5 .net versions by the time this goes out, and it seems that STJ stands on its own at this point.

@eiriktsarpalis
Copy link
Member Author

Given that it is new behavior, it stands to reason that we do some degree of due diligence, so at the bare minimum any divergences are introduced intentionally rather than accidentally.

@YohDeadfall
Copy link
Contributor

When I worked on the policies I looked at other canonical implementations in various languages and libraries, so it's not an accidental implementation. Tests are taken from externals sources too to verify compatibility. So, I guess it would be better to verify the current behavior with different popular implementations like it's in TypeScript, Rust which can be used for frontend development too, and so on.

From my Npgsql time I remember that I tweaked the code taken by @roji or someone else before just to make more expectable results than Json.NET makes. I wouldn't align the new policies with that library.

@krwq
Copy link
Member

krwq commented Oct 21, 2022

I think we should

  • write down some interesting cases (common usage scenarios, cases with _, some weird F# naming, whatever we can think of where we might observe differences)
  • show how our implementations behave - are word splitting rules consistent, what are the differences (for the recently added we can just show 1 of them since they have exact same rules)
  • show how NSJ behaves, is it consistent between its own implementation? Is it behaving same as ours? When does it behave differently

then given all of the above we should decide next steps - we should really do what's most intuitive to users and that may be influenced by existing behavior or not. If it means breaking change in some obscure scenario I'm ok with that. We should have perf in mind when looking at any fixes to the current default policy - we don't want to have 5% regression in E2E scenario

@eiriktsarpalis eiriktsarpalis added the help wanted [up-for-grabs] Good issue for external contributors label Jul 21, 2023
@eiriktsarpalis eiriktsarpalis removed the help wanted [up-for-grabs] Good issue for external contributors label Aug 8, 2023
@eiriktsarpalis eiriktsarpalis self-assigned this Aug 8, 2023
@eiriktsarpalis
Copy link
Member Author

I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies. It found a number of divergences primarily related to the handling of non-letter characters. More specifically (assuming JsonNamingPolicy.SnakeCaseLower):

  • Certain non-letter characters are being trimmed, for instance the name _foo will be rendered foo, a% will be rendered a and _?#__ will be rendered as the empty string. The Json.NET naming strategy preserves the original names in all cases and doesn't trim characters.
    • This doesn't seem to happen if these characters lie between words, for example foo%bar will get normalized to foo_bar.
    • At the same time, non-punctuation symbols are not being trimmed, for instance the name $type will be rendered $type, matching the Json.NET output.
  • Multiple occurrences of the separator character between words will be normalized to a single occurrence. For example, the name __foo__bar__ will be rendered foo_bar by STJ (with Json.NET preserving the original __foo__bar__ value).

These divergences are concerning for a number of reasons:

  1. Some of these examples are valid .NET member identifiers and could be expected to be part of the JSON contract of a POCO.
  2. Naming policies can also be applied to dictionary key deserialization, where all non-null strings are valid keys.
  3. The trimming behavior could result in key collisions that fail deserialization or overwrite data in unexpected ways. For instance the names _foo and Foo both normalize to foo.
  4. Divergences in the policy would make STJ deserialization of data persisted by other Json.NET applications more difficult to achieve.

While I don't have reason to believe that the Json.NET implementation is more correct than what we current have, it's probably for the best that we try to emulate it as much as possible. I'll submit a PR changing the implementation and updating/extending tests shortly.

@YohDeadfall you mentioned earlier that the current implementation was following best practices from naming policy components in other platforms. Would it be possible to share a few of these and why you think we should be emulating them?

@YohDeadfall
Copy link
Contributor

I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies.

The world doesn't end on Microsoft and .NET with its ecosystem. While I understand that System.Text.Json is aimed to replace JSON.NET which is quite popular it's not the best thing ever. It's just what people are used to, but again, in case of cross platform communications things can be pretty screwed.

At the same time, non-punctuation symbols are not being trimmed, for instance the name $type will be rendered $type, matching the Json.NET output.

That might be unintentional since the original code was using Unicode text segmentation for word extraction and then applying a casing policy with a follow up concatenation.

Would it be possible to share a few of these and why you think we should be emulating them?

The research was done a while ago during my previous attempt on bringing naming policies to .NET: dotnet/corefx#41354 (comment). Scroll up and down to see other relevant comments. What I can recall at the moment that my work was also inspired by hecks crate in Rust because it was one of popular naming converters non-utilizing regular expressions in contradiction to what I saw for Python and JavaScript.

@eiriktsarpalis
Copy link
Member Author

I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies.

The world doesn't end on Microsoft and .NET with its ecosystem. While I understand that System.Text.Json is aimed to replace JSON.NET which is quite popular it's not the best thing ever. It's just what people are used to, but again, in case of cross platform communications things can be pretty screwed.

I would agree with that, and if there's a way we can improve interoperability with other platforms we should certainly be pursuing it. What I'm failing to understand though is how this could be achieved with a naming policy: it's a feature catering to code-first as opposed to schema-first approaches where JSON is typically either being roundtripped by the same serializer, or is called by clients explicitly using a schema that has been generated by the server's naming policy. I don't ever recall seeing a setup where, say, a Python client is calling a Java server where each peer uses its own naming policy with the expectation that the resultant JSON property names match -- that sounds like an extremely brittle configuration.

So if indeed this is a feature meant to cater to code-first .NET applications, my impression is that parity with Json.NET is much more valuable than the semantics offered by a similar feature in another platform.

Would it be possible to share a few of these and why you think we should be emulating them?

The research was done a while ago during my previous attempt on bringing naming policies to .NET: dotnet/corefx#41354 (comment).

Thanks for the link, it seems you've spent a long time studying this problem space. My takeaway from reading it is that the semantics are effectively identical to the Json.NET implementation, at least when it comes to run-of-the mill .NET member names that would get used in the 99% case (which is consistent with the findings of my fuzzing runs).

The only divergence really comes from the trimming/collapsing semantics for certain char categories. If I'm honest I can't see how this might be considered desirable behavior since (as you're pointing out in that older post) it increases the chances of collision. If its only purpose is to prettify the resultant property names then I think we can leave it out.

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 9, 2023
@JamesNK
Copy link
Member

JamesNK commented Aug 9, 2023

What Json.NET does isn't perfect. There have been a number of PRs that have contributed improvements to what Json.NET does and I haven't merged them because it's a breaking change to change property names.

@eiriktsarpalis
Copy link
Member Author

Are there any specific examples you have in mind? I skimmed through the Json.NET backlog and I mostly found variations of this problem. While it would be nice to fix, I think being able to read old data persisted by Json.NET using the same model types is a valuable goal in its own right.

@eiriktsarpalis
Copy link
Member Author

cc @roji who has also worked on this problem in the past.

@YohDeadfall
Copy link
Contributor

As far as I remember @roji took the code from JSON.NET for name translations in Npgsql and then I improved it there. Then as the result of my work opened a pull request in JSON.NET which you can see linked to the issue you mentioned.

@YohDeadfall
Copy link
Contributor

@khellang was on the same area with me previosuly, so I would call him actually.

@eiriktsarpalis
Copy link
Member Author

Here's an alternative implementation that addresses JamesNK/Newtonsoft.Json#1956 but still doesn't trim non-alphanumeric characters. It deviates from Json.NET implementation but still provides 1-1 mapping guarantees up to case insensitivity. The diff in the test file should highlight the semantic differences from Json.NET.

@YohDeadfall
Copy link
Contributor

While searching for answer on a different question in the first pull request, I accidentally stopped scrolling on dotnet/corefx#41354 (comment) which highlights that one of popular JavaScript name converters ignores punctuation while a package from Python keeps it.

That makes me think why don't support both behaviors which differ only by a switch in the implementation? That can be really valuable for all consumers and stop bike shedding around since there's no standard.

In my personal opinion I would go with trimming just because frontend development is done in JavaScript or TypeScript and interoperability without a headache is crucial here.

@eiriktsarpalis
Copy link
Member Author

eiriktsarpalis commented Aug 10, 2023

We can consider a offering a switch in the future, but there's no time to squeeze in new APIs for .NET 8. For now we need to make sure that we ship good defaults in the APIs already approved, and I think not skipping characters is a good default.

@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 11, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Json blocking-release enhancement Product code improvement that does NOT require public API changes/additions
Projects
None yet
5 participants