Validate that the new snake/kebab case naming policies match the Json.NET implementation. #77309

eiriktsarpalis · 2022-10-21T09:57:03Z

We should probably add a few tests validating that the output matches that of the Json.NET implementation, particularly when it comes to nontrivial inputs. We have time until .NET 8 ships, so we can always introduce breaking changes if needed.

Originally posted by @eiriktsarpalis in #69613 (comment)

dotnet-issue-labeler · 2022-10-21T09:57:06Z

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

eiriktsarpalis · 2022-10-21T09:57:50Z

cc @YohDeadfall

ghost · 2022-10-21T09:58:15Z

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Author:	eiriktsarpalis
Assignees:	krwq
Labels:	`area-System.Text.Json`
Milestone:	8.0.0

gregsdennis · 2022-10-21T10:02:09Z

Is matching JSON.Net really a concern anymore? It'll be 5 .net versions by the time this goes out, and it seems that STJ stands on its own at this point.

eiriktsarpalis · 2022-10-21T10:08:34Z

Given that it is new behavior, it stands to reason that we do some degree of due diligence, so at the bare minimum any divergences are introduced intentionally rather than accidentally.

YohDeadfall · 2022-10-21T10:15:17Z

When I worked on the policies I looked at other canonical implementations in various languages and libraries, so it's not an accidental implementation. The tests are taken from external sources too, to verify compatibility. So I guess it would be better to verify the current behavior against different popular implementations, such as those in TypeScript and Rust (which can be used for frontend development too), and so on.

From my Npgsql time I remember that I tweaked the code taken by @roji or someone else before, just to get more predictable results than Json.NET produces. I wouldn't align the new policies with that library.

krwq · 2022-10-21T10:16:57Z

I think we should:

- write down some interesting cases (common usage scenarios, cases with `_`, some weird F# naming, whatever we can think of where we might observe differences)
- show how our implementations behave: are the word splitting rules consistent, and what are the differences (for the recently added policies we can just show one of them, since they have the exact same rules)
- show how NSJ behaves: is it consistent within its own implementation? Does it behave the same as ours? When does it behave differently?

Then, given all of the above, we should decide on next steps. We should really do what's most intuitive to users, and that may or may not be influenced by existing behavior. If it means a breaking change in some obscure scenario, I'm OK with that. We should keep perf in mind when looking at any fixes to the current default policy: we don't want a 5% regression in an E2E scenario.
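A comparison like the one proposed here could be sketched as a small table-driven harness. This is only an illustration in Python: `boundary_only`, `boundary_and_strip`, `find_divergences`, and the case list are hypothetical names invented for the sketch, and neither policy is the System.Text.Json or Json.NET algorithm.

```python
import re

def boundary_only(name: str) -> str:
    # Hypothetical policy: insert "_" at lower/digit -> upper boundaries,
    # lowercase the result, and keep every other character untouched.
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()

def boundary_and_strip(name: str) -> str:
    # Hypothetical policy: same as above, but also strips leading and
    # trailing underscores.
    return boundary_only(name).strip("_")

def find_divergences(cases, policies):
    """Return (input, {policy label: output}) for inputs where policies disagree."""
    rows = []
    for case in cases:
        outputs = {label: policy(case) for label, policy in policies.items()}
        if len(set(outputs.values())) > 1:
            rows.append((case, outputs))
    return rows

# Illustrative inputs only; a real run would use a much larger curated list.
CASES = ["fooBar", "FooBar", "_foo", "foo_", "foo_bar"]
POLICIES = {"boundary_only": boundary_only, "boundary_and_strip": boundary_and_strip}

for case, outputs in find_divergences(CASES, POLICIES):
    print(repr(case), outputs)
```

Only the inputs on which the policies disagree are reported, which keeps the review focused on the cases that need a decision.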

eiriktsarpalis · 2023-08-08T17:58:30Z

I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies. It found a number of divergences primarily related to the handling of non-letter characters. More specifically (assuming JsonNamingPolicy.SnakeCaseLower):

- Certain non-letter characters are being trimmed: for instance, the name `_foo` will be rendered `foo`, `a%` will be rendered `a`, and `_?#__` will be rendered as the empty string. The Json.NET naming strategy preserves the original names in all cases and doesn't trim characters.
  - This doesn't seem to happen if these characters lie between words, for example `foo%bar` will get normalized to `foo_bar`.
  - At the same time, non-punctuation symbols are not being trimmed, for instance the name `$type` will be rendered `$type`, matching the Json.NET output.
- Multiple occurrences of the separator character between words will be normalized to a single occurrence. For example, the name `__foo__bar__` will be rendered `foo_bar` by STJ (with Json.NET preserving the original `__foo__bar__` value).
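The divergences listed above can be reproduced with a rough Python model. This is a sketch under stated assumptions, not the actual System.Text.Json or Json.NET code: `snake_trimming` assumes that whitespace and Unicode punctuation act as word separators and are dropped, while `snake_preserving` assumes a policy that only ever inserts separators. Both function names are hypothetical.

```python
import unicodedata

def is_separator(ch: str) -> bool:
    # Assumption: whitespace and Unicode punctuation (general categories P*)
    # separate words. "$" is a currency symbol (Sc), so it is not a separator.
    return ch.isspace() or unicodedata.category(ch).startswith("P")

def snake_trimming(name: str) -> str:
    """Hypothetical 'split into words, lowercase, join' policy; separators are dropped."""
    words, current = [], ""
    for ch in name:
        if is_separator(ch):
            if current:
                words.append(current)
                current = ""
            continue
        # A lower/digit -> upper transition also starts a new word.
        if current and ch.isupper() and (current[-1].islower() or current[-1].isdigit()):
            words.append(current)
            current = ""
        current += ch
    if current:
        words.append(current)
    return "_".join(word.lower() for word in words)

def snake_preserving(name: str) -> str:
    """Hypothetical policy that only inserts separators and never removes characters."""
    out = []
    for i, ch in enumerate(name):
        if i > 0 and ch.isupper() and (name[i - 1].islower() or name[i - 1].isdigit()):
            out.append("_")
        out.append(ch.lower())
    return "".join(out)

for name in ["_foo", "a%", "_?#__", "foo%bar", "$type", "__foo__bar__"]:
    print(repr(name), repr(snake_trimming(name)), repr(snake_preserving(name)))
```

In this model the trimming variant loses leading, trailing, and repeated punctuation because those characters never become part of a word, which is the same shape of behavior the fuzzing run reported.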

These divergences are concerning for a number of reasons:

- Some of these examples are valid .NET member identifiers and could be expected to be part of the JSON contract of a POCO.
- Naming policies can also be applied to dictionary key deserialization, where all non-null strings are valid keys.
- The trimming behavior could result in key collisions that fail deserialization or overwrite data in unexpected ways. For instance the names `_foo` and `Foo` both normalize to `foo`.
- Divergences in the policy would make STJ deserialization of data persisted by other Json.NET applications more difficult to achieve.
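The collision concern can be checked mechanically by grouping member names by their converted key. The following is a minimal Python sketch; `trimming_policy`, `preserving_policy`, and `find_collisions` are hypothetical stand-ins chosen for illustration and do not reproduce either library's conversion rules.

```python
from collections import defaultdict

def trimming_policy(name: str) -> str:
    # Hypothetical stand-in: drops leading/trailing underscores, then lowercases.
    return name.strip("_").lower()

def preserving_policy(name: str) -> str:
    # Hypothetical stand-in: lowercases but keeps every character.
    return name.lower()

def find_collisions(names, policy):
    """Group names by converted key and keep only groups with more than one member."""
    groups = defaultdict(list)
    for name in names:
        groups[policy(name)].append(name)
    return {key: members for key, members in groups.items() if len(members) > 1}

members = ["_foo", "Foo", "Bar"]
print(find_collisions(members, trimming_policy))    # {'foo': ['_foo', 'Foo']}
print(find_collisions(members, preserving_policy))  # {}
```

A check of this kind could be run over the member names of a model type to see whether a given policy maps two of them to the same JSON property name.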

While I don't have reason to believe that the Json.NET implementation is more correct than what we currently have, it's probably for the best that we try to emulate it as much as possible. I'll submit a PR changing the implementation and updating/extending tests shortly.

@YohDeadfall you mentioned earlier that the current implementation was following best practices from naming policy components in other platforms. Would it be possible to share a few of these and why you think we should be emulating them?

YohDeadfall · 2023-08-08T19:03:41Z

> I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies.

The world doesn't end on Microsoft and .NET with its ecosystem. While I understand that System.Text.Json is aimed to replace JSON.NET which is quite popular it's not the best thing ever. It's just what people are used to, but again, in case of cross platform communications things can be pretty screwed.

> At the same time, non-punctuation symbols are not being trimmed, for instance the name $type will be rendered $type, matching the Json.NET output.

That might be unintentional since the original code was using Unicode text segmentation for word extraction and then applying a casing policy with a follow up concatenation.

> Would it be possible to share a few of these and why you think we should be emulating them?

The research was done a while ago during my previous attempt at bringing naming policies to .NET: dotnet/corefx#41354 (comment). Scroll up and down to see other relevant comments. What I can recall at the moment is that my work was also inspired by the heck crate in Rust, because it was one of the popular naming converters that doesn't use regular expressions, in contrast to what I saw for Python and JavaScript.

eiriktsarpalis · 2023-08-08T22:15:39Z

> > I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies.
>
> The world doesn't end on Microsoft and .NET with its ecosystem. While I understand that System.Text.Json is aimed to replace JSON.NET which is quite popular it's not the best thing ever. It's just what people are used to, but again, in case of cross platform communications things can be pretty screwed.

I would agree with that, and if there's a way we can improve interoperability with other platforms we should certainly be pursuing it. What I'm failing to understand though is how this could be achieved with a naming policy: it's a feature catering to code-first as opposed to schema-first approaches where JSON is typically either being roundtripped by the same serializer, or is called by clients explicitly using a schema that has been generated by the server's naming policy. I don't ever recall seeing a setup where, say, a Python client is calling a Java server where each peer uses its own naming policy with the expectation that the resultant JSON property names match -- that sounds like an extremely brittle configuration.

So if indeed this is a feature meant to cater to code-first .NET applications, my impression is that parity with Json.NET is much more valuable than the semantics offered by a similar feature in another platform.

> > Would it be possible to share a few of these and why you think we should be emulating them?
>
> The research was done a while ago during my previous attempt on bringing naming policies to .NET: dotnet/corefx#41354 (comment).

Thanks for the link, it seems you've spent a long time studying this problem space. My takeaway from reading it is that the semantics are effectively identical to the Json.NET implementation, at least when it comes to run-of-the-mill .NET member names that would get used in the 99% case (which is consistent with the findings of my fuzzing runs).

The only divergence really comes from the trimming/collapsing semantics for certain char categories. If I'm honest I can't see how this might be considered desirable behavior since (as you're pointing out in that older post) it increases the chances of collision. If its only purpose is to prettify the resultant property names then I think we can leave it out.

JamesNK · 2023-08-09T23:12:57Z

What Json.NET does isn't perfect. There have been a number of PRs that have contributed improvements to what Json.NET does and I haven't merged them because it's a breaking change to change property names.

eiriktsarpalis · 2023-08-10T09:09:45Z

Are there any specific examples you have in mind? I skimmed through the Json.NET backlog and I mostly found variations of this problem. While it would be nice to fix, I think being able to read old data persisted by Json.NET using the same model types is a valuable goal in its own right.

eiriktsarpalis · 2023-08-10T11:05:58Z

cc @roji who has also worked on this problem in the past.

YohDeadfall · 2023-08-10T11:11:59Z

As far as I remember, @roji took the code from JSON.NET for name translations in Npgsql and then I improved it there. Then, as a result of that work, I opened a pull request in JSON.NET, which you can see linked to the issue you mentioned.

YohDeadfall · 2023-08-10T11:26:39Z

@khellang previously worked in the same area as me, so I would actually ask him.

eiriktsarpalis · 2023-08-10T13:33:08Z

Here's an alternative implementation that addresses JamesNK/Newtonsoft.Json#1956 but still doesn't trim non-alphanumeric characters. It deviates from the Json.NET implementation but still provides 1-1 mapping guarantees up to case insensitivity. The diff in the test file should highlight the semantic differences from Json.NET.

YohDeadfall · 2023-08-10T14:35:25Z

While searching for an answer to a different question in the first pull request, I happened to stop scrolling at dotnet/corefx#41354 (comment), which highlights that one of the popular JavaScript name converters ignores punctuation while a Python package keeps it.

That makes me wonder: why not support both behaviors, which differ only by a switch in the implementation? That could be really valuable for all consumers and would stop the bikeshedding, since there's no standard.

In my personal opinion I would go with trimming, just because frontend development is done in JavaScript or TypeScript and interoperability without a headache is crucial here.

eiriktsarpalis · 2023-08-10T15:06:00Z

We can consider offering a switch in the future, but there's no time to squeeze in new APIs for .NET 8. For now we need to make sure that we ship good defaults in the APIs already approved, and I think not skipping characters is a good default.

ghost added the untriaged New issue has not been triaged by the area owner label Oct 21, 2022

eiriktsarpalis assigned krwq Oct 21, 2022

eiriktsarpalis added this to the 8.0.0 milestone Oct 21, 2022

ghost removed the untriaged New issue has not been triaged by the area owner label Oct 21, 2022

eiriktsarpalis added the area-System.Text.Json label Oct 21, 2022

eiriktsarpalis added the enhancement Product code improvement that does NOT require public API changes/additions label Oct 21, 2022

eiriktsarpalis unassigned krwq Jan 23, 2023

eiriktsarpalis added the blocking-release label May 31, 2023

eiriktsarpalis added the help wanted [up-for-grabs] Good issue for external contributors label Jul 21, 2023

eiriktsarpalis removed the help wanted [up-for-grabs] Good issue for external contributors label Aug 8, 2023

eiriktsarpalis self-assigned this Aug 8, 2023

eiriktsarpalis mentioned this issue Aug 9, 2023

Rework JsonNamingPolicy.SnakeCase/KebabCase to precisely match Json.NET semantics. #90248

Closed

ghost added the in-pr There is an active PR which will close this issue when it is merged label Aug 9, 2023

eiriktsarpalis mentioned this issue Aug 10, 2023

Rework SnakeCase/KebabCase naming policies to closer match Json.NET's #90316

Merged

eiriktsarpalis closed this as completed in #90316 Aug 11, 2023

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Aug 11, 2023

ghost locked as resolved and limited conversation to collaborators Sep 10, 2023
