Validate that the new snake/kebab case naming policies match the Json.NET implementation. #77309
Comments
cc @YohDeadfall
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis

Issue details: We should probably add a few tests validating that the output matches that of the Json.NET implementation, particularly when it comes to nontrivial inputs. We have time until .NET 8 ships, so we can always introduce breaking changes if needed.

Originally posted by @eiriktsarpalis in #69613 (comment)
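For illustration, the kind of parity test the issue calls for might look roughly like the sketch below. It assumes .NET 8's JsonNamingPolicy.SnakeCaseLower and Json.NET's SnakeCaseNamingStrategy; the inputs are made-up examples rather than the actual test corpus.

```csharp
using System.Text.Json;
using Newtonsoft.Json.Serialization;
using Xunit;

public class NamingPolicyParityTests
{
    // A handful of nontrivial inputs; a real test corpus would be much larger.
    [Theory]
    [InlineData("SimpleProperty")]
    [InlineData("XMLHttpRequest")]
    [InlineData("Already_Snake_Case")]
    [InlineData("Version2Update")]
    public void SnakeCaseLower_MatchesJsonNet(string name)
    {
        string stj = JsonNamingPolicy.SnakeCaseLower.ConvertName(name);
        string jsonNet = new SnakeCaseNamingStrategy().GetPropertyName(name, hasSpecifiedName: false);

        Assert.Equal(jsonNet, stj);
    }
}
```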
Is matching Json.NET really a concern anymore? It'll be five .NET versions by the time this goes out, and it seems that STJ stands on its own at this point.
Given that it is new behavior, it stands to reason that we do some degree of due diligence, so that at the bare minimum any divergences are introduced intentionally rather than accidentally.
When I worked on the policies I looked at other canonical implementations in various languages and libraries, so it's not an accidental implementation. Tests were taken from external sources too to verify compatibility. So, I guess it would be better to verify the current behavior against different popular implementations, such as those in TypeScript and Rust, which can be used for frontend development too, and so on. From my Npgsql time I remember that I tweaked the code taken from @roji or someone else just to produce more predictable results than Json.NET does. I wouldn't align the new policies with that library.
I think we should
then given all of the above we should decide next steps - we should really do what's most intuitive to users, and that may be influenced by existing behavior or not. If it means a breaking change in some obscure scenario, I'm OK with that. We should keep perf in mind when looking at any fixes to the current default policy - we don't want a 5% regression in an E2E scenario.
I did a fuzzing run on the new naming policies, comparing their outputs with those of the equivalent Json.NET policies. It found a number of divergences primarily related to the handling of non-letter characters. More specifically (assuming
These divergences are concerning for a number of reasons:
While I don't have reason to believe that the Json.NET implementation is more correct than what we currently have, it's probably for the best that we try to emulate it as much as possible. I'll submit a PR changing the implementation and updating/extending tests shortly. @YohDeadfall you mentioned earlier that the current implementation was following best practices from naming policy components in other platforms. Would it be possible to share a few of these and explain why you think we should be emulating them?
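For context, a fuzz-style comparison along those lines can be as simple as the hypothetical harness below (not the one actually used): it feeds short random strings built from letters, digits, and a few separator characters into both converters and prints any mismatches.

```csharp
using System;
using System.Text.Json;
using Newtonsoft.Json.Serialization;

class NamingPolicyFuzzer
{
    static void Main()
    {
        // The pool deliberately mixes cases, digits, and non-letter characters,
        // since that is where the divergences tend to show up.
        const string alphabet = "abcXYZ09_- .";
        var random = new Random(42);
        var jsonNet = new SnakeCaseNamingStrategy();

        for (int i = 0; i < 100_000; i++)
        {
            char[] chars = new char[random.Next(1, 12)];
            for (int j = 0; j < chars.Length; j++)
                chars[j] = alphabet[random.Next(alphabet.Length)];

            string input = new string(chars);
            string stj = JsonNamingPolicy.SnakeCaseLower.ConvertName(input);
            string newtonsoft = jsonNet.GetPropertyName(input, hasSpecifiedName: false);

            if (stj != newtonsoft)
                Console.WriteLine($"'{input}' -> STJ: '{stj}', Json.NET: '{newtonsoft}'");
        }
    }
}
```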
The world doesn't end at Microsoft and .NET with its ecosystem. While I understand that
That might be unintentional since the original code was using Unicode text segmentation for word extraction and then applying a casing policy with a follow-up concatenation.
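As a much-simplified sketch of that general shape (real Unicode text segmentation is more involved, and this is not the actual implementation): split the input into words, lower-case each word, and join them with the separator.

```csharp
using System.Collections.Generic;
using System.Globalization;

static class SeparatorCaseSketch
{
    // Splits on explicit separators and lower-to-upper transitions,
    // lower-cases each extracted word, and joins with the separator.
    public static string Convert(string name, char separator)
    {
        var words = new List<string>();
        int start = 0;

        for (int i = 1; i <= name.Length; i++)
        {
            bool boundary =
                i == name.Length ||
                name[i] == '_' || name[i] == '-' || name[i] == ' ' ||
                (char.IsUpper(name[i]) && !char.IsUpper(name[i - 1]));

            if (boundary)
            {
                string word = name.Substring(start, i - start).Trim('_', '-', ' ');
                if (word.Length > 0)
                    words.Add(word.ToLower(CultureInfo.InvariantCulture));
                start = i;
            }
        }

        return string.Join(separator, words);
    }
}

// SeparatorCaseSketch.Convert("SimpleProperty", '_') -> "simple_property"
// SeparatorCaseSketch.Convert("SimpleProperty", '-') -> "simple-property"
```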
The research was done a while ago during my previous attempt at bringing naming policies to .NET: dotnet/corefx#41354 (comment). Scroll up and down to see other relevant comments. What I can recall at the moment is that my work was also inspired by the heck crate in Rust, because it was one of the popular naming converters that doesn't rely on regular expressions, in contrast to what I saw for Python and JavaScript.
I would agree with that, and if there's a way we can improve interoperability with other platforms we should certainly be pursuing it. What I'm failing to understand, though, is how this could be achieved with a naming policy: it's a feature catering to code-first as opposed to schema-first approaches, where JSON is typically either roundtripped by the same serializer or consumed by clients explicitly using a schema that has been generated via the server's naming policy. I don't ever recall seeing a setup where, say, a Python client is calling a Java server and each peer uses its own naming policy with the expectation that the resultant JSON property names match -- that sounds like an extremely brittle configuration. So if indeed this is a feature meant to cater to code-first .NET applications, my impression is that parity with Json.NET is much more valuable than the semantics offered by a similar feature in another platform.
Thanks for the link, it seems you've spent a long time studying this problem space. My takeaway from reading it is that the semantics are effectively identical to the Json.NET implementation, at least when it comes to run-of-the-mill .NET member names that would get used in the 99% case (which is consistent with the findings of my fuzzing runs). The only divergence really comes from the trimming/collapsing semantics for certain char categories. If I'm honest, I can't see how this might be considered desirable behavior since (as you pointed out in that older post) it increases the chances of collision. If its only purpose is to prettify the resultant property names then I think we can leave it out.
What Json.NET does isn't perfect. There have been a number of PRs contributing improvements to its behavior, and I haven't merged them because changing property names is a breaking change.
Are there any specific examples you have in mind? I skimmed through the Json.NET backlog and I mostly found variations of this problem. While it would be nice to fix, I think being able to read old data persisted by Json.NET using the same model types is a valuable goal in its own right.
cc @roji who has also worked on this problem in the past.
As far as I remember, @roji took the code from Json.NET for name translations in Npgsql and then I improved it there. Then, as a result of that work, I opened a pull request in Json.NET which you can see linked to the issue you mentioned.
@khellang worked in the same area with me previously, so I would actually ping him as well.
Here's an alternative implementation that addresses JamesNK/Newtonsoft.Json#1956 but still doesn't trim non-alphanumeric characters. It deviates from the Json.NET implementation but still provides 1-to-1 mapping guarantees up to case insensitivity. The diff in the test file should highlight the semantic differences from Json.NET.
While searching for an answer to a different question in the first pull request, I accidentally stopped scrolling at dotnet/corefx#41354 (comment), which highlights that one of the popular JavaScript name converters ignores punctuation while a package from Python keeps it. That makes me wonder: why not support both behaviors, which differ only by a switch in the implementation? That could be really valuable for all consumers and stop the bikeshedding, since there's no standard. Personally, I would go with trimming, just because frontend development is done in JavaScript or TypeScript and interoperability without a headache is crucial there.
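To make the idea concrete, such a switch could have roughly the shape below. The method and flag names are invented for illustration and this is not a proposed System.Text.Json API; the flag just toggles between dropping punctuation (JavaScript-converter style) and keeping it verbatim (Python-package style).

```csharp
using System.Text;

static class SnakeCaseSwitchSketch
{
    // trimPunctuation: true drops non-alphanumeric characters entirely,
    // false keeps them verbatim between words.
    public static string ToSnakeCase(string name, bool trimPunctuation)
    {
        var sb = new StringBuilder();

        foreach (char c in name)
        {
            if (char.IsUpper(c))
            {
                if (sb.Length > 0 && sb[sb.Length - 1] != '_')
                    sb.Append('_');
                sb.Append(char.ToLowerInvariant(c));
            }
            else if (char.IsLetterOrDigit(c))
            {
                sb.Append(c);
            }
            else if (!trimPunctuation)
            {
                sb.Append(c); // keep punctuation as-is
            }
            // else: drop the character entirely
        }

        return sb.ToString();
    }
}

// ToSnakeCase("Foo.BarBaz", trimPunctuation: true)  -> "foo_bar_baz"
// ToSnakeCase("Foo.BarBaz", trimPunctuation: false) -> "foo._bar_baz"
```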
We can consider offering a switch in the future, but there's no time to squeeze in new APIs for .NET 8. For now we need to make sure that we ship good defaults in the APIs already approved, and I think not skipping characters is a good default.