-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Rework JsonNamingPolicy.SnakeCase/KebabCase to precisely match Json.NET semantics. #90248
Conversation
Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis Issue DetailsSee #77309 (comment) and #77309 (comment) for a justification for this change. I fuzzed the new implementation using FsCheck with 1M inputs, confirming that it matches its Json.NET equivalent. cc @JamesNK @YohDeadfall @jeffhandley Fix #77309.
|
[InlineData("snA__ kEcAsE", "SnA__ kEcAsE")] | ||
[InlineData("already_snake_case_ ", "already_snake_case_ ")] | ||
[InlineData("isJSONProperty", "IsJSONProperty")] | ||
[InlineData("shoutinG_CASE", "SHOUTING_CASE")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That case looks incorrect in my opinion even if it's how JSON.NET does that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's the camelCase policy though. It's been shipped for a while so there's no way that can change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I remember @kwrq had a ticket about making a breaking change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Honestly, such a change would be a very high-risk, low-reward move. Why break users like that?
[InlineData("CAMEL-CASE", "camelCase")] | ||
[InlineData("CAMEL-CASE", "CamelCase")] | ||
[InlineData("SNAKE_CASE", "snake_case")] | ||
[InlineData("SNAK-E_-CASE", "SNAKE_CASE")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The same. While I understand you intentions on moving devs from JSON.NET to System.Text.Json
cannot agree on such cases because it's not an improvement, but bug porting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bugward compatibility is important sometimes, but I wouldn't necessarily agree that this behavior is wrong. This is testing for a naming policy designed to map either PascalCase or camelCase identifiers to KEBAB-CASE, why should it try to be smart and also convert/collapse underscores? (at the expense of unintended collisions happening in other contexts?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I meant a different thing then. It adds a separator where it shouldn't be. As a developer I want to see SNAKE_CASE
instead of SNAK-E_CASE
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is similar to how SNAKEsCASE
will render as SNAK-ES-CASE
. In other words _
has no special meaning in the context of a kebab case naming strategy (it only does in snake_case). I can see why one might expect that we should try to be smart and also convert snake_case values to kebab-case, however:
- It has the side-effect of punctuation chars being collapsed to the separator value, increasing the chance of naming collisions. Json.NET has the nice property of being a 1-1 mapping (up to case insensitivity and whitespace), which substantially reduces that risk.
- .NET uses PascalCase for properties and camelCase for fields. Snake_case identifiers trigger analyzer warnings and kebab-case is outright illegal in C#. It doesn't make a lot of sense to optimize for something that is unidiomatic in the context of .NET
- As strange as
SNAK-E_-CASE
might look, similar weirdness arises in the current implementation if you know where to look. As far as I'm concerned all naming policies are magic and as such they can be unpredictable in certain cases.
Not a popular opinion, but have you thought about making a poll on Mastodon or Twitter? Can make that research if you wish. |
[InlineData("abc", "abc ")] | ||
[InlineData("abc", " abc ")] | ||
[InlineData("abc_def", " abc def ")] | ||
[InlineData("_abc", "_abc")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not evident from the diff, but newly added test cases start here.
result.Slice(0, resultUsedLength).Clear(); | ||
ArrayPool<char>.Shared.Return(rentedBuffer); | ||
} | ||
// Rented buffer 20% longer that the input. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
// Rented buffer 20% longer that the input. | |
// Rented buffer 20% longer than the input. |
Where did the 20% number come from? Is that arbitrary / a guess as to how much it might expand? It'd be worth expanding the comment a bit to explain its genesis.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Long story short, I used the median length of words in English as a base. It was discussed somewhere in dotnet/corefx#41354 I guess.
ArrayPool<char>.Shared.Return(rentedBuffer); | ||
} | ||
// Rented buffer 20% longer that the input. | ||
int initialBufferLength = (12 * name.Length) / 10; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not the simpler:
int initialBufferLength = (12 * name.Length) / 10; | |
int initialBufferLength = (int)(name.Length * 1.2); |
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would guess it was trying to avoid floats and use integral division only? I don't think it matters much.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's correct, my intention was about having no float math due to mathematical errors, but on small scales it shouldn't matter.
return; | ||
} | ||
char current = chars[i]; | ||
UnicodeCategory category = char.GetUnicodeCategory(current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we need to do anything special for surrogates?
continue; | ||
} | ||
if (!lowercase) | ||
current = char.ToUpperInvariant(current); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: Let's keep bracing consistent in the same method. Just above this there's a one-line body that uses braces, it'd be nice to do so here, too.
rentedBuffer = newBuffer; | ||
result = rentedBuffer; | ||
} | ||
ReadOnlySpan<char> chars = name.AsSpan(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is this local needed rather than all uses of chars
below just using name
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like a leftover here since the proposed implementation operates on separate characters rather than on words as the original does which brings an issue because characters have one-to-one mapping, but that's false for strings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just as a reference for all of those who are reviewing the pull request, here are two implementations I made before migrating policies to .NET:
-
Without true word segmentation according to the Unicode rules (the current mine and what you have in
main
): https://github.com/YohDeadfall/Yoh.Text.Json.NamingPolicies/blob/6ca9e48cf9bf1dc0d1b15ac1df46bfc02db10557/src/JsonSeparatorNamingPolicy.cs -
With true word segmentation according to the Unicode rules: https://github.com/YohDeadfall/Yoh.Text.Json.NamingPolicies/blob/cfbf445538d4ff8410b8584ab75009d4368777c9/src/JsonSimpleNamingPolicy.cs
The last one was the most correct implementation, but I had to remove Unicode text segmentation due to my plans on merging that work into .NET. It required a bit of side code because there are no text segmentation support (discussed with Levi Broderick on Twitter, he thought that won't find use case) and due to having worse performance than without it while producing same outputs for most important cases.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I do like this better than the previous PR, especially when considering the cases cited in StringUtils.ToSnakeCase fails for many common scenarios · Issue #1956 · JamesNK/Newtonsoft.Json
Assert.AreEqual("abc1", StringUtils.ToSnakeCase("ABC1")); // FAILS, produces ab_c1
Yuck 😆
I would prefer not to backport the Rune
code though, and I'd rather we consistently not handle surrogate pairs than handle them in some naming policies but not others.
We've decided to close this approach in favor of #90316. |
See #77309 (comment) and #77309 (comment) for a justification for this change.
I fuzzed the new implementation using FsCheck with 1M inputs, confirming that it matches its Json.NET equivalent.
cc @JamesNK @YohDeadfall @jeffhandley
Fix #77309.