Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Rework JsonNamingPolicy.SnakeCase/KebabCase to precisely match Json.NET semantics. #90248

Closed

Conversation

eiriktsarpalis
Copy link
Member

See #77309 (comment) and #77309 (comment) for a justification for this change.

I fuzzed the new implementation using FsCheck with 1M inputs, confirming that it matches its Json.NET equivalent.

cc @JamesNK @YohDeadfall @jeffhandley

Fix #77309.

@ghost
Copy link

ghost commented Aug 9, 2023

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

See #77309 (comment) and #77309 (comment) for a justification for this change.

I fuzzed the new implementation using FsCheck with 1M inputs, confirming that it matches its Json.NET equivalent.

cc @JamesNK @YohDeadfall @jeffhandley

Fix #77309.

Author: eiriktsarpalis
Assignees: -
Labels:

area-System.Text.Json

Milestone: -

@eiriktsarpalis eiriktsarpalis added this to the 8.0.0 milestone Aug 9, 2023
[InlineData("snA__ kEcAsE", "SnA__ kEcAsE")]
[InlineData("already_snake_case_ ", "already_snake_case_ ")]
[InlineData("isJSONProperty", "IsJSONProperty")]
[InlineData("shoutinG_CASE", "SHOUTING_CASE")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That case looks incorrect in my opinion even if it's how JSON.NET does that.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's the camelCase policy though. It's been shipped for a while so there's no way that can change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I remember @kwrq had a ticket about making a breaking change.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Honestly, such a change would be a very high-risk, low-reward move. Why break users like that?

[InlineData("CAMEL-CASE", "camelCase")]
[InlineData("CAMEL-CASE", "CamelCase")]
[InlineData("SNAKE_CASE", "snake_case")]
[InlineData("SNAK-E_-CASE", "SNAKE_CASE")]
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The same. While I understand you intentions on moving devs from JSON.NET to System.Text.Json cannot agree on such cases because it's not an improvement, but bug porting.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bugward compatibility is important sometimes, but I wouldn't necessarily agree that this behavior is wrong. This is testing for a naming policy designed to map either PascalCase or camelCase identifiers to KEBAB-CASE, why should it try to be smart and also convert/collapse underscores? (at the expense of unintended collisions happening in other contexts?)

Copy link
Contributor

@YohDeadfall YohDeadfall Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I meant a different thing then. It adds a separator where it shouldn't be. As a developer I want to see SNAKE_CASE instead of SNAK-E_CASE.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is similar to how SNAKEsCASE will render as SNAK-ES-CASE. In other words _ has no special meaning in the context of a kebab case naming strategy (it only does in snake_case). I can see why one might expect that we should try to be smart and also convert snake_case values to kebab-case, however:

  1. It has the side-effect of punctuation chars being collapsed to the separator value, increasing the chance of naming collisions. Json.NET has the nice property of being a 1-1 mapping (up to case insensitivity and whitespace), which substantially reduces that risk.
  2. .NET uses PascalCase for properties and camelCase for fields. Snake_case identifiers trigger analyzer warnings and kebab-case is outright illegal in C#. It doesn't make a lot of sense to optimize for something that is unidiomatic in the context of .NET
  3. As strange as SNAK-E_-CASE might look, similar weirdness arises in the current implementation if you know where to look. As far as I'm concerned all naming policies are magic and as such they can be unpredictable in certain cases.

@YohDeadfall
Copy link
Contributor

Not a popular opinion, but have you thought about making a poll on Mastodon or Twitter? Can make that research if you wish.

[InlineData("abc", "abc ")]
[InlineData("abc", " abc ")]
[InlineData("abc_def", " abc def ")]
[InlineData("_abc", "_abc")]
Copy link
Member Author

@eiriktsarpalis eiriktsarpalis Aug 9, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not evident from the diff, but newly added test cases start here.

@YohDeadfall
Copy link
Contributor

Made a tiny survey on Mastodon and Twitter. Will last for 3 days.

result.Slice(0, resultUsedLength).Clear();
ArrayPool<char>.Shared.Return(rentedBuffer);
}
// Rented buffer 20% longer that the input.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// Rented buffer 20% longer that the input.
// Rented buffer 20% longer than the input.

Where did the 20% number come from? Is that arbitrary / a guess as to how much it might expand? It'd be worth expanding the comment a bit to explain its genesis.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Long story short, I used the median length of words in English as a base. It was discussed somewhere in dotnet/corefx#41354 I guess.

ArrayPool<char>.Shared.Return(rentedBuffer);
}
// Rented buffer 20% longer that the input.
int initialBufferLength = (12 * name.Length) / 10;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not the simpler:

Suggested change
int initialBufferLength = (12 * name.Length) / 10;
int initialBufferLength = (int)(name.Length * 1.2);

?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would guess it was trying to avoid floats and use integral division only? I don't think it matters much.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's correct, my intention was about having no float math due to mathematical errors, but on small scales it shouldn't matter.

return;
}
char current = chars[i];
UnicodeCategory category = char.GetUnicodeCategory(current);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need to do anything special for surrogates?

continue;
}
if (!lowercase)
current = char.ToUpperInvariant(current);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: Let's keep bracing consistent in the same method. Just above this there's a one-line body that uses braces, it'd be nice to do so here, too.

rentedBuffer = newBuffer;
result = rentedBuffer;
}
ReadOnlySpan<char> chars = name.AsSpan();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this local needed rather than all uses of chars below just using name?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks like a leftover here since the proposed implementation operates on separate characters rather than on words as the original does which brings an issue because characters have one-to-one mapping, but that's false for strings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just as a reference for all of those who are reviewing the pull request, here are two implementations I made before migrating policies to .NET:

The last one was the most correct implementation, but I had to remove Unicode text segmentation due to my plans on merging that work into .NET. It required a bit of side code because there are no text segmentation support (discussed with Levi Broderick on Twitter, he thought that won't find use case) and due to having worse performance than without it while producing same outputs for most important cases.

Copy link
Member

@jeffhandley jeffhandley left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I do like this better than the previous PR, especially when considering the cases cited in StringUtils.ToSnakeCase fails for many common scenarios · Issue #1956 · JamesNK/Newtonsoft.Json

Assert.AreEqual("abc1", StringUtils.ToSnakeCase("ABC1")); // FAILS, produces ab_c1

Yuck 😆

I would prefer not to backport the Rune code though, and I'd rather we consistently not handle surrogate pairs than handle them in some naming policies but not others.

@eiriktsarpalis
Copy link
Member Author

We've decided to close this approach in favor of #90316.

@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Validate that the new snake/kebab case naming policies match the Json.NET implementation.
4 participants