-
Notifications
You must be signed in to change notification settings - Fork 4.9k
Added snake case naming policy to the JSON serializer #41354
Conversation
public static class SnakeCaseUnitTests | ||
{ | ||
[Fact] | ||
public static void ToSnakeCaseTest() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on discussion from the API review, we should add tests for non-ascii text and for text with surrogate pairs (including invalid surrogate pairs) and also validate (at least locally - but I would be ok with checking it in) that the output matches Newtonsoft.Json.
I was working on some testing for CamelCase, so you could try and leverage that as a starting point. Note that for non-ascii characters, we wouldn't match Newtonsoft.Json exactly due to the default escaping policy in S.T.Json (feel free to either override that or loosen the assert condition for the tests).
[Theory]
[InlineData("lowerCase")]
[InlineData("UpperCase")]
[InlineData("UPperCase")]
[InlineData("lower Case")]
[InlineData("loweR Case")]
[InlineData("Upper Case")]
[InlineData("UppeR Case")]
[InlineData("lower case")]
[InlineData("loweR case")]
[InlineData("Upper case")]
[InlineData("UppeR case")]
[InlineData("lower cASe")]
[InlineData("loweR cASe")]
[InlineData("Upper cASe")]
[InlineData("UppeR cASe")]
[InlineData("UPper Case")]
[InlineData("UPpeR Case")]
[InlineData("UPper case")]
[InlineData("UPpeR case")]
[InlineData("UPper cASe")]
[InlineData("UPpeR cASe")]
[InlineData("i")]
[InlineData("I")]
[InlineData("i ")]
[InlineData("I ")]
[InlineData("ii")]
[InlineData("II")]
[InlineData("iI")]
[InlineData("Ii")]
[InlineData("ii ")]
[InlineData("II ")]
[InlineData("iI ")]
[InlineData("Ii ")]
[InlineData("iii")]
[InlineData("III")]
[InlineData("iiI")]
[InlineData("iIi")]
[InlineData("Iii")]
[InlineData("iII")]
[InlineData("IIi")]
[InlineData("IiI")]
[InlineData("iii ")]
[InlineData("III ")]
[InlineData("iiI ")]
[InlineData("iIi ")]
[InlineData("Iii ")]
[InlineData("iII ")]
[InlineData("IIi ")]
[InlineData("IiI ")]
[InlineData("iPhone")]
[InlineData("IPhone")]
[InlineData("IPHone")]
[InlineData("IPH one")]
[InlineData("IPH One")]
[InlineData("IPHONe")]
[InlineData("IPHON e")]
[InlineData("IPHON E")]
[InlineData("IPHONE")]
public static void TestCasing(string baseString)
{
var testObject = new TestClassWithDictionary
{
Dictionary = new Dictionary<string, string>
{
{ baseString, "hi" }
}
};
string json = JsonSerializer.Serialize<TestClassWithDictionary>(testObject, new JsonSerializerOptions { DictionaryKeyPolicy = JsonNamingPolicy.CamelCase });
// Assert JSON output is correct
testObject = JsonSerializer.Deserialize<TestClassWithDictionary>(json);
json = JsonSerializer.Serialize(testObject);
// Assert JSON output is correct after round-tripping
DefaultContractResolver contractResolver = new DefaultContractResolver
{
NamingStrategy = new CamelCaseNamingStrategy()
};
string expected = JsonConvert.SerializeObject(testObject, new JsonSerializerSettings { ContractResolver = contractResolver });
// Assert and compare against Newtonsoft.Json
char[] baseStringArray = baseString.ToCharArray();
char[] testCharArray = new char[baseStringArray.Length + 2];
for (int i = 0; i <= baseStringArray.Length; i++)
{
for (int j = 0; j < i; j++)
{
testCharArray[j] = baseStringArray[j];
}
testCharArray[i] = '\uD835';
testCharArray[i + 1] = '\uDD80';
//testCharArray[i] = '\uD801';
//testCharArray[i + 1] = '\uDC27';
for (int j = i; j < baseStringArray.Length; j++)
{
testCharArray[j + 2] = baseStringArray[j];
}
string testString = new string(testCharArray);
testObject = new TestClassWithDictionary
{
Dictionary = new Dictionary<string, string>
{
{ testString, "hi" }
}
};
json = JsonSerializer.Serialize<TestClassWithDictionary>(testObject, new JsonSerializerOptions { DictionaryKeyPolicy = JsonNamingPolicy.CamelCase });
// Assert JSON output is correct
testObject = JsonSerializer.Deserialize<TestClassWithDictionary>(json);
json = JsonSerializer.Serialize(testObject);
// Assert JSON output is correct after round-tripping
expected = JsonConvert.SerializeObject(testObject, new JsonSerializerSettings { ContractResolver = contractResolver });
// Assert and compare against Newtonsoft.Json
}
baseStringArray = baseString.ToCharArray();
testCharArray = new char[baseStringArray.Length + 1];
for (int i = 0; i <= baseStringArray.Length; i++)
{
for (int j = 0; j < i; j++)
{
testCharArray[j] = baseStringArray[j];
}
testCharArray[i] = '\uD835';
for (int j = i; j < baseStringArray.Length; j++)
{
testCharArray[j + 1] = baseStringArray[j];
}
string testString = new string(testCharArray);
testObject = new TestClassWithDictionary
{
Dictionary = new Dictionary<string, string>
{
{ testString, "hi" }
}
};
json = JsonSerializer.Serialize<TestClassWithDictionary>(testObject, new JsonSerializerOptions { DictionaryKeyPolicy = JsonNamingPolicy.CamelCase });
// Assert JSON output is correct
testObject = JsonSerializer.Deserialize<TestClassWithDictionary>(json);
json = JsonSerializer.Serialize(testObject);
// Assert JSON output is correct after round-tripping
expected = JsonConvert.SerializeObject(testObject, new JsonSerializerSettings { ContractResolver = contractResolver });
// Assert and compare against Newtonsoft.Json
}
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newtonsoft.Json had some bad translations (Xml2Json
to xml2_json
) and the bug I mentioned. Should the behavior be the same or should it be improved a bit? Maybe open an issue in Newtonsoft.Json?
/cc @JamesNK
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Newtonsoft.Json had some bad translations (Xml2Json to xml2_json) and the bug I mentioned.
It's fine for those cases for us to deviate, but for other test cases making sure our output matches would be good (or better yet, codify the expected output from Newtonsoft.Json into test cases and verify our output matches against those strings).
Should the behavior be the same or should it be improved a bit?
In cases where the Newtonsoft behavior is buggy/undesirable, we should certainly try to make it better here (possibly in both libraries).
Maybe open an issue in Newtonsoft.Json?
I'd leave the Newtonsoft.Json side of things to @JamesNK, but filing an issue wouldn't hurt :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already tracked by JamesNK/Newtonsoft.Json#1956.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Based on discussion from the API review, we should add tests for non-ascii text and for text with surrogate pairs (including invalid surrogate pairs) and also validate (at least locally - but I would be ok with checking it in) that the output matches Newtonsoft.Json.
Can you add some more tests for non-ascii text?
src/System.Text.Json/src/System/Text/Json/Serialization/JsonNamingPolicy.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSnakeCaseNamingPolicy.cs
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSnakeCaseNamingPolicy.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSnakeCaseNamingPolicy.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSnakeCaseNamingPolicy.cs
Outdated
Show resolved
Hide resolved
src/System.Text.Json/src/System/Text/Json/Serialization/JsonNamingPolicy.cs
Show resolved
Hide resolved
Added two new tests to cover compatibility with Newtonsoft.Json. Everything works well except a some cases due to the bug in Newtonsoft.Json there is some difference in produced output.
For me it looks like the current behavior is better, so makes sense to remove failing tests. Are you okay with this, @ahsonkhan? |
I agree. @JamesNK - what do you think about those cases? I think having a single |
/azp run |
Azure Pipelines successfully started running 4 pipeline(s). |
@ahsonkhan There are infrastructure related failures again... |
/azp run |
Azure Pipelines successfully started running 4 pipeline(s). |
@ahsonkhan All checks passed. |
if (string.IsNullOrEmpty(name)) | ||
return name; | ||
|
||
StringBuilder builder = new StringBuilder(name.Length + Math.Min(2, name.Length / 5)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would this benefit from ValueStringBuilder
with a stackalloc
buffer (alocated with a size cap)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sure, but in that case ValueStringBuilder
should be referenced by the project since the struct isn't public yet (see #28379). If it's okay, I will do it (:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since ValueStringBuilder
is internal (within corelib), I wouldn't take a dependency on it here. I don't think it's worth it in this context, particularly because creating the converted name only happens once and then the result is stored.
Though I believe it is feasible to do, if we really wanted to. For example:
ValueStringBuilder sb = buffer != null ? |
The main benefit would be on startup, but we might be able to write this method without using ValueStringBuilder
and still be faster. I am heads down on some other work currently, and will review this change soon. Thanks for your patience, @YohDeadfall, appreciate it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it's worth it in this context, particularly because creating the converted name only happens once and then the result is stored
@ahsonkhan, are there circumstances where it might happen for every occurrence? e.g. when happens when serializing a dictionary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I will take a second look and verify, but from what I understand, calls to ConvertName
should only ever happen once. @steveharter, @layomia - feel free to chime in about the behavior of serializing dictionary keys/values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
from what I understand, calls to ConvertName should only ever happen once
That doesn't seem to be the case, e.g.
Line 154 in 1338e4e
key = options.DictionaryKeyPolicy.ConvertName(key); |
but maybe I'm just missing where that gets cached. I wouldn't expect it to be cached, though, considering arbitrary dictionaries might get serialized, and their keys could be all over the map.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point @stephentoub. I was looking at PropertyNamingPolicy
only, but JsonNamingPolicy
applies to DictionaryKeyPolicy
as well.
Once the implementation of JsonSnakeCaseNamingPolicy
is well tested, we should consider trying to optimize the built-in naming policies. This can be done in a follow-up PR (no point trying to optimize it without a larger test set that ensures correctness). Filed an issue: https://github.com/dotnet/corefx/issues/42581
BenchmarkDotNet=v0.11.5.1159-nightly, OS=Windows 10.0.18362
Intel Core i7-6700 CPU 3.40GHz (Skylake), 1 CPU, 8 logical and 4 physical cores
.NET Core SDK=3.1.100-preview1-014459
[Host] : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
Job-HPZZRZ : .NET Core 3.0.0 (CoreCLR 4.700.19.46205, CoreFX 4.700.19.46214), X64 RyuJIT
PowerPlanMode=00000000-0000-0000-0000-000000000000 IterationTime=250.0000 ms MaxIterationCount=10
MinIterationCount=5 WarmupCount=1
Method | Mean | Error | StdDev | Median | Min | Max | Ratio | RatioSD | Gen 0 | Gen 1 | Gen 2 | Allocated |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Default | 2.426 us | 0.0473 us | 0.0281 us | 2.425 us | 2.391 us | 2.473 us | 1.00 | 0.00 | 0.1567 | - | - | 656 B |
Camel | 3.040 us | 0.0895 us | 0.0592 us | 3.027 us | 2.979 us | 3.150 us | 1.26 | 0.04 | 0.2491 | - | - | 1048 B |
Snake | 4.293 us | 0.0939 us | 0.0621 us | 4.280 us | 4.195 us | 4.397 us | 1.77 | 0.01 | 0.5624 | - | - | 2432 B |
public class SerializeDictionaryWithNamingPolicy
{
JsonSerializerOptions _camelOptions;
JsonSerializerOptions _snakeOptions;
Dictionary<string, string> _foo;
[GlobalSetup]
public void Setup()
{
_camelOptions = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.CamelCase, DictionaryKeyPolicy = JsonNamingPolicy.CamelCase };
_snakeOptions = new JsonSerializerOptions { PropertyNamingPolicy = new JsonSnakeCaseNamingPolicy(), DictionaryKeyPolicy = new JsonSnakeCaseNamingPolicy() };
_foo = new Dictionary<string, string>()
{
["ID"] = "A",
["url"] = "A",
["URL"] = "A",
["THIS IS SPARTA"] = "A",
["IsCIA"] = "A",
["iPhone"] = "A",
["IPhone"] = "A",
["xml2json"] = "A",
["already_snake_case"] = "A",
["IsJSONProperty"] = "A",
["ABCDEFGHIJKLMNOP"] = "A",
["Hi!! This is text.Time to test."] = "A",
};
}
[BenchmarkCategory(Categories.CoreFX, Categories.JSON)]
[Benchmark(Baseline = true)]
public string Default()
{
return JsonSerializer.Serialize(_foo);
}
[BenchmarkCategory(Categories.CoreFX, Categories.JSON)]
[Benchmark]
public string Camel()
{
return JsonSerializer.Serialize(_foo, _camelOptions);
}
[BenchmarkCategory(Categories.CoreFX, Categories.JSON)]
[Benchmark]
public string Snake()
{
return JsonSerializer.Serialize(_foo, _snakeOptions);
}
}
Thank you for your contribution. As announced in dotnet/coreclr#27549 this repository will be moving to dotnet/runtime on November 13. If you would like to continue working on this PR after this date, the easiest way to move the change to dotnet/runtime is:
|
src/System.Text.Json/src/System/Text/Json/Serialization/JsonSnakeCaseNamingPolicy.cs
Outdated
Show resolved
Hide resolved
[InlineData("u_ppe_r_case", "UPpeR Case")] | ||
[InlineData("u_ppe_r_c_a_se", "UPpeR cASe")] | ||
// | ||
[InlineData("ä", "ä")] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Non-ascii tests like these should be written using escaped/hex characters.
When including non-ASCII characters in the source code use Unicode escape sequences (\uXXXX) instead of literal characters. Literal non-ASCII characters occasionally get garbled by a tool or editor.
See:
[InlineData("{\r\n\"is\\r\\nAct\u6F22\u5B57ive\": false \"in\u6F22\u5B57valid\"\r\n}", 30, 30, 1, 28)] |
So, for this case (same for the special "42" below):
[InlineData("ä", "ä")] | |
[InlineData("\u00E4", "\u00E4")] |
} | ||
break; | ||
|
||
case UnicodeCategory.Surrogate: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Consider adding tests for invalid surrogate characters too (which is possible by doing a substring in between a surrogate pair of characters or crafting one using chars).
So:
- a string containing a high-surrogate without a low-surrogate following it
- a string containing a low-surrogate first before a high-surrogate
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Already have one of them. It's 42.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The special 42
contains valid surrogate pairs (https://www.fileformat.info/info/unicode/char/1d7dc/index.htm).
UTF-16 (hex) | 0xD835 0xDFDC (d835dfdc) |
---|
I am talking about a string that is invalid (i.e. doesn't contain the correct pair of surrogate characters).
For example:
corefx/src/System.Text.Json/tests/JsonDocumentTests.cs
Lines 3545 to 3546 in 5cee7c9
[InlineData("\"hello\"", new char[1] { (char)0xDC01 })] // low surrogate - invalid | |
[InlineData("\"hello\"", new char[1] { (char)0xD801 })] // high surrogate - missing pair |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Missed word "invalid", sorry. Will do (:
Agreed with you, but should we follow the rule that adds an underscore after numbers? If so |
I'd say either |
@ygoe shown an example which doesn't look well without underscores ( |
I think @ahsonkhan's observations from the two linked implementations in Python and Node makes sense:
|
The Here is a map of characters which should appear in a resulting string using the package:
Included into an output
Excluded from an output
QuestionsShould surrogates be included?Pros: Compatibility with the existing implementations. Lowers collision probability Should listed above characters be excluded or included?Some of them ansuvara (randomly checked) so there is no reason to exclude this symbols since they behave like variations, but phonetic. To make the final decision each character should be checked. Should all excluded letters be replaced by an underscore?Pros: Lowers collision probability. Should leading/trailing replacing characters be ignored?Pros: Compatibility with the existing implementations. Should multiple underscores be collapsed?Pros: Compatibility with the existing implementations only when letters get replaced. Underscores in the original string should be preserved. Should an underscore be added after a number?Pros: Compatibility with the existing implementations. |
@ahsonkhan To make things simple I researched the npm package and added the table you can see above. Let's analyze it and answer the questions first so we won't spent time on adding test and changing the behavior multiple times. After that I will write a final and optimized version of the policy. |
Question: How does all of this compare to the existing behavior of camelCase naming? It would certainly be interesting to use the same word-splitting behavior for all naming conventions.
|
This is how the npm works, but the camel case policy (from the new JSON serializer) just lowers the first characters and does nothing more due to performance reasons. So an user controls all characters except the first one if it's a letter. Otherwise, the user controls all of them. The snake case policy isn't limited to the first, so it's a problem. |
This is what camel-snake-kebab in Clojure does as well. It's very consistent.
How is this performance critical? Is the result of the transformation not cached? Anyway, I guess the ship has sailed for camelCase as it would be a major breaking change to switch behavior now, but for this and (the inevitable ask for) future conventions, it would be nice if they used the same strategy. I guess this could be refactored out if it becomes relevant later. |
|
The serializer caches only property and class names, but not dictionary keys. Having a pool would help only if it supports LRU strategy. |
This is major problem, but as I know the serializer isn't very popular due to limitations, a lot of them. So if a breakage would be introduced, it won't affect to many people. Taking into account that any policy is mostly used for members and classes, the change won't hurt so much, but will give a positive impact. |
Thank you for your contribution. As announced in #27549 the dotnet/runtime repository will be used going forward for changes to this code base. Closing this PR as no more changes will be accepted into master for this repository. If you’d like to continue working on this change please move it to dotnet/runtime. |
actual.SomeIntProperty); | ||
} | ||
|
||
private class NamingPolictyTestClass |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
typo NamingPolictyTestClass
-> NamingPolicyTestClass
Assert.Equal( | ||
expected.SomeIntProperty, | ||
actual.SomeIntProperty); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is short enough to be one line, no?
[Fact] | ||
public static void SerializeDictionary_RoundTipping_MatchesOriginal() | ||
{ | ||
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCase }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should be DictionaryKeyPolicy = JsonNamingPolicy.SnakeCase
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCase }; | ||
|
||
string expected = @"{""some_int_property"":42}"; | ||
string actual = JsonSerializer.Serialize(JsonSerializer.Deserialize<Dictionary<string, int>>(expected, options), options); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
DictionaryKeyPolicy
is only applied on serialization. The way to test this is to have a dictionary with keys that are some other casing like Pascal case, and make sure the serialized content is snake case.
Fixes #39564. The code is written some time ago for our ADO.NET driver for PostgreSQL where snake case is the common and default naming style.