This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Added snake case naming policy to the JSON serializer #41354

Closed
wants to merge 5 commits into from
Changes from 4 commits
1 change: 1 addition & 0 deletions src/System.Text.Json/ref/System.Text.Json.cs
@@ -227,6 +227,7 @@ public abstract partial class JsonNamingPolicy
{
protected JsonNamingPolicy() { }
public static System.Text.Json.JsonNamingPolicy CamelCase { get { throw null; } }
public static System.Text.Json.JsonNamingPolicy SnakeCase { get { throw null; } }
public abstract string ConvertName(string name);
}
public abstract partial class JsonNode
1 change: 1 addition & 0 deletions src/System.Text.Json/src/System.Text.Json.csproj
@@ -121,6 +121,7 @@
<Compile Include="System\Text\Json\Serialization\JsonSerializer.Write.Utf8JsonWriter.cs" />
<Compile Include="System\Text\Json\Serialization\JsonSerializerOptions.cs" />
<Compile Include="System\Text\Json\Serialization\JsonSerializerOptions.Converters.cs" />
<Compile Include="System\Text\Json\Serialization\JsonSnakeCaseNamingPolicy.cs" />
<Compile Include="System\Text\Json\Serialization\JsonStringEnumConverter.cs" />
<Compile Include="System\Text\Json\Serialization\MemberAccessor.cs" />
<Compile Include="System\Text\Json\Serialization\PooledByteBufferWriter.cs" />
@@ -15,10 +15,15 @@ public abstract class JsonNamingPolicy
protected JsonNamingPolicy() { }

/// <summary>
-/// Returns the naming policy for camel-casing.
+/// Gets the naming policy for camel-casing.
/// </summary>
public static JsonNamingPolicy CamelCase { get; } = new JsonCamelCaseNamingPolicy();

/// <summary>
/// Gets the naming policy for snake-casing.
/// </summary>
public static JsonNamingPolicy SnakeCase { get; } = new JsonSnakeCaseNamingPolicy();

internal static JsonNamingPolicy Default { get; } = new JsonDefaultNamingPolicy();

/// <summary>
@@ -0,0 +1,77 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Globalization;

namespace System.Text.Json
{
internal sealed class JsonSnakeCaseNamingPolicy : JsonNamingPolicy
{
public override string ConvertName(string name)
{
if (string.IsNullOrEmpty(name))
return name;

// Allocates a string builder with the guessed result length,
// where 5 is the average word length in English, and
// max(2, length / 5) is the number of underscores.
StringBuilder builder = new StringBuilder(name.Length + Math.Max(2, name.Length / 5));
UnicodeCategory? previousCategory = null;

for (int currentIndex = 0; currentIndex < name.Length; currentIndex++)
{
char currentChar = name[currentIndex];
if (currentChar == '_')
{
builder.Append('_');
previousCategory = null;
continue;
}

UnicodeCategory currentCategory = char.GetUnicodeCategory(currentChar);
Contributor Author:

@ahsonkhan Out of curiosity, when the method name already contains the return type name, is it okay to use var on the left side?

Member (@ahsonkhan, Nov 13, 2019):

I don't think so (but it is a bit of a grey zone). Use var when the RHS has new, or when the type is otherwise clear from reading that line alone (such as with an explicit cast), without prior knowledge of what the called API does.

We only use var when it's obvious what the variable type is

I can see the argument that GetX methods return X and hence it is clear, so this comes down to what is considered "obvious" (and probably personal preference). What if GetX was returning an interface/abstract type (like IX)?


switch (currentCategory)
{
case UnicodeCategory.UppercaseLetter:
case UnicodeCategory.TitlecaseLetter:
Member (@ahsonkhan, Nov 13, 2019):

I don't see why we need to special case TitlecaseLetter. Can we ignore them?

https://www.fileformat.info/info/unicode/category/Lt/list.htm

Contributor Author:

Do you mean that they cannot be lowercased? I haven't worked with Unicode much, so this is outside my knowledge.

Member (@ahsonkhan, Nov 13, 2019):

I am also not super familiar with all the categories. I am asking if you had a test case or scenario that motivated including TitlecaseLetter in the implementation to begin with. It looks like they do have lower-case counterparts. Again, adding tests where we exercise this UnicodeCategory more would be useful. I believe, if we remove this switch case, all the existing tests would still pass, which means there is a test gap.

@GrabYourPitchforks - do you have some thoughts on the unicode categories we are special casing here for snake casing?

Member:

I am primarily asking for more tests so that we can more confidently optimize the implementation later. I want to avoid removing some code path in the future to improve perf that would end up breaking functionality.

Contributor Author:

Checked. They can all be lowercased.
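
For readers following this thread, here is a minimal sketch of what the TitlecaseLetter category looks like in practice. It assumes U+01C5 as a representative character (picked from the Lt list linked above, not taken from the PR's tests) and uses char.ToLowerInvariant rather than the culture-sensitive char.ToLower that the PR calls.

```csharp
using System;
using System.Globalization;

// U+01C5 (LATIN CAPITAL LETTER D WITH SMALL LETTER Z WITH CARON) belongs to the Lt category.
// It is an illustrative choice, not a character used in the PR's test data.
char titlecase = '\u01C5';

UnicodeCategory category = char.GetUnicodeCategory(titlecase);
char lowered = char.ToLowerInvariant(titlecase);

Console.WriteLine(category);                       // TitlecaseLetter
Console.WriteLine(char.IsUpper(titlecase));        // False: IsUpper only covers UppercaseLetter
Console.WriteLine(((int)lowered).ToString("X4"));  // 01C6
```

Because char.IsUpper reports False for such characters, a test built around one of them would probably fail if the TitlecaseLetter case were dropped from the switch, which is the kind of coverage requested above.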

if (previousCategory == UnicodeCategory.SpaceSeparator ||
Member (@ahsonkhan, Nov 13, 2019):

nit: Put brackets around the && conditions to make the ordering of operations clear.

previousCategory == UnicodeCategory.LowercaseLetter ||
previousCategory != UnicodeCategory.DecimalDigitNumber &&
currentIndex > 0 &&
currentIndex + 1 < name.Length &&
char.IsLower(name[currentIndex + 1]))
{
builder.Append('_');
}

currentChar = char.ToLower(currentChar);
break;

case UnicodeCategory.LowercaseLetter:
case UnicodeCategory.DecimalDigitNumber:
Member:

What's the rationale for special casing DecimalDigitNumber? Can you share/add a test case?

Member:

This category contains a bunch of characters. Is 0-9 sufficient?
https://www.fileformat.info/info/unicode/category/Nd/list.htm

Contributor Author:

Tests already exist for numbers in names. One of them you have used in benchmarks (:

Since the policy should only insert underscores and not remove any letters or digits, including non-ASCII ones, this category should be used.

Member:

Ah yes, I see the test case in the bug npgsql/npgsql#2152.

ABC123 -> abc123 clearly indicates the need for using 0-9 (ab_c123 is clearly wrong). I see the numbers tests now :)

To be explicit, having non-ASCII digit tests would be good too (i.e. pick others from the Nd list).
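
As a rough illustration of why 0-9 alone does not cover the category, the sketch below checks two non-ASCII digits. U+0663 and U+0969 are assumptions chosen from the Nd list for illustration; they do not appear in the PR's tests.

```csharp
using System;
using System.Globalization;

// '3' is ASCII; U+0663 (ARABIC-INDIC DIGIT THREE) and U+0969 (DEVANAGARI DIGIT THREE)
// are illustrative non-ASCII members of the Nd category.
char[] digits = { '3', '\u0663', '\u0969' };

foreach (char digit in digits)
{
    UnicodeCategory category = char.GetUnicodeCategory(digit);
    Console.WriteLine($"U+{(int)digit:X4}: {category}, IsDigit={char.IsDigit(digit)}");
}
```

All three report DecimalDigitNumber, so escaped forms of such characters would be natural candidates for additional InlineData rows.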

Contributor Author:

What about non-decimal numbers, math symbols and other characters? Maybe invert the logic and write everything except punctuation and spaces?

/cc @GrabYourPitchforks

Member:

everything except punctuation and spaces

That makes sense to me.

Not sure how accurate this is, but it is a data point (from https://capitalizemytitle.com/camel-case/):

Basic Snake Case Capitalization Rules
All letters are lowercase.
All spaces between words are filled with underscores.
Remove all punctuation.

if (previousCategory == UnicodeCategory.SpaceSeparator)
{
builder.Append('_');
Member:

nit: Add braces around the if since this is in the nested scope.

From https://github.com/dotnet/corefx/blob/master/Documentation/coding-guidelines/coding-style.md#c-coding-style

We use Allman style braces, where each brace begins on a new line. A single line statement block can go without braces but the block must be properly indented on its own line and must not be nested in other statement blocks that use braces (See rule 17 for more details). One exception is that a using statement is permitted to be nested within another using statement by starting on the following line at the same indentation level, even if the nested using contains a controlled block.

Contributor Author:

Ah, I see.

}
break;

case UnicodeCategory.Surrogate:
Member:

Consider adding tests for invalid surrogate characters too (which is possible by doing a substring in between a surrogate pair of characters or crafting one using chars).
So:

  • a string containing a high-surrogate without a low-surrogate following it
  • a string containing a low-surrogate first before a high-surrogate

Contributor Author:

Already have one of them. It's 42.

Member (@ahsonkhan, Nov 14, 2019):

The special 42 contains valid surrogate pairs (https://www.fileformat.info/info/unicode/char/1d7dc/index.htm).

UTF-16 (hex) 0xD835 0xDFDC (d835dfdc)

I am talking about a string that is invalid (i.e. doesn't contain the correct pair of surrogate characters).

For example:

[InlineData("\"hello\"", new char[1] { (char)0xDC01 })] // low surrogate - invalid
[InlineData("\"hello\"", new char[1] { (char)0xD801 })] // high surrogate - missing pair
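
The two invalid shapes described above can be sketched as follows. This is only an illustration of how such strings might be built and recognized; the variable names are hypothetical and nothing here is taken from the PR's test file.

```csharp
using System;

string pair = "\uD835\uDFDC";            // valid pair encoding U+1D7DC
string loneHigh = pair.Substring(0, 1);  // high surrogate with no low surrogate after it
string reversed = "\uDFDC\uD835";        // low surrogate placed before a high surrogate

Console.WriteLine(char.IsSurrogatePair(pair, 0));      // True
Console.WriteLine(char.IsHighSurrogate(loneHigh, 0));  // True
Console.WriteLine(loneHigh.Length);                    // 1
Console.WriteLine(char.IsSurrogatePair(reversed, 0));  // False
```

Both loneHigh and reversed are ill-formed UTF-16, so they exercise the Surrogate branch of the switch without ever forming a complete pair.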

Contributor Author:

Missed the word "invalid", sorry. Will do (:

break;

default:
if (previousCategory != null)
{
previousCategory = UnicodeCategory.SpaceSeparator;
Member:

Same here (braces).

}
continue;
}

builder.Append(currentChar);
previousCategory = currentCategory;
}

return builder.ToString();
}
}
}
157 changes: 157 additions & 0 deletions src/System.Text.Json/tests/Serialization/SnakeCaseUnitTests.cs
@@ -0,0 +1,157 @@
// Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.

using System.Collections.Generic;
using Xunit;

namespace System.Text.Json.Serialization.Tests
{
public static class SnakeCaseUnitTests
{
[Theory]
[InlineData(null, null)]
[InlineData("", "")]
//
[InlineData("i", "i")]
[InlineData("i", "I")]
//
[InlineData("ii", "ii")]
[InlineData("i_i", "iI")]
[InlineData("ii", "Ii")]
[InlineData("ii", "II")]
//
[InlineData("iii", "iii")]
[InlineData("ii_i", "iiI")]
[InlineData("i_ii", "iIi")]
[InlineData("i_ii", "iII")]
[InlineData("iii", "Iii")]
[InlineData("ii_i", "IiI")]
[InlineData("i_ii", "IIi")]
[InlineData("iii", "III")]
//
[InlineData("i_phone", "iPhone")]
[InlineData("i_phone", "IPhone")]
[InlineData("ip_hone", "IPHone")]
[InlineData("iph_one", "IPHOne")]
[InlineData("ipho_ne", "IPHONe")]
[InlineData("iphone", "IPHONE")]
//
[InlineData("id", "id")]
[InlineData("id", "ID")]
//
[InlineData("url", "url")]
[InlineData("url", "URL")]
[InlineData("url_value", "url_value")]
[InlineData("url_value", "URLValue")]
//
[InlineData("xml2json", "xml2json")]
[InlineData("xml2json", "Xml2Json")]
//
[InlineData("already_snake_case", "already_snake_case")]
[InlineData("_already_snake_case", "_already_snake_case")]
[InlineData("__already_snake_case", "__already_snake_case")]
[InlineData("already_snake_case_", "already_snake_case_")]
[InlineData("already_snake_case__", "already_snake_case__")]
//
[InlineData("sn_a__k_ec_as_e", "sn_a__k_ec_as_e")]
[InlineData("sn_ak_ec_as_e", "sn_ak_ec_as_e")]
[InlineData("sn_a__k_ec_as_e", "SnA__ kEcAsE")]
[InlineData("sn_a__k_ec_as_e", "SnA__kEcAsE")]
[InlineData("sn_ak_ec_as_e", "SnAkEcAsE")]
//
[InlineData("spaces", "spaces ")]
[InlineData("spaces", "spaces ")]
[InlineData("spaces", "spaces ")]
[InlineData("spaces", " spaces")]
[InlineData("spaces", " spaces")]
[InlineData("spaces", " spaces")]
//
[InlineData("9999_12_31t23_59_59_9999999z", "9999-12-31T23:59:59.9999999Z")]
[InlineData("hi_this_is_text_time_to_test", "Hi!! This is text. Time to test.")]
//
[InlineData("is_cia", "IsCIA")]
[InlineData("is_json_property", "IsJSONProperty")]
//
[InlineData("lower_case", "lower case")]
[InlineData("lower_case", "lower Case")]
[InlineData("lower_case", "lowerCase")]
[InlineData("lower_c_a_se", "lower cASe")]
[InlineData("lowe_r_case", "loweR case")]
[InlineData("lowe_r_case", "loweR Case")]
[InlineData("lowe_r_c_a_se", "loweR cASe")]
[InlineData("upper_case", "Upper case")]
[InlineData("upper_case", "Upper Case")]
[InlineData("upper_case", "UPPER CASE")]
[InlineData("upper_case", "UpperCase")]
[InlineData("upper_c_a_se", "Upper cASe")]
[InlineData("uppe_r_case", "UppeR case")]
[InlineData("uppe_r_case", "UppeR Case")]
[InlineData("uppe_r_c_a_se", "UppeR cASe")]
[InlineData("u_pper_case", "UPper case")]
[InlineData("u_pper_case", "UPper Case")]
[InlineData("u_pper_case", "UPperCase")]
[InlineData("u_pper_c_a_se", "UPper cASe")]
[InlineData("u_ppe_r_case", "UPpeR case")]
[InlineData("u_ppe_r_case", "UPpeR Case")]
[InlineData("u_ppe_r_c_a_se", "UPpeR cASe")]
//
[InlineData("ä", "ä")]
Member:

Non-ASCII tests like these should be written using escaped/hex characters.

When including non-ASCII characters in the source code use Unicode escape sequences (\uXXXX) instead of literal characters. Literal non-ASCII characters occasionally get garbled by a tool or editor.

See:

[InlineData("{\r\n\"is\\r\\nAct\u6F22\u5B57ive\": false \"in\u6F22\u5B57valid\"\r\n}", 30, 30, 1, 28)]

So, for this case (same for the special "42" below):

Suggested change
[InlineData("ä", "ä")]
[InlineData("\u00E4", "\u00E4")]
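
For the special "42" case, the escaped form could look like the sketch below. It relies on the UTF-16 values quoted earlier in this review (0xD835 0xDFDC for U+1D7DC); U+1D7DA for the second digit is an assumption derived from the same code-point block.

```csharp
using System;

// Escaped equivalents of the double-struck digits four and two.
string viaSurrogates = "\uD835\uDFDC\uD835\uDFDA";
string viaCodePoints = "\U0001D7DC\U0001D7DA";

Console.WriteLine(viaSurrogates == viaCodePoints);                          // True
Console.WriteLine(viaSurrogates.Length);                                    // 4
Console.WriteLine(char.ConvertToUtf32(viaSurrogates, 0).ToString("X"));     // 1D7DC
Console.WriteLine(char.ConvertToUtf32(viaSurrogates, 2).ToString("X"));     // 1D7DA
```

Either escape style keeps the source file pure ASCII, which is what the guideline quoted above is after.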

[InlineData("𝟜𝟚", "𝟜𝟚")]
public static void Convert_SpecifiedName_MatchesExpected(string expected, string name) =>
Assert.Equal(expected, JsonNamingPolicy.SnakeCase.ConvertName(name));

[Fact]
public static void SerializeType_RoundTipping_MatchesOriginal()
{
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCase };

string expected = @"{""some_int_property"":42}";
string actual = JsonSerializer.Serialize(JsonSerializer.Deserialize<NamingPolictyTestClass>(expected, options), options);

Assert.Equal(expected, actual);
}

[Fact]
public static void DeserializeType_RoundTipping_MatchesOriginal()
{
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCase };

var expected = new NamingPolictyTestClass { SomeIntProperty = 42 };
var actual = JsonSerializer.Deserialize<NamingPolictyTestClass>(JsonSerializer.Serialize(expected, options), options);

Assert.Equal(
expected.SomeIntProperty,
actual.SomeIntProperty);
Contributor (comment on lines +126 to +128):

This is short enough to be one line, no?

}

private class NamingPolictyTestClass
Contributor:

typo NamingPolictyTestClass -> NamingPolicyTestClass

{
public int SomeIntProperty { get; set; }
}

[Fact]
public static void SerializeDictionary_RoundTipping_MatchesOriginal()
{
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCase };
Contributor:

This should be DictionaryKeyPolicy = JsonNamingPolicy.SnakeCase


string expected = @"{""some_int_property"":42}";
string actual = JsonSerializer.Serialize(JsonSerializer.Deserialize<Dictionary<string, int>>(expected, options), options);
Contributor:

DictionaryKeyPolicy is only applied on serialization. The way to test this is to have a dictionary with keys that are some other casing like Pascal case, and make sure the serialized content is snake case.
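
A minimal sketch of the serialization-only behavior described above. Since the SnakeCase policy proposed in this PR is not assumed to be available, JsonNamingPolicy.CamelCase stands in for it here; the shape of the test would be the same with any naming policy.

```csharp
using System;
using System.Collections.Generic;
using System.Text.Json;

// CamelCase is a stand-in for the SnakeCase policy proposed in this PR.
var options = new JsonSerializerOptions { DictionaryKeyPolicy = JsonNamingPolicy.CamelCase };

var original = new Dictionary<string, int> { ["SomeIntProperty"] = 42 };

// The policy rewrites keys while serializing.
string json = JsonSerializer.Serialize(original, options);
Console.WriteLine(json); // {"someIntProperty":42}

// Deserializing keeps keys exactly as they appear in the JSON text.
var roundTripped = JsonSerializer.Deserialize<Dictionary<string, int>>(json, options)!;
Console.WriteLine(roundTripped.ContainsKey("someIntProperty")); // True
Console.WriteLine(roundTripped.ContainsKey("SomeIntProperty")); // False
```

This is why asserting on the serialized string, starting from Pascal-case keys, checks the policy more directly than a round trip does.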


Assert.Equal(expected, actual);
}

[Fact]
public static void DeserializeDictionary_RoundTipping_MatchesOriginal()
{
var options = new JsonSerializerOptions { PropertyNamingPolicy = JsonNamingPolicy.SnakeCase };

var expected = new Dictionary<string, int> { ["SomeIntProperty"] = 42 };
var actual = JsonSerializer.Deserialize<Dictionary<string, int>>(JsonSerializer.Serialize(expected, options), options);

Assert.Equal(
expected["SomeIntProperty"],
actual["SomeIntProperty"]);
}
}
}
1 change: 1 addition & 0 deletions src/System.Text.Json/tests/System.Text.Json.Tests.csproj
@@ -76,6 +76,7 @@
<Compile Include="Serialization\ReadValueTests.cs" />
<Compile Include="Serialization\SampleTestData.OrderPayload.cs" />
<Compile Include="Serialization\SpanTests.cs" />
<Compile Include="Serialization\SnakeCaseUnitTests.cs" />
<Compile Include="Serialization\Stream.ReadTests.cs" />
<Compile Include="Serialization\Stream.WriteTests.cs" />
<Compile Include="Serialization\TestClasses.cs" />