Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Text.Json: Using JsonStringEnumConverter, after changing CurrentCulture to "sv-SE", enums with unknown negative values serialize strangely, and cannot be deserialized. #68600

Closed
Tracked by #63918
dbc2 opened this issue Apr 27, 2022 · 7 comments · Fixed by #68663
Assignees
Milestone

Comments

@dbc2
Copy link

dbc2 commented Apr 27, 2022

Description

JSON serialization should not depend on the current locale in any way, but enum serialization is affected by the current setting for CurrentInfo.NumberFormatInfo.NegativeSign such that that enums with negative unknown values cannot be round-tripped. This seems wrong.

More specifically, I serialized an enum to JSON using JsonStringEnumConverter with allowIntegerValues : true with the current locale being the invariant locale. Afterwards, I changed CultureInfo.CurrentCulture to Swedish, which happens to use the character U+2212 as its negative sign. I then serialized a unknown negative value of the same enum type to JSON, and it was serialized strangely, as follows:

"\u22122"

When I attempted to deserialize the value, an exception was thrown, making it impossible to round-trip the enum value in the Swedish locale:

System.Text.Json.JsonException: The JSON value could not be converted to TestClass+TestEnum. Path: $ | LineNumber: 0 | BytePositionInLine: 9.
   at System.Text.Json.ThrowHelper.ThrowJsonException(String message)
   at System.Text.Json.Serialization.Converters.EnumConverter`1.ReadAsPropertyNameCore(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.Converters.EnumConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 json, JsonTypeInfo jsonTypeInfo)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](String json, JsonSerializerOptions options)

Demo fiddle here: https://dotnetfiddle.net/Lztx97

Reproduction Steps

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.Threading;

using System.Text.Json;
using System.Text.Json.Serialization;

using NUnit.Framework; // Throws NUnit.Framework.AssertionException

class TestClass
{
	[Flags]
	public enum TestEnum
	{
		One = (1 << 0),
		Two = (1 << 1),
	};

	public static void Test()
	{
		Test(unchecked((TestEnum)(1 << 4))); // Works, round-trips as 16
		Test(unchecked((TestEnum)(-2))); // Throws an exception
	}

	public static void Test(TestEnum value)
	{
		// Sets the minus sign to -
		CultureInfo.CurrentCulture = CultureInfo.CurrentUICulture = Thread.CurrentThread.CurrentCulture = Thread.CurrentThread.CurrentUICulture = 
			CultureInfo.InvariantCulture;

		var options = new JsonSerializerOptions
		{
			Converters = { new JsonStringEnumConverter(allowIntegerValues : true) },
		};

		var json1 = JsonSerializer.Serialize(value, options); 

		Console.WriteLine(json1); // Prints -2

		var value1 = JsonSerializer.Deserialize<TestEnum>(json1, options);

		Assert.AreEqual(value, value1); 

		// Sets the minus sign to U+2212
		CultureInfo.CurrentCulture = CultureInfo.CurrentUICulture = Thread.CurrentThread.CurrentCulture = Thread.CurrentThread.CurrentUICulture = 
			new CultureInfo("sv-SE");

		var json2 = JsonSerializer.Serialize(value, options);

		Console.WriteLine(json2); // Prints "\u22122"

		var value2 = JsonSerializer.Deserialize<TestEnum>(json2, options); // Fails & throws a JsonException 

		Assert.AreEqual(value, value2);
	}
}

public class Program
{
	public static void Main()
	{
		Console.WriteLine("Environment version: {0} ({1})", System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription, Environment.Version);
		Console.WriteLine("System.Text.Json version: " + typeof(JsonSerializer).Assembly.FullName);
		Console.WriteLine();

		try
		{
			TestClass.Test();
		}
		catch (Exception ex)
		{
			Console.WriteLine("Failed with unhandled exception: ");
			Console.WriteLine(ex);
			throw;
		}
	}
}

Expected behavior

JSON serialization of enums should be completely independent of locale.

Actual behavior

JSON serialization of negative integer-valued enums breaks if NumberFormatInfo.CurrentInfo.NegativeSign is changed.

Regression?

No response

Known Workarounds

No response

Configuration

Environment version: .NET 6.0.0-rtm.21522.10 (6.0.0)
System.Text.Json version: System.Text.Json, Version=6.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51

Other information

The problem seems to be here, in EnumConverter.cs:

        // Odd type codes are conveniently signed types (for enum backing types).
        private static readonly string? s_negativeSign = ((int)s_enumTypeCode % 2) == 0 ? null : NumberFormatInfo.CurrentInfo.NegativeSign;

The converter caches the current negative sign on startup, but if that changes the value will become stale.

@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Apr 27, 2022
@ghost
Copy link

ghost commented Apr 27, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

JSON serialization should not depend on the current locale in any way, but enum serialization is affected by the current setting for CurrentInfo.NumberFormatInfo.NegativeSign such that that enums with negative unknown values cannot be round-tripped. This seems wrong.

More specifically, I serialized an enum to JSON using JsonStringEnumConverter with allowIntegerValues : true with the current locale being the invariant locale. Afterwards, I changed CultureInfo.CurrentCulture to Swedish, which happens to use the character U+2212 as its negative sign. I then serialized a unknown negative value of the same enum type to JSON, and it was serialized strangely, as follows:

"\u22122"

When I attempted to deserialize the value, an exception was thrown, making it impossible to round-trip the enum value in the Swedish locale:

System.Text.Json.JsonException: The JSON value could not be converted to TestClass+TestEnum. Path: $ | LineNumber: 0 | BytePositionInLine: 9.
   at System.Text.Json.ThrowHelper.ThrowJsonException(String message)
   at System.Text.Json.Serialization.Converters.EnumConverter`1.ReadAsPropertyNameCore(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.Converters.EnumConverter`1.Read(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 json, JsonTypeInfo jsonTypeInfo)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](String json, JsonSerializerOptions options)

Demo fiddle here: https://dotnetfiddle.net/Lztx97

Reproduction Steps

using System;
using System.Collections;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Globalization;
using System.Threading;

using System.Text.Json;
using System.Text.Json.Serialization;

using NUnit.Framework; // Throws NUnit.Framework.AssertionException

class TestClass
{
	[Flags]
	public enum TestEnum
	{
		One = (1 << 0),
		Two = (1 << 1),
	};

	public static void Test()
	{
		Test(unchecked((TestEnum)(1 << 4))); // Works, round-trips as 16
		Test(unchecked((TestEnum)(-2))); // Throws an exception
	}

	public static void Test(TestEnum value)
	{
		// Sets the minus sign to -
		CultureInfo.CurrentCulture = CultureInfo.CurrentUICulture = Thread.CurrentThread.CurrentCulture = Thread.CurrentThread.CurrentUICulture = 
			CultureInfo.InvariantCulture;

		var options = new JsonSerializerOptions
		{
			Converters = { new JsonStringEnumConverter(allowIntegerValues : true) },
		};

		var json1 = JsonSerializer.Serialize(value, options); 

		Console.WriteLine(json1); // Prints -2

		var value1 = JsonSerializer.Deserialize<TestEnum>(json1, options);

		Assert.AreEqual(value, value1); 

		// Sets the minus sign to U+2212
		CultureInfo.CurrentCulture = CultureInfo.CurrentUICulture = Thread.CurrentThread.CurrentCulture = Thread.CurrentThread.CurrentUICulture = 
			new CultureInfo("sv-SE");

		var json2 = JsonSerializer.Serialize(value, options);

		Console.WriteLine(json2); // Prints "\u22122"

		var value2 = JsonSerializer.Deserialize<TestEnum>(json2, options); // Fails & throws a JsonException 

		Assert.AreEqual(value, value2);
	}
}

public class Program
{
	public static void Main()
	{
		Console.WriteLine("Environment version: {0} ({1})", System.Runtime.InteropServices.RuntimeInformation.FrameworkDescription, Environment.Version);
		Console.WriteLine("System.Text.Json version: " + typeof(JsonSerializer).Assembly.FullName);
		Console.WriteLine();

		try
		{
			TestClass.Test();
		}
		catch (Exception ex)
		{
			Console.WriteLine("Failed with unhandled exception: ");
			Console.WriteLine(ex);
			throw;
		}
	}
}

Expected behavior

JSON serialization of enums should be completely independent of locale.

Actual behavior

JSON serialization of negative integer-valued enums breaks if NumberFormatInfo.CurrentInfo.NegativeSign is changed.

Regression?

No response

Known Workarounds

No response

Configuration

Environment version: .NET 6.0.0-rtm.21522.10 (6.0.0)
System.Text.Json version: System.Text.Json, Version=6.0.0.0, Culture=neutral, PublicKeyToken=cc7b13ffcd2ddd51

Other information

The problem seems to be here, in EnumConverter.cs:

        // Odd type codes are conveniently signed types (for enum backing types).
        private static readonly string? s_negativeSign = ((int)s_enumTypeCode % 2) == 0 ? null : NumberFormatInfo.CurrentInfo.NegativeSign;

The converter caches the current negative sign on startup, but it that changes the value will become stale.

Author: dbc2
Assignees: -
Labels:

area-System.Text.Json, untriaged

Milestone: -

@eiriktsarpalis
Copy link
Member

If I'm understanding this correctly, would changing s_negativeSign to use CultureInfo.InvariantCulture.NumberFormat.NegativeSign instead address the problem?

@dbc2
Copy link
Author

dbc2 commented Apr 27, 2022

Unfortunately not. The problem is that, while Enum implements IConvertible, Enum.ToString(IFormatProvider) is marked as obsolete because

The provider argument is not used. Please use ToString().

I.e. Enum.ToString() and Enum.ToString(NumberFormatInfo.InvariantInfo) both return a value with a localized minus sign. You can confirm by checking the reference source, an IFormatProvider is never passed to the ToString() method of the underlying value. That means that code from EnumConverter like:

string original = value.ToString();

Is going to return with a localized minus sign. In fact there doesn't seem to be a method to generate a culture-invariant string representation of an enum, see https://dotnetfiddle.net/JNQv52 which tries all of the following and gets a localized minus sign for each:

	var test = unchecked((TestEnum)(-1));

	var toString = test.ToString();
	var toStringInvariant = test.ToString(NumberFormatInfo.InvariantInfo); // [System.Obsolete("The provider argument is not used. Use ToString() instead.")]
	var enumFormat = Enum.Format(test.GetType(), test, "G");
	var typeConverterFormat = System.ComponentModel.TypeDescriptor.GetConverter(typeof(TestEnum)).ConvertToInvariantString(test);

Rather unfortunately actually.

You may need to eliminated the cached s_negativeSign and do

var negativeSign = ((int)s_enumTypeCode % 2) == 0 ? null : NumberFormatInfo.CurrentInfo.NegativeSign

at runtime whenever the converter needs to know the current minus sign used by Enum.ToString().

I'm guessing the author of System.Text.Json.Serialization.Converters.EnumConverter<T> tried to write their code as performantly as possible given the occasional oddities of the Enum API, and got a little too aggressive. The comments from IsValidIdentifier() suggest exactly this:

        private static bool IsValidIdentifier(string value)
        {
            // Trying to do this check efficiently. When an enum is converted to
            // string the underlying value is given if it can't find a matching
            // identifier (or identifiers in the case of flags).
            //
            // The underlying value will be given back with a digit (e.g. 0-9) possibly
            // preceded by a negative sign. Identifiers have to start with a letter
            // so we'll just pick the first valid one and check for a negative sign
            // if needed.
            return (value[0] >= 'A' &&
                (s_negativeSign == null || !value.StartsWith(s_negativeSign)));
        }

Replacing s_negativeSign with

        private static bool IsValidIdentifier(string value)
        {
            // Trying to do this check efficiently. When an enum is converted to
            // string the underlying value is given if it can't find a matching
            // identifier (or identifiers in the case of flags).
            //
            // The underlying value will be given back with a digit (e.g. 0-9) possibly
            // preceded by a negative sign. Identifiers have to start with a letter
            // so we'll just pick the first valid one and check for a negative sign
            // if needed.
            var negativeSign = ((int)s_enumTypeCode % 2) == 0 ? null : NumberFormatInfo.CurrentInfo.NegativeSign;
            return (value[0] >= 'A' &&
                (negativeSign  == null || !value.StartsWith(negativeSign)));
        }

Might be sufficient to solve the problem.

Of course, there might be some private Enum API that returns an invariant version of Enum.ToString() that you could just use but that external developers can't access.

@eiriktsarpalis
Copy link
Member

I agree with your analysis, I think we should take the fix you are proposing. Looking up the negative sign from the current culture every time might theoretically incur a performance hit, but it should only impact numeric values when the string enum converter is enabled -- hardly in the hot path for enum serialization.

@eiriktsarpalis eiriktsarpalis added bug and removed untriaged New issue has not been triaged by the area owner labels Apr 28, 2022
@eiriktsarpalis eiriktsarpalis self-assigned this Apr 28, 2022
@eiriktsarpalis eiriktsarpalis added this to the 7.0.0 milestone Apr 28, 2022
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Apr 28, 2022
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Apr 29, 2022
@Neme12
Copy link

Neme12 commented May 21, 2022

Wait, I'm looking at the solution and it seems like current culture is always used in serialization? This makes no sense to me - I would expect the opposite. For example, in web apps when you're serializing or deserializing stuff for JSON, you don't care what language the website uses, it shouldn't depend on the culture at all because serialization should be using a standard format that everyone can agree on. But given that this does use current culture by default, there should at the very least be a way to opt out of that, and specify that serialization/deserialization should use the invariant culture, without using the global invariant culture flag, because you still want current culture for other things where it's appropriate.

@Neme12
Copy link

Neme12 commented May 21, 2022

Does JSON even allow using other characters than the standard - in a number? That seems really unlikely to me. JSON shouldn't use culture-sensitive numbers just like outputting C# can't use culture-sensitive numbers. It's a standardized language that has standardized rules including what negative sign to use. Doing this in a culture aware way doesn't make any sense :/

@eiriktsarpalis
Copy link
Member

True, but as @dbc2 originally pointed out enums don't appear to have a culture-insensitive formatting method. The change appears to predate the open sourcing of .NET, @stephentoub might know we've obsoleted the method or why the obsolete method isn't using the IFormatProvider argument.

@ghost ghost locked as resolved and limited conversation to collaborators Jun 22, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
3 participants