Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

System.Text.Json.JsonSerializer should't deserialize string to char #32429

Closed
blankensteiner opened this issue Feb 17, 2020 · 5 comments
Closed

Comments

@blankensteiner
Copy link

Unlike Newtonsoft.Json, when deserializing a string to a char, it will not throw an exception.

Let me give you an example.

This is our json:

{
    "Char": "Test"
}

This is our class:

public class MyClass
{ 
    public char Char { get; set; }
}

Using Newtonsoft.Json (version 12.0.3) I get an exception when trying to deserialize. The code below will throw this exception:

Newtonsoft.Json.JsonSerializationException: 'Error converting value "Test" to type 'System.Char'. Path 'Char', line 1, position 13.'

Inner Exception
FormatException: String must be exactly one character long.

var myClass= Newtonsoft.Json.JsonConvert.DeserializeObject<MyClass>("{\"Char\": \"Test\"}");

But when using System.Text.Json.JsonSerializer (.NET Core 3.1), I get an instance of MyClass where "Char" is 'T'.

var myClass = System.Text.Json.JsonSerializer.Deserialize<MyClass>("{\"Char\": \"Test\"}");

I think the default behavior should match that of Newtonsoft.Json. If you feel otherwise, then please at least provide a way to overwrite it via JsonSerializerOptions and JsonReaderOptions.

I am writing my own (de)serializer (because I need support for serializing/deserializing private and readonly fields) and therefore I would really prefer that System.Text.Json.Utf8JsonReader is extended with these two new methods:

public char GetChar();
public bool TryGetChar(out char value);

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Text.Json untriaged New issue has not been triaged by the area owner labels Feb 17, 2020
@ahsonkhan ahsonkhan added this to the 5.0 milestone Feb 19, 2020
@ahsonkhan ahsonkhan added bug and removed untriaged New issue has not been triaged by the area owner labels Feb 19, 2020
@ahsonkhan
Copy link
Contributor

ahsonkhan commented Feb 19, 2020

JsonSerializer should't deserialize string to char

Agreed.

This has been fixed in 5.0 already in #114. See #114 (comment). Feel free to try it by adding a package reference to the latest S.T.Json NuGet package from our nightly feed: https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet5/nuget/v3/index.json

Since it would be a behavioral breaking change for some, we decided against porting the fix in servicing for 3.1.

If waiting for 5.0 isn't feasible, the workaround for you would be to copy the current char converter implementation from master and register that in the JsonSerializerOptions Converter list.

internal sealed class CharConverter : JsonConverter<char>
{
public override char Read(ref Utf8JsonReader reader, Type typeToConvert, JsonSerializerOptions options)
{
string? str = reader.GetString();
if (string.IsNullOrEmpty(str) || str.Length > 1)
{
throw ThrowHelper.GetInvalidOperationException_ExpectedChar(reader.TokenType);
}
return str[0];
}
public override void Write(Utf8JsonWriter writer, char value, JsonSerializerOptions options)
{
writer.WriteStringValue(
#if BUILDING_INBOX_LIBRARY
MemoryMarshal.CreateSpan(ref value, 1)
#else
value.ToString()
#endif
);
}
}

Is this a blocking issue for you? Can you please explain your scenario in which you happen to use char properties? I am curious to know.

therefore I would really prefer that System.Text.Json.Utf8JsonReader is extended with these two new methods

This was discussed here and we ultimately decided against adding these to the Utf8JsonReader since it wasn't a common enough scenario. The user could just call the GetString() API and get the first character.
#29000

If there is significant need for the GetChar API on the reader, we could consider revisiting it, but I haven't seen a demand for it. What would you want the behavior to be for unpaired UTF-16 surrogate characters?

@blankensteiner
Copy link
Author

Hi @ahsonkhan

Great to hear that it has already been fixed!

It's not a blocking issue for me and I can wait for .NET 5.

Can you please explain your scenario in which you happen to use char properties? I am curious to know.

My scenario is that I am writing a serializer using Utf8JsonReader and Utf8JsonWriter. I need to support creating the object without calling a constructor (using System.Runtime.Serialization.FormatterServices.GetUninitializedObject) and getting/setting even private and read-only fields (using System.Reflection.FieldInfo.SetValue).

If there is significant need for the GetChar API on the reader, we could consider revisiting it, but I haven't seen a demand for it. What would you want the behavior to be for unpaired UTF-16 surrogate characters?

I would love not to duplicate the behavior of JsonSerializer and/or JsonDocument.
Right now we are missing a WriteString(string propertyName, char value) on Utf8JsonWriter and GetChar() on Utf8JsonReader. So, that means I have to read a string and use the first character because that is the current implementation and I want to respect/mimic that. When the implementation changes, I also have to change my implementation, throwing the same Exception and preferably with the same message. That could be avoided if the behavior was pushed down to the reader and writer (along with the options of altering behavior).

The same goes for the missing WriteString(string propertyName, TimeSpan value) and GetTimeSpan. I could just implement it by parsing a string, but that is not what I want. I just want to create a new serializer, building on the "lower-level" rules of dealing with TimeSpan and Char.

Does that make sense?

@ahsonkhan
Copy link
Contributor

ahsonkhan commented Feb 19, 2020

Does that make sense?

Sure. But do you actually need support for char in your serializer? I wanted to understand that use case better (since I haven't seen a lot of .NET object models with char properties).

I just want to create a new serializer, building on the "lower-level" rules of dealing with TimeSpan and Char.

What do you expect the read/write behavior to be for chars?

  • Do you read JSON numbers into char (as if it was ushort)? What do you read for incomplete surrogates?
string json = "\"a\"";
// excluding steps to transcode to UTF-8 and create reader
char c = reader.GetChar(); // Returning c == 'a' makes sense here

json = "\"ab\"";
c = reader.GetChar(); // throws?

json = "\"\u0064\"";
c = reader.GetChar(); // Returning c == 'd' makes sense here (unescape the JSON string)

json = "100"; // Notes: this is a single, unquoted, JSON number.
c = reader.GetChar(); // should this throw? should this be 'd'?

json = "\"\uD801\""; // high-surrogate, no low-surrogate
c = reader.GetChar(); // should this throw? should this be ushort equivalent of 0xD801?

json = "\"\uD8010\uDC37\""; // valid surrogate pair string
c = reader.GetChar(); // should this throw?
  • Similarly, do you write chars as a JSON number or as a single character string surrounded by quotes? What do you do for writing a character that happens to be part of an incomplete surrogate pair?

Here's the previous API review link from #29000 (comment) (for context): Video

Regarding TimeSpan, that's covered by #29932
Once the JsonSerializer has support for it, we can re-consider adding public reader/writer APIs for it too.

@blankensteiner
Copy link
Author

Sure. But do you actually need support for char in your serializer? I wanted to understand that use case better (since I haven't seen a lot of .NET object models with char properties).

Yes, we have applications using char fields and properties. It's not common and I can't remember if I have ever created one myself, but we have them in our codebase.

What do you expect the read/write behavior to be for chars?

Well, my point is actually what I don't really care, but with the current implementation I have to care.
My user has created an object with a char-field/property and I have to support that. So in a dream world the Utf8JsonWriter would just have a WriteChar(char value) and I wouldn't even know or care if it was converted to a number or a string or whatever data types JSON happens to support. Likewise, Utf8JsonReader would have a ReadChar() and again I'm spared the burden of handling the "how".
.NET has a char type, JSON has some data types, some logic maps them and that is (in my opinion) the domain of the Utf8JsonReader and Utf8JsonWriter. If that logic leaks into JsonSerializer or JsonDocument, then others (like me) have to duplicate that logic.

@ahsonkhan
Copy link
Contributor

ahsonkhan commented Feb 19, 2020

Yes, we have applications using char fields and properties. It's not common and I can't remember if I have ever created one myself, but we have them in our codebase.

Gotcha. Is the code open source/available on GitHub (and if so, can you share the link to the model that contains char)? What is the semantic meaning of those char properties (as an example)? For example, I can see one use case for int32 properties being an "id" (or "age"). Similarly, I can see decimal properties used in the context of money/currency. What do the char properties mean in your context?

Likewise, Utf8JsonReader would have a ReadChar() and again I'm spared the burden of handling the "how".
.NET has a char type, JSON has some data types, some logic maps them and that is (in my opinion) the domain of the Utf8JsonReader and Utf8JsonWriter.

Can you please file a new issue with the proposal to add the char-based API overloads to Utf8JsonReader and Utf8JsonWriter similar to #29000 (comment) along with providing your motivation/use case? Since you are looking for an API addition, here's the process: https://github.com/dotnet/runtime/blob/master/docs/project/api-review-process.md#steps
If you'd like me to do it instead, let me know and I will follow up tomorrow.

Btw, I was asking for details on the expected behavior because we previously deferred adding such APIs due to lack of clarity on the behavior. The objective was to use the motivating scenario to provide that clarity.

Given the primary concern of this issue has been resolved (deserializing string to char will throw in 5.0), closing this one.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

3 participants