Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support filtering property names on dictionary deserialization. #79059

Open
terrajobst opened this issue Nov 30, 2022 · 31 comments
Open

Support filtering property names on dictionary deserialization. #79059

terrajobst opened this issue Nov 30, 2022 · 31 comments
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Milestone

Comments

@terrajobst
Copy link
Contributor

terrajobst commented Nov 30, 2022

EDIT See #79059 (comment) for API proposal.

It seems there is currently no way to make JSON documents that reference a schema just deserializa.

Repro

{
    "$schema": "http://example.org/myschema.json",
    "Key1": {
        "V1": 10,
        "V2": "Ten"
    },
    "Key2": {
        "V1": 20,
        "V2": "Twenty"
    }
}
var content = "<json above>";
return JsonSerializer.Deserialize<Dictionary<string, CustomType>>(content);

class CustomType {
    public int V1 { get; set; }
    public string V2 { get; set; }
}

Expected Behavior

It just works. Or, based on JsonSerializerOptions, it could be configured to just work.

Actual Behavior

It causes an exception:

The JSON value could not be converted to CustomType. Path: $.$schema

In order to make it deserializable, the easiest option is to manually remove the schema node from the document:

if (JsonNode.Parse(content) is JsonObject jsonObject)
{
    if (jsonObject.Remove("$schema"))
        content = jsonObject.ToJsonString();
}

The more performant alternative would be to write a custom converter, but that seems like a lot of work. If the root type of the document weren't dictionary but a type I control, then presumably I could work this around by adding a property to my class and decorate it with [JsonPropertyName("$schema")] but that feels equally hacky.

Customer Evidence

It’s not entirely clear to me if this is worth prioritizing. On the one hand, it seems a decent amount of our customers might use JSON schemas (based on this super scientific poll below).

On the other hand, it would only impact customers who use dictionaries as the document type. That combination might be quite rare and may not be be worth handling directly. And based on James’s comments below, JSON.NET never handled that either. I suggest we leave it open for a bit and see what responses we get but my gut feel says cutting seems fine.

A5B403CE-A45D-49BD-9861-9C3D65E3281F

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Nov 30, 2022
@ghost
Copy link

ghost commented Nov 30, 2022

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

It seems there is currently no way to make JSON documents that reference a schema just deserializa.

Repro

{
    "$schema": "http://example.org/myschema.json",
    "Key1": "Value1",
    "Key2": "Value2"
}
var content = "<json above>";
return JsonSerializer.Deserialize<Dictionary<string, string>>(content);

Expected Behavior

It just works. Or, based on JsonSerializerOptions, it could be configured to just work.

Actual Behavior

It causes an exception:

The JSON value could not be converted to System.String. Path: $.$schema

In order to make it deserializable, the easiest option is to manually remove the schema node from the document:

if (JsonNode.Parse(content) is JsonObject jsonObject)
{
    if (jsonObject.Remove("$schema"))
        content = jsonObject.ToJsonString();
}

The more performant alternative would be to write a custom converter, but that seems like a lot of work. If the root type of the document weren't dictionary but a type I control, then presumably I could work this around by adding a property to my class and decorate it with [JsonPropertyName("$schema")] but that feels equally hacky.

Author: terrajobst
Assignees: -
Labels:

area-System.Text.Json

Milestone: -

@terrajobst terrajobst changed the title JSON Deserialization files for documents with schema JSON deserialization fails for documents with schema Nov 30, 2022
@layomia
Copy link
Contributor

layomia commented Nov 30, 2022

It seems reasonable to enable this scenario if we have substantial user asks for it. Feels like "future" for now wrt urgency.

I wonder if this problem is scoped to JSON schema support or whether a general solution is needed i.e a feature to express that a set of property names should be omitted from (de)serialization. I assume this would be scoped to just dictionaries since we already have that for POCO properties ([JsonIgnore] and related features).

@layomia layomia removed the untriaged New issue has not been triaged by the area owner label Nov 30, 2022
@layomia layomia added this to the Future milestone Nov 30, 2022
@gregsdennis
Copy link
Contributor

This isn't a JSON Schema problem (at least not properly). JSON Schema puts no requirements on JSON data.

Including $schema in an instance is a convention that some tools, like VS Code, uses to identify how to validate the data.

$schema inside of a JSON Schema carries additional and specific meaning.

@ShreyasJejurkar
Copy link
Contributor

@terrajobst did you gave try with @JamesNK Newtonsoft one? Does it work there?

@JamesNK
Copy link
Member

JamesNK commented Dec 1, 2022

Newtonsoft.Json errors:

Newtonsoft.Json.JsonSerializationException: 'Error converting value "http://example.org/myschema.json" to type 'CustomType'. Path '$schema', line 2, position 49.'

I'd recommend people load the JSON into a JObject and remove the $schema property and then deserialize it.

@gregsdennis
Copy link
Contributor

The System.Text.Json analog for that is JsonNode. @terrajobst has the proper workaround in the opening comment.

@eiriktsarpalis
Copy link
Member

A few observations:

  • POCO deserialization will generally skip any properties it fails to bind to so this shouldn't be an issue in that case. The same is true for non-generic dictionaries or IDictionary<string, object>, whose deserialization semantics closely follow those of JsonObject.
  • This issue only concerns generic dictionary deserialization, and the fact that all values in the deserialized JSON object must conform to whatever contract TValue imposes in IDictionary<TKey, TValue>. Assuming we get union types eventually (where something like Dictionary<string, string | CustomType> becomes expressible) it would be possible to work around the issue using appropriate modelling.
  • Assuming this is a request for JSON Schema support, this is being tracked in [System.Text.Json] Json Schemas Support #29887.
  • Assuming it is a request for property deny lists for dictionary deserialization in general, it might be something we could consider. I don't believe it is possible to work around the issue currently using JsonIgnoreAttribute or JsonPropertyName attributes since dictionaries don't use them.

@terrajobst
Copy link
Contributor Author

I added a customer evidence section with my thoughts.

@terrajobst
Copy link
Contributor Author

terrajobst commented Dec 1, 2022

@eiriktsarpalis

  • Assuming this is a request for JSON Schema support

That’s a good point. I didn’t call it out explicitly but that wasn’t my intent. Rather, I think we could decide to never support JSON schemas (ie offer validation or an OM for reading one) while still making this scenario possible by simply ignoring the well-known property name during The deserialization. So in that sense, I think of them of as orthogonal issues.

@eiriktsarpalis
Copy link
Member

Would we need to ignore other well-known metadata property names like $type or $id? There are dozens of schemes like that out there, I suspect we might want to expose it as deny list so that users can cherry-pick what they need to be skipped.

@gregsdennis
Copy link
Contributor

This sounds like a slippery slope.

@terrajobst
Copy link
Contributor Author

terrajobst commented Dec 2, 2022

I think one way to expose this would be like that:

namespace System.Text.Json;

public sealed class JsonSerializerOptions
{
    public JsonDictionaryKeyFilter DictionaryKeyFilter { get; set; }
}

public enum JsonDictionaryKeyFilter
{
    None,
    IgnoreMetadataNames // Ignores any keys starting with $, such as `$schema`.
}

One could make this slightly more flexible by borrowing from the design of strongly typed strings and create an enum-like wrapper around a func, which offers some simple well-known versions:

public readonly struct JsonDictionaryKeyFilter
{
    public static JsonDictionaryKeyFilter None { get; }
    public static JsonDictionaryKeyFilter IgnoreMetadataNames { get; }
    
    public JsonDictionaryKeyFilter(Func<string, bool> filter);
    public bool Include(string name);
}

In either design, the default would be JsonDictionaryKeyFilter.None.

@eiriktsarpalis
Copy link
Member

I think the predicate approach works. I would prefer if we could avoid the wrapper type and just use Func<string, bool> or perhaps just use a dedicated delegate type. There also needs to be a configuration point on JsonTypeInfo.

@gregsdennis
Copy link
Contributor

Why does everyone keep using Func<string, bool> when Predicate<string> exists?! I see it a lot.

Anyway... Carry on.

@eiriktsarpalis
Copy link
Member

Predicate predates Func and its use is largely restricted to Linq and a few generic collection methods. I don't believe we use it in any new APIs.

@gregsdennis
Copy link
Contributor

I know it's older, which is why I'm confused that it's not used more.

I don't mean to derail the conversation though. Carry on.

@eiriktsarpalis
Copy link
Member

Would we need to ignore other well-known metadata property names like $type or $id? There are dozens of schemes like that out there, I suspect we might want to expose it as deny list so that users can cherry-pick what they need to be skipped.

#82012 seems to support the fact we need to ignore arbitrary property names.

@eiriktsarpalis eiriktsarpalis changed the title JSON deserialization fails for documents with schema Support filtering property names on dictionary deserialization. Feb 13, 2023
@eiriktsarpalis
Copy link
Member

We might want to use a design similar to JsonNamingPolicy:

namespace System.Text.Json.Serialization;

public abstract class JsonDictionaryKeyFilter
{
    public static JsonDictionaryKeyFilter IgnoreMetadataNames { get; }

    public abstract bool IgnoreKey(ReadOnlySpan<byte> utf8Key);
}

namespace System.Text.Json;

public partial class JsonSerializerOptions
{
    public JsonDictionaryKeyFilter? DictionaryKeyFilter { get; set; } = null;
}

@eiriktsarpalis eiriktsarpalis modified the milestones: Future, 8.0.0 Feb 14, 2023
@eiriktsarpalis eiriktsarpalis self-assigned this Feb 14, 2023
@eiriktsarpalis eiriktsarpalis added the api-ready-for-review API is ready for review, it is NOT ready for implementation label Feb 14, 2023
@Sergio0694
Copy link
Contributor

Would it make sense to also have an attribute that would allow people to opt-in into this over arbitrary properties and with arbitrary values, rather than only at the options level? Somewhat related, if this can only be set from the options would this mean that if you need this you'd be unable to use the generated Default property from a JSON context, unless you manually set this property at some point before using it? Additionally, if you have multiple accessors for options from a given context (eg. we have a couple, one is the default one and one is for camel case naming policy), would this mean you'd have to remember to set this for each of those contexts before using them? 🤔

@gregsdennis
Copy link
Contributor

gregsdennis commented Feb 15, 2023

Please be aware that in JsonSchema.Net, I use the STJ serializer to deserialize actual schemas, for which the $schema, $id, and other properties are considered and deserialized, not ignored.

If something is done here, please make the new behavior opt-in.

@eiriktsarpalis
Copy link
Member

If something is done here, please make the new behavior opt-in.

That goes without saying. Making it the default behavior would be a breaking change.

@krwq
Copy link
Member

krwq commented Feb 15, 2023

IMO that should be something that lives on JsonTypeInfo rather than global. Also we should think of cases like Dictionary<string, Dictionary<string, string>> when designing this. Possibly we could expose ElementTypeInfo/KeyTypeInfo to solve that problem (that needs a proper design though to not get us into recursive types hell).

@eiriktsarpalis
Copy link
Member

Would it make sense to also have an attribute

It wouldn't be possible to specify a filter value on the attribute level unless we either exposed a proxy enum type or the type of the IgnoreMetadataNames class. We don't do this for things like DictionaryKeyPolicy, which can only be configured globally.

Somewhat related, if this can only be set from the options would this mean that if you need this you'd be unable to use the generated Default property from a JSON context

That's right, although this added step should have no impact on serialization performance. We might want to consider exposing a parameter in JsonSourceGenerationOptionsAttribute eventually, but I wouldn't consider this a high enough priority, there's a big functionality gap between that attribute and JsonSerializerOptions already (including lack of options for the existing DictionaryKeyPolicy).

IMO that should be something that lives on JsonTypeInfo rather than global.

I'm not too sure. This feature dovetails with reference handling which is also configured globally. I don't see why couldn't expose this in JsonTypeInfo but bare minimum it should validate that no reference handling or polymorphism have been configured for the type. I would suggest keeping it simple for now and consider bringing it to JsonTypeInfo together with DictionaryKeyPolicy provided there is demand for that.

@terrajobst
Copy link
Contributor Author

terrajobst commented Mar 21, 2023

Video

  • It seems odd that we'd use UTF8 in this API, given that JsonNamingPolicy doesn't
    • Since this is an opt-in policy we're OK with allocating for cases where the keys aren't strings
  • We should name the method ShouldIgnoreKey
  • We will apply this policy before DictionaryKeyPolicy and we will run it for both string and non-string key types.
namespace System.Text.Json.Serialization;

public abstract class JsonDictionaryKeyFilter
{
    public static JsonDictionaryKeyFilter IgnoreMetadataNames { get; }
    public abstract bool ShouldIgnoreKey(string key);
}
namespace System.Text.Json;

public partial class JsonSerializerOptions
{
    public JsonDictionaryKeyFilter? DictionaryKeyFilter { get; set; } = null;
}

@terrajobst terrajobst added api-approved API was approved in API review, it can be implemented and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Mar 21, 2023
@Sergio0694
Copy link
Contributor

I missed the API review, small question on this API:

public abstract bool ShouldIgnoreKey(string key);

Even though we don't care about UTF8, is there any benefit in not making that at least ReadOnlySpan<char>? 🤔

@krwq
Copy link
Member

krwq commented Mar 21, 2023

@terrajobst presumably this doesn't apply for dictionary for extension data properties?

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Mar 21, 2023

@krwq It shouldn't. These are always Dictionary<string, object> or similar so it's one size fits all.

@eiriktsarpalis eiriktsarpalis removed their assignment Mar 22, 2023
@eiriktsarpalis eiriktsarpalis removed the partner-impact This issue impacts a partner who needs to be kept updated label Mar 22, 2023
@eiriktsarpalis eiriktsarpalis modified the milestones: 8.0.0, Future May 31, 2023
@AlexRadch
Copy link
Contributor

Can anybody help me with PR #87868 ?

@eiriktsarpalis
Copy link
Member

@AlexRadch currently going through a backlog of other things, will get to yours eventually. Thanks!

@koyote
Copy link

koyote commented Mar 13, 2024

I know this issue had the title changed to be Dictionary specific, but I feel like there's a related issue for non-dictionaries:

[JsonDerivedType(typeof(DerivedType), typeDiscriminator: nameof(DerivedType))]
public abstract class BaseType
{
}

public sealed class DerivedType : BaseType
{
}

internal class Program
{
    static void Main()
    {
        var text =
        """
        {
            "$type": "DerivedType",
            "$schema": "http://example.org/myschema.json",
        }
        """;
        var baseObj = JsonSerializer.Deserialize<BaseType>(text);
    }
}

The above results in:

System.Text.Json.JsonException: Properties that start with '$' are not allowed in types that support metadata. Either escape the character or disable reference preservation and polymorphic deserialization. Path: $.$schema | LineNumber: 2 | BytePositionInLine: 14.
   at System.Text.Json.ThrowHelper.ThrowJsonException(String message)
   at System.Text.Json.ThrowHelper.ThrowJsonException_MetadataInvalidPropertyWithLeadingDollarSign(ReadOnlySpan`1 propertyName, ReadStack& state, Utf8JsonReader& reader)
   at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.OnTryReadAsObject(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, Object& value)
   at System.Text.Json.Serialization.Converters.ObjectDefaultConverter`1.OnTryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value)
   at System.Text.Json.Serialization.JsonConverter`1.TryRead(Utf8JsonReader& reader, Type typeToConvert, JsonSerializerOptions options, ReadStack& state, T& value, Boolean& isPopulatedValue)
   at System.Text.Json.Serialization.JsonConverter`1.ReadCore(Utf8JsonReader& reader, JsonSerializerOptions options, ReadStack& state)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 utf8Json, JsonTypeInfo`1 jsonTypeInfo, Nullable`1 actualByteCount)
   at System.Text.Json.JsonSerializer.ReadFromSpan[TValue](ReadOnlySpan`1 json, JsonTypeInfo`1 jsonTypeInfo)
   at System.Text.Json.JsonSerializer.Deserialize[TValue](String json, JsonSerializerOptions options)

I am not sure what a good workaround would be as adding a [JsonProperty("$schema")] attribute to a property on either class won't help here.

Should any fix for the Dictionary case also take the above case into account or is this a separate issue that needs its own issue number?

@terrajobst
Copy link
Contributor Author

Should any fix for the Dictionary case also take the above case into account or is this a separate issue that needs its own issue number?

That's a fair question. Originally, I scoped it to dictionaries because it's a built-in type that the user can't easily change the serialization behavior for, but I guess for custom objects a generic way to turn this off would be desirable as well, even though other workarounds exist.

@eiriktsarpalis eiriktsarpalis modified the milestones: 9.0.0, 10.0.0 Jul 5, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-approved API was approved in API review, it can be implemented area-System.Text.Json
Projects
None yet
Development

No branches or pull requests

10 participants