Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add polymorphism support to System.Text.Json #53882

Closed

Conversation

eiriktsarpalis
Copy link
Member

@eiriktsarpalis eiriktsarpalis commented Jun 8, 2021

This is a draft PR showcasing the prototype implementation for polymorphic serialization support in System.Text.Json (#45189). No APIs have been approved for this feature yet. Please see this PR for an overview of the proposed design.

Overview

Conceptually, the implementation can be broken into two distinct sets of changes:

  1. The System.Text.Json infrastructural changes required to accommodate the new polymorphism features and
  2. The classes and tests implementing the polymorphism features themselves.

Infrastructural Changes

Here is a high-level overview of the infrastructural changes being made:

  • The CanBePolymorphic and CanUseDirectReadOrWrite properties from JsonConverter have been moved to JsonTypeInfo, since the value of these flags can now be influenced by user configuration or attributes.
  • Rewrites the WriteStack/ReadStack structs to accommodate root-level polymorphic objects: the existing implementation depends on root-level values not being polymorphic, however this invariant can no longer be honored.
  • Removes the JsonConverter.RuntimeType property: its only purpose is to construct a default ConstructorDelegate instance for interface converters, however it is being passed around pervasively in the JsonTypeInfo/JsonPropertyInfo infrastructure and it can result in unsound behaviour in the context of polymorphic deserialization. The changes replace the property with a ConstructorDelegate? property in JsonConverter. In order to minimize churn, I have kept all other RuntimeTypeInfo members in our infrastructure for now, but at this point they are all guaranteed to equal TypeToConvert. The final PR should be removing these altogether.
  • Reworks the ReferenceResolver.IngoreCycles implementation: the current implementation contains bespoke code to avoid polymorphic converters triggering false-negative cycles. I have refactored the code so that the cycle detection logic is combined with the polymorphic serialization infrastructure.
  • Makes the ConverterList class generic so that it can be reused with other list types in JsonSerializerOptions.

Functional Changes

Please see the design document for a high-level overview of the proposed features.

Performance

The changes have a small but measurable impact on deserialization performance, since the hot path now needs to account for polymorphism. Here are some results using the dotnet/performance benchmark suite:

ReadJson_Int32

Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(5%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
DeserializeFromString Job-IIQBKK main 126.94 ns 6.072 ns 6.993 ns 123.38 ns 120.13 ns 135.58 ns 1.00 Base 0.00 - - - -
DeserializeFromString Job-MTBECC feature 119.61 ns 0.240 ns 0.213 ns 119.55 ns 119.34 ns 119.99 ns 0.94 Same 0.05 - - - -
DeserializeFromUtf8Bytes Job-IIQBKK main 84.25 ns 3.747 ns 4.315 ns 81.27 ns 80.30 ns 90.74 ns 1.00 Base 0.00 - - - -
DeserializeFromUtf8Bytes Job-MTBECC feature 82.96 ns 3.895 ns 4.485 ns 79.82 ns 78.76 ns 91.46 ns 0.99 Same 0.07 - - - -
DeserializeFromStream Job-IIQBKK main 289.50 ns 5.632 ns 5.268 ns 290.95 ns 283.39 ns 298.82 ns 1.00 Base 0.00 0.0069 - - 72 B
DeserializeFromStream Job-MTBECC feature 301.79 ns 0.945 ns 0.738 ns 301.44 ns 301.02 ns 303.05 ns 1.05 Same 0.02 0.0060 - - 72 B

ReadJson_SimpleStructWithProperties

Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(5%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
DeserializeFromString Job-IIQBKK main 317.1 ns 1.44 ns 1.21 ns 316.7 ns 316.0 ns 319.2 ns 1.00 Base 0.00 0.0052 - - 64 B
DeserializeFromString Job-MTBECC feature 333.6 ns 9.50 ns 10.94 ns 326.3 ns 323.8 ns 350.2 ns 1.04 Same 0.03 0.0052 - - 64 B
DeserializeFromUtf8Bytes Job-IIQBKK main 253.7 ns 0.78 ns 0.61 ns 253.8 ns 252.9 ns 254.6 ns 1.00 Base 0.00 0.0063 - - 64 B
DeserializeFromUtf8Bytes Job-MTBECC feature 273.5 ns 1.40 ns 1.24 ns 272.8 ns 272.3 ns 276.0 ns 1.08 Slower 0.01 0.0055 - - 64 B
DeserializeFromStream Job-IIQBKK main 549.6 ns 22.76 ns 26.22 ns 534.3 ns 527.8 ns 593.8 ns 1.00 Base 0.00 0.0131 - - 144 B
DeserializeFromStream Job-MTBECC feature 558.0 ns 3.34 ns 3.12 ns 557.0 ns 553.5 ns 564.4 ns 1.01 Same 0.05 0.0133 - - 144 B

ReadJson_MyEventsListerViewModel

Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(5%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
DeserializeFromString Job-IIQBKK main 299.2 μs 12.72 μs 14.64 μs 289.3 μs 288.2 μs 322.4 μs 1.00 Base 0.00 6.9444 2.3148 - 77 KB
DeserializeFromString Job-MTBECC feature 308.2 μs 0.42 μs 0.37 μs 308.1 μs 307.6 μs 309.0 μs 1.02 Same 0.05 7.3801 2.4600 - 77 KB
DeserializeFromUtf8Bytes Job-IIQBKK main 293.4 μs 12.11 μs 13.95 μs 285.8 μs 281.7 μs 316.7 μs 1.00 Base 0.00 6.8104 2.2701 - 77 KB
DeserializeFromUtf8Bytes Job-MTBECC feature 315.8 μs 10.12 μs 9.94 μs 312.5 μs 311.1 μs 348.9 μs 1.07 Slower 0.04 7.5377 2.5126 - 77 KB
DeserializeFromStream Job-IIQBKK main 331.4 μs 1.52 μs 1.19 μs 331.1 μs 329.9 μs 333.7 μs 1.00 Base 0.00 6.6050 1.3210 - 78 KB
DeserializeFromStream Job-MTBECC feature 355.5 μs 2.04 μs 1.59 μs 354.9 μs 354.0 μs 358.7 μs 1.07 Slower 0.00 7.0522 1.4104 - 78 KB

The comparison results can fluctuate across runs, however the trend clearly seems to be pointing towards decreased performance.

On the serialization side performance tends to be on par with main:

WriteJson_Int32

Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(5%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
SerializeToString Job-IIQBKK main 157.1 ns 6.70 ns 7.71 ns 152.5 ns 151.0 ns 170.3 ns 1.00 Base 0.00 0.0179 - - 184 B
SerializeToString Job-MTBECC feature 136.2 ns 0.33 ns 0.29 ns 136.1 ns 135.9 ns 136.9 ns 0.86 Faster 0.04 0.0178 - - 184 B
SerializeToUtf8Bytes Job-IIQBKK main 123.3 ns 2.14 ns 1.78 ns 122.6 ns 122.1 ns 127.4 ns 1.00 Base 0.00 0.0183 - - 184 B
SerializeToUtf8Bytes Job-MTBECC feature 122.0 ns 0.57 ns 0.51 ns 121.8 ns 121.4 ns 122.9 ns 0.99 Same 0.02 0.0181 - - 184 B
SerializeToStream Job-IIQBKK main 195.7 ns 9.95 ns 11.46 ns 188.5 ns 187.6 ns 216.2 ns 1.00 Base 0.00 0.0151 - - 152 B
SerializeToStream Job-MTBECC feature 203.9 ns 2.27 ns 1.90 ns 203.0 ns 202.4 ns 208.8 ns 1.03 Slower 0.06 0.0146 - - 152 B
SerializeObjectProperty Job-IIQBKK main 247.7 ns 0.99 ns 0.83 ns 247.5 ns 246.6 ns 249.4 ns 1.00 Base 0.00 0.0189 - - 200 B
SerializeObjectProperty Job-MTBECC feature 239.8 ns 1.00 ns 0.89 ns 239.7 ns 238.3 ns 241.7 ns 0.97 Same 0.00 0.0189 - - 200 B

WriteJson_SimpleStructWithProperties

Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(5%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
SerializeToString Job-IIQBKK main 253.4 ns 0.77 ns 0.60 ns 253.2 ns 252.7 ns 254.9 ns 1.00 Base 0.00 0.0250 - - 256 B
SerializeToString Job-MTBECC feature 253.5 ns 0.93 ns 0.78 ns 253.3 ns 252.6 ns 255.0 ns 1.00 Same 0.00 0.0243 - - 256 B
SerializeToUtf8Bytes Job-IIQBKK main 218.6 ns 1.04 ns 0.87 ns 218.5 ns 217.6 ns 220.7 ns 1.00 Base 0.00 0.0229 - - 232 B
SerializeToUtf8Bytes Job-MTBECC feature 227.3 ns 10.50 ns 12.09 ns 219.8 ns 217.7 ns 248.2 ns 1.03 Same 0.05 0.0225 - - 232 B
SerializeToStream Job-IIQBKK main 339.3 ns 1.59 ns 1.41 ns 338.8 ns 337.6 ns 342.3 ns 1.00 Base 0.00 0.0177 - - 184 B
SerializeToStream Job-MTBECC feature 340.5 ns 0.51 ns 0.46 ns 340.3 ns 340.0 ns 341.5 ns 1.00 Same 0.00 0.0172 - - 184 B
SerializeObjectProperty Job-IIQBKK main 482.0 ns 1.67 ns 1.39 ns 481.7 ns 480.7 ns 485.9 ns 1.00 Base 0.00 0.0641 - - 648 B
SerializeObjectProperty Job-MTBECC feature 456.9 ns 17.27 ns 19.89 ns 445.8 ns 443.3 ns 498.3 ns 0.95 Faster 0.04 0.0612 - - 616 B

WriteJson_MyEventsListerViewModel

Method Job Toolchain Mean Error StdDev Median Min Max Ratio MannWhitney(5%) RatioSD Gen 0 Gen 1 Gen 2 Allocated
SerializeToString Job-IIQBKK main 391.3 μs 11.55 μs 12.84 μs 385.9 μs 380.5 μs 421.9 μs 1.00 Base 0.00 47.2561 47.2561 47.2561 256 KB
SerializeToString Job-MTBECC feature 386.5 μs 15.73 μs 18.12 μs 375.8 μs 374.2 μs 420.6 μs 0.99 Same 0.06 46.1310 46.1310 46.1310 256 KB
SerializeToUtf8Bytes Job-IIQBKK main 346.8 μs 2.92 μs 2.44 μs 345.8 μs 344.5 μs 351.9 μs 1.00 Base 0.00 16.3043 2.7174 - 182 KB
SerializeToUtf8Bytes Job-MTBECC feature 349.5 μs 5.97 μs 4.98 μs 346.4 μs 344.8 μs 360.4 μs 1.01 Same 0.02 17.6630 2.7174 - 182 KB
SerializeToStream Job-IIQBKK main 338.8 μs 1.15 μs 0.90 μs 338.7 μs 337.6 μs 340.5 μs 1.00 Base 0.00 10.6383 - - 109 KB
SerializeToStream Job-MTBECC feature 360.5 μs 23.25 μs 26.78 μs 346.2 μs 339.9 μs 421.0 μs 1.08 Same 0.09 10.8696 - - 109 KB
SerializeObjectProperty Job-IIQBKK main 389.3 μs 16.53 μs 19.04 μs 378.4 μs 375.2 μs 423.0 μs 1.00 Base 0.00 46.0526 46.0526 46.0526 257 KB
SerializeObjectProperty Job-MTBECC feature 379.9 μs 11.72 μs 12.04 μs 375.9 μs 369.9 μs 416.7 μs 0.97 Same 0.06 45.1389 45.1389 45.1389 256 KB

@eiriktsarpalis eiriktsarpalis added NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons) area-System.Text.Json labels Jun 8, 2021
@eiriktsarpalis eiriktsarpalis self-assigned this Jun 8, 2021
@ghost
Copy link

ghost commented Jun 8, 2021

Tagging subscribers to this area: @eiriktsarpalis, @layomia
See info in area-owners.md if you want to be subscribed.

Issue Details

This is a draft PR showcasing the prototype implementation for polymorphic serialization support in System.Text.Json (#45189). No APIs have been approved for this feature yet. Please see this PR for an overview of the proposed design.

Overview

Conceptually, the implementation can be broken into two distinct sets of changes:

  1. The System.Text.Json infrastructural changes required to accommodate the new polymorphism features and
  2. The classes and tests implementing the polymorphism features themselves.

Infrastructural Changes

Here is a high-level overview of the infrastructural changes being made:

  • The CanBePolymorphic and CanUseDirectReadOrWrite properties from JsonConverter have been moved to JsonTypeInfo, since the value of these flags can now be influenced by user configuration or attributes.
  • Rewrites the WriteStack/ReadStack structs to accommodate root-level polymorphic objects: the existing implementation depends on root-level values not being polymorphic, however this invariant can no longer be honored.
  • Removes the JsonConverter.RuntimeType property: its only purpose is to construct a default ConstructorDelegate instance for interface converters, however it is being passed around pervasively in the JsonTypeInfo/JsonPropertyInfo infrastructure and it can result in unsound behaviour in the context of polymorphic deserialization. The changes replace the property with a ConstructorDelegate? property in JsonConverter. In order to minimize churn, I have kept all other RuntimeTypeInfo members in our infrastructure for now, but at this point they are all guaranteed to equal TypeToConvert. The final PR should be removing these altogether.
  • Reworks the ReferenceResolver.IngoreCycles implementation: the current implementation contains bespoke code to avoid polymorphic converters triggering false-negative cycles. I have refactored so that the cycle detection logic is combined with polymorphic serialization.
  • Makes the ConverterList class generic so that it can be reused with other list types in JsonSerializerOptions.

Functional Changes

Please see the design document for a high-level overview of the proposed features.

Author: eiriktsarpalis
Assignees: eiriktsarpalis
Labels:

* NO MERGE *, area-System.Text.Json

Milestone: -

@dotnet-issue-labeler
Copy link

Note regarding the new-api-needs-documentation label:

This serves as a reminder for when your PR is modifying a ref *.cs file and adding/modifying public APIs, to please make sure the API implementation in the src *.cs file is documented with triple slash comments, so the PR reviewers can sign off that change.

@olivier-spinelli
Copy link

Not sure if it's the right place to address this, but I have a question about the type discriminator, not about the lax/strict mode but about the way it is expressed: has anybody considered using a 2-cell array ["typeName", value] instead of injecting an extra property in the Json whenever the type is required?

@eiriktsarpalis
Copy link
Member Author

has anybody considered using a 2-cell array ["typeName", value] instead of injecting an extra property in the Json whenever the type is required?

Why would this be preferable? If anything it might make more difficult to evaluate what "typeName" is meant to signify. Note that the serializer already emits other types of metadata ($id and $ref fields, in the context of reference preservation) so we need to be able to combine those features in a single format. The $type format is also used by other JSON serializers that implement polymorphism.

@olivier-spinelli
Copy link

This is a bit more compact, avoids name clashes, and refs and ids can also be wrapped into such 2-cells arrays.

Anther aspect that this introduces is that the data shape is actually different whether you know the type or not. This (up to me) provides a "kind of type check": a datum that is polymorphic must be read by handling the cell wrapper, it cannot be (wrongly) confused with a typed object (that is not expected), by silently skipping the $type discriminator property.
=> Polymorphism appears somehow "structurally", rather than "conjoncturally" in the data.

I perfectly understand that you'd go the "classical" way but I think this approach is (more) interesting. (And yes, I know that others do this but - imho - this is not a fundamental argument.)

@eiriktsarpalis
Copy link
Member Author

Anther aspect that this introduces is that the data shape is actually different whether you know the type or not.

That's not true, collection types can also be polymorphic (following the { "$type" : "typeName", "$values" : [1,2,3] } format also used by other serializers when appending metadata to JSON arrays).

@olivier-spinelli
Copy link

Absolutely. In the "classical way", polymorphic collections are already structurally different.

@olivier-spinelli
Copy link

Just a simple example with:

  • Person { string Name { get; set; } }
  • Student : Person { int Age { get; set; } }
  • Teacher : Person { bool IsChief { get; set; } }

An array of Student (the item's type is final - no ambiguities) would be [{"Name":"A", "Age":12},{"Name":"B", "Age":13}].

An array of Person would be [["Student",{"Name":"A", "Age":12}],["Teacher",{"Name":"T","IsChief:false}]] (items require their type).

The same array of Persons except that it is known only as an object[] (or any other non-unique "base" type), the array itself requires its type:
["A(Person)",[["Student",{"Name":"A", "Age":12}],["Teacher",{"Name":"T","IsChief:false}]]]

(The type name "A(Person)" is just an example. I just use ( instead of < because of JSON \u(nicode handling).)

Dictionaries ("M(string,Person)"), Sets, List etc. are of course as easily mapped.

@olivier-spinelli
Copy link

Another interesting point that I'd like to higlight: with this schema, the de/serialization of the type itself is exactly the same, it's up to the container/referencer/caller to wrap (or not) the type. Even the type name is not necessarily known/required by the de/serialization code. This allows a clean kind of "onion handling" in code: null(type(data))...

(I don't sell anything and I don't want to waste your time: if you're interested feel free to continue this thread or contact me.)

@eiriktsarpalis eiriktsarpalis force-pushed the polymorphic-serialization branch from e2f9976 to 9a2b86d Compare June 9, 2021 15:59
@olivier-spinelli
Copy link

My last try (promised!) - This is a doc I wrote for my implementation (at least I would have tried everything I could to gain attention on this)...

Json polymorphism support

One of the big issue to solve when de/serializing objects is to manage polymorphism: when an abstraction must be serialized, the type of the serialized object must also be written so that deserialization know what to instantiate.

The classical approach

The classical approach here is to inject a $type (or _$type$_ or any strange enough property that contains a string with the name of the type) into the object properties.
Since in JSON, collections (arrays, lists, sets) are expressed with [arrays] (and this cannot hold any property), another syntactical
construct is required for collections, typically something like { "$type" : "A(int)", "$values" : [1,2,3] }.

Note that, in this approach, serialized objects with or without a type are structurally identical (only the existence of the $type property makes the difference) whereas serialized collections ARE structurally different (they appear inside a "wrapper" object).

When is a $type NOT required? When the object's type is statically known, when the de/serialized type is unambiguous. As soon as
an ambiguity exists, polymorphism is at stake.

The object type being the "worst case of polymorphism", but any interface or base class with specializations are ambiguous), the runtime type of the object MUST be serialized and MUST be used to deserialize the object.

We consider "Structural Difference" between unambiguous and polymorphic objects to be a good thing since this obliges the reader to take care of the object's type if there is one.
Without any difference in the shape of the data, a polymorphic object can be falsely read as an object of an unambiguous (wrong) type.

Structural Polymorphism

We are using a different approach that offers the following advantages:

  • No "magic name" like $type or $values.
  • Uniform type handling for objects and collections: there is always a "Structural Difference" between a polymorphic and an unambiguous shape.

Thanks to this, the code is simple and can be split in 3 layers:

  • Nullable layer: the null occurrence is handled.
  • Polymorphic layer: a 2-cells array is expected, the first cell containing the type name, the second one the unambiguous object's representation.
  • Unambiguous layer (the exact type is known): the serialize/deserialize code handles purely the object's data.

Last (but not least) advantage of this approach: when deserializing, the type is known BEFORE the object's data. This enable deserializer to easily work in "pure streaming" (no lookup required, no intermediate instantiation): the Utf8Reader can be used directly.

Below is a simple example with:

class Person
{
   public string Name { get; set; }
}

class Student : Person
{
    public int Age { get; set; }
}

class Teacher : Person
{
    public bool IsChief { get; set; }
}

The C# Student[] is an array of Student. There is no specialization, hence no ambiguity: [{"Name":"A", "Age":12},{"Name":"B", "Age":13}].

The C# Person[] can contain 3 different types of objects. Each item needs to be "typed":
[["Student",{"Name":"A", "Age":12}],["Person",{"Name":"E"}],["Teacher",{"Name":"T","IsChief:false}]].

When the same array of Persons is known only as an object[] (or any other non-unique "base" type like the non-generic ICollection), then the array itself requires its type:
["A(Person)",[["Student",{"Name":"A", "Age":12}],["Person",{"Name":"E"}],["Teacher",{"Name":"T","IsChief:false}]]]

@BreyerW
Copy link

BreyerW commented Jun 15, 2021

I would like to ask if mirroring ReferenceResolver API for polymorphic support with appropiate tweaks was considered? This would allow providing both newtonsoft compatible and secure with type discriminators implementations of polymorphic support and also other flavours. Secure implementation could be provided by default in bcl and newtonsoft provided in guide to migrate but not in bcl to discourage wide use of it

@eiriktsarpalis
Copy link
Member Author

Closing this draft since the two features are to be reviewed and implemented independently.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Json new-api-needs-documentation NO-MERGE The PR is not ready for merge yet (see discussion for detailed reasons)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants