Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Undocumented breaking change in .NET 5.0 - HashSet moved to another assembly/library #47113

Closed
maciejjarzynski opened this issue Jan 18, 2021 · 22 comments

Comments

@maciejjarzynski
Copy link

maciejjarzynski commented Jan 18, 2021

Hello everyone,
I've stumbled upon a breaking change which as far as I've searched for was not documented anywhere.
It seems that with that pull request #37180 HashSet was moved from System.Collections to System.Private.CoreLib library.
Let's say we have a fair big distributed system based on microservices with an immutable events used for a synchronization between them which are being stored forever.
Newtonsoft.Json is being used as a serializer/deserializer.
The problem with that breaking change is like in the example below:
that's an example of class which was serialized in a netcoreapp3.1 version (with a collection in a dictionary's value being HashSet):

public class TestClass
    {
        public IReadOnlyDictionary<string, IReadOnlyCollection<string>> Dictionary { get; }

        public TestClass(IReadOnlyDictionary<string, IReadOnlyCollection<string>> dictionary)
        {
            Dictionary = dictionary;
        }
    }

resulting in (with a collection in a dictionary's value being HashSet)

{
  "Dictionary": {
    "DictionaryKey": {
      "$type": "System.Collections.Generic.HashSet`1[[System.String, System.Private.CoreLib]], System.Collections",
      "$values": [
        "string value"
      ]
    }
  }
}

whereas in net5.0 it is being serialized as

{
  "Dictionary": {
    "DictionaryKey": {
      "$type": "System.Collections.Generic.HashSet`1[[System.String, System.Private.CoreLib]], System.Private.CoreLib",
      "$values": [
        "string value"
      ]
    }
  }
}

The TypeNameHandling is set to Auto.

The problem of couse occurs when trying to deserialize old-netcoreapp3.1 events/messages into the application running on net5.0

My questions are basically:

  • are there any other such an undocumented breaking changes, like any other types were moved between libraries?
  • are you aware that a mentioned change could have caused some serious implications and have some kind of solution or workaround for that?
  • @JamesNK maybe you'd have some idea, how could we disable the type name handling for the dictionaries and collections from a framework, so upon deserialization the $type would be ignored for those types?
@dotnet-issue-labeler dotnet-issue-labeler bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Jan 18, 2021
@ghost
Copy link

ghost commented Jan 18, 2021

Tagging subscribers to this area: @eiriktsarpalis
See info in area-owners.md if you want to be subscribed.

Issue Details

Hello everyone,
I've stumbled upon a breaking change which as far as I've searched for was not documented anywhere.
It seems that with that pull request #37180 HashSet was moved from System.Collections to System.Private.CoreLib library.
Let's say we have a fair big distributed system based on microservices with an immutable events used for a synchronization between them which are being stored forever.
The problem with that breaking change is like in the example below:
that's an example of class which was serialized in a netcoreapp3.1 version (with a collection in a dictionary's value being HashSet):

public class TestClass
    {
        public IReadOnlyDictionary<string, IReadOnlyCollection<string>> Dictionary { get; }

        public TestClass(IReadOnlyDictionary<string, IReadOnlyCollection<string>> dictionary)
        {
            Dictionary = dictionary;
        }
    }

resulting in (with a collection in a dictionary's value being HashSet)

{
  "Dictionary": {
    "DictionaryKey": {
      "$type": "System.Collections.Generic.HashSet`1[[System.String, System.Private.CoreLib]], System.Collections",
      "$values": [
        "string value"
      ]
    }
  }
}

whereas in net5.0 it is being serialized as

{
  "Dictionary": {
    "DictionaryKey": {
      "$type": "System.Collections.Generic.HashSet`1[[System.String, System.Private.CoreLib]], System.Private.CoreLib",
      "$values": [
        "string value"
      ]
    }
  }
}

The TypeNameHandling is set to Auto.

The problem of couse occurs when trying to deserialize old-netcoreapp3.1 events/messages into the application running on net5.0

My questions are basically:

  • are there any other such an undocumented breaking changes, like any other types were moved between libraries?
  • are you aware that a mentioned change could have caused some serious implications and have some kind of solution or workaround for that?
  • @JamesNK maybe you'd have some idea, how could we disable the type name handling for the dictionaries and collections from a framework, so upon deserialization the $type would be ignored for those types?
Author: maciejjarzynski
Assignees: -
Labels:

area-System.Collections, untriaged

Milestone: -

@huoyaoyuan
Copy link
Member

Moving types among assemblies isn't treated as breaking change for common cases. There's TypeForwardedTo added.

  • are there any other such an undocumented breaking changes, like any other types were moved between libraries?

There are, but not frequently. Types will be moved when the implementation structure requires so.

The problem of couse occurs when trying to deserialize old-netcoreapp3.1 events/messages into the application running on net5.0

Such serialization can be more incompatible with .NET Framework, since many types have moved and the corelib name has changed.

@maciejjarzynski
Copy link
Author

Moving types among assemblies isn't treated as breaking change for common cases. There's TypeForwardedTo added.

If it isn't, I think it should be treated as such.
As per SemVer specification:

Major version X (X.y.z | X > 0) MUST be incremented if any backwards incompatible changes are introduced to the public API.

And of course assembly name for the given type is the public API.

And my question (let me quote myself) was regarding any other such a breaking changes for .NET 5.0 release:

are there any other such an undocumented breaking changes, like any other types were moved between libraries?

@huoyaoyuan
Copy link
Member

If it isn't, I think it should be treated as such.

In public API, it stays remaining in the original assmebly. Serialization using reflection is getting implementation detail.

For example, there is no System.Private.CoreLib in public API. There is only System.Runtime.

Implementation detail can be broken in any release.

@MichalStrehovsky
Copy link
Member

The type has moved many times already. Serializers should ideally respect the TypeForwarded**From** attribute on the type and emit that as the owning assembly:

[TypeForwardedFrom("System.Core, Version=4.0.0.0, Culture=neutral, PublicKeyToken=b77a5c561934e089")]

Libraries have type forwarders (using the above mentioned TypeForwardedTo) to ensure apps looking for HashSet in System.Core will find it. This way the serialized data would be compatible all the way to .NET Framework 2.0.

@maciejjarzynski
Copy link
Author

Then a question to @JamesNK : does the Newtonsoft.Json respect the TypeForwardedFrom attribute, and if not, why?

@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Jan 18, 2021

That question should be asked over at the Newtonsoft.Json repo (https://github.com/JamesNK/Newtonsoft.Json). This repo is specifically for .NET.

(Also, standard disclaimer: TypeNameHandling.Auto should be used with extreme caution. It's a code execution equivalent. Your service's threat model should consider whether the entity supplying the JSON contents to deserialize is trustworthy.)

@JamesNK
Copy link
Member

JamesNK commented Jan 18, 2021

Issue about this here - JamesNK/Newtonsoft.Json#1972

This was suggested as a solution - JamesNK/Newtonsoft.Json#1972 (comment)

@danmoseley
Copy link
Member

Resolving as by design - we move types with Type Forwarding often. The attribute was specifically created so this could be done transparently (if it is respected)

@SimonCropp
Copy link
Contributor

@danmoseley can you comment on this issue JamesNK/Newtonsoft.Json#1972 ? ie should the fix be included in jsondotnet or in the runtime?

@danmoseley
Copy link
Member

Perhaps @steveharter can help there.

@SimonCropp
Copy link
Contributor

SimonCropp commented Feb 18, 2021

@dandanmu @steveharter note the "suggested solution" points to my workaround here, which i would consider a temporary hack, which is also in no way discoverable when the problem occurs

@SimonCropp
Copy link
Contributor

@danmoseley @steveharter can i provide any more information or help on this one?

@SimonCropp
Copy link
Contributor

SimonCropp commented Mar 16, 2021

@danmoseley @steveharter can this be re-opened. since it seems comments on closed issues are often lost in the ether

@danmoseley danmoseley reopened this Mar 16, 2021
@GrabYourPitchforks
Copy link
Member

In theory we could add a reflection API GetSerializedTypeName() : string hanging off of System.Type. But: (a) this doesn't help downlevel consumers unless they are willing to duplicate this code; and (b) putting type information in the serialized payload represents a massive security risk to consumers, and MSFT as a whole has largely abandoned serialization tech which would benefit from such an API. So... I'll put it on the table as a dirty option if we feel it's worth pursuing.

@jkotas
Copy link
Member

jkotas commented Mar 17, 2021

Adding APIs to improve binary serialization would violate the principles we have set in https://github.com/dotnet/designs/blob/main/accepted/2020/better-obsoletion/binaryformatter-obsoletion.md.

the "suggested solution" points to my workaround here, which i would consider a temporary hack

This hack is appropriate to workaround buggy implementation of binary serialization contract in Newtonsoft.Json. The proper fix is to migrate away from using binary serialization.

@SimonCropp
Copy link
Contributor

SimonCropp commented Mar 17, 2021

@jkotas this is happening in non-binary serialization

@GrabYourPitchforks
Copy link
Member

"Binary serialization" is the wrong technical term, but it includes the umbrella of "serializers which embed type information in the payload," so the general concept is applicable here. That's also what I was trying (poorly) to say earlier w.r.t. Microsoft not investing in this category of serialization tech any further.

@SimonCropp
Copy link
Contributor

so what is the plan here? the current status is far from ideal, ie a type load exception with no pointer to the how to resolve it. Expecting people to find a workaround 15 comments down on an issue is, IMO, not the "pitt of success". and the suggestion in binaryformatter-obsoletion.md of "move to System.Text.Json" is a poor answer with System.Text.Json being far from feature parity with json.net, again assuming they even follow the thread far enough to find binaryformatter-obsoletion.md.

@SimonCropp
Copy link
Contributor

but it includes the umbrella of "serializers which embed type information in the payload,"

you might want to update the title of that binaryformatter-obsoletion.md. since i suspect many would read the title and immediately assume "i am not using binaryformatter, so this in no way applies to me"

@GrabYourPitchforks
Copy link
Member

so what is the plan here?

You requested this issue be reactivated - I assumed you wanted to propose something? From the .NET libraries team's perspective, there's nothing for us to do since the issue appears to exist solely within the third-party library Newtonsoft.Json.

What I had suggested earlier is that if it would benefit the ecosystem, we could make BinaryFormatter.GetTypeInformation (or its equivalent) public. That method is the component that reads the [TypeForwardedFrom] attribute from all types in the object graph and reconstructs the original fully qualified type name. But if (if!) we were to do this, the soonest it would be is .NET 6, which means that the best course of action to unblock everybody right now would be to duplicate its implementation in your own code or in Newtonsoft.Json. But once people have duplicated it, there's not a significant need to make the implementation public, especially since it would be encouraging bad practice. So back to the original "would it benefit the ecosystem?" point - I think making an argument that it would be a net positive is an uphill battle.

@eiriktsarpalis
Copy link
Member

I'm going to close this issue since there don't seem to be any actionable proposals from the libraries perspective. Feel free to reopen if this changes.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Apr 19, 2021
@ghost ghost locked as resolved and limited conversation to collaborators May 19, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

9 participants