Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove BinaryFormatter from the shared framework in .NET 5 #29976

Closed
GrabYourPitchforks opened this issue Jun 21, 2019 · 87 comments
Closed

Remove BinaryFormatter from the shared framework in .NET 5 #29976

GrabYourPitchforks opened this issue Jun 21, 2019 · 87 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Milestone

Comments

@GrabYourPitchforks
Copy link
Member

We've known for a long while that BinaryFormatter is a dangerous type and that it's responsible for many of the security vulnerabilities real-world .NET applications have seen in recent years. In .NET Core 1.x, the type was removed from the product entirely, and it was brought back in .NET Core 2.0 under the guise of "app compat".

Though as of the time of this writing there are no known types in .NET Core which can be leveraged by BinaryFormatter to achieve RCE, it is impractical to make this guarantee going forward. As new types are introduced (or existing types from the full Framework are re-introduced) I have no doubt that one of them will contain such a vector that we weren't able to catch during development due to the complexity of this system.

Some applications use a custom SerializationBinder to mitigate attacks both on .NET Core and in full Framework. The engineering team's current understanding is that a properly-written SerializationBinder is sufficient to block RCE attacks. However, this is not a guarantee, and even so, a custom SerializationBinder cannot mitigate other attacks like DoS within the BinaryFormatter infrastructure itself.

All of this said, the conclusion is that BinaryFormatter is not an appropriate serialization format for application developers to be using in a cloud-connected world. It was a mistake for us to bring it back in .NET Core 2.0, as us re-introducing it into the shared framework gives the type an undeserved air of respectability, like we're nodding and implicitly approving its usage by modern applications.

To address this, and to eventually migrate applications off of BinaryFormatter, I propose the following steps. These steps will only affect customers who target .NET 5 or above for their applications.

  1. Remove BinaryFormatter from the shared framework and into its own standalone package that must be manually retrieved and installed into the application.

  2. Mark the BinaryFormatter type itself with [Obsolete], or include in the package a code analyzer which warns on the type's usage. Force developers to explicitly acknowledge that they're calling an extremely dangerous API and that they must be certain the data is coming from a trusted location.

  3. Document the BinaryFormatter code base as legacy only. Clearly communicate that Microsoft is making no further investments in this code base, and all bug reports or feature requests will be immediately closed.

  4. Create a separate "safe" (type-limiting, bounds-checking, etc.) serializer whose payload format is BinaryFormatter-compatible and which can be used to deserialize existing payloads in a safe fashion, even if those payloads originate from an untrusted source. The API surface will look different than the traditional BinaryFormatter.Deserialze call due to needing to configure the serializer on a per-call basis, but at minimum it gives applications a way to read existing payloads. This allows them to migrate their data without leaving them hanging. This serializer could be in the same package that BinaryFormatter is in.

These steps would serve to modernize the runtime and SDK by eliminating a type which has no place in today's development cycle, it serves as a forcing function to get application developers to fix potential vulnerabilities in their code, and it provides on an opt-in basis a way for applications to continue to work if they really must continue using the type going forward.

@blowdart
Copy link
Contributor

tenor

@Symbai
Copy link

Symbai commented Jun 21, 2019

BinaryFormatter has the advantage that it's incredibly fast. Faster than any other formatter, by a lot. If you consider to remove it, please consider to provide a reasonable alternative, which is fast and secure. If you do, then I don't see any reason to not agree with that, regardless whether my personal opinion counts or not.

@NickCraver
Copy link
Member

@Symbai I think a very important thing to keep in mind here is that BinarySerializer is not fast and secure. It is fast largely because it is not secure.

@GrabYourPitchforks
Copy link
Member Author

@Symbai If you're looking for something where the payload format is human-readable, check out the work taking place in this repo in the System.Text.Json namespace. Or if you want pure performance over human readability, you may want to check out Google's ProtoBuf project, which has C# APIs.

In the proposal I have in this issue, I do suggest that we should create a BinaryFormatter-compatible "safe" deserializer, but performance wouldn't be a driving factor of such a serializer. (Though we wouldn't do anything to intentionally tank performance.) One of the other formats might be more appropriate for your needs given this.

@ChrisMcKee
Copy link

Pushing it into its own package seems a reasonable compromise.

@tebeco
Copy link

tebeco commented Jun 21, 2019

is it a good idea to refer to benaadams/System.Ben#107
for "fast and secure" binary formatter ?

Too soon ?

@samsosa
Copy link

samsosa commented Jun 22, 2019

Or if you want pure performance over human readability, you may want to check out Google's ProtoBuf project, which has C# APIs.

Than maybe replace the BinaryFormatter implementation under the hood with ProtoBuf?

@ViktorHofer
Copy link
Member

I'm sure you are considering partners here which are relying on BinaryFormatter in .NET Core. Keep in mind to loop them in whenever you are starting this effort.

@mgravell
Copy link
Member

mgravell commented Jun 24, 2019

@samsosa just to be explicit re:

Than maybe replace the BinaryFormatter implementation under the hood with ProtoBuf?

That isn't a realistic option. Protobuf is a powerful tool, but it has a different feature-set to BinaryFormatter, and there is no good way of making one pretend to be the other, in either direction. Just like XmlSerializer, Json.NET etc are all fundamentally serializers, but don't have he exact same feature set. If people want to use protobuf, that's fine, but they'd need to do that knowingly and deliberately.

@mconnew
Copy link
Member

mconnew commented Jun 24, 2019

I would like to try an put a stop to the trend I'm seeing in multiple issues of ProtoBuf being the solution to all things serialization. It is not a good serialization format, and it was never designed to be one. It is designed to be a compact way to represent a limited set of features for sending an object graph to a gRPC endpoint. It has very severe limitations when trying to be used for general purpose serialization.

It doesn't have support for references so if an object appears in an object graph multiple times, it will be completely serialized multiple times. We've had customers get tripped up by this before when they overrode Equals or GetHashCode incorrectly and ended up with two separate objects on deserialization which caused application logic problems.

The lack of references means it can't handle cycles in your object model which will cause infinite recursion on serialization. It simply cannot handle that scenario at all.

There have been attempts to come up with a way to support references to allow for these two scenarios. The one that protobuf.net does needs explicit opt-in for the types/members which need this support. This requires understanding your entire object model and know where cycles might exist and when having duplicates will be a problem. This could be a LOT of work depending on how complex your data model is. It greatly complicates your on-wire representation and none of the benchmarks I've seen have been against an OM using these features. Any benchmarks on size and speed of serialization need to take this into account. My understanding is that the Google implementation of gRPC serialization doesn't support this feature and expects you to support this mechanism at your OM level and not at the serializer level.

As the response from the ProtoBuf team has been that they aren't going to build this capability into the protocol and go implement your own solution, you lose the interoperability that ProtoBuf is supposed to bring as there will always be competing ways to do this by different ProtoBuf libraries.

Protobuf is a bad format for use with an existing set of classes. It's best use is when designing a new OM with these limitation in mind. E.g. I'm creating a new gRPC endpoint and so define my data model using IDL and then convert that to classes. Going the reverse direction is going to be problematic for many people as Protobuf is not a general purpose serialization format.

Incidentally, I read a blog post comparing MessagePack, ProtoBuf, JSON.NET and DataContractSerializer comparing size of output and speed of serialization and deserialization where it showed DataContractSerializer to be the worst on both fronts. I cloned their code and added a variation for DataContractSerializer using a binary XmlDictionaryReader/Writer and the size was about 10% bigger than ProtoBuf, and the time to serialize and deserialize was about 10% slower than ProtoBuf. If you were to have multiple duplicate objects and you turned on the reference support in DCS, you could easily create a case where DCS is both faster and smaller than ProtoBuf.

@GrabYourPitchforks
Copy link
Member Author

@mconnew Thank you for the excellent analysis! You're right - I should have been more careful with my wording in my earlier comment. I didn't intend to suggest that ProtoBuf is The One True Replacement(tm) going forward, just that it's one of many options available to developers depending on their scenarios.

I think part of the solution is that we need to convince the ecosystem that a general purpose serializer that can handle arbitrary object graphs is not a sustainable concept. Serializers like DataContractSerializer handle this to a limited extent, but they also place restrictions on the shape the graph must take and limits on what deserialization looks like. Even support for cyclic object graphs is inherently unsafe unless the application explicitly expects to deserialize such cycles, as DataContractSerializer allows on an opt-in basis.

Whether this is a documentation issue that gives best practices for creating DTOs which can be safely serialized, whether we recommend switching to a safe but restricted-capability serializer like DataContractSerializer, or whether we take some other course of action are all up in the air. (It's also possible that we do nothing, but clearly I'm opposed to that idea since I opened this issue. 😊)

Another option - though one that might be more difficult to swallow since it'll require significant work within the ecosystem - would be to create a set of educational guidelines that explains in detail the concepts behind serialization, trust boundaries, and how the two interact. This would make individual developers responsible for analyzing their specific scenarios and choosing appropriate data models and serialization formats which are appropriate for their needs, while also giving them the knowledge to make trade-offs when necessary. I'm not a fan of this solution by itself because it presents a steep learning curve for our audience and goes against .NET's general premise of creating usable frameworks. But perhaps these educational guidelines could be useful in conjunction with another more approachable solution.

@mconnew
Copy link
Member

mconnew commented Jun 24, 2019

@GrabYourPitchforks, my comment was in response to my seeing multiple places where people were suggesting using ProtoBuf for serialization. It's NOT a serialization format and I just wanted to make sure I put my thoughts out there somewhere contextually relevant.
I believe that DataContractSerializer does about the best job it's possible to do with serializing. We could do with adding some helpers to enable the use of the binary XmlDicationaryReader/Writer as the dictionary needs to be transmitted OOB otherwise you pay the high cost of the Text output.

@mgravell
Copy link
Member

@mconnew I don't want to distract from the conversation about BinaryFormatter, so I don't suggest "here", but I'd love to have an offline conversation with you about what does and doesn't qualify as a "serialization format" and why; it seems we have some different ideas there, and I'd love to hear a new perspective. I'm easy to get hold of if you're willing :)

@MarcoRossignoli
Copy link
Member

but I'd love to have an offline conversation with you about what does and doesn't qualify as a "serialization format" and why;

IMO would be useful for all devs have an "open" discussion/listening on it...all devs have to serialize something sometimes in career and the hide issues and choices are not so trivial.
Maybe a linked "discussion" to this issue.
Not mandatory though.

@mgravell
Copy link
Member

@MarcoRossignoli great idea; I've dropped a discussion here; I'd love your input there if you have the time, @mconnew

@weltkante
Copy link
Contributor

weltkante commented Jul 1, 2019

BinaryFormatter is an established format to transfer data through the clipboard and during drag'n'drop operations and likely to be in use in existing Desktop Framework applications. If you rip that out of the .NET Core framework without a replacement that can produce/consume the same binary data then .NET Core 5 applications will be unable to interact with Desktop Framework applications through drag'n'drop/clipboard when custom formats are in use.

@GrabYourPitchforks
Copy link
Member Author

Thanks @weltkante for your feedback! Your point that there needs to be some replacement is well-taken, and I think the proposal outlined here addresses the issue. Let me elaborate.

The proposal here is to remove BinaryFormatter from the shared framework specifically. The consequence is that any component that is part of the shared Framework (such as WinForms or WPF) or any component that references only the shared Framework without any additional external NuGet packages will not be able to perform arbitrary object deserialization via BinaryFormatter. In fact, both WinForms and WPF have already made strides in this regard (e.g., dotnet/wpf#1132) to disallow arbitrary object deserialization while still retaining support for deserializing very specific allowed types.

In the scenario you describe, even though the Framework itself won't have default built-in support for deserializing arbitrary objects on clipboard / drag-and-drop operations, the application can continue to support the legacy behavior via hooking the clipboard / drag-and-drop handlers. For example, if the application hooks the "paste" handler and determines that the incoming data is a BinaryFormatter-formatted payload, the application's custom handler can call into the deserializer directly. I conceptualized a "safe" BinaryFormatter-payload compatible deserializer earlier in this issue, and I'd encourage application developers to use that deserializer in place of the legacy deserializer whenever possible. But in the end, even though we may discourage it, nobody will prevent application code from invoking the old unsafe arbitrary object deserialization routine. The difference going forward is that the Framework itself shall not call into the unsafe deserializer on the application's behalf; the app developer explicitly needs to write that code.

This should provide a viable outline for applications which absolutely need to retain the existing Full Framework behaviors to be ported to .NET Core.

@weltkante
Copy link
Contributor

Ok, I see where you are going with that, moving it into a separate component may work. However note that just deserialization won't be enough for all compat scenarios, interacting with Desktop Applications through clipboard or drag'n'drop may also require serialization (e.g. if you want to put something into the clipboard that the existing Desktop Application should consume, or drag something onto a Desktop Application Window).

@HaraldMuehlhoffCC
Copy link

HaraldMuehlhoffCC commented Oct 28, 2019

Just a quick reminder that new client / server applications that cost millions to develop and will have to be maintained for more than a decade are currently being built using BinaryFormatter. Even in times of "assume breach" using BinaryFormatter over a secured internal network connection has it's merits and the stability of the API surface is crucial for the credibility of the .NET platform.

Moving the BinaryFormatter to a separate Nuget package in .NET 5 maybe an option but

Document the BinaryFormatter code base as legacy only. Clearly communicate that Microsoft is making no further investments in this code base, and all bug reports or feature requests will be immediately closed.

goes too far from my point of view. Especially as it's not completely clear that a viable (=feature-complete; drop-in - with some configuration) more secure replacement that will handle serialization AND deserialization will be available.

Currently we're using .NET Framework on the client (MvvmCross, WPF) and .NET Core 2.2 (ASP.NET Core) on the server planning to migrate both to .NET Core 3. Having to live with the fear (=project risk) that Microsoft may "drop" / declare obsolete APIs just added to their newest platform is not a great outlook.

Still I appreciate that this discussion is happening in the open and customers can participate and hopefully be heard ;)

@GrabYourPitchforks
Copy link
Member Author

@HaraldMuehlhoffCC Thank you for your feedback. Your comments hit on a few key points which I think are prime examples of why I had proposed this work item to begin with.

You are correct in that new applications (in 2019!) are being built with BinaryFormatter dependencies. IMO this is in large part due to the fact that Microsoft as a whole has not presented a unified message saying "do not use this feature going forward." Yes, there's the occasional blog post or security conference where we mention the problems, and the official docs discourage its use. But out of our entire developer ecosystem only a small fraction are tuned into those channels. The proposal here is intentionally disruptive, forcing developers who are using this type (or who want to use it) to take explicit action to continue using it in future versions. The acknowledgement process doesn't need to be difficult. Perhaps the process is as easy as pulling down another nuget package into the application - that's only a few button clicks or a single NuGet Package Manager command. Or perhaps it's adding a particular line to the .csproj file.

Even in times of "assume breach" using BinaryFormatter over a secured internal network connection has it's merits

As a security professional, I strongly disagree with this statement. This might be an appropriate assumption for your application today, as perhaps you control both the client and the server in tightly restricted environments. But as the application evolves, it's almost certain that this assumption might not hold going forward. Want to move your service to the cloud? Want to deploy the client on customers' workstations? Want to add the ability for additional less-authenticated users to access the application (perhaps in a read-only mode)? These processes are all now significantly more difficult because the application protocol is based on the insecure BinaryFormatter format.

Another way to think about this is to ask the following questions: "Would the server be comfortable allowing the client to upload an arbitrary .exe to the server and run it as part of request processing? Would the client be comfortable allowing the server to transmit an arbitrary .exe and run it as part of response processing? Is it acceptable for this behavior to be a permanent part of the application's request/response protocol?"

I would wager that vanishingly few of our developer audience would answer affirmatively to all three questions. But because our messaging on BinaryFormatter has been somewhat weak, our developers by and large don't realize that this is what they're signing up for. In my mind that's far more perilous for the ecosystem than any API adjustment would be.

@HaraldMuehlhoffCC
Copy link

@GrabYourPitchforks Thanks for your comment. We have indeed control both of the client and the server and it's running in a tightly restricted environment (electric utility companies). And this won't change for the client I'm talking about. Our web client is of course using a different protocol (JSON).

I don't quite get your running arbitrary code analogy. In a sense all protocols use some binary encoding (in the JSON & XML cases it's just UTF-8 ...). Just because the BinaryFormatter isn't using a human readable format doesn't mean that it's creating some sort of executable. Or is there some dark magic inside the implementation generating and executing IL? But still then it would be possible to add sanity / security checks ... this was the whole point about IL.
Also we're not dependant on the exact details of the binary format as we don't store it. So if there is something wrong with the implementation details of the current encoding we'd be happy to switch to another Formatter AS LONG AS it would be feature-compatible and not much slower.

Finally my primary concern with your proposal was

Document the BinaryFormatter code base as legacy only. Clearly communicate that Microsoft is making no further investments in this code base, and all bug reports or feature requests will be immediately closed.

How does this make the systems that will still be relying on BinaryFormatter any safer? Especially the "all bug reports [...] will be immediately closed" part ...

@blowdart
Copy link
Contributor

Just because the BinaryFormatter isn't using a human readable format doesn't mean that it's creating some sort of executable.

I'm afraid you're incorrect. It's not a matter of human readable versus binary, it's how the serialization works. BinaryFormatter puts the class information inside the serialized data. When it's deserialized that information is used to create the objects embedded in the file. This is dangerous, as there are classes within .NET which can be abused to deliver code execution. In fact json.net has the same option, but it's off by default. In BinaryFormatter we can't turn it off at all.

There have been numerous presentations about this at various hacker conferences, the seminal work being Are you my type? and the best recent one is Friday the 13th JSON Attacks.

@mgravell
Copy link
Member

mgravell commented Oct 28, 2019 via email

@blowdart
Copy link
Contributor

And if you want to see examples of binary formatter abuse in action; https://youtu.be/kz7wmRV9xsU?t=1273

@HaraldMuehlhoffCC
Copy link

HaraldMuehlhoffCC commented Oct 28, 2019

Thank you, @blowdart and @mgravell! I get that security is not free & I've been actively following quite a few security folks on Twitter. Nevertheless for our very large object model I wouldn't necessarily want to add as much metadata as protobuf-net seems to require (even though we generate part of our model via a custom ORM and could generate attributes for some of the classes programmatically).

We also do some clever stuff serializing LINQ-Queries using custom serializable Expression trees; even serializing Types (via their names & structural info in some cases). But we make sure on the server that the resulting queries are safe.

I would prefer some sort of whitelisting the classes over all this verbosity. Actually shouldn't I be able to do exactly this in a custom ISurrogateSelector implementation (or custom SerializationBinder)?!

Also we control the server, the client, the network and secure the communication channel; so I'm still not completely sold that the security implications are so pressing in our case.

And as other formatters can be made unsafe by adding types / classes (see https://www.newtonsoft.com/json/help/html/SerializationSettings.htm#TypeNameHandling) BinaryFormatter can be made safe(er) by controlling the (de)serialized types.
So from my point of view there's nothing inherently "evil" with regards to BinaryFormatter.

@blowdart
Copy link
Contributor

As soon as we start adding limitations it's no longer compatible. We can't break the entire world like that. And surrogates have their own problems with DoS and OOM attacks.

@HaraldMuehlhoffCC
Copy link

HaraldMuehlhoffCC commented Oct 28, 2019

@blowdart "With great power comes great responsibility" ;)

The currently available API should make it possible to add whitelisting types if required; see public virtual ISerializationSurrogate GetSurrogate(Type type, StreamingContext context, out ISurrogateSelector selector) in ISurrogateSelector (or SerializationBinder - whichever works best).

I'm with you that it's a good idea to make using the BinaryFormatter a conscious choice; even maybe adding whitelisting as a (changeable) default in .NET 5. I'm not seeing how this would be no longer compatible... (changes would have to be made anyway to reference an external Nuget package).

But forcing your customers to roll their own solutions won't make the world safer either.

Thanks everyone for the enlightening discussion; I have to get a few hours sleep as I have to get up quite early over here in Germany.

@blowdart
Copy link
Contributor

Moving it to a separate package doesn't mean you roll your own, you just have to opt-in to using it, with suitable dire warnings.

And then point dotnet/corefx#4 in Levi's plan would get you a safer one anyway.

@HaraldMuehlhoffCC
Copy link

@blowdart Then just keep it supported (or at the very least hand it to the community like WCF) ... otherwise customers would still be required to "roll their own" or use some other formatter - that wouldn't necessarily be safer - once critical (security) bugs were discovered. (I'm not talking about the architectural shortcomings you've written about)

Document the BinaryFormatter code base as legacy only. Clearly communicate that Microsoft is making no further investments in this code base, and all bug reports or feature requests will be immediately closed.

I will look into adding whitelisting for our BinaryFormatter usage utilizing the existing APIs. So hopefully something good will come from this discussion! ;)

@GrabYourPitchforks
Copy link
Member Author

I will look into adding whitelisting for our BinaryFormatter usage utilizing the existing APIs. So hopefully something good will come from this discussion! ;)

Per @blowdart's earlier comment (and my original description in the issue), surrogates have their own problems that won't always protect you.

The simple and unfortunate fact is that the BinaryFormatter type is unfixably insecure. Full stop.

That fact is what led to this issue being opened in the first place. Most customers aren't aware that the above sentence explains the state of the world. Hence my desire for somewhat drastic measures to make them aware of this, to migrate existing code away from BinaryFormatter, and to institute within their organizations a blanket ban on using the type in all new code.

@stephentoub stephentoub modified the milestones: 5.0.0, Future Jul 7, 2020
@stephentoub
Copy link
Member

I've moved it to future.

@GrabYourPitchforks
Copy link
Member Author

Thanks for moving it. I'm finishing up the new plan and should have something to publish in the next few weeks. That should supersede this issue, at which point this can be closed.

@Ozzard
Copy link

Ozzard commented Jul 10, 2020

sigh Having moved to use Immutable* in a few places, once again I've been hit by this same issue in .Net Core 3.1 vs. Framework - and I thought I'd managed to get away from it when I wrote my own Random ("because we won't add [Serializable] to it" and stopped using delegates. Nope.

Please can I request a more useful response than "Don't use BinaryFormatter"? What tool do I use to serialize/deserialize object graphs of classes where:

  • I have arbitrary graphs,
  • speed is important,
  • versioning is irrelevant,
  • and a third party cannot arbitrarily prevent me from serializing objects of their classes because they don't like my use case?

@GrabYourPitchforks
Copy link
Member Author

and a third party cannot arbitrarily prevent me from serializing objects of their classes because they don't like my use case?

Can you clarify this? I don't quite follow. Your object graph contains instances of some arbitrary type Namespace.Foo, the type author didn't intend for that type to be serializable and didn't do the work to make it serializable, and you're looking for a tech that can correctly serialize these instances? I don't quite see how this scenario can ever work in any reliable fashion without support from the type author.

@Ozzard
Copy link

Ozzard commented Jul 10, 2020

Can you clarify this? I don't quite follow.

Let's see!

Your object graph contains instances of some arbitrary type Namespace.Foo,

Yes.

the type author didn't intend for that type to be serializable

Don't care. Shouldn't be the type author's concern unless they're trying to ensure something special like serialization being compatible between versions, implementations, or architectures.

and didn't do the work to make it serializable,

But also didn't necessarily do any work to make it not serializable. System.Random is a good example of that; for a given version on a given architecture, it serializes just fine.

and you're looking for a tech that can correctly serialize these instances?

No. I'm looking for a tech that allows me to test - for a particular version of all code and libraries, accepting that new versions of any code may change my results - whether a particular type happens to be serializable to within my acceptable limits, and to use those test results if I find out that the type happens to work. I would strongly prefer it if the tech had warnings plastered all over it that that's all it does, that it's insecure, and that you should keep your eyes peeled when using it. Caveat user.

I don't quite see how this scenario can ever work in any reliable fashion without support from the type author.

Same as any other black / white box is made reliable in use: by eyeballing the code, testing, and by deducing that you can't do it if the tests don't pass. If I can't get reliability that's guaranteed by the author - which is quite a high burden for the author - then I want to be able to examine the type and judge reliability for myself, in my own use case.

What I don't want is for someone who doesn't know my use case to be able to prevent me from examining and testing their type to determine whether it's sufficiently usable for my use case.

Does that help?

@weltkante
Copy link
Contributor

weltkante commented Jul 10, 2020

Your problem isn't solvable in general purpose but maybe thats not your goal

Things like interop handles and pointers cannot be serialized without cooperation of the type author. e.g. trying to forcefully serialize classes from a UI framework will not be doable without the frameworks cooperation since there is a lot of interop with the underlying platform.

Even outside of UI framework, even simple things like weak references that are tied with the runtime will not be serializable without cooperation.

Also any class can have numeric indices into static caches/arrays, so unless you serialize the whole program state of static variables those indices will be wrong upon deserialization as well.

@GrabYourPitchforks
Copy link
Member Author

Sounds like you want a "damn the torpedoes" style serializer. Something that recursively iterates over all the fields in a type and bitblts them into a byte buffer somewhere, with no regard for any deserialization-related attributes or API conventions. That's an interesting use case I've never heard of before. I'm not aware of any in-box serializer that follows this logic. But in theory it should be straightforward enough to write one, assuming the serializer isn't intending to special-case pointers / handles / reflection types / etc, and just relies on the caller not passing it object graphs which contain those fields?

@Ozzard
Copy link

Ozzard commented Jul 10, 2020

@weltkante - agree entirely that this isn't solvable in general, and you're also absolutely correct that that isn't my goal. I agree with all of your points about ways of stopping this from working, and I've no doubt there are many more. I've got considerable experience at VM and image level with Smalltalk, so the whole special classes / weak references / oddnesses thing isn't new to me. Many Smalltalks can save image segments, which are effectively portions of the Smalltalk object graph; support for reloading those and, in particular, re-linking them with what's outside the segment is distinctly variable.

@GrabYourPitchforks - Almost. It's definitely a "Doctor, it hurts when I pass the serializer something that breaks it / Well, don't do that!" serializer. I want something that respects serialization information provided by type authors when it is present, but does not conflate a lack of explicit serialization provision ("I haven't said you can...") with an explicit lack of serialization provision ("... therefore you can't"). That's more complex than a straight blitter. It's actually very close to BinaryFormatter as it currently stands in Framework; except for the combination of BinaryFormatter assuming it's locked out of a type unless it sees SerializableAttribute, and the base library authors only having SerializableAttribute to mark information for multiple different use cases and being very conservative with the types they choose to mark.

I wouldn't even mind having to define explicitly when instantiating the serializer whether it should serialize types that don't provide any guarantees on architecture or version. It's perfectly reasonable to provide barriers to this behaviour such that folks who wish to damn the torpedos have to opt into said damning, and make it very explicit that you're opting in to behaviour that is not guaranteed by the type authors.

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jul 10, 2020

@Ozzard I can attest that nuget contains many serialisers that have the capabilities you require, though like you said, caveat user. So I guess the question is, what can we do from the perspective of the framework to better support such libraries?

@GrabYourPitchforks
Copy link
Member Author

GrabYourPitchforks commented Jul 10, 2020

folks who wish to damn the torpedos have to opt into said damning, and make it very explicit that you're opting in to behaviour that is not guaranteed by the type authors.

This is already achievable to a large extent through opt-in via a surrogate selector. It allows you to take full control of the serialization pipeline for arbitrary types, including types that aren't otherwise serializable. But as you mentioned, caveat user.

using System;
using System.IO;
using System.Reflection;
using System.Runtime.Serialization;
using System.Runtime.Serialization.Formatters.Binary;

namespace Demo
{
    class Program
    {
        static void Main(string[] args)
        {
            Person p = new Person("John", "Doe");

            var ms = new MemoryStream();
            var bf = new BinaryFormatter();
            // bf.Serialize(ms, p); // this line would throw

            // !! WARNING !!
            // This is not a good idea, and it's not officially supported. Attempting to BinaryFormatter serialize
            // an instance of a type that is not marked as serializable can lead to all sorts of problems, and
            // even a seemingly innocuous servicing release could bust your application. But if you wish to
            // serialize them anyway, this sample shows how to do so.
            SurrogateSelector selector = new SurrogateSelector();
            selector.AddSurrogate(typeof(Person), new StreamingContext(StreamingContextStates.All), new SimpleReflectionSurrogate());
            bf.SurrogateSelector = selector;

            bf.Serialize(ms, p);
            ms.Position = 0; // rewind stream
            var deserializedObj = (Person)bf.Deserialize(ms);

            Console.WriteLine("Deserialized: " + deserializedObj);
        }
    }

    // n.b. Not marked [Serializable]!
    class Person
    {
        // parameterful ctor
        public Person(string firstName, string lastName)
        {
            FirstName = firstName;
            LastName = lastName;
        }

        public string FirstName { get; }
        public string LastName { get; }

        public override string ToString() => $"{LastName}, {FirstName}";
    }

    class SimpleReflectionSurrogate : ISerializationSurrogate
    {
        public void GetObjectData(object obj, SerializationInfo info, StreamingContext context)
        {
            // walk type hierarchy if necessary
            var fieldInfos = obj.GetType().GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
            foreach (var fieldInfo in fieldInfos)
            {
                info.AddValue(fieldInfo.Name, fieldInfo.GetValue(obj));
            }
        }

        public object SetObjectData(object obj, SerializationInfo info, StreamingContext context, ISurrogateSelector selector)
        {
            // walk type hierarchy if necessary
            var fieldInfos = obj.GetType().GetFields(BindingFlags.Public | BindingFlags.NonPublic | BindingFlags.Instance);
            foreach (var fieldInfo in fieldInfos)
            {
                fieldInfo.SetValue(obj, info.GetValue(fieldInfo.Name, fieldInfo.FieldType));
            }
            return obj;
        }
    }
}

@mgravell
Copy link
Member

mgravell commented Jul 10, 2020 via email

@Ozzard
Copy link

Ozzard commented Jul 13, 2020

@eiriktsarpalis - well, you asked :-):

  • Within the framework, for as many types as possible, provide a standard, known-long-term-supported way to round-trip their state, including support that allows serializers to serialize and reconstruct directed (potentially cyclic) graphs of objects.
  • In particular, this support should include all the common types that people use for data structures, including Immutable*.
  • Ideally include support for other stateful classes, such as System.Random, where the state is purely managed data.
  • Be clear within the framework where serialization is guaranteed to be portable between framework versions or processor architectures, so that users can accept the limitation otherwise that deserialization will fail without identical versions and architectures.
  • Ideally provide a serializer to make use of these capabilities, though indicating suitable third-party serializers might also work (in the same way that Newtonsoft.Json is sometimes indicated).

@GrabYourPitchforks - many thanks, Levi! I'm very happy with the warnings in the code. I expect to trip over a significant number of gotchas in use that will need ironing out; but at least it's possible to iron them out now.

@mgravell - I know the feeling. I started looking at general object serialization for C in the late '80s using (of all things) Sun's XDR libraries, as part of a project to ship object graphs between processes across TCP, and have done quite a bit of work on it since, though by no means most of my career!

I don't like the idea of taking over serialization of types without the co-operation of the type author, but consider my options here:

  • I apparently cannot rely on the .Net Core team supporting serialization of any type in the future; there is repeated talk of these approaches being deprecated.
  • The scope of the supported .Net Core types for serialization is already reduced from .Net Framework (which has already caused me problems), so there's evidence that this preference for reducing scope is being acted on.
  • I'm a one-person dev house. It may well be cheaper for me to change platforms to something that supports serialization than to write and maintain my own copies of things like Dictionary and ensure they remain performant.
  • These are not "hostile" types in the sense that someone can inject arbitrary types into the object graph or bytes into the stream for deserialization; they are known types, of known versions, whose structure and behaviour I can understand. There are clearly situations where that wouldn't be the case. I wouldn't use this approach in those situations.

As with almost all technology development, my job is to maximise the total benefit to my client of a solution I create versus the effort involved in its creation. "Benefit" is quite broadly drawn here, including things like reliability, security, and performance as well as pure functionality; and note that different projects have different trade-offs there. Given the options of a) not doing it, b) duplicating and maintaining my own versions of many already-implemented types, or c) bypassing safeguards on a few types in a known way and analysing and testing the heck out of it, my rational option is C. I'd prefer d) already having serialization support in the types I use, because that would improve my ability to get benefit for my clients out of .Net Core, but it would appear that that the .Net Core team won't give me that as an option; so let's go with what's possible.

@GrabYourPitchforks
Copy link
Member Author

This plan is now out-of-date and I am closing the issue. Thanks all for the feedback!

@danmoseley
Copy link
Member

Linking dotnet/designs#141

@dje-dev
Copy link

dje-dev commented Jul 14, 2020

I have used .NET professionally for more than 15 years and my company has built a multi-million-line codebase which supports our business operations. Binary serialization is a key tool we leverage extensively internally. There is no security concern because (among other reasons) all the development is internal within a small team. The ability to serialize arbitrarily complex graphs with high performance allows us to store and then retrieve intermediate results of our complex models, facilitating validation and incremental work. Please do not remove, this would be very negatively impactful to remove an existing key functionality that we have built upon for 15 years. Feel free to reach out to me off-line for more background.

@danmoseley
Copy link
Member

@dje-dev the design linked above might be a place to share input.

@Ozzard
Copy link

Ozzard commented Jul 14, 2020

@danmosemsft - given that the team appears to have ignored all other input in this area that demonstrates that people are using this functionality in real products, is there any chance of @dje-dev being listened to or are they wasting their effort?

@danmoseley
Copy link
Member

@Ozzard I think it is very likely that we will implement the linked proposal in some form, because of the long and continuing history of highly impactful security issues generated by BinaryFormatter use - even though we appreciate there are contexts where all data is trusted. This one is only the most recent.

Feedback on the process, timeframe and approach are certainly welcome. I know that we need to provide more guidance on other serialization options and @GrabYourPitchforks is working on that.

@danmoseley
Copy link
Member

It's also worth noting that the proposal relates to .NET Core 5+ (really, 6+). .NET Framework applications are not affected and .NET Framework will be supported a very long time.

https://github.com/dotnet/designs/pull/141/files#diff-9324781e4d7a839a358bc384f65a8327R7

@frankabbruzzese
Copy link

@danmosemsft ,

I think there is a confusion between serialization for communication and for saving/restoring the state of an object graph.

The security problem is caused by BinaryFormatter supporting Polymorphism, which is a necessary requirement for saving/restoring the state of object graphs.

While it is right banning polymorphic serializers from communication protocols, since both parts must agree on the objects to exchange, by banning them in general, we de facto prevent saving/restoring an object graph that represents the state of an ongoing computation, which is unacceptable!

Applications, of saving/restoring the state of an on-going computation, include but are not limited to:

  1. Saving the state of a complex application (video-games, design applications, and so on).
  2. Virtualizing stateful microservices instances, by removing them from RAM when they are not used.

While I can accept the removal of the BinaryFormatter, I can't accept the general principle of banning polymorphic serializers. While reading, the long term plan for BinaryFormatter removal it appears that the whole infrastructure will be removed including attributes, interfaces,....to prevent anyone else from providing an alternative.

Is possible to ask, at least, that the attribute infrastructure is not removed? This way people that are using BinaryFormatter for saving/restore can provide a custom alternative WITHOUT losing tonnes of code decorated with those attributes?

@weltkante
Copy link
Contributor

Is possible to ask, at least, that the attribute infrastructure is not removed? This way people that are using BinaryFormatter for saving/restore can provide a custom alternative WITHOUT losing tonnes of code decorated with those attributes?

If its not removed but not supported either you have high risk that it erodes over time. Newly added classes won't be annotated correctly. Previously annotated classes will be changed in incompatible ways. If you keep the annotations you need to keep testing and supporting them for their purpose.

(Just stating this as a fact, not as an argument for or against removing things.)

@frankabbruzzese
Copy link

@weltkante,
Also possible to remove BinaryFormatter but still to keep support for the infrastructure, that is, for custom implementations of IFormatter.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Runtime
Projects
None yet
Development

No branches or pull requests