
Deserialize a byte array that was serialized by the native BinaryFormatter of .NET Framework 4.5.2 #13

Closed
sonnatager opened this issue May 17, 2024 · 5 comments
Labels
question Further information is requested

Comments

@sonnatager

I have the following situation:

I have an old .NET Framework 4.5.2 application that stored the following objects in a SQL database, serialized by the native BinaryFormatter:

[Serializable]
[DataContract]
public class MyClass
{
    [DataMember]
    private ConcurrentDictionary<DaylightFactorType, double[]> _values = new();
}

I have now upgraded the application to ASP.NET Core 8, and when I try to load and deserialize the data with the native BinaryFormatter, I get the following exception:

Type 'System.Collections.Concurrent.ConcurrentDictionary`2[[MyClass, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null],[System.Double[], System.Private.CoreLib, Version=8.0.0.0, Culture=neutral, PublicKeyToken=7cec85d7bea7798e]]' in Assembly 'System.Collections.Concurrent, Version=8.0.0.0, Culture=neutral, PublicKeyToken=b03f5f7f11d50a3a' is not marked as serializable.

So I thought I'd use the KGySoft.CoreLibraries to deserialize the data, because it should be possible to deserialize types that are no longer marked as [Serializable].

But the problem now is that I always get null back from the method BinarySerializer.DeserializeFromStream<T>(stream).

How can I solve this? The DB is read-only for me, and repopulating it would take a lot of effort.

@koszeggy koszeggy added the question Further information is requested label May 18, 2024
@koszeggy
Owner

koszeggy commented May 18, 2024

> So I thought I'd use the KGySoft.CoreLibraries to deserialize the data, because it should be possible to deserialize types that are no longer marked as [Serializable].

> But the problem now is that I always get null back from the method BinarySerializer.DeserializeFromStream<T>(stream).

Please note that the binary stream format of my serializer is not compatible with BinaryFormatter. See also this SO question for a very similar scenario.

Since you are now on .NET 8, you may have already noticed that making BinaryFormatter work currently requires explicit configuration (EnableUnsafeBinaryFormatterSerialization). Please note that BinaryFormatter will be completely removed in future .NET versions, though a read-only object graph reader is being implemented for .NET 9. With that in mind, you have the following options:
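A quick way to check whether that configuration actually reached the running process is to read the corresponding runtime switch. This is a minimal sketch, assuming (to my knowledge) that the `<EnableUnsafeBinaryFormatterSerialization>true</EnableUnsafeBinaryFormatterSerialization>` project property ends up in runtimeconfig.json as the `System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization` AppContext switch. A switch that reads true does not make BinaryFormatter available on versions where it has been removed.

```csharp
using System;

// Runtime switch that (to my knowledge) backs the EnableUnsafeBinaryFormatterSerialization project property.
const string BinaryFormatterSwitch =
    "System.Runtime.Serialization.EnableUnsafeBinaryFormatterSerialization";

// True only if the switch is present for this process and set to true.
bool IsBinaryFormatterSwitchOn() =>
    AppContext.TryGetSwitch(BinaryFormatterSwitch, out bool enabled) && enabled;

Console.WriteLine($"BinaryFormatter switch on: {IsBinaryFormatterSwitchOn()}");
```

Logging this once at startup makes it obvious whether a failure comes from the configuration or from the payload itself.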

1. If you still MUST use the original BinaryFormatter stream:

I assume here that the DB is read-only, so it must keep the .NET Framework 4.x payloads while the front end switches to .NET 8. While I understand that this can happen during a project's life, I must point out that this is a very suboptimal situation, as you will not be able to switch to .NET 9+. In fact, this is exactly the scenario for which BinaryFormatter is being obsoleted: deserializing data from a database is not secure at all, because the data can be manipulated, which leaves your system vulnerable to various security attacks.

In this case you cannot use my serializer to deserialize the original data. You still have to use BinaryFormatter, with a custom surrogate selector that lets you process the problematic subgraph manually. This is exactly why I created the CustomSerializerSurrogateSelector class: it lets you supervise every step of [de]serialization. But please note that I have also marked this class as obsolete in .NET 8 and above, because it merely helps keep insecure solutions alive by providing hacky workarounds.

In trivial cases (i.e. when the only change is that the [Serializable] attribute has been removed), you don't need to do anything other than assign a CustomSerializerSurrogateSelector instance to the formatter's SurrogateSelector property. But I'm afraid that deserializing a ConcurrentDictionary with BinaryFormatter in .NET 8 is more complicated: in .NET Framework it used to have an m_serializationArray field, which no longer exists.

So you should use the surrogate selector like this (warning: untested, adjust if needed):

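using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.Serialization.Formatters.Binary;
using KGySoft.Serialization.Binary;

// using the legacy BinaryFormatter with a custom deserialization surrogate selector: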
var surrogateSelector = new CustomSerializerSurrogateSelector();
surrogateSelector.Deserializing += SurrogateSelector_Deserializing;
var formatter = new BinaryFormatter { SurrogateSelector = surrogateSelector };
return (MyClass)formatter.Deserialize(databaseStreamUsingNetFramework4xPayload);

// [...]

// processing the legacy .NET Framework 4.x ConcurrentDictionary payload in .NET 8:
private void SurrogateSelector_Deserializing(object sender, DeserializingEventArgs e)
{
    // Customizing ConcurrentDictionary<DaylightFactorType, double[]> deserialization only:
    if (e.Object is not ConcurrentDictionary<DaylightFactorType, double[]> cdict)
        return;

    // cdict is an uninitialized object here that we need to initialize from e.SerializationInfo.
    // Not even the default constructor was executed, so we do that by reflection, and then
    // populate the dictionary as a normal instance.
    // NOTE: If your dictionary uses a non-default comparer, then use another constructor
    // with the m_comparer entry from e.SerializationInfo!
    MethodBase ctor = typeof(ConcurrentDictionary<DaylightFactorType, double[]>).GetConstructor(Type.EmptyTypes);

    // Late-execution of the default constructor as if it was just a method.
    ctor.Invoke(e.Object, Array.Empty<object>());

    // populating the dictionary from m_serializationArray
    var entries = e.SerializationInfo.GetValueOrDefault<KeyValuePair<DaylightFactorType, double[]>[]>("m_serializationArray")
        ?? throw new InvalidOperationException("m_serializationArray not found. Not a .NET Framework 4.x payload?");

    foreach (var entry in entries)
        cdict[entry.Key] = entry.Value;

    // Notifying the selector that we initialized the object so it should not process e.SerializationInfo again
    e.Handled = true;
}
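The comments in the handler above rely on two things: `e.Object` arrives allocated but with no constructor executed, and a constructor can be invoked on an already existing instance through `MethodBase.Invoke(object, object[])`. The pure-BCL sketch below reproduces just that step outside of any formatter. It is an illustration under assumptions: `RuntimeHelpers.GetUninitializedObject` stands in for the object the surrogate selector provides, `string` keys stand in for `DaylightFactorType`, and the `entries` array stands in for the `m_serializationArray` value.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Reflection;
using System.Runtime.CompilerServices;

// Stand-in for the m_serializationArray value (string keys instead of DaylightFactorType).
var entries = new[]
{
    new KeyValuePair<string, double[]>("a", new[] { 1.0, 2.0 }),
    new KeyValuePair<string, double[]>("b", new[] { 3.0 }),
};

ConcurrentDictionary<string, double[]> Restore(KeyValuePair<string, double[]>[] serializationArray)
{
    // Allocated but not constructed: all internal fields are still default,
    // so the instance must not be used yet (similar to e.Object in the handler).
    var dict = (ConcurrentDictionary<string, double[]>)RuntimeHelpers.GetUninitializedObject(
        typeof(ConcurrentDictionary<string, double[]>));

    // Late execution of the parameterless constructor on the existing instance.
    MethodBase ctor = typeof(ConcurrentDictionary<string, double[]>).GetConstructor(Type.EmptyTypes)!;
    ctor.Invoke(dict, Array.Empty<object>());

    // From here on it is a normal dictionary that can be populated.
    foreach (var entry in serializationArray)
        dict[entry.Key] = entry.Value;
    return dict;
}

var restored = Restore(entries);
Console.WriteLine(restored.Count); // 2
```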

2. The recommended solution:

You really should migrate the database and abandon the old BinaryFormatter payloads, not just because the solution above is complicated, but also because it is a security threat. If the back end is not under your control, you need to consult the people responsible for it and/or the architects.

As my BinarySerializationFormatter is fully compatible with the IFormatter infrastructure, the most convenient solution would be to migrate to it (alternatively, you can migrate to some XML or JSON serialization, but that might require a big refactoring if your serialized objects don't expose everything through public properties). As I mentioned, the binary formats are not compatible, so a one-time migration is necessary. In the end, though, the database will be much smaller, as demonstrated in this online example.

To migrate your database, the following steps are necessary:

  1. Deserialize all entries from the database. Either use the workaround above in .NET 8, or do this step in .NET Framework 4.5.2 with the original client.
  2. Re-serialize the entries with BinarySerializationFormatter. You can do this in either .NET Framework or .NET 8; the binary stream will be the same for ConcurrentDictionary, as my serializer supports it natively. Please note, though, that your custom DaylightFactorType will be saved by its assembly identity, so to keep things simple, don't use a different assembly name/version on the different frameworks.
  3. Now you can permanently switch to BinarySerializationFormatter in .NET 8.
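The three steps above could be driven by a small one-off tool. This is a minimal sketch, assuming the rows fit in memory as an id-to-payload map. The two delegates are hypothetical placeholders: `readLegacy` would wrap BinaryFormatter with the surrogate selector (or run on .NET Framework 4.5.2), and `writeNew` would wrap BinarySerializationFormatter. Writing the results back to the database is deliberately left out.

```csharp
using System;
using System.Collections.Generic;

// One-off migration sketch. Both delegates are hypothetical placeholders:
//   readLegacy: legacy BinaryFormatter payload -> object graph (step 1)
//   writeNew:   object graph -> BinarySerializationFormatter payload (step 2)
Dictionary<int, byte[]> Migrate(
    IReadOnlyDictionary<int, byte[]> legacyRows,
    Func<byte[], object> readLegacy,
    Func<object, byte[]> writeNew)
{
    // Step 1: deserialize every entry first, so a payload that cannot be read
    // stops the run before any new payload is produced.
    var graphs = new Dictionary<int, object>();
    foreach (var (id, payload) in legacyRows)
        graphs[id] = readLegacy(payload);

    // Step 2: re-serialize each object graph in the new format.
    var migrated = new Dictionary<int, byte[]>();
    foreach (var (id, graph) in graphs)
        migrated[id] = writeNew(graph);

    // Step 3 is outside this sketch: store the new payloads and read only them from now on.
    return migrated;
}

// Usage with dummy delegates (UTF-8 text standing in for real payloads):
var rows = new Dictionary<int, byte[]> { [1] = System.Text.Encoding.UTF8.GetBytes("a") };
var result = Migrate(rows,
    payload => System.Text.Encoding.UTF8.GetString(payload),
    graph => System.Text.Encoding.UTF8.GetBytes(((string)graph).ToUpperInvariant()));
Console.WriteLine(System.Text.Encoding.UTF8.GetString(result[1])); // A
```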

@sonnatager
Author

Hello,
Thank you for your quick response and suggestions.
In the short term I am not able to migrate the database, but in the long run I will definitely get rid of the native BinaryFormatter.
I will try your suggestion today and report back here whether it worked.

@sonnatager
Author

Hi, it works for the concurrent dictionary described above.

I have another concurrent dictionary, with an enum key (TypeB) and an object with two properties as the value (TypeC, which implements ITypeC), and that one does not work:

[Serializable]
[DataContract]
public class TypeA : ITypeA
{
    [DataMember]
    private ConcurrentDictionary<TypeB, ITypeC> _values = new();
}

TypeC:

[Serializable]
[DataContract]
public class TypeC : ITypeC
{
    public double Property1 { get; set; }
    public double[] Property2 { get; set; }
}

TypeB is an enum.

I changed your code above to the following:

// processing the legacy .NET Framework 4.x ConcurrentDictionary payload in .NET 8:
private void SurrogateSelector_Deserializing(object sender, DeserializingEventArgs e)
{
    // Customizing ConcurrentDictionary<TypeB, ITypeC> deserialization only:
    if (e.Object is not ConcurrentDictionary<TypeB, ITypeC> cdict)
        return;

    // cdict is an uninitialized object here that we need to initialize from e.SerializationInfo.
    // Not even the default constructor was executed, so we do that by reflection, and then
    // populate the dictionary as a normal instance.
    // NOTE: If your dictionary uses a non-default comparer, then use another constructor
    // with the m_comparer entry from e.SerializationInfo!
    MethodBase ctor = typeof(ConcurrentDictionary<TypeB, ITypeC>).GetConstructor(Type.EmptyTypes);

    // Late-execution of the default constructor as if it was just a method.
    ctor.Invoke(e.Object, Array.Empty<object>());

    // populating the dictionary from m_serializationArray
    var entries = e.SerializationInfo.GetValueOrDefault<KeyValuePair<TypeB, ITypeC>[]>("m_serializationArray")
        ?? throw new InvalidOperationException("m_serializationArray not found. Not a .NET Framework 4.x payload?");

    foreach (var entry in entries)
        cdict[entry.Key] = entry.Value;

    // Notifying the selector that we initialized the object so it should not process e.SerializationInfo again
    e.Handled = true;
}

At the foreach line, the entries variable contains an array of key-value pairs, but every pair has the first value of the enum as its key and null as its value.
There is no exception.

Best regards and thank you for your help 😄

@koszeggy
Owner

> At the line with the foreach i get an array of key value pairs in the entries property, which all have as key the first enum value of the enum and a null value in it.

Hmm... It's strange that ConcurrentDictionary works in one case but not in the other. 🤔 Considering that with option 1 the entries of the SerializationInfo are still restored by BinaryFormatter itself, this must be a breaking change or bug in BinaryFormatter between the .NET Framework and .NET [Core] versions. If that's really the case, the bad news is that there is not much you can do. Normally you should report the bug in the dotnet/runtime repo, but as BinaryFormatter will be completely removed in .NET 9, I doubt they would fix anything.

Can you confirm that if you execute that code in .NET Framework (using the surrogate selector, which is not necessary on the Framework but still helps debug the inner steps), m_serializationArray is restored correctly? Or that you get a different result in .NET Framework without the serialization surrogate?

Please also note that if you handle multiple types/scenarios in the Deserializing event, take extra care about when you set e.Handled: set it only when you have actually initialized e.Object manually.
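One way to keep that rule explicit when a single handler serves several types is to route the type checks through a helper that reports whether it did the initialization, and assign its result to `e.Handled`. A minimal sketch: `TryInitializeManually` is a hypothetical name, and the two `ConcurrentDictionary` types are stand-ins for whatever types you actually handle.

```csharp
using System;
using System.Collections.Concurrent;

// Hypothetical helper: returns true only if it fully initialized obj by hand.
// In a real Deserializing handler you would call it with e.Object (reading the
// data from e.SerializationInfo) and then write: e.Handled = TryInitializeManually(e.Object);
bool TryInitializeManually(object obj)
{
    switch (obj)
    {
        case ConcurrentDictionary<string, double[]>:
            // ...construct and populate this dictionary as in the handler above...
            return true;
        case ConcurrentDictionary<int, string>:
            // ...same for another dictionary type...
            return true;
        default:
            // Not a type we handle: leave e.Handled false so the selector restores it normally.
            return false;
    }
}

Console.WriteLine(TryInitializeManually(new ConcurrentDictionary<string, double[]>())); // True
Console.WriteLine(TryInitializeManually("some other object")); // False
```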

@sonnatager
Author

This did not work, so I decided to recreate the database, but this time I used your library to serialize the data.

Thank you for your help.
