Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make BinaryFormatter faster #20569

Closed
Alois-xx opened this issue Mar 10, 2017 · 10 comments · Fixed by dotnet/corefx#17949
Closed

Make BinaryFormatter faster #20569

Alois-xx opened this issue Mar 10, 2017 · 10 comments · Fixed by dotnet/corefx#17949
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Milestone

Comments

@Alois-xx
Copy link
Contributor

When BinaryFormatter encounters a larger object list it gets quadratic deserialization times due to the linear search in

>	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.FindObjectHolder(long objectID) Line 68	C#
 	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.FindOrCreateObjectHolder(long objectID) Line 81	C#
 	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.RegisterFixup(System.Runtime.Serialization.FixupHolder fixup, long objectToBeFixed, long objectRequired) Line 888	C#
 	ConsoleApp2.dll!System.Runtime.Serialization.ObjectManager.RecordArrayElementFixup(long arrayToBeFixed, int[] indices, long objectRequired) Line 966	C#

Sample

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.IO;
using System.Runtime.Serialization.Formatters.Binary;

namespace ConsoleApp2
{
    [Serializable]
    class Book
    {
        public string Name;
        public string Id;
    }

    class Program
    {
        static void Main(string[] args)
        {
            var formatter = new BinaryFormatter();
            List<Book> books = new List<Book>();
            var mem = new MemoryStream();
            for(int i=0;i<500*1000;i++)
            {
                books.Add(new Book { Id = i.ToString() });
            }

            var sw = Stopwatch.StartNew();
            formatter.Serialize(mem, books);
            sw.Stop();
            Console.WriteLine($"Serialization time {sw.Elapsed.TotalSeconds:F2}s");
            mem.Position = 0;
            sw = Stopwatch.StartNew();
            List<Book> booksDeser = (List<Book>)formatter.Deserialize(mem);
            sw.Stop();
            Console.WriteLine($"Deserialization {sw.Elapsed.TotalSeconds:F2}s");
        }
    }
}

Serialization time 2.31s
Deserialization 21.16s

This caused some unexpected "slowdowns" in production code when "real" big objects (e.g. 20 - 100 MB) object graphs are deserialized. Deserialization times of 10 minutes are not uncommon due to this. It would be great if this ugly thing gets cleaned up in .NET Core.

@stephentoub
Copy link
Member

stephentoub commented Mar 11, 2017

A trivial change here that would help would simply be to increase the allowed size of the array used by the ObjectManager, e.g. here:
https://github.com/dotnet/corefx/blob/master/src/System.Runtime.Serialization.Formatters/src/System/Runtime/Serialization/ObjectManager.cs#L14
change:

private const int MaxArraySize = 0x1000;

to:

private const int MaxArraySize = 0x10000;

That should make a very measurable difference in the deserialization time for a large graph like the one in the repro, at the expense of using some more memory, but not a ton more, especially given that it'll be relatively proportional to the size of the graph in question. Subsequently a different data structure could be considered, e.g. a tree of some kind, an array of trees of some kind, etc., rather than an array of lists.

@morganbr
Copy link
Contributor

@Alois-xx , can you please tell us more about the way you're using BinaryFormatter? I'd like to better understand the parts of deserialization performance that are important to your scenario.

@Alois-xx
Copy link
Contributor Author

I have found in our regular .NET based product large deserialization times which did lead to strange architectural changes because the root cause was never fully understood. I have now checked .NET Core if BinaryFormatter has become better and in fact it did not. Hence my request to improve at least .NET Core and if possible backmerge it to the .NET Desktop BCL as well.

@gnilsson2
Copy link

Deserialization of an 114Mbyte Dictionary, takes about 10 minutes, and uses 1Gbyte memory!
When finished only 316MBytes memory is used, 175MB for the dictionary.
(Serialization in < 1 minute)

So there is a big potential for improvement here

Object Type    Count diff.  Size Diff.        Inclusive...
ObjectHolder	-730 231	-75 944 024	-184 019 992	3	312	488
LongList	-365 116	-29 209 280	-29 209 280	1	80	80
FixupHolderList	-365 116	-26 288 352	-40 893 112	1	72	72

It seems that it's all forward references that needs fixup.
"If the reference in the serialized stream is a forward reference, then the Formatter can register a fixup with the ObjectManager."

@Alois-xx
Copy link
Contributor Author

Will this change be merged to the Full .NET Framework? Will this be part of a minor update and what is the expected timeframe for it to arrive?

@stephentoub
Copy link
Member

The change is marked for consideration for being ported to the .NET Framework, but there are currently no plans for if/when that would happen.

@mludlum
Copy link

mludlum commented Sep 26, 2017

Is there a way I can integrate this change to ObjectManager into my application without replacing the complete framework I'm pointing to? There doesn't seem to be a way to substituted a type or it's methods without modifying the original type somehow. I could do this easily in Java just by bumping a jar up in the classpath.

@Alois-xx
Copy link
Contributor Author

The next .NET Framework update 4.7.3? will contain a fix for the regular .NET Framework. For .NET Core the fix is already in with .NET Core 2.0.

@jrocha
Copy link

jrocha commented Oct 25, 2017

Hope it come soon to .NET Framework, it is really painful to wait for large graph deserialization.

@jkotas
Copy link
Member

jkotas commented Oct 26, 2017

This fix is scheduled to be included in .NET Framework 4.7.2 update. It won't be enabled by default. It will be only enabled when Switch.System.Runtime.Serialization.UseNewMaxArraySize config switch is set.

Early access preview builds of .NET Framework 4.7.2 should start showing up on https://github.com/Microsoft/dotnet-framework-early-access/ soon. Please give it a try once the build with this fix is available.

For reference, the .NET Framework port is internal MS bug DevDiv 443438.

cc @AlexGhiondea

dotnet-bot referenced this issue in dotnet/corefx Mar 19, 2018
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
stephentoub referenced this issue in dotnet/corefx Mar 19, 2018
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
ericstj referenced this issue in ericstj/corefx Mar 28, 2018
Signed-off-by: dotnet-bot-corefx-mirror <dotnet-bot@microsoft.com>
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.0.0 milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 25, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Runtime enhancement Product code improvement that does NOT require public API changes/additions tenet-performance Performance related issue
Projects
None yet
8 participants