BinaryWriter perf and memory improvements #47316

Conversation
@GrabYourPitchforks thank you for another amazing perf improvement!

Could you please extend the benchmarks with Write(double), Write(float), and Write(short) and contribute them to the performance repo? If we merge the benchmarks before this change, the perf infra will show improvements (or regressions) for x64, x86, and ARM64.
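A rough sketch of what such benchmarks could look like; the class name, setup, and literal values below are assumptions and may not match what ends up in dotnet/performance:

using System.IO;
using BenchmarkDotNet.Attributes;

// Hypothetical shape of the requested benchmarks for the primitive overloads.
[MemoryDiagnoser]
public class BinaryWriterPrimitiveBenchmarks
{
    private BinaryWriter _writer;

    [GlobalSetup]
    public void Setup() => _writer = new BinaryWriter(new MemoryStream());

    // Rewinding inside each benchmark keeps the MemoryStream from growing;
    // the cost of the seek is identical across all three measurements.
    [Benchmark]
    public void WriteDouble() { _writer.BaseStream.Position = 0; _writer.Write(3.14159d); }

    [Benchmark]
    public void WriteSingle() { _writer.BaseStream.Position = 0; _writer.Write(3.14159f); }

    [Benchmark]
    public void WriteInt16() { _writer.BaseStream.Position = 0; _writer.Write((short)12345); }
}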
@@ -15,32 +15,30 @@ namespace System.IO
//
public class BinaryWriter : IDisposable, IAsyncDisposable
{
    private const int MaxArrayPoolRentalSize = 1024 * 1024; // ArrayPool<T>.Shared allocates beyond this point
It would have been great if ArrayPool exposed this value as an internal const.
public void Ctor_Utf8EncodingDerivedTypeWithWrongCodePage_DoesNotUseFastUtf8()
{
    Mock<UTF8Encoding> mockEncoding = new Mock<UTF8Encoding>();
    mockEncoding.Setup(o => o.CodePage).Returns(65000 /* UTF-7 code page */);
I do not think it is worth it to pick up this heavy dependency here just to save something like 3 lines.
For cases where it is really worth it, we just need a way to conditionally disable tests that use techniques incompatible with the runtime mode (single file, trimming, no JIT, no reflection emit, no private reflection, etc.).
We use Moq in a few other test projects in this repo (see search results). I can remove the dependency for this project, but what does that mean for the general test framework guidance?
See Steve's comment at #47316 (comment) and my response there for a little more context on why I'm using (and mocking) the CodePage property in the first place. We could tweak that logic and render the whole thing moot.
I think using a heavy test framework for testing high-level libraries like Microsoft.Extensions.* is fine. I do not think it is good practice to use these heavy test frameworks for testing the core platform (i.e. stuff in CoreLib).

I agree that we should have this test (as long as the implementation stays what it is). Do you agree that the use of Moq saves you something like 3 lines of code in this case?
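For comparison, a Moq-free version could use a tiny derived type; Encoding.CodePage is virtual, so a subclass can simply report the UTF-7 code page. The type name below is hypothetical and not code from this PR:

using System.Text;

// Hypothetical Moq-free alternative: a UTF8Encoding-derived type that reports
// the UTF-7 code page, exercising the same "wrong code page" branch.
internal sealed class WrongCodePageUtf8Encoding : UTF8Encoding
{
    public override int CodePage => 65000; // UTF-7 code page
}

// The test would then construct the writer with:
//     new BinaryWriter(new MemoryStream(), new WrongCodePageUtf8Encoding())
// and assert that the fast UTF-8 path is not used.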
        Assert.Equal(3_000_000_000, outStream.Position);
    }
}
thank you for writing all the tests! and especially covering this particular edge case! 👍
// We prefer GetMaxByteCount because it's a constant-time operation.

int maxByteCount = _encoding.GetMaxByteCount(chars.Length);
if (maxByteCount <= MaxArrayPoolRentalSize)
Would it be possible (and worth it) to add a stackalloc code path for small char arrays, similar to what you have done for small strings in Write(string value)?
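A rough sketch of what such a fast path could look like inside BinaryWriter, assuming a hypothetical MaxStackAllocBytes threshold (this is not code from the PR, and argument validation is elided):

public virtual void Write(char[] chars)
{
    const int MaxStackAllocBytes = 256; // assumed threshold; would need benchmarking

    int maxByteCount = _encoding.GetMaxByteCount(chars.Length);
    if (maxByteCount <= MaxStackAllocBytes)
    {
        // Small input: encode onto the stack and write directly, skipping both
        // the per-call allocation and the ArrayPool rent/return round trip.
        Span<byte> buffer = stackalloc byte[MaxStackAllocBytes];
        int actualByteCount = _encoding.GetBytes(chars, buffer);
        OutStream.Write(buffer.Slice(0, actualByteCount));
        return;
    }

    // Larger inputs would continue down the existing ArrayPool rental path shown above.
}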
@@ -15,32 +15,30 @@ namespace System.IO
//
public class BinaryWriter : IDisposable, IAsyncDisposable
{
    private const int MaxArrayPoolRentalSize = 1024 * 1024; // ArrayPool<T>.Shared allocates beyond this point
This should not really depend on ArrayPool implementation details. It would be better to set this to the size where we start to see diminishing returns. I would expect it to be something like 64kB.
Also, very large buffers tend to not work that well since they do not fit into processor cache.
Would it make sense to refactor the Stream copy buffer size const out into an internal field and then reference it from here? That would give our code a single place to look when it needs to figure out a good default buffer size.
const int DefaultCopyBufferSize = 81920;
I am not sure. It is not clear whether DefaultCopyBufferSize is actually a good default buffer size.
The original justification for 81920 was that it is right under the default LOH threshold and good for the GC. This argument does not hold with ArrayPool, which was not used originally: ArrayPool will round the request up to the next power of 2, so 81920 turns into 128kB, which is right above the LOH threshold...
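A quick illustration of that rounding, assuming the default shared pool's power-of-two bucketing:

using System;
using System.Buffers;

// Renting 81,920 bytes from the default shared pool returns the next
// power-of-two bucket: a 131,072-byte array, which lands on the LOH
// (default threshold is about 85,000 bytes).
byte[] rented = ArrayPool<byte>.Shared.Rent(81920);
Console.WriteLine(rented.Length); // 131072
ArrayPool<byte>.Shared.Return(rented);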
FWIW, some time last year I tried decreasing the size so it would be back under the LOH threshold even after the pool rounded up, but there were quite measurable regressions for certain operations on microbenchmarks due to the much smaller buffer size, so I left it as is until we had a pressing scenario highlighting it was worth a change.
src/libraries/System.Private.CoreLib/src/System/IO/BinaryWriter.cs (outdated review thread, resolved)
{
#if !NETCOREAPP
    RuntimeHelpers.PrepareConstrainedRegions();
#endif
This is just a test file... is this really necessary?
No. But everything is non-shipping code until it gets copied & pasted into a shipping product. :)
src/libraries/System.Private.CoreLib/src/System/IO/BinaryWriter.cs (outdated review thread, resolved)
@adamsitnik I sent a PR at dotnet/performance#1639 with these tests, added the tests you suggested, and fleshed out a few others.
@adamsitnik @jkotas I didn't see much of a difference between using a 32kB vs. a 64kB max rental size. Perf results below.

The strlen = 8192, 64kB buffer test has a large stddev, so I'm not worrying too much about it. The strlen = 16384 test is very different between 32kB and 64kB because with 64kB the data fits into a single buffer, while with 32kB it does not, so we need to go down the slow "two-pass" path. I think this means we can stick with 64kB.
src/libraries/System.Private.CoreLib/src/System/IO/BinaryWriter.cs (outdated review thread, resolved)
Android test runner seems to be crashing on the test that attempts to allocate 6.5GB of memory. I have a log excerpt:

01-29 06:08:59.833 7947 8489 I DOTNET : Test collection for System.IO.Tests.BinaryWriter_EncodingTests
01-29 06:08:59.841 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.Ctor_NewUtf8Encoding_UsesFastUtf8(emitIdentifier: False, throwOnInvalidBytes: False)
01-29 06:08:59.842 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.Ctor_NewUtf8Encoding_UsesFastUtf8(emitIdentifier: True, throwOnInvalidBytes: True)
01-29 06:08:59.842 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.Ctor_NewUtf8Encoding_UsesFastUtf8(emitIdentifier: True, throwOnInvalidBytes: False)
01-29 06:08:59.842 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.Ctor_NewUtf8Encoding_UsesFastUtf8(emitIdentifier: False, throwOnInvalidBytes: True)
01-29 06:09:00.841 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.WriteChars_FastUtf8(stringLengthInChars: 262144)
01-29 06:09:00.965 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.WriteChars_FastUtf8(stringLengthInChars: 32768)
01-29 06:09:00.997 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.WriteChars_FastUtf8(stringLengthInChars: 8192)
01-29 06:09:01.000 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.WriteSingleChar_FastUtf8(ch: 'é')
01-29 06:09:01.001 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.WriteSingleChar_FastUtf8(ch: 'x')
01-29 06:09:01.002 7947 8489 I DOTNET : [PASS] System.IO.Tests.BinaryWriter_EncodingTests.WriteSingleChar_FastUtf8(ch: 'ℰ')
01-29 06:09:03.457 857 857 E lowmemorykiller: Kill 'com.google.android.ims' (5004), uid 10147, oom_adj 999 to free 37460kB
01-29 06:09:03.462 857 857 I lowmemorykiller: Reclaimed 37460kB, cache(318216kB) and free(46940kB)-reserved(45844kB) below min(322560kB) for oom_adj 950
01-29 06:09:03.474 1384 1721 D ConnectivityService: ConnectivityService NetworkRequestInfo binderDied(NetworkRequest [ TRACK_DEFAULT id=35, [ Capabilities: INTERNET&NOT_RESTRICTED&TRUSTED Uid: 10147] ], android.os.BinderProxy@6fbe136)
01-29 06:09:03.475 857 857 E lowmemorykiller: Kill 'com.qualcomm.telephony' (6855), uid 10087, oom_adj 999 to free 25108kB
01-29 06:09:03.475 857 857 I lowmemorykiller: Reclaimed 25108kB, cache(286964kB) and free(43812kB)-reserved(45844kB) below min(322560kB) for oom_adj 950
01-29 06:09:03.476 800 800 I Zygote : Process 5004 exited due to signal 9 (Killed)

Is there a recommendation for how I can work around this? Best I can think of is to skip the test on Android, but that doesn't seem like the right solution.
Why not skip that test on Android? Is it likely there will be an Android-specific bug in that one jumbo-allocating test?
@danmosemsft I ended up taking your advice. It just makes me feel dirty to hard-code a platform block rather than query the environment about whether something will succeed or fail.
@GrabYourPitchforks I had to do something similar in 5aef85a because Ubuntu 18.04 specifically was more aggressive with the OOM killer. I felt OK about it because the chances of an OS-specific Regex bug are very low, and lower still in these specific tests.
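For reference, one common shape for such a platform block in this repo's test projects looks roughly like the sketch below; the exact attribute, message, and test name here are assumptions rather than what this PR actually uses:

using Xunit;

public partial class BinaryWriterTests
{
    // Hypothetical: skip the jumbo-allocation test on Android, where the
    // low-memory killer terminates the test runner before it can finish.
    [Fact]
    [SkipOnPlatform(TestPlatforms.Android, "Allocating several GB trips the Android low-memory killer")]
    public void Write_VeryLargePayload_PositionIsCorrect()
    {
        // ... write past the 2 GB mark and assert the final outStream.Position ...
    }
}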
@GrabYourPitchforks The System.IO.Tests.BinaryWriterExtendedTests.WriteAsciiCharArray(StringLengthInChars: 32) benchmark appears to have regressed.
@adamsitnik I guess that's not too surprising for small inputs. The old code allocated small arrays every time, and the new code uses the array pool. There's certainly some overhead from fetching and returning pooled arrays. That said, I don't know a good non-breaking way to resolve this without reintroducing the intermediate allocations.

So while this might be a regression for this scenario, I think we can say that the regression is small (~10 - 20 ns fixed overhead) and the scenario is rare, so we may want to just swallow it.
This addresses some low-hanging fruit in the BinaryWriter class, reducing overall memory footprint and wall clock time for common operations. It also removes use of the unsafe keyword where possible.

I spoke offline with @adamsitnik about the consequences of changing a bunch of Stream.Write(byte[], int, int) call sites to Stream.Write(ROS<byte>) instead. Technically this could result in worse performance if the wrapped stream doesn't override the ROS<byte>-based overloads, since the default implementations of those overloads will rent from the array pool, copy, and forward to the array-based overloads. But honestly, it's 2021: most of the built-in stream types override these methods correctly, and we're already discussing ways to flag with warnings user-defined types which don't override these methods. I don't think we should handicap the common case of using fully-compliant built-in stream types just on the off-chance somebody might have used a custom type.

Perf results:
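To make the overload concern above concrete, here is a rough sketch (a hypothetical type, not code from this PR) of a user-defined stream that only overrides the array-based Write. Callers of Write(ReadOnlySpan<byte>) on such a type fall back to Stream's default implementation, which rents a pooled array, copies, and forwards to the array-based overload:

using System;
using System.IO;

// Hypothetical wrapper: only the array-based Write is overridden, so the
// span-based Write(ReadOnlySpan<byte>) uses Stream's default rent/copy/forward
// implementation before reaching the method below.
public sealed class ArrayOnlyWriteStream : Stream
{
    private readonly Stream _inner;

    public ArrayOnlyWriteStream(Stream inner) => _inner = inner;

    public override void Write(byte[] buffer, int offset, int count)
        => _inner.Write(buffer, offset, count);

    public override bool CanRead => false;
    public override bool CanSeek => false;
    public override bool CanWrite => true;
    public override long Length => _inner.Length;
    public override long Position
    {
        get => _inner.Position;
        set => throw new NotSupportedException();
    }
    public override void Flush() => _inner.Flush();
    public override int Read(byte[] buffer, int offset, int count) => throw new NotSupportedException();
    public override long Seek(long offset, SeekOrigin origin) => throw new NotSupportedException();
    public override void SetLength(long value) => throw new NotSupportedException();
}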
The WriteChars tests for small values require further discussion. The original implementation allocates a new array on each invocation, while the new implementation uses the array pool. The indirection through the array pool adds a few nanoseconds of fixed overhead, which causes the ratio difference between the old and new code to be exaggerated. I believe the new code is more appropriate for the common case since it reduces the overall memory footprint of the application, even with this overhead.

There is also some overhead due to the delegate invocation in the workhorse routine. When the delegate is first created, it points to a stub routine rather than directly to the target method, adding a few extra jumps. This is a long-standing behavioral nit in delegates, and if it's solved all-up in the runtime then we'll just get the benefits here for free.
Benchmark code below.