Skip to content

internal method ReadFullAsync(Stream data, int currentPartSize) has terrible performance #1312

@pbolduc

Description

@pbolduc

The method Task<ReadOnlyMemory<byte>> ReadFullAsync(Stream data, int currentPartSize) has a number of performance issues.

  1. Allocates too much memory: with large files, the part size will be 16MB. Depending on the Stream's implementation and data availability, this method can allocate a lot of memory due to creating buffers in a loop.
  2. Makes too many copies of the read data
  3. Copies read data byte-by-byte.

internal async Task<ReadOnlyMemory<byte>> ReadFullAsync(Stream data, int currentPartSize)
{
Memory<byte> result = new byte[currentPartSize];
var totalRead = 0;
while (totalRead < currentPartSize)
{
Memory<byte> curData = new byte[currentPartSize - totalRead];
var curRead = await data.ReadAsync(curData[..(currentPartSize - totalRead)]).ConfigureAwait(false);
if (curRead == 0) break;
for (var i = 0; i < curRead; i++)
curData.Slice(i, 1).CopyTo(result[(totalRead + i)..]);
totalRead += curRead;
}
if (totalRead == 0) return null;
if (totalRead == currentPartSize) return result;
Memory<byte> truncatedResult = new byte[totalRead];
for (var i = 0; i < totalRead; i++)
result.Slice(i, 1).CopyTo(truncatedResult[i..]);
return truncatedResult;
}

I have will be submitting a PR in the coming days. I created a benchmark comparing my version with the current one. I will provide the source to this benchmark as well. Here is the updated version I am using in my benchmark,

        private async Task<ReadOnlyMemory<byte>> ReadFullAsync2(Stream data, int currentPartSize)
        {
            Memory<byte> result = new byte[currentPartSize];
            var totalRead = 0;
            while (totalRead < currentPartSize)
            {
                var curData = result[totalRead..currentPartSize];
                var curRead = await data.ReadAsync(curData).ConfigureAwait(false);
                if (curRead == 0) break;
                totalRead += curRead;
            }

            if (totalRead == 0) return null;

            // Return only the valid portion without allocating a new buffer
            return result[..totalRead];
        }

In my benchmark, I am using a virtual random data stream

internal class VirtualStream : Stream
{
    private long position;
    private readonly long length;

    public VirtualStream(long length)
    {
        position = 0;
        this.length = length;
    }

    public override bool CanRead => true;
    public override bool CanSeek => false;
    public override bool CanWrite => false;
    public override long Length => length;
    public override long Position
    {
        get => position;
        set => position = (int)value;
    }
    public override void Flush() { }

    public override int Read(byte[] buffer, int offset, int count)
    {
        // Calculate how many bytes are left to read
        var remaining = length - position;
        if (remaining <= 0)
            return 0; // End of stream

        // Limit the read to a maximum of 4096 bytes per call
        var maxRead = Math.Min(4096, count);
        var toRead = (int)Math.Min(maxRead, remaining);
        Random.Shared.NextBytes(buffer.AsSpan(offset, toRead));
        position += toRead;

        return toRead;
    }

    public override void SetLength(long value)
    {
#pragma warning disable MA0025
        throw new NotSupportedException();
#pragma warning restore MA0025
    }

    public override void Write(byte[] buffer, int offset, int count)
    {
#pragma warning disable MA0025
        throw new NotSupportedException();
#pragma warning restore MA0025
    }

    public override long Seek(long offset, SeekOrigin origin)
    {
#pragma warning disable MA0025
        throw new NotImplementedException();
#pragma warning restore MA0025
    }
}

When running the benchmarks for varying part sizes, you can see in the worse case senario with a part size of 16MB, 333x times longer, allocates 1000x more data and GC's are through the roof.

Method PartSize Mean Error StdDev Median Ratio RatioSD Gen0 Gen1 Gen2 Allocated Alloc Ratio
ReadFullAsync_Benchmark 16777216 3,711,505.3 us 60,184.75 us 50,256.99 us 3,705,554.8 us 1.000 0.02 457000.0000 457000.0000 457000.0000 33581040.77 KB 1.000
ReadFullAsync2_Benchmark 16777216 9,830.6 us 188.57 us 185.20 us 9,776.0 us 0.003 0.00 - - - 16833.01 KB 0.001
ReadFullAsync_Benchmark 65536 700.4 us 13.95 us 37.71 us 685.6 us 1.00 0.07 - - - 611.15 KB 1.00
ReadFullAsync2_Benchmark 65536 103.2 us 6.56 us 19.13 us 106.0 us 0.15 0.03 - - - 66.76 KB 0.11
ReadFullAsync_Benchmark 1048576 19,956.5 us 615.71 us 1,776.47 us 19,487.5 us 1.01 0.12 29000.0000 29000.0000 29000.0000 132834.36 KB 1.000
ReadFullAsync2_Benchmark 1048576 847.5 us 29.89 us 88.13 us 839.3 us 0.04 0.01 - - - 1053.01 KB 0.008

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions