Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data corruption in BinaryReader.ReadChars #30637

Closed
carlossanlop opened this issue Aug 20, 2019 · 9 comments · Fixed by dotnet/coreclr#26324
Closed

Data corruption in BinaryReader.ReadChars #30637

carlossanlop opened this issue Aug 20, 2019 · 9 comments · Fixed by dotnet/coreclr#26324
Assignees
Labels
area-System.IO tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Milestone

Comments

@carlossanlop
Copy link
Member

This issue was first reported internally as a bug in .NET Framework, but it is also reproducible in .NET Core.

UTF-16

When using a 2-byte encoding (i.e. UTF-16), and _stream.Read() returns an uneven number of bytes (e.g. a network stream), then in the next iteration of the loop, numBytes can be off by one, and the next read can go past the end of the encoded data in the stream. This can either cause an unexpected end of stream, and/or corruption of any subsequent data read from it.

The repro below is supposed to print all "True"s, as it does when a MemoryStream is used. However, when using a mock NetStream that simulates short reads, data corruption occurs.

Update: It seems that this issue has always been there, so it's not a new regression.

class Program
{
    class NetStream : MemoryStream
    {
        public override int Read(byte[] buffer, int offset, int count)
        {
            return base.Read(buffer, offset, count > 10 ? count - 3 : count);
        }

        public override int Read(Span<byte> destination)
        {
            return base.Read(destination.Length > 10 ? destination.Slice(0, destination.Length - 3) : destination);
        }
    }

    static void Main(string[] args)
    {
        char[] data1 = "hello world".ToCharArray();
        uint data2 = 0xABCDEF01;
        uint data3 = 0;

        using (Stream stream = new NetStream())
        {
            using (BinaryWriter writer = new BinaryWriter(stream, Encoding.Unicode, leaveOpen: true))
            {
                writer.Write(data1);
                writer.Write(data2);
                writer.Write(data3);
            }

            Console.WriteLine(stream.Length == data1.Length * sizeof(char) + 2 * sizeof(uint));

            stream.Seek(0, SeekOrigin.Begin);
            using (BinaryReader reader = new BinaryReader(stream, encoding: Encoding.Unicode, leaveOpen: true))
            {
                char[] data1b = reader.ReadChars(data1.Length);
                Console.WriteLine(data1.AsSpan().SequenceEqual(data1b));
                Console.WriteLine(stream.Position == data1.Length * sizeof(char));

                uint data2b = reader.ReadUInt32();
                Console.WriteLine(data2 == data2b);
                Console.WriteLine(stream.Position == data1.Length * sizeof(char) + sizeof(uint));
            }
        }
    }
}

UTF-8

The code below passes on .NET Framework and .NET Core 2.2, but fails on 3.0, so it's a new regression:

class Program
{
    // version for UTF-8: over-reads if 2 bytes, 3 chars left
    // Fails on .NET Core 3.0, passes on 2.2.
    static void Main(string[] args)
    {
        char[] data1 = "hello world 😃x".ToCharArray(); // 14 code points, 15 chars in UTF-16, 17 bytes in UTF-8
        uint data2 = 0xABCDEF01;
        uint data3 = 0;

        using (Stream stream = new MemoryStream())
        {
            using (BinaryWriter writer = new BinaryWriter(stream, Encoding.UTF8, leaveOpen: true))
            {
                writer.Write(data1);
                writer.Write(data2);
                writer.Write(data3);
            }

            Console.WriteLine(stream.Length == (data1.Length - 1) + 3 + 2 * sizeof(uint));

            stream.Seek(0, SeekOrigin.Begin);
            using (BinaryReader reader = new BinaryReader(stream, encoding: Encoding.UTF8, leaveOpen: true))
            {
                char[] data1b = reader.ReadChars(data1.Length);
                Console.WriteLine(data1.AsSpan().SequenceEqual(data1b));
                Console.WriteLine(stream.Position == (data1.Length - 1) + 3);

                uint data2b = reader.ReadUInt32();
                Console.WriteLine(data2 == data2b);
                Console.WriteLine(stream.Position == (data1.Length - 1) + 3 + sizeof(uint));
            }
        }
    }
}

Remarks

  • Both versions of BinaryReader fail the UTF-16 repro.
  • The UTF-8 case does not need a short read (i.e. network stream), it fails deterministically on any kind of underlying stream.
  • It seems that the reason why the UTF-8 code fails in 3.0 but passes in 2.2, is because the 2.2 runtime uses a version of BinaryReader that has a special case for handling t`his edge case for single-byte encodings:

The special case in 2.2: dotnet/coreclr/blob/ce1d090d33b400a25620c0145046471495067cc7/src/mscorlib/src/System/IO/BinaryReader.cs#L377-L386

...
numBytes = charsRemaining;

// special case for DecoderNLS subclasses when there is a hanging byte from the previous loop
DecoderNLS decoder = _decoder as DecoderNLS;
if (decoder != null && decoder.HasState && numBytes > 1)
{
    numBytes -= 1;
}

if (_2BytesPerChar)
...

No special case in 3.0 and 5.0: dotnet/coreclr/blob/9642de76d4f5e563150a90f5923b304d87d7a8b1/src/System.Private.CoreLib/shared/System/IO/BinaryReader.cs#L387-L389

...
numBytes = charsRemaining;

if (_2BytesPerChar)
...
@carlossanlop
Copy link
Member Author

Since this issue exists since .NET Framework, I will give it the 5.0 milestone.

@stephentoub
Copy link
Member

stephentoub commented Aug 21, 2019

Since this issue exists since .NET Framework, I will give it the 5.0 milestone.

@carlossanlop, you wrote above for the UTF8 issue "The code below passes on .NET Core 2.2 and fails on 3.0 when using UTF-8"... that issue passes or fails on .NET Framework? Regardless, it sounds like a regression from 2.2 that should be addressed in 3.0, no?

@carlossanlop
Copy link
Member Author

It fails on .NET Framework, which is why it was originally opened as an internal bug. Then we tested it on Core, and since it fails, I opened a bug here too.

@stephentoub
Copy link
Member

It fails on .NET Framework

I just tried your UTF8 example from above... it outputs true/true/true/true on both .NET Framework 4.8 and .NET Core 2.2, and fails with an exception on .NET Core 3.0.

@carlossanlop
Copy link
Member Author

Ah you're correct. You quoted the text I wrote above the second case.
The first case is the one that fails for all versions.
I'll update the second case example to mention it passes in Framework.

@carlossanlop
Copy link
Member Author

This issue can be split into two:

  • The UTF16 problem has always been there.
  • The UTF8 was fixed in Framework (I found the changeset that fixed it internally back in 2010), but it got removed for 3.0.

@nagya
Copy link

nagya commented Aug 22, 2019

I believe there is only one root cause, with two different symptoms. The code attempts to establish an upper limit for the number of bytes remaining in the stream that represent the char array being read, but it fails to account for the internal state of the decoder being used, and sometimes chooses an upper limit that’s too high. In 3.0, it ignores such state altogether, so the UTF-8 and UTF-16 symptoms manifest. In 2.2, it tries to account for the state, but in a manner that’s only sufficient for UTF-8, not for UTF-16, so only the UTF-16 symptom manifests. However, the root cause in either case is the same.

@jkotas
Copy link
Member

jkotas commented Aug 22, 2019

The UTF16 problem has always been there.

It is even listed as known problem in the documentation: https://docs.microsoft.com/en-us/dotnet/api/system.io.binaryreader.readchars?view=netcore-3.0#remarks

@carlossanlop
Copy link
Member Author

carlossanlop commented Aug 22, 2019

I ran the UTF8 example in both 2.2 and 3.0 and compared the code that is being executed:

  • In both versions, the BinaryReader constructor sets the value of the private readonly Decoder _decoder member by calling encoding.GetDecoder().
  • The GetDecoder() method is different between versions:
    • In 2.2, we do the following: return new UTF8Decoder(this);.
    • in 3.0, we do the following: return new DecoderNLS(this);.
  • In 2.2 and 3.0, DecoderNLS has a property internal virtual bool HasState, which always returns false.
    • In 2.2, because UTF8Decoder inherits from DecoderNLS, it overrides the HasState property with: return (this.bits != 0);.
    • In 3.0, UTF8Decoder was deleted.
  • The UTF8 example pasted above:
    • Passes in 2.2 because we have the special case inside InternalReadChars(), and when it calls HasState, it returns true when there are hanging bytes from the previous loop.
    • Fails in 3.0 because we do not have the special case. Even if I add it back, we will call HasState and it will always return false.
  • When we call _decoder.GetChars() at the end of the while loop in InternalReadChars(), it:
    • Passes in 2.2.
    • Throws in 3.0 when we call GetChars. The code is very different to the code we used to call in 2.2.

This is the exception callstack:

        System.ArgumentException : The output char buffer is too small to contain the decoded characters, encoding 'Unicode (UTF-8)' fallback 'System.Text.DecoderReplacementFallback'. (Parameter 'chars')
        Stack Trace:
          D:\coreclr\src\System.Private.CoreLib\shared\System\Text\Encoding.cs(1155,0): at System.Text.Encoding.ThrowCharsOverflow(DecoderNLS decoder, Boolean nothingDecoded)
          D:\coreclr\src\System.Private.CoreLib\shared\System\Text\Encoding.Internal.cs(1211,0): at System.Text.Encoding.GetCharsWithFallback(ReadOnlySpan`1 bytes, Int32 originalBytesLength, Span`1 chars, Int32 originalCharsLength, DecoderNLS decoder)
          D:\coreclr\src\System.Private.CoreLib\shared\System\Text\UTF8Encoding.cs(646,0): at System.Text.UTF8Encoding.GetCharsWithFallback(ReadOnlySpan`1 bytes, Int32 originalBytesLength, Span`1 chars, Int32 originalCharsLength, DecoderNLS decoder)
          D:\coreclr\src\System.Private.CoreLib\shared\System\Text\Encoding.Internal.cs(1165,0): at System.Text.Encoding.GetCharsWithFallback(Byte* pOriginalBytes, Int32 originalByteCount, Char* pOriginalChars, Int32 originalCharCount, Int32 bytesConsumedSoFar, Int32 charsWrittenSoFar, DecoderNLS decoder)
          D:\coreclr\src\System.Private.CoreLib\shared\System\Text\Encoding.Internal.cs(1017,0): at System.Text.Encoding.GetChars(Byte* pBytes, Int32 byteCount, Char* pChars, Int32 charCount, DecoderNLS decoder)
          D:\coreclr\src\System.Private.CoreLib\shared\System\Text\DecoderNLS.cs(144,0): at System.Text.DecoderNLS.GetChars(Byte* bytes, Int32 byteCount, Char* chars, Int32 charCount, Boolean flush)
          D:\coreclr\src\System.Private.CoreLib\shared\System\IO\BinaryReader.cs(443,0): at System.IO.BinaryReader.InternalReadChars(Span`1 buffer)
          D:\coreclr\src\System.Private.CoreLib\shared\System\IO\BinaryReader.cs(475,0): at System.IO.BinaryReader.ReadChars(Int32 count)
          D:\corefx\src\System.IO\tests\BinaryReader\BinaryReaderTests.cs(214,0): at System.IO.Tests.BinaryReaderTests.ReadChars_UTF8()

@jkotas since your PR refactored the BinaryReader/Writer code, can you please take a look at the UTF8 regression?

CC @JeremyKuhne

jkotas referenced this issue in jkotas/coreclr Aug 22, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455
jkotas referenced this issue in jkotas/coreclr Aug 22, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455
@jkotas jkotas reopened this Aug 24, 2019
jkotas referenced this issue in jkotas/coreclr Aug 24, 2019
…et#26324)

BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corefx Aug 25, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corert Aug 25, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/mono Aug 25, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub referenced this issue in dotnet/corert Aug 26, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub referenced this issue in dotnet/corefx Aug 26, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
akoeplinger referenced this issue in mono/mono Aug 26, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jkotas referenced this issue in dotnet/coreclr Aug 26, 2019
…) (#26356)

BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455
Dotnet-GitSync-Bot referenced this issue in Dotnet-GitSync-Bot/corefx Aug 26, 2019
…356)

BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
stephentoub referenced this issue in dotnet/corefx Aug 26, 2019
BinaryReader.ReadChars incorrectly read more than necessary from the underlying Stream when multi-byte characters straddled the read chunks.

Fixes https://github.com/dotnet/corefx/issues/40455

Signed-off-by: dotnet-bot <dotnet-bot@microsoft.com>
jonpryor referenced this issue in dotnet/android Dec 3, 2019
Changes: mono/api-snapshot@fc50bc4...45a61d9

        $ git diff --shortstat fc50bc4f...45a61d93
         22 files changed, 775 insertions(+), 474 deletions(-)

Changes: dotnet/cecil@a6c8f5e...a6a7f5c

        $ git diff --shortstat a6c8f5e1...a6a7f5c0
         55 files changed, 818 insertions(+), 530 deletions(-)

Changes: mono/corefx@1f87de3...49f1c45

        $ git diff --shortstat e4f7102b...49f1c453
         38 files changed, 1171 insertions(+), 419 deletions(-)

Changes: dotnet/linker@ebe2a1f...e8d054b

        $ git diff --shortstat ebe2a1f4...e8d054bf
         137 files changed, 5360 insertions(+), 1781 deletions(-)

Changes: mono/mono@8946e49...18920a8

        $ git diff --shortstat 8946e49a...18920a83
         1811 files changed, 47240 insertions(+), 48331 deletions(-)

Changes: xamarin/xamarin-android-api-compatibility@a61271e...50a3c52

        $ git diff --shortstat a61271e0...50a3c52d
         1 file changed, 2 insertions(+), 791 deletions(-)

Fixes: #3619

Context: https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1005448
Context: https://devdiv.visualstudio.com/DefaultCollection/DevDiv/_workitems/edit/967582
Context: https://github.com/dotnet/coreclr/issues/26370
Context: https://github.com/dotnet/coreclr/issues/26479
Context: https://github.com/dotnet/corefx/issues/40455
Context: https://github.com/dotnet/corefx/issues/40578
Context: mono/mono#7377
Context: mono/mono#12421
Context: mono/mono#12586
Context: mono/mono#14080
Context: mono/mono#14725
Context: mono/mono#14772
Context: mono/mono#15261
Context: mono/mono#15262
Context: mono/mono#15263
Context: mono/mono#15307
Context: mono/mono#15308
Context: mono/mono#15310
Context: mono/mono#15646
Context: mono/mono#15687
Context: mono/mono#15805
Context: mono/mono#15992
Context: mono/mono#15994
Context: mono/mono#15999
Context: mono/mono#16032
Context: mono/mono#16034
Context: mono/mono#16046
Context: mono/mono#16192
Context: mono/mono#16308
Context: mono/mono#16310
Context: mono/mono#16369
Context: mono/mono#16380
Context: mono/mono#16381
Context: mono/mono#16395
Context: mono/mono#16411
Context: mono/mono#16415
Context: mono/mono#16486
Context: mono/mono#16570
Context: mono/mono#16605
Context: mono/mono#16616
Context: mono/mono#16689
Context: mono/mono#16701
Context: mono/mono#16712
Context: mono/mono#16742
Context: mono/mono#16759
Context: mono/mono#16803
Context: mono/mono#16808
Context: mono/mono#16824
Context: mono/mono#16876
Context: mono/mono#16879
Context: mono/mono#16918
Context: mono/mono#16943
Context: mono/mono#16950
Context: mono/mono#16974
Context: mono/mono#17004
Context: mono/mono#17017
Context: mono/mono#17038
Context: mono/mono#17040
Context: mono/mono#17083
Context: mono/mono#17084
Context: mono/mono#17133
Context: mono/mono#17139
Context: mono/mono#17151
Context: mono/mono#17180
Context: mono/mono#17278
Context: mono/mono#17549
Context: mono/mono#17569
Context: mono/mono#17665
Context: mono/mono#17687
Context: mono/mono#17737
Context: mono/mono#17790
Context: mono/mono#17924
Context: mono/mono#17931
Context: https://github.com/mono/mono/issues/26758
Context: https://github.com/mono/mono/issues/37913
Context: dotnet/macios#7005
jonpryor referenced this issue in dotnet/android Dec 3, 2019
Changes: mono/api-snapshot@fc50bc4...45a61d9

        $ git diff --shortstat fc50bc4f...45a61d93
         22 files changed, 775 insertions(+), 474 deletions(-)

Changes: dotnet/cecil@a6c8f5e...a6a7f5c

        $ git diff --shortstat a6c8f5e1...a6a7f5c0
         55 files changed, 818 insertions(+), 530 deletions(-)

Changes: mono/corefx@1f87de3...49f1c45

        $ git diff --shortstat e4f7102b...49f1c453
         38 files changed, 1171 insertions(+), 419 deletions(-)

Changes: dotnet/linker@ebe2a1f...e8d054b

        $ git diff --shortstat ebe2a1f4...e8d054bf
         137 files changed, 5360 insertions(+), 1781 deletions(-)

Changes: mono/mono@8946e49...18920a8

        $ git diff --shortstat 8946e49a...18920a83
         1811 files changed, 47240 insertions(+), 48331 deletions(-)

Changes: xamarin/xamarin-android-api-compatibility@a61271e...50a3c52

        $ git diff --shortstat a61271e0...50a3c52d
         1 file changed, 2 insertions(+), 791 deletions(-)

Fixes: #3619

Context: https://dev.azure.com/devdiv/DevDiv/_workitems/edit/1005448
Context: https://devdiv.visualstudio.com/DefaultCollection/DevDiv/_workitems/edit/967582
Context: https://github.com/dotnet/coreclr/issues/26370
Context: https://github.com/dotnet/coreclr/issues/26479
Context: https://github.com/dotnet/corefx/issues/40455
Context: https://github.com/dotnet/corefx/issues/40578
Context: mono/mono#7377
Context: mono/mono#12421
Context: mono/mono#12586
Context: mono/mono#14080
Context: mono/mono#14725
Context: mono/mono#14772
Context: mono/mono#15261
Context: mono/mono#15262
Context: mono/mono#15263
Context: mono/mono#15307
Context: mono/mono#15308
Context: mono/mono#15310
Context: mono/mono#15646
Context: mono/mono#15687
Context: mono/mono#15805
Context: mono/mono#15992
Context: mono/mono#15994
Context: mono/mono#15999
Context: mono/mono#16032
Context: mono/mono#16034
Context: mono/mono#16046
Context: mono/mono#16192
Context: mono/mono#16308
Context: mono/mono#16310
Context: mono/mono#16369
Context: mono/mono#16380
Context: mono/mono#16381
Context: mono/mono#16395
Context: mono/mono#16411
Context: mono/mono#16415
Context: mono/mono#16486
Context: mono/mono#16570
Context: mono/mono#16605
Context: mono/mono#16616
Context: mono/mono#16689
Context: mono/mono#16701
Context: mono/mono#16712
Context: mono/mono#16742
Context: mono/mono#16759
Context: mono/mono#16803
Context: mono/mono#16808
Context: mono/mono#16824
Context: mono/mono#16876
Context: mono/mono#16879
Context: mono/mono#16918
Context: mono/mono#16943
Context: mono/mono#16950
Context: mono/mono#16974
Context: mono/mono#17004
Context: mono/mono#17017
Context: mono/mono#17038
Context: mono/mono#17040
Context: mono/mono#17083
Context: mono/mono#17084
Context: mono/mono#17133
Context: mono/mono#17139
Context: mono/mono#17151
Context: mono/mono#17180
Context: mono/mono#17278
Context: mono/mono#17549
Context: mono/mono#17569
Context: mono/mono#17665
Context: mono/mono#17687
Context: mono/mono#17737
Context: mono/mono#17790
Context: mono/mono#17924
Context: mono/mono#17931
Context: https://github.com/mono/mono/issues/26758
Context: https://github.com/mono/mono/issues/37913
Context: dotnet/macios#7005
@msftgits msftgits transferred this issue from dotnet/corefx Feb 1, 2020
@msftgits msftgits added this to the 3.0 milestone Feb 1, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 12, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.IO tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants