Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Span CopyTo Span - Crashes .Net #13701

Closed
abbotware opened this issue Oct 31, 2019 · 21 comments
Closed

Span CopyTo Span - Crashes .Net #13701

abbotware opened this issue Oct 31, 2019 · 21 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-reliability Reliability/stability related issue (stress, load problems, etc.) tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Milestone

Comments

@abbotware
Copy link

abbotware commented Oct 31, 2019

There appears to be some sort of threading issue in the System.Memory 4.5.3 running on .Net Framework for the following code:

Memory<byte> memory = new byte[100];
ReadOnlySpan<byte> temp = new byte[30];
temp.CopyTo(memory.Span);

attached is a sample program that will crash the error:

The process was terminated due to an internal error in the .NET Runtime at IP 00007FFE09357BFD (00007FFE091A0000) with exit code 80131506.

category:correctness
theme:gc-info
skill-level:expert
cost:medium

@abbotware
Copy link
Author

abbotware commented Oct 31, 2019

This program crashes on .Net Framework 4.72 and 4.8 using System.Memory" Version="4.5.3"
CPU has 8 cores

Program was written as a stress test so it does the bare minimum amount of code to reproduce the crash

Program.txt

@scalablecory scalablecory transferred this issue from dotnet/corefx Oct 31, 2019
@GrabYourPitchforks
Copy link
Member

GrabYourPitchforks commented Oct 31, 2019

Confirmed on Win10 18363.418 w/ .NET Framework 4.8 x64. Crash is coming from the GC, which might imply that there's a ref byte somewhere in System.Memory.dll that isn't properly being GC tracked.

@abbotware
Copy link
Author

GC sounds plausible - from my tests - If you reduce the number of new / allocations (ie move the allocation out of the loops)... it won't crash..

@eerhardt
Copy link
Member

eerhardt commented Nov 21, 2019

I caught a Managed Debugging Assistant with this code:

Managed Debugging Assistant 'FatalExecutionEngineError' 
  Message=Managed Debugging Assistant 'FatalExecutionEngineError' : 'The runtime has 
encountered a fatal error. The address of the error was at 0x1996a6d6, on thread 0xb2078. The error 
code is 0xc0000005. This error may be a bug in the CLR or in the unsafe or non-verifiable portions of 
user code. Common sources of this bug include user marshaling errors for COM-interop or PInvoke, 
which may corrupt the stack.'

@jkotas
Copy link
Member

jkotas commented Nov 22, 2019

https://github.com/dotnet/coreclr/issues/27924 is the root cause of this crash.

@abbotware
Copy link
Author

@jkotas - if a root cause has been identified in the jitter .cpp code - I suspect this can't be fixed with a new System.Memory dll... will it require a patch to the framework ?

@jkotas
Copy link
Member

jkotas commented Nov 24, 2019

It can be worked around in System.Memory by avoiding the code pattern that hits the JIT bug.

@abbotware
Copy link
Author

any update on a fix for this issue?

@jkotas
Copy link
Member

jkotas commented Jan 3, 2020

We are evaluating several options for addressing this. It is going to take several months to get the fix out for .NET Framework, current ETA May 2020.

CarolEidt referenced this issue in CarolEidt/coreclr Jan 10, 2020
This is the fix for #27924. This is a GC hole bug that was found externally, #27590.
The cause is that the JIT was using the target type of the subtract when it needed
to make a copy of the source, but it needs to use the source type.

## Customer Impact
Corruption of state that is non-deterministic and hard to track down.

## Regression?
Not a recent regression, but exposed by Unsafe.ByteOffset.

## Testing
The fix has been verified in the runtime repo.

## Risk
Low: The fix is straightfoward and only impacts 3 lines of code.
@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@jkotas jkotas added tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed reliability labels Feb 4, 2020
@GrabYourPitchforks GrabYourPitchforks added this to the 5.0 milestone Feb 21, 2020
@maryamariyan maryamariyan added the untriaged New issue has not been triaged by the area owner label Feb 26, 2020
@ericstj ericstj removed the untriaged New issue has not been triaged by the area owner label Jul 9, 2020
@Anipik
Copy link
Contributor

Anipik commented Jul 17, 2020

It seems like it was fixed for net core by #1059
@jkotas any update on fix for .NetFramework ?

@Anipik Anipik modified the milestones: 5.0.0, Future Jul 17, 2020
@jkotas
Copy link
Member

jkotas commented Jul 17, 2020

The fix for .NET Framework is tracked by https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1026837 in the internal bug database.

@JulieLeeMSFT Could you please look into what we want to do here?

cc @dotnet/jit-contrib @eerhardt @GrabYourPitchforks

@JulieLeeMSFT
Copy link
Member

@briansull please port the .NET Core fix to Framework.

@JulieLeeMSFT JulieLeeMSFT added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 17, 2020
@GrabYourPitchforks
Copy link
Member

Huh, not sure why that internal issue was assigned to me. I've given it to Brian.

@briansull
Copy link
Contributor

@jkotas
Is there a repro that can be run on the desktop?

D:\Public\Bugs\Runtime_13818>csc /o /unsafe repro.cs
Microsoft (R) Visual C# Compiler version 3.6.0-4.20251.5 (910223b6)
Copyright (C) Microsoft Corporation. All rights reserved.

repro.cs(18,18): error CS0103: The name 'Unsafe' does not exist in the current context

@eerhardt
Copy link
Member

repro.cs(18,18): error CS0103: The name 'Unsafe' does not exist in the current context

You need to add a reference to https://www.nuget.org/packages/System.Runtime.CompilerServices.Unsafe/

@briansull
Copy link
Contributor

How do I add that reference?

@eerhardt
Copy link
Member

  1. Create an empty directory and a Repro.csproj with the following contents:
<Project Sdk="Microsoft.NET.Sdk">

  <PropertyGroup>
    <OutputType>Exe</OutputType>
    <TargetFramework>net48</TargetFramework>
  </PropertyGroup>

  <ItemGroup>
    <PackageReference Include="System.Runtime.CompilerServices.Unsafe" Version="4.7.1" />
  </ItemGroup>
</Project>
  1. Add the code from Bad GCInfo generated for Unsafe.ByteOffset #13818 to a repro.cs file next to the above .csproj file.
  2. dotnet run -c Release from the directory with Repro.csproj and repro.cs files.

@briansull
Copy link
Contributor

Thanks, I have a repro case and have tested the fix. It looks good

@briansull
Copy link
Contributor

I have checked in the fix for this issue into the Desktop .Net runtime for 4.8 and 4.7

The fix is expected to ship in the latter part of September as part of that month’s non security release. (barring any unexpected issues)

@briansull
Copy link
Contributor

1026837 | [4.8] Possible data corruption in .NET Framework when using Span-based APIs
https://devdiv.visualstudio.com/DevDiv/_workitems/edit/1026837

@CarolEidt
Copy link
Contributor

Thanks @briansull!

@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-reliability Reliability/stability related issue (stress, load problems, etc.) tracking-external-issue The issue is caused by external problem (e.g. OS) - nothing we can do to fix it directly
Projects
None yet
Development

No branches or pull requests

10 participants