Process crash with "Internal CLR error" (0x80131506) in Roslyn CI #45557
Tagging subscribers to this area: @cston
Issue Details
Description
Roslyn CI failed with the following error:
CLR version 5.0.100 on Windows.
|
Is a dump possible @tmat? Or is there any repro? |
Dump is available in the build's artifacts: https://dev.azure.com/dnceng/public/_build/results?buildId=906787&view=artifacts&pathAsName=false&type=publishedArtifacts |
Crash during GC:
Due to corrupted GC heap:
The bad reference should be an array of object references (wrapped by ImmutableArray). |
This is hitting Roslyn multiple times a day right now. If you need more dumps to narrow down the problem you can follow the results from this search for more logs |
This crash has a very regular pattern. It is always a bad reference to ImmutableArray from
@JulieLeeMSFT Could you please find somebody on your team to investigate? This is hitting Roslyn CI multiple times a day right now. |
@jaredpar It would help us a lot to get a crash dump with these stress log settings:
Any chance you can enable these stress log settings in the Roslyn CI temporarily to get a dump with stresslog? We should be able to root cause the problem immediately from a dump with stresslog. The stresslog will make the runtime run a bit slower and consume more memory, but it should not be prohibitive (like 20% slower). |
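For readers who want to try something similar locally, here is a minimal sketch of launching a test command with stress log variables set. The `COMPlus_*` names are CLR configuration knobs, but the values below are placeholders chosen for illustration and are not the settings requested in this thread; `stress_log_env` and `run_with_stress_log` are hypothetical helper names.

```python
import os
import subprocess
import sys

# Placeholder values for illustration only; NOT the settings from this thread.
STRESS_LOG_ENV = {
    "COMPlus_StressLog": "1",                  # enable the in-memory stress log
    "COMPlus_LogLevel": "6",                   # verbosity (assumed value)
    "COMPlus_LogFacility": "0x00080001",       # facility mask (assumed value)
    "COMPlus_StressLogSize": "2000000",        # per-thread log size (assumed value)
    "COMPlus_TotalStressLogSize": "40000000",  # total log size (assumed value)
}


def stress_log_env(base=None):
    """Return a copy of `base` (default: the current environment) with the
    stress log variables added."""
    env = dict(os.environ if base is None else base)
    env.update(STRESS_LOG_ENV)
    return env


def run_with_stress_log(command):
    """Run `command` (a list of arguments) with the stress log variables set
    and return its exit code."""
    return subprocess.run(command, env=stress_log_env()).returncode


if __name__ == "__main__":
    # Stand-in command; replace with the real test runner invocation.
    print(run_with_stress_log([sys.executable, "-c", "pass"]))
```

Setting the variables only for the child process keeps the rest of the machine unaffected, which matters given the expected slowdown.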
Yes we can add those. Our CoreCLR jobs are not the long pole in CI so the extra 20% here will be fine. |
This sets environment variables that will help make the dumps for a CLR bug more actionable by the runtime team. dotnet/runtime#45557
@BruceForstall Please look into this. |
Had a few crashes since we added the variables that @jkotas recommended. Here is a link to one of the dumps |
This is bad GC info generated for |
Relevant fragment of the stress log:
|
Method IL:
|
Method disassembly with GC Info:
|
The bug:
|
I agree, I don't see a much better way to verify this. Once this change makes it to the SDK (and hence there is a drop available) I can open a PR that uses that drop and just script AzDO to rebuild that PR for a few days. Another option is I could just run the test in a loop on my machine for a day. That would be a pretty good approximation of our CI system. If there is a Note: when running this locally should I use the environment variables Jan suggested earlier? Basically do they make it more likely to catch the problem or does it just add more info when a problem happens? |
These env variables just add more info. These env variables should not affect the chance of hitting the problem. |
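The "run the test in a loop" idea mentioned above could be sketched roughly as follows. This assumes a Windows machine where the crashing process exits with the HRESULT from the issue title (0x80131506) as its exit code; `run_until_crash` and `is_clr_crash` are hypothetical helper names, and the command passed in is a stand-in for the real test invocation.

```python
import subprocess
import sys

# HRESULT from the issue title ("Internal CLR error").
COR_E_EXECUTIONENGINE = 0x80131506


def is_clr_crash(returncode):
    """True if the exit code matches 0x80131506, whether it is reported as a
    signed or an unsigned 32-bit value."""
    return (returncode & 0xFFFFFFFF) == COR_E_EXECUTIONENGINE


def run_until_crash(command, max_runs):
    """Run `command` up to `max_runs` times. Return the 1-based number of the
    run that hit the CLR crash, or None if no run did."""
    for attempt in range(1, max_runs + 1):
        result = subprocess.run(command)
        if is_clr_crash(result.returncode):
            return attempt
    return None


if __name__ == "__main__":
    # Stand-in command; replace with the real test runner invocation.
    crashed_on = run_until_crash([sys.executable, "-c", "pass"], max_runs=3)
    print("no crash" if crashed_on is None else f"crashed on run {crashed_on}")
```

A clean result from a loop like this only lowers the likelihood that the bug is still present, since the crash is intermittent.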
fyi, it looks like the fix should make it into 5.0.2. |
Fix dotnet#45557 for arm/arm64 Fixes dotnet#46023
Port dotnet#46059 to release/5.0 Fix dotnet#45557 for arm/arm64 Fixes dotnet#46023
@jaredpar Does this Roslyn scenario ever run on arm32 or arm64? |
Yes: This is a pretty core method in the compiler and we do have customers using the SDK on those platforms. |
Thanks for the confirmation. This helps our case to port the arm32/arm64 fix to .NET 5 servicing. |
The associated runtime version contains the fix for a CLR crash that we are hitting in our infrastructure. Moving should help us increase stability here. dotnet/runtime#45557
Description
Roslyn CI failed with the following error:
https://dev.azure.com/dnceng/public/_build/results?buildId=906787&view=logs&j=7817cc08-46f6-529a-b675-9c2d2b016c0b&t=0fd245be-2f58-5546-b778-7c0bac845221&l=68
CLR version 5.0.100 on Windows.