[x86 Windows] GC stackwalking crash with R2R PInvoke frames #31809

Closed
ViktorHofer opened this issue Feb 5, 2020 · 9 comments
Labels
arch-x86 area-ReadyToRun-coreclr os-windows tenet-reliability Reliability/stability related issue (stress, load problems, etc.)
Milestone

Comments

@jkotas
Member

jkotas commented Feb 5, 2020

We crash during GC while stackwalking GetFullPathNameW:

08 00dfc9e0 712d180b coreclr!WKS::GCHeap::WaitUntilGCComplete+0x24 [F:\workspace\_work\1\s\src\coreclr\src\gc\gcee.cpp @ 292] 
09 00dfca24 7134bcfe coreclr!Thread::RareDisablePreemptiveGC+0xc9 [F:\workspace\_work\1\s\src\coreclr\src\vm\threadsuspend.cpp @ 2530] 
0a 00dfcaa0 70d68b6b coreclr!JIT_PInvokeEndRarePath+0x2e [F:\workspace\_work\1\s\src\coreclr\src\vm\jithelpers.cpp @ 4923] 
0b 00dfcaf0 099055f7 Interop+Kernel32.GetFullPathNameW(Char ByRef, UInt32, Char ByRef, IntPtr)+0x57
0c 00dfcb08 09907cc2 System_Private_CoreLib!System.IO.PathHelper.GetFullPathName(System.ReadOnlySpan`1<Char>, System.Text.ValueStringBuilder ByRef)+0x989fbb47 [/_/src/libraries/System.Private.CoreLib/src/System/IO/PathHelper.Windows.cs @ 91] 

@jkotas
Member

jkotas commented Feb 5, 2020

#31659 hit it as well. That one crashes under:

08 0a66e4c0 73a5bcfe coreclr!Thread::RareDisablePreemptiveGC+0xc9 [F:\workspace\_work\1\s\src\coreclr\src\vm\threadsuspend.cpp @ 2530] 
09 0a66e53c 7347a038 coreclr!JIT_PInvokeEndRarePath+0x2e [F:\workspace\_work\1\s\src\coreclr\src\vm\jithelpers.cpp @ 4923] 
0a 0a66e584 0af36ddd Interop+Ole32.CoCreateGuid(System.Guid ByRef)+0x38
0b 0a66e5a4 0a698408 System_Private_CoreLib!System.Guid.NewGuid()+0x97a54e5d [/_/src/libraries/System.Private.CoreLib/src/System/Guid.Windows.cs @ 23] 
0c 0a66e658 0a6983ad CoreFx_Private_TestUtilities!System.IO.FileCleanupTestBase..ctor()+0x40
0d 0a66e664 0b2f0c8d System_IO_FileSystem_Tests!System.IO.Tests.FileSystemTest..ctor()+0x1d

@jkotas jkotas added blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-reliability Reliability/stability related issue (stress, load problems, etc.) and removed area-System.Xml labels Feb 5, 2020
@jkotas jkotas changed the title from "System.Xml.Xsl.XslTransformApi.Tests crashing" to "[x86 Windows] GC stackwalking crash with R2R PInvoke frames" Feb 5, 2020
@jkotas
Member

jkotas commented Feb 5, 2020

@fadimounir This is most likely caused by #26834. I am going to revert it. Sorry...

@jkotas
Member

jkotas commented Feb 5, 2020

Revert in #31811

@fadimounir
Contributor

@jkotas OK, thanks. I'll take a look at it. Quite interesting... I've been running a couple of instances of gcstress tests with the fix for a couple of days, and none of them crashed.

@jkotas
Member

jkotas commented Feb 5, 2020

This may be the type of issue that GC stress is not good at catching. Let me know if you need help with a repro. I think running Guid.NewGuid and some allocations on multiple threads, with a release build and tiered compilation disabled, should reproduce it reliably.

@BruceForstall BruceForstall added this to the 5.0 milestone Feb 6, 2020
@BruceForstall BruceForstall added area-ReadyToRun-coreclr and removed untriaged New issue has not been triaged by the area owner area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Feb 6, 2020
@jkotas jkotas removed the blocking-clean-ci Blocking PR or rolling runs of 'runtime' or 'runtime-extra-platforms' label Mar 1, 2020
@fadimounir
Contributor

> Let me know if you need help with a repro

@jkotas I tried to repro this with a multi-threaded test that calls Guid.NewGuid(), does a bunch of allocations, and makes GC.Collect calls in parallel, but couldn't get it to fail (tested with the pinvoke changes that you reverted). I tried both release and checked builds, with tiered compilation disabled as you suggested, to keep using the R2R code.
I'll leave the test running for a while longer; maybe it will get lucky and fail...
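
For readers who want to try something similar, a stress test along the lines described above might look roughly like the following. This is only a sketch, not the actual test that was run for this issue: the class name `PInvokeGcStress`, the `Run` method, the thread counts, and the allocation sizes are all made up for illustration. It assumes x86 Windows, where `Guid.NewGuid()` goes through the `Interop+Ole32.CoCreateGuid` P/Invoke seen in the stack trace, and it assumes tiered compilation has been turned off (for example with the `COMPlus_TieredCompilation=0` environment variable) so that the precompiled R2R code keeps being used.

```csharp
using System;
using System.Threading;

// Hypothetical stress harness; all names here are illustrative and do not
// come from the runtime repository.
public static class PInvokeGcStress
{
    // Starts `threads` workers that each call Guid.NewGuid() and allocate
    // short-lived arrays, while one extra thread forces GCs in a loop so
    // that collections overlap with P/Invoke transitions.
    // Returns the total number of Guid.NewGuid() calls completed.
    public static long Run(int threads, int iterationsPerThread)
    {
        long calls = 0;
        using var stop = new CancellationTokenSource();

        // Background thread that keeps triggering blocking collections.
        var collector = new Thread(() =>
        {
            while (!stop.IsCancellationRequested)
            {
                GC.Collect();
                Thread.Yield();
            }
        });
        collector.Start();

        var workers = new Thread[threads];
        for (int t = 0; t < threads; t++)
        {
            workers[t] = new Thread(() =>
            {
                for (int i = 0; i < iterationsPerThread; i++)
                {
                    Guid g = Guid.NewGuid();                     // P/Invoke on Windows
                    byte[] garbage = new byte[256 + (i % 1024)]; // GC pressure
                    garbage[0] = g.ToByteArray()[0];
                    GC.KeepAlive(garbage);
                    Interlocked.Increment(ref calls);
                }
            });
            workers[t].Start();
        }

        foreach (Thread w in workers) w.Join();
        stop.Cancel();
        collector.Join();
        return Interlocked.Read(ref calls);
    }

    public static void Main()
    {
        long total = Run(Environment.ProcessorCount, 1_000_000);
        Console.WriteLine($"Completed {total} Guid.NewGuid calls without crashing.");
    }
}
```

As the thread itself shows, a test of this shape did not reproduce the crash, so a clean run says little about whether the bug is present.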

I think at this point it would be more useful to investigate a dump file, but it looks like the one mentioned in this issue is no longer available.

Any suggestions? Create a PR with the changes and keep running CI until it fails?

@jkotas
Member

jkotas commented Mar 5, 2020

I am not able to reproduce it either. Yes, create a PR and try to rerun the x86 legs on it a few times. If you are not able to hit it after a day of trying, commit it and hopefully we are going to get a dump eventually.

@jkotas
Member

jkotas commented Apr 17, 2020

Duplicate of #34279

@jkotas jkotas marked this as a duplicate of #34279 Apr 17, 2020
@jkotas jkotas closed this as completed Apr 17, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020

5 participants