Cannot unwind stack when stack probing hits the stack limit on Unix #11495

Closed
janvorli opened this issue Nov 16, 2018 · 4 comments · Fixed by dotnet/coreclr#26807
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI bug

@janvorli
Member

I was trying to enable dumping the call stack on stack overflow on Linux and discovered that when stack probing in a frame hits the stack limit, the unwinder is unable to unwind from that frame. The reason is that while execution is inside the probing loop, the unwinder has no way to find the return address location.
Here is an example of a function prolog with this issue. The line that does the actual probing is marked:

00007fff7d284360 55                   push    rbp
00007fff7d284361 0f1f4000             nop     dword ptr [rax]
00007fff7d284365 488d8424f07fffff     lea     rax, [rsp - 0x8010]
00007fff7d28436d 488da42400f0ffff     lea     rsp, [rsp - 0x1000]
>>> 00007fff7d284375 48850424             test    rax, qword ptr [rsp]
00007fff7d284379 483be0               cmp     rsp, rax
00007fff7d28437c 7def                 jge     0x7fff7d28436d
00007fff7d28437e 488da010800000       lea     rsp, [rax + 0x8010]
00007fff7d284385 4881ec10800000       sub     rsp, 0x8010
00007fff7d28438c 488dac2410800000     lea     rbp, [rsp + 0x8010]
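
To make the loop's behavior concrete, here is a small C model of the probing sequence in the prolog above. It is only a sketch: `simulate_probe_loop` and `PROBE_PAGE_SIZE` are names invented for illustration, and the function works on plain integers rather than a real stack. For the 0x8010-byte frame shown, it records nine probes, and at each probe rsp sits at a different distance from its value right after `push rbp`, which is why an unwinder that only knows rsp cannot locate the return address.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Page size used by the prolog above (lea rsp, [rsp - 0x1000]). */
#define PROBE_PAGE_SIZE 0x1000

/*
 * Model of the inlined probing loop. sp_after_push is the value of rsp
 * right after `push rbp`. Each probed address is stored in probes[]
 * (up to max entries); the return value is the total number of probes.
 */
size_t simulate_probe_loop(int64_t sp_after_push, int64_t frame_size,
                           int64_t *probes, size_t max)
{
    assert(frame_size > 0);

    int64_t limit = sp_after_push - frame_size; /* lea rax, [rsp - frame] */
    int64_t sp = sp_after_push;
    size_t n = 0;

    do {
        sp -= PROBE_PAGE_SIZE;      /* lea rsp, [rsp - 0x1000]      */
        if (n < max) {
            probes[n] = sp;         /* test rax, qword ptr [rsp]    */
        }
        n++;
    } while (sp >= limit);          /* cmp rsp, rax ; jge loop      */

    return n;
}
```

If the `test` instruction faults at any of these probes, the faulting rsp is some whole number of pages below the slot holding the saved rbp, and that number is not recorded anywhere the unwinder can see.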

It seems we have the following options here:

  • Add mov rbp, rsp after the push rbp instead of doing lea rbp, [rsp + 0x8010] much later. But currently, we can end up using rbp for the same purpose as rax in the code above (for funclets), so that would have to be sorted out.
  • Instead of inlining the probing, we could have a native helper that we would call and that would do the probing.
  • We could possibly add a new unwind code that would describe that the pushed rbp is at an offset from the stack probing limit register (rax in the code above). Since we have source code of the managed code unwinder in coreclr for Unix, we could afford doing that.

It may be more involved for ARM / ARM64 / x86 cases, so maybe the native helper would be the best option.
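
The arithmetic behind the third option can be sketched in C as follows. This is an illustration under assumptions, not an existing unwind code: the `probe_regs` struct and both function names are hypothetical, and `frame_size` stands for the constant (0x8010 in the prolog above) that such an unwind code would have to carry. It relies only on the fact that rax stays fixed at "rsp after push rbp, minus the frame size" for the whole probing loop.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register snapshot taken while inside the probing loop. */
typedef struct {
    uint64_t rsp; /* moves down one page per iteration, so it is not usable */
    uint64_t rax; /* probing limit, constant for the whole loop             */
} probe_regs;

/* Address of the slot where `push rbp` stored the caller's rbp. */
uint64_t saved_rbp_slot(const probe_regs *r, uint64_t frame_size)
{
    assert(r != NULL);
    /* rax = rsp_after_push - frame_size, so adding frame_size undoes it. */
    return r->rax + frame_size;
}

/* The return address sits one 8-byte slot above the saved rbp. */
uint64_t return_address_slot(const probe_regs *r, uint64_t frame_size)
{
    return saved_rbp_slot(r, frame_size) + 8;
}
```

The result does not depend on `rsp` at all, so it is the same at every iteration of the loop. It is only valid between the `lea rax, [rsp - 0x8010]` and the `lea rsp, [rax + 0x8010]` instructions; outside that range rax no longer holds the limit.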

category:correctness
theme:stack-allocation
skill-level:expert
cost:large

@janvorli
Member Author

cc: @jkotas, @echesakovMSFT, @BruceForstall

@BruceForstall
Member

I think the first option won't work (easily) because we currently generate the SP probe and adjustment before we save callee-saved floating-point registers (using mov, not push), and we have to do that before establishing RBP as frame pointer. Maybe we could move FP callee-save register saving earlier, before stack probing, but it might require doing two separate SP adjustments, both properly aligned, the first to allocate space for saved FP registers and the second to allocate the rest of the frame, with the RBP establishing happening after the first SP adjustment.

We could add a Unix-only custom "Windows" unwind code, but I hope we don't have to do that.

@BruceForstall BruceForstall self-assigned this Mar 13, 2019
@echesakov
Contributor

I prototyped a native helper on Linux-x64 (dotnet/coreclr@master...echesakovMSFT:JitStackProbeHelper) and both GDB and LLDB report correct backtrace.

echesako@echesako2:~/GitHub_21061$ $CORE_ROOT/corerun bin/Release/netcoreapp2.1/GitHub_21061.dll
Stack overflow.
Aborted (core dumped)
Thread 1 "corerun" received signal SIGSEGV, Segmentation fault.
JIT_CheckStack () at /opt/code/src/vm/amd64/jithelpers_fast.S:528
528     /opt/code/src/vm/amd64/jithelpers_fast.S: No such file or directory.
(gdb) bt
#0  JIT_CheckStack () at /opt/code/src/vm/amd64/jithelpers_fast.S:528
#1  0x00007fff7cf62c2b in ?? ()
#2  0x00007fffffffd710 in ?? ()
#3  0x00007ffff639448b in CallDescrWorkerInternal () at /opt/code/src/pal/inc/unixasmmacrosamd64.inc:866
#4  0x00007ffff622c049 in CallDescrWorkerWithHandler (pCallDescrData=0x7fffffffd8e0, fCriticalCall=0) at /opt/code/src/vm/callhelpers.cpp:70
#5  0x00007ffff622cfb9 in MethodDescCallSite::CallTargetWorker (this=<optimized out>, pArguments=0x7fffffffd9e8, pReturnValue=<optimized out>, cbReturnValue=<optimized out>) at /opt/code/src/vm/callhelpers.cpp:600
#6  0x00007ffff63b121b in MethodDescCallSite::Call (this=0x7fff580114d8, pArguments=0x7fffffffdab8) at /opt/code/src/vm/callhelpers.h:467
#7  RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::$_1::operator()(RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::Param*) const::{lambda(RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::Param*)#1}::operator()(RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::Param*) const (this=<optimized out>, pParam=<optimized out>)
    at /opt/code/src/vm/assembly.cpp:1556
#8  RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::$_1::operator()(RunMain(MethodDesc*, short, int*, REF<PtrArray>*)::Param*) const (this=<optimized out>, __EXparam=<optimized out>) at /opt/code/src/vm/assembly.cpp:1571
#9  RunMain (pFD=<optimized out>, numSkipArgs=<optimized out>, piRetVal=<optimized out>, stringArgs=<optimized out>) at /opt/code/src/vm/assembly.cpp:1571
#10 0x00007ffff63b16b2 in Assembly::ExecuteMainMethod (this=<optimized out>, stringArgs=0x7fffffffdf98, waitForOtherThreads=1) at /opt/code/src/vm/assembly.cpp:1681
#11 0x00007ffff61021fc in CorHost2::ExecuteAssembly (this=<optimized out>, dwAppDomainId=<optimized out>, pwzAssemblyPath=<optimized out>, argc=<optimized out>, argv=0x0, pReturnValue=<optimized out>) at /opt/code/src/vm/corhost.cpp:460
#12 0x00007ffff60c549c in coreclr_execute_assembly (hostHandle=<optimized out>, domainId=<optimized out>, argc=<optimized out>, argv=<optimized out>, managedAssemblyPath=<optimized out>, exitCode=<optimized out>) at /opt/code/src/dlls/mscoree/unixinterface.cpp:412
#13 0x00000000004026ce in ExecuteManagedAssembly (currentExeAbsolutePath=<optimized out>, clrFilesAbsolutePath=<optimized out>, managedAssemblyAbsolutePath=<optimized out>, managedAssemblyArgc=<optimized out>, managedAssemblyArgv=<optimized out>) at /opt/code/src/coreclr/hosts/unixcoreruncommon/coreruncommon.cpp:457
#14 0x000000000040180c in corerun (argc=<optimized out>, argv=<optimized out>) at /opt/code/src/coreclr/hosts/unixcorerun/corerun.cpp:149
#15 0x00007ffff6ca3b97 in __libc_start_main (main=0x401a80 <main(int, char const**)>, argc=2, argv=0x7fffffffe408, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe3f8) at ../csu/libc-start.c:310
#16 0x0000000000401479 in _start ()
(lldb) bt
* thread #1, name = 'corerun', stop reason = signal SIGSEGV: invalid address (fault address: 0x7fffff7fe708)
  * frame #0: 0x00007ffff6394c65 libcoreclr.so`JIT_CheckStack at jithelpers_fast.S:528
    frame #1: 0x00007fff7cf52c2b
    frame #2: 0x00007ffff622c049 libcoreclr.so`CallDescrWorkerWithHandler(pCallDescrData=0x00007fffffffd900, fCriticalCall=NO) at callhelpers.cpp:70
    frame #3: 0x00007ffff622cfb9 libcoreclr.so`MethodDescCallSite::CallTargetWorker(this=<unavailable>, pArguments=0x00007fffffffda08, pReturnValue=<unavailable>, cbReturnValue=<unavailable>) at callhelpers.cpp:600
    frame #4: 0x00007ffff63b121b libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) [inlined] MethodDescCallSite::Call(this=<unavailable>, pArguments=<unavailable>) at callhelpers.h:467
    frame #5: 0x00007ffff63b1204 libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) at assembly.cpp:1556
    frame #6: 0x00007ffff63b0fea libcoreclr.so`RunMain(MethodDesc*, short, int*, REF<PtrArray>*) at assembly.cpp:1571
    frame #7: 0x00007ffff63b0f6b libcoreclr.so`RunMain(pFD=<unavailable>, numSkipArgs=<unavailable>, piRetVal=<unavailable>, stringArgs=<unavailable>) at assembly.cpp:1571
    frame #8: 0x00007ffff63b16b2 libcoreclr.so`Assembly::ExecuteMainMethod(this=<unavailable>, stringArgs=0x00007fffffffdfb8, waitForOtherThreads=YES) at assembly.cpp:1681
    frame #9: 0x00007ffff61021fc libcoreclr.so`CorHost2::ExecuteAssembly(this=<unavailable>, dwAppDomainId=<unavailable>, pwzAssemblyPath=<unavailable>, argc=<unavailable>, argv=0x0000000000000000, pReturnValue=<unavailable>) at corhost.cpp:460
    frame #10: 0x00007ffff60c549c libcoreclr.so`::coreclr_execute_assembly(hostHandle=<unavailable>, domainId=<unavailable>, argc=<unavailable>, argv=<unavailable>, managedAssemblyPath=<unavailable>, exitCode=<unavailable>) at unixinterface.cpp:412
    frame #11: 0x00000000004026ce corerun`ExecuteManagedAssembly(currentExeAbsolutePath=<unavailable>, clrFilesAbsolutePath=<unavailable>, managedAssemblyAbsolutePath=<unavailable>, managedAssemblyArgc=<unavailable>, managedAssemblyArgv=<unavailable>) at coreruncommon.cpp:457
    frame #12: 0x000000000040180c corerun`corerun(argc=<unavailable>, argv=<unavailable>) at corerun.cpp:149
    frame #13: 0x00007ffff6ca3b97 libc.so.6`__libc_start_main(main=(corerun`main at corerun.cpp:161), argc=2, argv=0x00007fffffffe428, init=<unavailable>, fini=<unavailable>, rtld_fini=<unavailable>, stack_end=0x00007fffffffe418) at libc-start.c:310
    frame #14: 0x0000000000401479 corerun`_start + 41

Another advantage of using a native helper is that the prolog becomes smaller and the corresponding CodeGen becomes simpler.

@janvorli
Member Author

Nice! It would be great to run benchmarks from the https://github.com/dotnet/performance repo with and without this change to see if it has any noticeable perf impact.

@msftgits msftgits transferred this issue from dotnet/coreclr Jan 31, 2020
@msftgits msftgits added this to the Future milestone Jan 31, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 15, 2020