Incompatibility in handling of SIGSEGV between ChakraCore and CoreCLR

I discovered an incompatibility in how the ChakraCore engine and CoreCLR handle the segmentation fault signal (SIGSEGV). As far as I can tell, the issue affects all recent ChakraCore versions. It is triggered by a segmentation fault in user code (e.g. any `NullReferenceException`) in a CoreCLR-driven process and manifests itself as a reported stack overflow / stack smashing that crashes the process.

The following occurs after the ChakraCore engine is loaded into a CoreCLR process when a segmentation fault is triggered:

```
Process is terminating due to StackOverflowException.
Aborted (core dumped)
```

The core dump contains the following stack:

```
frame #0: 0x00007ff9a96d7fff libc.so.6`gsignal + 207
frame #1: 0xfffffffe7fffffff
frame #2: 0x00007ff9a979e1f7 libc.so.6`__fortify_fail + 55
frame #3: 0x00007ff9a979e1c0 libc.so.6`__stack_chk_fail + 16
frame #4: 0x00007ff44c0e94a2 libChakraCore.so`sigsegv_handler(int, siginfo_t*, void*) + 171
```

When debugging the crash in _lldb_ we can see a more informative stack:

```
* thread #1, name = 'dotnet', stop reason = signal SIGABRT
  * frame #0: libc.so.6`__GI_raise(sig=2) at raise.c:51
    frame #1: libc.so.6`__GI_abort at abort.c:79
    frame #2: 0x00007f7616872e73 libcoreclr.so`PROCAbort + 19
    frame #3: 0x00007f761683c262 libcoreclr.so`sigsegv_handler(int, siginfo_t*, void*) + 338
    frame #4: 0x00007f760df08187 libclrjit.so`sigsegv_handler(int, siginfo_t*, void*) + 247
    frame #5: 0x00007f757571146b libChakraCore.so`sigsegv_handler(int, siginfo_t*, void*) + 171
```

According to my investigation, the stack overflow is only _virtual_. CoreCLR registers its SIGSEGV handler [with the `SA_ONSTACK` flag](https://github.com/dotnet/coreclr/blob/75d1d390c1b631eeb99d90f92f0ccdb840367b7d/src/pal/src/exception/signal.cpp#L251), which instructs the kernel to execute the handler on a separate stack. ChakraCore, on the other hand, registers its handler [without the flag](https://github.com/Microsoft/ChakraCore/blob/f9a9d43df125835eaf5307b448c215e1e9a12617/pal/src/exception/signal.cpp#L583). This discrepancy is IMO the reason why [the CoreCLR SO detection](https://github.com/dotnet/coreclr/blob/75d1d390c1b631eeb99d90f92f0ccdb840367b7d/src/pal/src/exception/signal.cpp#L467) malfunctions and aborts the process. It seems that the SO detector cannot deal with the fact that the SIGSEGV handler is _not_ executing on the separate stack that CoreCLR prepared during PAL initialization.
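To illustrate the effect of `SA_ONSTACK` in isolation, here is a minimal sketch that is not taken from either code base. It installs an alternate stack with `sigaltstack`, registers a handler for the harmless `SIGUSR1` (instead of SIGSEGV, so nothing actually crashes), and reports whether the handler's frame lies on the alternate stack. All names (`probe_handler`, `ensure_alt_stack`, `handler_runs_on_alt_stack`) and the 64 KiB stack size are assumptions made for this example.

```cpp
#include <signal.h>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical globals for this sketch; nothing here exists in ChakraCore or CoreCLR.
static void* g_alt_base = nullptr;
static size_t g_alt_size = 0;
static volatile sig_atomic_t g_ran_on_alt_stack = -1;

static void probe_handler(int, siginfo_t*, void*)
{
    // The address of a local variable tells us which stack this frame lives on.
    volatile char marker = 0;
    uintptr_t sp = reinterpret_cast<uintptr_t>(&marker);
    uintptr_t lo = reinterpret_cast<uintptr_t>(g_alt_base);
    g_ran_on_alt_stack = (sp >= lo && sp < lo + g_alt_size) ? 1 : 0;
}

// Installs an alternate signal stack for the calling thread (once).
static bool ensure_alt_stack()
{
    if (g_alt_base != nullptr)
    {
        return true;
    }
    g_alt_size = 64 * 1024; // assumed size, comfortably above MINSIGSTKSZ
    g_alt_base = malloc(g_alt_size);
    if (g_alt_base == nullptr)
    {
        return false;
    }
    stack_t ss{};
    ss.ss_sp = g_alt_base;
    ss.ss_size = g_alt_size;
    ss.ss_flags = 0;
    return sigaltstack(&ss, nullptr) == 0;
}

// Returns 1 if the handler ran on the alternate stack, 0 if it ran on the
// regular thread stack, and -1 if setup failed.
static int handler_runs_on_alt_stack(int extra_flags)
{
    if (!ensure_alt_stack())
    {
        return -1;
    }
    struct sigaction sa{};
    struct sigaction old{};
    sa.sa_sigaction = probe_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART | extra_flags;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGUSR1, &sa, &old) != 0)
    {
        return -1;
    }
    g_ran_on_alt_stack = -1;
    raise(SIGUSR1);                    // delivered before raise() returns
    sigaction(SIGUSR1, &old, nullptr); // restore the previous disposition
    return g_ran_on_alt_stack;
}
```

Calling `handler_runs_on_alt_stack(0)` mirrors a registration without the flag, and `handler_runs_on_alt_stack(SA_ONSTACK)` mirrors one with it. Even though an alternate stack is installed in both cases, only the second handler is switched onto it, which is the situation the ChakraCore handler ends up in.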

**A small repro project can be downloaded [here](https://github.com/Microsoft/ChakraCore/files/2904622/SegmentationFaultRepro.zip)**. It crashes consistently on Ubuntu 18.04.

Once I patched the [registration of the SIGSEGV handler](https://github.com/Microsoft/ChakraCore/blob/f9a9d43df125835eaf5307b448c215e1e9a12617/pal/src/exception/signal.cpp) in ChakraCore (to align it with the CoreCLR version) the crash was gone:

```diff
73c73
< static void handle_signal(int signal_id, SIGFUNC sigfunc, struct sigaction *previousAction);
---
> static void handle_signal(int signal_id, SIGFUNC sigfunc, struct sigaction *previousAction, int additionalFlags = 0);
120c120
<     handle_signal(SIGSEGV, sigsegv_handler, &g_previous_sigsegv);
---
>     handle_signal(SIGSEGV, sigsegv_handler, &g_previous_sigsegv, SA_ONSTACK);
579c579
< void handle_signal(int signal_id, SIGFUNC sigfunc, struct sigaction *previousAction)
---
> void handle_signal(int signal_id, SIGFUNC sigfunc, struct sigaction *previousAction, int additionalFlags)
583c583
<     newAction.sa_flags = SA_RESTART;
---
>     newAction.sa_flags = SA_RESTART | additionalFlags;
591a592,598
>
> #ifdef INJECT_ACTIVATION_SIGNAL
>     if ((additionalFlags & SA_ONSTACK) != 0)
>     {
>         sigaddset(&newAction.sa_mask, INJECT_ACTIVATION_SIGNAL);
>     }
> #endif
```

I don't know whether the issue lies within the CoreCLR SO detector or ChakraCore. Either the handler registration has to be reconciled or the SO detector has to account for the fact that an incompatible SIGSEGV handler may be registered later in the process.
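As a diagnostic aid, the flags of whichever handler is currently installed can be queried with `sigaction` by passing a null new action. The sketch below is only an illustration of that query and is not how the CoreCLR SO detector works; the helper names (`installed_handler_uses_alt_stack`, `install_noop_handler`) are hypothetical.

```cpp
#include <signal.h>
#include <cassert>

// Hypothetical helper: returns 1 if the currently installed disposition for
// signal_id has SA_ONSTACK set, 0 if it does not, and -1 if the query failed.
static int installed_handler_uses_alt_stack(int signal_id)
{
    struct sigaction current{};
    // A null new action leaves the disposition untouched and only reads it.
    if (sigaction(signal_id, nullptr, &current) != 0)
    {
        return -1;
    }
    return (current.sa_flags & SA_ONSTACK) != 0 ? 1 : 0;
}

static void noop_handler(int, siginfo_t*, void*)
{
}

// Demonstration only: installs a do-nothing handler with the given extra flags.
static bool install_noop_handler(int signal_id, int extra_flags)
{
    struct sigaction sa{};
    sa.sa_sigaction = noop_handler;
    sa.sa_flags = SA_SIGINFO | SA_RESTART | extra_flags;
    sigemptyset(&sa.sa_mask);
    return sigaction(signal_id, &sa, nullptr) == 0;
}
```

Running `installed_handler_uses_alt_stack(SIGSEGV)` before and after the ChakraCore library is loaded would show whether a later registration replaced a handler that had `SA_ONSTACK` with one that does not.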

The issue has previously been reported as #4893 and #5781. There may also be [another incompatibility](https://github.com/dotnet/diagnostics/issues/122#issuecomment-464617171) between ChakraCore and CoreCLR that prevents analysis of managed core dumps (maybe the ChakraCore PAL is an incompatible copy-paste from CoreCLR?).

