Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

Closed
geofft opened this issue Jun 20, 2015 · 5 comments · Fixed by #27338
Closed

SIGSEGV and SIGBUS cause an exit signal of SIGILL #26458

geofft opened this issue Jun 20, 2015 · 5 comments · Fixed by #27338

Comments

@geofft
Copy link
Contributor

geofft commented Jun 20, 2015

As demonstrated on Playpen:

fn main() {
    unsafe {*(0 as *mut u32) = 0};
}
playpen: application terminated abnormally with signal 4 (Illegal instruction)

The reason for this is that, when a UNIX signal is being handled, UNIX blocks that signal while the handler is running, by default. However, this particular signal handler tries to re-raise the signal when it's done with processing. This doesn't do anything since the handler is still running, so execution continues with the next line, intrinsics::abort(), which induces SIGILL. (The code tries to reset the signal disposition to avoid re-entering the same handler, but that's not enough; the mask of blocked signals also needs to be cleared.)

There are a couple of possible solutions to this. The straightforward one is to cause the signal not to be masked. I have a patch for this that depends on my signal-FFI-bindings refactoring in #25784; I can submit it as a PR once that lands.

Another one, which I prefer, is to just let the handler terminate instead of re-raising. There was discussion about this previously, where it was noted that glibc's manual says this is undefined for "program error signals" like SIGSEGV and SIGBUS. POSIX agrees with that. However, most platforms in practice define this behavior, and allow you to return from these handlers, so that you can do things like userspace page-fault handling.

For example,

  • Google's Breakpad crash-handling library returns from SIGSEGV on Linux and on Solaris. (On Darwin they use Mach exceptions.)
  • The Oracle JVM relies on being able to return from a SIGSEGV handler: they use it as a trick for stop-the-world pauses with low overhead. (Periodically each thread will do a single read from a special page, which is ~one instruction. If the GC wants to stop the world, it'll unmap the page to cause each thread to fault, and once it's ready to continue the world, it'll remap the page and make all threads return from their signal handlers.) So any UNIX platform where the JVM works should support this.
  • GNU libsigsegv is a library for doing things like userspace paging, and it's predicated on the assumption that returning from SIGSEGV is possible. The PORTING file has a wide list of supported platforms, including Linux, Darwin, FreeBSD, OpenBSD, NetBSD, Solaris, and MinGW.

So I think that it's merely the case that returning from a program error handler is unspecified in POSIX, but just about all actual OSes we care about support it.

This has the advantage that the exact signal is re-delivered to kill the program, with the right siginfo (indicating it died because of a memory error, not because someone manually sent SIGSEGV), so the last frame in a coredump is right, dmesg prints a line, etc. We can keep the re-raise for unknown platforms, but for platforms where we know that returning from the handler works, that seems both simpler and better. (Breakpad takes this approach for the same reasons.)

The final option is just to remove this code. What it does, as far as I see, is to print an error message and die if the segfault was on the guard page, and just die otherwise. I don't think there's a compelling reason to print a message for stack overflow, especially as caught by SIGSEGV (it makes more sense if it's caught by stack probes or morestack). In any other systems language, overflowing the stack just gets you killed with SIGSEGV, and installing a special handler doesn't seem to match the runtime-removal philosophy. It made sense in the librustrt world, but I'd argue it's not useful now. But if other people are finding the handler / the error message useful, that's fine.

Cc @Zoxc for advice, as the original author of the segfault handling code.

@Zoxc
Copy link
Contributor

Zoxc commented Jun 20, 2015

If we can return from SIGSEGV and SIGBUS on some platforms, I support that option. For other platforms we should reset the mask as you suggest.

I find it very important to be able to quickly identify a stack overflow, as to not raise confusion with way more scary cases of signals. As long as you stick with safe Rust, you should get friendly error messages.

@alexcrichton
Copy link
Member

I agree with @Zoxc that having the error message is quite nice and I'd prefer to not lose it, but it sounds like the option of just returning from the signal handler on known platforms is a good idea! I like the idea of re-raising the exact same signal.

Thanks for the investigation @geofft, and feel free to submit a PR!

@geofft
Copy link
Contributor Author

geofft commented Jul 29, 2015

All the non-Windows ports currently in mk/cfg support returning from a SIGSEGV handler:

(There are some overIaps and some of these sources attest to way more, I just wanted to confirm that at least one source attests to each current Rust target. I don't actually think this is arch-specific on any OS.) So this is safe to do unconditionally, as @alexcrichton has done in PR #27338, and I'm happy to let that PR close this.

Porters to new UNIXes may need to watch for this for correctness. This is also safe on Solaris (per Breakpad and JVM); I think the only non-supporting UNIX that Rust might realistically get ported to is MINIX 3.

@alexcrichton
Copy link
Member

Awesome, thanks for the investigation @geofft!

alexcrichton added a commit to alexcrichton/rust that referenced this issue Aug 10, 2015
This commit removes all morestack support from the compiler which entails:

* Segmented stacks are no longer emitted in codegen.
* We no longer build or distribute libmorestack.a
* The `stack_exhausted` lang item is no longer required

The only current use of the segmented stack support in LLVM is to detect stack
overflow. This is no longer really required, however, because we already have
guard pages for all threads and registered signal handlers watching for a
segfault on those pages (to print out a stack overflow message). Additionally,
major platforms (aka Windows) already don't use morestack.

This means that Rust is by default less likely to catch stack overflows because
if a function takes up more than one page of stack space it won't hit the guard
page. This is what the purpose of morestack was (to catch this case), but it's
better served with stack probes which have more cross platform support and no
runtime support necessary. Until LLVM supports this for all platform it looks
like morestack isn't really buying us much.

cc rust-lang#16012 (still need stack probes)
Closes rust-lang#26458 (a drive-by fix to help diagnostics on stack overflow)
bors added a commit that referenced this issue Aug 10, 2015
This commit removes all morestack support from the compiler which entails:

* Segmented stacks are no longer emitted in codegen.
* We no longer build or distribute libmorestack.a
* The `stack_exhausted` lang item is no longer required

The only current use of the segmented stack support in LLVM is to detect stack
overflow. This is no longer really required, however, because we already have
guard pages for all threads and registered signal handlers watching for a
segfault on those pages (to print out a stack overflow message). Additionally,
major platforms (aka Windows) already don't use morestack.

This means that Rust is by default less likely to catch stack overflows because
if a function takes up more than one page of stack space it won't hit the guard
page. This is what the purpose of morestack was (to catch this case), but it's
better served with stack probes which have more cross platform support and no
runtime support necessary. Until LLVM supports this for all platform it looks
like morestack isn't really buying us much.

cc #16012 (still need stack probes)
Closes #26458 (a drive-by fix to help diagnostics on stack overflow)

r? @brson
@Ecordonnier
Copy link

Is current implementation the intended behavior? On Linux, rust processes don't dump core when receiving SIGSEGV or SIGBUS using e.g. "kill -11 $(pidof process)". Instead, they dump core the second time they receive SIGSEGV/SIGBUS (after the default signal handler has been restored). This is against the Linux semantic of those signals AFAIK. Typically a Linux process dumps core when receiving SIGSEGV.

tavianator added a commit to tavianator/bfs that referenced this issue Nov 23, 2024
This makes the death look exactly like it would if we didn't handle the
signal at all.  Coredumps will point at the right instruction, segfaults
will get logged in dmesg again, etc.

Technically POSIX says this is undefined, but if we get a fault, we've
already done something undefined anyway ;)

Link: rust-lang/rust#26458
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants