Great stack overflow error messages #51405
Comments
I often wish that this was the case, given that most managed languages have this feature and preventing segfaults is a major goal of Rust. Unfortunately, this could break panic safety and introduce undefined behavior.
If stack overflows could trigger a panic then both First of all, we could unwind the stack without calling the destructors, which should avoid most cases of untrusted code seeing unsafe states. The other alternative is a system to mark functions as |
Thank you for the pointers. I'm working on the assumption that any thread that overflows its stack is going to trigger an abort of the process, i.e. we're not planning on having processes that can survive a stack overflow in one thread. Please disabuse me of this notion if it is not correct! If an abort is the only outcome of a stack overflow, then leaking memory is probably ok, as the OS is just about to tear down the process. I suspect full 'unwinding' is unnecessary. Can we walk the stack's return addresses instead, given we're really just after a good stack trace to show how the overflow occurred? For the first cut, a list of addresses would show how many frames were in the loop, which is more info than none.
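To illustrate why even a bare list of return addresses is useful, here is a minimal sketch of finding how many frames make up the recursive loop. It assumes the addresses have already been collected somehow, innermost frame first; the function name `cycle_len` and the sample addresses are hypothetical and not part of any existing Rust API.

```rust
/// Return the length of the shortest cycle that repeats at the start of
/// `frames` (innermost frame first), or `None` if the leading frames do
/// not repeat at least twice.
fn cycle_len(frames: &[usize]) -> Option<usize> {
    // Try each candidate period, smallest first, and accept the first one
    // whose leading block is immediately followed by an identical block.
    (1..=frames.len() / 2)
        .find(|&period| frames[..period] == frames[period..2 * period])
}

fn main() {
    // Hypothetical return addresses for a -> b -> c -> a -> b -> c -> ... -> main.
    let frames = [0xa, 0xb, 0xc, 0xa, 0xb, 0xc, 0xa, 0xb, 0xc, 0x1];
    match cycle_len(&frames) {
        Some(n) => println!("recursion loop spans {n} frames"),
        None => println!("no repeating frames found"),
    }
}
```

A real report would probably print only one copy of the cycle plus a repeat count, since an overflowed stack can hold many thousands of identical frames.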
So what you want is just a backtrace, not actual unwinding. That would probably be useful for debugging, yeah. |
Ok so instead of using libunwind if we use glibc's backtrace in this case we should be able to have both safety and debug info on platforms where glibc backtrace is supported. I get that we might not get such a good stacktrace in release mode versus debug mode due to optimisations, but anything's better than nothing. I couldn't see unwinding in glibc's backtrace source: |
Our current backtrace implementation doesn't unwind either. The main complexity in doing this is making sure we have enough stack space to create the backtrace even though we've already run out of stack space. |
On Unix, this is running in signal context, which uses the alternative signal stack, which is usually a few KB in size (smallest size on supported Linux platforms seems to be 8K). It's possible to reconfigure this size using EDIT: On Windows, Rust uses Vectored Exception Handling to register a handler that detects and reports a stack overflow. The stack size guarantee can be set with Now whether it's even possible to use libbacktrace/libunwind or StackWalk64 from the handler while the stack is in this overflowed state yet remains to be seen, but capturing a stacktrace on segfaults seems possible at least. |
If we'd be running the backtrace in the signal handler, we'd need to switch back to libbacktrace's mmap allocator rather than malloc/free which I seem to remember having some pretty severe perf issues. Not sure if the allocator can be configured at runtime. |
Relevant: #51408 (removes libbacktrace) |
Thanks @jonas-schievink, that's great instead of relying on backtrace_symbols or backtrace_symbols_fd. We'd still need to walk the stack frames for a pure rust solution - maybe I'm missing something but I couldn't see anything currently built / being built that tries to walk the stack in pure rust. I tested the above code on OSX and it worked a treat. But it depends on libunwind and looks to me like libunwind does indeed unwind when creating the backtrace so as @Techcable mentioned it won't be panic safe. I'm going to try and see if I can get glibc's backtrace working (no luck yet). (Just thinking outside the box, one could spawn a separate thread to output the backtrace and pass it a reference to the overflowed thread's stack and join on the thread. It sounds like the major OSes give you enough stack during a stack overflow that we don't need to go there.) |
Using libunwind is not the same thing as unwinding the stack. Taking a backtrace with libbacktrace or glibc backtrace or backtrace_symbols are all the same in that they do not change the state of the stack. |
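The distinction can be seen with the standard library alone. The sketch below captures a backtrace at the bottom of a small recursion and then returns normally through every frame, which would be impossible if capturing had unwound the stack. It uses `std::backtrace::Backtrace` (stable since Rust 1.65) purely as an illustration; it runs in ordinary context, not in a signal handler, and says nothing about what is safe to call after an overflow. The function name `descend` is made up for this example.

```rust
use std::backtrace::Backtrace;

/// Recurse `depth` levels, capture a backtrace at the bottom, then return
/// normally through every frame, counting the frames on the way back up.
fn descend(depth: u32) -> (u32, String) {
    if depth == 0 {
        // `force_capture` ignores RUST_BACKTRACE and always attempts a capture.
        // Walking the frames here does not pop or modify any of them.
        let trace = Backtrace::force_capture();
        return (0, trace.to_string());
    }
    let (levels, trace) = descend(depth - 1);
    (levels + 1, trace)
}

fn main() {
    let (levels, trace) = descend(5);
    // Every frame was still intact after the capture, so all of them returned.
    println!("returned through {levels} frames after capturing:");
    println!("{trace}");
}
```

How readable the printed trace is depends on the platform and on whether debug info is available, as noted earlier in the thread.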
It's more accurate to say that libunwind is merely being used to walk the stack to build up a stacktrace (hence why the windows equivalent is called |
The Java virtual machine (and others) typically reserves a separate 'yellow zone' and 'red zone' in the stack space, which the JVM uses for handling stack overflows [1]. While this approach is often taken by fast managed runtimes like V8, .NET (C#), and the JVM, reserving an extra zone of stack space is not necessarily an appropriate choice for Rust. It has zero overhead in the common case, but it requires a few KB of reserved stack space for each thread and extra system calls on thread creation. This is somewhat uncharted territory, since Rust is a hardcore systems programming language like C/C++ but also has much of the safety and convenience of a managed language.
@sfackler libbacktrace's allocator can't be configured at runtime. However, the mmap allocator is enabled by default when mmap is present, so it should already be enabled on all the major Unix systems. The presence of the mmap allocator and signal safety can be tested at compile time by the C define flag Also, we may not even need to use libbacktrace at all, since libunwind provides a signal-safe |
We do not use the system libbacktrace. We use our fork: https://github.com/rust-lang-nursery/libbacktrace/tree/f4d02bbdbf8a2c5a31f0801dfef597a86caad9e3, which in particular has the mmap allocator disabled: 2f3c412 |
If it's anyone's interest, I've opened rust-lang/compiler-builtins#304 . Essentially if we add CFI to the |
@da-x totally still want joyful stack overflows. I got as far as getting a stack trace on OSX, but got bogged down trying to find a cross-OS way for https://github.com/gilescope/findshlibs to deal with sections and segments as they're a bit different in OSX and Linux. I don't mind how we do it, it would be great to have a good stacktrace. |
@gilescope would using a library like |
@jyn514 However, I'm not sure how
|
On Windows this is reasonably simple. There are few special restrictions on what you can do inside an exception handler, so it would be trivial to get a backtrace and print it out. The exception record provides a |
Is there any update on this? I just ran into this issue in a graphics application I'm working on (MacOS) and I'm still not sure how to get a debug print out. rustc --version: rustc 1.41.0 (5e1a799 2020-01-27) |
Not that I’m aware of. I backed off this a year ago, ostensibly because I had bitten off a bit more than I could chew...
@sjep Run it in a debugger and use the debugger's backtrace facility instead. |
As I said, this is a graphics application on MacOS, I can't easily virtualize (into docker for instance) and it's particularly difficult to set up gdb natively on Mac. I do agree that this seems to be the best workaround. I mainly upvote @gilescope in the need for a proper debug stack peek. |
lldb is available on MacOS. |
Thanks for the tip - this is certainly the best workaround 👍 |
I've published a yolo workaround for those who don't want to fire gdb: https://docs.rs/backtrace-on-stack-overflow/0.1.0/backtrace_on_stack_overflow/fn.enable.html |
Rust produces bad error messages on stack overflow, like "thread 'foo' has overflowed its stack" which provides very little insight into where the recursion that caused the stack to overflow occurred. See rust-lang/rust#51405 for details. This commit adds a SIGSEGV handler that attempts to print a backtrace, following the approach in the backtrace-on-stack-overflow crate. I copied the code from that crate into Materialize and tweaked it because it's a very small amount of code that we'll likely need to modify, and I wanted to improve its error handling. In my manual testing this produces a nice backtrace when Materialize overflows its stack.
I just wanted to bump this feature because I'm also interested in it. |
Since `backtrace` requires locking and memory allocation, it cannot be used from inside a signal handler. Instead, this uses `libunwind` and `dladdr`, even though, strictly speaking, neither of them is guaranteed to be async-signal-safe. However, at least LLVM's libunwind (used by macOS) has a [test] for unwinding in signal handlers, and `dladdr` is used by `backtrace_symbols_fd` in glibc, which it [documents] as async-signal-safe. In practice, this hack works well enough on GNU/Linux and macOS (and perhaps some other platforms in the future). Realistically, the worst thing that can happen is that the stack overflow occurs inside the dynamic loader while it holds some sort of lock, which could result in a deadlock if that happens at just the right moment. That's unlikely enough and not the *worst* thing to happen, considering that a stack overflow is already an unrecoverable error and most likely indicates a bug. Fixes rust-lang#51405 [test]: https://github.com/llvm/llvm-project/blob/a6385a3fc8a88f092d07672210a1e773481c2919/libunwind/test/signal_unwind.pass.cpp [documents]: https://www.gnu.org/software/libc/manual/html_node/Backtraces.html#index-backtrace_005fsymbols_005ffd
As a coder
Given I have a fn main() { main() }
Then I expect the following output:
Obviously this is an example of how Java handles stack overflows, but you get the idea.
There didn't seem to be a tracking issue for this, so here is one.
(Some discussion here: https://users.rust-lang.org/t/how-to-diagnose-a-stack-overflow-issues-cause/17320/9 )
I've had a little look and I naively think we need to do something like this in sys_common/util.rs
Quite possibly one can ditch the first panic checks. I'm sure there's lots of concerns here, e.g. have we got enough stack headroom to report without going pop.