Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Exception during close causes hang at "Dumping the backtrace." #82102

Closed
naelstrof opened this issue Sep 22, 2023 · 3 comments · Fixed by #82163
Closed

Exception during close causes hang at "Dumping the backtrace." #82102

naelstrof opened this issue Sep 22, 2023 · 3 comments · Fixed by #82163

Comments

@naelstrof
Copy link
Contributor

naelstrof commented Sep 22, 2023

Godot version

v4.2.dev.mono.custom_build.fe5b1c8d4, v4.1.1

System information

Arch linux, AMD Ryzen 7 5800X3D x86_64, Radeon RX 6950 XT

Issue description

When closing godot-mono via a GetTree().Quit() or by alt-f4ing, occasionally (once every five runs), a SIGSEV exception gets thrown (caught via debugger):

__cxxabiv1::__dynamic_cast(const void *, const __cxxabiv1::__class_type_info *, const __cxxabiv1::__class_type_info *, ptrdiff_t) (@__dynamic_cast:18)
RefCounted* Object::cast_to<RefCounted>(Object*) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/object.h:774)
CSharpLanguage::_instance_binding_reference_callback(void*, void*, unsigned char) (/home/naelstrof/projects/godot-mono/src/godot-mono/modules/mono/csharp_script.cpp:1370)
Object::_instance_binding_reference(bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/object.h:670)
RefCounted::unreference() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.cpp:89)
Ref<InputEvent>::unref() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.h:209)
Ref<InputEvent>::~Ref() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.h:222)
HashSet<Ref<InputEvent>, HashMapHasherDefault, HashMapComparatorDefault<Ref<InputEvent>>>::clear() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/templates/hash_set.h:248)
HashSet<Ref<InputEvent>, HashMapHasherDefault, HashMapComparatorDefault<Ref<InputEvent>>>::~HashSet() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/templates/hash_set.h:465)
Input::~Input() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/input/input.cpp:1623)
void memdelete<Input>(Input*) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/os/memory.h:109)
Main::cleanup(bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/main/main.cpp:3739)
main (/home/naelstrof/projects/godot-mono/src/godot-mono/platform/linuxbsd/godot_linuxbsd.cpp:76)
___lldb_unnamed_symbol3190 (@___lldb_unnamed_symbol3190:29)
__libc_start_main (@__libc_start_main:44)
_start (@_start:15)

No big deal, you'd think the crash handler would catch it and give a backtrace. But it gets to here... then immediately throws another SIGSEV exception (also caught via debugger):

OS_Unix::execute(String const&, List<String, DefaultAllocator> const&, String*, int*, bool, MutexImpl<std::recursive_mutex>*, bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/drivers/unix/os_unix.cpp:480)
handle_crash(int) (/home/naelstrof/projects/godot-mono/src/godot-mono/platform/linuxbsd/crash_handler_linuxbsd.cpp:104)
___lldb_unnamed_symbol15378 (@___lldb_unnamed_symbol15378:35)
___lldb_unnamed_symbol15368 (@___lldb_unnamed_symbol15368:114)
___lldb_unnamed_symbol3300 (@___lldb_unnamed_symbol3300:4)
__cxxabiv1::__dynamic_cast(const void *, const __cxxabiv1::__class_type_info *, const __cxxabiv1::__class_type_info *, ptrdiff_t) (@__dynamic_cast:15)
RefCounted* Object::cast_to<RefCounted>(Object*) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/object.h:774)
CSharpLanguage::_instance_binding_reference_callback(void*, void*, unsigned char) (/home/naelstrof/projects/godot-mono/src/godot-mono/modules/mono/csharp_script.cpp:1370)
Object::_instance_binding_reference(bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/object.h:670)
RefCounted::unreference() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.cpp:89)
Ref<InputEvent>::unref() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.h:209)
Ref<InputEvent>::~Ref() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/object/ref_counted.h:222)
HashSet<Ref<InputEvent>, HashMapHasherDefault, HashMapComparatorDefault<Ref<InputEvent>>>::clear() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/templates/hash_set.h:248)
HashSet<Ref<InputEvent>, HashMapHasherDefault, HashMapComparatorDefault<Ref<InputEvent>>>::~HashSet() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/templates/hash_set.h:465)
Input::~Input() (/home/naelstrof/projects/godot-mono/src/godot-mono/core/input/input.cpp:1623)
void memdelete<Input>(Input*) (/home/naelstrof/projects/godot-mono/src/godot-mono/core/os/memory.h:109)
Main::cleanup(bool) (/home/naelstrof/projects/godot-mono/src/godot-mono/main/main.cpp:3739)
main (/home/naelstrof/projects/godot-mono/src/godot-mono/platform/linuxbsd/godot_linuxbsd.cpp:76)
___lldb_unnamed_symbol3190 (@___lldb_unnamed_symbol3190:29)
__libc_start_main (@__libc_start_main:44)

From here it just permanently hangs at "Dumping the backtrace", taking 100% CPU and being unresponsive. I'm figuring the exception handler is just looping-- my debugger can't follow it, something causes my debugger to detach when it starts looping.

image

Happens on stable as well, I think during cleanup, some of the machinery that the exception handler uses is cleaned up already so it fails to give a stack trace. I'm not sure though.

Forces me to close it via a TERM or KILL signal via my task manager, which is very hurtful to my iteration speeds.

Steps to reproduce

Repeatedly open the project while mashing the Esc key, it will hang when it fails.
The crash was surprisingly consistent, yet random. Sometimes taking 10 tries or so.

I couldn't find a 100% consistent way to crash. I also couldn't get the crash to happen on a blank project, so I must be doing something strange in the project that I've included. I shrunk it down the best I could.

Minimal reproduction project

It was 45mb, so it didn't fit on github, here's a few mirrors:

https://drive.google.com/file/d/1OIefwglUPwOiieQhzDBxWgs6vwCWJ-QS/view?usp=sharing

https://cdn.discordapp.com/attachments/437504631577509899/1154695790212689971/MRP.zip

@naelstrof
Copy link
Contributor Author

naelstrof commented Sep 22, 2023

It appears as though the second exception is because the ->execute virtual function is pointing to reclaimed memory space during the crash. This explains the randomness of the crash as well, since reclaimed memory can still be owned by the process by chance.

I think it's due to the OS singleton running a variety of cleanup commands before the crash happens.

Edit: I've learned it's actually because the backtrace method godot uses is not signal-safe. Signals are asynchronous and cause virtually all interactions with existing main-memory objects to be undefined.

The freeze is caused by the non-signal-safe code causing a secondary fatal signal during crash handling. More details below.

@Calinou
Copy link
Member

Calinou commented Sep 22, 2023

As a workaround, you can disable the crash handler by running Godot with the --disable-crash-handler command line argument.

@naelstrof
Copy link
Contributor Author

I've learned that signal is heavily implementation specific. According to the cppreference docs:

When signal handler is set to a function and a signal occurs, it is implementation defined whether signal(sig, SIG_DFL) will be executed immediately before the start of signal handler. Also, the implementation can prevent some implementation-defined set of signals from occurring while the signal handler runs.

This means that we won't know if further exceptions during crash handling will be ignored, cause loops, lockups, or crashes since it's completely implementation specific.

It also states

POSIX recommends sigaction instead of signal, due to its underspecified behavior and significant implementation variations, regarding signal delivery while a signal handler is executed.

I'm not sure if sigaction is a good fit for godot, since godot is multi-platform and sigaction is not. And it's nice keeping the crash handlers somewhat similar in design when possible.

So now for my proposed solution: a very simple fix is to enforce the behavior of signal by starting the crash handler with

static void handle_crash(int sig) {
    signal(SIGSEGV, SIG_DFL);
    signal(SIGFPE,  SIG_DFL);
    signal(SIGILL,  SIG_DFL);
...

This prevents further fatal errors from initiating patty-cake with the kernel for most implementations. The trade-off being that we still won't know if the implementation is currently ignoring any of the signals.

I think making it work for all implementations would require a reassessment for using sigaction.

Regardless, this change should make the crash handlers more sane without being complicated. Tested it and it works okay:
image

Pull request to follow soon.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants