-
-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix some race conditions that happen during initialization #73793
Conversation
@@ -50,8 +50,8 @@ void Thread::_set_platform_functions(const PlatformFunctions &p_functions) { | |||
platform_functions = p_functions; | |||
} | |||
|
|||
void Thread::callback(Thread *p_self, const Settings &p_settings, Callback p_callback, void *p_userdata) { | |||
Thread::caller_id = _thread_id_hash(p_self->thread.get_id()); | |||
void Thread::callback(ID p_caller_id, const Settings &p_settings, Callback p_callback, void *p_userdata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm curious about why this is needed. I mean, if the pointer was only needed for getting the id, this new version seems to be safer. However, in practice Thread
object shouldn't be destroyed before the thread ends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a call to thread.swap(new_thread)
right after new_thread
gets created, so in the old version, it's ambiguous whether p_self->thread
refers to the old thread
or the just created new_thread
.
Threading is hard 🥲
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides swap
, Thread
technically can be destroyed (and will detach thread in this case) before the thread is finished. So it is safer.
@@ -75,7 +75,7 @@ struct _IP_ResolverPrivate { | |||
Semaphore sem; | |||
|
|||
Thread thread; | |||
bool thread_abort = false; | |||
SafeFlag thread_abort; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The variable is synced with the semaphore. However, I see now that a data race is still possible, if bool
doesn't provide atomicity guarantees across platforms. Therefore, thinking about this (and similar cases), my abandoned RelaxedFlag
project is relevant (#63189). That one asks the implementation for atomicity, but without acquire-release semantics, which the semaphore is already providing.
Thoughts?
P.S.: What was the sanitizer complaint, by the way?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It was a standard read/write complaint, it seems like all variable accesses (even to scalars like int
and bool
) are considered data races unless they're explicitly specified as atomic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing it is an IP hostname resolver, it's not performance critical and any overhead for SafeFlag
going to be negligible compared to networking.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about this along the lines of having an idiom for this kind of cases; namely, every time there's a semaphore-paced loop with an exit flag, just use the atomic-but-relaxed version of the flag, without having to pay further attention to the performance considerations.
That said, that can be discussed separately. These changes already have their merit as they are now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlikely any of these will cause a real issue, but changes seems good.
@@ -75,7 +75,7 @@ struct _IP_ResolverPrivate { | |||
Semaphore sem; | |||
|
|||
Thread thread; | |||
bool thread_abort = false; | |||
SafeFlag thread_abort; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Providing it is an IP hostname resolver, it's not performance critical and any overhead for SafeFlag
going to be negligible compared to networking.
@@ -50,8 +50,8 @@ void Thread::_set_platform_functions(const PlatformFunctions &p_functions) { | |||
platform_functions = p_functions; | |||
} | |||
|
|||
void Thread::callback(Thread *p_self, const Settings &p_settings, Callback p_callback, void *p_userdata) { | |||
Thread::caller_id = _thread_id_hash(p_self->thread.get_id()); | |||
void Thread::callback(ID p_caller_id, const Settings &p_settings, Callback p_callback, void *p_userdata) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Besides swap
, Thread
technically can be destroyed (and will detach thread in this case) before the thread is finished. So it is safer.
This should be the last one involving initialization. Should I open a new PR or add it to this one? diff --git a/core/os/thread.cpp b/core/os/thread.cpp
index 92865576f3..d533b1c1e2 100644
--- a/core/os/thread.cpp
+++ b/core/os/thread.cpp
@@ -58,13 +58,19 @@ void Thread::callback(ID p_caller_id, const Settings &p_settings, Callback p_cal
if (platform_functions.init) {
platform_functions.init();
}
- ScriptServer::thread_enter(); // Scripts may need to attach a stack.
+
+ // Currently causes a race condition due to the fact that threads are
+ // spawned before ScriptServer is even initialized.
+ // This is currently unused since neither GDScript nor C# require
+ // attaching a stack to new threads, so we just disable for now.
+
+ //ScriptServer::thread_enter();
if (platform_functions.wrapper) {
platform_functions.wrapper(p_callback, p_userdata);
} else {
p_callback(p_userdata);
}
- ScriptServer::thread_exit();
+ //ScriptServer::thread_exit();
if (platform_functions.term) {
platform_functions.term();
}
All of the possible workarounds for this one get ugly really fast since In the meantime, disabling |
If no scripting language is using thread enter/exit callbacks (and there are no near future plans for it), I'd even remove them. However, I'm a bit unsure about the needs that language extensions developed separately may have (JS, Nim, Rust, whatever). I am wondering if maybe we could somehow apply a compromise solution that patches the current situation to keep it functional and non-racey, even if somewhat hacky. I'm considering something like this:
|
Changing Unfortunately, it wouldn't help the main victim of this race condition, I don't know if there's any decision that can be taken here that doesn't leave something broken, including just leaving it as-is 🤷 |
The ScriptServer workaround has been split off into #74501 |
Thanks! |
Cherry-picked for 4.0.2. |
Needed for #73777