Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Workaround resource loading crashes due to buggy TLS #78977

Merged
merged 1 commit into from
Jul 3, 2023

Conversation

RandomShaper
Copy link
Member

My current diagnosis of the issue is that MinGW's TLS implementation is causing a double-free of the load_paths_stack vector, with the second attempt happening when the memory is already tampered and therefore it's not just as nice as seeing the allocation pointer is already null, which would cause no crash.

Ideally, this would be a temporary fix.

Workarounds #78734.

@RandomShaper RandomShaper added this to the 4.1 milestone Jul 3, 2023
@RandomShaper RandomShaper requested a review from a team as a code owner July 3, 2023 11:57
@@ -296,8 +296,10 @@ void ResourceLoader::_thread_load_function(void *p_userdata) {
// Thread-safe either if it's the current thread or a brand new one.
CallQueue *mq_override = nullptr;
if (load_nesting == 0) {
load_paths_stack = memnew(Vector<String>);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this seems to be the main potential behavior change here, where a new stack will be allocated each time this function is called.

For my peace of mind, can you confirm whether this function is only called once for each ResourceLoader instance, so it's equivalent to when the vector was a member of it? I.e. there's no risk of losing a previous load path when de-allocating this at the end of the function, compared to the previous behavior where it would subsist as class data?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It will be allocated per outermost stack frame, per resource load thread. The overhead of that is not ideal, but it's minimal. In any case, the data block of the vector will have to be allocated and resized normally, so this is just adding one additional allocation to the mix.

Regarding the persistence of data, I could have added a DEV_ASSERT(load_paths_stack.size()) == 0 to this function before this PR. Therefore, there no loss of data.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In any case, this is a shot in the dark, I would need a production-like build to be kind of sure it does the trick. The other idea of making TLS data compilation-unit local may be better. I will make another PR and hopefully you can make builds with both ideas?

@akien-mga
Copy link
Member

As pointed out on RC by @bitsawer, this has the same pattern as a Windows + LTS + MinGW GCC/Clang issue observed in the past, such as #47368 fixed by @hpvb. Moving the thread_local to the .cpp might solve the issue too in a cleaner way (and if so, we should re-assess all thread_local usage in headers and make sure we prevent against those in the future).

@akien-mga
Copy link
Member

Potential alternative (untested):

diff --git a/core/io/resource_loader.cpp b/core/io/resource_loader.cpp
index 4e001efcf8..b6aff109a6 100644
--- a/core/io/resource_loader.cpp
+++ b/core/io/resource_loader.cpp
@@ -202,6 +202,10 @@ void ResourceFormatLoader::_bind_methods() {
 
 ///////////////////////////////////
 
+static thread_local int load_nesting = 0;
+static thread_local WorkerThreadPool::TaskID caller_task_id = 0;
+static thread_local Vector<String> load_paths_stack;
+
 // This should be robust enough to be called redundantly without issues.
 void ResourceLoader::LoadToken::clear() {
 	thread_load_mutex.lock();
@@ -401,6 +405,10 @@ Error ResourceLoader::load_threaded_request(const String &p_path, const String &
 	}
 }
 
+bool ResourceLoader::is_within_load() {
+	return load_nesting > 0;
+}
+
 Ref<Resource> ResourceLoader::load(const String &p_path, const String &p_type_hint, ResourceFormatLoader::CacheMode p_cache_mode, Error *r_error) {
 	if (r_error) {
 		*r_error = OK;
@@ -1168,10 +1176,6 @@ bool ResourceLoader::create_missing_resources_if_class_unavailable = false;
 bool ResourceLoader::abort_on_missing_resource = true;
 bool ResourceLoader::timestamp_on_load = false;
 
-thread_local int ResourceLoader::load_nesting = 0;
-thread_local WorkerThreadPool::TaskID ResourceLoader::caller_task_id = 0;
-thread_local Vector<String> ResourceLoader::load_paths_stack;
-
 template <>
 thread_local uint32_t SafeBinaryMutex<ResourceLoader::BINARY_MUTEX_TAG>::count = 0;
 SafeBinaryMutex<ResourceLoader::BINARY_MUTEX_TAG> ResourceLoader::thread_load_mutex;
diff --git a/core/io/resource_loader.h b/core/io/resource_loader.h
index 592befb603..febd178eb3 100644
--- a/core/io/resource_loader.h
+++ b/core/io/resource_loader.h
@@ -180,9 +180,6 @@ private:
 
 	static void _thread_load_function(void *p_userdata);
 
-	static thread_local int load_nesting;
-	static thread_local WorkerThreadPool::TaskID caller_task_id;
-	static thread_local Vector<String> load_paths_stack;
 	static SafeBinaryMutex<BINARY_MUTEX_TAG> thread_load_mutex;
 	static HashMap<String, ThreadLoadTask> thread_load_tasks;
 	static bool cleaning_tasks;
@@ -196,7 +193,7 @@ public:
 	static ThreadLoadStatus load_threaded_get_status(const String &p_path, float *r_progress = nullptr);
 	static Ref<Resource> load_threaded_get(const String &p_path, Error *r_error = nullptr);
 
-	static bool is_within_load() { return load_nesting > 0; };
+	static bool is_within_load();
 
 	static Ref<Resource> load(const String &p_path, const String &p_type_hint = "", ResourceFormatLoader::CacheMode p_cache_mode = ResourceFormatLoader::CACHE_MODE_REUSE, Error *r_error = nullptr);
 	static bool exists(const String &p_path, const String &p_type_hint = "");

@akien-mga
Copy link
Member

Here's a build for the original commit in this PR (41c0785): https://downloads.tuxfamily.org/godotengine/testing/Godot_v4.1-rc_pr78977_original_win64.x86_64.zip

And here's a build for the proposed alternative moving the thread_locals to the .cpp: https://downloads.tuxfamily.org/godotengine/testing/Godot_v4.1-rc_pr78977_alternative_win64.x86_64.zip

Copy link
Member

@akien-mga akien-mga left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did further testing with @RandomShaper on #core and confirmed that the original approach in this PR fixes the issue. The alternative I suggested didn't.

@akien-mga akien-mga merged commit cdd2313 into godotengine:master Jul 3, 2023
@akien-mga
Copy link
Member

Thanks!

Comment on lines +362 to +365
if (mq_override) {
memdelete(mq_override);
}
memdelete(load_paths_stack);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might be good practice to set pointers to nullptr after calling memdelete to reduce risk of a dangling pointer being inadvertently referenced.

That said, I assume there must already be some sort of guarantee that these variables are not referenced outside of threaded load, so it's not something to change for 4.1 ... just an observation. (I'm not too familiar with this code)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. At that point the thread is about to die and no one could access the variable anymore. That said, it is when assumptions fail to be true for whatever reason (code evolution, etc.) when one would have wished to have put guidelines like set-to-null-after-free in practice without further consideration.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants