Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix crash when generating thumbnails for certain texture types #91768

Closed
wants to merge 1 commit into from

Conversation

mihe
Copy link
Contributor

@mihe mihe commented May 9, 2024

Fixes #58818.
Fixes #69861.
Fixes #70974.

This is unlikely to be an acceptable fix for this, but I figured this could maybe start a conversation about what else can be done.

I'm able to reproduce the above linked issue in a fairly big project, even on latest master, which is c4279fe as of writing this. The repro is unfortunately not consistent at all. Sometimes I can start the editor dozens of time without any issue, with or without an existing thumbnail cache, whereas other times the crash can happen several times in a row. Everything seems to point towards a multi-threaded race condition of some sort.

The symptoms seem pretty straight-forward to me. The queued update of these texture types (although there may be more resource types like this) are sometimes called after the instance has been deallocated. I can say with 100% certainty that the instance has indeed been deallocated at that point, but I can't say exactly what allocated it, nor what deallocated it. Any attempts at logging stuff seems to add enough stalls that the race condition doesn't manifest itself anymore.

Seeing as how the ObjectID check in CallableCustomMethodPointer::call (seen here) should prohibit this from ever happening as long as everything's done on a single thread, it stands to reason that this is happening because another thread (like EditorResourcePreviewGenerator) is allocating these resources, thus scheduling the deferred call, and then happens to deallocate them right as CallableCustomMethodPointer::call is being invoked, after the above mentioned ObjectID check. The fact that this issue can also be reproduced by hovering over an editor tab seems to imply that EditorResourcePreviewGenerator is indeed the culprit, presumably EditorResourcePreviewGenerator::generate_from_path.

This PR fixes this in a somewhat brute-force manner, by simply allowing the resource instances to always outlive their deferred call, by bumping their reference counter for the duration of it. I have been unable to reproduce the crash with this PR applied.

If my diagnosis seems incorrect, or if there's a better fix, I'd love to know.

(I suppose it's also possible that this fix instead leads to memory leaks at shutdown if for some reason the message queue isn't flushed after one of these texture updates is queued up.)

@AThousandShips
Copy link
Member

I'm not sure about the safety of doing forced reference, it seems like a hack

@AThousandShips AThousandShips requested a review from a team May 9, 2024 14:57
@mihe
Copy link
Contributor Author

mihe commented May 9, 2024

It's most definitely a hack. I was hoping somebody could chime in with a better solution.

@AThousandShips
Copy link
Member

I suspect this is caused by the generation not doing proper thread safety, the freeing of the texture shouldn't happen on a thread, so something should be done about that side instead I'd say

@mihe
Copy link
Contributor Author

mihe commented May 9, 2024

As far as I can tell the allocation/deallocation is likely happening here:

Ref<Texture2D> EditorResourcePreviewGenerator::generate_from_path(const String &p_path, const Size2 &p_size, Dictionary &p_metadata) const {
Ref<Texture2D> preview;
if (GDVIRTUAL_CALL(_generate_from_path, p_path, p_size, p_metadata, preview)) {
return preview;
}
Ref<Resource> res = ResourceLoader::load(p_path);
if (!res.is_valid()) {
return res;
}
return generate(res, p_size, p_metadata);
}

When that res reaches end-of-scope it prompts the deallocation, resulting in the potential race condition with CallableCustomMethodPointer::call.

@AThousandShips
Copy link
Member

AThousandShips commented May 9, 2024

Then that should probably be called on the main thread, loading shouldn't be done on a thread anyway AFAIK

@mihe
Copy link
Contributor Author

mihe commented May 9, 2024

If my diagnosis is correct then I'd assume that this has the ability to manifest itself in the actual running game/application as well, when doing something like background loading using ResourceLoader.load_threaded_request, so I'm not sure purely addressing EditorResourcePreviewGenerator is the right move.

But again, this hinges on me not having misunderstood the actual cause for this crash.

@mihe
Copy link
Contributor Author

mihe commented May 9, 2024

Although, I suppose ResourceLoader.load_threaded_request wouldn't ever cause a deallocation, because ResourceLoader would hang on to the resource until you call ResourceLoader.load_threaded_get. So the issue is less about the threaded loading and more about doing the deallocation on the thread I guess.

@AThousandShips
Copy link
Member

Using load directly on a thread isn't supported afaik, you need to use the threaded versions, hence them being there, can you see if it works if the operation itself is not done on a thread to to make sure it doesn't load on a thread

@RandomShaper
Copy link
Member

The issue is similar to what I'm trying to address in #91630. Certain resource types do a deferred call as some final readyness step (such as generating the underlying shader of a materials or generating the data for a procedural texture). The issue is that the way ResourceLoader currently works, it accumulates the deferred calls and it eventually transfers them to the main call queue, so they happen on the main thread, where they should be safer to happen.

In this case, it's fine that the resource previewer loads the resource from its thread, the problem being a race condition in the aforementioned mechanism (still not sure exactly what). In order to provide an universal fix for this so the resource previewer doesn't need to apply any ad hoc workaround, we could just try to let all those deferred updates happen on the thread that did the loading. We could, for safety, restrict it for now to certain resource types known to be safe in that regard, via some getter. That said, I have yet to reason about the implications of such an approach in many-threaded loads.

A simpler fix for now would be to the resource previewer wait untill the main call queue is flushed before going on with the generation of the preview.

@RandomShaper
Copy link
Member

Using load directly on a thread isn't supported afaik, you need to use the threaded versions, hence them being there, can you see if it works if the operation itself is not done on a thread to to make sure it doesn't load on a thread

Currently, ResourceLoader doesn't care if the thread performing the load is the main thread, an arbitrary user thread or any thread spawned by itself.

@mihe
Copy link
Contributor Author

mihe commented May 9, 2024

A simpler fix for now would be to the resource previewer wait untill the main call queue is flushed before going on with the generation of the preview.

What would this look like exactly? I'm assuming you would need to block in EditorResourcePreviewGenerator::generate_from_path until some call_deferred flips some synchronization variable?

@mihe
Copy link
Contributor Author

mihe commented May 9, 2024

There's also potentially the issue that you can register your own custom preview generators in "user space", using EditorResourcePreview.add_preview_generator, which will similarly run on this dedicated generation thread, and which presumably can do the exact same thing of ad-hoc loading a resource and discard it immediately.

@RandomShaper
Copy link
Member

Something I stated here (or in private discussion) was inaccurate: #91630 will also affect these resources because the resource loader from now on will flush its own call queue in-thread instead of transferring to the main one. That means that the procedural textures will be generated as part of loading them, which should avoid the race condition.

@mihe Can you test if the issue is still a thing in current master, now #91630 has been merged?

@mihe
Copy link
Contributor Author

mihe commented May 13, 2024

@RandomShaper Does this apply even though EditorResourcePreview uses ResourceLoader::load rather than ResourceLoader::load_threaded_request? If I'm reading the code right your change would only apply to threads with MessageQueue::set_thread_singleton_override, which only seems to apply to the resource loader threads.

Maybe setting an override like that for the EditorResourcePreview thread would be an appropriate follow-up PR if that's the case, or perhaps switching over to ResourceLoader::load_threaded_request?

@mihe
Copy link
Contributor Author

mihe commented May 13, 2024

Ah, nevermind, I misread the code it seems. I see now that it calls set_thread_singleton_override for any non-main thread doing resource loading. That should indeed resolve the race condition. I'm clearly seeing it running the deferred call on the EditorResourcePreview thread as part of the loading, and I can't seem to reproduce the crash when trying out your PR.

I'll go ahead and close this then. Thank you!

@mihe mihe closed this May 13, 2024
@mihe mihe deleted the texture-update-crash branch May 13, 2024 19:55
@akien-mga akien-mga removed this from the 4.3 milestone May 13, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
4 participants