Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash calling function on null pointer of already freed GDExtension #98182

Open
TokisanGames opened this issue Oct 14, 2024 · 2 comments
Open

Comments

@TokisanGames
Copy link
Contributor

TokisanGames commented Oct 14, 2024

Tested versions

4.2.2, 4.3-stable, 4.3 6699ae7
Master untested but the problem is visible in the code.

godot-cpp 4.2-cherrypicks-7

System information

Windows 11/64, RTX 3070, Vulkan

Issue description

I believe there are two bugs here:

  1. Somewhere in gdextension Dictionary[String] -> TypedArray<GDextensionResource> it isn't properly freeing the GDextResource when it is reassigned to a new dictionary pointer with dict = Dictionary(); That's triggering the engine to go down an unexpected code path on cleanup triggering the second bug:

  2. In the engine, ObjectDB::cleanup() code and Resource::GDCLASS macro assume _extension is a null pointer. However under the above circumstances and maybe others this assumption is wrong and the engine attempts to call is_class("Node") on a gdextension resource where the extension has already been freed. The function call on a null pointer causes a crash.

The second problem is here in this sequence and call stack:

register_core_types.cpp::unregister_core_types() {
	...
	memdelete(gdextension_manager);
	...
	ObjectDB::cleanup(); // crash
}

ObjectDB::cleanup() {
	...
	if (OS::get_singleton()->is_stdout_verbose()) {
		...
		if (obj->is_class("Node")) // crash
}

Resource::GDClass macro extended inline
	virtual bool is_class(const String &p_class) const override {                                                                                
		if (_get_extension() && 
                         _get_extension()->is_class(p_class)) {      // crash
			return true;                                                                                                                         
		}                                                                                                                                        
		return (p_class == (#m_class)) ? true : m_inherits::is_class(p_class);
	}      

In the Resource macro, _get_extension() returns a non-null pointer to already freed memory. It was freed in unregister_core_types(). Since it's freed, it's very difficult to figure out what object was being queried, but with other debugging I've determined this situation is a Terrain3DResource. So, Godot freed the library, _extension is not-null and invalid, then the Resource macro attempts to call a function from the null pointer resulting in the crash.

Deleting gdextensions before everything is shutdown has caused more than one issue. Related #95310. The fix for this didn't solve the fundamental problem.

devenv_2IN0PfTyms

devenv_uiohYSv8rK


Here's more information about this. It may be a gdextension bug.
Using godot-cpp 4.2-cherrypicks-7

  • The object it's crashing on querying is a Terrain3DRegion, which is just a basic custom resource with only native data types. A region stores our data and image maps for a geographical section of the terrain. https://github.com/TokisanGames/Terrain3D/blob/main/src/terrain_3d_region.h
  • The engine crashes in this code block https://github.com/godotengine/godot/blob/4.3/core/object/object.cpp#L2287-L2305
  • If --verbose is specified (but also when I run it in my msvc debugger) this code runs. Terrain3D has already been unloaded at this point. The Terrain3DRegion object still has a pointer to it in _extension. is_class() runs the Resource::GDCLASS macro which calls a function from the null pointer.
  • Of note, whichever objects that are triggered in these conditions, object_slots[i].validator is true. The instances haven't been cleared yet. Not all of my gdextension resources or even Terrain3DRegions fall under these conditions. I can load up regions from disk and not have an issue. The conditions are set only once we start editing data, which means region duplicates, EditorUndoRedoManager and so on.
  • I have cleared the undo history on exit before the offending code runs but it still crashes.
  • Testing each component, I've determined the cause for the conditions comes down to this:
void Terrain3DEditor::start_operation(const Vector3 &p_global_position) {
	//_undo_data = Dictionary(); // ** causes latent crash **
	_undo_data.clear(); // workaround

We've been using the first line to get a new pointer and make our backup work properly. This alone sets up the conditions for a crash. If I replace the first with the second line it no longer crashes.

We also have this other bit of code where _undo_data is used that can also be adjusted to prevent the crash:

void Terrain3DEditor::_store_undo() {
	...
	Dictionary redo_data;
	_undo_data["edited_regions"] = _original_regions; // ** causes latent crash **
	redo_data["edited_regions"] = _edited_regions;

I can return right before the _undo_data line or comment it out and it won't crash. If I return right after or leave it uncommented, it will crash.

_original_regions and _edited_regions are both TypedArray<Terrain3DRegion>.

_undo_data and redo_data are both Dictionaries. They store essentially the same type of data. The only fundamental differences between them are the first one is a class member, and it gets reset by assignment.

  • I was able to fix the problem on our end by using _undo_data.clear() in the first block. And use _undo_data.duplicate() in the second block where we pass the data to the undo/redo manager.

Steps to reproduce

  • Download the artifact below to a new project.
  • Open and restart twice.
  • Run the demo in a debug engine build or in a debugger.
  • Click Terrain3D, then the foliage brush and start painting. Cover a few 32x32 areas. Keep it loaded at least 5 seconds.

On quit it should crash in Resource, on the GDClass macro.

Minimal reproduction project (MRP)

Artifact https://github.com/TokisanGames/Terrain3D/actions/runs/11317731831

@TokisanGames TokisanGames changed the title Crash on close w/ GDExtension Resource Crash on close calling null pointer on already freed GDExtension Oct 19, 2024
@TokisanGames TokisanGames changed the title Crash on close calling null pointer on already freed GDExtension Crash calling function on null pointer of already freed GDExtension Oct 19, 2024
@TokisanGames
Copy link
Contributor Author

I've done further research to narrow the cause for the conditions down. OP is updated.

Undoubtedly there is a bug in the Resource::GDCLASS macro and ObjectDB::cleanup() since it doesn't check for a null pointer before calling it on Object::_extension.

However, I suspect this is an unexpected code path. It's probably expected that gdextension resources have already been freed by that point and any considered objects should have _extension== nullptr. But a second bug has created a condition that triggers this path.

I wonder if the second bug is in GDExtension (godot-cpp 4.2-cherrypicks-7), since there have been problems with Dictionaries and TypedArrays in the past and what we're doing is likely a less common operation.

cc: @dsnopek

@dsnopek
Copy link
Contributor

dsnopek commented Nov 4, 2024

Yeah, I'm not sure there is a general fix we can make here. Maybe we could add a check to see if the engine is already at a certain point of shutting down before trying to use the pointer from _get_extension()?

But we should definitely figure out why there is still an existing resource way after its GDExtension has been cleaned up. It sounds like your saying that in your investigation there's a Dictionary which contains an Array, and when it has no references anymore and should get cleaned up, it just doesn't?

This sounds a lot like issue godotengine/godot-cpp#1240 which we theoretically fixed in PR godotengine/godot-cpp#1379 - but that's been merged for some time. Can you see if you have that fix in your version of godot-cpp? I see it in the 4.3 branch, but not 4.2, which it sounds like you're using? and 4.2 branch.

EDIT: I searched the 4.2 branch badly :-) I see that PR merged in there too, so it probably isn't due to a lack of that fix

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants