Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Crash without feedback when attempting to remove a freed child. #77053

Open
carllacan opened this issue May 14, 2023 · 13 comments
Open

Crash without feedback when attempting to remove a freed child. #77053

carllacan opened this issue May 14, 2023 · 13 comments

Comments

@carllacan
Copy link

carllacan commented May 14, 2023

Godot version

v4.0.2.stable.official [7a0977c]

System information

Manjaro Linux KDE
Plasma Version: 5.27.4 KDE
Frameworks Version: 5.105.0
Qt Version: 5.15.9
Kernel Version: 5.15.109-1-MANJARO (64-bit)

Issue description

If a variable pointing to a previously freed object is used as the argument of .remove_child the game crashes without any feedback. Obviously it is an error to remove a freed node, but there should be feedback so that the user can fix their mistake.

Interestingly the game does not crash if the node is refered to using its $ path. That is, in the following snippet (included in the minimal reproduction project):

	await get_tree().create_timer(1).timeout
	print("Freeing first child...")
	$Node2D.queue_free()
	await get_tree().create_timer(1).timeout
	print("Removing first child...")
	remove_child($Node2D)
	
	await get_tree().create_timer(1).timeout
	print("Freeing second child...")
	node2.queue_free()
	await get_tree().create_timer(1).timeout
	print("Removing second child...")
	remove_child(node2)

the first remove_child causes no problems, but the second one makes the game crash even before "Removing second child" is printed.

Steps to reproduce

  1. Create a variable.
  2. Assign a node to it.
  3. Add the node to another node as a child.
  4. Free the child node.
  5. Try to remove the child node from the parent.

Minimal reproduction project

Waits are introduced between each action so that the bug is easier to observe, and so that it is clear that the actions do not have to happen in the same frame.

Godot.zip

@AThousandShips
Copy link
Member

It's not exactly straightforward to detect and warn about this issue though I suspect

@AThousandShips
Copy link
Member

AThousandShips commented May 14, 2023

It doesn't happen with $path because it has separate lookup and doesn't expect the reference to be valid

There's an overhead to consider when enforcing checks to if the reference is valid, the difference between just using a pointer expected to be valid, or doing a lookup for an id and checking, which more importantly also does thread safety which means a bottleneck and a slowdown

@AThousandShips
Copy link
Member

AThousandShips commented May 14, 2023

If you need to be sure an instance is valid use is_instance_valid, but I believe having this check be done automatically every time instead of assuming validity (i.e. user correctness) would be a major performance drop

@KhalilBenGaied
Copy link

i proposing a solution following the logic of the issue, In the following script, we add a check before removing the child to see if it's still in the scene tree using is_inside_tree(). If the node is not in the tree, we print a message and do not attempt to remove it. This way, you can avoid crashes even if you try to remove a freed node.
the script:
extends Node

var node2

func _ready():
node2 = $Node2D # Assuming you have a Node2D as child

remove_node_after_delay(node2)

func remove_node_after_delay(node):
yield(get_tree().create_timer(1.0), "timeout")
print("Freeing child...")
node.queue_free()

yield(get_tree().create_timer(1.0), "timeout")
print("Removing child...")
if node and node.is_inside_tree():
    remove_child(node)
else:
    print("Cannot remove child, it is either already removed or was never added.")

please verify it then provide me with a feedback to check if its a viable solution or not.

@Calinou
Copy link
Member

Calinou commented May 15, 2023

See also #61906, which should improve the existing situation for any kind of engine crash.

@AThousandShips
Copy link
Member

AThousandShips commented May 15, 2023

@KhalilBenGaied this is not valid 4.0 code, and won't solve the issue on non-debug builds, see above for the way to detect freed objects

node and node.is_inside_tree()

This won't detect if the node has been freed, and should crash on non-debug builds

@AThousandShips
Copy link
Member

AThousandShips commented May 15, 2023

The way to deal with this safely comes in as I can see it three ways:

  1. Make sure to clear any references to objects when you free them, so after doing node2.queue_free() you also do node2 = null, this is recommended I'd say as when you decide to free an object you should be done using it, and indicating that by clearing the variable helps
  2. Use $name or ObjectID in cases where you might be freeing objects and aren't sure you can keep track of if they have been freed, the former will only work if the node is in the same hierarchy as the script called from, the latter always works
  3. Use is_instance_valid to check if a reference is valid, this is the most secure of the three

The issue here, as is the appropriate way to approach things especially in non-debug builds, is that we assume that references are valid when passed to functions, as there is an overhead to checking this, and just assuming the user is knowing what they are doing and not passing invalid references (there is a lot of checks that the references aren't null, but if they aren't null they are assumed to be valid). This is the normal way of doing things (c++ does it for one) as you'd otherwise do these checks all the time and it'd cost you a lot of performance when in most cases you don't need it

Now we could add checking for variant validity when passing references as arguments in GDScript, it's quite easy to do, but I think it'd add quite a bit of performance loss, as each passed Object would have to be checked and that involves locking a spinlock and some checking that I don't expect to be cheap, GDScript does already check for the base of a call, like when doing node.function(), but that is a much smaller number of checks than doing it for every argument

@AThousandShips
Copy link
Member

But to summarise, other than making crashing more informative, this isn't a bug, this is an issue of insufficient user checking and variable management

@graydwarf
Copy link

graydwarf commented May 20, 2023

I'm having a similar issue where remove_child(), queue_free() or even .visible = false, inside a thread on a valid object is crashing intermittently without any debug/verbose/logs. It crashes about 1 in every few times. Sometimes taking upwards of 10 calls which is no fun. Since we can't seem to debug threaded operations, I can't figure out what's causing it.

Scenario: I create and add a simple child scene (busyBackground) (has a label and simple AnimationPlayer) before starting the thread and at the end of the thread, I try to remove the busyBackground and it crashes (sometimes).

I tried to get a reduced repro but I'm not able to repro outside of my main project.

I added is_instance_valid() check to make sure the child I'm removing is valid and it is.

I tried moving the instantiation of the busyBackground into the thread and no change. I tried emitting a global signal to call the ClearBusyBackground() and it's not working at all. The signal seems to be ignored from the thread. Not clear to me how that should work. I tried changing from a global signal to a local signal and I'm back to crashing (so local signals work from threads while global signals don't?).

  • I tried removing the remove_child() call and using just _busyBackground.queue_free() and it crashes.
  • I tried moving the queue_free() call into the _busyBackground and it still crashes. _busyBackground.CleanUp()
  • I tried adding an await(5) to the CleanUp function and we wait the extra 5 seconds and then still crash.
  • I tried adding a timer to the CleanUp call which then calls queue_free() and I'm now up to 10 attempts with no crashes. I'm going to throw this on a timed loop and see if we can get 100 loops in a row without crashing... and it crashes.
  • I tried calling ClearBusyBackground() inside a 2nd threaded call and it crashes.
  • I tried setting _busyBackground.visible to false as a workaround and it crashed so it seems to be any sort of engagement with that object which means my issue moves a bit further away from op.
  • I tried _busyBackground.call_deferred("queue_free") and it crashes.
  • I tried call_deferred("add_child", _busyBackground) in addition and still crashes.
  • I tried adding call_deferred on a bunch of things and crash remains.
  • I've seen it lock up a couple times. Very rare but instead of crashing and closing, it just looks locked up.
  • I see this warning and haven't made sense of it yet and whether it's related or not. W 0:00:21:0026 ~Thread: A Thread object has been destroyed without wait_to_finish() having been called on it. Please do so to ensure correct cleanup of the thread.
    <C++ Source> core/os/thread.cpp:116 @ ~Thread(). None of my other threads complain about this and I use/instantiate them in the same way.
var _busyBackground 

func ExportProject():
	StartBusyBackground("Exporting...")
	var thread = Thread.new()
	thread.start(MyFunThreadedFunctionThatHasSomeStuffGoingOn)

func MyFunThreadedFunctionThatHasSomeStuffGoingOn():
        # Do bunch of validations
	# Make some directories
        # OS.execute godot in --headless
        # update some fields with the output
	CountErrors() # from the fields
	CountWarnings() # from the fields
	ClearBusyBackground()

# Shows a busy overlay over the UI so user doesn't mess with stuff while we're executing commands
func StartBusyBackground(busyDoingWhat):
	_busyBackground = load("res://scenes/scenes/busy-background-blocker/busy_background_blocker_color_rect.tscn").instantiate()
	add_child(_busyBackground)
	_busyBackground.SetBusyDoingWhatLabel(busyDoingWhat)
	
func ClearBusyBackground():
	if is_instance_valid(_busyBackground):
		remove_child(_busyBackground)
		_busyBackground.queue_free()
	else:
		pass

@AThousandShips
Copy link
Member

You are adding and removing nodes from a thread, try using deferred calls instead

@graydwarf
Copy link

graydwarf commented May 20, 2023

@AThousandShips I gave that a try. Didn't seem to change anything. I added call_deferred to a bunch of things in and around that problem space and nothing changed. Continued to crash.

@graydwarf
Copy link

@AThousandShips Actually, I have to take that back. One of my looping runs completed so digging into it some more.

@graydwarf
Copy link

So, this is fun. Removing either of these lines prevents the crash.

	var exitCode = OS.execute(_godotPathLineEdit.text, args, output, readStdeer, openConsole) 
	_outputTextEdit.text += "#Output: " + str(output).replace("\\r\\n", "\n")

Moving the str().replace to it's own line and not using it prevents the crash

	var exitCode = OS.execute(_godotPathLineEdit.text, args, output, readStdeer, openConsole) 
        var a = str(output).replace("\\r\\n", "\n")
	_outputTextEdit.text += "#Output: "

Still crash with the replace removed.

	var exitCode = OS.execute(_godotPathLineEdit.text, args, output, readStdeer, openConsole) 
        var a = str(output)
	_outputTextEdit.text += "#Output: " + a

It appears to crash when we attempt to use/assign the output array to the lineEdit field. Adding a call_deferred here fixes the problem. I had tried looping through the output and building up groomedOutput manually but it still crashed.

	var groomedOutput = str(output).replace("\\r\\n", "\n")
	call_deferred("SetOutputText", groomedOutput)

I have the workaround I need so I'll be moving on. Thanks @AThousandShips for the suggestion.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

5 participants