Optimize Node children management #75627
```cpp
data.children_cache.remove_at(child_index);
data.children_cache.insert(p_index, p_child);
```
This should probably be made a single operation by adding a relevant method to LocalVector (as currently it will needlessly shift the elements after MAX(p_index, child_index) back and forth).
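Something like the following could work; a minimal sketch of a hypothetical LocalVector member (the method name `move` and direct access to the backing array `data` are assumptions, not existing engine API):

```cpp
// Hypothetical LocalVector method (sketch): relocate one element in a single
// pass, shifting only the elements between the two indices, instead of
// shifting everything after both indices twice via remove_at() + insert().
void move(uint32_t p_from, uint32_t p_to) {
	if (p_from == p_to) {
		return;
	}
	T value = data[p_from]; // `data` is assumed to be the backing array
	if (p_from < p_to) {
		for (uint32_t i = p_from; i < p_to; i++) {
			data[i] = data[i + 1]; // shift the window one slot left
		}
	} else {
		for (uint32_t i = p_from; i > p_to; i--) {
			data[i] = data[i - 1]; // shift the window one slot right
		}
	}
	data[p_to] = value;
}
```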
Possibly a good idea, but beyond the scope of this PR; the original code did the same thing anyway.
I haven't tested the PR, but an immediate problem that might crop up: afaik this notification was used to update the draw order in the VisualServer. If you have 4 children, order 0, 1, 2, 3, and you delete the middle two, then if the draw order is not updated in the VisualServer, the order there is now 0, 3 (on the client it is 0, 1). This works, because even though the numbers are out of sync, the sorting is the same. But add another child, and the order on the client is 0, 1, 2 (2 being the new child), while on the server it is 0, 3, 2, so the new child now sorts before the old last child and the draw orders diverge.
@lawnjelly makes sense, but then it sounds like this is just a matter of CanvasItem hooking into the child-removed notification on its own to fix this precise issue.
c4c6d63 to f676397 (force-pushed)
Ah indeed, these are exactly the things that need to be cached. As an aside, I think the AnimationTree system needs an optimization similar to this one, so I'll keep it in mind.
I did some testing and there doesn't seem to be any obvious regression.
The code looks fine.
This makes deleting nodes go from linear to quadratic in the number of children:
```gdscript
var hash = Engine.get_version_info()["hash"]
for count in range(1000, 10001, 1000):
	var start = Time.get_ticks_msec()
	var root = Node.new()
	for i in count:
		root.add_child(Node.new())
	root.free()
	var end = Time.get_ticks_msec()
	print("%s\t%s\t%d" % [hash, count, end - start])
```
Long table of benchmark numbers
commit | count | time in ms
---|---|---
5fbbe3b | 1000 | 4 |
5fbbe3b | 2000 | 7 |
5fbbe3b | 3000 | 10 |
5fbbe3b | 4000 | 15 |
5fbbe3b | 5000 | 18 |
5fbbe3b | 6000 | 23 |
5fbbe3b | 7000 | 27 |
5fbbe3b | 8000 | 33 |
5fbbe3b | 9000 | 35 |
5fbbe3b | 10000 | 40 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 1000 | 7 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 2000 | 21 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 3000 | 40 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 4000 | 71 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 5000 | 135 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 6000 | 228 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 7000 | 413 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 8000 | 562 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 9000 | 719 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 10000 | 888 |
5fbbe3b | 1000 | 3 |
5fbbe3b | 2000 | 6 |
5fbbe3b | 3000 | 11 |
5fbbe3b | 4000 | 16 |
5fbbe3b | 5000 | 19 |
5fbbe3b | 6000 | 24 |
5fbbe3b | 7000 | 29 |
5fbbe3b | 8000 | 32 |
5fbbe3b | 9000 | 35 |
5fbbe3b | 10000 | 40 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 1000 | 8 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 2000 | 20 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 3000 | 60 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 4000 | 90 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 5000 | 144 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 6000 | 229 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 7000 | 370 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 8000 | 527 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 9000 | 699 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 10000 | 892 |
5fbbe3b | 1000 | 4 |
5fbbe3b | 2000 | 7 |
5fbbe3b | 3000 | 11 |
5fbbe3b | 4000 | 14 |
5fbbe3b | 5000 | 17 |
5fbbe3b | 6000 | 22 |
5fbbe3b | 7000 | 27 |
5fbbe3b | 8000 | 30 |
5fbbe3b | 9000 | 43 |
5fbbe3b | 10000 | 39 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 1000 | 7 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 2000 | 20 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 3000 | 40 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 4000 | 72 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 5000 | 134 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 6000 | 240 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 7000 | 360 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 8000 | 519 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 9000 | 697 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 10000 | 884 |
Without the `free()`, this PR is about a 10% slowdown:
Details
commit | count | time in ms (without free)
---|---|---
5fbbe3b | 10000 | 35 |
5fbbe3b | 20000 | 70 |
5fbbe3b | 30000 | 121 |
5fbbe3b | 40000 | 149 |
5fbbe3b | 50000 | 224 |
5fbbe3b | 60000 | 222 |
5fbbe3b | 70000 | 278 |
5fbbe3b | 80000 | 341 |
5fbbe3b | 90000 | 393 |
5fbbe3b | 100000 | 473 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 10000 | 42 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 20000 | 75 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 30000 | 116 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 40000 | 164 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 50000 | 218 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 60000 | 275 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 70000 | 305 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 80000 | 371 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 90000 | 432 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 100000 | 520 |
5fbbe3b | 10000 | 35 |
5fbbe3b | 20000 | 68 |
5fbbe3b | 30000 | 115 |
5fbbe3b | 40000 | 155 |
5fbbe3b | 50000 | 189 |
5fbbe3b | 60000 | 276 |
5fbbe3b | 70000 | 306 |
5fbbe3b | 80000 | 351 |
5fbbe3b | 90000 | 393 |
5fbbe3b | 100000 | 490 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 10000 | 38 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 20000 | 76 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 30000 | 118 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 40000 | 163 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 50000 | 220 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 60000 | 252 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 70000 | 299 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 80000 | 374 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 90000 | 430 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 100000 | 539 |
5fbbe3b | 10000 | 40 |
5fbbe3b | 20000 | 86 |
5fbbe3b | 30000 | 129 |
5fbbe3b | 40000 | 171 |
5fbbe3b | 50000 | 261 |
5fbbe3b | 60000 | 265 |
5fbbe3b | 70000 | 291 |
5fbbe3b | 80000 | 354 |
5fbbe3b | 90000 | 424 |
5fbbe3b | 100000 | 479 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 10000 | 39 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 20000 | 108 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 30000 | 120 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 40000 | 179 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 50000 | 210 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 60000 | 266 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 70000 | 305 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 80000 | 361 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 90000 | 420 |
f6763970aaed5f9f1bfc4b9890af81fe2307a75c | 100000 | 513 |
(note that this example uses 10x more nodes to make the times larger)
@RedworkDE ah, I just realized that my latest commit re-adding NOTIFICATION_MOVED_IN_PARENT is what causes this. So I guess ultimately, the notification-removal PR has to be merged first; then I can remove that notification here and the optimization will work as intended :(
Opened #75701, which needs to be merged before this one can be fixed.
@RedworkDE could you please also check the memory usage difference? I saw somewhere that the new hashmap-based approach may consume much more RAM.
Memory usage of a node with lots of children goes up about 10%. First rows are bytes, second rows are the increase over master.
And since I am posting here anyway: some performance numbers as well, with some more scenarios and a version with the other PR included. First rows are ms for the operations with 10,000 children (see code).
Code for the operations

```gdscript
# perf_check is not shown in the comment; presumably a script method that
# times a Callable and prints the result, e.g.:
#   func perf_check(label : String, f : Callable):
#       var start = Time.get_ticks_msec()
#       f.call()
#       print("%s\t%d" % [label, Time.get_ticks_msec() - start])

var adding = func(count : int):
	perf_check("adding\t%d" % [count], func():
		var root = Node.new()
		for i in count:
			root.add_child(Node.new()))

var deleting = func(count : int):
	perf_check("deleting\t%d" % [count], func():
		var root = Node.new()
		for i in count:
			root.add_child(Node.new())
		root.free())

var move_first = func(count : int):
	perf_check("move_first\t%d" % [count], func():
		var root = Node.new()
		for i in count:
			var node = Node.new()
			root.add_child(node)
			root.move_child(node, 0))

var insert_second = func(count : int):
	perf_check("insert_second\t%d" % [count], func():
		var root = Node.new()
		var first = Node.new()
		root.add_child(first)
		for i in count:
			first.add_sibling(Node.new()))

var removing_reverse = func(count : int):
	perf_check("removing_reverse\t%d" % [count], func():
		var root = Node.new()
		for i in count:
			root.add_child(Node.new())
		for i in count:
			root.remove_child(root.get_child(0)))

var removing_random = func(count : int):
	perf_check("removing_random\t%d" % [count], func():
		var root = Node.new()
		for i in count:
			root.add_child(Node.new())
		var rnd = RandomNumberGenerator.new()
		rnd.seed = 0
		for i in count:
			root.remove_child(root.get_child(rnd.randi_range(0, count - i - 1))))

var shuffle = func(count : int):
	perf_check("shuffle\t%d" % [count], func():
		var root = Node.new()
		for i in count:
			root.add_child(Node.new())
		var rnd = RandomNumberGenerator.new()
		rnd.seed = 0
		for i in count:
			root.move_child(root.get_child(rnd.randi_range(0, count - i - 1)), rnd.randi_range(0, count - i - 1)))
```

No changes in algorithmic complexity, but everything other than shuffling the node's children seems to be a lot slower with this PR; though there may just be some changes that should be made when integrating the notification PR. Are there any scenarios I should be testing where this is expected to be good/bad, or that someone needs in particular?
@RedworkDE Simple addition/removal cases are not going to show much of a difference, and the increase in memory usage is expected. That said, #75701 needs to be merged and I need to rebase this one on it; otherwise there will not be much point in benchmarking for now.
The hash table is for finding children by name faster. Operations that don't use names are likely the same or (hopefully only slightly) slower. How much the hash table helps will presumably depend on how often named operations are used, and whether they are bottlenecks.

For the other PR, #75701, the original issue where I noticed the problem is #61929; this was when I was writing mesh merging, which does a lot of node sorting. See also #65581 and #74672, which contain more explanations/info; these were fixes for the same problem in 3.x. Also #62444 deals with the same problem in a specific case.

Things that are likely to see a lot of improvement are all cases where there is a large number of children in a parent node (say 10,000 or more). The reason is that each deletion / move / attach / detach potentially invalidates the draw index of every child following the one changed. At present, each time an operation is done, the entire remaining list of children is updated with a NOTIFICATION_MOVED_IN_PARENT. So if you have 10,000 children, deleting the first 10 children (one by one) can result in approx 100,000 notifications. With the flush-once approach, these notifications are deferred and only occur once per frame or tick (see the sketch below).
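A self-contained sketch of that flush-once idea (the types and names here are illustrative stand-ins, not the engine's actual API): each change only records the lowest affected child index, and draw indices are re-synced in one pass per frame.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative stand-in for a parent node's child bookkeeping.
struct Parent {
	std::vector<int> children; // child ids, in draw order
	int dirty_from = -1;       // lowest stale draw index; -1 means "clean"

	void remove_child(int index) {
		children.erase(children.begin() + index);
		// O(1) bookkeeping instead of notifying every trailing child now.
		dirty_from = (dirty_from < 0) ? index : std::min(dirty_from, index);
	}

	// Called once per frame/tick: one pass re-syncs draw indices, replacing
	// the O(n) notification storm after every single change.
	void flush_draw_order() {
		if (dirty_from < 0) {
			return;
		}
		for (size_t i = dirty_from; i < children.size(); i++) {
			std::printf("child %d -> draw index %zu\n", children[i], i);
		}
		dirty_from = -1;
	}
};

int main() {
	Parent p;
	for (int i = 0; i < 10; i++) {
		p.children.push_back(i);
	}
	p.remove_child(2); // each removal is O(1) bookkeeping...
	p.remove_child(2);
	p.flush_draw_order(); // ...and the re-sync happens once, here.
}
```

Under this scheme, deleting the first 10 of 10,000 children triggers a single pass over the survivors instead of ten full notification sweeps.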
Thanks to everyone involved!
Is it OK that node addition, which is one of the most common use cases, became 20% slower? The others, like node creation, get_parent(), free(), etc., also don't look promising. (I'm not dissatisfied and don't want to be rude, I just want Godot to get faster and more stable :) )
@akien-mga Ah, I was expecting more review/testing before merging it. Well, YOLO. @arkology It's a tradeoff. Keep in mind that this went together with #75701, #75760 and #75797, which already hugely optimize child addition and other operations. Adding children is already very fast now, and far faster than 3.x even with this PR. Still, this PR does add a small penalty in base cost for performance and memory usage, but in exchange it optimizes many other common use cases that had abysmal performance (as in, adding named children, get_node() or removing children in random order, again all very common use cases, were very slow with large numbers of nodes, and this makes them fast).
What is the difference between "adding" and "adding_named"? Why such a huge gap?
(Just some thoughts.)
Named children need to be validated for name conflicts. Unnamed ones (those with a bunch of @s in their auto-generated names) skip that validation.
Any scene you instantiate has named nodes...
@seppoday Adding named means adding a child that already has a name; this is currently very slow. As an example, if you have something you brought in via preload/PackedScene, it is most likely going to be named, hence if you add too many it is going to start slowing down. This PR fixes that. @arkology Named nodes are very common.
Looking at the test values, adding named is now almost 2x faster than adding (unnamed)... It was waaaaaay slower before this PR. (The PR is amazing overall, I'm just trying to understand why adding named gained such a boost and also became faster than adding unnamed.) Adding was 83 ms; now it is 105 ms.
Oh, my fault, I completely forgot about scene instantiation... Sorry about that. I should stop creating most of my nodes at runtime via scripts 😅 Or set their names to get the speedup.
Interesting; could something more be done about the addition of unnamed nodes (beyond the already existing PRs)...
@arkology Why do I bother writing :( As I just mentioned, it was already heavily optimized; if you compare with 4.0 it is most likely much faster already, even after this PR was merged. It will not be faster than the approach it replaces, but it is far faster than 4.0 mainline or 3.5.
I'll just cut the conversation short. Overall the PR is amazing in some cases. There were just some questions (probably unnecessary) from noobs (including me). I think we can just close the discussion and move on :P Good job everyone!
Even if they're both faster than before due to other PRs, it would seem that if you create a node with .new(), it is now a valid optimization technique to give it a name before calling add_child()? That seems odd, so I'm not surprised they asked several times. It'd also be an important thing to note if people somehow still run into bottlenecks with adding nodes.
There are three scenarios for node naming:
@L4Vo5 Giving it a name and then adding is not necessarily faster; what is faster is adding it once it already has a name. This is very useful when using PackedScene and creating instances of it.
I was curious about the "optimization" of giving nodes a unique name before adding them: the very simple name generator of just using a counter is 50% slower than just adding the node unnamed, so there is no point in doing that. Also, the second row from my benchmarks is from before reduz's two other optimization PRs, so adding unnamed nodes is really still 20% faster than before.
This changes children management to use a hashmap, optimizing most StringName-based operations. Most operations should see a severe speedup without breaking compatibility.
This should fix many issues regarding node access performance, and may also speed up editor start/stop, but benchmarks are needed. So if you want to test, please make some benchmarks!
Further improvements in performance will be achieved with the removal of NOTIFICATION_MOVED_IN_PARENT, but this is left for a future PR. Done: #75701.
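A rough sketch of the data structure change being described (simplified, using standard containers rather than the engine's HashMap/StringName, and with hypothetical names): an ordered list keeps child order, while a name-keyed map makes lookup and conflict validation average O(1) instead of a linear scan over all children.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <vector>

struct Node; // children are stored by pointer only in this sketch

// Simplified stand-in for the children cache the PR introduces.
struct Children {
	std::vector<Node *> order;                       // preserves child order
	std::unordered_map<std::string, Node *> by_name; // name -> child

	// Name-conflict validation is now an average O(1) probe,
	// instead of scanning the whole child list.
	bool has(const std::string &name) const {
		return by_name.count(name) != 0;
	}

	void add(const std::string &name, Node *child) {
		assert(!has(name)); // caller resolves conflicts before insertion
		order.push_back(child);
		by_name[name] = child;
	}

	// get_node()-style lookup by name, also average O(1).
	Node *find(const std::string &name) const {
		auto it = by_name.find(name);
		return it == by_name.end() ? nullptr : it->second;
	}
};
```

Order-sensitive operations (move_child, positional removal) still touch the vector, which is consistent with the benchmarks above: purely positional operations don't get faster and can be slightly slower due to the extra map maintenance, while name-based ones improve dramatically.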