-
-
Notifications
You must be signed in to change notification settings - Fork 21.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Optimize transform propagation for hidden 3d objects #45583
Optimize transform propagation for hidden 3d objects #45583
Conversation
b312488
to
9632162
Compare
Moved this from draft now as I think it should be a good idea, as it gives a free speed boost and is fairly non-intrusive. That is on the assumption that the visual server doesn't need transforms updated for hidden objects, need to get reduz to confirm this. If he thinks this is the case we can do something similar in master. EDIT - reduz says that this is not required in Godot 4.0 as it works differently in this area. And confirms that the visual server shouldn't need transforms of hidden objects, so that sounds as though this is a sensible change. |
When visual instances are hidden, there is no need to update their global transforms and send these to the visual server. This only needs to be done when the objects are reshown.
9632162
to
cb8134f
Compare
(data.parent && !data.toplevel_active) ? | ||
data.parent->get_global_transform().affine_inverse() * p_transform : | ||
p_transform; | ||
Transform xform = (data.parent && !data.toplevel_active) ? data.parent->get_global_transform().affine_inverse() * p_transform : p_transform; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Gave up here and put it all on one line. clang format seems to consistently produce a line breaking result which fails the CI checks for this. Clang format seems to give 6 tabs and 2 spaces, whereas the static checks want 5 tabs and 2 spaces? I thought it was my IDE but now not so sure.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You might need to upgrade to clang-format 11 to get the same results as the CI. But yeah, clang-format seems to have troubles handling long ternary operators in a consistent way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah I'll try the clang-format upgrade, see if it helps! 👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clang-format-11 gives 8 tabs. Maybe something they haven't quite got sorted yet! 😁
As I said on rocket.chat: This seems like the correct approach for 3.2! 👍 |
Node2D and Spatial have a However, as you state, there's no "parent" counterpart that would make a node's children ignore the parent' transform automatically. |
I'll just have a little look to see whether this can be done visual server side, as reduz says he prefers that if possible. 👍 Same kind of idea, just done on the other side of the call. On the visual server side the function is:
This could easily become something like:
Although this is a little neater, the downside is the efficiency. In the visual_instance:
So we can easily do in the visual server function, but the caveat is it will be doing needless extra work. Either approach is ok, I would lean on the side of client side still, because I think 3.2 can really benefit from any increases in performance we can give. But admittedly the code is simpler having it in the visual server. I'll try some timings using both approaches to compare. |
I didn't know this property. I'm not absolutely sure if it can help.
The wider problem comes down to this:
Where ideally:
In the general case you can't take this latter approach, because you don't know in advance which are hidden, and whether some are nodes that require transform even when hidden (e.g. physics). If you query 10,000 nodes to find this info, you've also lost a load of performance, just by hitting all those nodes unnecessarily. So ideally you'd want some kind of flags to be able to stop propagating certain things:
I think this is partly what is in this proposal: Another approach would be to maintain multiple lists on each node, a list of children, and a list of children that need specific updates, and have the children register / unregister. But this becomes quite complex to the point it's not worth it. You could also cache a list of children that need specific updates, and just set these to dirty when things change. That is maybe more feasible. Having disable flags is probably the most efficient of all but it requires user to take responsibility, whereas some of the others are automatic methods. There's also the question of where you draw the line with these optimizations, the benefit has to be worth the added code complexity. It is a bit of a trade off. Godot normally errs on the side of simple maintainable code at the cost of some performance. |
Ok I've benchmarked both methods now, with 10,000 moving hidden objects as before. Current Godot 3.2 dev This PR Visual Server approach As I suspected the speed of this PR has increased now that the BVH visibility PR #45576 has been merged. What is also clear is that the client side approach is not just a bit faster, it is a lot faster than bailing out in the visual server. So on balance I am inclined to think this is worth the slight less elegance over visual server approach. What do you guys think? |
Thanks! |
When visual instances are hidden, there is no need to update their global transforms and send these to the visual server. This only needs to be done when the objects are reshown.
Explanation
I'm currently looking at optimizing performance of hidden objects, as I've noticed they surprisingly can slow down frame rates significantly. While #45576 deals with the case of static objects, there is still a performance problem when hidden 3d objects are moved.
Benchmarks:
0 objects
6500 fps
10,000 objects, hidden and static (with #45576)
5500 fps
10,000 objects, hidden but moved each frame via moving parent node
158 fps
10,000 objects, hidden but moved each frame via moving parent node, with this PR
360 fps
Clearly the drop with dynamic objects is very significant, and should in theory be totally unnecessary.
Profiling revealed a lot of time was spent calling
get_global_transform()
and sending the transform to the visual server.This example PR keeps a local flag for visibility state (something we should probably be doing anyway, as
is_visible_in_tree
can be expensive), doesn't update these transforms when the node is not visible.This increases fps to 360 fps. While an improvement it is still not as good as can be. This is a profile with this PR applied. Still there is a lot of iterations of the propagation code (12,000000).
And actually the BVH seems to be taking some time which shouldn't be happening, will investigate - ah .. this is a false problem, it is because this PR is without #45576 applied which cures that. 😁
Ideally though we would like to be able to get to a hidden node, and disable propagation of transforms below that node in the scene tree. This is not (afaik) currently supported, presumably because even though child nodes of a hidden node may not need to be visible, they still may need the transform (e.g. physics objects).
I'm currently investigating whether it might be worth us having a single, or set of
disable
modes to prevent propagation of transforms, or perhaps other propagated features.Any comments welcome. The flags BTW is because it is clear from the node source that there are a number of bools that are intended to be converted to flags but no one has got round to it, so I'm potentially getting the ball rolling on that idea. You could either have access functions for the flags or access the data directly to use bit operations, whichever is preferable. Maybe the latter actually having seen this pattern used in other parts of Godot. Or we could use a bool in this PR, and leave all the conversions to flags in another PR - whichever is preferred.