-
-
Notifications
You must be signed in to change notification settings - Fork 21.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Significant performance drop with OmniLight3D shadows between 4.2 and 4.3. #97472
Comments
I think the visual profiler needs some more digits. Render Light3D is showing 0.01-0.03ms for each light, and each cull within it shows 0, but I think that might be where the perf difference is, as the render omnilight shadows takes around 1.3ms in each version for me (though 4.2 lumps that under "Depth pre-pass"). Also maybe a total of rendering lights could be helpful. This is going to be awkward if the perf hit is from my fix to the shadow distance fix. 😰 I tried disabling the call to light_instance_is_shadow_visible_at_position, but that didn't seem to make a difference. Also added some more visual profiler points. In KOOK, for all the lights in my scene it's 1.47/1.97ms on the CPU, which is a lot, but it doesn't compute as to how I'm getting sub-100FPS: |
Does it happen with the 4.4 dev version? |
Some more experimenting. If you disable the animation player then in 4.2.2 you get only a couple draw calls while in 4.4 you get around 250. So something is triggering shadow updates that shouldn't be happening |
Yeah, that makes sense. I noticed the perf was way lower in a level that had basically nothing moving on it, though when I tried in the MRP, it didn't seem to hurt the perf as badly, so I thought maybe that wasn't it. Also, as I've been trying to profile things, I've discovered the built-in profiler is WAY off, so I created a bug about that. Tracy seems far more reliable, but it's a pain to add profile points everywhere for stuff that should already be measured: #97473 |
I suspect that #91545 is the culprit. I tested by increasing the subdivisions up to 256 for each quadrant and the performance goes back to what it should be. Basically the problem in this MRP is that there are more shadows than can fit in the atlas. The lights that don't fit are essentially just waiting their turn and then evicting a light that is currently in the atlas. So there is always a queue of lights to update. By increasing the subdivisions you can avoid that situation. Prior to #91545 there was a bug where lights would find an imaginary place in the atlas that actually extended way past the boundary of the atlas. As a result, they didn't try to update there shadows each frame. However, they did try to draw outside the bounds of the shadow atlas which causes a crash on a lot of hardware. I need to think about this more to see if there is something more sensible we can do when you have more omni lights than you can fit on the atlas. |
Hmm, sounds like we need some sort of limit and priority system, so after a certain point, lights (or shadows) just turn off if you have more than will fit on the atlas. Along those lines, it would be really nice if there was a way to have a set limit on the number of atlas textures that update per frame with some sort of priority. Lots of shadow updates can really kill perf, and I'd rather trade off some shadow artifacting on distant lights. |
So this is becoming a major bottleneck for me. I can't seem to make a MRP that is as bad performance wise as my actual project. Something is causing basically every omni light in my scene (over 30 lights) to re-render every shadow every frame, so over 100 scene renders per frame, even on a completely static scene. This is the closest MRP i have: test_light_performance (2).zip So in this one I have the atlas set to 16, 64, 256, 256. If I set the atlas to all the same size, like 64, 64, 64, 64 it mostly fixes the issue. On my own project, though, 64, 64, 64, 64 only reduces the shadow re-renders by like half. Very painful to have a retro fps running at like 30-50 FPS on a 3070RTX. I'm still digging into why the shadows are marked as dirty. Edit: Seems the problem is not with the lights being marked dirty. It's somewhere related to this. if (shadow_atlas->shadow_owners.has(p_light_instance)) {
old_key = shadow_atlas->shadow_owners[p_light_instance];
old_quadrant = (old_key >> QUADRANT_SHIFT) & 0x3;
old_shadow = old_key & SHADOW_INDEX_MASK;
should_realloc = shadow_atlas->quadrants[old_quadrant].subdivision != (uint32_t)best_subdiv && (shadow_atlas->quadrants[old_quadrant].shadows[old_shadow].alloc_tick - tick > shadow_atlas_realloc_tolerance_msec);
should_redraw = shadow_atlas->quadrants[old_quadrant].shadows[old_shadow].version != p_light_version;
if (!should_realloc) {
shadow_atlas->quadrants[old_quadrant].shadows.write[old_shadow].version = p_light_version;
//already existing, see if it should redraw or it's just OK
return should_redraw;
}
old_subdivision = shadow_atlas->quadrants[old_quadrant].subdivision;
} should_redraw is true. p_light_version is 2 here, after running for a bit, so it's not like that has been incremented a bunch. Maybe the lights are constantly swapping between quadrants? |
So this seems backward, shouldn't
be
? That would prevent stuff from reallocating every frame, but won't fix the underlying issue of stuff constantly updating on the atlas. Also, it seems like we should have more of a global timer instead of a per-shadow timer, as per-shadow means you could have 100+ shadows all update on one frame, then 500ms later, all 100+shadows update on the same frame again. Edit: So just as a test, I made should_realloc always false, but something else is causing the shadows to always redraw that I can't reproduce in my MRP project. In the mrp, this change makes it so the shadows don't re-render. |
Ok, I think I found another part of the problem. I had a debug drawing line class where I had an immediate mesh: extends Node
# Debug system
var mesh_instance : MeshInstance3D
var mesh_instance_permanent : MeshInstance3D
var immediate_mesh_permanent : ImmediateMesh
var immediate_mesh : ImmediateMesh
var immediate_mesh_alternate_frame : ImmediateMesh
var unshaded_material : StandardMaterial3D
const MAX_MESH_INSTANCES := 256
func _ready():
unshaded_material = StandardMaterial3D.new()
unshaded_material.shading_mode = BaseMaterial3D.SHADING_MODE_UNSHADED
unshaded_material.vertex_color_use_as_albedo = true
unshaded_material.vertex_color_is_srgb = true
mesh_instance = MeshInstance3D.new()
mesh_instance_permanent = MeshInstance3D.new()
mesh_instance.material_override = unshaded_material
mesh_instance_permanent.material_override = unshaded_material
immediate_mesh = ImmediateMesh.new()
immediate_mesh_alternate_frame = ImmediateMesh.new()
immediate_mesh_permanent = ImmediateMesh.new()
mesh_instance.mesh = immediate_mesh
mesh_instance_permanent.mesh = immediate_mesh_permanent
get_tree().current_scene.add_child(mesh_instance)
get_tree().current_scene.add_child(mesh_instance_permanent)
draw_line(Vector3.ZERO, Vector3.RIGHT * 100)
func draw_line(start : Vector3, end : Vector3, color := Color.WHITE):
immediate_mesh.surface_begin(Mesh.PRIMITIVE_LINES)
immediate_mesh.surface_set_color(color)
immediate_mesh.surface_add_vertex(start)
immediate_mesh.surface_add_vertex(end)
immediate_mesh.surface_end()
var permanent_lines_added := 0
func draw_line_permanent(start : Vector3, end : Vector3, color := Color.WHITE):
permanent_lines_added += 1
if (permanent_lines_added >= MAX_MESH_INSTANCES):
immediate_mesh_permanent.clear_surfaces()
permanent_lines_added = 0
immediate_mesh_permanent.surface_begin(Mesh.PRIMITIVE_LINES)
immediate_mesh_permanent.surface_set_color(color)
immediate_mesh_permanent.surface_add_vertex(start)
immediate_mesh_permanent.surface_add_vertex(end)
immediate_mesh_permanent.surface_end()
func _process(_delta):
immediate_mesh_alternate_frame.clear_surfaces()
var temp := immediate_mesh
immediate_mesh = immediate_mesh_alternate_frame
immediate_mesh_alternate_frame = temp
As a test, I had just drawn a 100m line that should only last for 1 frame, but it seems the line was marking the lights as dirty every frame. Turning shadow casting off on the debug lines helped a lot: Not sure why they're always updating. Might be an issue with immediate meshes that should be investigated. Edit: Yeah, I'll bet the clear_surfaces() marks things as dirty even if it's already empty, and it seems the bounding box doesn't update when the surfaces are cleared, so basically if you draw a really large (like 100m) mesh like I did, that's going to cover a large portion of the level and force a bunch of lights to update every frame. Should probably make clear_surfaces() not do anything if it has no surfaces, and also shrink the bounding box if there is something there. |
I think you are right, this line is backwards from the other places where we check the alloc_tick:
And ya, it makes sense for Would you like to make a PR with these changes or should I? |
I tested out the changes locally to confirm that they are enough to fix the regression. And they are, so I'll just go ahead and make a PR now. Great work figuring this out! |
I've posted some ideas for future performance improvement based on my findings here: godotengine/godot-proposals#2745 (comment) |
Tested versions
Higher performance in 4.2.2, lower performance in 4.3.
System information
Windows 10, Vulkan forward +, Nvidia 3070
Issue description
After updating to 4.3, I've noticed a significant performance drop. I confirmed with a test scene that the framerate goes from 400+fps to like 290.
In 4.2.2:
In 4.3:
Steps to reproduce
Build a scene with around 30 omni lights with shadows enabled and some moving objects. Test in 4.2 and note framerate. Test the same scene in 4.3 and note the framerate difference (or just try the MRP below).
Minimal reproduction project (MRP)
test_light_performance.zip
The text was updated successfully, but these errors were encountered: