[3.x] Shadow volume culling and tighter shadow caster culling #82584

lawnjelly · 2023-09-30T15:09:00Z

Existing shadow caster culling using the BVH takes no account of the camera. This PR adds the highly encapsulated class VisualServerLightCuller which can cut down the casters in the shadow volume to only those which can cast shadows on the camera frustum.

This is used to:

* More accurately defer dirty updates to shadows when the shadow volume does not intersect the camera frustum.
* Cull shadow casters more tightly to the view frustum.

The dirty state of lights is now managed automatically:

* Continuous (tighter caster culling)
* Static (all casters are rendered)

Explanation

You can see roughly how it works in this old video of mine (ignore the rooms and portals, that is a separate system):
https://www.youtube.com/watch?v=1WT5AXZlsDc

The blue lines from the light sources to the camera frustum show the extra culling planes.

How does it work?

At runtime, the routine checks each plane of the camera frustum and determines whether it faces towards or away from the light (stored as a single bit, 0 or 1). The bits for the 6 planes form a 6-bit number, which is used as the index into a lookup table.

The lookup entry gives a list of corner points of the camera frustum which form a silhouette. These points are combined with the light origin to generate culling planes (3 points define a culling plane).
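The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `Vec3` and `Plane` types, the function names, the bit convention (bit set when a plane faces the light), the plane and corner ordering, and the axis-aligned box standing in for the camera frustum are all assumptions. The lookup table itself is left out, so the silhouette loop is passed in by hand.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Vec3 {
    double x, y, z;
};

static Vec3 sub(Vec3 a, Vec3 b) { return Vec3{ a.x - b.x, a.y - b.y, a.z - b.z }; }
static double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
static Vec3 cross(Vec3 a, Vec3 b) {
    return Vec3{ a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x };
}

// Plane stored as normal . p = d; a positive distance means "in front of" the plane.
struct Plane {
    Vec3 normal;
    double d;
    double distance_to(Vec3 p) const { return dot(normal, p) - d; }
};

// Assumed convention: bit i is set when frustum plane i faces the light.
std::uint32_t compute_lookup(const std::array<Plane, 6> &planes, Vec3 light_pos) {
    std::uint32_t lookup = 0;
    for (std::size_t i = 0; i < planes.size(); ++i) {
        if (planes[i].distance_to(light_pos) > 0.0) {
            lookup |= 1u << i;
        }
    }
    return lookup; // Always in the range 0..63.
}

// One culling plane per silhouette edge: the light origin plus two consecutive corners.
// The winding of the loop decides which side counts as "inside".
std::vector<Plane> build_cull_planes(const std::array<Vec3, 8> &corners,
        const std::vector<int> &silhouette, Vec3 light_pos) {
    assert(silhouette.size() >= 3);
    std::vector<Plane> planes;
    for (std::size_t i = 0; i < silhouette.size(); ++i) {
        Vec3 a = corners[silhouette[i]];
        Vec3 b = corners[silhouette[(i + 1) % silhouette.size()]];
        Vec3 n = cross(sub(a, light_pos), sub(b, light_pos)); // Unnormalized normal.
        planes.push_back(Plane{ n, dot(n, light_pos) });
    }
    return planes;
}

// A point survives the cull when it is behind (or on) every culling plane.
bool inside_all(const std::vector<Plane> &planes, Vec3 p) {
    for (const Plane &plane : planes) {
        if (plane.distance_to(p) > 0.0) {
            return false;
        }
    }
    return true;
}

// Stand-in "frustum": the box [-1, 1]^3 with outward normals, ordered +X, -X, +Y, -Y, +Z, -Z.
std::array<Plane, 6> box_planes() {
    return { { Plane{ Vec3{ 1.0, 0.0, 0.0 }, 1.0 }, Plane{ Vec3{ -1.0, 0.0, 0.0 }, 1.0 },
            Plane{ Vec3{ 0.0, 1.0, 0.0 }, 1.0 }, Plane{ Vec3{ 0.0, -1.0, 0.0 }, 1.0 },
            Plane{ Vec3{ 0.0, 0.0, 1.0 }, 1.0 }, Plane{ Vec3{ 0.0, 0.0, -1.0 }, 1.0 } } };
}

// Corner i has +x when bit 0 is set, +y when bit 1 is set, +z when bit 2 is set.
std::array<Vec3, 8> box_corners() {
    std::array<Vec3, 8> corners{};
    for (std::size_t i = 0; i < corners.size(); ++i) {
        corners[i] = Vec3{ (i & 1) ? 1.0 : -1.0, (i & 2) ? 1.0 : -1.0, (i & 4) ? 1.0 : -1.0 };
    }
    return corners;
}
```

With a light at (5, 0, 0), only the +X plane of the box faces the light, and the silhouette seen from the light is the loop of +X corners 1, 3, 7, 5. A point between the light and the box passes `inside_all`, while a point well off to the side is rejected.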

References:

http://lspiroengine.com/?p=153
http://www.terathon.com/gdc06_lengyel.pdf

Performance

In tests in the TPS demo, without GI and just using shadows, this halves the number of draw calls / vertex count in many areas, and in some cases reduces draw calls by a factor of 10. This can lead to a 10-300% increase in FPS. The increase depends on the settings used: if fill rate is the bottleneck then improving shadows has a less dramatic effect, and vice versa.

In WroughtFlesh, which uses a directional light only, I get a more modest improvement of 10% or so in FPS, due to the tighter caster culling with the directional light. So it seems the benefits are higher the more omnis / spots are used.

Notes

* This is based on a resurrection of [WIP] Tighter shadow caster culling #33340. That PR's approach wasn't compatible with the dirty flag optimization for omnis and spots, but this PR automatically manages dynamic lights to handle tighter culling, while reverting to rendering all casters when used in a static manner.
* This PR also adds the major optimization that shadow maps outside the view frustum don't need to be updated at all. There was some existing culling based on AABBs but this PR is far more accurate. This can lead to major performance gains where a lot of shadowed lights are in use (e.g. TPS demo).
* The project settings are initially there for testing. Changes are detected at runtime (e.g. if you change the project setting from a script). If it works correctly there may be no need to make it switchable, as it should virtually always be a win: the calculations are cheap and outweighed by the gains.

Tighter caster culling and Multiple Cameras

There is one more situation in which tighter caster culling is problematic: when multiple viewports are in use, and the shadow volume intersects multiple cameras.
In this situation tighter caster culling will work - it will do a tight cull on the first camera, and a full render for the second camera. The problem is that it will do 2 shadow renders per frame instead of one.
The answer used here is to detect this situation (in detect_light_intersects_multiple_cameras()) and switch to a different mode light_intersects_multiple_cameras.
This reverts to the legacy approach of doing a full render on the first update. However, we still want to detect the situation where it changes back to a single camera. This is done by means of a timeout after a certain number of frames without a double update.
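A rough sketch of that detect-then-timeout logic is shown below. It is not the implementation in this PR: the class name, `notify_camera()`, the integer camera ids, and the exact timeout comparison are hypothetical, chosen only to illustrate the idea of entering the multiple-camera mode on a double update and leaving it after a run of frames without one.

```cpp
#include <cstdint>

// Hypothetical per-light tracker for the "shadow volume intersects multiple cameras" mode.
class MultiCameraDetector {
public:
    explicit MultiCameraDetector(std::uint64_t timeout_frames) :
            timeout_frames_(timeout_frames) {}

    // Call each time a camera whose frustum intersects this light's shadow volume
    // is processed. Returns true while the light should use the full (legacy) render.
    bool notify_camera(std::uint64_t frame, int camera_id) {
        // A "double update": two different cameras hit the light in the same frame.
        const bool double_update = (frame == last_frame_ && camera_id != last_camera_id_);
        if (double_update) {
            multiple_cameras_ = true;
            last_double_update_frame_ = frame;
        } else if (multiple_cameras_ && frame - last_double_update_frame_ > timeout_frames_) {
            // No double update for more than timeout_frames_ frames: back to single camera.
            multiple_cameras_ = false;
        }
        last_frame_ = frame;
        last_camera_id_ = camera_id;
        return multiple_cameras_;
    }

private:
    std::uint64_t timeout_frames_;
    std::uint64_t last_frame_ = UINT64_MAX; // Sentinel: no camera seen yet.
    std::uint64_t last_double_update_frame_ = 0;
    int last_camera_id_ = -1;
    bool multiple_cameras_ = false;
};
```

The timeout avoids flipping modes every frame when a second camera only occasionally overlaps the shadow volume, at the cost of a few extra full renders after the overlap ends.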

Directional Lights

Directional lights are handled separately in 3.x: they are always updated, and they use different shadow maps if multiple cameras are used with viewports. Therefore they can always do the tighter caster cull.

Further work

There is one important further optimization which I have not used yet here. A shadowmap update is triggered by either an object that is paired with a light moving, or the light itself moving. However, if the object / objects moving that trigger the update are culled by tighter shadow casting, there is actually no need to update the shadow map at all, unless it is a full update. This could be significant in some cases, if there is e.g. a moving object that doesn't cast on the frustum that is triggering the whole process.
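As an illustration of this not-yet-implemented idea, the check could look something like the sketch below. Everything here is an assumption made for the example: the `Plane` and `Sphere` types, the use of bounding spheres for casters, unit-length plane normals with "positive distance means outside", and the function names.

```cpp
#include <vector>

// Hypothetical culling plane: unit normal (nx, ny, nz), signed distance = n . p - d.
// Points with a positive distance are outside the tight caster volume.
struct Plane {
    double nx, ny, nz, d;
};

// Hypothetical bounding sphere of a shadow caster.
struct Sphere {
    double x, y, z, radius;
};

// True when the sphere lies entirely outside at least one culling plane.
bool sphere_is_culled(const Sphere &s, const std::vector<Plane> &cull_planes) {
    for (const Plane &p : cull_planes) {
        const double distance = p.nx * s.x + p.ny * s.y + p.nz * s.z - p.d;
        if (distance > s.radius) {
            return true;
        }
    }
    return false;
}

// The shadow map only needs redrawing if the light moved, a full update was
// requested, or at least one moved caster survives the tighter cull.
bool shadow_update_needed(bool light_moved, bool full_update,
        const std::vector<Sphere> &moved_casters, const std::vector<Plane> &cull_planes) {
    if (light_moved || full_update) {
        return true;
    }
    for (const Sphere &caster : moved_casters) {
        if (!sphere_is_culled(caster, cull_planes)) {
            return true;
        }
    }
    return false;
}
```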

Production edit: This closes Improve DirectionalLight shadow rendering efficiency #57549.

Calinou · 2023-10-01T00:36:53Z

It is highly possible we can use some AI approach to change omnis and spots from their regular "dirty optimization" mode to continuous mode using tighter shadow casting. Which they use will depend on detecting moving objects within their volume. This could be e.g. a timer, if no moving objects after 10 frames, change to regular mode, else change to continuous mode.

If we mark light shadows as static or dynamic for each light, this could be decided based on whether the light is declared to be static or dynamic.
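The timer idea could be sketched as a small per-light state machine. This is only an illustration of the suggestion above: `ShadowModeSelector`, `tick()`, and the `quiet_frames` parameter are hypothetical names, and how "a caster moved inside the light's volume" is detected is left to the caller.

```cpp
enum class ShadowMode {
    Regular, // Dirty-flag optimization, all casters rendered on update.
    Continuous, // Updated every frame with tighter caster culling.
};

// Hypothetical per-light selector driven once per frame.
class ShadowModeSelector {
public:
    explicit ShadowModeSelector(int quiet_frames) :
            quiet_frames_(quiet_frames) {}

    // caster_moved: whether any object moved inside the light's volume this frame.
    ShadowMode tick(bool caster_moved) {
        if (caster_moved) {
            frames_without_motion_ = 0;
            mode_ = ShadowMode::Continuous;
        } else if (mode_ == ShadowMode::Continuous &&
                ++frames_without_motion_ >= quiet_frames_) {
            // Nothing has moved for quiet_frames_ frames: fall back to regular mode.
            mode_ = ShadowMode::Regular;
        }
        return mode_;
    }

private:
    int quiet_frames_;
    int frames_without_motion_ = 0;
    ShadowMode mode_ = ShadowMode::Regular;
};
```

Constructing it as `ShadowModeSelector selector(10)` would match the 10 frame example mentioned above.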

jams3223 · 2023-10-01T23:24:39Z

Could we cherry-pick this for 4.x ?

servers/visual/visual_server_light_culler.cpp

servers/visual/visual_server_light_culler.h

servers/visual/visual_server_scene.cpp

lawnjelly · 2023-10-18T13:23:40Z

Rendering meeting today:

We are fine with this PR, and it should be fine to port to 4.x (although the actual shadow map render may be deferred).
I'll try to include the code used to generate the lookup table (perhaps in a long comment) for future maintenance / debugging, and in case the order of the frustum planes changes.

UPDATE:
Now includes the lookup table generation code. This does double the PR size, but the generation code is compiled out unless VISUAL_SERVER_LIGHT_CULLER_CALCULATE_LUT is defined.

The lookup generation prints the LUT to standard output, and this can be copied directly into the C++ source.

```
LIGHT VOLUME TABLE BEGIN

Copy this to LUT_entry_sizes:

{0, 4, 4, 0, 4, 6, 6, 8, 4, 6, 6, 8, 6, 6, 6, 6, 4, 6, 6, 8, 0, 8, 8, 0, 6, 6, 6, 6, 8, 6, 6, 4, 4, 6, 6, 8, 6, 6, 6, 6, 0, 8, 8, 0, 8, 6, 6, 4, 6, 6, 6, 6, 8, 6, 6, 4, 8, 6, 6, 4, 0, 4, 4, 0, }

Copy this to LUT_entries:

{0, 0, 0, 0, 0, 0, 0, },
{7, 6, 4, 5, 0, 0, 0, },
{1, 0, 2, 3, 0, 0, 0, },
{0, 0, 0, 0, 0, 0, 0, },
{1, 5, 4, 0, 0, 0, 0, },
{1, 5, 7, 6, 4, 0, 0, },
{4, 0, 2, 3, 1, 5, 0, },
{5, 7, 6, 4, 0, 2, 3, },
{0, 4, 6, 2, 0, 0, 0, },
{0, 4, 5, 7, 6, 2, 0, },
{6, 2, 3, 1, 0, 4, 0, },
{2, 3, 1, 0, 4, 5, 7, },
{0, 1, 5, 4, 6, 2, 0, },
{0, 1, 5, 7, 6, 2, 0, },
{6, 2, 3, 1, 5, 4, 0, },
{2, 3, 1, 5, 7, 6, 0, },
{2, 6, 7, 3, 0, 0, 0, },
{2, 6, 4, 5, 7, 3, 0, },
{7, 3, 1, 0, 2, 6, 0, },
{3, 1, 0, 2, 6, 4, 5, },
{0, 0, 0, 0, 0, 0, 0, },
{2, 6, 4, 0, 1, 5, 7, },
{7, 3, 1, 5, 4, 0, 2, },
{0, 0, 0, 0, 0, 0, 0, },
{2, 0, 4, 6, 7, 3, 0, },
{2, 0, 4, 5, 7, 3, 0, },
{7, 3, 1, 0, 4, 6, 0, },
{3, 1, 0, 4, 5, 7, 0, },
{2, 0, 1, 5, 4, 6, 7, },
{2, 0, 1, 5, 7, 3, 0, },
{7, 3, 1, 5, 4, 6, 0, },
{3, 1, 5, 7, 0, 0, 0, },
{3, 7, 5, 1, 0, 0, 0, },
{3, 7, 6, 4, 5, 1, 0, },
{5, 1, 0, 2, 3, 7, 0, },
{7, 6, 4, 5, 1, 0, 2, },
{3, 7, 5, 4, 0, 1, 0, },
{3, 7, 6, 4, 0, 1, 0, },
{5, 4, 0, 2, 3, 7, 0, },
{7, 6, 4, 0, 2, 3, 0, },
{0, 0, 0, 0, 0, 0, 0, },
{3, 7, 6, 2, 0, 4, 5, },
{5, 1, 0, 4, 6, 2, 3, },
{0, 0, 0, 0, 0, 0, 0, },
{3, 7, 5, 4, 6, 2, 0, },
{3, 7, 6, 2, 0, 1, 0, },
{5, 4, 6, 2, 3, 7, 0, },
{7, 6, 2, 3, 0, 0, 0, },
{3, 2, 6, 7, 5, 1, 0, },
{3, 2, 6, 4, 5, 1, 0, },
{5, 1, 0, 2, 6, 7, 0, },
{1, 0, 2, 6, 4, 5, 0, },
{3, 2, 6, 7, 5, 4, 0, },
{3, 2, 6, 4, 0, 1, 0, },
{5, 4, 0, 2, 6, 7, 0, },
{6, 4, 0, 2, 0, 0, 0, },
{3, 2, 0, 4, 6, 7, 5, },
{3, 2, 0, 4, 5, 1, 0, },
{5, 1, 0, 4, 6, 7, 0, },
{1, 0, 4, 5, 0, 0, 0, },
{0, 0, 0, 0, 0, 0, 0, },
{3, 2, 0, 1, 0, 0, 0, },
{5, 4, 6, 7, 0, 0, 0, },
{0, 0, 0, 0, 0, 0, 0, },

LIGHT VOLUME TABLE END
```

Calinou · 2023-11-14T11:37:45Z

Tested locally, it works as expected. Visuals look correct too from my testing in various demo projects.

Great work, this likely resolves one of Godot's largest rendering bottlenecks in complex scenes 🙂

Benchmark on tps-demo

OS: Fedora 38
CPU: Intel Core i9-13900K
GPU: GeForce RTX 4090 (NVIDIA 535.113.01)

The project is modified to disable V-Sync. The FPS reported is the highest FPS attained over a period of 10 seconds after loading the level, although I can confirm the average values are always increased in a similar proportion. When CPU-limited, the FPS varies a fair bit over time due to the flying forklift moving in and out of view.

| Type | Before | After |
| --- | --- | --- |
| 4K Maximum GLES3 | 131 FPS (7.63 mspf) | 135 FPS (7.40 mspf) |
| 4K Minimum GLES3 | 325 FPS (3.07 mspf) | 431 FPS (2.32 mspf) |
| 720p Maximum GLES3 | 309 FPS (3.23 mspf) | 398 FPS (2.51 mspf) |
| 720p Minimum GLES3 | 330 FPS (3.03 mspf) | 434 FPS (2.30 mspf) |
| 4K Minimum GLES2 | 181 FPS (5.52 mspf) | 213 FPS (4.69 mspf) |
| 720p Minimum GLES2 | 186 FPS (5.37 mspf) | 221 FPS (4.52 mspf) |

- Maximum settings has all settings enabled or set to their highest possible value.
  - These tests are largely GPU-bound, in particular for the 4K one.
- Minimum settings has all settings disabled except shadow mapping, which is left enabled.
  - These tests are largely CPU-bound.


lawnjelly · 2023-11-14T14:32:30Z

I pushed some small improvements, but it turns out the bug in the master version is because the culler is used from multiple threads there, and it isn't thread safe. So the 3.x version should be fine in that respect, and I'll see if I can fix up the master version. 👍

clayjohn

Looks good to me. I trust the testing that has already been done.

The performance benefits speak for themselves. Let's get this in to 3.6

akien-mga · 2024-01-29T22:34:13Z

Thanks!

Zireael07 · 2024-01-30T08:17:45Z

Will there be equivalent improvements to Vulkan or is this GLES only?

lawnjelly · 2024-01-30T08:23:36Z

This is the 3.x PR, I'm just testing the master PR #84745 . There are improvements to all backends, as the culling takes place before the backend.

lawnjelly added enhancement topic:rendering topic:3d labels Sep 30, 2023

lawnjelly added this to the 3.6 milestone Sep 30, 2023

lawnjelly force-pushed the lightcull_23 branch 3 times, most recently from 5fec369 to bbc4857 Compare September 30, 2023 16:42

Calinou added the performance label Oct 1, 2023

lawnjelly force-pushed the lightcull_23 branch 7 times, most recently from 9e3c8cb to 5818a1b Compare October 1, 2023 14:11

lawnjelly marked this pull request as ready for review October 1, 2023 16:58

lawnjelly requested review from a team as code owners October 1, 2023 16:58

lawnjelly force-pushed the lightcull_23 branch from 5818a1b to 089b09f Compare October 1, 2023 17:08

jams3223 mentioned this pull request Oct 1, 2023

Add per-light shadow cull masks to control which objects cast shadows godotengine/godot-proposals#3606

Closed

AThousandShips reviewed Oct 2, 2023

lawnjelly force-pushed the lightcull_23 branch from 089b09f to d07da23 Compare October 2, 2023 11:20

lawnjelly force-pushed the lightcull_23 branch 2 times, most recently from f800bb7 to e0f2da5 Compare October 30, 2023 14:03

lawnjelly mentioned this pull request Nov 11, 2023

Shadow volume culling and tighter shadow caster culling #84745

Merged

lawnjelly force-pushed the lightcull_23 branch from e0f2da5 to 8ca631a Compare November 14, 2023 14:18

clayjohn approved these changes Jan 29, 2024

akien-mga merged commit 6f3c5e6 into godotengine:3.x Jan 29, 2024
13 checks passed

akien-mga mentioned this pull request Jan 29, 2024

Improve DirectionalLight shadow rendering efficiency #57549

Closed

lawnjelly deleted the lightcull_23 branch January 30, 2024 06:43
