[Merged by Bors] - Use storage buffers for clustered forward point lights #3989

superdump · 2022-02-19T15:50:11Z

Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605 (Use storage buffers for clustered forward bindings for 'unlimited' point lights)
- Based on top of #4079 (Add a helper for storage buffers similar to UniformVec)

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them so that usage is the same in both cases and the only difference is at initialisation time.
- Add appropriate shader defs in the shader code to handle the two cases
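
To illustrate the enum approach described above, here is a minimal sketch of one type with a fixed-capacity variant and a growable variant behind a shared API. `FixedUniformList`, `GrowableStorageList` and `LightIndexList` are hypothetical stand-ins, not the types in this PR, and nothing here touches the GPU; it only shows that the choice is made once at construction and callers do not need to care afterwards.

```rust
// Hypothetical stand-ins for Bevy's `UniformVec` and `StorageBuffer`.
// They only model the capacity difference, not any GPU upload.
struct FixedUniformList {
    values: Vec<u32>,
    capacity: usize, // uniform buffers need a size known up front
}

struct GrowableStorageList {
    values: Vec<u32>, // storage buffers can be sized at runtime
}

// One enum, two backings: callers use the same methods either way.
enum LightIndexList {
    Uniform(FixedUniformList),
    Storage(GrowableStorageList),
}

impl LightIndexList {
    // The variant is chosen once, at initialisation time.
    fn new(storage_buffers_supported: bool, uniform_capacity: usize) -> Self {
        if storage_buffers_supported {
            LightIndexList::Storage(GrowableStorageList { values: Vec::new() })
        } else {
            LightIndexList::Uniform(FixedUniformList {
                values: Vec::with_capacity(uniform_capacity),
                capacity: uniform_capacity,
            })
        }
    }

    // Returns false if the value did not fit (only possible for the uniform variant).
    fn push(&mut self, value: u32) -> bool {
        match self {
            LightIndexList::Uniform(list) => {
                if list.values.len() >= list.capacity {
                    return false;
                }
                list.values.push(value);
                true
            }
            LightIndexList::Storage(list) => {
                list.values.push(value);
                true
            }
        }
    }

    fn len(&self) -> usize {
        match self {
            LightIndexList::Uniform(list) => list.values.len(),
            LightIndexList::Storage(list) => list.values.len(),
        }
    }
}
```

With `LightIndexList::new(false, 2)` a third `push` is rejected, while the storage-backed variant keeps accepting values.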

Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?
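
The `max_storage_buffers_per_shader_stage >= 3` check mentioned above could be sketched roughly as follows. `DeviceLimits`, `ClusterBindingKind` and both functions are hypothetical names for illustration, and the shader def string is an assumed placeholder rather than a confirmed identifier from this PR; only the limit name and the threshold of 3 come from the description.

```rust
// Hypothetical subset of the device limits: only the field named in the PR text.
struct DeviceLimits {
    max_storage_buffers_per_shader_stage: u32,
}

// Threshold from the PR description (`>= 3`).
const REQUIRED_STORAGE_BUFFERS: u32 = 3;

#[derive(Debug, PartialEq, Clone, Copy)]
enum ClusterBindingKind {
    Uniform,
    Storage,
}

// Pick the binding kind once, from the limits the device reports.
fn cluster_binding_kind(limits: &DeviceLimits) -> ClusterBindingKind {
    if limits.max_storage_buffers_per_shader_stage >= REQUIRED_STORAGE_BUFFERS {
        ClusterBindingKind::Storage
    } else {
        ClusterBindingKind::Uniform
    }
}

// Shader defs that let the shader source compile either variant.
// The def name below is an illustrative assumption.
fn shader_defs(kind: ClusterBindingKind) -> Vec<String> {
    match kind {
        ClusterBindingKind::Storage => Vec::new(),
        ClusterBindingKind::Uniform => vec!["NO_STORAGE_BUFFERS_SUPPORT".to_string()],
    }
}
```

A device reporting 0 or 2 storage buffers per stage would fall back to the uniform path; one reporting 3 or more would take the storage path.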

superdump · 2022-02-19T16:16:25Z

Apparently storage buffers cannot have 0 size so that case needs taking care of.
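
One possible way to handle the zero-size case, shown here only as a sketch and not necessarily what this PR ended up doing, is to pad an empty list with a single dummy element before upload so the byte length is never 0. `storage_buffer_bytes` is a hypothetical helper name.

```rust
// Serialise the values for upload, padding an empty list with one zeroed
// element so the resulting buffer is never 0 bytes long.
// The caller still needs to track the real element count separately.
fn storage_buffer_bytes(values: &[u32]) -> Vec<u8> {
    if values.is_empty() {
        return 0u32.to_le_bytes().to_vec();
    }
    values.iter().flat_map(|v| v.to_le_bytes()).collect()
}
```

The padding element carries no data, so the shader must rely on the separately tracked count rather than the buffer length.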

superdump · 2022-02-27T00:13:05Z

I'm going to rework the StorageVec a little.

superdump · 2022-03-02T07:39:04Z

This is now based on top of:

[Merged by Bors] - Add a helper for storage buffers similar to UniformVec #4079

crates/bevy_pbr/src/lib.rs

crates/bevy_pbr/src/light.rs

examples/README.md

alice-i-cecile

A few notes on small code quality issues.

alice-i-cecile

Great; I'm happy with this now. I think the few little tweaks you made strike a nice balance between avoiding over-engineering and making sure that we actually keep changes synchronized.

Obviously, I don't have in-depth feedback about the details of the rendering strategy, but I'm happy with the code quality and examples.

jakobhellermann

Looks good as far as I can tell.

crates/bevy_pbr/src/render/light.rs

crates/bevy_render/src/render_resource/storage_buffer.rs

mockersf · 2022-03-27T19:20:06Z

crates/bevy_pbr/src/lib.rs

+                // NOTE: This needs to run after prepare_lights. As prepare_lights is an exclusive system,
+                // just adding it to the non-exclusive systems in the Prepare stage means it runs after
+                // prepare_lights.
+                render::prepare_clusters.label(RenderLightSystems::PrepareClusters),


this label shouldn't be needed since #4224

mockersf · 2022-03-27T19:57:43Z

examples/stress_tests/many_lights.rs

+        commands.spawn_bundle(PointLightBundle {
+            point_light: PointLight {
+                range: LIGHT_RADIUS,
+                intensity: LIGHT_INTENSITY,
+                ..default()
+            },
+            transform: Transform::from_translation((RADIUS as f64 * unit_sphere_p).as_vec3()),
+            ..default()
+        });


Putting a random colour here makes it look prettier I think

superdump · 2022-04-06T12:11:10Z

The lighting example is failing on wasm for me on this branch (including when I rebase on main):

panicked at 'wgpu error: Validation Error

Caused by:
    In Device::create_render_pipeline
      note: label = `pbr_opaque_mesh_pipeline`
    Internal error in VERTEX shader: ERROR: 0:41: 'data' : array members of structs must specify a size
ERROR: 0:44: 'data' : array members of structs must specify a size
ERROR: 0:47: 'data' : array members of structs must specify a size

Presumably due to this part of mesh_view_bind_group.wgsl?

struct ClusterLightIndexLists {
    data: array<u32>;
};

With the current state of this PR, I cannot reproduce this running in Chrome. I still have this irritating gamepad issue in Firefox which I have to hack out manually in the JS:

panicked at 'error getting gamepads: JsValue(TypeError: getObject(...).getGamepads is not a function
init/imports.wbg.__wbg_getGamepads_125e8440ea7a98c8/<@http://0.0.0.0:8000/target/wasm_example.js:1071:35
handleError@http://0.0.0.0:8000/target/wasm_example.js:260:18
init/imports.wbg.__wbg_getGamepads_125e8440ea7a98c8@http://0.0.0.0:8000/target/wasm_example.js:1070:74
web_sys::features::gen_Navigator::Navigator::get_gamepads::h38f15820bcd54585@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[17624]:0x9e4430
gilrs_core::platform::platform::gamepad::Gilrs::next_event::ha502708d74993467@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[659]:0x24484f
gilrs_core::Gilrs::next_event::h1def7b279c39043e@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[25101]:0xa1cd38
gilrs::gamepad::Gilrs::next_event_priv::h581e4e4191b9fd6b@http://0.0.0.0:8000/target/wasm_example_bg.wasm:wasm-function[719]:0x272637
gilrs::gamepad::Gilrs::next_event::h795f889b0d271268@http:/…
wasm_example.js:420:21

superdump · 2022-04-06T12:23:47Z

I made an issue for the gamepad API problem here: #4431

cart · 2022-04-06T19:20:48Z

> With the current state of this PR, I cannot reproduce this running in Chrome. I still have this irritating gamepad issue in Firefox which I have to hack out manually in the JS:

Weeeird, I just tried the latest changes and those also work for me (in Firefox, which is where it was failing before). Idk if that was a "cart's system had something weird" thing or an "it used to be broken and now it's not" thing.

Regardless, it works now so I guess we're good :)

cart · 2022-04-06T19:23:31Z

bors r+

# Objective

- Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene
- Fixes #3605
- Based on top of #4079

This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.

## Solution

- Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability
- Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.
- Appropriate shader defs in the shader code to handle the two cases

## Context on some decisions / open questions

- I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.
- Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants?

Co-authored-by: Carter Anderson <mcanders1@gmail.com>

bors · 2022-04-06T20:31:13Z

Timed out.

cart · 2022-04-06T20:47:26Z

bors r+

bors · 2022-04-06T22:11:01Z

This PR was included in a batch that timed out, it will be automatically retried

bors · 2022-04-07T00:32:43Z

Timed out.

mockersf · 2022-04-07T15:50:18Z

bors r+

bors · 2022-04-07T16:39:14Z

Pull request successfully merged into main.

Build succeeded:

# Objective

- Fixes #4234
- Fixes #4473
- Built on top of #3989
- Improve performance of `assign_lights_to_clusters`

## Solution

- Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs
- Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf
- Adapt to also support orthographic projections
- Add `many_lights -- orthographic` for testing many lights using an orthographic projection

## Results

- `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system.

---

## Changelog

- Changed: Improved cluster light assignment performance

Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com>
Co-authored-by: Carter Anderson <mcanders1@gmail.com>

github-actions bot added the S-Needs-Triage This issue needs to be labelled label Feb 19, 2022

superdump mentioned this pull request Feb 19, 2022

Use storage buffers for clustered forward bindings for 'unlimited' point lights #3605

Closed

1 task

superdump self-assigned this Feb 19, 2022

alice-i-cecile added A-Rendering Drawing game state to the screen C-Bug An unexpected or incorrect behavior C-Performance A change motivated by improving speed, memory usage or compile times and removed S-Needs-Triage This issue needs to be labelled labels Feb 20, 2022

superdump force-pushed the clustered-forward-storage-buffers branch from 9d9b13d to a2fc835 Compare February 25, 2022 15:32

superdump marked this pull request as ready for review February 25, 2022 15:33

superdump force-pushed the clustered-forward-storage-buffers branch from b71569c to 305a3d0 Compare March 1, 2022 23:58

superdump force-pushed the clustered-forward-storage-buffers branch from ccf9af8 to ffad065 Compare March 24, 2022 09:54

superdump changed the title ~~Use storage buffers for clustered forward bindings~~ Use storage buffers for clustered forward point lights Mar 24, 2022

superdump requested review from james7132, mockersf and alice-i-cecile March 24, 2022 10:38

alice-i-cecile reviewed Mar 24, 2022

crates/bevy_pbr/src/lib.rs Outdated

alice-i-cecile reviewed Mar 24, 2022

crates/bevy_pbr/src/light.rs Outdated

alice-i-cecile reviewed Mar 24, 2022

crates/bevy_pbr/src/light.rs Outdated

alice-i-cecile reviewed Mar 24, 2022

examples/README.md Outdated

alice-i-cecile requested changes Mar 24, 2022

superdump force-pushed the clustered-forward-storage-buffers branch from 2891f2f to 132f203 Compare March 25, 2022 11:07

superdump added this to the Bevy 0.7 milestone Mar 25, 2022

alice-i-cecile approved these changes Mar 25, 2022

jakobhellermann approved these changes Mar 25, 2022

crates/bevy_pbr/src/render/light.rs

crates/bevy_pbr/src/render/light.rs Outdated

crates/bevy_render/src/render_resource/storage_buffer.rs Outdated

superdump mentioned this pull request Mar 27, 2022

[Merged by Bors] - Faster assign lights to clusters #4345

Closed

mockersf reviewed Mar 27, 2022

superdump and others added 6 commits April 6, 2022 13:49

bevy_pbr: Add SupportedBindingTypes that can be constructed from RenderDevice::limits()

e64640b

examples: Move many_lights from 3d to stress_tests

61872df

bevy_pbr: Use Vec::append instead of a for loop and Vec::drain

1d3f878

Move storage buffer detection to RenderDevice

8c9a8a5

Move storage buffer specialization out of key and into "pipeline type"

04c6124

bevy_pbr: Fix point light count in CLUSTERED_FORWARD_DEBUG_CLUSTER_LIGHT_COMPLEXITY

3b13f86

superdump force-pushed the clustered-forward-storage-buffers branch from 33bdd28 to 9da6d5d Compare April 6, 2022 12:09

bors bot changed the title ~~Use storage buffers for clustered forward point lights~~ [Merged by Bors] - Use storage buffers for clustered forward point lights Apr 7, 2022

bors bot closed this Apr 7, 2022

Shatur mentioned this pull request Apr 14, 2022

headless_defaults example panics again #4440

Closed

Shatur mentioned this pull request Apr 17, 2022

Bevy 0.7 cryscan/bevy-hikari#6

Closed
