-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Merged by Bors] - Use storage buffers for clustered forward point lights #3989
[Merged by Bors] - Use storage buffers for clustered forward point lights #3989
Conversation
Apparently storage buffers cannot have 0 size so that case needs taking care of. |
9d9b13d
to
a2fc835
Compare
I'm going to rework the StorageVec a little. |
b71569c
to
305a3d0
Compare
This is now based on top of: |
ccf9af8
to
ffad065
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few notes on small code quality issues.
2891f2f
to
132f203
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great; I'm happy with this now. I think the few little tweaks you made strike a nice balance between avoiding over-engineering and making sure that we actually keep changes synchronized.
Obviously, I don't have in-depth feedback about the details of the rendering strategy, but I'm happy with the code quality and examples.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good as far as I can tell.
// NOTE: This needs to run after prepare_lights. As prepare_lights is an exclusive system, | ||
// just adding it to the non-exclusive systems in the Prepare stage means it runs after | ||
// prepare_lights. | ||
render::prepare_clusters.label(RenderLightSystems::PrepareClusters), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this label shouldn't be needed since #4224
commands.spawn_bundle(PointLightBundle { | ||
point_light: PointLight { | ||
range: LIGHT_RADIUS, | ||
intensity: LIGHT_INTENSITY, | ||
..default() | ||
}, | ||
transform: Transform::from_translation((RADIUS as f64 * unit_sphere_p).as_vec3()), | ||
..default() | ||
}); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Putting a random colour here makes it look prettier I think
…erDevice::limits()
33bdd28
to
9da6d5d
Compare
With the current state of this PR, I cannot reproduce this running in Chrome. I still have this irritating gamepad issue in Firefox which I have to hack out manually in the JS:
|
I made an issue for the gamepad API problem here: #4431 |
Weeeird I just tried the latest changes and those also work for me (in firefox, which is where it was failing before). Idk if that was a "cart's system had something weird" thing or a "it used to be broken and now its not" thing. Regardless, it works now so I guess we're good :) |
bors r+ |
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Timed out. |
bors r+ |
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
This PR was included in a batch that timed out, it will be automatically retried |
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Timed out. |
bors r+ |
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective - Fixes #4234 - Fixes #4473 - Built on top of #3989 - Improve performance of `assign_lights_to_clusters` ## Solution - Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs - Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf - Adapt to also support orthographic projections - Add `many_lights -- orthographic` for testing many lights using an orthographic projection ## Results - `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system. --- ## Changelog - Changed: Improved cluster light assignment performance Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes bevyengine#3605 - Based on top of bevyengine#4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective - Fixes bevyengine#4234 - Fixes bevyengine#4473 - Built on top of bevyengine#3989 - Improve performance of `assign_lights_to_clusters` ## Solution - Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs - Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf - Adapt to also support orthographic projections - Add `many_lights -- orthographic` for testing many lights using an orthographic projection ## Results - `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system. --- ## Changelog - Changed: Improved cluster light assignment performance Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes bevyengine#3605 - Based on top of bevyengine#4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
# Objective - Fixes bevyengine#4234 - Fixes bevyengine#4473 - Built on top of bevyengine#3989 - Improve performance of `assign_lights_to_clusters` ## Solution - Remove the OBB-based cluster light assignment algorithm and calculation of view space AABBs - Implement the 'iterative sphere refinement' algorithm used in Just Cause 3 by Emil Persson as documented in the Siggraph 2015 Practical Clustered Shading talk by Persson, on pages 42-44 http://newq.net/dl/pub/s2015_practical.pdf - Adapt to also support orthographic projections - Add `many_lights -- orthographic` for testing many lights using an orthographic projection ## Results - `assign_lights_to_clusters` in `many_lights` before this PR on an M1 Max over 1500 frames had a median execution time of 1.71ms. With this PR it is 1.51ms, a reduction of 0.2ms or 11.7% for this system. --- ## Changelog - Changed: Improved cluster light assignment performance Co-authored-by: robtfm <50659922+robtfm@users.noreply.github.com> Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Objective
UniformVec
#4079This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR.
Solution
RenderDevice
to theMaterial
andSpecializedMaterial
trait::key()
functions to allow setting flags on the keys depending on feature/limit availabilityGpuPointLights
andViewClusterBuffers
into enums containingUniformVec
andStorageBuffer
variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.Context on some decisions / open questions
max_storage_buffers_per_shader_stage >= 3
as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately.ViewClusterBuffers
andViewClusterBindings
be merged, duplicating the count variables into the enum variants?