-
-
Notifications
You must be signed in to change notification settings - Fork 3.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Use storage buffers for clustered forward bindings for 'unlimited' point lights #3605
Closed
1 task done
Labels
A-Rendering
Drawing game state to the screen
C-Code-Quality
A section of code that is hard to understand or change
C-Usability
A targeted quality-of-life change that makes Bevy easier to use
Comments
superdump
changed the title
Use storage buffers for clustered forward bindings for 'unlimited' point lights - https://github.com/superdump/bevy/tree/clustered-forward-storage-buffers
Use storage buffers for clustered forward bindings for 'unlimited' point lights
Jan 9, 2022
alice-i-cecile
added
A-Rendering
Drawing game state to the screen
C-Code-Quality
A section of code that is hard to understand or change
C-Usability
A targeted quality-of-life change that makes Bevy easier to use
labels
Jan 9, 2022
One thing to note with storage buffers is you will have to add a special case for zero lights: buffers can't have a size of zero and bindings can't be size zero either. I've gotten around this by adding a zeroed version of the struct I need in the storage buffer if I don't have any for this draw. |
bors bot
pushed a commit
that referenced
this issue
Apr 6, 2022
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
bors bot
pushed a commit
that referenced
this issue
Apr 6, 2022
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
bors bot
pushed a commit
that referenced
this issue
Apr 6, 2022
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes #3605 - Based on top of #4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Repository owner
moved this from In Review
to Done
in Rendering
Apr 7, 2022
aevyrie
pushed a commit
to aevyrie/bevy
that referenced
this issue
Jun 7, 2022
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes bevyengine#3605 - Based on top of bevyengine#4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
ItsDoot
pushed a commit
to ItsDoot/bevy
that referenced
this issue
Feb 1, 2023
# Objective - Make use of storage buffers, where they are available, for clustered forward bindings to support far more point lights in a scene - Fixes bevyengine#3605 - Based on top of bevyengine#4079 This branch on an M1 Max can keep 60fps with about 2150 point lights of radius 1m in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million. That said, I think any further optimisations should probably be done in a separate PR. ## Solution - Add `RenderDevice` to the `Material` and `SpecializedMaterial` trait `::key()` functions to allow setting flags on the keys depending on feature/limit availability - Make `GpuPointLights` and `ViewClusterBuffers` into enums containing `UniformVec` and `StorageBuffer` variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time. - Appropriate shader defs in the shader code to handle the two cases ## Context on some decisions / open questions - I'm using `max_storage_buffers_per_shader_stage >= 3` as a check to see if storage buffers are supported. I was thinking about diving into 'binding resource management' but it feels like we don't have enough use cases to understand the problem yet, and it is mostly a separate concern to this PR, so I think it should be handled separately. - Should `ViewClusterBuffers` and `ViewClusterBindings` be merged, duplicating the count variables into the enum variants? Co-authored-by: Carter Anderson <mcanders1@gmail.com>
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Labels
A-Rendering
Drawing game state to the screen
C-Code-Quality
A section of code that is hard to understand or change
C-Usability
A targeted quality-of-life change that makes Bevy easier to use
Description
In order to support 'unlimited' (practically just less-limited) point lights in a scene, we should leverage storage buffers wherever they are available, which should be WebGPU + native, just not WebGL2.
This this branch on an M1 Max I could keep 60fps with about 2150 point lights in the Sponza scene where I've been testing. The bottleneck is mostly assigning lights to clusters which grows faster than linearly (I think 1000 lights was about 1.5ms and 5000 was 7.5ms). I have seen papers and presentations leveraging compute shaders that can get this up to over 1 million, and there are probably some optimisations that can be done CPU-side too.
Solution
UniformVec
and modify to make aStorageVec
RenderDevice
to theMaterial
andSpecializedMaterial
trait::key()
functions to allow setting flags on the keys depending on feature/limit availabilityGpuPointLights
andViewClusterBuffers
into enums containing UniformVec and StorageVec variants. Implement the necessary API on them to make usage the same for both cases, and the only difference is at initialisation time.Open Questions
How should we detect storage buffer support?max_storage_buffers_per_shader_stage >= 3
but this feels like a hack as it knows nothing about any other storage buffers used by any other code. I am leaning toward just checking for> 0
and then letting wgpu validation complain whenever too many are used... though we should check the limits for WebGPU and probably try to make sure we use fewer than this if possible for the sake of compatibility and leaving space for custom shader modifications to use more.WgpuOptions
as an argument? Should it be part ofspecialize()
? Both? Neither?specialize()
could be sufficient rather than proxying the logic through the key. However, I think there will be cases where material keys may need to depend on information fromWgpuOptions
. Now that I think about it some more while writing this, if we wanted to implement some form of batching based on specialization and material, then we would need to group based on the knowledge that the pipeline specialization is the same, and that the material handle is the same. If the pipeline specialization key contains all the information necessary then it alone can be used for the purpose of batching to try to minimise pipeline switching. This at least makes me think that no other information than the prepared asset and key should be passed tospecialize()
that can impact the pipeline configuration.RenderDevice
has been added to the material trait key() functions as an argument to allow configuration of custom behaviours depending on wgpu limits/features. This feels like the right approach to me to retain that the key is the only source of information for specialisation.ViewClusterBuffers
andViewClusterBindings
?The text was updated successfully, but these errors were encountered: