Add GpuArrayBuffer and BatchedUniformBuffer #8204

Merged on Jul 21, 2023 (27 commits)
Changes from 2 commits
Commits
40e9985
Add GpuList and BatchedUniformBuffer
JMS55 Mar 25, 2023
c11d4f2
Clippy lint
JMS55 Mar 25, 2023
8688472
Add GpuList and BatchedUniformBuffer
superdump Mar 25, 2023
1c94dcd
Merge branch 'gpu-list' of https://github.com/JMS55/bevy into gpu-list
JMS55 Mar 31, 2023
dc39eb5
Update crates/bevy_render/src/render_resource/batched_uniform_buffer.rs
JMS55 Apr 24, 2023
f88ccc4
Update crates/bevy_render/src/render_resource/gpu_list.rs
JMS55 Apr 24, 2023
cdbbad2
Update crates/bevy_render/src/render_resource/gpu_list.rs
JMS55 Apr 24, 2023
94a58ec
Update crates/bevy_render/src/render_resource/gpu_list.rs
JMS55 Apr 24, 2023
c5357de
Update crates/bevy_render/src/render_resource/gpu_list.rs
JMS55 Apr 24, 2023
70254d4
Update crates/bevy_render/src/render_resource/gpu_list.rs
JMS55 Apr 24, 2023
7f7101d
Update crates/bevy_render/src/render_resource/gpu_list.rs
JMS55 Apr 24, 2023
66e48d7
Update crates/bevy_render/src/render_resource/storage_buffer.rs
JMS55 Apr 24, 2023
00fb9b5
Update crates/bevy_render/src/render_resource/storage_buffer.rs
JMS55 Apr 24, 2023
3144b42
Update crates/bevy_render/src/render_resource/uniform_buffer.rs
JMS55 Apr 24, 2023
658568a
Update crates/bevy_render/src/render_resource/gpu_list.rs
superdump May 1, 2023
eb93067
Update crates/bevy_render/src/render_resource/gpu_list.rs
superdump May 1, 2023
1d10195
Merge branch 'main' into gpu-list-main
superdump May 1, 2023
68f7a8e
Fixes to buffer sizes
superdump May 1, 2023
05c723f
Fix after merge from main
superdump May 1, 2023
37fdbd4
Add credit to Teoxoy for MaxCapacityArray
superdump May 1, 2023
f5ca55d
Clarify logic around max_storage_buffers_per_shader_stage
superdump May 1, 2023
7765c86
Merge commit '1e73312e49fc90479d8c9c645ffd85a59233067c' into gpu-list
JMS55 Jun 26, 2023
9f4f027
Lower MAX_REASONABLE_UNIFORM_BUFFER_BINDING_SIZE on WebGL2
JMS55 Jun 26, 2023
973b8bb
Rename GpuList -> GpuArrayBuffer
JMS55 Jun 26, 2023
d65cab4
Update crates/bevy_render/src/render_resource/gpu_array_buffer.rs
JMS55 Jun 26, 2023
499e3a2
Add internal documentation of BatchedUniformBuffer members
superdump Jul 21, 2023
df66b85
BatchedUniformBuffer: Optimize rounding code
konsolas Jul 21, 2023
55 changes: 55 additions & 0 deletions crates/bevy_render/src/gpu_component_list.rs
@@ -0,0 +1,55 @@
use crate::{
render_resource::{GpuList, GpuListable},
renderer::{RenderDevice, RenderQueue},
Render, RenderApp, RenderSet,
};
use bevy_app::{App, Plugin};
use bevy_ecs::{
prelude::{Component, Entity},
schedule::IntoSystemConfigs,
system::{Commands, Query, Res, ResMut},
};
use std::marker::PhantomData;

/// This plugin prepares the components of the corresponding type for the GPU
/// by storing them in a [`GpuList`].
pub struct GpuComponentListPlugin<C: Component + GpuListable>(PhantomData<C>);

impl<C: Component + GpuListable> Plugin for GpuComponentListPlugin<C> {
fn build(&self, app: &mut App) {
if let Ok(render_app) = app.get_sub_app_mut(RenderApp) {
render_app
.insert_resource(GpuList::<C>::new(
render_app.world.resource::<RenderDevice>(),
))
.add_systems(
Render,
prepare_gpu_component_lists::<C>.in_set(RenderSet::Prepare),
);
}
}
}

impl<C: Component + GpuListable> Default for GpuComponentListPlugin<C> {
fn default() -> Self {
Self(PhantomData::<C>)
}
}

fn prepare_gpu_component_lists<C: Component + GpuListable>(
mut commands: Commands,
render_device: Res<RenderDevice>,
render_queue: Res<RenderQueue>,
mut gpu_list: ResMut<GpuList<C>>,
components: Query<(Entity, &C)>,
) {
gpu_list.clear();

let entities = components
.iter()
.map(|(entity, component)| (entity, gpu_list.push(component.clone())))
.collect::<Vec<_>>();
commands.insert_or_spawn_batch(entities);

gpu_list.write_buffer(&render_device, &render_queue);
}
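
The prepare system above rebuilds the list every frame: clear, push each component, and hand the returned index back to the entity. A minimal std-only sketch of that pattern follows. `EntityId`, `CpuList`, and `prepare` are hypothetical stand-ins, not Bevy types, and the sketch leaves out the GPU upload that `write_buffer` performs.

```rust
/// Hypothetical stand-in for an ECS entity id (not Bevy's `Entity`).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct EntityId(pub u32);

/// Hypothetical CPU-side list; the `GpuList` in the diff also owns a GPU buffer.
pub struct CpuList<T> {
    values: Vec<T>,
}

impl<T> CpuList<T> {
    pub fn new() -> Self {
        Self { values: Vec::new() }
    }

    /// Drops last frame's contents so indices start again from zero.
    pub fn clear(&mut self) {
        self.values.clear();
    }

    /// Appends a value and returns the index a shader would use to read it.
    pub fn push(&mut self, value: T) -> u32 {
        let index = self.values.len() as u32;
        self.values.push(value);
        index
    }

    pub fn len(&self) -> usize {
        self.values.len()
    }
}

/// Mirrors the shape of `prepare_gpu_component_lists`: clear, push every
/// component, and return `(entity, index)` pairs to attach to the entities.
pub fn prepare<T: Clone>(
    list: &mut CpuList<T>,
    components: &[(EntityId, T)],
) -> Vec<(EntityId, u32)> {
    list.clear();
    components
        .iter()
        .map(|(entity, component)| (*entity, list.push(component.clone())))
        .collect()
}
```

Because the list is cleared first, an index is only meaningful for the frame in which it was produced.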
1 change: 1 addition & 0 deletions crates/bevy_render/src/lib.rs
@@ -9,6 +9,7 @@ pub mod extract_component;
mod extract_param;
pub mod extract_resource;
pub mod globals;
pub mod gpu_component_list;
pub mod mesh;
pub mod pipelined_rendering;
pub mod primitives;
125 changes: 125 additions & 0 deletions crates/bevy_render/src/render_resource/batched_uniform_buffer.rs
@@ -0,0 +1,125 @@
use super::{GpuListIndex, GpuListable};
use crate::{
render_resource::DynamicUniformBuffer,
renderer::{RenderDevice, RenderQueue},
};
use encase::{
private::{ArrayMetadata, BufferMut, Metadata, RuntimeSizedArray, WriteInto, Writer},
ShaderType,
};
use std::{marker::PhantomData, num::NonZeroU64};
use wgpu::{BindingResource, Limits};

// Cap the binding size at 1 MiB. Otherwise we would make really large arrays
// on macOS, which reports a very large `max_uniform_buffer_binding_size`. On
// macOS this cap ends up being the minimum size of the uniform buffer as well
// as the size of each chunk of data at a dynamic offset.
const MAX_REASONABLE_UNIFORM_BUFFER_BINDING_SIZE: u32 = 1 << 20;

/// Similar to [`DynamicUniformBuffer`], except every N elements (depending on size)
/// are grouped into a batch as an `array<T, N>` in WGSL.
pub struct BatchedUniformBuffer<T: GpuListable> {
uniforms: DynamicUniformBuffer<MaxCapacityArray<Vec<T>>>,
temp: MaxCapacityArray<Vec<T>>,
current_offset: u32,
dynamic_offset_alignment: u32,
}

impl<T: GpuListable> BatchedUniformBuffer<T> {
pub fn batch_size(limits: &Limits) -> usize {
(limits
.max_uniform_buffer_binding_size
.min(MAX_REASONABLE_UNIFORM_BUFFER_BINDING_SIZE) as u64
/ T::min_size().get()) as usize
}

pub fn new(limits: &Limits) -> Self {
let capacity = Self::batch_size(limits);
let alignment = limits.min_uniform_buffer_offset_alignment;

Self {
uniforms: DynamicUniformBuffer::new_with_alignment(alignment as u64),
temp: MaxCapacityArray(Vec::with_capacity(capacity), capacity),
current_offset: 0,
dynamic_offset_alignment: alignment,
}
}

#[inline]
pub fn size(&self) -> NonZeroU64 {
self.temp.size()
}

pub fn clear(&mut self) {
self.uniforms.clear();
self.current_offset = 0;
self.temp.0.clear();
}

pub fn push(&mut self, component: T) -> GpuListIndex<T> {
let result = GpuListIndex {
index: self.temp.0.len() as u32,
dynamic_offset: Some(self.current_offset),
element_type: PhantomData,
};
self.temp.0.push(component);
if self.temp.0.len() == self.temp.1 {
self.flush();
}
result
}

pub fn flush(&mut self) {
self.uniforms.push(self.temp.clone());

self.current_offset +=
round_up(self.temp.size().get(), self.dynamic_offset_alignment as u64) as u32;

self.temp.0.clear();
}

pub fn write_buffer(&mut self, device: &RenderDevice, queue: &RenderQueue) {
if !self.temp.0.is_empty() {
self.flush();
}
self.uniforms.write_buffer(device, queue);
}

#[inline]
pub fn binding(&self) -> Option<BindingResource> {
self.uniforms.binding()
}
}

// ----------------------------------------------------------------------------

#[derive(Clone, Copy, Debug, Default, PartialEq, Eq, PartialOrd, Ord)]
struct MaxCapacityArray<T>(T, usize);

impl<T> ShaderType for MaxCapacityArray<T>
where
T: ShaderType<ExtraMetadata = ArrayMetadata>,
{
type ExtraMetadata = ArrayMetadata;

const METADATA: Metadata<Self::ExtraMetadata> = T::METADATA;

fn size(&self) -> ::core::num::NonZeroU64 {
Self::METADATA.stride().mul(self.1.max(1) as u64).0
}
}

impl<T> WriteInto for MaxCapacityArray<T>
where
T: WriteInto + RuntimeSizedArray,
{
fn write_into<B: BufferMut>(&self, writer: &mut Writer<B>) {
debug_assert!(self.0.len() <= self.1);
self.0.write_into(writer);
}
}
Comment on lines +128 to +152

This code was written by @teoxoy so we need to add credit for them to the commit that introduces it.

Thanks! If this is ready for production I can merge the branch in encase and do a release.
Let me know!

It works fine for us. :) There is that other aspect of being able to start the next dynamic offset binding of a uniform buffer at the next dynamic offset alignment if not all space is used, and ensure that the final binding is full-size. I don't know if that would clash with this and basically immediately deprecate this approach. If so maybe you'd prefer that we use a solution in bevy for what we need and add the long-term and more flexible solution to encase when someone gets to it. What do you think?

I won't block the PR on this. We can figure it out over time. :)

I tried to rebase to give credit on the original commit but due to merges it was a pain. I instead added a comment and a co-authored-by so that when the squash merge is done, the credit will follow along with it.

Ok, we can further iterate and see what we come up with. Thanks for the credit!
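
For readers following the thread, the key property of `MaxCapacityArray` in the diff is that `size()` depends on the capacity, not on how many elements are currently stored, so every batch occupies the same number of bytes. A simplified std-only model is sketched below. `FixedCapacity` and its explicit `stride` field are assumptions for illustration; in the diff the stride comes from encase's `METADATA`.

```rust
use std::num::NonZeroU64;

/// Hypothetical simplified model of a max-capacity array. `stride` stands in
/// for the per-element stride that encase derives from the element type.
pub struct FixedCapacity<T> {
    pub items: Vec<T>,
    pub capacity: usize,
    pub stride: u64,
}

impl<T> FixedCapacity<T> {
    /// Size in bytes reported for the whole batch: always `stride * capacity`
    /// (treating a zero capacity as one element), independent of `items.len()`.
    pub fn size(&self) -> NonZeroU64 {
        let bytes = self.stride * self.capacity.max(1) as u64;
        NonZeroU64::new(bytes).expect("stride must be non-zero")
    }
}
```

A partially filled final batch therefore still reports the full batch size, which is the behaviour the thread describes as wanting the final binding to be full-size.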


#[inline]
fn round_up(v: u64, a: u64) -> u64 {
((v + a - 1) / a) * a
}
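
To make the batching arithmetic in this file concrete, here is a std-only sketch of how a batch size, an element's in-batch index, and its dynamic offset could be computed. `batch_size` and `round_up` follow the formulas in the diff. `locate` is a hypothetical helper that condenses what `push` and `flush` do incrementally. `round_up_pow2` is a common bitmask alternative that is valid only for power-of-two alignments; it is shown for comparison and is not claimed to be what any commit in this PR uses.

```rust
/// Elements per batch: the binding size limit, capped, divided by element size.
pub fn batch_size(max_binding_size: u64, cap: u64, element_size: u64) -> u64 {
    max_binding_size.min(cap) / element_size
}

/// Rounds `v` up to the next multiple of `a` (same formula as the diff).
pub fn round_up(v: u64, a: u64) -> u64 {
    ((v + a - 1) / a) * a
}

/// Bitmask variant; only valid when `a` is a power of two.
pub fn round_up_pow2(v: u64, a: u64) -> u64 {
    debug_assert!(a.is_power_of_two());
    (v + a - 1) & !(a - 1)
}

/// Returns `(index, dynamic_offset)` for the `n`-th pushed element (0-based),
/// given the batch capacity, the batch size in bytes, and the offset alignment.
pub fn locate(n: u64, batch_len: u64, batch_bytes: u64, alignment: u64) -> (u32, u32) {
    let batch = n / batch_len; // which batch the element falls into
    let index = n % batch_len; // position inside that batch
    let offset = batch * round_up(batch_bytes, alignment);
    (index as u32, offset as u32)
}
```

For example, with 64-byte elements and a 16384-byte binding limit, `batch_size(16384, 1 << 20, 64)` gives 256 elements per batch, and `locate(300, 256, 16384, 256)` places element 300 at index 44 of the batch bound at offset 16384.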
2 changes: 2 additions & 0 deletions crates/bevy_render/src/render_resource/buffer_vec.rs
@@ -21,9 +21,11 @@ use wgpu::BufferUsages;
/// from system RAM to VRAM.
///
/// Other options for storing GPU-accessible data are:
/// * [`StorageBuffer`](crate::render_resource::StorageBuffer)
/// * [`DynamicStorageBuffer`](crate::render_resource::DynamicStorageBuffer)
/// * [`UniformBuffer`](crate::render_resource::UniformBuffer)
/// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
/// * [`GpuList`](crate::render_resource::GpuList)
/// * [`BufferVec`](crate::render_resource::BufferVec)
/// * [`Texture`](crate::render_resource::Texture)
pub struct BufferVec<T: Pod> {
127 changes: 127 additions & 0 deletions crates/bevy_render/src/render_resource/gpu_list.rs
@@ -0,0 +1,127 @@
use super::StorageBuffer;
use crate::{
render_resource::batched_uniform_buffer::BatchedUniformBuffer,
renderer::{RenderDevice, RenderQueue},
};
use bevy_ecs::{prelude::Component, system::Resource};
use encase::{private::WriteInto, ShaderSize, ShaderType};
use std::{marker::PhantomData, mem};
use wgpu::{BindGroupLayoutEntry, BindingResource, BindingType, BufferBindingType, ShaderStages};

/// Trait for types able to go in a [`GpuList`].
pub trait GpuListable: ShaderType + ShaderSize + WriteInto + Clone {}
impl<T: ShaderType + ShaderSize + WriteInto + Clone> GpuListable for T {}

/// Stores a list of elements to be transferred to the GPU and made accessible to shaders as a read-only array.
///
/// On platforms that support storage buffers, this is equivalent to [`StorageBuffer<Vec<T>>`].
/// Otherwise, this falls back to batched uniforms.
///
/// Other options for storing GPU-accessible data are:
/// * [`StorageBuffer`](crate::render_resource::StorageBuffer)
/// * [`DynamicStorageBuffer`](crate::render_resource::DynamicStorageBuffer)
/// * [`UniformBuffer`](crate::render_resource::UniformBuffer)
/// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
/// * [`GpuList`](crate::render_resource::GpuList)
/// * [`BufferVec`](crate::render_resource::BufferVec)
/// * [`Texture`](crate::render_resource::Texture)
#[derive(Resource)]
pub enum GpuList<T: GpuListable> {
Uniform(BatchedUniformBuffer<T>),
Storage((StorageBuffer<Vec<T>>, Vec<T>)),
}

impl<T: GpuListable> GpuList<T> {
pub fn new(device: &RenderDevice) -> Self {
let limits = device.limits();
if limits.max_storage_buffers_per_shader_stage < 3 {
GpuList::Uniform(BatchedUniformBuffer::new(&limits))
} else {
GpuList::Storage((StorageBuffer::default(), Vec::new()))
}
}

pub fn clear(&mut self) {
match self {
GpuList::Uniform(buffer) => buffer.clear(),
GpuList::Storage((_, buffer)) => buffer.clear(),
}
}

pub fn push(&mut self, value: T) -> GpuListIndex<T> {
match self {
GpuList::Uniform(buffer) => buffer.push(value),
GpuList::Storage((_, buffer)) => {
let index = buffer.len() as u32;
buffer.push(value);
GpuListIndex {
index,
dynamic_offset: None,
element_type: PhantomData,
}
}
}
}

pub fn write_buffer(&mut self, device: &RenderDevice, queue: &RenderQueue) {
match self {
GpuList::Uniform(buffer) => buffer.write_buffer(device, queue),
GpuList::Storage((buffer, vec)) => {
buffer.set(mem::take(vec));
buffer.write_buffer(device, queue);
}
}
}

pub fn binding_layout(
binding: u32,
visibility: ShaderStages,
device: &RenderDevice,
) -> BindGroupLayoutEntry {
BindGroupLayoutEntry {
binding,
visibility,
ty: if device.limits().max_storage_buffers_per_shader_stage < 3 {
BindingType::Buffer {
ty: BufferBindingType::Uniform,
has_dynamic_offset: true,
min_binding_size: Some(T::min_size()),
}
} else {
BindingType::Buffer {
ty: BufferBindingType::Storage { read_only: true },
has_dynamic_offset: false,
min_binding_size: Some(T::min_size()),
}
},
count: None,
}
}

pub fn binding(&self) -> Option<BindingResource> {
match self {
GpuList::Uniform(buffer) => buffer.binding(),
GpuList::Storage((buffer, _)) => buffer.binding(),
}
}

pub fn batch_size(device: &RenderDevice) -> Option<u32> {
let limits = device.limits();
if limits.max_storage_buffers_per_shader_stage < 3 {
Some(BatchedUniformBuffer::<T>::batch_size(&limits) as u32)
} else {
None
}
}
}

/// An index into a [`GpuList`] for a given element.
#[derive(Component)]
pub struct GpuListIndex<T: GpuListable> {
/// The index to use in a shader on the array.
pub index: u32,
/// The dynamic offset to use when binding the list from Rust.
/// Only used on platforms that don't support storage buffers.
pub dynamic_offset: Option<u32>,
pub element_type: PhantomData<T>,
}
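
The selection logic in `GpuList` can be summarised in a small std-only sketch: the storage path is used unless the device reports fewer than three storage buffers per shader stage, and only the uniform path carries a dynamic offset. `Limits`, `Backing`, `ListIndex`, and `dynamic_offsets` below are hypothetical stand-ins for illustration. How the offsets are passed when binding is an assumption based on the `has_dynamic_offset` flags in `binding_layout`.

```rust
/// Hypothetical stand-in for the one `wgpu::Limits` field the diff inspects.
pub struct Limits {
    pub max_storage_buffers_per_shader_stage: u32,
}

#[derive(Debug, PartialEq)]
pub enum Backing {
    Uniform,
    Storage,
}

/// Same threshold as the diff: fewer than 3 storage buffers per stage
/// falls back to batched uniforms.
pub fn choose_backing(limits: &Limits) -> Backing {
    if limits.max_storage_buffers_per_shader_stage < 3 {
        Backing::Uniform
    } else {
        Backing::Storage
    }
}

/// Mirrors the data fields of `GpuListIndex` without the type parameter.
pub struct ListIndex {
    pub index: u32,
    pub dynamic_offset: Option<u32>,
}

/// Dynamic offsets to supply when binding: one on the uniform path,
/// none on the storage path.
pub fn dynamic_offsets(i: &ListIndex) -> Vec<u32> {
    i.dynamic_offset.into_iter().collect()
}
```

On the storage path `index` addresses the whole array. On the uniform path it addresses only the batch selected by the dynamic offset.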
3 changes: 3 additions & 0 deletions crates/bevy_render/src/render_resource/mod.rs
@@ -1,7 +1,9 @@
mod batched_uniform_buffer;
mod bind_group;
mod bind_group_layout;
mod buffer;
mod buffer_vec;
mod gpu_list;
mod pipeline;
mod pipeline_cache;
mod pipeline_specializer;
@@ -15,6 +17,7 @@ pub use bind_group::*;
pub use bind_group_layout::*;
pub use buffer::*;
pub use buffer_vec::*;
pub use gpu_list::*;
pub use pipeline::*;
pub use pipeline_cache::*;
pub use pipeline_specializer::*;
4 changes: 4 additions & 0 deletions crates/bevy_render/src/render_resource/storage_buffer.rs
@@ -20,9 +20,11 @@ use wgpu::{util::BufferInitDescriptor, BindingResource, BufferBinding, BufferUsa
/// is automatically enforced by this structure.
///
/// Other options for storing GPU-accessible data are:
/// * [`StorageBuffer`](crate::render_resource::StorageBuffer)
/// * [`DynamicStorageBuffer`](crate::render_resource::DynamicStorageBuffer)
/// * [`UniformBuffer`](crate::render_resource::UniformBuffer)
/// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
/// * [`GpuList`](crate::render_resource::GpuList)
/// * [`BufferVec`](crate::render_resource::BufferVec)
/// * [`Texture`](crate::render_resource::Texture)
///
@@ -153,8 +155,10 @@ impl<T: ShaderType + WriteInto> StorageBuffer<T> {
///
/// Other options for storing GPU-accessible data are:
/// * [`StorageBuffer`](crate::render_resource::StorageBuffer)
/// * [`DynamicStorageBuffer`](crate::render_resource::DynamicStorageBuffer)
/// * [`UniformBuffer`](crate::render_resource::UniformBuffer)
/// * [`DynamicUniformBuffer`](crate::render_resource::DynamicUniformBuffer)
/// * [`GpuList`](crate::render_resource::GpuList)
/// * [`BufferVec`](crate::render_resource::BufferVec)
/// * [`Texture`](crate::render_resource::Texture)
///