Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Directly copy data into uniform buffers #8340

Closed
wants to merge 8 commits into from

Conversation

james7132
Copy link
Member

@james7132 james7132 commented Apr 9, 2023

Objective

Fixes #8307. Partially addresses #4642. As seen in #8284, we're actually copying data twice in Prepare stage systems. Once into a CPU-side intermediate scratch buffer, and once again into a mapped buffer. This is inefficient and effectively doubles the time spent and memory allocated to run these systems.

Solution

Remove the scratch buffer entirely and use wgpu::Queue::write_buffer_with to directly write data into mapped buffers.

Separately, this also directly uses wgpu::Limits::min_uniform_buffer_offset_alignment to set up the alignment when writing to the buffers. Partially addressing the issue raised in #4642.

TODO


Changelog

Added: DynamicUniformBuffer::get_writer
Added: DynamicUniformBufferWriter
Removed: DynamicUniformBuffer::clear
Removed: DynamicUniformBuffer::push
Removed: DynamicUniformBuffer::write_buffer

Migration Guide

TODO

@james7132 james7132 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Apr 9, 2023
@james7132 james7132 requested a review from superdump April 9, 2023 22:43
buffer: Option<Buffer>,
label: Option<String>,
changed: bool,
buffer_usage: BufferUsages,
_marker: PhantomData<fn() -> T>,
}

pub struct DynamicUniformBufferWriter<'a, T> {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TODO: Need to document this type and it's relationship with DynamicUniformBuffer.

@github-actions
Copy link
Contributor

github-actions bot commented Apr 9, 2023

Example alien_cake_addict failed to run, please try running it locally and check the result.

@alice-i-cecile alice-i-cecile added this to the 0.11 milestone Apr 10, 2023
@james7132 james7132 changed the title Directly copy data into mapped uniform buffers Directly copy data into uniform buffers Apr 10, 2023
@github-actions
Copy link
Contributor

Example alien_cake_addict failed to run, please try running it locally and check the result.

@james7132
Copy link
Member Author

Got this working normally, but there's actually a slight CPU time performance regression. Not sure where it's coming from.

This should, however, still be a win in terms of allocated memory both on the CPU and GPU due to using the alignment defined by wgpu::Limits.

@james7132
Copy link
Member Author

Turns out that Queue::write_buffer_with allocates a staging buffer, which sort of defeats the purpose of switching to it over keeping a persistent Vec around. This additional allocation and clearing probably is also why there's a slight performance regression.

Apparently wgpu::util::StagingBelt may be able to alleviate this issue by pooling staging buffers.

Reverting this to what we have now, but keeping the Limits based constructor.

@JMS55 JMS55 modified the milestones: 0.11, 0.12 Jun 11, 2023
@B-head B-head mentioned this pull request Jul 4, 2023
8 tasks
@james7132
Copy link
Member Author

Closing this in favor of #9865, as it's no longer possible to update this cleanly without breaking the changes in #8204.

@james7132 james7132 closed this Sep 20, 2023
github-merge-queue bot pushed a commit that referenced this pull request Sep 25, 2023
# Objective
This is a minimally disruptive version of #8340. I attempted to update
it, but failed due to the scope of the changes added in #8204.

Fixes #8307. Partially addresses #4642. As seen in
#8284, we're actually copying
data twice in Prepare stage systems. Once into a CPU-side intermediate
scratch buffer, and once again into a mapped buffer. This is inefficient
and effectively doubles the time spent and memory allocated to run these
systems.

## Solution
Skip the scratch buffer entirely and use
`wgpu::Queue::write_buffer_with` to directly write data into mapped
buffers.

Separately, this also directly uses
`wgpu::Limits::min_uniform_buffer_offset_alignment` to set up the
alignment when writing to the buffers. Partially addressing the issue
raised in #4642.

Storage buffers and the abstractions built on top of
`DynamicUniformBuffer` will need to come in followup PRs.

This may not have a noticeable performance difference in this PR, as the
only first-party systems affected by this are view related, and likely
are not going to be particularly heavy.

---

## Changelog
Added: `DynamicUniformBuffer::get_writer`.
Added: `DynamicUniformBufferWriter`.
rdrpenguin04 pushed a commit to rdrpenguin04/bevy that referenced this pull request Jan 9, 2024
# Objective
This is a minimally disruptive version of bevyengine#8340. I attempted to update
it, but failed due to the scope of the changes added in bevyengine#8204.

Fixes bevyengine#8307. Partially addresses bevyengine#4642. As seen in
bevyengine#8284, we're actually copying
data twice in Prepare stage systems. Once into a CPU-side intermediate
scratch buffer, and once again into a mapped buffer. This is inefficient
and effectively doubles the time spent and memory allocated to run these
systems.

## Solution
Skip the scratch buffer entirely and use
`wgpu::Queue::write_buffer_with` to directly write data into mapped
buffers.

Separately, this also directly uses
`wgpu::Limits::min_uniform_buffer_offset_alignment` to set up the
alignment when writing to the buffers. Partially addressing the issue
raised in bevyengine#4642.

Storage buffers and the abstractions built on top of
`DynamicUniformBuffer` will need to come in followup PRs.

This may not have a noticeable performance difference in this PR, as the
only first-party systems affected by this are view related, and likely
are not going to be particularly heavy.

---

## Changelog
Added: `DynamicUniformBuffer::get_writer`.
Added: `DynamicUniformBufferWriter`.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Default alignment for DynamicStorageBuffer is incorrect. Direct copy API for Buffer wrappers
3 participants