Directly copy data into uniform buffers #8340

james7132 · 2023-04-09T22:43:54Z

Objective

Fixes #8307. Partially addresses #4642. As seen in #8284, we're actually copying data twice in Prepare stage systems: once into a CPU-side intermediate scratch buffer, and again into a mapped buffer. This is inefficient and effectively doubles the time spent and memory allocated to run these systems.

Solution

Remove the scratch buffer entirely and use wgpu::Queue::write_buffer_with to directly write data into mapped buffers.
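To make the difference concrete, here is a minimal sketch that runs without a GPU. It is not the PR's code: a plain byte slice stands in for the mapped buffer, and the function names `upload_via_scratch` and `upload_direct` are hypothetical. It only illustrates that both paths produce the same bytes while the scratch path copies twice as much data.

```rust
// A plain byte slice stands in for the mapped GPU buffer (an assumption made
// so the sketch runs without wgpu).
fn upload_via_scratch(values: &[u32], scratch: &mut Vec<u8>, mapped: &mut [u8]) -> usize {
    // Copy 1: serialize every value into the CPU-side scratch buffer.
    scratch.clear();
    for v in values {
        scratch.extend_from_slice(&v.to_le_bytes());
    }
    // Copy 2: copy the scratch buffer into the mapped destination.
    let n = scratch.len();
    mapped[..n].copy_from_slice(&scratch[..n]);
    n * 2 // total bytes copied across both passes
}

fn upload_direct(values: &[u32], mapped: &mut [u8]) -> usize {
    // Single copy: serialize each value straight into the destination.
    let mut copied = 0;
    for (chunk, v) in mapped.chunks_exact_mut(4).zip(values) {
        chunk.copy_from_slice(&v.to_le_bytes());
        copied += 4;
    }
    copied
}

fn main() {
    let values = [1u32, 2, 3, 4];
    let mut scratch = Vec::new();
    let mut a = vec![0u8; 16];
    let mut b = vec![0u8; 16];
    let two_pass = upload_via_scratch(&values, &mut scratch, &mut a);
    let one_pass = upload_direct(&values, &mut b);
    // Both paths leave identical bytes in the destination.
    assert_eq!(a, b);
    println!("scratch path: {} bytes copied, direct path: {} bytes copied", two_pass, one_pass);
}
```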

Separately, this also directly uses wgpu::Limits::min_uniform_buffer_offset_alignment to set up the alignment when writing to the buffers, partially addressing the issue raised in #4642.
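As a rough illustration of why the device-reported alignment matters, the sketch below rounds an element size up to a given alignment and derives per-element offsets. The helper names `align_up` and `dynamic_offset`, the 80-byte element size, and the two alignment values are assumptions chosen for the example, not values taken from this PR.

```rust
/// Rounds `size` up to the next multiple of `alignment`.
/// `alignment` must be a power of two, as buffer offset alignments are.
fn align_up(size: u64, alignment: u64) -> u64 {
    assert!(alignment.is_power_of_two());
    (size + alignment - 1) & !(alignment - 1)
}

/// Byte offset of the `index`-th element when every element starts on an
/// aligned boundary.
fn dynamic_offset(index: u64, element_size: u64, alignment: u64) -> u64 {
    index * align_up(element_size, alignment)
}

fn main() {
    // Hypothetical 80-byte uniform, laid out under two possible device limits.
    let element_size = 80;
    for alignment in [64, 256] {
        let stride = align_up(element_size, alignment);
        let third = dynamic_offset(3, element_size, alignment);
        println!("alignment {}: stride {}, offset of element 3 = {}", alignment, stride, third);
    }
}
```

Under these assumed numbers, a device reporting an alignment of 64 needs half the bytes per element that a fixed 256-byte alignment would, which is the kind of memory saving that reading the limit from the device can give.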

TODO

Figure out why this is panicking on basic examples.
Implement this for storage buffers (should resolve #5411, "Default alignment for DynamicStorageBuffer is incorrect").
Test against stress tests to ensure this results in the performance improvements we're expecting.
Write migration guide.

Changelog

Added: DynamicUniformBuffer::get_writer
Added: DynamicUniformBufferWriter
Removed: DynamicUniformBuffer::clear
Removed: DynamicUniformBuffer::push
Removed: DynamicUniformBuffer::write_buffer
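
One way to picture the shift from a push-then-upload API to a writer-based one is the stdlib-only mock below. It is not Bevy's implementation, and the names `UniformWriter`, `new`, and `write` are hypothetical, as is the assumption that a write returns the element's dynamic offset. The mock borrows a destination slice (standing in for the mapped buffer) and places each element at the next aligned offset.

```rust
/// Hypothetical stand-in for a writer that borrows the destination memory.
struct UniformWriter<'a> {
    mapped: &'a mut [u8],
    stride: usize, // aligned size of one element
    next: usize,   // offset at which the next element will be written
}

impl<'a> UniformWriter<'a> {
    fn new(mapped: &'a mut [u8], stride: usize) -> Self {
        UniformWriter { mapped, stride, next: 0 }
    }

    /// Writes one element directly into the destination and returns its
    /// offset, or `None` if the element does not fit.
    fn write(&mut self, bytes: &[u8]) -> Option<u32> {
        if bytes.len() > self.stride || self.next + self.stride > self.mapped.len() {
            return None;
        }
        let start = self.next;
        self.mapped[start..start + bytes.len()].copy_from_slice(bytes);
        self.next += self.stride;
        Some(start as u32)
    }
}

fn main() {
    // Room for three elements at an assumed stride of 256 bytes.
    let mut mapped = vec![0u8; 3 * 256];
    let mut writer = UniformWriter::new(&mut mapped, 256);
    let offsets: Vec<Option<u32>> = (0u32..4)
        .map(|i| writer.write(&i.to_le_bytes()))
        .collect();
    println!("{:?}", offsets);
}
```

Because the writer holds a borrow of the destination for its whole lifetime, the caller has to know the maximum element count up front, which is a plausible reason the push and clear methods no longer fit this design.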

Migration Guide

TODO

james7132 · 2023-04-09T22:45:47Z

crates/bevy_render/src/render_resource/uniform_buffer.rs

    _marker: PhantomData<fn() -> T>,
 }

+pub struct DynamicUniformBufferWriter<'a, T> {


TODO: Need to document this type and its relationship with DynamicUniformBuffer.

github-actions · 2023-04-09T22:52:52Z

Example alien_cake_addict failed to run, please try running it locally and check the result.

github-actions · 2023-04-10T05:14:17Z

Example alien_cake_addict failed to run, please try running it locally and check the result.

james7132 · 2023-04-10T18:10:54Z

Got this working normally, but there's actually a slight CPU time performance regression. Not sure where it's coming from.

This should, however, still be a win in terms of allocated memory both on the CPU and GPU due to using the alignment defined by wgpu::Limits.

james7132 · 2023-04-18T21:08:12Z

Turns out that Queue::write_buffer_with allocates a staging buffer, which sort of defeats the purpose of switching to it over keeping a persistent Vec around. This additional allocation and clearing is probably also why there's a slight performance regression.

Apparently wgpu::util::StagingBelt may be able to alleviate this issue by pooling staging buffers.
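
The pooling idea can be sketched without wgpu. The mock below is not StagingBelt's implementation; `StagingPool`, `acquire`, and `recall` are hypothetical names, and plain `Vec<u8>` chunks stand in for staging buffers. It only shows how recycling chunks across frames avoids a fresh allocation per write.

```rust
/// Hypothetical pool of reusable staging chunks.
struct StagingPool {
    free: Vec<Vec<u8>>,
    allocations: usize, // number of fresh allocations, for illustration
}

impl StagingPool {
    fn new() -> Self {
        StagingPool { free: Vec::new(), allocations: 0 }
    }

    /// Hands out a zeroed chunk of `size` bytes, reusing a free chunk with
    /// enough capacity when one exists.
    fn acquire(&mut self, size: usize) -> Vec<u8> {
        if let Some(pos) = self.free.iter().position(|c| c.capacity() >= size) {
            let mut chunk = self.free.swap_remove(pos);
            chunk.clear();
            chunk.resize(size, 0); // stays within capacity, so no reallocation
            return chunk;
        }
        self.allocations += 1;
        vec![0u8; size]
    }

    /// Returns a chunk to the pool once its contents have been consumed.
    fn recall(&mut self, chunk: Vec<u8>) {
        self.free.push(chunk);
    }
}

fn main() {
    let mut pool = StagingPool::new();
    for frame in 0u8..3 {
        let mut chunk = pool.acquire(1024);
        chunk[0] = frame; // stand-in for writing this frame's uniform data
        pool.recall(chunk);
    }
    println!("fresh allocations over 3 frames: {}", pool.allocations);
}
```

Note that this mock still zeroes each chunk on reuse, so it removes the allocation cost mentioned above but not the clearing cost.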

Reverting this to what we have now, but keeping the Limits-based constructor.

This reverts commit 05e7954.

james7132 · 2023-09-20T10:10:21Z

Closing this in favor of #9865, as it's no longer possible to update this cleanly without breaking the changes in #8204.

# Objective

This is a minimally disruptive version of #8340. I attempted to update it, but failed due to the scope of the changes added in #8204.

Fixes #8307. Partially addresses #4642. As seen in #8284, we're actually copying data twice in Prepare stage systems: once into a CPU-side intermediate scratch buffer, and again into a mapped buffer. This is inefficient and effectively doubles the time spent and memory allocated to run these systems.

## Solution

Skip the scratch buffer entirely and use `wgpu::Queue::write_buffer_with` to directly write data into mapped buffers.

Separately, this also directly uses `wgpu::Limits::min_uniform_buffer_offset_alignment` to set up the alignment when writing to the buffers, partially addressing the issue raised in #4642.

Storage buffers and the abstractions built on top of `DynamicUniformBuffer` will need to come in followup PRs.

This may not have a noticeable performance difference in this PR, as the only first-party systems affected by this are view related, and likely are not going to be particularly heavy.

---

## Changelog

Added: `DynamicUniformBuffer::get_writer`.
Added: `DynamicUniformBufferWriter`.


Directly copy data into mapped uniform buffers

ca17ebe

james7132 added A-Rendering Drawing game state to the screen C-Performance A change motivated by improving speed, memory usage or compile times labels Apr 9, 2023

james7132 requested a review from superdump April 9, 2023 22:43


alice-i-cecile added this to the 0.11 milestone Apr 10, 2023

james7132 added 2 commits April 9, 2023 22:01

Fix panics

b6a7a0a

Remove unused DerefMut

b99814c

james7132 changed the title ~~Directly copy data into mapped uniform buffers~~ Directly copy data into uniform buffers Apr 10, 2023

Fix alignment issues

c9dd906

james7132 added 4 commits April 18, 2023 14:34

Revert to using a scratch Vec

05e7954

Merge branch 'main' into buffer-direct-copy

aa53a50

Revert "Revert to using a scratch Vec"

e05f57e

This reverts commit 05e7954.

Merge branch 'main' into buffer-direct-copy

827c617

JMS55 modified the milestones: 0.11, 0.12 Jun 11, 2023

B-head mentioned this pull request Jul 4, 2023

Rework winit runner #9034

Draft

8 tasks

james7132 mentioned this pull request Sep 20, 2023

Directly copy data into uniform buffers #9865

Merged

james7132 closed this Sep 20, 2023
