Bump allocation for Uniform Buffers on WebGPU #5438

Merged 3 commits on Jun 27, 2023

@mvaligursky mvaligursky commented Jun 26, 2023

Before this PR, each UniformBuffer allocated its own internal GPUBuffer storage and, every frame, copied its CPU storage content to it using writeBuffer. This required many writeBuffer calls, which is expensive in both CPU and GPU time.

This PR implements a more performant approach. Under the hood, one or more large (1MB) GPU buffers are allocated, along with a pool of staging buffers of the same size. Individual uniform buffers allocate their storage from the staging buffers using a bump allocator. Then, just before the command buffers are submitted, a command buffer is added to execute first, which copies the used staging buffers to the GPU buffers.
Here's an example of the buffers used. Note that the number of staging buffers grows each time command buffers are submitted, as the already existing staging buffers can no longer be reused.
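
To illustrate the bump allocation described above, here is a minimal sketch in plain JavaScript. It is not the engine's implementation: the class name `BumpAllocator`, its methods, and the 256-byte alignment (WebGPU's default `minUniformBufferOffsetAlignment`) are assumptions made for illustration, and buffers are tracked only by index rather than as real GPU objects.

```javascript
// Hypothetical sketch of bump allocation over fixed-size staging buffers.
// Names and the alignment value are illustrative, not the engine's API.
const BUFFER_SIZE = 1024 * 1024; // 1MB, the size mentioned in the PR description
const ALIGNMENT = 256;           // assumed offset alignment for uniform buffer bindings

class BumpAllocator {
    constructor(bufferSize = BUFFER_SIZE, alignment = ALIGNMENT) {
        this.bufferSize = bufferSize;
        this.alignment = alignment;
        this.bufferCount = 0;     // number of staging buffers taken so far
        this.offset = bufferSize; // "full" marker, so the first alloc takes a buffer
    }

    // Reserve `size` bytes and return where they live.
    alloc(size) {
        // Round the reservation up to the alignment boundary.
        const aligned = Math.ceil(size / this.alignment) * this.alignment;
        if (aligned > this.bufferSize) {
            throw new Error('allocation larger than a staging buffer');
        }
        // If it does not fit in the current buffer, move on to a fresh one.
        if (this.offset + aligned > this.bufferSize) {
            this.bufferCount++;
            this.offset = 0;
        }
        const result = { bufferIndex: this.bufferCount - 1, offset: this.offset, size };
        this.offset += aligned; // "bump" the pointer past the reservation
        return result;
    }

    // Forget all reservations, e.g. once the buffers have been handed off.
    reset() {
        this.bufferCount = 0;
        this.offset = this.bufferSize;
    }
}

// Usage with a small 1024-byte buffer so that the rollover is easy to see.
const allocator = new BumpAllocator(1024, 256);
const a = allocator.alloc(64);  // first buffer, offset 0
const b = allocator.alloc(300); // first buffer, offset 256 (reserves 512 bytes)
const c = allocator.alloc(512); // does not fit, so it lands in a second buffer
```

An allocation is just a pointer increment, which is what makes this cheap compared to giving each uniform buffer its own GPU allocation and upload.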

buffer-allocation

This PR also cleans up some temporary solutions introduced in #5423 to limit the number of expensive submit commands per frame. Before, the command buffer of each render pass was submitted separately; now they are batched into a very small number of submits.
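
The batching can be sketched roughly as follows. This is a simplified illustration, not the engine's code: `SubmitBatcher` and its methods are hypothetical names, and a mock queue with plain strings stands in for a real `GPUQueue` and its command buffers. The only API assumed from WebGPU is that `queue.submit` accepts an array of command buffers and executes them in order.

```javascript
// Hypothetical sketch: collect command buffers and submit them in one call,
// with the staging-copy command buffer placed first so it executes first.
class SubmitBatcher {
    constructor(queue) {
        this.queue = queue; // anything exposing submit(arrayOfCommandBuffers)
        this.pending = [];
    }

    // Queue up a finished command buffer instead of submitting it right away.
    add(commandBuffer) {
        this.pending.push(commandBuffer);
    }

    // Submit everything at once. Returns the number of command buffers submitted.
    flush(copyCommandBuffer) {
        if (this.pending.length === 0) return 0;
        const batch = copyCommandBuffer ?
            [copyCommandBuffer, ...this.pending] :
            [...this.pending];
        this.queue.submit(batch);
        this.pending.length = 0;
        return batch.length;
    }
}

// Usage with a mock queue that records each submit call.
const submitted = [];
const mockQueue = { submit: (buffers) => submitted.push(buffers) };

const batcher = new SubmitBatcher(mockQueue);
batcher.add('shadowCascades');
batcher.add('forwardPass');
const count = batcher.flush('copyStagingToGpu');
```

With a real device, `mockQueue` would be replaced by `device.queue`, and the strings by command buffers returned from `GPUCommandEncoder.finish()`.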

As an example, the shadow cascades example uses a single submit: first copying the staging buffers to the GPU buffers, followed by a single command buffer rendering all shadow cascade render passes, followed by the forward pass of the scene:

shadow-cascades

The multi-view example similarly renders the whole scene using a single submit for all command buffers:

multi-view

If texture uploads are done in a frame (typically in a very small number of places), for example, in this case, the bone texture used by skinning and the clustered lights updated on the CPU, we end up with two submits:

texture-uploads

All rendering issued from the update functions of scripts is submitted separately for now (it could be a single submit as well). For example, the reflection-cubemap example renders the scene using a single submit and does multiple texture reprojections using drawQuadWithShader within the scripts:

reflection-cubemap

Performance

CPU frame time for the hierarchy example with 5000 or so meshes:

  • WebGPU before: 57ms
  • WebGPU now: 48ms (15% improvement)
  • for comparison, WebGL time: 14ms

GPU times. These are based only on the GPU duration reported by the Chrome Profiler. I am not sure what else they capture, and I do not think they are reliable at all.

  • WebGPU before: 12.8ms
  • WebGPU now: 11.8ms
  • WebGL time: hard to estimate in browser, too many displayed bars.

@mvaligursky mvaligursky self-assigned this Jun 26, 2023
@mvaligursky mvaligursky marked this pull request as draft June 26, 2023 11:32
@mvaligursky mvaligursky added the feature and area: graphics labels Jun 26, 2023
@mvaligursky mvaligursky mentioned this pull request Jun 26, 2023
@mvaligursky mvaligursky marked this pull request as ready for review June 26, 2023 13:11
@mvaligursky mvaligursky merged commit c30f8d8 into main Jun 27, 2023
@mvaligursky mvaligursky deleted the mv-dynamic-buffers branch June 27, 2023 08:42