Restructure shadowmap rendering in mobile renderer #76872

Draft · wants to merge 1 commit into master

Conversation

BastiaanOlij
Contributor

This PR attempts to simplify shadow rendering for the mobile renderer, as we're not trying to run things in parallel with GI there.

It also tries out a few recommended performance improvements.

So far this isn't having the desired result, so there is still a lot to be done.

@BastiaanOlij force-pushed the restructure_mobile_shadows branch from 426e5dc to 92f2545 on May 9, 2023 15:01
@BastiaanOlij
Contributor Author

Ok, lots of good feedback after talking with some of the GPU guys at ARM, and I already managed to update a few things.

So, a few things for posterity.

Barriers
So our current implementation for barriers lumps the vertex and fragment shaders together into a BARRIER_MASK_RASTER. This makes total sense on desktop, where the two stages run in lockstep: as the vertex shader finishes the vertices that make up a face, rasterization of that face begins.

But on mobile, the TBDR architecture processes all vertices with the vertex shader first, and then rasterization happens per tile. Our BARRIER_MASK_RASTER and heavy use of BARRIER_MASK_ALL prevented a lot of parallel processing: we had to wait for the fragment shader of the previous render pass to finish before we could start the vertex shader of the next pass, especially with our cubemap omni lights.

I think there are a number of other processes that will benefit from more targeted barriers.

What I've done is introduce BARRIER_MASK_VERTEX and BARRIER_MASK_FRAGMENT enum values and have BARRIER_MASK_RASTER combine those flags. This means the clustered renderer does exactly the same thing it did before, while the mobile renderer gets more control.

When rendering our shadowmap for our cubemap we'll have the following setup:

  • vkCmdCopyBuffer for UBO for cubemap side 1
  • TRANSFER -> VS Barrier
  • RenderPass 1 (Render cubemap side 1)
  • vkCmdCopyBuffer for UBO for cubemap side 2
  • TRANSFER -> VS Barrier
  • RenderPass 2 (Render cubemap side 2)
  • ...
  • vkCmdCopyBuffer for UBO for cubemap side 6
  • TRANSFER -> VS Barrier
  • RenderPass 6 (Render cubemap side 6)
  • FRAG -> FRAG barrier
  • RenderPass 7 (Render cubemap into shadow atlas)

Running the mobile renderer on desktop probably won't benefit much from this split, but since desktop GPUs aren't TBDR anyway, it probably won't make much difference there either.

Uniform buffers
On desktop with dedicated GPUs we need to load data into uniform buffers that reside on GPU memory.
On mobile GPUs (and probably integrated GPUs) however we have unified memory, i.e. the CPU and GPU use the same memory chips.

But we're still creating and updating uniform buffers, some of which are pretty large (like our light buffers) and updated every frame. This means a lot of wasted bandwidth.

On mobile GPUs we can instead make uniform buffers map to our own data structures and use the source data. This does mean that we need to keep that source data around as we often destroy our buffers, and we need to make sure we don't start overwriting data if we're rendering multiple viewports and things like that.

Obviously this needs to be optional logic that detects whether we can map data or must upload it into GPU memory. But if we design the mobile renderer with unified memory in mind, the copy we would otherwise always pay is only introduced on dedicated GPUs.

RenderAreas
Already mentioned this in the OP but it deserves a spotlight. The TBDR architecture means we should be using render areas instead of viewports when rendering to our shadow atlas, or each render pass will push the whole render buffer through the tile system regardless of which tiles are affected.

Strangely, it seems that on desktop the opposite is true.

For now I've added a boolean (set to true for testing) that makes it use render areas; the code for this was already there but commented out, with a remark that the other path is faster on desktop. Further testing and switching logic are required.

Cubemap Shadows
Ok this was one thing that came out of discussing this with Clay. We currently render cubemap shadows to a proper cubemap, and then write a paraboloid representation of this data into our shadow atlas.

That's a lot for mobile and we should investigate alternatives that can be directly rendered into the shadow atlas.

@Calinou
Member

Calinou commented May 9, 2023

Cubemap Shadows
Ok this was one thing that came out of discussing this with Clay. We currently render cubemap shadows to a proper cubemap, and then write a paraboloid representation of this data into our shadow atlas.

That's a lot for mobile and we should investigate alternatives that can be directly rendered into the shadow atlas.

Dual paraboloid mode is still supported for omni lights in 4.0, but the default has been cubemaps since 3.0. This property is set on a per-light basis and is the same on desktop and mobile.

That said, dual paraboloid shadows suffer from lots of distortion if using unsubdivided meshes. Maybe look into tetrahedron shadows (4 faces), which don't suffer from as much distortion but should be faster to render than cubemaps.

Relevant quote from https://github.com/Calinou/tesseract-renderer-design (which only targets desktop hardware, so I think tetrahedral is still worth trying for mobile):

After experimenting with different projection setups for omnidirectional shadows such as tetrahedral (4 faces) or dual-parabolic (2 faces), it was found that the ordinary cubemap (6 faces) layout was best as the larger number of smaller frustums actually provides better opportunities for culling and caching of faces while providing the least amount of projection distortion. However, for multi-tap shadowmap filters, the native cubemap format is insufficient for easily computing the locations of neighboring taps. Also, despite texture arrays allowing for batching of many shadowmaps during a single rendering pass, they do not allow adequate control of sizing of individual shadowmaps and their partitions.

@clayjohn
Member

clayjohn commented May 9, 2023

@Calinou That quote is very interesting. Bastiaan and I were discussing comparing tetrahedral and octahedral shadow maps. Octahedral requires rendering to 8 faces, but the quality is comparable to using cubemaps and the texture lookup is much better than any of the other options.
