Optimize base and shadow meshes for vertex cache #94241

zeux · 2024-07-11T23:17:32Z

Previously, vertex cache optimization was run for the LOD meshes, but never for the base mesh or for the shadow meshes, including the shadow LOD chain (the shadow LOD chain would sometimes get implicitly optimized for vertex cache as a byproduct of base LOD optimization, but not always). This could significantly affect the rendering performance of geometry-heavy scenes, especially in depth or shadow passes where the fragment load is light.
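To make "vertex cache optimization" concrete: the GPU reuses recently transformed vertices, so the same triangles submitted in a different order can trigger very different numbers of vertex shader invocations. The sketch below assumes a simple 16-entry FIFO post-transform cache (real hardware uses different, often undocumented schemes) and compares a row-by-row grid against the same triangles visited with a large stride. `grid_indices` and `strided_order` are hypothetical helpers for illustration only.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Counts vertex shader invocations under an assumed FIFO cache model.
static size_t count_transforms(const std::vector<unsigned>& indices, size_t cache_size) {
    std::deque<unsigned> cache;
    size_t misses = 0;
    for (unsigned v : indices) {
        if (std::find(cache.begin(), cache.end(), v) != cache.end())
            continue; // hit: the transformed vertex is reused
        ++misses;     // miss: the vertex shader runs again
        cache.push_back(v);
        if (cache.size() > cache_size)
            cache.pop_front();
    }
    return misses;
}

// Average cache miss ratio (ACMR): transformed vertices per triangle.
// Lower is better; 3.0 means no reuse at all.
static double acmr(const std::vector<unsigned>& indices, size_t cache_size = 16) {
    return double(count_transforms(indices, cache_size)) / double(indices.size() / 3);
}

// Hypothetical helper: a w x h grid of quads with shared vertices, emitted row by row.
static std::vector<unsigned> grid_indices(unsigned w, unsigned h) {
    std::vector<unsigned> ib;
    for (unsigned y = 0; y < h; ++y)
        for (unsigned x = 0; x < w; ++x) {
            unsigned a = y * (w + 1) + x, b = a + 1, c = a + (w + 1), d = c + 1;
            ib.insert(ib.end(), {a, c, b, b, c, d});
        }
    return ib;
}

// Hypothetical helper: same triangles, visited with a fixed stride so neighbours
// end up far apart in the stream. `stride` must be coprime with the triangle
// count so that every triangle is emitted exactly once.
static std::vector<unsigned> strided_order(const std::vector<unsigned>& ib, size_t stride) {
    const size_t n = ib.size() / 3;
    std::vector<unsigned> out;
    for (size_t i = 0; i < n; ++i) {
        size_t t = (i * stride) % n;
        out.insert(out.end(), {ib[3 * t], ib[3 * t + 1], ib[3 * t + 2]});
    }
    return out;
}
```

For a 32x32 quad grid, `acmr(grid_indices(32, 32))` works out to about 1.03 under this model, while `acmr(strided_order(grid_indices(32, 32), 97))` gets no reuse at all (3.0), even though both buffers draw identical triangles.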

This PR unconditionally runs the optimization for the base mesh before further processing, and for any generated shadow index buffers; if the meshoptimizer module is not loaded, we silently skip the processing. Note that this is the same algorithm we already use for LOD index buffers.
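The "silently skip if meshoptimizer is not loaded" behaviour can be pictured as an optional function hook that stays null unless a module registers an optimizer. The sketch below assumes that shape; `g_optimize_vertex_cache`, `maybe_optimize_indices` and `greedy_optimize` are hypothetical names, not Godot's code, and `greedy_optimize` is a deliberately simple O(n²) greedy stand-in, not the algorithm meshoptimizer uses.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <vector>

// Hypothetical optional hook: null unless an optimizer module registers itself.
using OptimizeVertexCacheFunc = void (*)(unsigned* dst, const unsigned* indices,
                                         size_t index_count, size_t vertex_count);
static OptimizeVertexCacheFunc g_optimize_vertex_cache = nullptr;

// Reorders triangles if an optimizer is available; otherwise silently does nothing.
// Buffers that are not a whole number of triangles are also left untouched.
static void maybe_optimize_indices(std::vector<unsigned>& indices, size_t vertex_count) {
    if (!g_optimize_vertex_cache || indices.empty() || indices.size() % 3 != 0)
        return;
    std::vector<unsigned> out(indices.size());
    g_optimize_vertex_cache(out.data(), indices.data(), indices.size(), vertex_count);
    indices.swap(out);
}

// Stand-in optimizer (illustration only): repeatedly emits the remaining triangle
// with the most vertices already in a 16-entry FIFO cache; ties go to the earliest.
static void greedy_optimize(unsigned* dst, const unsigned* indices, size_t index_count,
                            size_t /*vertex_count*/) {
    const size_t tri_count = index_count / 3, cache_size = 16;
    std::vector<bool> emitted(tri_count, false);
    std::deque<unsigned> cache;
    for (size_t out = 0; out < tri_count; ++out) {
        size_t best = 0;
        int best_score = -1;
        for (size_t t = 0; t < tri_count; ++t) {
            if (emitted[t]) continue;
            int score = 0;
            for (int k = 0; k < 3; ++k)
                score += std::find(cache.begin(), cache.end(), indices[3 * t + k]) != cache.end();
            if (score > best_score) { best_score = score; best = t; }
        }
        emitted[best] = true;
        for (int k = 0; k < 3; ++k) {
            unsigned v = indices[3 * best + k];
            dst[3 * out + k] = v; // triangle indices are copied as-is, only the order changes
            if (std::find(cache.begin(), cache.end(), v) == cache.end()) {
                cache.push_back(v);
                if (cache.size() > cache_size) cache.pop_front();
            }
        }
    }
}
```

With the hook left null, `maybe_optimize_indices` is a no-op; after `g_optimize_vertex_cache = greedy_optimize;` the same call reorders triangles so that ones sharing cached vertices are emitted next to each other.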

I generally treat this optimization as "always on, do no harm": it only changes the order of triangles, which is generally indeterminate on import anyway, and it is fairly quick. For a sense of scale, it is ~6x faster than tangent generation and ~25x faster than LOD generation (measured before my previous optimization PR, so maybe ~10x after?), and consequently should not change the import time much. I've tested this with the DragonAttenuation model (https://github.com/KhronosGroup/glTF-Sample-Models/tree/main/2.0/DragonAttenuation) and didn't see a statistically measurable change in overall import time. The appearance of any model should be the same: this only changes the submitted triangle order within each mesh, which has no impact on opaque meshes and should not make transparent meshes worse, since the order of their triangles could not be relied upon anyway.
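Because the optimization only reorders triangles, a before/after index buffer pair should contain exactly the same triangles with the same winding. A minimal checker, assuming triangles may be rotated (which preserves winding and facing) but never flipped; `canonical` and `same_triangles` are hypothetical helpers for illustration, not part of Godot or meshoptimizer.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

using Tri = std::array<unsigned, 3>;

// Rotates a triangle so its smallest index comes first. Rotation keeps the
// winding intact, unlike sorting the three indices.
static Tri canonical(unsigned a, unsigned b, unsigned c) {
    if (b < a && b <= c) return {b, c, a};
    if (c < a && c < b) return {c, a, b};
    return {a, b, c};
}

// True if `after` holds exactly the triangles of `before` (same winding,
// same multiplicity), possibly in a different order.
static bool same_triangles(const std::vector<unsigned>& before, const std::vector<unsigned>& after) {
    if (before.size() != after.size() || before.size() % 3 != 0)
        return false;
    auto collect = [](const std::vector<unsigned>& ib) {
        std::vector<Tri> tris;
        for (size_t i = 0; i < ib.size(); i += 3)
            tris.push_back(canonical(ib[i], ib[i + 1], ib[i + 2]));
        std::sort(tris.begin(), tris.end());
        return tris;
    };
    return collect(before) == collect(after);
}
```

A reordered buffer such as `{1, 3, 2, 0, 1, 2}` passes against `{0, 1, 2, 2, 1, 3}`, while flipping one triangle to `{0, 2, 1}` fails, which is the property that keeps opaque rendering identical.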

As with any hardware performance optimization, this is hard to measure well. On a scene with 28 clones of the model above, with some objects close to the camera (LOD 0) and some further away, my aggregate measurements on an NVIDIA RTX 4090 show the scene rendering ~17% faster in terms of full frame time. Most of the gains come from the shadow mesh optimization (roughly 11% from shadow mesh optimization, plus 6% on top from base mesh optimization): depth pre-pass and shadow passes tend to be vertex/raster bound, and the shadow mesh is rendered multiple times, so that makes sense. Note that other meshes may show no performance gains (for example, if a mesh is fairly low-poly, or if the scene has been preprocessed with tools like gltfpack that already generate an optimal order, the gains will be small to non-existent), and could also show larger gains (as the original order can be more pathologically bad depending on the exporter). Realistically, I would not expect a double-digit performance improvement on realistic scenes, but the gains are free.

The measurements quoted above were taken with VSync disabled, using full-frame FPS. If we measure the GPU time of the individual passes (using Godot's Visual Profiler), the relative gains are more significant. Note that I'm using the numbers as displayed by the profiler (2 decimal digits); my GPU is clearly too fast for this 😝:

| Pass | Time (before) | Time (after) | Improvement |
|---|---|---|---|
| Depth Pre-Pass | 0.09 ms | 0.06 ms | ~33% |
| Shadows | 0.12 ms | 0.09 ms | ~25% |
| Opaque Pass | 0.18 ms | 0.15 ms | ~16% |
| (Total) 3D Scene | 0.44 ms | 0.34 ms | ~22% |
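The Improvement column is the relative reduction in pass time, (before - after) / before. Since the profiler shows only two decimals, each reading could be off by up to 0.005 ms, which makes the percentages for these very short passes coarse. A small sketch of both calculations; `improvement_pct`, `improvement_range` and `Range` are illustrative names, not profiler APIs.

```cpp
#include <cassert>
#include <cmath>

// Relative reduction in GPU time, in percent.
static double improvement_pct(double before_ms, double after_ms) {
    return (before_ms - after_ms) / before_ms * 100.0;
}

struct Range {
    double lo, hi; // smallest and largest improvement, in percent
};

// Each displayed value may be off by up to half of the last displayed digit.
// The smallest improvement pairs the lowest "before" with the highest "after",
// and the largest improvement pairs the opposite extremes.
static Range improvement_range(double before_ms, double after_ms, double half_step = 0.005) {
    return {improvement_pct(before_ms - half_step, after_ms + half_step),
            improvement_pct(before_ms + half_step, after_ms - half_step)};
}
```

With the values above, the Opaque Pass and total rows work out to about 16.7% and 22.7%, and `improvement_range(0.09, 0.06)` spans roughly 23.5% to 42.1% for the depth pre-pass, so the per-pass figures are best read as rough indications.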

Calinou · 2024-07-12T00:30:15Z

This could also benefit #94097 when using complex PrimitiveMeshes.

Like create_shadow_mesh(), which is not exposed yet, the method in ImporterMesh may be worth exposing to scripting, so that procedural geometry generation scripts can make use of it. In general, it should be possible for procedurally generated meshes to achieve the same level of optimization as pre-authored meshes (assuming you can spend the time doing this processing once, when the mesh is first generated).

zeux · 2024-07-12T00:42:18Z

Would procedural geometry use ImporterMesh or SurfaceTool? Asking because SurfaceTool already exposes optimize_indices_for_cache.

fire · 2024-07-12T04:22:19Z

At this point, I expect to use both for procedural generation and CSG, but I am in favour of this.

clayjohn

Looks great! I'm glad to see that the performance benefits are so tangible.

mrjustaguy · 2024-07-12T08:37:13Z

#68959 should be tested with this.

zeux · 2024-07-12T16:10:11Z

@mrjustaguy That issue should not be affected by this change in isolation, for two reasons. 1) This PR only adds the relevant functionality to the glTF import path. Adding it to .obj is a matter of adding the following to the .obj importer, but I'm worried this will cause conflicts with #94108, so I'd rather do that separately, or as part of that change if this ends up getting merged first:

```diff
+       for (int i = 0; i < r_meshes.size(); i++) {
+               r_meshes.get(i)->optimize_indices_for_cache();
+       }
```

2) That issue has a geometry file which is basically a lot of cubes. Due to the lack of vertex sharing, models with faceted shading (cubes or otherwise) are in general simultaneously inefficient to render and mostly unaffected by vertex cache optimizations. That said, I would assume #94108 (Add Generate LODs, Shadow Mesh and Lightmap UV2 options to OBJ mesh import) helps somewhat, as it adds shadow meshes, which should accelerate depth pre-pass/shadow rendering and get a further small boost from this PR.
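One way to see why faceted meshes barely benefit: no ordering can transform fewer vertices than the number of distinct vertices referenced, so unique vertices divided by triangles is a hard lower bound on vertices transformed per triangle. If every cube face owns its 4 vertices (as flat normals require), that bound is 2.0, and a natural face-by-face order already reaches it; fully unshared triangles are stuck at 3.0. A smooth regular grid, by contrast, has roughly one vertex per two triangles, so its bound is near 0.5. The sketch assumes a 16-entry FIFO cache model, and `faceted_cube_indices` is a hypothetical layout, not the actual geometry from #68959.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <deque>
#include <unordered_set>
#include <vector>

// Vertices transformed per triangle under an assumed FIFO cache model.
static double acmr(const std::vector<unsigned>& indices, size_t cache_size = 16) {
    std::deque<unsigned> cache;
    size_t misses = 0;
    for (unsigned v : indices) {
        if (std::find(cache.begin(), cache.end(), v) != cache.end()) continue;
        ++misses;
        cache.push_back(v);
        if (cache.size() > cache_size) cache.pop_front();
    }
    return double(misses) / double(indices.size() / 3);
}

// Every distinct vertex must be transformed at least once, whatever the order.
static double acmr_lower_bound(const std::vector<unsigned>& indices) {
    std::unordered_set<unsigned> unique(indices.begin(), indices.end());
    return double(unique.size()) / double(indices.size() / 3);
}

// Hypothetical faceted cubes: each face has its own 4 vertices and 2 triangles,
// with no vertices shared across faces.
static std::vector<unsigned> faceted_cube_indices(unsigned cube_count) {
    std::vector<unsigned> ib;
    for (unsigned f = 0; f < 6 * cube_count; ++f) {
        unsigned b = f * 4;
        ib.insert(ib.end(), {b, b + 1, b + 2, b, b + 2, b + 3});
    }
    return ib;
}
```

For `faceted_cube_indices(100)`, both `acmr` and `acmr_lower_bound` return 2.0, so a vertex cache optimizer has nothing left to gain on such a mesh.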

Edit: confirmed that shadow meshes help on that file; the depth pre-pass drops from 0.45 ms to 0.24 ms on a 4090. With shadow mesh creation but without this change, the depth pre-pass drops to 0.26 ms, so this PR gives a small improvement for shadow meshes even in this edge case, which is nice.
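Working through the depth pre-pass timings quoted above as relative reductions in GPU time (using the displayed values, so the percentages are approximate):

```math
\frac{0.45 - 0.26}{0.45} \approx 42.2\%\ \text{(shadow meshes alone)}, \qquad
\frac{0.26 - 0.24}{0.26} \approx 7.7\%\ \text{(this PR on top)}, \qquad
\frac{0.45 - 0.24}{0.45} \approx 46.7\%\ \text{(combined)}
```

The combined figure is consistent with chaining the two steps, since $1 - \frac{0.26}{0.45}\cdot\frac{0.24}{0.26} = 1 - \frac{0.24}{0.45}$.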

mrjustaguy · 2024-07-12T20:36:48Z

I mean, that was really a stress test to compare Godot 3 and Godot 4 primitive performance.

Though I think there have been a few optimizations relevant to it since I last looked at it, so I don't know how Godot 4 compares to 3 today in that respect.

Calinou · 2024-07-12T20:55:56Z

> Would procedural geometry use ImporterMesh or SurfaceTool? Asking because SurfaceTool already exposes optimize_indices_for_cache.

Procedural geometry generation is done with SurfaceTool, but ImporterMesh also exposes similar functions so that import scripts can make use of it.

akien-mga · 2024-08-16T08:53:10Z

Needs a rebase to resolve merge conflicts after some initial merges in 4.4.
Then it's in the queue for merging ASAP.


zeux · 2024-08-16T14:40:51Z

Rebased vs master.

akien-mga · 2024-08-16T21:52:05Z

Thanks!

zeux requested a review from a team as a code owner July 11, 2024 23:17

zeux force-pushed the optimize-cache branch 2 times, most recently from 5c1b821 to a6db472 July 11, 2024 23:22

Calinou added enhancement topic:rendering topic:3d performance labels Jul 12, 2024

Calinou added this to the 4.x milestone Jul 12, 2024

zeux mentioned this pull request Jul 12, 2024

Add Generate LODs, Shadow Mesh and Lightmap UV2 options to OBJ mesh import #94108 (open)

clayjohn approved these changes Jul 12, 2024


clayjohn modified the milestones: 4.x, 4.4 Jul 12, 2024

LiveTrower pushed a commit to LiveTrower/godot that referenced this pull request Aug 12, 2024

part of godotengine#94241

8a4f4c6

zeux force-pushed the optimize-cache branch from a6db472 to 0fde03c August 16, 2024 14:39

akien-mga merged commit 759d7d4 into godotengine:master Aug 16, 2024
18 checks passed

matheusmdx mentioned this pull request Sep 11, 2024

crash opening glb file #96869 (closed)

zeux mentioned this pull request Sep 11, 2024

Fix a crash in ImporterMesh::create_shadow_mesh for non-triangle surfaces #96880 (merged)
