Improve vertex upload performance significantly on iOS #5931

frenzibyte · 2023-07-15T12:36:20Z

Closes #5920 (Investigate poor performance of sprite text rendering on iOS)

With the new performance test scenes we have, VBO uploads have proven to be quite slow on iOS, even when building with AOT.

After multiple days of investigation (interrupted by regressions in our rendering systems), it turns out the intermediate vertex storage allocated in VeldridVertexBuffer has a significant impact on performance. In an isolated environment, uploading & drawing 10k quads incurs an extra 7ms of aggregate CPU overhead when vertices are first written to an intermediate vertex storage and then uploaded to the GPU buffer.
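To make that overhead concrete, here is a minimal Python sketch of the two upload paths. It is not the actual `VeldridVertexBuffer` code: `StagedBuffer`, `DirectBuffer`, `fill` and the `bytearray` standing in for GPU memory are all hypothetical names chosen for illustration. The only point it makes is that the staged path copies every vertex twice, while the direct path copies it once.

```python
import struct

# Hypothetical vertex layout: x and y as 32-bit floats.
VERTEX = struct.Struct("<2f")


class StagedBuffer:
    """Writes go to a CPU-side copy first, then get copied to the 'GPU' buffer."""

    def __init__(self, capacity):
        self.staging = bytearray(capacity * VERTEX.size)
        self.gpu = bytearray(capacity * VERTEX.size)  # stand-in for GPU memory
        self.bytes_copied = 0

    def set_vertex(self, index, x, y):
        # First copy: into the intermediate storage.
        VERTEX.pack_into(self.staging, index * VERTEX.size, x, y)
        self.bytes_copied += VERTEX.size

    def upload(self, start, count):
        # Second copy: from the intermediate storage into the GPU buffer.
        lo, hi = start * VERTEX.size, (start + count) * VERTEX.size
        self.gpu[lo:hi] = self.staging[lo:hi]
        self.bytes_copied += hi - lo


class DirectBuffer:
    """Writes go straight into the (shared) 'GPU' buffer."""

    def __init__(self, capacity):
        self.gpu = bytearray(capacity * VERTEX.size)  # stand-in for a shared buffer
        self.bytes_copied = 0

    def set_vertex(self, index, x, y):
        # Only copy: directly into the buffer the GPU reads from.
        VERTEX.pack_into(self.gpu, index * VERTEX.size, x, y)
        self.bytes_copied += VERTEX.size

    def upload(self, start, count):
        pass  # nothing left to do, the data is already in place


def fill(buffer, quads):
    """Write 4 vertices per quad, then upload the whole range."""
    vertex_count = quads * 4
    for i in range(vertex_count):
        buffer.set_vertex(i, float(i), float(i) * 2)
    buffer.upload(0, vertex_count)
    return buffer
```

Calling `fill(StagedBuffer(40), 10)` and `fill(DirectBuffer(40), 10)` leaves identical contents in `gpu`, but the staged variant reports twice the `bytes_copied`.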

Shaving that overhead off improves performance on iOS significantly, especially in scenes with a high number of vertex uploads:

`master`	this PR

Generally, using and writing to a shared buffer directly on Metal is not a bad idea for multiple reasons:

- It is done for dynamic buffers in Metal's example applications.
- There's no difference in performance between a shared and a private buffer (no optimisation for GPU access), at least on macOS (source):

To add this into our implementation appropriately, I've taken the time to refactor index buffer storage so that it's separate from VeldridVertexBuffer (and also makes more sense). @smoogipoo, requesting your review to make sure you're on board with this new structure.

Initially I didn't want to make this change, since it further increases the divergence between backends within the framework project while it still uses Veldrid, but the resulting gain is 100% worth it. Eventually I plan to work on a Metal renderer implementation away from Veldrid, using the .NET API that ships with iOS / Mac Catalyst. But for now, I've added this as a separate VBO class specifically for Metal renderers.

smoogipoo · 2023-07-15T12:57:00Z

Can you also test desktop in the same situation? I tried this a long time ago (before the VBO invalidation work that's been put on the backburner - maybe time to bring that back...) and found similar results: in general we'd get higher FPS by not doing the deduping, but there were certain scenarios where that would fall apart. I can't remember the exact case where I noticed it though, so I hope it's reached by the test cases...

@peppy I also want you to test this on desktop metal, on your x86 PC.
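The deduping trade-off mentioned above can be sketched roughly as follows. This assumes that "deduping" means comparing each incoming vertex against a CPU-side copy so that unchanged vertices are not re-uploaded; that reading is an assumption, and `DedupingBuffer`, `set_vertex` and `flush` are hypothetical names rather than framework API.

```python
class DedupingBuffer:
    """Keeps a CPU-side copy so that unchanged vertices are not re-uploaded."""

    def __init__(self, capacity):
        self.shadow = [None] * capacity  # CPU-side copy of the last written values
        self.gpu = [None] * capacity     # stand-in for GPU memory
        self.dirty_lo = None             # inclusive start of the changed range
        self.dirty_hi = None             # exclusive end of the changed range

    def set_vertex(self, index, vertex):
        # The comparison is paid on every write, whether or not it saves an upload.
        if self.shadow[index] == vertex:
            return False
        self.shadow[index] = vertex
        if self.dirty_lo is None:
            self.dirty_lo, self.dirty_hi = index, index + 1
        else:
            self.dirty_lo = min(self.dirty_lo, index)
            self.dirty_hi = max(self.dirty_hi, index + 1)
        return True

    def flush(self):
        """Upload only the changed range; returns the number of vertices uploaded."""
        if self.dirty_lo is None:
            return 0
        lo, hi = self.dirty_lo, self.dirty_hi
        self.gpu[lo:hi] = self.shadow[lo:hi]
        self.dirty_lo = self.dirty_hi = None
        return hi - lo
```

When most vertices change every frame, the comparison and the extra copy are pure overhead; when most vertices stay the same, skipping the upload can win. Which case dominates depends on the scene, which is why testing across scenarios matters here.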

frenzibyte · 2023-07-15T17:26:59Z

M1 Pro:

`master`	this PR

peppy · 2023-07-16T04:28:44Z

M2 (minor improvements across the board):

Before	After

peppy · 2023-07-16T05:44:00Z

macOS / Intel / AMD (considerable improvements across the board!):

Before	After

peppy · 2023-07-16T07:54:53Z

Windows / DirectX / Nvidia (no difference / within error margins, expected i guess):

Before	After

smoogipoo · 2023-07-16T08:22:21Z

As far as I can tell this isn't enabled on D3D, so unless you're making those changes yourself you wouldn't see a difference.

frenzibyte · 2023-07-16T08:27:59Z

Yeah backends other than Metal are unaffected in this PR. I'm not willing to increase the scope of this change in just one PR with uneducated thoughts over the other backends. Metal, at least, I have a good grasp on.

peppy · 2023-07-16T15:27:18Z

> As far as I can tell this isn't enabled on D3D, so unless you're making those changes yourself you wouldn't see a difference.

yep, i was just testing to confirm there was no regression mostly.

peppy

As far as I can tell this performs and is structured fine.

@smoogipoo will need your review

smoogipoo

As always, my concern with these sorts of changes is still undefined GPU behaviour. I just hope our double-triple-buffered VBOs are enough, and given that others have tested it and haven't experienced glitchiness, I'm willing to give this a pass.

osu.Framework/Graphics/Veldrid/Batches/VeldridVertexBatch.cs

frenzibyte · 2023-07-20T18:32:03Z

> As always, my concern with these sorts of changes is still undefined GPU behaviour. I just hope our double-triple-buffered VBOs are enough, and given that others have tested it and haven't experienced glitchiness, I'm willing to give this a pass.

And this behaviour is specific to us not 100% guarding against access between CPU & GPU using fences and what not, right? Asking just to confirm knowledge, not hinting at anything.

smoogipoo · 2023-07-21T06:14:57Z

> And this behaviour is specific to us not 100% guarding against access between CPU & GPU using fences and what not, right

Yes, that's correct.
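As a rough illustration of why multi-buffering without fences is a hope rather than a guarantee, the following Python sketch simulates a ring of buffers that is cycled once per frame. The function `find_hazards` and the model of GPU latency in whole frames are assumptions made for this example, not how the framework or Metal tracks buffer usage.

```python
def find_hazards(ring_size, gpu_latency_frames, total_frames):
    """Return the frames in which the CPU would overwrite a buffer still in use.

    Simplified model (an assumption): a buffer written in frame f is read by
    the GPU until the start of frame f + gpu_latency_frames. No fences are
    used, so the CPU never waits; it simply takes the next buffer in the ring.
    """
    in_flight_until = {}  # buffer index -> first frame in which it is free again
    hazards = []
    for frame in range(total_frames):
        index = frame % ring_size
        if in_flight_until.get(index, 0) > frame:
            # The GPU may still be reading this buffer: undefined behaviour.
            hazards.append(frame)
        in_flight_until[index] = frame + gpu_latency_frames
    return hazards
```

In this model, `find_hazards(3, 2, 10)` reports no hazards because the GPU is never more than two frames behind a three-buffer ring, while `find_hazards(2, 3, 6)` reports a hazard on every frame from the third onwards. A fence would turn each of those hazards into a CPU wait instead.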

frenzibyte added 3 commits July 15, 2023 15:14

Split index buffers away from VeldridVertexBuffer

ee66777

Introduce interface for VBOs and refactor VeldridVertexBatch slightly

9e107fd

Add VBO implementation optimised for Metal renderer

b80f357

frenzibyte added type:performance platform:iOS area:renderer-veldrid labels Jul 15, 2023

frenzibyte requested review from peppy and smoogipoo July 15, 2023 12:36

frenzibyte self-assigned this Jul 15, 2023

pull-request-size bot added the size/XL label Jul 15, 2023

peppy previously approved these changes Jul 18, 2023

smoogipoo approved these changes Jul 20, 2023

smoogipoo enabled auto-merge July 20, 2023 18:13

Merge branch 'master' into metal-optimise-vbos

083a3f2

smoogipoo requested changes Jul 20, 2023

osu.Framework/Graphics/Veldrid/Batches/VeldridVertexBatch.cs (outdated)

Fix linear index buffer not being used for path VBOs

f01f502

frenzibyte dismissed peppy’s stale review via f01f502 July 21, 2023 01:32

smoogipoo merged commit 12b3c03 into ppy:master Jul 21, 2023

smoogipoo approved these changes Jul 21, 2023

frenzibyte deleted the metal-optimise-vbos branch July 21, 2023 09:31
