Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve vertex upload performance significantly on iOS #5931

Merged
merged 5 commits into from
Jul 21, 2023

Conversation

frenzibyte
Copy link
Member

@frenzibyte frenzibyte commented Jul 15, 2023

With the new performance test scenes we have, VBO uploads have shown to be quite poor on iOS, even when building in AOT.

After multiple days of investigation (interrupted by regressions in our rendering systems), it turns out the intermediate vertex storage allocated in VeldridVertexBuffer have a significant hit on performance. In an isolated environment, there's +7ms aggregate CPU overhead when uploading to an intermediate vertex storage followed by uploading to the GPU buffer, from uploading & drawing 10k quads.

Shaving that overhead off improves performance on iOS significantly, especially in scenes with a high number of vertex uploads:

master this PR
IMG_6194 IMG_6203
IMG_6195 IMG_6204
IMG_6196 IMG_6205

Generally, using and writing to a shared buffer directly on Metal is not a bad idea for multiple reasons:

  • It is done for dynamic buffers in Metal's example applications.

  • There's no difference in performance between a shared and a private buffer (no optimisation for GPU access), at least on macOS (source):

    CleanShot 2023-07-16 at 11 32 10

To add this into our implementation appropriately, I've took the time to refactor index buffer storage to become separated from VeldridVertexBuffer (and also make more sense), @smoogipoo requesting your review to make sure you're on board with this new structure.

Initially I didn't want to make this change since it keeps on increasing the diversity between backends within the framework project while it still uses Veldrid, but the resultant gain is 100% worth it. Eventually I plan to work on a Metal renderer implementation away from Veldrid, using .NET's API that's shipped with iOS / Mac Catalyst. But for now, I've added this as a separate VBO class for Metal renderers specifically.

@smoogipoo
Copy link
Contributor

smoogipoo commented Jul 15, 2023

Can you also test desktop in the same situation? I tried this a long time ago (before the VBO invalidation work that's been put on the backburner - maybe time to bring that back...) and found similar results that in general we'd get higher FPS by not doing the deduping, but there were certain scenarios where that would fall apart. I can't remember the exact case where I noticed it though, so I hope it's reached by the test cases...

@peppy I also want you to test this on desktop metal, on your x86 PC.

@frenzibyte
Copy link
Member Author

frenzibyte commented Jul 15, 2023

M1 Pro:

master this PR
CleanShot 2023-07-15 at 20 25 01 CleanShot 2023-07-15 at 20 23 17
CleanShot 2023-07-15 at 20 25 16 CleanShot 2023-07-15 at 20 23 32
CleanShot 2023-07-15 at 20 28 23 CleanShot 2023-07-15 at 20 29 33

@peppy
Copy link
Member

peppy commented Jul 16, 2023

M2 (minor improvements across the board):

Before After
2023-07-16 13 24 07@2x 2023-07-16 13 21 07@2x
2021-09 13 23 30@2x 2023-07-16 13 20 36@2x
2021-09 13 22 53@2x 2023-07-16 13 25 35@2x

@peppy
Copy link
Member

peppy commented Jul 16, 2023

macOS / Intel / AMD (considerable improvements across the board!):

Before After
osu Framework Tests 2021-09 at 05 41 58 osu Framework Tests 2023-07-16 at 05 39 49
osu Framework Tests 2021-09 at 05 41 19 osu Framework Tests 2023-07-16 at 05 40 42
osu Framework Tests 2021-09 at 05 41 34 osu Framework Tests 2023-07-16 at 05 40 19

@peppy
Copy link
Member

peppy commented Jul 16, 2023

Windows / Direct X / Nvidia (no difference / within error margins, expected i guess):

Before After
osu Framework Tests_2023-07-16_16-49-57 image
image image
image image

@smoogipoo
Copy link
Contributor

As far as I can tell this isn't enabled on D3D, so unless you're making those changes yourself you wouldn't see a difference.

@frenzibyte
Copy link
Member Author

frenzibyte commented Jul 16, 2023

Yeah backends other than Metal are unaffected in this PR. I'm not willing to increase the scope of this change in just one PR with uneducated thoughts over the other backends. Metal, at least, I have a good grasp on.

@peppy
Copy link
Member

peppy commented Jul 16, 2023

As far as I can tell this isn't enabled on D3D, so unless you're making those changes yourself you wouldn't see a difference.

yep, i was just testing to confirm there was no regression mostly.

peppy
peppy previously approved these changes Jul 18, 2023
Copy link
Member

@peppy peppy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As far as I can tell this performs and is structured fine.

@smoogipoo will need your review

Copy link
Contributor

@smoogipoo smoogipoo left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As always, my concern with these sorts of changes is still undefined GPU behaviour. I just hope our double-triple-buffered VBOs is enough, and given that others have tested it and haven't experienced glitchiness, I'm willing to give this a pass.

@smoogipoo smoogipoo enabled auto-merge July 20, 2023 18:13
@frenzibyte
Copy link
Member Author

As always, my concern with these sorts of changes is still undefined GPU behaviour. I just hope our double-triple-buffered VBOs is enough, and given that others have tested it and haven't experienced glitchiness, I'm willing to give this a pass.

And this behaviour is specific to us not 100% guarding against access between CPU & GPU using fences and what not, right? Asking just to confirm knowledge, not hinting at anything.

@smoogipoo
Copy link
Contributor

And this behaviour is specific to us not 100% guarding against access between CPU & GPU using fences and what not, right

Yes, that's correct.

@frenzibyte frenzibyte deleted the metal-optimise-vbos branch July 21, 2023 09:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Done
Development

Successfully merging this pull request may close these issues.

Investigate poor performance of sprite text rendering on iOS
3 participants