-
Notifications
You must be signed in to change notification settings - Fork 419
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve vertex upload performance significantly on iOS #5931
Conversation
Can you also test desktop in the same situation? I tried this a long time ago (before the VBO invalidation work that's been put on the backburner - maybe time to bring that back...) and found similar results that in general we'd get higher FPS by not doing the deduping, but there were certain scenarios where that would fall apart. I can't remember the exact case where I noticed it though, so I hope it's reached by the test cases... @peppy I also want you to test this on desktop metal, on your x86 PC. |
As far as I can tell this isn't enabled on D3D, so unless you're making those changes yourself you wouldn't see a difference. |
Yeah backends other than Metal are unaffected in this PR. I'm not willing to increase the scope of this change in just one PR with uneducated thoughts over the other backends. Metal, at least, I have a good grasp on. |
yep, i was just testing to confirm there was no regression mostly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As far as I can tell this performs and is structured fine.
@smoogipoo will need your review
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As always, my concern with these sorts of changes is still undefined GPU behaviour. I just hope our double-triple-buffered VBOs is enough, and given that others have tested it and haven't experienced glitchiness, I'm willing to give this a pass.
And this behaviour is specific to us not 100% guarding against access between CPU & GPU using fences and what not, right? Asking just to confirm knowledge, not hinting at anything. |
Yes, that's correct. |
With the new performance test scenes we have, VBO uploads have shown to be quite poor on iOS, even when building in AOT.
After multiple days of investigation (interrupted by regressions in our rendering systems), it turns out the intermediate vertex storage allocated in
VeldridVertexBuffer
have a significant hit on performance. In an isolated environment, there's +7ms aggregate CPU overhead when uploading to an intermediate vertex storage followed by uploading to the GPU buffer, from uploading & drawing 10k quads.Shaving that overhead off improves performance on iOS significantly, especially in scenes with a high number of vertex uploads:
master
Generally, using and writing to a shared buffer directly on Metal is not a bad idea for multiple reasons:
It is done for dynamic buffers in Metal's example applications.
There's no difference in performance between a shared and a private buffer (no optimisation for GPU access), at least on macOS (source):
To add this into our implementation appropriately, I've took the time to refactor index buffer storage to become separated from
VeldridVertexBuffer
(and also make more sense), @smoogipoo requesting your review to make sure you're on board with this new structure.Initially I didn't want to make this change since it keeps on increasing the diversity between backends within the framework project while it still uses Veldrid, but the resultant gain is 100% worth it. Eventually I plan to work on a Metal renderer implementation away from Veldrid, using .NET's API that's shipped with iOS / Mac Catalyst. But for now, I've added this as a separate VBO class for Metal renderers specifically.