Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize the deferred command recording path of Metal #2238

Closed
kvark opened this issue Jul 16, 2018 · 2 comments · Fixed by #2254
Closed

Optimize the deferred command recording path of Metal #2238

kvark opened this issue Jul 16, 2018 · 2 comments · Fixed by #2254

Comments

@kvark
Copy link
Member

kvark commented Jul 16, 2018

We used to focus solely on the immediate recording, with an assumption that moving as much load as possible from the submission into the recording time would allow us to run faster in an ideal application. Unfortunately, there are issue with this - #2232, resulting in the deferred path actually showing better framerate.

There is a few low hanging fruits in there that we need to fix, as well as profile it nicely in general. E.g. the pass vector and command vectors of each pass are not properly recycled, resulting in a lot of heap allocations and moves (on heap resize). We might want to keep them all in the command pool (which doesn't need locking) and re-use from there.

@kvark
Copy link
Member Author

kvark commented Jul 16, 2018

Setting up:

  1. Run Dota with gfx-portability using local version of gfx-rs as a cargo patch
  2. Add && false somewhere here
  3. Take an Instruments profile and see what can be improved

@kvark
Copy link
Member Author

kvark commented Jul 20, 2018

I believe the heap re-allocation is the major source of problems here (in deferred path). Just going straight re-cycling the vectors doesn't actually end up being the best idea, actually. Implementation becomes a bit wonky, and worst-case memory consumption isn't great either.

Instead, I suggest the following scheme:

  • have 4 big arrays recorded:
    1. array of render commands (mixed between passes)
    2. compute commands
    3. blit commands
    4. array of passes, each basically specifying the type of a pass and the range of commands (in one of the other arrays).
  • make sure that resetting a deferred command buffer still re-uses that storage

With current passes, clearing the main pass vector automatically drops all the command vectors, making them impossible to recycle. With the new structure it becomes much easier.

bors bot added a commit that referenced this issue Jul 20, 2018
2254: Metal deferred command buffer optimizations r=grovesNL a=kvark

Fixes #2252
Fixes #2238

~~I'm not 100% convinced this is a good thing to fix, but had to try it anyway, so might as well file the PR :)
Please take a (critical) look.~~

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends: metal
- [ ] `rustfmt` run on changed code


Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
@bors bors bot closed this as completed in #2254 Jul 20, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging a pull request may close this issue.

1 participant