Optimize the deferred command recording path of Metal #2238

kvark · 2018-07-16T14:42:34Z

We used to focus solely on the immediate recording, with an assumption that moving as much load as possible from the submission into the recording time would allow us to run faster in an ideal application. Unfortunately, there are issue with this - #2232, resulting in the deferred path actually showing better framerate.

There is a few low hanging fruits in there that we need to fix, as well as profile it nicely in general. E.g. the pass vector and command vectors of each pass are not properly recycled, resulting in a lot of heap allocations and moves (on heap resize). We might want to keep them all in the command pool (which doesn't need locking) and re-use from there.

kvark · 2018-07-16T14:47:47Z

Setting up:

Run Dota with gfx-portability using local version of gfx-rs as a cargo patch
Add && false somewhere here
Take an Instruments profile and see what can be improved

kvark · 2018-07-20T01:40:43Z

I believe the heap re-allocation is the major source of problems here (in deferred path). Just going straight re-cycling the vectors doesn't actually end up being the best idea, actually. Implementation becomes a bit wonky, and worst-case memory consumption isn't great either.

Instead, I suggest the following scheme:

have 4 big arrays recorded:
1. array of render commands (mixed between passes)
2. compute commands
3. blit commands
4. array of passes, each basically specifying the type of a pass and the range of commands (in one of the other arrays).
make sure that resetting a deferred command buffer still re-uses that storage

With current passes, clearing the main pass vector automatically drops all the command vectors, making them impossible to recycle. With the new structure it becomes much easier.

2254: Metal deferred command buffer optimizations r=grovesNL a=kvark Fixes #2252 Fixes #2238 ~~I'm not 100% convinced this is a good thing to fix, but had to try it anyway, so might as well file the PR :) Please take a (critical) look.~~ PR checklist: - [ ] `make` succeeds (on *nix) - [x] `make reftests` succeeds - [x] tested examples with the following backends: metal - [ ] `rustfmt` run on changed code Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>

kvark added type: optimization status: ready for work difficulty: average module: hal value: medium backend: Metal labels Jul 16, 2018

kvark mentioned this issue Jul 20, 2018

Metal deferred command buffer optimizations #2254

Merged

4 tasks

bors bot closed this as completed in #2254 Jul 20, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Optimize the deferred command recording path of Metal #2238

Optimize the deferred command recording path of Metal #2238

kvark commented Jul 16, 2018

kvark commented Jul 16, 2018

kvark commented Jul 20, 2018

Optimize the deferred command recording path of Metal #2238

Optimize the deferred command recording path of Metal #2238

Comments

kvark commented Jul 16, 2018

kvark commented Jul 16, 2018

kvark commented Jul 20, 2018