We used to focus solely on immediate recording, on the assumption that moving as much load as possible from submission time to recording time would let an ideal application run faster. Unfortunately, there are issues with this approach (see #2232), and the deferred path actually ends up showing a better framerate.
There are a few low-hanging fruits in the deferred path that we need to fix, and we should also profile it properly in general. E.g. the pass vector and the command vectors of each pass are not properly recycled, resulting in a lot of heap allocations and moves (when a vector's heap storage is resized). We might want to keep them all in the command pool (which doesn't need locking) and re-use them from there.
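A rough sketch of the recycling idea, assuming a hypothetical `CommandPool` that owns a free list of command vectors (the `Command` and `CommandPool` names here are illustrative only, not the backend's actual types). It relies on `Vec::clear` keeping the allocated capacity, so a recycled vector can be refilled without touching the heap:

```rust
// Hypothetical stand-in for a recorded command; real commands carry more state.
#[derive(Debug, Clone, PartialEq)]
pub struct Command(pub u32);

// Hypothetical pool that keeps emptied command vectors around for re-use.
#[derive(Default)]
pub struct CommandPool {
    free: Vec<Vec<Command>>,
}

impl CommandPool {
    /// Hand out a recycled vector if one is available, otherwise a fresh one.
    pub fn acquire(&mut self) -> Vec<Command> {
        self.free.pop().unwrap_or_default()
    }

    /// Take a vector back. `clear` drops the elements but keeps the capacity.
    pub fn release(&mut self, mut commands: Vec<Command>) {
        commands.clear();
        self.free.push(commands);
    }
}
```

A pass would `acquire` its command vector when recording starts and `release` it when the command buffer is reset. No locking is shown because the pool is assumed to be accessed from one thread at a time.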
I believe heap re-allocation is the major source of problems here (in the deferred path). Simply recycling the vectors doesn't end up being the best idea, though: the implementation becomes a bit wonky, and worst-case memory consumption isn't great either.
Instead, I suggest the following scheme:
- have 4 big arrays recorded:
  - array of render commands (mixed between passes)
  - array of compute commands
  - array of blit commands
  - array of passes, each basically specifying the type of a pass and the range of commands (in one of the other arrays)
- make sure that resetting a deferred command buffer still re-uses that storage
With current passes, clearing the main pass vector automatically drops all the command vectors, making them impossible to recycle. With the new structure it becomes much easier.
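The scheme above could look roughly like this. It is only a sketch: the `Journal` type, the command enums, and all method names are assumptions made for illustration, not the code that actually landed. Each pass stores its kind plus an index range into one of the three shared command arrays, so a reset clears four flat vectors and keeps their capacity:

```rust
use std::ops::Range;

// Hypothetical command types; the real ones carry far more state.
#[derive(Debug, Clone, PartialEq)]
pub enum RenderCommand { Draw { vertices: Range<u32> } }
#[derive(Debug, Clone, PartialEq)]
pub enum ComputeCommand { Dispatch { groups: [u32; 3] } }
#[derive(Debug, Clone, PartialEq)]
pub enum BlitCommand { FillBuffer { value: u32 } }

#[derive(Debug, Clone, Copy, PartialEq)]
pub enum PassKind { Render, Compute, Blit }

/// A pass is just its kind plus a range into the matching command array.
#[derive(Debug, Clone, PartialEq)]
pub struct Pass {
    pub kind: PassKind,
    pub commands: Range<usize>,
}

/// Hypothetical container for everything recorded into a deferred command buffer.
#[derive(Default)]
pub struct Journal {
    pub render: Vec<RenderCommand>,
    pub compute: Vec<ComputeCommand>,
    pub blit: Vec<BlitCommand>,
    pub passes: Vec<Pass>,
}

impl Journal {
    pub fn record_render_pass<I: IntoIterator<Item = RenderCommand>>(&mut self, cmds: I) {
        let start = self.render.len();
        self.render.extend(cmds);
        self.passes.push(Pass { kind: PassKind::Render, commands: start..self.render.len() });
    }

    pub fn record_compute_pass<I: IntoIterator<Item = ComputeCommand>>(&mut self, cmds: I) {
        let start = self.compute.len();
        self.compute.extend(cmds);
        self.passes.push(Pass { kind: PassKind::Compute, commands: start..self.compute.len() });
    }

    pub fn record_blit_pass<I: IntoIterator<Item = BlitCommand>>(&mut self, cmds: I) {
        let start = self.blit.len();
        self.blit.extend(cmds);
        self.passes.push(Pass { kind: PassKind::Blit, commands: start..self.blit.len() });
    }

    /// Commands of a render pass, as a slice of the shared render array.
    pub fn render_commands(&self, pass: &Pass) -> &[RenderCommand] {
        debug_assert_eq!(pass.kind, PassKind::Render);
        &self.render[pass.commands.clone()]
    }

    /// Drop all recorded data but keep the allocations for the next recording.
    pub fn reset(&mut self) {
        self.render.clear();
        self.compute.clear();
        self.blit.clear();
        self.passes.clear();
    }
}
```

Because no vector is nested inside another, clearing `passes` no longer frees any command storage, and worst-case memory is bounded by the largest recording seen for each of the four arrays.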
2254: Metal deferred command buffer optimizations r=grovesNL a=kvark
Fixes #2252
Fixes #2238
~~I'm not 100% convinced this is a good thing to fix, but had to try it anyway, so might as well file the PR :)
Please take a (critical) look.~~
PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends: metal
- [ ] `rustfmt` run on changed code
Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>