
Optimizing resource lifetime tracking #1629

Closed
pythonesque opened this issue Jul 10, 2021 · 1 comment
Labels
area: ecosystem — Help the connected projects grow and prosper
area: performance — How fast things go
type: enhancement — New feature or request
type: question — Further information is requested

Comments

@pythonesque
Contributor

pythonesque commented Jul 10, 2021

This is copy-pasted from what I wrote on the wgpu Matrix. I'm putting it here so it doesn't get lost. The context is wanting to reduce lifetime tracking overhead as much as possible compared to wgpu-hal (preferably: as close to zero as possible, if the tracking requirements are not complex).

It's kind of awkward, actually: there seem to be two good options at opposite ends of the tradeoff space for reducing tracking overhead:

  1. Refcount immediately on submission to the command buffer (like Metal does). Obviously safe, very easy to use, but you lose sophisticated optimization opportunities due to having to worry about state that's tracked in command buffers that haven't been submitted yet. More generally, what you really would want here is some sort of optimized tracing garbage collection, since the vast majority of the time resources used by a queue would still be alive the next frame anyway. But optimized tracing GC in Rust (particularly safe, stable Rust) is still a long way away I think (though I do think we'll get there eventually); and whether it will ever be ergonomic is an open question.
  2. Don't refcount stuff in command buffers at all (still use refcounting for resources used by other resources, since refcounting has basically perfect tradeoffs in that scenario). Instead, wait until you know the order commands will be submitted in on a queue (using borrows to guarantee that the objects are alive until this point), and then use a priority update to bump each object's version exactly once (to its latest submission version). This is basically the most efficient option possible if you can't peek into the actual raw command buffer data. It allows speedy resource reclamation since it completely avoids HashMap overhead, which is substantial with large numbers of objects, and has minimal write overhead (only one shared write per submission to a queue even if an object is used thousands of times, which is cache-friendlier than Arc and needs no hashmap). When an object is dropped, it can just check its version against the current queue version, dropping the underlying resource only if its version is no longer on the queue (with a variety of options for what to do when the object is still on the queue).
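To make option (2) concrete, here is a minimal sketch of the version-tracking idea. All names here (`QueueVersions`, `TrackedResource`, etc.) are illustrative assumptions, not wgpu internals: each queue keeps a monotonically increasing submission counter, each resource records only the highest submission that used it (the "priority update"), and a drop can reclaim immediately iff the GPU has completed past that version.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

/// Hypothetical per-queue counters (illustrative, not wgpu's internals).
struct QueueVersions {
    submitted: AtomicU64, // bumped once per queue submission
    completed: AtomicU64, // bumped when the GPU finishes a submission
}

/// A resource tracked by version instead of by refcount.
struct TrackedResource {
    last_used: AtomicU64, // only ever raised, via fetch_max
}

impl TrackedResource {
    /// "Priority update": called once per submission per resource,
    /// no matter how many times the resource appears in the commands.
    fn mark_used(&self, queue: &QueueVersions) {
        let v = queue.submitted.load(Ordering::Acquire);
        self.last_used.fetch_max(v, Ordering::AcqRel);
    }

    /// On drop: the underlying resource may be freed immediately only if
    /// every submission that referenced it has already completed.
    fn can_free_now(&self, queue: &QueueVersions) -> bool {
        self.last_used.load(Ordering::Acquire)
            <= queue.completed.load(Ordering::Acquire)
    }
}
```

Note that the only per-use cost is a single shared `fetch_max`, which is the "one shared write per submission" property claimed above.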

So (2) is basically what I want to use, but it conflicts significantly with the WebGPU spec in the following sense:

  • If I know the relative order command buffers will be submitted in ahead of time (which, I think, is how you're intended to use Metal?), I can set the versions right away during live recording and avoid needing very long borrows. But WebGPU says the submission order of command buffers is not fixed ahead of time.
  • If I don't, I can still satisfy these requirements (with some more local, but cheap, overhead) by delaying setting versions of the objects I record until I submit to the queue. But then I need to keep a borrow open not just for as long as a RenderPass, but for the duration of the whole queue submission. For Veloren, this would not be an issue, but I think a lot of code already struggles to hold resources open for just the duration of the RenderPass being recorded.
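The second bullet can be sketched in the type system. This is a hypothetical API shape, not wgpu's: a recorded command buffer holds plain borrows of its resources (so they provably outlive recording), and the version stamp happens exactly once, inside `submit`, at the moment the submission order becomes known.

```rust
use std::cell::Cell;

/// Illustrative resource; `last_used` is stamped only at submission time.
struct Resource {
    last_used: Cell<u64>,
}

/// Recorded commands borrow their resources, so the borrow checker
/// guarantees everything stays alive until the queue submission.
struct RecordedCommands<'a> {
    used: Vec<&'a Resource>,
}

struct VersionQueue {
    next_version: u64,
}

impl VersionQueue {
    /// Consuming the command buffer here ends the borrows; each resource
    /// is stamped exactly once with the submission's version.
    fn submit(&mut self, commands: RecordedCommands<'_>) -> u64 {
        let v = self.next_version;
        self.next_version += 1;
        for r in &commands.used {
            r.last_used.set(v);
        }
        v
    }
}
```

The ergonomic cost described above is visible here: the `'a` borrow in `RecordedCommands` must stay open from recording all the way to `submit`, which is a longer borrow than many callers can comfortably hold.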

There are other possibilities besides what I suggested for trying to reduce overhead in common cases, but so far I haven't been able to figure out anything that isn't either a benchmark hack or saddled with other bad tradeoffs (e.g. if you acquire a version speculatively when you create a command buffer, and then hold onto it arbitrarily long for the rest of your program, we probably don't want that to indefinitely delay collection of resources for command buffers created afterwards).

Additionally, what I just described is, I think, the most efficient safe way to do resource lifetime tracking (short of asking submitted objects to stay alive across multiple frames and manually specifying a blocking cleanup scope [which some people claim is a common gamedev pattern, but I don't believe them], or some related "rooting" scheme that seems likely to be even more complicated to use). But for buffers and textures (as featured in bunnymark) we also lose overhead to state/conflict tracking, and I'm still not sure what the best way to deal with that is... and I think the "best" solutions for reducing that overhead will probably conflict with the WebGPU spec even more significantly than needing to know command buffer order ahead of time. If we already need to use hashmaps on a large percentage of the resources we track, then I suspect that will dominate the overhead of lifetime tracking in most cases.

So... what I'm thinking is, it would be nice if wgpu somehow made its Rust API able to support both modes of use (refcounting a la Metal, or version tracking and delaying it until queue submission). But I don't know what the best way to do that would be. IMO you can probably do more and fancier optimizations if you can just specify the version up front, while also being more ergonomic than holding resources borrowed for the length of a whole queue submission. But this more outwardly conflicts with the wgpu API, while just having more restricted lifetime constraints technically doesn't.


Currently, with what I'm working on (eliminating hubs), mode (1) is going to be the easiest approach to implement. But I want to make sure we keep mode (2) open, hence I don't want to remove the lifetime constraints just yet with my changes.
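For contrast, mode (1), refcounting on submission a la Metal, can be sketched as follows. Again the names (`SubmissionQueue`, `GpuBuffer`) are made up for illustration: the queue clones an `Arc` for every resource a submitted command buffer touches, and drops those clones only once the GPU signals completion, so the caller may drop its own handles at any time.

```rust
use std::sync::Arc;

/// Placeholder for a raw GPU resource handle.
struct GpuBuffer {}

/// One submission's keep-alive set: the Arc clones pin every
/// referenced resource until the GPU is done with the submission.
struct InFlightSubmission {
    keep_alive: Vec<Arc<GpuBuffer>>,
}

struct SubmissionQueue {
    in_flight: Vec<InFlightSubmission>, // oldest submission first
}

impl SubmissionQueue {
    /// Refcount immediately on submission: cloning here is what makes
    /// an early drop by the caller safe.
    fn submit(&mut self, used: Vec<Arc<GpuBuffer>>) {
        self.in_flight.push(InFlightSubmission { keep_alive: used });
    }

    /// When the oldest submission completes, dropping its clones is the
    /// entire reclamation step (any resource with no other owners frees here).
    fn on_submission_complete(&mut self) {
        self.in_flight.remove(0);
    }
}
```

The appeal of this mode is exactly what the issue says: it is obviously safe and imposes no borrow lifetimes on the caller, at the cost of one atomic refcount bump per resource per submission and no cross-command-buffer optimization opportunities.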

@pythonesque pythonesque changed the title Optimizing resource tracking Optimizing resource lifetime tracking Jul 10, 2021
@cwfitzgerald cwfitzgerald added type: enhancement New feature or request type: question Further information is requested labels Jul 11, 2021
@kvark kvark added area: ecosystem Help the connected projects grow and prosper area: performance How fast things go labels Jul 12, 2021
@cwfitzgerald
Member

Closed by #2662

Patryk27 pushed a commit to Patryk27/wgpu that referenced this issue Nov 23, 2022