Resource tracking overhead #1413
Comments
The assembly for the following functions needs to be looked at under a microscope:
Another idea (3) - a merge of 1) and 2) but without any heavy changes. If we know something doesn't have a state (but just needs to be added to the lifetime tracker), then adding it to the render pass tracker is a waste. We could make it so only buffers and textures are tracked per pass (or per usage scope), and everything else goes straight to the command buffer tracker. For the animometer benchmark, it would cut the costs by almost a factor of 2.
1417: Split the tracker into stateful/stateless to reduce the overhead r=cwfitzgerald a=kvark

**Connections**
Implements #1413 (comment)
Reduces the overhead for resource tracking in the Animometer benchmark by up to 50%.

**Description**
We used to use the full tracker set on the usage scopes associated with compute/render passes. A resource tracker has 2 responsibilities: ensuring the resource is held alive, and validating and recording the state transitions. This PR exploits the fact that the latter responsibility is only applicable for buffers and textures. So doing all the lifetime tracking for a pass is a waste: we can instead just attach the lifetimes to the parent command buffer, straight.

In the Animometer benchmark, there is one large buffer, and thousands of bind groups pointing to different offsets into it. The old code would fill up the pass tracker with those bind groups, and then merge it into the command buffer tracker. The new code would just fill up the command buffer tracker instead. Since there is only one buffer, the pass tracking becomes much lighter.

**Testing**
Untested. It would be nice to have some benchmarks here, possibly after #1397 ?

Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
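To make the stateful/stateless split concrete, here is a minimal sketch of the idea, not the actual wgpu-core code. All names (`StatefulTracker`, `StatelessTracker`, `PassScope`, `record_animometer_like_pass`) and the integer ids are hypothetical, and usage validation is left out. It only shows where each kind of resource ends up: bind groups go straight to the command buffer tracker, while the buffer they point into is the only thing recorded in the pass scope.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical ids for illustration; the real ids are richer types.
type BufferId = u32;
type BindGroupId = u32;

#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum BufferUse {
    Uniform,
    CopyDst,
}

/// Resources with a usage state that must be validated and recorded.
#[derive(Default)]
struct StatefulTracker {
    buffers: HashMap<BufferId, BufferUse>,
}

/// Resources that only need to be kept alive.
#[derive(Default)]
struct StatelessTracker {
    bind_groups: HashSet<BindGroupId>,
}

/// The per-command-buffer tracker holds both kinds.
#[derive(Default)]
struct CommandBufferTracker {
    stateful: StatefulTracker,
    stateless: StatelessTracker,
}

/// The per-pass usage scope holds stateful resources only.
#[derive(Default)]
struct PassScope {
    stateful: StatefulTracker,
}

impl CommandBufferTracker {
    /// Binding a group: its lifetime goes directly to the command buffer,
    /// and only the buffer it references is recorded in the pass scope.
    fn set_bind_group(
        &mut self,
        scope: &mut PassScope,
        bind_group: BindGroupId,
        buffer: BufferId,
        usage: BufferUse,
    ) {
        self.stateless.bind_groups.insert(bind_group);
        scope.stateful.buffers.insert(buffer, usage);
    }

    /// Ending the pass merges only the (small) stateful part.
    fn end_pass(&mut self, scope: PassScope) {
        for (id, usage) in scope.stateful.buffers {
            self.stateful.buffers.insert(id, usage);
        }
    }
}

/// One large buffer with many bind groups pointing into it, as in the
/// Animometer description. Returns the pass scope size before the merge.
fn record_animometer_like_pass(bind_groups: u32) -> (usize, CommandBufferTracker) {
    let mut cmd = CommandBufferTracker::default();
    let mut scope = PassScope::default();
    for bg in 0..bind_groups {
        cmd.set_bind_group(&mut scope, bg, 0, BufferUse::Uniform);
    }
    let scope_len = scope.stateful.buffers.len();
    cmd.end_pass(scope);
    (scope_len, cmd)
}
```

With thousands of bind groups and a single buffer, the pass scope in this sketch holds one entry regardless of the bind group count, which is the effect the PR description attributes to the change.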
New profile after (3) is merged - https://share.firefox.dev/3pjq2Qa
Closing after #2662
Related to #1411
Firefox profile - https://share.firefox.dev/3fvytEz
If you look at `run_render_pass_impl` in the flamegraph view, the Vulkan driver is taking less than half of that time. The rest is spent on barriers and tracking (everything related to `TrackerSet`), so clearly this is a hot spot.

Ideas:
`(init, current)` states per subresource. We could extend it to `(init, current, next)`, so that the sync scope (of, say, a render pass) will be accumulated in the `next`. This would mean - no need to allocate/free tracker sets per pass. Everything would be done right in the command buffer tracker.

Related to Scope-based usage tracking in the render pass #443