Resource tracking overhead #1413

Closed
kvark opened this issue May 28, 2021 · 4 comments
Labels: area: performance, help required, type: enhancement

Comments

@kvark
Member

kvark commented May 28, 2021

Related to #1411
Firefox profile - https://share.firefox.dev/3fvytEz
If you look at run_render_pass_impl in the flamegraph view, the Vulkan driver is taking less than half of that time.
The rest is spent on barriers and tracking (everything related to TrackerSet), so clearly this is a hot spot.

Ideas:

  1. Most buffers and textures are immutable once their initial contents have been transferred into them. It would be wonderful to find a way to avoid tracking them entirely, or at least up to the point where they do get mutated.
  2. Today the tracker stores (init, current) states per subresource. We could extend it to (init, current, next), so that the sync scope (of, say, a render pass) is accumulated in next. There would then be no need to allocate and free tracker sets per pass; everything would be done directly in the command buffer tracker. Related to "Scope-based usage tracking in the render pass" (#443).
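
To make idea 2 concrete, here is a minimal sketch of a per-resource (init, current, next) triple. It is not wgpu code: `Usage`, `State`, `Transition`, `CommandBufferTracker`, `use_in_scope` and `end_scope` are all hypothetical names invented for illustration, resources are tracked as a whole rather than per subresource, and validation of conflicting usages within a scope is left out.

```rust
use std::collections::HashMap;

/// Hypothetical usage bitmask; the real wgpu usage flags are richer.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
struct Usage(u32);

impl Usage {
    const EMPTY: Usage = Usage(0);
    const COPY_DST: Usage = Usage(1);
    const VERTEX: Usage = Usage(2);
    const UNIFORM: Usage = Usage(4);

    fn union(self, other: Usage) -> Usage {
        Usage(self.0 | other.0)
    }
}

/// The (init, current, next) triple from idea 2, kept per resource here.
#[derive(Debug)]
struct State {
    init: Usage,    // usage expected when the command buffer starts executing
    current: Usage, // usage after the last closed scope
    next: Usage,    // usage accumulated by the scope that is still open
}

/// A state change that a real implementation would turn into a barrier.
#[derive(Debug, PartialEq, Eq)]
struct Transition {
    id: u32,
    from: Usage,
    to: Usage,
}

#[derive(Default)]
struct CommandBufferTracker {
    states: HashMap<u32, State>,
    touched: Vec<u32>, // ids used by the open scope
}

impl CommandBufferTracker {
    /// Record a usage inside the open scope without allocating a pass tracker.
    fn use_in_scope(&mut self, id: u32, usage: Usage) {
        let state = self.states.entry(id).or_insert(State {
            init: Usage::EMPTY,
            current: Usage::EMPTY,
            next: Usage::EMPTY,
        });
        if state.next == Usage::EMPTY {
            self.touched.push(id);
        }
        state.next = state.next.union(usage);
    }

    /// Close the scope: move `next` into `current` and report the transitions.
    fn end_scope(&mut self) -> Vec<Transition> {
        let mut transitions = Vec::new();
        for id in self.touched.drain(..) {
            let state = self.states.get_mut(&id).expect("touched ids are tracked");
            if state.current == Usage::EMPTY {
                // First use in this command buffer: remember it as the initial state.
                state.init = state.next;
            } else if state.current != state.next {
                transitions.push(Transition { id, from: state.current, to: state.next });
            }
            state.current = state.next;
            state.next = Usage::EMPTY;
        }
        transitions
    }
}
```

In this sketch a pass would call `use_in_scope` for each resource it touches and `end_scope` once at the end, so the only per-pass bookkeeping is the `touched` list, which keeps its allocation between scopes.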
kvark added the labels type: enhancement, help required, and area: performance on May 28, 2021
@kvark
Member Author

kvark commented May 29, 2021

The assembly for the following functions needs to be looked at under a microscope:

  • gfx_backend_vulkan::command::CommandBuffer::bind_descriptor_sets: roughly 1/9 of it is actually spent in ash::vk::features::DeviceFnV1_0::cmd_bind_descriptor_sets. The rest includes boilerplate and inlined pipeline compatibility logic.
  • wgpu_core::command::CommandBuffer::insert_barriers seems quite heavy, also does some refcount stuff (unexpectedly?)
  • wgpu_core::command::bind::Binder::assign_group does refcounting. I don't think it should?
  • wgpu_core::track::TrackerSet::merge_extend and wgpu_core::track::ResourceTracker::use_extend: maybe we can make them cache friendlier?
  • wgpu_core::hub::Storage::get shows up a bit everywhere, but a lot in submit()

@kvark
Member Author

kvark commented May 31, 2021

Another idea (3): a merge of 1) and 2), but without any heavy changes. If we know a resource has no state to track and only needs to be added to the lifetime tracker, then adding it to the render pass tracker is a waste. We could make it so that only buffers and textures are tracked per pass (or per usage scope), and everything else goes straight to the command buffer tracker. For the animometer benchmark, it would cut the costs by almost a factor of 2.
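
A rough sketch of the split described in (3), under stated assumptions: the types `CommandBufferTracker` and `PassScope`, the methods `set_bind_group` and `merge_scope`, and the plain `u32` ids and usage masks are all hypothetical stand-ins, not the actual wgpu-core data structures. Stateless resources (bind groups here) are recorded on the command buffer right away, and only stateful ones (buffers here) go through the per-pass scope.

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical plain ids; wgpu uses typed ids instead.
type BufferId = u32;
type BindGroupId = u32;
// Stand-in for a usage bitmask.
type Usage = u32;

/// Long-lived tracker owned by the command buffer.
#[derive(Default)]
struct CommandBufferTracker {
    buffers: HashMap<BufferId, Usage>, // stateful: last known usage
    bind_groups: HashSet<BindGroupId>, // stateless: only kept alive
}

/// Short-lived usage scope for one pass; holds stateful resources only.
#[derive(Default)]
struct PassScope {
    buffers: HashMap<BufferId, Usage>,
}

impl PassScope {
    /// Bind a group that refers to `buffer` with the given usage.
    fn set_bind_group(
        &mut self,
        cmd: &mut CommandBufferTracker,
        group: BindGroupId,
        buffer: BufferId,
        usage: Usage,
    ) {
        // The bind group has no state, so it skips the pass scope entirely.
        cmd.bind_groups.insert(group);
        // The buffer has state, so its usage is accumulated in the scope.
        *self.buffers.entry(buffer).or_insert(0) |= usage;
    }
}

impl CommandBufferTracker {
    /// Merge a finished scope; returns how many entries had to be merged.
    fn merge_scope(&mut self, scope: PassScope) -> usize {
        let merged = scope.buffers.len();
        for (id, usage) in scope.buffers {
            // A real tracker would compare with the old usage and emit a barrier.
            self.buffers.insert(id, usage);
        }
        merged
    }
}
```

With a thousand bind groups that all point into one buffer, the scope in this sketch ends up holding a single entry, so the merge at the end of the pass touches one item instead of a thousand.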

bors bot added a commit that referenced this issue Jun 1, 2021
1417: Split the tracker into stateful/stateless to reduce the overhead r=cwfitzgerald a=kvark

**Connections**
Implements #1413 (comment)
Reduces the overhead for resource tracking in the Animometer benchmark by up to 50%.

**Description**
We used to use the full tracker set on the usage scopes associated with compute/render passes. A resource tracker has 2 responsibilities: ensuring the resource is kept alive, and validating and recording the state transitions. This PR exploits the fact that the latter responsibility only applies to buffers and textures. So doing all the lifetime tracking for a pass is a waste: we can instead just attach the lifetimes directly to the parent command buffer.

In the Animometer benchmark, there is one large buffer, and thousands of bind groups pointing to different offsets into it. The old code would fill up the pass tracker with those bind groups, and then merge it into the command buffer tracker. The new code would just fill up the command buffer tracker instead. Since there is only one buffer, the pass tracking becomes much lighter.

**Testing**
Untested. It would be nice to have some benchmarks here, possibly after #1397 ?

Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
@kvark
Member Author

kvark commented Jun 4, 2021

New profile after (3) is merged - https://share.firefox.dev/3pjq2Qa
There are still things to address here. There is also a gap in the profile that is not annotated by anything. Is running the render pass doing something heavy there?

@cwfitzgerald
Member

Closing after #2662
