Render pass descriptor cache for Metal #2264

Merged 1 commit on Jul 24, 2018

kvark

@kvark kvark commented Jul 24, 2018

Includes #2260
Fixes two of the performance issues in #2161 (RP desc locking and copying costs).

Immediate recording FPS doesn't seem to change; it may be slightly lower (touching 90 from below more often than from above, as I recall it doing; need to confirm. Edit: confirmed not to be caused by this PR).
Deferred recording FPS seems to go from the low 100s to higher, or even from 100 to 110, roughly speaking. There are barely any bottlenecks left for it outside of the general architecture. The main thread now spends about 14.5% of its time in our code, most of which is driver interaction.

PR checklist:

  • make succeeds (on *nix)
  • make reftests succeeds
  • tested examples with the following backends: Metal
  • rustfmt run on changed code

@kvark kvark requested a review from grovesNL July 24, 2018 01:26

@grovesNL grovesNL left a comment

LGTM 👍

if aspects.contains(Aspects::DEPTH) {
    key.operations.push(rat.ops);
    if rat.ops.load == AttachmentLoadOp::Clear {
        key.clear_data.push(unsafe { *(&cv.depth_stencil.depth as *const _ as *const u32) });

So we need to union inside of our union... 😃
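
For readers following the reviewed lines, here is a minimal sketch of the idea, not the code from this PR: a hashable key records the attachment operations plus the raw bits of the depth clear value, and a map reuses one descriptor per distinct key. `RenderPassKey`, `Descriptor`, `DescriptorCache`, `push_depth` and `get_or_create` are hypothetical names, and `AttachmentLoadOp` is a local stand-in for the HAL enum. The sketch assumes the clear value is an `f32`, so the standard `f32::to_bits` can replace the pointer cast without `unsafe`.

```rust
use std::collections::HashMap;

// Local stand-in for the HAL enum named in the diff (assumption).
#[allow(dead_code)]
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
enum AttachmentLoadOp {
    Load,
    Clear,
    DontCare,
}

// Hypothetical cache key: everything that distinguishes one descriptor
// from another, stored in a form that can derive `Eq` and `Hash`.
#[derive(Clone, Debug, Default, PartialEq, Eq, Hash)]
struct RenderPassKey {
    operations: Vec<AttachmentLoadOp>,
    // Clear values as raw bits, because `f32` implements neither `Eq` nor `Hash`.
    clear_data: Vec<u32>,
}

impl RenderPassKey {
    fn push_depth(&mut self, load: AttachmentLoadOp, depth: f32) {
        self.operations.push(load);
        if load == AttachmentLoadOp::Clear {
            // Same bit pattern as `*(&depth as *const f32 as *const u32)`.
            self.clear_data.push(depth.to_bits());
        }
    }
}

// Hypothetical placeholder for a backend render pass descriptor.
#[derive(Debug)]
struct Descriptor {
    id: usize,
}

#[derive(Default)]
struct DescriptorCache {
    map: HashMap<RenderPassKey, Descriptor>,
    created: usize,
}

impl DescriptorCache {
    // Returns the cached descriptor for `key`, building one only on a miss.
    fn get_or_create(&mut self, key: RenderPassKey) -> &Descriptor {
        let created = &mut self.created;
        self.map.entry(key).or_insert_with(|| {
            *created += 1;
            Descriptor { id: *created }
        })
    }
}

fn main() {
    let mut cache = DescriptorCache::default();

    let mut a = RenderPassKey::default();
    a.push_depth(AttachmentLoadOp::Clear, 1.0);
    let mut b = RenderPassKey::default();
    b.push_depth(AttachmentLoadOp::Clear, 1.0);

    let first = cache.get_or_create(a).id;
    let second = cache.get_or_create(b).id;
    println!("first={} second={} created={}", first, second, cache.created);
}
```

Comparing raw bits means `0.0` and `-0.0` produce different keys, which is harmless for a cache: the worst case is one extra descriptor.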

@grovesNL

bors r+

bors bot added a commit that referenced this pull request Jul 24, 2018
2264: Render pass descriptor cache for Metal r=grovesNL a=kvark

~~Includes #2260~~
Fixes two of the performance issues in #2161 (RP desc locking and copying costs).

Immediate recording FPS doesn't seem to change; it may be slightly lower (touching 90 from below more often than from above, as I recall it doing; need to confirm. Edit: confirmed not to be caused by this PR).
Deferred recording FPS seems to go from the low 100s to higher, or even from 100 to 110, roughly speaking. There are barely any bottlenecks left for it outside of the general architecture. The main thread now spends about 14.5% of its time in our code, most of which is driver interaction.

PR checklist:
- [ ] `make` succeeds (on *nix)
- [x] `make reftests` succeeds
- [x] tested examples with the following backends: Metal
- [ ] `rustfmt` run on changed code


Co-authored-by: Dzmitry Malyshau <kvarkus@gmail.com>
@bors

bors bot commented Jul 24, 2018

@bors bors bot merged commit 867315c into gfx-rs:master Jul 24, 2018
@kvark kvark deleted the mtp-rp-cache branch July 24, 2018 14:50