
Expose maximum_frame_latency #4899

Merged
merged 14 commits into from
Jan 17, 2024

Conversation

Wumpf
Member

@Wumpf Wumpf commented Dec 19, 2023

Description
Previously, the swapchain buffer count was hardcoded to be always 3 no matter what is supported. This PR exposes the desired swap chain count as a minimal extension to the wgc presentation setup.

Among other things, this now allows switching between "Game Mode" and "Classic Game Mode" as classified here:
https://www.intel.com/content/www/us/en/developer/articles/code-sample/sample-application-for-direct3d-12-flip-model-swap-chains.html

The prime motivation is to reduce perceived lag in egui applications, where a buffer count of 2 usually makes more sense (assuming a stable and low workload).
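To illustrate the relationship between frame latency and buffer count, here is a toy sketch (my own simplification, not the actual wgpu-hal code; the function and parameter names are illustrative) of how a backend might derive a swapchain image count from a desired maximum frame latency:

```rust
// Illustrative sketch only, not wgpu-hal code: derive a swapchain image
// count from a desired maximum frame latency, clamped to the range the
// surface reports as supported.
fn swapchain_image_count(desired_max_latency: u32, min_images: u32, max_images: u32) -> u32 {
    // One more image than frames in flight, so presentation always has a
    // buffer to scan out while the app renders into the others.
    (desired_max_latency + 1).clamp(min_images, max_images)
}

fn main() {
    // "Classic Game Mode" from the Intel article: latency 2 -> triple buffering.
    assert_eq!(swapchain_image_count(2, 2, 8), 3);
    // "Game Mode": latency 1 -> double buffering.
    assert_eq!(swapchain_image_count(1, 2, 8), 2);
    println!("latency 1 -> {} images", swapchain_image_count(1, 2, 8));
}
```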

Testing
Experimented with a buffer count of 2 in the egui example on Mac & Windows. It gives a lot less latency.

Checklist

  • Run cargo fmt.
  • Run cargo clippy. If applicable, add:
    • --target wasm32-unknown-unknown
    • --target wasm32-unknown-emscripten
  • Run cargo xtask test to run tests.
  • Add change to CHANGELOG.md. See simple instructions inside file.

@emilk
Contributor

emilk commented Dec 19, 2023

I just want to emphasize what a huge difference going from 3 to 2 makes on Mac, at least in windowed mode on a 60 Hz monitor. It takes the application from feeling sluggish to feeling responsive (using AutoVsync).

When the CPU is fast (e.g. producing a frame in 1ms), SwapChain::get_next_texture will block, saving CPU time and preventing tearing.

Interestingly though, when the CPU is too slow (e.g. taking 20ms to prepare a frame), then SwapChain::get_next_texture will not block. I'm not sure why, but I like it.

So: I'm planning on making 2 the default for all eframe apps

Member

@cwfitzgerald cwfitzgerald left a comment


Marking request review per discussion in call

On most platforms this is merely a change in narrative, on DX12 it actually has a practical effect already since we use it to directly set frame latency
@Wumpf Wumpf changed the title Expose swapchain buffer count Expose maximum_frame_latency Dec 22, 2023
@Wumpf
Member Author

Wumpf commented Dec 22, 2023

The code now mostly looks like we want it to, I think! It's far more expressive of the stated goal and in close alignment with

Need to update the changelog and test whether maximum_frame_latency == 1 causes the GPU/CPU to run in sync on Vulkan & Metal (hoping to find this answered with "no, they still run in parallel").

@Wumpf
Member Author

Wumpf commented Dec 23, 2023

Got around to testing with Vulkan on Windows to see when the GPU/CPU end up working in sync.
My test setup was to render the fractal from wgpu-profiler's demo with more iterations (making it easy to control the GPU load) and to add artificial CPU load via thread sleep. I then compared frame times with maximum_frame_latency 1 and maximum_frame_latency 2.
Turns out that with a frame latency setting of 1, the GPU won't have any work between get_current_texture and the next queue submit. In other words: when I put the sleep after get_current_texture, the GPU is fully idle since there's no work to be done at all. Meaning at the very least the documentation needs adjusting.

On DX12 I'm hitting some odd bug making frame time expand to 1 full second, so I don't have test results from that so far.

Not the result I was hoping for. Need to repeat the same on Mac to better understand stuff.
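The serialization described above can be captured in a toy throughput model (my own simplification for illustration, not wgpu code): with a maximum frame latency of 1 the CPU must wait for the GPU to finish before recording the next frame, so the steady-state frame time is roughly the sum of the two; with latency 2 they overlap and the slower of the two sets the pace.

```rust
// Toy steady-state throughput model (an illustration, not wgpu code):
// with a maximum frame latency of 1 the CPU and GPU serialize; with 2 or
// more frames in flight they pipeline.
fn steady_state_frame_time_ms(cpu_ms: u32, gpu_ms: u32, max_frame_latency: u32) -> u32 {
    if max_frame_latency <= 1 {
        cpu_ms + gpu_ms // GPU idles while the CPU prepares the next frame
    } else {
        cpu_ms.max(gpu_ms) // CPU records frame N+1 while the GPU draws frame N
    }
}

fn main() {
    // 10 ms CPU + 10 ms GPU: latency 1 halves the achievable frame rate.
    assert_eq!(steady_state_frame_time_ms(10, 10, 1), 20);
    assert_eq!(steady_state_frame_time_ms(10, 10, 2), 10);
}
```

This matches the observation above: with latency 1, the sleep placed after get_current_texture leaves the GPU with nothing queued.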

@emilk
Contributor

emilk commented Dec 23, 2023

On DX12 I'm hitting some odd bug making frame time expand to 1 full second, so I don't have test results from that so far.

Sounds like you might be hitting `const FRAME_TIMEOUT_MS: u32 = 1000;`

@Wumpf
Member Author

Wumpf commented Dec 23, 2023

Same behavior on Metal.
Thinking about this some more, it starts making more sense to me. I already updated the docs accordingly. I suppose setting the latency to 1 still makes sense for fast-rendering, UI-intensive applications, so exposing this is still good. But using it definitely requires some special care to not completely starve the GPU.

The thing I don't get yet is how one can make CAMetalLayer wait on present. #2869 mentioned one can, but I didn't find out how.

@Wumpf Wumpf requested a review from cwfitzgerald December 23, 2023 16:57
Member

@cwfitzgerald cwfitzgerald left a comment


Let's party!

@cwfitzgerald cwfitzgerald merged commit b8f27c7 into gfx-rs:trunk Jan 17, 2024
25 checks passed
@Wumpf Wumpf deleted the expose-swapchain-buffer-count branch January 17, 2024 17:52
emilk added a commit to emilk/egui that referenced this pull request Jan 24, 2024
Setting `desired_maximum_frame_latency` to a low value should
theoretically lead to lower latency in winit apps using `egui-wgpu`
(e.g. in `eframe` with `wgpu` backend).

* Replaces #3714
* See also gfx-rs/wgpu#4899

----

It seems like `desired_maximum_frame_latency` has no effect on my Mac. I
lowered my monitor refresh-rate to 30Hz to test, and can see no
difference between `desired_maximum_frame_latency` of `0` or `3`.

Before when experimenting with changing the global `DESIRED_NUM_FRAMES`
in `wgpu` I saw a huge difference, so I wonder what has changed.

I verified that `set_maximum_drawable_count` is being called with either
`1` or `2`, but I perceive no difference between the two.
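For reference, opting into the new setting from application code looks roughly like this (a sketch against the wgpu 0.19-era API; surface/device setup elided, and the chosen value of `2` is just the example this thread discusses):

```rust
// Sketch: the new field lives on wgpu::SurfaceConfiguration. Start from
// the surface's default configuration and lower the latency from the
// previously hardcoded buffer count of 3.
let mut config = surface
    .get_default_config(&adapter, width, height)
    .expect("surface not supported by adapter");
config.desired_maximum_frame_latency = 2;
surface.configure(&device, &config);
```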
Labels
area: api Issues related to API surface

4 participants