Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Very High CPU Usage with wgpu >= 0.20.0 #6338

Open
myth0genesis opened this issue Sep 28, 2024 · 15 comments
Open

Very High CPU Usage with wgpu >= 0.20.0 #6338

myth0genesis opened this issue Sep 28, 2024 · 15 comments
Labels
api: vulkan Issues with Vulkan platform: x11 Issues with integration with linux/x11 type: bug Something isn't working

Comments

@myth0genesis
Copy link

myth0genesis commented Sep 28, 2024

Description
There seems to be very high single-core CPU usage in versions of wgpu 0.20.0 and later.

Repro steps

  • Run an example that requires regular redraws (for example, skybox) in any version of wgpu before 0.20.0 and observe CPU usage.
  • Run the same example in wgpu 0.20.0 or later and observe much higher CPU usage.

Expected vs observed behavior
Attached is a video where I run the skybox example provided in the wgpu repo first with version 0.19.4, and then I run the same example again with wgpu 0.20.2, and then I run them both in the same order once again while keeping a CPU monitor open to observe the effects on CPU usage. Single-core CPU usage spikes to at or near 100% with wgpu 0.20.2 (and later). However, CPU usage with the 'skybox' example for wgpu 0.19.4 and earlier are at near idle levels.

Extra materials
The video I mention is attached below, as well as the first page of the perf reports for the skybox example for both wgpu 0.19.4 and wgpu 0.20.2 in human-readable plaintext format.
High_CPU_wgpu.webm
Perf_Report_wgpu-0.19.4.txt
Perf_Report_wgpu-0.20.2.txt

Platform
Operating System: Kubuntu 24.04
KDE Plasma Version: 5.27.11
KDE Frameworks Version: 5.115.0
Qt Version: 5.15.13
Kernel Version: 6.8.0-45-generic (64-bit)
Graphics Platform: X11
Processors: 12 × 12th Gen Intel® Core™ i9-12900H
Memory: 31.1 GiB of RAM
Graphics Processor: NVIDIA GeForce RTX 3070 Ti Laptop GPU/PCIe/SSE2
System Version: REV:1.0

@myth0genesis
Copy link
Author

myth0genesis commented Sep 28, 2024

My suspicion is that this is related to the Vulkan backend. I've built and run some of the Vulkano examples from their most recent release and noticed the top offenders also seem to be Mutexes and printfs similar to what the perf report in wgpu 0.20.0 and later show.

@Wumpf
Copy link
Member

Wumpf commented Sep 28, 2024

Very likely the same as

Can you try what happens when you disable validation? That's automatically the case in Release, for Debug it's opt out

@myth0genesis
Copy link
Author

I've already tried the fixes suggested in those issues, which is what prompted me to start this new one. I may have bungled the disabling validation bit because I'm not sure if that's the correct way to do it. I tried running the example setting the environment variable, as well as following the instructions here, where I commented out the relevant lines in wgpu/wgpu-hal/src/vulkan/instance.rs. Running in Release still has the same high CPU usage (though sometimes it's slightly lower than without WGPU_VALIDATION=0 and the perf report shows the same top offenders. But either way, the CPU usage, which is sometimes at 75% and sometimes at 100%, is still orders of magnitude higher than older versions of wgpu. Attached is a video showing the CPU usage when running in Release and turning off validation via the environment variable.

High_CPU_wgpu_No_Validation.webm

@Wumpf Wumpf added type: bug Something isn't working platform: x11 Issues with integration with linux/x11 labels Sep 29, 2024
@Wumpf
Copy link
Member

Wumpf commented Sep 29, 2024

thanks for follow-up! Bit of relevant context that I know of: from 0.19 to 0.20 there was a bunch of fixed that landed to how synchronization is done on Vulkan - in fact it was pretty bugged before. That also matches up well with the perf logs you attached:

9.57%  wgpu-examples    libc.so.6                          [.] pthread_mutex_lock@@GLIBC_2.2.5
8.79%  wgpu-examples    libc.so.6                          [.] pthread_mutex_unlock@@GLIBC_2.2.5

Looks like the internal spinning optimization of libc is now hit hard 🤔 (afaik libc first spins a bit before doing the syscalls to yield to the scheduler)
Bunch of next investigation steps I can think of:

  • which ones of the involved locks are the ones that cause so much cpu usage instead of yielding
    • if the "it's spinning locks"-hypothesis is right: would a lower level non spinning lock implementation not have this issue (likely causing unwanted delays)
  • can we see this also on Vulkan on Windows, form hypothesis why not and try to test them
  • is the Vulkan synchronization still wrong after all and if it isn't, how do other applications/samples usually handle this

@myth0genesis
Copy link
Author

myth0genesis commented Sep 29, 2024

I appreciate the quick response. I don't know for sure if it's something exclusive to wgpu. It's not an apples-to-apples comparison, as I don't know enough about Vulkan to understand how frame pacing works in any meaningful detail and the examples are obviously not the same, but I ran the triangle example from the most recent release of Vulkano and there was high CPU usage there, too. I've attached the first page of the perf report here and you might be interested to see the list of top offenders looks very familiar.
Perf_Report_Vulkano.txt

@myth0genesis
Copy link
Author

myth0genesis commented Sep 29, 2024

Okay. Scratch that last comment. I no longer think it's to do with the Rust Vulkan bindings. I should've looked beforehand, but I just today learned wgpu uses the Ash Vulkan bindings. So I ran the triangle example in the version of Ash that was first present in wgpu 0.20.0, 0.37.1, and no high CPU usage was observed. Attached is a video showing the results:

CPU_Usage_Ash.webm

@Wumpf Wumpf added the api: vulkan Issues with Vulkan label Sep 29, 2024
@John-Nagle
Copy link

Very likely the same as

* [100% CPU usage since the update to wgpu 0.20 emilk/egui#5092](https://github.com/emilk/egui/issues/5092)

* [Debug Performance Regressions on 0.20 Vulkan #5756](https://github.com/gfx-rs/wgpu/issues/5756)

Possibly also related:

I'm also seeing high CPU usage, but on something that was already CPU bound. So I see it as a frame rate drop and jank. The symptoms are different but the underlying cause may be related. This is on Vulkan, too.

Has anyone run Tracy on this high CPU usage problem? When I do that, I see that too much time is going into binding. But I didn't run sample-type profiling, which would tell whether too much time is being lost in lock primitives.

User-space spinlocks are prone to something called futex congestion collapse in multi-thread programs. But that shouldn't happen when you have fewer threads than CPUs. What's going on with these simple benchmarks in terms of threads vs. CPUs?

@myth0genesis
Copy link
Author

As mentioned above, I already checked CPU usage when disabling validation and when running in release and the issue still remained, as shown in the attached videos, so I'm doubtful it's related to those issues. As far as the WGPU 0.22.+ issue goes, this was a problem present for me post 0.20.0, which you can also see in the comparison videos I posted.
I also recently ran the examples in Arch on X11 using a different scheduler and the problem is still there. I wasn't able to give a comprehensive performance log previously because of how slow Ubuntu is with compiling the reports, but it shouldn't be so bad now that I'm on Arch. I'll try to take some samples and post the results here in the next week or so.

@c0repwn3r
Copy link

Any updates on this? Even running in release mode (which according to the other issues, should fix this) the CPU is still pinned at 100%

@cwfitzgerald
Copy link
Member

I think our main blocker is getting some more information - the hotspot information is useful, but does not really explain how we're hitting 100% on a single core when we should be blocking on vsync.

@c0repwn3r
Copy link

Got it. Anything I can provide to be helpful? I've got a super minimal (just a textured quad moving around) setup, around 300 lines, so I can grab some perf data if it's helpful

@c0repwn3r
Copy link

Alternatively, is there a way around it for the meantime? It's locking up my entire machine every time I start my program.. it's unsustainable. GL apparently just doesn't work on nvidia cards, otherwise I'd use that

@c0repwn3r
Copy link

Interesting thing: Got a buddy running voidlinux to run this, and it got normal CPU usage (2-5%). Could this be some strange interaction with systemd somehow? It wouldn't make sense, but it's the primary difference between his setup and the others posted in this thread

@John-Nagle
Copy link

Probably a different problem than I reported. I'm drawing enough stuff that I hit 100% CPU utilization on the render thread with the GPU only 25% loaded, but that's because I'm drawing a lot. If I draw less stuff, CPU usage goes down. It's all working, just not fast enough.

I've got a super minimal (just a textured quad moving around) setup

If you're hitting 100% CPU drawing that, but it still draws, maybe there's no frame rate limit and you're drawing at some insane frame rate, faster than the display.

@c0repwn3r
Copy link

That's what I thought too, so I added frame time tracing.. fairly constant 60fps (within a tenth).
Incidentally my nvidia driver exploded a few minutes ago and I tried again... now, it was using 350% CPU (as my other threads generating data to be rendered started to get activated), and rendering at a much higher framerate. So it does appear to be blocking on vsync, just in some wildly inefficient way.
Basic profiling shows the overwhelming majority of cpu time being spent in get_time. I suspect this is tracing_subscriber at play, so I'm just ignoring that. The rest of the time is spent in various C libraries.. no debugging symbols there (probably internals of the nvidia vulkan driver). Around 25% of cpu time was spent in mutex_lock. Only around 1% of cpu time was in actual rust code interestingly, which suggests to me something in vulkan is spinning (induced by wgpu?)

I can upload a perf file tomorrow if that would be helpful.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api: vulkan Issues with Vulkan platform: x11 Issues with integration with linux/x11 type: bug Something isn't working
Projects
Status: Todo
Development

No branches or pull requests

5 participants