Add PGO support #797
Comments
I tested MSVC's PGO and it didn't make a difference. In Cemu, most of the hot code is already hand-optimized to generate the most efficient assembly (e.g. avoiding branches, using SIMD where possible, even streamlining variable loads/stores). There just isn't enough wiggle room for PGO to really make a dent. That said, I haven't specifically tested PGO with clang or gcc, but I'd expect similar results. If you believe there is performance to be gained, try it out for yourself and open a PR if you see any actual improvements.
@Exzap is there any benchmark available for Cemu? Or is the best way to collect PGO profiles and compare non-PGO vs PGO builds to run games in Cemu and compare FPS and CPU utilization between the versions?
Nah, we have no benchmark suite. Just run a bunch of games multiple times and write down the metrics. Make sure the conditions are as equal as possible. E.g. for some games like Super Mario 3D World or BotW you can load a save and get the same camera angle every time if you don't touch the controls after loading. This way of testing will give you some noise in the results, but if the improvement is smaller than the noise it's not worth it anyway.
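The "is the improvement bigger than the noise?" check described above can be sketched in a few lines of Python. This is only an illustration: the function name `compare_builds`, the choice of the larger sample standard deviation as the noise estimate, and the FPS readings are all assumptions, not anything Cemu provides or anyone in this thread measured.

```python
from statistics import mean, stdev


def compare_builds(baseline_fps, candidate_fps):
    """Compare FPS samples from two builds and flag whether the gap exceeds the noise.

    Hypothetical helper: "noise" here is simply the larger of the two sample
    standard deviations, which is a crude estimate and not a formal test.
    """
    base_mean = mean(baseline_fps)
    cand_mean = mean(candidate_fps)
    noise = max(stdev(baseline_fps), stdev(candidate_fps))
    delta = cand_mean - base_mean
    return {
        "baseline_mean": base_mean,
        "candidate_mean": cand_mean,
        "delta": delta,
        "noise": noise,
        "worth_it": abs(delta) > noise,
    }


# Made-up readings, one per run of the same save file (not real measurements).
release_runs = [25.0, 26.0, 27.0, 26.0]
pgo_runs = [26.0, 25.0, 27.0, 26.0]

result = compare_builds(release_runs, pgo_runs)
print(f"delta = {result['delta']:+.2f} FPS, noise = {result['noise']:.2f} FPS")
print("worth it" if result["worth_it"] else "within noise")
```

With more runs per build, a proper significance test would be a better choice than this rule of thumb, but the idea is the same: a difference smaller than the run-to-run spread tells you nothing.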
Well, I did some benchmarks. My setup:
At first, I cheated a little bit and chose CPU render mode, since afaik it is the only execution mode that is limited by the CPU rather than by the GPU or the 60 FPS cap.

The instrumented PGO build runs Cemu at about half speed (a drop from 25 FPS to 10-12). However, even after collecting the profile with instrumentation and recompiling the optimized build, I found that FPS dropped (to about 20 FPS, compared with 25-28 FPS in the usual release build).

Next, I tried PGO with sampling (AutoFDO). I recorded a profile with [...]

As a last step, I tried to apply BOLT to the AutoFDO-optimized build. Here the experiment was short: Cemu segfaults after being optimized by BOLT. I will report it upstream later.

I'm not sure what we should do next with this result, but at least it is now written down here for history :)
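To put the FPS figures from this comment on a common scale, here is a small worked calculation of the relative change. The helper `percent_change` is a hypothetical name introduced for illustration; the input numbers are the ones quoted in the comment above.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent (negative = slower)."""
    return (after - before) / before * 100.0


# Instrumented build: 25 FPS dropped to 10-12 FPS.
instrumented = (percent_change(25, 10), percent_change(25, 12))

# PGO-optimized build: about 20 FPS versus 25-28 FPS in the regular release build.
optimized = (percent_change(25, 20), percent_change(28, 20))

print(f"instrumented build: {instrumented[0]:.1f}% to {instrumented[1]:.1f}%")
print(f"PGO-optimized build: {optimized[0]:.1f}% to {optimized[1]:.1f}%")
```

So the instrumented build is roughly 52-60% slower, and the instrumentation-based PGO build is roughly 20-29% slower than the regular release build in this setup.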
I also tried to test with Bayonetta 2, but both Cemu binaries (the usual release and the AutoFDO-optimized one) crash with SIGSEGV during the loading stage.
That sounds like #781. By default, Bayonetta 2 uses the multi-core recompiler, while Mario Kart 8 uses the single-core recompiler.
Mario Kart 8 had a bad game profile; it has since been updated to allow the multi-core recompiler.
To gain more performance: has anyone tried applying Profile-Guided Optimization (PGO) to Cemu? It could help the compiler make better optimization decisions, such as inlining and hot/cold code splitting. For projects like Rust, Clang, YDB, and CPython, PGO brings a good performance boost (usually up to 20%).
If it really does help, it would be nice to see PGO support upstream.