Evaluate Profile-Guided Optimization (PGO) #2386
Comments
@zamazan4ik Could you document how to optimize with PGO in our official documentation?
I think I can. But do you want to publish this documentation based only on the tests above? Do you think PGO is worth it, judging by the results above?
@zamazan4ik
I already posted my current results in the starting post:
@zamazan4ik Thanks for your guide. A 1 ms difference probably doesn't mean much in the real world. However, youki is a product that is open to many new challenges, depending on our interests. How many times can you run it and still arrive at the same result?
It's up to you :) However, if Youki really cares about performance, automatically gaining 1-2% just from a compiler option is not a bad thing.
As you see,
I'm curious about multiple kernel versions. Is this compilation kernel-independent? It would be attractive if it did not depend on the kernel.
Yes, it does not depend on the kernel. So it would be nice if you could reproduce the results on different setups.
@zamazan4ik Thanks for this issue and the initial investigation! I don't have a detailed idea of how PGO is used, apart from this blog post on using PGO in the Rust compiler blog, so correct me if I'm wrong. If I understand correctly, a "pgo-build" has extra instructions in it, similar to a code-coverage build, and it dumps data about actual function calls and usage to a file, similar to coverage info. The next compilation then uses this data to optimize the build. If this is correct, then
Please let me know if anything above is wrong. Looking forward to your thoughts! EDIT: Also, if I understand correctly, if we do decide to use PGO, we will have two compilation steps in our release workflow, with some steps in between to generate the PGO data? Can we use pre-generated data for PGO instead of generating it right before the second compilation (ignoring that this might not be as accurate)?
Yes, you are right. I used only one test scenario because it was the only scenario that I found in the Youki repo. If we can collect profiles from other real-life workloads - it would be awesome!
Yes, technically it's possible to use the profiles from the unit tests for PGO. But I do not recommend it: unit tests usually do not represent the real-life workload. They tend to cover as many cases as possible (mostly cold paths of the program), so the optimizer will optimize for the unit tests rather than for real-life cases.
Maybe, here I trust you as a domain expert :)
Yes, you can use a single run - it's completely fine. The only reason I collected 100 profiles instead of one is that I was too lazy to change the run command :) If you want to collect multiple profiles from multiple workloads - it's also fine. You will just need to merge them into one "prepared" profile with
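The tool name is cut off in the comment above, so the following is only a hedged sketch of one common way to merge raw profiles: LLVM's `llvm-profdata merge`. The profile directory, the output file name and the `DRY_RUN` switch are assumptions made for illustration, not details from this thread.

```shell
#!/bin/sh
# Sketch: merge several raw PGO profiles into one "prepared" profile.
# Assumptions (not from the thread): PROFILE_DIR, MERGED and DRY_RUN.
set -eu

DRY_RUN="${DRY_RUN:-1}"                              # 1 = print only, 0 = execute
PROFILE_DIR="${PROFILE_DIR:-./target/pgo-profiles}"  # assumed location of *.profraw files
MERGED="${MERGED:-./merged.profdata}"                # assumed output file name

merge_profiles() {
  if [ "$DRY_RUN" = "1" ]; then
    # Inside double quotes the glob is printed literally, not expanded.
    echo "llvm-profdata merge -o $MERGED $PROFILE_DIR/*.profraw"
  else
    llvm-profdata merge -o "$MERGED" "$PROFILE_DIR"/*.profraw
  fi
}

merge_profiles
```

With `DRY_RUN=0` the command is actually executed. The merged file can then be passed to the second compilation stage, for example through rustc's `-Cprofile-use=` flag.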
Yes, usually PGO (via Instrumentation) means having a 2-stage compilation process. There is another kind of PGO called Sampling PGO (or AutoFDO) - you can read about it at https://clang.llvm.org/docs/UsersManual.html#using-sampling-profilers but let's talk only about Instrumentation PGO for now. Yes, you can generate PGO data directly during the build process every time (compile with Instrumentation, run on some workload, recompile with the just-collected profiles) or use pre-generated profiles and skip the Instrumentation stage. But if you use pre-generated profiles, you need to keep in mind the following things:
Hi!
Here I am posting an idea for optimizing Youki with Profile-Guided Optimization (PGO). Recently I started evaluating PGO across multiple software domains - all my current results are available here: https://github.com/zamazan4ik/awesome-pgo . For Youki I did some quick benchmarks on my local Linux machine and want to share the actual performance numbers.
Test environment
rustc 1.72.0 (5680fa18f 2023-08-23)
Youki: commit 646c1034f78454904cc3e1ccec2cd8dc270ab3fd in the main branch
Benchmark
As a benchmark, I use the workload suggested in the README file:
sudo ./youki create -b tutorial a && sudo ./youki start a && sudo ./youki delete -f a
youki_release is built with just youki-release. The PGO-optimized build is done with cargo-pgo (cargo pgo build + run the benchmark with the instrumented Youki + cargo pgo optimize build). As a training workload, I use the benchmark itself.
Results
The results are presented in hyperfine format. All benchmarks are done multiple times, in different order, etc - the results are reproducible.
Just for reference, I also share the results for Instrumentation mode:
According to the tests, PGO improves performance by 1-2%. Not a great win, but not bad for "just" a compiler option. At scale, even 1% is a good thing to achieve.
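A comparison like the one described above could be reproduced with hyperfine roughly as follows. This is a minimal sketch, not the exact invocation used for the numbers in this post: the binary name `youki_pgo`, the warmup and run counts, and the `DRY_RUN` switch are assumptions made for illustration.

```shell
#!/bin/sh
# Sketch: compare the release build and a PGO build with hyperfine.
# Assumptions (not from the post): ./youki_pgo, --warmup 5, --runs 100, DRY_RUN.
set -eu

DRY_RUN="${DRY_RUN:-1}"                        # 1 = print only, 0 = execute
RELEASE_BIN="${RELEASE_BIN:-./youki_release}"  # release build named in the post
PGO_BIN="${PGO_BIN:-./youki_pgo}"              # hypothetical name for the PGO build

# Build the create/start/delete cycle from the README for one binary.
cycle() {
  echo "sudo $1 create -b tutorial a && sudo $1 start a && sudo $1 delete -f a"
}

compare() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "hyperfine --warmup 5 --runs 100"
    echo "  '$(cycle "$RELEASE_BIN")'"
    echo "  '$(cycle "$PGO_BIN")'"
  else
    hyperfine --warmup 5 --runs 100 \
      "$(cycle "$RELEASE_BIN")" "$(cycle "$PGO_BIN")"
  fi
}

compare
```

Running both binaries in a single hyperfine invocation lets the tool report the relative difference directly. Swapping the order of the two commands between runs is a cheap check that the result does not depend on ordering.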
Further steps
If you think that it's worth it, we can perform more robust PGO benchmarks for Youki and then document the results in the project, so other people will be able to optimize Youki for their own workloads.
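The cargo-pgo workflow described in the Benchmark section (instrument, train, optimize) can be sketched as a small script. The instrumented binary path and the `DRY_RUN` switch are assumptions made for illustration; by default the script only prints the commands, because the training step needs sudo.

```shell
#!/bin/sh
# Sketch of the instrument -> train -> optimize loop with cargo-pgo.
# Assumptions (not from the post): YOUKI_BIN path and the DRY_RUN switch.
set -eu

DRY_RUN="${DRY_RUN:-1}"   # 1 = only print the commands, 0 = execute them

# Hypothetical location of the instrumented binary; adjust to your target triple.
YOUKI_BIN="${YOUKI_BIN:-./target/x86_64-unknown-linux-gnu/release/youki}"

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Training workload: the create/start/delete cycle from the README.
train_once() {
  run sudo "$YOUKI_BIN" create -b tutorial a
  run sudo "$YOUKI_BIN" start a
  run sudo "$YOUKI_BIN" delete -f a
}

pgo_workflow() {
  run cargo pgo build            # stage 1: build with instrumentation
  train_once                     # run the instrumented binary to emit profiles
  run cargo pgo optimize build   # stage 2: rebuild using the collected profiles
}

pgo_workflow
```

To train on a different workload, replace the body of `train_once` with commands that better match real usage; the two build stages stay the same.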