-
-
Notifications
You must be signed in to change notification settings - Fork 3.1k
Module: omit frame pointer in ReleaseFast #24145
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
Because it frees up a register it can have performance implications on the program and because we're optimizing for speed and not debugging, I think it's a no-brainer to make this true. In general, I expect performance with this change to be strictly better or unchanged, both at the cost of it being more difficult to obtain a stack trace but that's not the goal of ReleaseFast.
red_zone: ?bool = null, | ||
/// Whether to omit the stack frame pointer. Frees up a register and makes it | ||
/// Whether to omit the stack frame pointer. Frees up a register but makes it | ||
/// more difficult to obtain stack traces. Has target-dependent effects. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are there any "target-dependent effects" that can possibly be bad for performance? In that case it might make sense to leave this up to the user after all, or decide based on the target.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Omitting the frame pointer on i386 and riscv64 appear to result in worse performance:
Here's the test code I wrote generated:
mandelbrot.zig.txt
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interesting. Well I really seem to get absolutely no difference on x86_64:
$ zig build-exe mandelbrot.zig -fno-omit-frame-pointer -OReleaseFast --name keep
$ zig build-exe mandelbrot.zig -fomit-frame-pointer -OReleaseFast --name omit
$ poop ./omit ./keep
Benchmark 1 (4 runs): ./omit
measurement mean ± σ min … max outliers delta
wall_time 1.49s ± 53.8ms 1.45s … 1.57s 0 ( 0%) 0%
peak_rss 319KB ± 0 319KB … 319KB 0 ( 0%) 0%
cpu_cycles 3.49G ± 6.55M 3.49G … 3.50G 0 ( 0%) 0%
instructions 5.94G ± 77.0 5.94G … 5.94G 0 ( 0%) 0%
cache_references 55.8K ± 45.2K 30.4K … 124K 0 ( 0%) 0%
cache_misses 46.9K ± 43.4K 24.4K … 112K 0 ( 0%) 0%
branch_misses 56.2M ± 43.3K 56.2M … 56.3M 0 ( 0%) 0%
Benchmark 2 (4 runs): ./keep
measurement mean ± σ min … max outliers delta
wall_time 1.45s ± 48.7ms 1.40s … 1.52s 0 ( 0%) - 2.5% ± 6.0%
peak_rss 319KB ± 0 319KB … 319KB 0 ( 0%) - 0.0% ± 0.0%
cpu_cycles 3.52G ± 5.95M 3.51G … 3.52G 0 ( 0%) + 0.7% ± 0.3%
instructions 5.94G ± 115 5.94G … 5.94G 1 (25%) + 0.0% ± 0.0%
cache_references 72.1K ± 32.4K 28.7K … 107K 0 ( 0%) + 29.3% ± 122.1%
cache_misses 58.8K ± 24.9K 25.1K … 84.9K 0 ( 0%) + 25.3% ± 130.5%
branch_misses 56.4M ± 66.4K 56.4M … 56.5M 0 ( 0%) + 0.4% ± 0.2%
$ ../zig/poop ./keep ./omit
Benchmark 1 (4 runs): ./keep
measurement mean ± σ min … max outliers delta
wall_time 1.48s ± 71.7ms 1.39s … 1.57s 0 ( 0%) 0%
peak_rss 319KB ± 0 319KB … 319KB 0 ( 0%) 0%
cpu_cycles 3.51G ± 3.47M 3.51G … 3.51G 0 ( 0%) 0%
instructions 5.94G ± 59.2 5.94G … 5.94G 0 ( 0%) 0%
cache_references 37.0K ± 6.45K 31.1K … 44.5K 0 ( 0%) 0%
cache_misses 27.6K ± 4.20K 22.1K … 32.3K 0 ( 0%) 0%
branch_misses 56.3M ± 99.0K 56.2M … 56.4M 0 ( 0%) 0%
Benchmark 2 (4 runs): ./omit
measurement mean ± σ min … max outliers delta
wall_time 1.39s ± 41.1ms 1.35s … 1.44s 0 ( 0%) - 5.6% ± 6.8%
peak_rss 319KB ± 0 319KB … 319KB 0 ( 0%) - 0.0% ± 0.0%
cpu_cycles 3.49G ± 1.53M 3.49G … 3.49G 0 ( 0%) - 0.5% ± 0.1%
instructions 5.94G ± 39.1 5.94G … 5.94G 0 ( 0%) - 0.0% ± 0.0%
cache_references 33.4K ± 3.26K 28.8K … 36.5K 1 (25%) - 9.7% ± 23.9%
cache_misses 26.3K ± 2.85K 23.3K … 30.2K 0 ( 0%) - 4.8% ± 22.5%
branch_misses 56.2M ± 42.9K 56.2M … 56.3M 0 ( 0%) - 0.2% ± 0.2%
Also are you sure that's worse? Is it still worse if you run multiple times and swap the order of the commands?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't test x86_64, I just tested x86 (32 bit) there. I'm pretty confident in my benchmark of x86, but I could test further.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's interesting. And really doesn't make sense.
Does that mean they should in fact be included only for those targets by default?
From what I'm reading though, especially on 32-bit x86 it was popular to omit the frame pointer because it has far less registers. It was faster to omit.
On x86_64 though there are enough registers that it really doesn't seem to have any impact.
See #22180 which has some reasoning from when frame pointers were added by default in ReleaseFast and ReleaseSafe |
See https://www.brendangregg.com/blog/2024-03-17/the-return-of-the-frame-pointers.html
|
Ah, well, no matter how small the overhead, there's an overhead and If you want to do "debugging and profiling work" either use debug mode or enable frame pointers on your own if you need them anyway. In the case of that article, it seems like looking at flame graphs would be difficult if frame pointers were disabled by default in ReleaseFast. So how about how enable them if you do benchmarking with flame graphs? A lot of Zig users seem to use tools like Maybe if something like flamegraph (the tool written in Rust) were written in Zig, it could integrate with the build system and turn on frame pointers automatically. |
Anyone here that actually uses flame graphs or some profiling tool that would break if frame pointers were omitted in ReleaseFast builds by default? And what would inconvenience you so much to enable them on your own when you do profiling? |
Regarding flamegraphs, my opinion is simply that software that is released should support being profiled. For that to happen, either the toolchain needs to not omit frame pointers by default, or every distribution and every project needs to override the default. I shouldn't have to go recompile everything. This is particularly important for whole-system profiling where you are looking at every program on the system and hence they would all have to have frame pointers. Frame pointers are also very useful in debugging (yes, sometimes you have to debug a release binary) and crash reporting. Suppose I release a Zig program which has a crash. If frame pointers are omitted by default, then the default UX is:
This is far more disruptive than the program being slightly slower. |
They're for different purposes. Poop, hyperfine, and similar tools are good at measuring outcomes: once I've made some change, they can tell me it is X% faster or slower. Flame graphs are a tool for investigating: they will show you which parts of a program are taking a long time to run and therefore which parts it might be profitable to try optimizing. |
Yeah, I can't really see this PR being merged anyway so no point. I was just going by what the ReleaseFast option is supposed to be (an all-or-nothing). |
Because it frees up a register it can have performance implications on the program and because we're optimizing for speed and not debugging, I think it's a no-brainer to make this true.
In general, I expect performance with this change to be strictly better or unchanged, both at the cost of it being more difficult to obtain a stack trace but that's not the goal of ReleaseFast.
How can performance be possibly worse if you have an additional register unless the code generator's register allocator has some bug where that somehow makes it perform worse? Can't really imagine that.
Wanted to include some kind of benchmark here but from the benchmarking I was able to do on the compiler, there doesn't seem to be a difference for
zig ast-check
with or without the frame pointer (anything beyond-Ddev=ast_gen
takes too long to build with ReleaseFast...)