[NativeAOT/ARM64] Generate frames compatible with Apple compact unwinding #107766
+329
−86
Contributes to #76371
There are two changes in the PR that work in tandem. The first is a JIT change that generates a slightly different frame layout on NativeAOT/ARM64/Apple platforms (iOS, macOS and tvOS). The second is an ObjWriter change that recognizes structurally compatible unwinding information and generates compact unwinding codes in the object files instead of verbose DWARF unwinding information.
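To make "structurally compatible" concrete, here is a minimal sketch of how a frame-based ARM64 compact unwinding code is composed. The constants follow Apple's `compact_unwind_encoding.h`; the function name `encode_frame` and the input shape are hypothetical and are not the ObjWriter code from this PR, which also has to verify where each pair is stored relative to FP.

```python
# Constants as defined in Apple's compact_unwind_encoding.h for ARM64.
UNWIND_ARM64_MODE_FRAME = 0x04000000

# One bit per callee-saved register pair that the frame mode can describe.
PAIR_BITS = {
    ("x19", "x20"): 0x001,
    ("x21", "x22"): 0x002,
    ("x23", "x24"): 0x004,
    ("x25", "x26"): 0x008,
    ("x27", "x28"): 0x010,
    ("d8", "d9"): 0x100,
    ("d10", "d11"): 0x200,
    ("d12", "d13"): 0x400,
    ("d14", "d15"): 0x800,
}


def encode_frame(saved_pairs):
    """Hypothetical helper: return a compact unwinding code for a frame that
    saves FP/LR plus the given register pairs, or None when some pair cannot
    be expressed and verbose DWARF would be needed instead."""
    encoding = UNWIND_ARM64_MODE_FRAME
    for pair in saved_pairs:
        bit = PAIR_BITS.get(tuple(pair))
        if bit is None:
            return None  # not expressible as a compact code
        encoding |= bit
    return encoding


print(hex(encode_frame([("x19", "x20"), ("x21", "x22")])))
```

A frame that saves registers in any other grouping (for example `x19` together with `x21`) has no compact representation in this sketch, which is the situation the JIT frame layout change is meant to avoid.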
For NativeAOT/ARM64/Apple ABI do the following:
`lvaFrameAddress` to rewrite FP-x references to SP+y when possible. This allows efficient addressing using positive indexes when FP points to the top of the frame. It mimics a similar optimization on ARM32.
Each of these changes comes with some caveats:
`lvaFrameAddress`. The additional increase in prolog size also causes a cascading effect where loop alignment and related code alignment (32-byte alignment for method start) significantly contribute to the code section size. In some cases the same alignment logic may also reduce the size.
As you can see, this is a bit of a trade-off and it's valid to ask if it's worth it.
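The FP-x to SP+y rewrite can be illustrated with a small sketch. It relies on two ARM64 encoding facts: `LDUR`/`STUR` take a signed 9-bit unscaled offset (-256..255), while `LDR`/`STR` with an unsigned offset take a 12-bit immediate scaled by the access size. The function names and the `fp_minus_sp` parameter are hypothetical, and the sketch assumes SP does not move after the prolog; it is not the actual `lvaFrameAddress` logic.

```python
def fits_single_instruction(offset, access_size=8):
    """True if base+offset can be encoded in one ARM64 load/store."""
    # LDUR/STUR: signed 9-bit unscaled immediate.
    if -256 <= offset <= 255:
        return True
    # LDR/STR (unsigned offset): 12-bit immediate scaled by the access size.
    return (
        offset >= 0
        and offset % access_size == 0
        and offset // access_size <= 4095
    )


def rewrite_frame_address(fp_offset, fp_minus_sp, access_size=8):
    """Hypothetical helper: pick a (base, offset) pair for a stack slot.

    fp_offset   -- slot offset relative to FP (negative when FP is at the top)
    fp_minus_sp -- distance between FP and SP, assumed fixed after the prolog
    """
    if fp_offset < 0:
        sp_offset = fp_offset + fp_minus_sp  # FP + x == SP + (FP - SP) + x
        if fits_single_instruction(sp_offset, access_size):
            return ("sp", sp_offset)
    return ("fp", fp_offset)


print(fits_single_instruction(-1024))       # FP-1024 needs an extra instruction
print(rewrite_frame_address(-1024, 2048))   # the same slot addressed from SP
```

With many locals, most slots sit further than 256 bytes below FP, so addressing them from SP with a positive scaled offset keeps each access to a single instruction.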
Firstly, let me address the size question. The loop alignment seems to be the biggest contributor to the code size variation, and I filed issue #107284 to investigate whether we can come up with better defaults for Apple platforms. The code size changes are predominantly confined to the small fixed change in the prolog size and the additional alignment. When testing the prototype on MAUI apps, the biggest culprit for increased code size was the code generated from XAML, which produces methods with a ton of local variables. The change in `lvaFrameAddress` practically eliminated any negative effect of the frame layout on this type of code. Without the change the regression was nearly 50% due to stack references needing an indirect load with a register and an extra instruction. Outside of MAUI the biggest visible effect is on code from the Regex source generator, but in that case it's quite evenly split between size improvements and regressions. That suggests the regex code may be a good candidate for measuring the performance characteristics of the loop alignment. The code size regressions on the few examples I tried amount to around 2% (incl. the alignment). The saved space in the DWARF unwinding section is hovering around 90% ± 3%. To put that into absolute numbers, we are looking at savings of around 0.7 MB for a `dotnet new maui` app and around 3 MB for `System.Runtime.Tests` in the linked executables. The savings in the size of the unwinding information far outweigh any increase in the code size.
Secondly, part of the motivation is that the Apple linker is notoriously buggy when processing the DWARF unwinding data. The compact unwinding tables are used as an index to the DWARF data for anything that cannot be expressed using a compact unwinding code directly. Due to the structure of the tables, this limits the effective offset of DWARF info to 24 bits and places a hard limit on the DWARF unwinding info size. At least with some versions of the Apple linker, breaking this limit results in silent corruption and runtime failures.
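The 24-bit limit follows from how a DWARF fallback entry is packed into a 32-bit compact unwinding code. A minimal sketch, using the mode and mask constants from Apple's `compact_unwind_encoding.h`; the function name `encode_dwarf_fallback` is hypothetical, and the explicit range check is what a well-behaved tool would do, not a description of the Apple linker.

```python
# Constants as defined in Apple's compact_unwind_encoding.h for ARM64.
UNWIND_ARM64_MODE_DWARF = 0x03000000
UNWIND_ARM64_DWARF_SECTION_OFFSET = 0x00FFFFFF  # low 24 bits hold the offset


def encode_dwarf_fallback(fde_offset):
    """Hypothetical helper: build a compact unwinding code that points at a
    DWARF FDE, rejecting offsets that do not fit in 24 bits."""
    if not 0 <= fde_offset <= UNWIND_ARM64_DWARF_SECTION_OFFSET:
        raise ValueError(
            f"FDE offset {fde_offset:#x} does not fit in 24 bits"
        )
    return UNWIND_ARM64_MODE_DWARF | fde_offset


print(hex(encode_dwarf_fallback(0x10)))

try:
    encode_dwarf_fallback(1 << 24)  # first offset past the 16 MiB boundary
except ValueError as err:
    print(err)
```

Any FDE located at or beyond 16 MiB into the DWARF unwinding section cannot be referenced this way, so shrinking that section also keeps binaries further from the limit.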
Comparison between `main` and PR for System.Runtime.Tests in Release build
Compare raw size of linked binaries:
Bloaty check of linked binaries:
Bloaty check of object files:
Bloaty check of object files (detailed):