Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use .alt_entry/.private_extern on Apple platforms #106224

Closed
wants to merge 12 commits into from

Conversation

filipnavara
Copy link
Member

@filipnavara filipnavara commented Aug 10, 2024

Background

Unlike ELF which can support pretty much arbitrary number of sections, the Mach-O object files are limited to ~255 sections. Apple uses the .subsections_via_symbols linking approach to circumvent this limitation. In this model the external symbols are used to slice the section into subsections, or atoms, which are then used as the linking unit (for purpose of dead stripping, reordering, etc.).

.alt_entry is an escape hatch to introduce symbols that don't divide a (sub)section into multiple atoms. It can be combined with .global or .private_extern to create symbols pointing into the middle of a method or data while making sure that the linker doesn't try to slice the method/data and place it into two different locations in the output.

.private_extern is similar to combination of symbol marked with .global and .hidden on ELF platforms. It's used to mark symbols with limited visibility. When the file is fed to the static linker, it clears the N_EXT bit for each symbol with the N_PEXT bit set. The ld option -keep_private_externs turns off this behavior.

Problem

NativeAOT and CoreCLR have assembly files that need to reference instructions in the middle of the method. This is used for marking places recognized by the SIGSEGV handler as valid instructions that are expected to produce a signal which can transformed to NullReferenceException. CoreCLR also uses it in parts of code that are patchable at runtime (applies only to macOS x64 for the purpose of this PR).

Additionally, NativeAOT has a concept of dehydrated data that are unpacked at runtime into the .hydrated section. The dehydrated data use relative offsets and rely on the linker keeping the layout of the .hydrated section intact.

.subsections_via_symbols was emitted by Clang for over a decade now and the new Apple linker introduced in Xcode 15 no longer correctly supports object files without it. The old linker is scheduled to be removed in Xcode 16 which is currently in beta. Additionally, Xcode 16 is more strict at enforcing the rules for .global labels inside assembly files. While the current Xcode 16 betas have additional issues (upstream LLVM links: llvm/llvm-project#82261, llvm/llvm-project#97116), it is prudent to change the code to emit .alt_entry symbols instead of depending on undefined linker behavior.

Solution

This PR is changing the code to correctly use .alt_entry/.private_extern and compile with Xcode 14/15.

It's best reviewed commit-by-commit:

  • 13e6013 updates macros used in NativeAOT assembly code to use .alt_entry + .private_extern instead of .global to mark code location labels.
  • 5fe3a06 updates the ILC compiler to emit N_ALT_ENTRY flag for any symbols pointing to the middle of the node.
  • ea6ab79 updates the ILC compiler to treat the whole .hydrated section as single unbreakable atom for the linker. It also updates the Mach-O emitter to emit the N_PEXT (.private_extern) flag to match the ELF behavior.

@dotnet-issue-labeler dotnet-issue-labeler bot added the needs-area-label An area label is needed to ensure this gets routed to the appropriate area owners label Aug 10, 2024
@dotnet-policy-service dotnet-policy-service bot added the community-contribution Indicates that the PR has been added by a community member label Aug 10, 2024
the Mach-O linker. We depend on the layout when rehydrating it using
relative offsets. This was previously achieved using the N_NO_DEAD_STRIP
flag but N_ALT_ENTRY is a stronger guarantee.

Also, emit N_PEXT (.private_extern) flag for all non-global symbols to
align the behavior with ELF targets.
Type = N_SECT | N_EXT,
Descriptor =
definition.AltEntry || definition.SectionIndex == _hydrationTargetSectionIndex ?
N_ALT_ENTRY : N_NO_DEAD_STRIP,
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strictly speaking the N_NO_DEAD_STRIP should no longer be required with the changes in this PR. I left it in out of abundance of caution and to minimize the chance of something going wrong.

Copy link
Member Author

@filipnavara filipnavara Aug 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Turns out, caution was warranted. New Xcode linker still messes up the .hydrated section even if it's completely composed of N_ALT_ENTRY symbols. I changed it to N_ALT_ENTRY | N_NO_DEAD_STRIP and I will probably follow up with a bug report to Apple if I can make a small repro.

Copy link
Member Author

@filipnavara filipnavara Aug 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I found the pattern, will fix ILC to avoid producing it:

test_data.S:

.subsections_via_symbols
.section __DATA,hydrated
.global .hydrated // <-- symbol we generate at beginning of .hydrated section
.hydrated:
lsection6: // <-- local symbol for beginning of section
.alt_entry _foo_data
.private_extern _foo_data
_foo_data:
  .dword 0
.alt_entry _bar_data
.private_extern _bar_data
_bar_data:
  .dword 1

test_code.S:

extern char *foo_data;
int main() { return (int)(long)&foo_data; }

Build with clang test_data.S test_code.c -o ./test -Wl,-dead_strip [-Wl,-ld_classic]. Check output with nm -nm ./test

For new linker we get:

0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003f8c (__TEXT,__text) external _main
0000000100004000 (__DATA,hydrated) external .hydrated
0000000100004000 (__DATA,hydrated) non-external (was a private external) _foo_data

For old linker we get:

0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003f9c (__TEXT,__text) external _main
0000000100004000 (__DATA,hydrated) external .hydrated
0000000100004000 (__DATA,hydrated) non-external (was a private external) _foo_data
0000000100004008 (__DATA,hydrated) non-external (was a private external) _bar_data

This only happens if we produce the local section symbol.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Filed as Apple bug FB14743667. The workaround in ILC is orthogonal to the changes in this PR, so I will likely submit it separately after I do sufficient testing. Keeping N_NO_DEAD_STRIP is sufficient workaround for now.

Copy link
Member Author

@filipnavara filipnavara Aug 10, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Funnily enough, if I rename lsectionXX to ltmpXX (ie. the same symbol prefix as emitted for LLVM temporary variables) it works. That seems to be cheap enough workaround.

Turns out, even in the old ld64 there are hacks (1, 2) for l-prefixed and ltmp-prefixed labels getting special treatment.

@filipnavara filipnavara marked this pull request as draft August 10, 2024 14:45
@filipnavara
Copy link
Member Author

filipnavara commented Aug 10, 2024

Temporarily switched back to draft while I try to validate what is happening with the new linker (Xcode 15/16). While we currently disable it I want to ensure that this doesn't introduce any additional regression that we would have to deal with. Seems like the .hydrated section may not be linked correctly with it.

@filipnavara filipnavara marked this pull request as ready for review August 10, 2024 14:58
@filipnavara
Copy link
Member Author

filipnavara commented Aug 10, 2024

Re-tested with the following scenarios:

  • Xcode 15.4 running the whole build, building and running NativeAOT smoke tests
  • Xcode 15.4 used for building the runtime and libs; Xcode 16 beta 5 used for building the NativeAOT smoke tests
  • Xcode 15.4 used for building the runtime and libs; Xcode 16 beta 5 and new linker used for building the NativeAOT smoke tests. I switched the active Xcode after building the runtime and commented out these lines in artifacts/bin/coreclr/osx.arm64.Release/build/Microsoft.NETCore.Native.Unix.targets:
    <ItemGroup Condition="'$(UseLdClassicXCodeLinker)' != 'false' and '$(_IsApplePlatform)' == 'true'">
    <CustomLinkerArg Condition="'$(_XcodeVersion)' &gt;= '15'" Include="-ld_classic" />
    </ItemGroup>

Copy link
Contributor

Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas
See info in area-owners.md if you want to be subscribed.

@MichalStrehovsky
Copy link
Member

Strictly speaking the N_NO_DEAD_STRIP should no longer be required with the changes in this PR. I left it in out of abundance of caution and to minimize the chance of something going wrong.

Does this work if dehydration is turned off? I don't fully understand why the hydrated section is special. Is it because we happen to place everything that cannot be broken up into that section right now (when dehydration is turned on, that is)?

@filipnavara
Copy link
Member Author

Does this work if dehydration is turned off?

Yes

I don't fully understand why the hydrated section is special. Is it because we happen to place everything that cannot be broken up into that section right now (when dehydration is turned on, that is)?

It's special because the dehydrated data indexes the hydrated section using relative offsets embedded in the dehydrated data that are not understood by the native linker.

In the atom linking model the native linker is free to reorder atoms (subsections), or discard them if they are unreferenced (from the linker's point of view, ie. relocation targets). Thus the whole "hydrated" section is by default considered a bunch of atoms that can be reordered or removed. If that happens the offsets in the dehydrated data no longer match the symbols and relocations that point to them.

The idea is to turn the whole section into one single non-breakable and non-relocatable atom for the native linker. That's accomplished by making the whole section start with the ".hydrated" symbol, which is the start of the atom, and then marking any other symbol in the section as "alternative entry" symbol. "alternative entry" symbols don't break up atoms (*).

We previously started to mark all the symbols with "no dead strip" marker. That guarantees that the symbol is preserved in the final linked image. It doesn't guarantee that it's present in the original order which is a stronger guarantee that this PR is trying to enforce with documented behavior instead of relying on undefined behavior.

(*) Unless you hit a linker bug, like with the new Xcode 16 linker

@MichalStrehovsky
Copy link
Member

It's special because the dehydrated data indexes the hydrated section using relative offsets embedded in the dehydrated data that are not understood by the native linker.

Don't we do the same thing when dehydration is turned off? E.g. the fact that MethodTable symbols are prefixed by GCDesc, or non-GC static bases are prefixed by the class constructor context. If the linker decides to shuffle these, bad things will happen. I would expect we do the same thing for all sections we emit. The compiler doesn't expect linker messing with section contents.

@filipnavara
Copy link
Member Author

Don't we do the same thing when dehydration is turned off? E.g. the fact that MethodTable symbols are prefixed by GCDesc, or non-GC static bases are prefixed by the class constructor context. If the linker decides to shuffle these, bad things will happen. I would expect we do the same thing for all sections we emit. The compiler doesn't expect linker messing with section contents.

That largely depends on which symbols are produced in the object file and whether they are referenced or not. As far as I can tell, EETypeNode calls ObjectDataBuilder.AddSymbol only once. If you have single symbol for a node, you are creating a single atom for the native linker.

For places which call ObjectDataBuilder.AddSymbol more than once it would result in more atoms in the native linker (*). They are currently all marked with N_NO_DEAD_STRIP, so the linker will not remove them. It will also very likely not reorder them but that's not guaranteed without N_ALT_ENTRY merging the symbols into single atom. Commit 5fe3a06 in this PR does exactly that.

(*) ...or rather it may result in more atoms in the linker. The heuristics for code also look at unwinding information, so I am more concerned with DATA.

Copy link
Member

@MichalStrehovsky MichalStrehovsky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me otherwise! Thank you!

Comment on lines +476 to +479
// We emit a local N_NO_DEAD_STRIP symbol for the beginning of all sections marked with
// KeepDataLayout to ensure the final layout is kept intact by the linker. This works in
// tandem with setting the N_ALT_ENTRY flag for all other symbols in the same section in
// the loop below.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the PR version still doing this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, it is.

// Match the logic for KeepDataLayout in MachSection.CreateSection. If we are in a data section
// we don't need to have a symbol for beginning of the node data. For other nodes, particularly
// executable code, we enforce it though.
Debug.Assert(hasInitialEntrypoint || section.Type is SectionType.ReadOnly or SectionType.Writeable or SectionType.Uninitialized);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we avoid duplicating the section.Type is SectionType.ReadOnly or SectionType.Writeable or SectionType.Uninitialized part?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can add a public getter.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@MichalStrehovsky
Copy link
Member

/azp run runtime-extra-platforms

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member

/azp run runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@MichalStrehovsky
Copy link
Member

/azp run runtime-extra-platforms, runtime-nativeaot-outerloop

Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@MichalStrehovsky
Copy link
Member

clang or linker seems to be segfaulting in the outerloop runs for at least one of the larger tests :(

  clang: error: linker command failed due to signal (use -v to see invocation)
/Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/build/Microsoft.NETCore.Native.targets(376,5): error MSB3073: The command ""/usr/bin/clang" "/Users/runner/work/1/s/artifacts/obj/System.Text.Json.SourceGeneration.Roslyn4.4.Tests/Release/net9.0/native/System.Text.Json.SourceGeneration.Roslyn4.4.Tests.o" -o "/Users/runner/work/1/s/artifacts/bin/System.Text.Json.SourceGeneration.Roslyn4.4.Tests/Release/net9.0/native/System.Text.Json.SourceGeneration.Roslyn4.4.Tests" -exported_symbols_list "/Users/runner/work/1/s/artifacts/obj/System.Text.Json.SourceGeneration.Roslyn4.4.Tests/Release/net9.0/native/System.Text.Json.SourceGeneration.Roslyn4.4.Tests.exports" -Wl,-dead_strip -gz=zlib /Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/aotsdk/libbootstrapper.o /Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/aotsdk/libRuntime.WorkstationGC.a /Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/aotsdk/libeventpipe-disabled.a /Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/aotsdk/libRuntime.VxsortEnabled.a /Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/aotsdk/libstandalonegc-disabled.a /Users/runner/work/1/s/artifacts/bin/coreclr/osx.x64.Release/aotsdk/libstdc++compat.a /Users/runner/work/1/s/artifacts/bin/testhost/net9.0-osx-Release-x64/shared/Microsoft.NETCore.App/9.0.0/libSystem.Native.a /Users/runner/work/1/s/artifacts/bin/testhost/net9.0-osx-Release-x64/shared/Microsoft.NETCore.App/9.0.0/libSystem.Globalization.Native.a /Users/runner/work/1/s/artifacts/bin/testhost/net9.0-osx-Release-x64/shared/Microsoft.NETCore.App/9.0.0/libSystem.IO.Compression.Native.a /Users/runner/work/1/s/artifacts/bin/testhost/net9.0-osx-Release-x64/shared/Microsoft.NETCore.App/9.0.0/libSystem.Net.Security.Native.a /Users/runner/work/1/s/artifacts/bin/testhost/net9.0-osx-Release-x64/shared/Microsoft.NETCore.App/9.0.0/libSystem.Security.Cryptography.Native.Apple.a /Users/runner/work/1/s/artifacts/bin/testhost/net9.0-osx-Release-x64/shared/Microsoft.NETCore.App/9.0.0/libSystem.Security.Cryptography.Native.OpenSsl.a --target=x86_64-apple-macos12.0 -g -Wl,-rpath,'@executable_path' -ldl -lobjc -lswiftCore -lswiftFoundation -lz -licucore -lm -L/usr/lib/swift -framework CoreFoundation -framework CryptoKit -framework Foundation -framework Security -framework GSS" exited with code 254. [/Users/runner/work/1/s/src/libraries/System.Text.Json/tests/System.Text.Json.SourceGeneration.Tests/System.Text.Json.SourceGeneration.Roslyn4.4.Tests.csproj::TargetFramework=net9.0]

@filipnavara
Copy link
Member Author

clang or linker seems to be segfaulting in the outerloop runs for at least one of the larger tests :(

I'll check it locally but likely not until tomorrow :-/

@filipnavara
Copy link
Member Author

filipnavara commented Aug 14, 2024

For posterity, the linker crash is stack overflow in the recursive call:

ld::tool::Resolver::markLive(ld::Atom const&, ld::tool::Resolver::WhyLiveBackChain*)

I am not sure there's an easy solution without some out-of-the-box thinking. (The extreme measure would be to resolve the data symbols to section-based relocations with offset, but that seems like quite a harsh solution.)

@agocke
Copy link
Member

agocke commented Aug 14, 2024

We've also reached out to Apple about this issue and hope they will get back to us

@filipnavara
Copy link
Member Author

I will try to extract the non-problematic bits of this PR tomorrow into separate one(s).

I have some idea how to make the data section layout reliable without triggering the ld64 crash. In any case it’s going to be a non-trivial change and I need to test it properly.

@filipnavara
Copy link
Member Author

filipnavara commented Aug 15, 2024

I have some idea how to make the data section layout reliable without triggering the ld64 crash. In any case it’s going to be a non-trivial change and I need to test it properly.

I came up with two hacky prototypes:

  • Generate a single symbol for beginning of each data section and internally resolve the relocations pointing to the data section to (sectionSymbol + symbol value). The symbols in the data section are then not emitted at all. I had to do some nasty hacks to handle EHInfo & AssociatedData references from unwinding data, and globally visible data symbols (eg. g_compilerEmbeddedSettingsBlob, g_compilerEmbeddedKnobsBlob). It's not difficult to do all this and it would like be possible to make it a maintainable code change. I'm concerned about the impact on debuggability. I often relied on address to symbol lookup to identify method table from a pointer, and this would be lost. Perhaps there's a way to emit debug-only (N_STABS) or L-prefixed local symbols but that steers us back to the N_ALT_ENTRY solution which is prone to hit linker bugs.

  • Get back to the state as of 9ed567b and try to clean it up so we don't do too many special cases. This means emitting the data in a linker visible way and potentially allowing it to reorder them. It also imposes additional restrictions:

    1. Symbol must be emitted at the beginning of each node (can be done by ObjWriter).
    2. All symbols with non-zero offsets need to be marked with N_ALT_ENTRY in order to keep the node data as single chunk (can be done by ObjWriter)
    3. Node data must be relocatable by the linker in the final output relative to other node data. This is currently violated by the contents of the hydrated section, hence the special casing I did earlier.

    The main obstacle to this solution would be remodeling how the hydrated data are emitted. The solution I used earlier doesn't really work well since it can still run into the limits of number of consecutive N_ALT_ENTRY symbols that crash the linker with stack overflow. We would likely need to take inspiration from the other prototype mentioned earlier and emit one symbol for the hydrated data and then use relative references to that symbol. And that once again leads to the debuggability degradation...

Neither of those solutions sounds like a slam dunk, but I wanted to share them anyway for posterity. Lastly, there's always the option of relying on the undefined behavior that the linker doesn't do the reordering for our code/data and that N_NO_DEAD_STRIP prevents any kind of dead code/data stripping. It may simply be good enough for now unless we come up with something cleaner.

I extracted the more obvious bits from this PR into #106442, #106444, and #106446. I'll close this PR since the general approach is a dead-end but I am open to continue discussion if there's any interest or guidance from Apple.

MichalStrehovsky pushed a commit that referenced this pull request Aug 19, 2024
…e beginning of the section are not stripped if there's no other N_NO_DEAD_STRIP symbol referencing them (#106444)

Extracted from #106224.

PR #103039 added `N_NO_DEAD_STRIP` flag to all symbols emitted by ILC and enabled the dead code stripping in the native linker.

It failed to handle one specific edge case that is luckily not happening in the wild. If the first node emitted into a section has a symbol with non-zero offset NN the first `N_NO_DEAD_STRIP` symbol is not pointing at the start of the section. The native linker then splits up the section into atom and the first atom from offset 0 to offset NN is never referenced and becomes eligible for dead code stripping. Since we emit a symbol for each section start (for use in section-relative relocations) we can just mark the symbol with `N_NO_DEAD_STRIP` to resolve the issue.
MichalStrehovsky pushed a commit that referenced this pull request Aug 19, 2024
…h ELF (#106446)

Extracted from #106224

`.private_extern` is the logical equivalent of `.hidden`+`.global` in ELF. We already emit those flags in ELF, so do it in Mach-O too.

Documentation for `.private_extern`:
> It's used to mark symbols with limited visibility. When the file is fed to the static linker, it clears the N_EXT bit for each symbol with the N_PEXT bit set. The ld option -keep_private_externs turns off this behavior.
@filipnavara
Copy link
Member Author

For posterity's sake, I received some feedback from Apple:

The case new linker isn’t handling correctly is a combination of symbols at overlapping addresses and alt-entry symbols, like here:

00000000000602c0 (__TEXT,__managedcode) external _GetRuntimeException
00000000000602c0 (__TEXT,__managedcode) private external _S_P_CoreLib_System_RuntimeExceptionHelpers__GetRuntimeException
0000000000060684 (__TEXT,__managedcode) private external [alt entry] _fram1_S_P_CoreLib_System_RuntimeExceptionHelpers__GetRuntimeException

In this specific case marking _S_P_CoreLib_System_RuntimeExceptionHelpers__GetRuntimeException as an alt entry symbol would resolve the problem. That may not always work when using -ld_classic though (e.g., in the test_data.S sample) because old linker doesn’t handle alt-entry symbols at a start of a section. So if that can be a problem, we recommend using -ld_classic until the fix is available.

TL;DR:

  • The new linker (ld-prime) has a bug with overlapping start symbols in combination with .alt_entry symbols. We can likely workaround this.
  • -ld_classic continues to be a viable workaround even in Xcode 16, despite the warning that classic linker will be removed in future release (ie. future release is likely Xcode 17+). This removes some pressure to support the new linker for the .NET 9 timeframe (/cc @ivanpovazan).

I filed another issue (FB14897581) for the old linker (ld64) crashing with stack overflow for huge number of .alt_entry symbols and dead stripping. I asked for guidance on producing an unbreakable atom in DATA/BSS section with symbols that would be good enough not to break the "image lookup" experience in lldb debugger (to allow lookup of name from method table address).

@ivanpovazan
Copy link
Member

Thanks for following up on this!

-ld_classic continues to be a viable workaround even in Xcode 16

Is there an official statement from Apple on this?

@filipnavara
Copy link
Member Author

filipnavara commented Aug 23, 2024

Is there an official statement from Apple on this?

No official statement, as far as I know. The best I can tell is that I still keep receiving guidance we recommend using -ld_classic until the fix is available for bugs against Xcode 16 betas.

(That said, the new linker works for the NativeAOT smoke tests now, so we are already in a better place than with Xcode 15.)

@github-actions github-actions bot locked and limited conversation to collaborators Sep 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-NativeAOT-coreclr community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants