-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[NativeAOT/ARM] Avoid generating linker range extension thunks #99208
[NativeAOT/ARM] Avoid generating linker range extension thunks #99208
Conversation
Tagging subscribers to this area: @agocke, @MichalStrehovsky, @jkotas Issue DetailsIncludes #99207, only the second commit belongs to this PR Assume everything but import thunks needs long jumps. Note that this doesn't necessarily handle import of methods that are also marked as
|
return (ushort)RelocType.IMAGE_REL_BASED_THUMB_BRANCH24; | ||
#else | ||
var targetObject = HandleToObject(target); | ||
if (targetObject is ExternSymbolNode) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note: I would love to use something like targetObject is ExternMethodSymbolNode
or targetObject is ExternSymbolNode and not RuntimeImportMethodNode
. Unfortunately, I cannot, because multiple C library calls are imported as InternalCall
. That's the case at least for memmove
and the float/double math functions like ceil
.
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Can I get a re-run, please? ;) |
/azp run runtime-nativeaot-outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Looks like a first clean pass of System.Runtime.Tests in a while. (Let's not jinx it, hopefully the Pri0 tests will pass as well.) Once #99207 is merged I will rebase this and update the PR description. |
@@ -4018,8 +4018,10 @@ private ushort getRelocTypeHint(void* target) | |||
case TargetArchitecture.X64: | |||
return (ushort)RelocType.IMAGE_REL_BASED_REL32; | |||
|
|||
#if READYTORUN | |||
case TargetArchitecture.ARM: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How much perf does this leave on the table?
Would it be better to detect and skip the linker generated thunks in the code manager at runtime?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't have the measurements for performance but there are several things to consider:
- The thunks may appear in front of the managed code section or at the end. Either way it can only cover a limited range (+- 16MB) and we would still get a failure if this range is exceeded. Some of the tests are not far away from hitting such limit.
- Detecting the thunks is non-trivial. The exception unwinding table is indexed only by method start address and does not contain the end address or method length. It's assumed that the next method follows the previous one. To detect thunks inserted at the beginning of the managed code section we could look for the lowest address of unwinding info within the section. Detecting thunks at the end of the managed code requires inserting an extra "cantunwind" marker at the end of our managed code to get an idea where the inserted thunks start (unless the linker moves them to a separate synthetic section which happened in my earlier experiment). The linker doesn't generate any unwinding table information for the thunks.
- Inserting custom label symbols to mark the beginning+end of our managed code wouldn't work when using the multi-object-file mode. Likewise, in that mode the thunks could appear in the middle of the section when multiple object files are merged. That may still be solvable with the "cantunwind" end marker but it's noticably harder to reason about.
- CoreCLR doesn't seem to send the relocation hint to JIT either. It doesn't ask for relocatable code so the sequence is slightly shorter (2 bytes iirc) but it still doesn't use the short BL as far as I can tell.
Given all the considerations I would prefer to have a correct baseline and optimize later.
…iter for unresolved external symbols.
b68b1af
to
2928e0d
Compare
Contributes to #99079
Ref: #99079 (comment)
Prevent the native linker from generating range extension thunks inside the managed code section. The unwinding code cannot deal with them and the short relocations limit the maximum code size.