-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support Arm64 "constructed" constants in SuperPMI asm diffs #76616
Support Arm64 "constructed" constants in SuperPMI asm diffs #76616
Conversation
SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data. Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple `mov`/`movk` instructions. One case is the `allocPgoInstrumentationBySchema()` API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as: ``` mov x0, dotnet#63408 movk x0, dotnet#23602, lsl dotnet#16 movk x0, dotnet#606, lsl dotnet#32 ``` When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above. This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch Issue DetailsSuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data. Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple One case is the
When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above. This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.
|
/azp run runtime-coreclr superpmi-asmdiffs-checked-release |
Azure Pipelines successfully started running 1 pipeline(s). |
@AndyAyersMS @jakobbotsch @dotnet/jit-contrib PTAL |
Nice! |
Yes, I think so. (Thanks; I hadn't seen that issue.) Note that
|
I don't think we fully support 64-bit replay on 32-bit host, but this fix at least makes it possible for this case.
/azp run runtime-coreclr superpmi-asmdiffs-checked-release |
Azure Pipelines successfully started running 1 pipeline(s). |
addr1 += (DWORDLONG)con4_1 << 48; | ||
addr2 += (DWORDLONG)con4_2 << 48; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do I understand right that this only works if both sides end up with the same number of instructions? For example, it is conceivable that one side may omit a movk
because that section of the constant is all zeros.
Ultimately, I feel like we need to redesign some of the JIT-EE parts to make this robust. E.g. @EgorBo ended up needing to redesign the JIT-EE interface in #76112 in a way that ends up fixing #53773 for good, by splitting the addresses that JIT embeds in codegen from the notion of where the values are. This also helps support things in R2R and NAOT.
Can the same sort of fix not be applied to PGO? I already see that allocPgoInstrumentationBySchema
returns a profileMemory
pointer. Is anything stopping us from returning the same pointer to both JITs? From my cursory glance, it seems like it would work and it would make things completely deterministic. The pointer seems to only be used for codegen purposes, not accessed by the JIT.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right: it's checking if the instruction stream is exactly the same modulo the constants, you're right that one movk
could be omitted if it is zero, or perhaps (unlikely) one could have an address in the >48 bit range and one does not.
Is anything stopping us from returning the same pointer to both JITs?
The fact that two compilations are happening is unknown at this point in the code. In fact, we currently don't even remember we allocated the buffer (except to put it on a list to free).
We would need to record it as a map from ftnHnd => pointer. There's nothing preventing the JIT from calling the allocPgoInstrumentationBySchema
(or similar API) multiple times and getting multiple different results.
I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The fact that two compilations are happening is unknown at this point in the code. In fact, we currently don't even remember we allocated the buffer (except to put it on a list to free).
We would need to record it as a map from ftnHnd => pointer. There's nothing preventing the JIT from calling the
allocPgoInstrumentationBySchema
(or similar API) multiple times and getting multiple different results.I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.
That's true, it would need an SPMI update (and by extension, JIT-EE GUID update) to save profileMemory
during recording. I think then keying it by ftnHnd
would make sense, it is the relationship in the VM today because the VM needs to be able to hand the same pointer back for multiple versions of the same method.
Another benefit is that there are other things that can affect the codegen. Constant CSE could potentially affect the codegen depending on the addresses too. So it would be best if we get the same exact pointer that was returned at runtime to make sure we are seeing the exact same codegen.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
When I wrote:
I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.
I was thinking about during replay.
So it would be best if we get the same exact pointer that was returned at runtime to make sure we are seeing the exact same codegen.
By "runtime" do you mean during collection time? I don't think that's possible in general because the JIT might want to write to or in some cases read from that address, and we can't guarantee the same pointer at replay as at collection time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By "runtime" do you mean during collection time? I don't think that's possible in general because the JIT might want to write to or in some cases read from that address, and we can't guarantee the same pointer at replay as at collection time.
Yes, I mean the same pointer during replay as during collection. This is what I was saying previously. I think we should be trying to design the JIT-EE interface so that the JIT does not need to read directly from handles it is using for codegen purposes. This is necessary not just for SPMI, but usually also for R2R/NAOT where the handles are not actually memory addresses. @EgorBo is moving static field addresses to this model in #76112.
I think it is already the case with allocPgoInstrumentationBySchema
today. From a cursory glance the JIT does not read or write to the returned profileMemory
pointer. It is purely used for codegen purposes.
If the JIT wants to start reading or writing this pointer I think it should be accomplished with changes to the JIT-EE interface. That way we could also support instrumentation during prejit, for example. In those cases pointers like these are presumably going to be some opaque handle that cannot just be dereferenced.
Allow JIT1 and JIT2 to have a different sequence of mov/movk[/movk[/movk]] that map to the same address in the address map. That is, the replay constant might require a different set of instructions (e.g., if a `movk` is missing because its constant is zero).
/azp run runtime-coreclr superpmi-asmdiffs-checked-release |
Azure Pipelines successfully started running 1 pipeline(s). |
@jakobbotsch I've fixed the case where JIT1 and JIT2 could have a different sequence of mov/movk instructions. I don't want to spend time at this point on any broader rearchitecting, such as adding a handle->pointer map for |
Does that work? Isn't the differ going to complain about the different number of instructions no matter what?
Fair enough, but it may just become more flakey. I don't mind doing the work to fix this and #74718 together. |
Ugh, yes :-( Maybe I'm too tired to code today :) |
fwiw, we could engineer the code to skip the number of instructions we choose, if this is a problem. |
SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data.
Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple
mov
/movk
instructions.One case is the
allocPgoInstrumentationBySchema()
API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as:When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above.
This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.
Fixes #76347