Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

BruceForstall · 2022-10-04T15:58:12Z

SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data.

Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple mov/movk instructions.

One case is the allocPgoInstrumentationBySchema() API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as:

mov     x0, #63408
movk    x0, #23602, lsl #16
movk    x0, #606, lsl #32

When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above.

This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.

Fixes #76347

SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data. Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple `mov`/`movk` instructions. One case is the `allocPgoInstrumentationBySchema()` API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as: ``` mov x0, dotnet#63408 movk x0, dotnet#23602, lsl dotnet#16 movk x0, dotnet#606, lsl dotnet#32 ``` When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above. This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.

ghost · 2022-10-04T15:58:41Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

SuperPMI asm diffs tries to ignore constants that can change between multiple replays, such as addresses that the replay engine must generate and not simply hand back from the collected data.

Often, addresses have associated relocations generated during replay. SuperPMI can use these relocations to adjust the constants to allow two replays to match. However, there are cases on Arm64 where an address both doesn't report a relocation and is "constructed" using multiple mov/movk instructions.

One case is the allocPgoInstrumentationBySchema() API which returns a pointer to a PGO data buffer. An address within this buffer is constructed via a sequence such as:

mov     x0, #63408
movk    x0, #23602, lsl #16
movk    x0, #606, lsl #32

When SuperPMI replays this API, it constructs a new buffer and returns that pointer, which is used to construct various actual addresses that are generated as "constructed" constants, shown above.

This change "de-constructs" the constants and looks them up in the replay address map. If base and diff match the mapped constants, there is no asm diff.

Author:	BruceForstall
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

BruceForstall · 2022-10-04T15:58:47Z

/azp run runtime-coreclr superpmi-asmdiffs-checked-release

azure-pipelines · 2022-10-04T15:58:59Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2022-10-04T15:59:25Z

@AndyAyersMS @jakobbotsch @dotnet/jit-contrib PTAL

jakobbotsch · 2022-10-04T16:03:43Z

Nice!
Is this orthogonal with #74718 (comment)?

BruceForstall · 2022-10-04T16:23:04Z

Is this orthogonal with #74718 (comment)?

Yes, I think so. (Thanks; I hadn't seen that issue.)

Note that MethodContext::repAllocPgoInstrumentationBySchema() has the comment:

   // Todo, perhaps: record the buffer as a compile result instead, and defer copying until
   // jit completion so we can snapshot the offsets the jit writes.

I don't think we fully support 64-bit replay on 32-bit host, but this fix at least makes it possible for this case.

BruceForstall · 2022-10-04T16:43:06Z

/azp run runtime-coreclr superpmi-asmdiffs-checked-release

azure-pipelines · 2022-10-04T16:43:15Z

Azure Pipelines successfully started running 1 pipeline(s).

jakobbotsch · 2022-10-04T18:45:18Z

src/coreclr/tools/superpmi/superpmi/neardiffer.cpp

+                        addr1 += (DWORDLONG)con4_1 << 48;
+                        addr2 += (DWORDLONG)con4_2 << 48;
+                    }
+                }


Do I understand right that this only works if both sides end up with the same number of instructions? For example, it is conceivable that one side may omit a movk because that section of the constant is all zeros.

Ultimately, I feel like we need to redesign some of the JIT-EE parts to make this robust. E.g. @EgorBo ended up needing to redesign the JIT-EE interface in #76112 in a way that ends up fixing #53773 for good, by splitting the addresses that JIT embeds in codegen from the notion of where the values are. This also helps support things in R2R and NAOT.

Can the same sort of fix not be applied to PGO? I already see that allocPgoInstrumentationBySchema returns a profileMemory pointer. Is anything stopping us from returning the same pointer to both JITs? From my cursory glance, it seems like it would work and it would make things completely deterministic. The pointer seems to only be used for codegen purposes, not accessed by the JIT.

You're right: it's checking if the instruction stream is exactly the same modulo the constants, you're right that one movk could be omitted if it is zero, or perhaps (unlikely) one could have an address in the >48 bit range and one does not.

Is anything stopping us from returning the same pointer to both JITs?

The fact that two compilations are happening is unknown at this point in the code. In fact, we currently don't even remember we allocated the buffer (except to put it on a list to free).

We would need to record it as a map from ftnHnd => pointer. There's nothing preventing the JIT from calling the allocPgoInstrumentationBySchema (or similar API) multiple times and getting multiple different results.

I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.

The fact that two compilations are happening is unknown at this point in the code. In fact, we currently don't even remember we allocated the buffer (except to put it on a list to free).

We would need to record it as a map from ftnHnd => pointer. There's nothing preventing the JIT from calling the allocPgoInstrumentationBySchema (or similar API) multiple times and getting multiple different results.

I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.

That's true, it would need an SPMI update (and by extension, JIT-EE GUID update) to save profileMemory during recording. I think then keying it by ftnHnd would make sense, it is the relationship in the VM today because the VM needs to be able to hand the same pointer back for multiple versions of the same method.

Another benefit is that there are other things that can affect the codegen. Constant CSE could potentially affect the codegen depending on the addresses too. So it would be best if we get the same exact pointer that was returned at runtime to make sure we are seeing the exact same codegen.

When I wrote:

I guess if the MethodContext remembers if it ever returned a pointer for a ftnHnd, then it can look up in the table and return the same pointer.

I was thinking about during replay.

So it would be best if we get the same exact pointer that was returned at runtime to make sure we are seeing the exact same codegen.

By "runtime" do you mean during collection time? I don't think that's possible in general because the JIT might want to write to or in some cases read from that address, and we can't guarantee the same pointer at replay as at collection time.

By "runtime" do you mean during collection time? I don't think that's possible in general because the JIT might want to write to or in some cases read from that address, and we can't guarantee the same pointer at replay as at collection time.

Yes, I mean the same pointer during replay as during collection. This is what I was saying previously. I think we should be trying to design the JIT-EE interface so that the JIT does not need to read directly from handles it is using for codegen purposes. This is necessary not just for SPMI, but usually also for R2R/NAOT where the handles are not actually memory addresses. @EgorBo is moving static field addresses to this model in #76112.

I think it is already the case with allocPgoInstrumentationBySchema today. From a cursory glance the JIT does not read or write to the returned profileMemory pointer. It is purely used for codegen purposes.

If the JIT wants to start reading or writing this pointer I think it should be accomplished with changes to the JIT-EE interface. That way we could also support instrumentation during prejit, for example. In those cases pointers like these are presumably going to be some opaque handle that cannot just be dereferenced.

Allow JIT1 and JIT2 to have a different sequence of mov/movk[/movk[/movk]] that map to the same address in the address map. That is, the replay constant might require a different set of instructions (e.g., if a `movk` is missing because its constant is zero).

BruceForstall · 2022-10-04T22:34:49Z

/azp run runtime-coreclr superpmi-asmdiffs-checked-release

azure-pipelines · 2022-10-04T22:35:03Z

Azure Pipelines successfully started running 1 pipeline(s).

BruceForstall · 2022-10-04T22:36:25Z

@jakobbotsch I've fixed the case where JIT1 and JIT2 could have a different sequence of mov/movk instructions.

I don't want to spend time at this point on any broader rearchitecting, such as adding a handle->pointer map for repAllocPgoInstrumentationBySchema; I want to unblock the Checked/Release asmdiffs pipeline and move on.

jakobbotsch · 2022-10-04T22:59:06Z

@jakobbotsch I've fixed the case where JIT1 and JIT2 could have a different sequence of mov/movk instructions.

Does that work? Isn't the differ going to complain about the different number of instructions no matter what?

I don't want to spend time at this point on any broader rearchitecting, such as adding a handle->pointer map for repAllocPgoInstrumentationBySchema; I want to unblock the Checked/Release asmdiffs pipeline and move on.

Fair enough, but it may just become more flakey. I don't mind doing the work to fix this and #74718 together.
We can take the fix in the meantime, but I would rather substitute it for a more complete fix at a later date.

BruceForstall · 2022-10-04T23:06:58Z

Does that work? Isn't the differ going to complain about the different number of instructions no matter what?

Ugh, yes :-( Maybe I'm too tired to code today :)

BruceForstall · 2022-10-04T23:16:14Z

Does that work? Isn't the differ going to complain about the different number of instructions no matter what?

fwiw, we could engineer the code to skip the number of instructions we choose, if this is a problem.

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 4, 2022

BruceForstall requested review from AndyAyersMS and jakobbotsch October 4, 2022 15:58

ghost assigned BruceForstall Oct 4, 2022

Fix 32-bit build

9104a84

I don't think we fully support 64-bit replay on 32-bit host, but this fix at least makes it possible for this case.

BruceForstall mentioned this pull request Oct 4, 2022

[JIT] Cleanup to lsra inspired by #73424 #76481

Merged

jakobbotsch reviewed Oct 4, 2022

View reviewed changes

jakobbotsch approved these changes Oct 4, 2022

View reviewed changes

BruceForstall merged commit b303ffe into dotnet:main Oct 5, 2022

BruceForstall deleted the SupportArm64InlineConstants branch October 5, 2022 14:49

ghost locked as resolved and limited conversation to collaborators Nov 4, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

BruceForstall commented Oct 4, 2022 •

edited

Loading

ghost commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

azure-pipelines bot commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

jakobbotsch commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

azure-pipelines bot commented Oct 4, 2022

jakobbotsch Oct 4, 2022 •

edited

Loading

BruceForstall Oct 4, 2022

jakobbotsch Oct 4, 2022

BruceForstall Oct 4, 2022

jakobbotsch Oct 4, 2022 •

edited

Loading

BruceForstall commented Oct 4, 2022

azure-pipelines bot commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

jakobbotsch commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

Support Arm64 "constructed" constants in SuperPMI asm diffs #76616

Conversation

BruceForstall commented Oct 4, 2022 • edited Loading

ghost commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

azure-pipelines bot commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

jakobbotsch commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

azure-pipelines bot commented Oct 4, 2022

jakobbotsch Oct 4, 2022 • edited Loading

Choose a reason for hiding this comment

BruceForstall Oct 4, 2022

Choose a reason for hiding this comment

jakobbotsch Oct 4, 2022

Choose a reason for hiding this comment

BruceForstall Oct 4, 2022

Choose a reason for hiding this comment

jakobbotsch Oct 4, 2022 • edited Loading

Choose a reason for hiding this comment

BruceForstall commented Oct 4, 2022

azure-pipelines bot commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

jakobbotsch commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

BruceForstall commented Oct 4, 2022

BruceForstall commented Oct 4, 2022 •

edited

Loading

jakobbotsch Oct 4, 2022 •

edited

Loading

jakobbotsch Oct 4, 2022 •

edited

Loading