[Arm64] Implement stack probe helper #43250

echesakov · 2020-10-10T03:41:25Z

Below is an algorithm I propose and its comparison to the current implementation:

Case 1: compiler->compLclFrameSize < getVeryLargeFrameSize(). I will discuss later what value getVeryLargeFrameSize() should return, but note that it currently equals three page sizes (and let's assume for now that pageSize = 0x1000). In other words, if the distance between the current value of sp and the final value of sp is less than some predefined value, then the JIT inlines a stack probing instruction sequence into the function prolog.

Currently, this is done by inserting a sequence of mov tempReg, #imm followed by ldr wzr, [sp, tempReg]. For example, for compLclFrameSize = 0x2000 the JIT emits

        9281FFE9          movn    x9, #0xfff
        B8696BFF          ldr     wzr, [sp, x9]
        9283FFE9          movn    x9, #0x1fff
        B8696BFF          ldr     wzr, [sp, x9]

You can immediately see that the current way is suboptimal in both performance and code size. First, the memory access is unaligned. Second, we don't need to emit 4 instructions in order to probe 2 pages.

I propose a more efficient implementation that uses the fact that ldr (immediate) can address up to 32 KB of data (in the positive direction). To do that, the JIT would emit sub tempReg, sp, 0x8000 followed by ldr xzr, [tempReg, #imm1]; ldr xzr, [tempReg, #imm2]; and so on until it reaches the stack frame boundary. For example, for the same compLclFrameSize = 0x2000 the JIT would emit

        D14023E9          sub     x9, sp, #8, LSL #12
        F978013F          ldr     xzr, [x9,#0x7000]
        F970013F          ldr     xzr, [x9,#0x6000]

that would save one instruction.
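The offsets in the proposed sequence follow a simple pattern, which the following minimal sketch reproduces. It is not JIT code: the function and constant names are hypothetical, the 0x1000 page size and 0x8000 window come from the text above, and rounding the frame size down to whole pages is an assumption made for this illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Assumed values from the discussion: 4 KB probe pages and a 0x8000-byte
// window reachable from a single scratch base register.
constexpr uint32_t kProbePageSize = 0x1000;
constexpr uint32_t kProbeWindow   = 0x8000;

// Returns the #imm operands of the `ldr xzr, [tempReg, #imm]` probes in
// emission order, where tempReg = sp - kProbeWindow.
std::vector<uint32_t> inlineProbeOffsets(uint32_t frameSize)
{
    // Frames that need more than one window are out of scope for this sketch.
    assert(frameSize < kProbeWindow + kProbePageSize);

    std::vector<uint32_t> offsets;
    const uint32_t pages = frameSize / kProbePageSize; // whole pages only (assumption)
    for (uint32_t i = 1; i <= pages; ++i)
    {
        // The i-th probe touches address sp - i * kProbePageSize.
        offsets.push_back(kProbeWindow - i * kProbePageSize);
    }
    return offsets;
}
```

For compLclFrameSize = 0x2000 this yields the offsets 0x7000 and 0x6000, matching the listing above: one sub plus one ldr per page, instead of two instructions per page.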

Case 2: compiler->compLclFrameSize >= getVeryLargeFrameSize(). This is what I was originally trying to address with this PR: replacing the inlined stack probing with a helper call. The mechanics are similar to other platforms but slightly complicated by the fact that Arm64 can have up to 6 different frame types, as defined in codegencommon.cpp

It turns out that for this case we only care about two: frameType = 3 and frameType = 5. The major difference between them is the location where the fp, lr record is stored on the stack: at the bottom (frameType = 3) or at the top (frameType = 5). At the moment, the JIT uses frameType = 5 for methods with localloc and GS cookies.

Otherwise, frameType = 3 is used.
To be able to call a helper, lr must be saved on the stack before the call. That means that, in order to call the stack probing helper, the JIT would need to force frameType = 5. However, there is an issue with this approach. At the moment, addresses of locals are computed based on the fp value, and with frameType = 5 their offsets become negative (that might cause regressions in extreme cases with a large number of locals, when ldr/str wouldn't be able to encode such offsets as an immediate).
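To make the encoding concern concrete, here is a minimal sketch of which byte offsets fit into a single AArch64 load/store immediate. The function name is hypothetical and this is not the JIT's own check; the ranges are those of the LDR/STR unsigned-offset form (12-bit immediate scaled by the access size) and the LDUR/STUR unscaled form (signed 9-bit).

```cpp
#include <cassert>
#include <cstdint>

// True if `offset` can be encoded directly in one load/store instruction
// for an access of `accessSize` bytes (1, 2, 4 or 8).
bool fitsLoadStoreImmediate(int64_t offset, unsigned accessSize)
{
    assert(accessSize == 1 || accessSize == 2 || accessSize == 4 || accessSize == 8);
    const int64_t size = static_cast<int64_t>(accessSize);

    // LDR/STR (unsigned offset): non-negative multiple of the access size,
    // with the scaled value fitting in 12 bits.
    const bool scaled = offset >= 0 && offset % size == 0 && offset / size <= 4095;

    // LDUR/STUR (unscaled): any byte offset in [-256, 255].
    const bool unscaled = offset >= -256 && offset <= 255;

    return scaled || unscaled;
}
```

Under these rules an 8-byte local at fp + 0x7FF8 is reachable with one instruction, while a local at fp - 0x2000 is not, which is why negative fp-relative offsets can cost extra instructions.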

Let's compare the JIT generated code for this case. Suppose compLclFrameSize = 0x10000, the current implementation of the JIT inlines a stack probing loop

        9281FFE9          movn    x9, #0xfff
        928001E0          movn    x0, #15
        F2BFFFC0          movk    x0, #0xfffe LSL #16
        B8696BFF          ldr     wzr, [sp, x9]
        D1400529          sub     x9, x9, #1, LSL #12
        EB09001F          cmp     x0, x9
        54FFFFA9          bls     pc-16 (-4 instructions)

The JIT with stack probe helper would emit

        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
        D2800209          movz    x9, #16
        F2A00029          movk    x9, #1 LSL #16
        CB2963E9          sub     x9, sp, x9, LSL #0
        94000000          bl      CORINFO_HELP_STACK_PROBE

Note, that x9 contains the final value of sp and fp, lr are stored on the stack before the call.
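The constant loaded into x9 before the subtraction can be read off the movz/movk pair. The sketch below splits a value into 16-bit pieces the way such a pair would load it. The names are hypothetical, and the relation "constant = compLclFrameSize + 16" is only inferred from the listing above (0x10000 plus the 16 bytes pushed by stp); it is not confirmed JIT logic.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct MovPiece
{
    unsigned shift; // 0, 16, 32 or 48
    uint16_t imm16; // 16-bit payload placed at that shift
};

// Splits `value` into its non-zero 16-bit pieces: the first piece would be
// loaded with movz and every following piece with movk.
std::vector<MovPiece> splitIntoMovzMovk(uint64_t value)
{
    std::vector<MovPiece> pieces;
    for (unsigned shift = 0; shift < 64; shift += 16)
    {
        const uint16_t imm16 = static_cast<uint16_t>(value >> shift);
        if (imm16 != 0)
        {
            pieces.push_back({shift, imm16});
        }
    }
    if (pieces.empty())
    {
        pieces.push_back({0, 0}); // movz reg, #0
    }

    // Postcondition: the pieces reassemble the original value.
    uint64_t rebuilt = 0;
    for (const MovPiece& p : pieces)
    {
        rebuilt |= static_cast<uint64_t>(p.imm16) << p.shift;
    }
    assert(rebuilt == value);
    return pieces;
}

// Assumption inferred from the listing: frame size plus the saved fp/lr pair.
uint64_t probeAmount(uint64_t lclFrameSize)
{
    const uint64_t fpLrPairSize = 16;
    return lclFrameSize + fpLrPairSize;
}
```

For compLclFrameSize = 0x10000 the amount is 0x10010, which splits into #16 at shift 0 and #1 at shift 16, as in the movz/movk pair above.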

While it seems that we only win one instruction, there is more to it than this. First, we avoid having a loop in a function prolog. As Tamar commented in #43789 (comment), such a loop can cause performance degradation, and in order to avoid that we must ensure that the loop is properly aligned. Second, for large frame sizes, which are more likely to cause StackOverflow, we are currently not able to take advantage of JanV's work in #32167 and display a stack trace during StackOverflow. Using the helper solves both issues.

Let's talk about getVeryLargeFrameSize().
Given how we optimized the inlined sequence of instructions, it might be beneficial to increase the value and defer the point at which the JIT needs to call a helper. I propose that the value of getVeryLargeFrameSize() be such that stack probing can be done by inlining one sub tempReg, sp, 0x8000 followed by up to 8 ldr xzr, [tempReg, #imm].

For example, compLclFrameSize = 0x3000

        D14023E9          sub     x9, sp, #8, LSL #12
        F978013F          ldr     xzr, [x9,#0x7000]
        F970013F          ldr     xzr, [x9,#0x6000]
        F968013F          ldr     xzr, [x9,#0x5000]

compLclFrameSize = 0x4000

        D14023E9          sub     x9, sp, #8, LSL #12
        F978013F          ldr     xzr, [x9,#0x7000]
        F970013F          ldr     xzr, [x9,#0x6000]
        F968013F          ldr     xzr, [x9,#0x5000]
        F960013F          ldr     xzr, [x9,#0x4000]

compLclFrameSize = 0x8D80

        D14023E9          sub     x9, sp, #8, LSL #12
        F978013F          ldr     xzr, [x9,#0x7000]
        F970013F          ldr     xzr, [x9,#0x6000]
        F968013F          ldr     xzr, [x9,#0x5000]
        F960013F          ldr     xzr, [x9,#0x4000]
        F958013F          ldr     xzr, [x9,#0x3000]
        F950013F          ldr     xzr, [x9,#0x2000]
        F948013F          ldr     xzr, [x9,#0x1000]
        F940013F          ldr     xzr, [x9]

compLclFrameSize = 0x8E00

        A9BF7BFD          stp     fp, lr, [sp,#-16]!
        910003FD          mov     fp, sp
        D291C209          mov     x9, #0x8e10
        CB2963E9          sub     x9, sp, x9, LSL #0
        94000000          bl      CORINFO_HELP_STACK_PROBE

The following is a chart comparing prolog sizes for different values of compLclFrameSize for the current implementation ("base clrjit.dll") and the proposed implementation ("diff clrjit.dll").

As you can see, for the smaller frame sizes the proposed implementation produces smaller code, and it keeps inlining the above-mentioned instruction sequences until it would have to compute a new value of tempReg; after that point it calls a helper.

echesakov · 2020-10-30T00:55:00Z

@BruceForstall I believe this is ready for review now. I plan to do more testing next week. I know that there are two code size regressions related to "negative fp-offsets of locals"-issue - I am planning to look into how this can be mitigated separately. It seems that computing the local addresses based on sp value, as we discussed, should be sufficient.

@TamarChristinaArm I would like to ask your opinion about inlining more instructions in a prolog in order to do stack probing. In particular, how far we should go? Should we go beyond sub followed by 8 ldr-s? Or it seems to be a reasonable boundary and after that point it's better to switch to a helper call.

cc @dotnet/jit-contrib

BruceForstall · 2020-11-03T01:44:44Z

I'm a little confused with your inline examples. E.g.,

"For example, compLclFrameSize = 0x3000"

        D14023E9          sub     x9, sp, #8, LSL #12
        F978013F          ldr     xzr, [x9,#0x7000]
        F970013F          ldr     xzr, [x9,#0x6000]
        F968013F          ldr     xzr, [x9,#0x5000]

why are we subtracting 0x8000? Shouldn't it be:

        sub     x9, sp, #3, LSL #12
        ldr     xzr, [x9,#0x2000] // one ldr per page
        ldr     xzr, [x9,#0x1000]
        ldr     xzr, [x9,#0x0] // always probe the very bottom last

?

BruceForstall · 2020-11-03T01:45:01Z

fyi @janvorli

echesakov · 2020-11-03T02:11:22Z

I'm a little confused with your inline examples. E.g.,

"For example, compLclFrameSize = 0x3000"

        D14023E9          sub     x9, sp, #8, LSL #12
        F978013F          ldr     xzr, [x9,#0x7000]
        F970013F          ldr     xzr, [x9,#0x6000]
        F968013F          ldr     xzr, [x9,#0x5000]

why are we subtracting 0x8000? Shouldn't it be:

        sub     x9, sp, #3, LSL #12
        ldr     xzr, [x9,#0x2000] // one ldr per page
        ldr     xzr, [x9,#0x1000]
        ldr     xzr, [x9,#0x0] // always probe the very bottom last

?

@BruceForstall Sure, we could subtract 0x3000. In fact, we could subtract any value that is min(0x8000, currentSpToFinalSp - currentSpToTempReg) and can be encoded in one sub tempReg, sp, #imm instruction (i.e. it must be either smaller than 0x1000 or a multiple of 0x1000).

I decided not to go into the math and simplified it by always subtracting 0x8000, since ldr xzr, [tempReg, #imm] can encode any positive offset that is a multiple of 8 in the range [0, 0x8000 - 8]
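The encoding constraints mentioned here can be written down as small predicates. This is a sketch with hypothetical names, not JIT code: it assumes the standard AArch64 add/sub immediate form (a 12-bit value, optionally shifted left by 12) and the 64-bit ldr unsigned-offset form.

```cpp
#include <cassert>
#include <cstdint>

// True if `value` fits `sub tempReg, sp, #imm`: either a plain 12-bit
// immediate, or a 12-bit immediate shifted left by 12.
bool fitsAddSubImmediate(uint64_t value)
{
    const bool plain   = value <= 0xFFF;
    const bool shifted = (value & 0xFFF) == 0 && (value >> 12) <= 0xFFF;
    return plain || shifted;
}

// True if `offset` fits `ldr xzr, [tempReg, #offset]` (64-bit access).
bool fitsLdrXUnsignedOffset(uint64_t offset)
{
    return offset % 8 == 0 && offset <= 0x8000 - 8;
}

// Given tempReg = sp - baseDistance, returns the ldr offset that touches
// the address sp - probeDistance.
uint64_t ldrOffsetForProbe(uint64_t baseDistance, uint64_t probeDistance)
{
    assert(probeDistance <= baseDistance); // ldr offsets must be non-negative
    return baseDistance - probeDistance;
}
```

With a base distance of 0x8000 the page at sp - 0x1000 is probed at offset 0x7000; with a base distance of 0x3000 the same page is probed at offset 0x2000. Both bases are encodable, whereas a value such as 0x8D80 is neither below 0x1000 nor a multiple of 0x1000.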

BruceForstall · 2020-11-03T02:18:37Z

Oh, I see; you always subtract 0x8000, but that's just for probing; when the actual SP subtract happens, it's of the actual required amount.

Don't we need to move SP when probing on Linux?

echesakov · 2020-11-03T02:27:46Z

Oh, I see; you always subtract 0x8000, but that's just for probing; when the actual SP subtract happens, it's of the actual required amount.

Don't we need to move SP when probing on Linux?

We do on linux-x64. However, even on linux-x64 we still could probe below SP but not very far. There is some limit when such access becomes treated as illegal and the app will be terminated by the kernel. As far as I remember, this check in linux memory manager was enabled for linux-x64 only, but not for linux-arm, linux-arm64.

BruceForstall

Generally looks good. Mostly, I'm requesting more comments. It makes sense to extract out genPushCalleeSavedRegisters to reduce ifdefs; that's a nice change.

BruceForstall · 2020-11-03T02:25:48Z

src/coreclr/src/vm/arm64/asmhelpers.asm

+;   x9   - points to the lowest address on the stack frame being allocated (i.e. [InitialSp - FrameSize])
+;   sp   - points to some byte on the last probed page
+; On exit:
+;   x9   - is preserved


Should you mention x30 is trashed?

x30 is never trashed. x30 is the same register as lr. I am using x30 name instead of lr since the register is not used as link register and I wanted to emphasize that.

BruceForstall · 2020-11-03T02:27:17Z

src/coreclr/src/vm/arm64/asmhelpers.asm

+        cmp     sp, x30, lsl #0
+        bhs     ProbeLoop                            ; if (sp >= x30), then we need to probe at least one more page
+
+        mov     sp, fp


Where does fp get set/saved? By PROLOG_SAVE_REG_PAIR?

BruceForstall · 2020-11-03T02:32:10Z

src/coreclr/src/vm/arm64/asmhelpers.asm

+        bhs     ProbeLoop                            ; if (sp >= x30), then we need to probe at least one more page
+
+        mov     sp, fp
+        EPILOG_RESTORE_REG_PAIR fp, lr, 16!


I assume it's ok to have a sequence:

1. Function prolog

2. Call probe helper

3. Probe helper probes, subtracting sp, then restores sp to location at call to probe helper, returns to caller

4. Function changes sp

In particular, between 3 & 4, the probed pages remain mapped (the OS never "reclaims" them) even though sp has been reverted. (Presumably, e.g., the OS could use the probed space then for interrupt handler, say).

I believe this is true. Stack pages can only be reclaimed after the thread exits.
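The sequence discussed here can be modelled in a few lines. The helper's actual loop body is not quoted in this thread, so the sketch below is only a model under stated assumptions: sp already sits on a probed page, probing proceeds one page at a time downwards, and the last page touched is the one containing the address passed in x9. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Returns the page-aligned addresses a probe helper would touch when the
// frame extends from `sp` down to `lowestFrameAddress` (the x9 argument).
// `sp` is passed by value: the caller's sp is unchanged afterwards, which
// mirrors the helper restoring sp before it returns.
std::vector<uint64_t> pagesTouchedByProbe(uint64_t sp, uint64_t lowestFrameAddress, uint64_t pageSize)
{
    assert(pageSize != 0 && (pageSize & (pageSize - 1)) == 0); // power of two
    assert(lowestFrameAddress <= sp);

    const uint64_t pageMask = ~(pageSize - 1);
    const uint64_t lastPage = lowestFrameAddress & pageMask;

    std::vector<uint64_t> touched;
    // Start one page below the page that sp is on and walk down.
    for (uint64_t page = sp & pageMask; page > lastPage;)
    {
        page -= pageSize;
        touched.push_back(page);
    }
    return touched;
}
```

For example, with sp = 0x20010 and a lowest frame address of 0x10000, the model touches 16 pages, from 0x1F000 down to 0x10000, before control returns to the caller with sp where it was.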

BruceForstall · 2020-11-03T02:33:28Z

src/coreclr/src/vm/arm64/asmhelpers.S

+#define PAGE_SIZE_LOG12 12
+#define PAGE_SIZE 4096
+
+LEAF_ENTRY JIT_StackProbe, _TEXT


I think you should add a header comment with "on entry" and "on exit" conditions spelled out.

BruceForstall · 2020-11-03T02:35:03Z

src/coreclr/src/jit/compiler.h

-        return 2 * eeGetPageSize();
+        return 2 * pageSize;
+#elif defined(TARGET_ARM64)
+        constexpr target_size_t ldrLargestPositiveImmByteOffset = 0x8000;


It would be worthwhile having a comment here (even a quite detailed comment) explaining this math, or at least a pointer to someplace else in the code that has such a comment.

BruceForstall · 2020-11-03T02:39:16Z

src/coreclr/src/jit/lclvars.cpp

@@ -5853,6 +5853,12 @@ void Compiler::lvaAssignVirtualFrameOffsetsToLocals()
    {
        codeGen->SetSaveFpLrWithAllCalleeSavedRegisters(true); // Force using new frames
    }
+
+    if (compLclFrameSize >= getVeryLargeFrameSize())


A comment here would be useful

BruceForstall · 2020-11-03T02:40:47Z

src/coreclr/src/jit/codegenxarch.cpp

@@ -8944,4 +8944,63 @@ void CodeGen::genProfilingLeaveCallback(unsigned helper)

 #endif // PROFILING_SUPPORTED

+/*-----------------------------------------------------------------------------


I'm assuming this is just extracted and logic is unchanged

BruceForstall · 2020-11-03T02:44:17Z

src/coreclr/src/jit/target.h

+  #define RBM_STACK_PROBE_HELPER_ARG         RBM_R9
+  #define REG_STACK_PROBE_HELPER_CALL_TARGET REG_IP0
+  #define RBM_STACK_PROBE_HELPER_CALL_TARGET RBM_IP0
+  #define RBM_STACK_PROBE_HELPER_TRASH       RBM_NONE


Shouldn't the trash set be x30?

BruceForstall · 2020-11-03T02:47:10Z

src/coreclr/src/jit/codegenarmarch.cpp

+
+    int totalFrameSize = genTotalFrameSize();
+
+    bool      useStackProbeHelper = false;


There should be more overall comments about the probing logic, here, preferably with examples for the various important cases.

janvorli · 2020-11-03T09:39:35Z

src/coreclr/src/vm/arm64/asmhelpers.S

@@ -1262,6 +1262,26 @@ GenerateProfileHelper ProfileTailcall, PROFILE_TAILCALL

 #endif

+#define PAGE_SIZE_LOG12 12
+#define PAGE_SIZE 4096


A nit - I would prefer naming this PROBE_PAGE_SIZE to indicate that it isn't necessarily the OS page size.

janvorli · 2020-11-03T10:30:56Z

src/coreclr/src/vm/arm64/asmhelpers.asm

+        bhs     ProbeLoop                            ; if (sp >= x30), then we need to probe at least one more page
+
+        mov     sp, fp
+        EPILOG_RESTORE_REG_PAIR fp, lr, 16!


I believe this is true. Stack pages can only be reclaimed after the thread exits.

TamarChristinaArm · 2020-11-03T17:23:29Z

@TamarChristinaArm I would like to ask your opinion about inlining more instructions in a prolog in order to do stack probing. In particular, how far we should go? Should we go beyond sub followed by 8 ldr-s? Or it seems to be a reasonable boundary and after that point it's better to switch to a helper call.

That's a reasonable amount. It's mostly a code size thing more than anything really. For GCC we inline up to 4 instructions after which we emit an inline loop. So a maximum of 8 is fine.

We do on linux-x64. However, even on linux-x64 we still could probe below SP but not very far. There is some limit when such access becomes treated as illegal and the app will be terminated by the kernel. As far as I remember, this check in linux memory manager was enabled for linux-x64 only, but not for linux-arm, linux-arm64.

Just a side note: the behavior of the kernel aside, that does violate the AAPCS (see Universal stack constraints at https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#the-stack), and some tools like valgrind use this invariant to detect invalid memory accesses.

Also note that reading from an unallocated stack area is explicitly prohibited when MTE (Memory Tagging) is enabled. The commit ARM-software/abi-aa@c09ef09 has some more information on the salient parts.

echesakov · 2020-11-03T18:33:43Z

Just a side note: the behavior of the kernel aside, that does violate the AAPCS (see Universal stack constraints at https://github.com/ARM-software/abi-aa/blob/master/aapcs64/aapcs64.rst#the-stack), and some tools like valgrind use this invariant to detect invalid memory accesses.

Hmm,

A process may only access (for reading or writing) the closed interval of the entire stack delimited by [SP, stack-base – 1].

@TamarChristinaArm Doesn't this mean that the proposed algorithm would violate the rule since sp is never adjusted before the probing finishes?
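The quoted rule can be expressed as a one-line predicate, which makes it easy to see why a probe below an unadjusted sp falls outside it. This is a sketch with hypothetical names and made-up addresses, not code from the runtime.

```cpp
#include <cassert>
#include <cstdint>

// AAPCS64 universal stack constraint as quoted above: an access is permitted
// only inside the closed interval [SP, stack-base - 1].
bool isPermittedStackAccess(uint64_t address, uint64_t sp, uint64_t stackBase)
{
    assert(sp <= stackBase); // the stack grows down from stackBase
    return address >= sp && address < stackBase;
}
```

With tempReg = sp - 0x8000, the first probe at tempReg + 0x7000 touches sp - 0x1000, which is below sp, so `isPermittedStackAccess` returns false for it as long as sp itself has not been moved.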

TamarChristinaArm · 2020-11-04T12:21:35Z

@TamarChristinaArm Doesn't this mean that the proposed algorithm would violate the rule since sp is never adjusted before the probing finishes?

Correct, it does, and on MTE-enabled systems this may result in a hardware fault (depending on the mode the hardware is set to).

There are proposals to slightly adjust the AAPCS to specifically allow probes (and only probes) below SP but those have yet to reach a conclusion.

The reverse scheme of probing after dropping SP also has issues in that if you take a signal the signal handler doesn't know whether the SP is valid or not, so it has to check.

echesakov · 2020-11-04T18:34:45Z

Correct, it does, and on MTE-enabled systems this may result in a hardware fault (depending on the mode the hardware is set to).

There are proposals to slightly adjust the AAPCS to specifically allow probes (and only probes) below SP but those have yet to reach a conclusion.

The reverse scheme of probing after dropping SP also has issues in that if you take a signal the signal handler doesn't know whether the SP is valid or not, so it has to check.

The scheme where SP changes as we probe will have its own issues with stack unwinding. In particular, as in #42885

However, always calling a helper for probing seems too expensive an alternative, especially given that we need to force a specific frame type in the JIT where the fp, lr pair is placed on top of the locals. This has already caused some regressions that I am investigating at the moment: loads from and stores to locals on the stack have their addresses computed relative to fp, meaning all the offsets become negative and some of them are not encodable with str/ldr immediates. I should be able to resolve them by allowing the local address to be computed based on the sp value.

@TamarChristinaArm You mentioned that

For GCC we inline up to 4 instructions after which we emit an inline loop. So a maximum of 8 is fine.

meaning that on GCC you chose to break the rule and probe below SP? Perhaps, we can do the same unless we are strictly prohibited (as in the case with MTE). Or we can make such option configurable?

@janvorli @BruceForstall What are your thoughts?

TamarChristinaArm · 2020-11-04T18:51:41Z

The scheme where SP changes as we probe will have its own issues with stack unwinding. In particular, as in #42885

long thread, I'll have a read :)

meaning that on GCC you chose to break the rule and probe below SP? Perhaps, we can do the same unless we are strictly prohibited (as in the case with MTE). Or we can make such option configurable?

No, those two parts were unrelated. With GCC we drop then probe. We have a slightly different ABI (one that clang will also follow) for probing (which we do for stack clash mitigation) where we try to minimize the number of probes that we need to emit, since the storing of lr counts as an implicit probe.

So for

int foo (){
  volatile int x[57000];
  x[0] = 0;
}

we generate:

foo:
        sub     sp, sp, #65536
        str     xzr, [sp, 1024]
        sub     sp, sp, #65536
        str     xzr, [sp, 1024]
        sub     sp, sp, #65536
        str     xzr, [sp, 1024]
        mov     x12, 31392
        sub     sp, sp, x12
        str     wzr, [sp]
        add     sp, sp, 2720
        add     sp, sp, 225280
        ret

with -O2 -fstack-clash-protection on GCC 10 or newer for a guard page size of 64 KB.
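The numbers in this listing can be checked with a little arithmetic. The sketch below only splits the frame into guard-sized steps plus a remainder; it does not model GCC's rules for where each probe lands (for example the offset of 1024). The names are hypothetical and a 4-byte int is assumed.

```cpp
#include <cassert>
#include <cstdint>

struct DropPlan
{
    uint64_t fullGuardSteps; // number of `sub sp, sp, #guardSize` steps
    uint64_t residual;       // bytes left for the final sub
};

// Splits a frame into guard-sized drops followed by one residual drop.
DropPlan planDropThenProbe(uint64_t frameBytes, uint64_t guardSize)
{
    assert(guardSize != 0);
    return {frameBytes / guardSize, frameBytes % guardSize};
}
```

For `volatile int x[57000]` the frame is 57000 * 4 = 228000 bytes. With a 65536-byte guard that gives three full drops and a residual of 31392, matching the three `sub sp, sp, #65536` instructions and `mov x12, 31392`. The epilog restores the same total as 2720 + 225280 = 228000.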

echesakov · 2020-11-04T18:59:42Z

@TamarChristinaArm I see, the scheme is quite different from what we do.

BruceForstall · 2020-11-06T18:42:20Z

Also note that reading from an unallocated stack area is explicitly prohibited when MTE (Memory Tagging) is enabled

I guess that's why your probes are str instead of ldr?

BruceForstall · 2020-11-06T18:46:20Z

Oh, I see; you always subtract 0x8000, but that's just for probing; when the actual SP subtract happens, it's of the actual required amount.

@echesakovMSFT could doing this cause sp to point beyond the guard pages such that some OS activity like an interrupt handler using this stack will crash if it reads/writes to the stack?

echesakov · 2020-11-07T19:35:57Z

Oh, I see; you always subtract 0x8000, but that's just for probing; when the actual SP subtract happens, it's of the actual required amount.

@echesakovMSFT could doing this cause sp to point beyond the guard pages such that some OS activity like an interrupt handler using this stack will crash if it reads/writes to the stack?

@BruceForstall No, since sp never changes during the probe: I am using a scratch register to hold the base of the location to probe and the immediate values in ldr to compute the exact address. However, as Tamar pointed out above, such a method would violate the calling convention. In fact, the current implementation also violates the convention, so I am thinking about how to redesign the algorithm so that it fits into our frame types model and doesn't cause significant regressions.

TamarChristinaArm · 2020-11-09T08:47:14Z

Also note that reading from an unallocated stack area is explicitly prohibited when MTE (Memory Tagging) is enabled

I guess that's why your probes are str instead of ldr?

@BruceForstall no, I should have been more precise here. With MTE enabled the stack is colored based on who allocated the space. An unallocated stack space is uncolored and so any access of it is invalid. What invalid here means depends on the value of SCTLR_ELx.TCF but one possible mode is a synchronous data exception being raised.

There's no real particular reason why we used str in this case. Both str and ldr work out to about the same functionally and performance wise in this case.

echesakov · 2021-02-26T23:31:53Z

Extracted refactoring changes to #48199
Will open PR with Arm64 implementation later

echesakov added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels Oct 10, 2020

echesakov self-assigned this Oct 10, 2020

runfoapp bot mentioned this pull request Oct 12, 2020

Builds legs getting abandoned due to agent notification issues #35223

Closed

echesakov force-pushed the Arm64-Implement-Jit-StackProbe-Helper branch from 4e0e02d to 70c14bf Compare October 12, 2020 20:05

echesakov mentioned this pull request Oct 20, 2020

[Arm64] Planned JIT work in .NET 6 #43629

Closed

29 tasks

echesakov force-pushed the Arm64-Implement-Jit-StackProbe-Helper branch 2 times, most recently from 4843124 to 0c46021 Compare October 30, 2020 00:43

echesakov marked this pull request as ready for review October 30, 2020 00:44

echesakov requested a review from BruceForstall November 2, 2020 22:55

BruceForstall requested changes Nov 3, 2020

janvorli reviewed Nov 3, 2020

This was referenced Nov 3, 2020

OSX deprovision jaredpar/runfo#41

Closed

OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed

echesakov added 12 commits January 21, 2021 17:48

Remove code under #ifdef/#endif that never executes in codegenarmarch…

de9cfa0

….cpp

Move call to genAllocLclFrame in CodeGen::genFnProlog in codegencommo…

40cac3c

…n.cpp to CodeGen::genPushCalleeSavedRegisters in codegenarmarch.cpp

Define JIT_StackProbe helper on all platforms in jithelpers.h readyto…

537bac2

…runhelpers.h jitinterface.h

Implement Arm64 JIT_StackProbe helper in asmhelpers.asm

f3f39b4

Implement Arm64 JIT_StackProbe helper in asmhelpers.S

6abb3ce

Display stack trace at stack overflow on Arm64 in excep.cpp

456ebb9

Define REG/RBM_STACK_PROBE_HELPER_ARG, REG/RBM_STACK_PROBE_HELPER_CAL…

95a307c

…L_TARGET and RBM_STACK_PROBE_HELPER_TRASH in target.h

Remove genAllocLclFrame and use stack probing helper on Arm64

4b5468d

Improve inlined stack probing instructions sequence in codegenarmarch…

c80961e

….cpp

Increase the size of "very large frame" in compiler.h

eb3b64d

Remove assertion and add proper logic to ensure that when tempReg is …

62851fa

…the same as initReg we clear *pInitRegZeroed in codegenarmarch.cpp

Rename PAGE_SIZE->PROBE_PAGE_SIZE in src/coreclr/vm/arm64/asmhelpers.…

2772cb2

…S src/coreclr/vm/arm64/asmhelpers.asm

echesakov force-pushed the Arm64-Implement-Jit-StackProbe-Helper branch from 91b927a to 2772cb2 Compare January 25, 2021 19:02

echesakov marked this pull request as draft January 25, 2021 19:07

echesakov added 6 commits January 28, 2021 13:58

Remove artifact from having a stack probing loop in the past on XArch…

b9822cb

… in src/coreclr/jit/codegencommon.cpp

Remove stack probing under sp on Arm in src/coreclr/jit/codegenarm.cp…

5f98ef8

…p src/coreclr/jit/codegencommon.cpp

Remove maskArgRegsLiveIn argument from genAllocLclFrame in src/corecl…

e255807

…r/jit/codegen.h src/coreclr/jit/codegenarm.cpp src/coreclr/jit/codegencommon.cpp src/coreclr/jit/codegenxarch.cpp

Remove getVeryLargeFrameSize() in src/coreclr/jit/compiler.h

16e4b75

In AOT scenarios the VM reports to the JIT the minimum page size in s…

c9c2ab3

…rc/coreclr/vm/jitinterface.cpp

Rename PAGE_SIZE->PROBE_PAGE_SIZE in src/coreclr/vm/arm/asmhelpers.S …

8aba2d5

…src/coreclr/vm/arm/asmhelpers.asm

echesakov mentioned this pull request Jan 29, 2021

arm64 skippage6.sh test fails to JIT code when page size > 4KB #42023

Closed

Add CodeGen::genEmitStackProbeHelperCall in src/coreclr/jit/codegen.h…

138faab

… src/coreclr/jit/codegenarm.cpp src/coreclr/jit/codegenarm64.cpp

echesakov mentioned this pull request Feb 3, 2021

[Arm64] Extend Compiler::lvaFrameAddress() and JIT to allow using SP as base register #47810

Open

Implement Arm64 Stack Probe in src/coreclr/jit/codegenarmarch.cpp src…

bcb4a74

…/coreclr/jit/lclvars.cpp src/coreclr/jit/target.h

echesakov force-pushed the Arm64-Implement-Jit-StackProbe-Helper branch from 48ae7d9 to bcb4a74 Compare February 5, 2021 23:25

echesakov mentioned this pull request Feb 12, 2021

Separate refactoring changes in 43250 #48199

Merged

echesakov closed this Feb 26, 2021

ghost locked as resolved and limited conversation to collaborators Mar 29, 2021

echesakov deleted the Arm64-Implement-Jit-StackProbe-Helper branch April 13, 2021 20:00
