Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

(byte[], (int, int)) struct spilled onto the stack when it doesn't have to #91517

Open
neon-sunset opened this issue Sep 3, 2023 · 3 comments
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Milestone

Comments

@neon-sunset
Copy link
Contributor

neon-sunset commented Sep 3, 2023

Description

Given a struct with field layout (byte[], (int, int)), it appears that such struct is spilled onto the stack in many situations where it could have been kept in registers entirely on platforms with ABI supporting multi-reg returns (e.g. SysV or osx-arm64).

Configuration

.NET SDK:
 Version:   8.0.100-rc.2.23451.1
 Commit:    dabc29073d

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.0
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.100-rc.2.23451.1/

Regression?

No.

Steps to reproduce

  1. Clone https://github.com/neon-sunset/U8StrRepro with submodules (git clone --recurse-submodules https://github.com/neon-sunset/U8StrRepro)
  2. cd U8StrRepro/Repro1
  3. DOTNET_JitDisasm='StripSpace' dotnet run -c release

Analysis

Given methods accepting a struct of layout (byte[], (int, int)):

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
private static U8String StripSpace(U8String str)
{
    return str.Remove(' ');
}
...
partial readonly struct U8String
{
    public U8String Remove(char value) => U8Manipulation.Remove(this, value);
}

.NET currently emits:

; Assembly listing for method Program:StripSpace(U8Primitives.U8String):U8Primitives.U8String (Tier1)
G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x40]!
            mov     fp, sp
            str     xzr, [fp, #0x20]    // 
            str     xzr, [fp, #0x10]    // 
            stp     x0, x1, [fp, #0x30] // , 
 
G_M000_IG02:                ;; offset=0x0014
            ldp     x0, x1, [fp, #0x30] // , 
            mov     w2, #32
            movz    x3, #0xE970
            movk    x3, #0x4C4 LSL #16
            movk    x3, #1 LSL #32
            ldr     x3, [x3]
            blr     x3
            stp     x0, x1, [fp, #0x10] // , 
 
G_M000_IG03:                ;; offset=0x0034
            ldp     x0, x1, [fp, #0x10]
            stp     x0, x1, [fp, #0x20]
 
G_M000_IG04:                ;; offset=0x003C
            ldp     x0, x1, [fp, #0x20] // , 
 
G_M000_IG05:                ;; offset=0x0040
            ldp     fp, lr, [sp], #0x40
            ret     lr
 
; Total bytes of code 72

As we can see, there are (unnecessary) stores and loads to/from stack of the struct. If my understanding is correct, they can be omitted:

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x40]!
            mov     fp, sp

G_M000_IG02:                ;; offset=0x0014
            mov     w2, #32
            movz    x3, #0xE970
            movk    x3, #0x4C4 LSL #16
            movk    x3, #1 LSL #32
            ldr     x3, [x3]
            blr     x3 ; sets x0, x1 regs which we do not need to store from/load to
 
G_M000_IG03:                ;; offset=0x0040
            ldp     fp, lr, [sp], #0x40
            ret     lr
@neon-sunset neon-sunset added the tenet-performance Performance related issue label Sep 3, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 3, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 3, 2023
@ghost
Copy link

ghost commented Sep 3, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Given a struct with field layout (byte[], (int, int)), it appears that such struct is spilled onto the stack in many situations where it could have been kept in registers entirely on platforms with ABI supporting multi-reg returns (e.g. SysV or osx-arm64).

Configuration

.NET SDK:
 Version:   8.0.100-rc.2.23451.1
 Commit:    dabc29073d

Runtime Environment:
 OS Name:     Mac OS X
 OS Version:  14.0
 OS Platform: Darwin
 RID:         osx-arm64
 Base Path:   /usr/local/share/dotnet/sdk/8.0.100-rc.2.23451.1/

Regression?

No.

Steps to reproduce

  1. Clone https://github.com/neon-sunset/U8StrRepro with submodules (git clone --recurse-submodules https://github.com/neon-sunset/U8StrRepro)
  2. cd U8StrRepro/Repro1
  3. DOTNET_JitDisasm='SplitSpace' dotnet run -c release

Analysis

Given methods accepting a struct of layout (byte[], (int, int)):

[MethodImpl(MethodImplOptions.AggressiveOptimization)]
private static U8String StripSpace(U8String str)
{
    return str.Remove(' ');
}
...
partial readonly struct U8String
{
    public U8String Remove(char value) => U8Manipulation.Remove(this, value);
}

.NET currently emits:

; Assembly listing for method Program:StripSpace(U8Primitives.U8String):U8Primitives.U8String (Tier1)
G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x40]!
            mov     fp, sp
            str     xzr, [fp, #0x20]    // 
            str     xzr, [fp, #0x10]    // 
            stp     x0, x1, [fp, #0x30] // , 
 
G_M000_IG02:                ;; offset=0x0014
            ldp     x0, x1, [fp, #0x30] // , 
            mov     w2, #32
            movz    x3, #0xE970
            movk    x3, #0x4C4 LSL #16
            movk    x3, #1 LSL #32
            ldr     x3, [x3]
            blr     x3
            stp     x0, x1, [fp, #0x10] // , 
 
G_M000_IG03:                ;; offset=0x0034
            ldp     x0, x1, [fp, #0x10]
            stp     x0, x1, [fp, #0x20]
 
G_M000_IG04:                ;; offset=0x003C
            ldp     x0, x1, [fp, #0x20] // , 
 
G_M000_IG05:                ;; offset=0x0040
            ldp     fp, lr, [sp], #0x40
            ret     lr
 
; Total bytes of code 72

As we can see, there are (unnecessary) stores and loads to/from stack of the struct. If my understanding is correct, they can be omitted:

G_M000_IG01:                ;; offset=0x0000
            stp     fp, lr, [sp, #-0x40]!
            mov     fp, sp

G_M000_IG02:                ;; offset=0x0014
            mov     w2, #32
            movz    x3, #0xE970
            movk    x3, #0x4C4 LSL #16
            movk    x3, #1 LSL #32
            ldr     x3, [x3]
            blr     x3 ; sets x0, x1 regs which we do not need to store from/load to
 
G_M000_IG03:                ;; offset=0x0040
            ldp     fp, lr, [sp], #0x40
            ret     lr
Author: neon-sunset
Assignees: -
Labels:

tenet-performance, area-CodeGen-coreclr

Milestone: -

@neon-sunset neon-sunset changed the title (byte[], (int, int)) struct spilled onto the stack when it doesn't have to (byte[], (int, int)) struct spilled onto the stack when it doesn't have to Sep 3, 2023
@JulieLeeMSFT
Copy link
Member

cc @jakobbotsch.

@JulieLeeMSFT JulieLeeMSFT added this to the Future milestone Sep 6, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Sep 6, 2023
@jakobbotsch
Copy link
Member

jakobbotsch commented Sep 7, 2023

This is sort of a long-standing issue -- the JIT does not have a first class representation that allows it to keep multiple struct fields in a single register, which is sort of what would be required here. Currently we are only able to keep multireg args in registers if the fields map cleanly into the registers (so e.g. Span<T> can be kept in registers on arm64). In this case we don't even promote the struct, which is also a necessary prerequisite to avoid the stack spills.

Physical promotion might be able to handle this case with some enhancements:

  1. ABI constraints should induce accesses -- e.g. in this case we would induce a TYP_REF @ 0 access, and a TYP_LONG @ 8 access
  2. Physical promotion needs to produce IR that is compatible with multireg arg passing (e.g. FIELD_LIST), or we need to optimize the existing pattern produced in the backend (hard without struct liveness)
  3. Lowering/LSRA needs to learn how to optimize the patterns produced at the beginning of functions to directly access parameter registers without spills (essentially what is described here and here)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI tenet-performance Performance related issue
Projects
None yet
Development

No branches or pull requests

3 participants