HVA structs can be unpacked without spilling to stack #96372

Open
neon-sunset opened this issue Dec 30, 2023 · 4 comments

Labels
area-CodeGen-coreclr (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI), tenet-performance (Performance related issue)

@neon-sunset
Contributor

Description

Currently, on ARM64, when structs (or struct members) that are HVAs of small integers, such as (int, int) or (short, short, short, short), are passed in a register, .NET spills them to the stack first and then loads them back before performing further operations.

This can be improved by extracting the fields directly from the argument register with bitwise operations (shifts and truncations), avoiding the spill entirely.
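The bitwise extraction proposed above can be sketched in C#, assuming a little-endian 64-bit target such as ARM64 and the `HVA32` type from the Data section. Reinterpreting the 8-byte struct as a `long` mirrors the value the caller places in `x0`: `A` occupies the low 32 bits and `B` the high 32 bits, so a truncation and a 32-bit shift recover both fields. The `Layout` class and its `AsPacked`/`Unpack` helpers are hypothetical names for illustration, not part of any .NET API.

```csharp
using System;
using System.Runtime.CompilerServices;

var value = new HVA32(1, 2);
long packed = Layout.AsPacked(value);

Console.WriteLine($"0x{packed:X16}");     // 0x0000000200000001 on little-endian targets
Console.WriteLine(Layout.Unpack(packed)); // (1, 2)

record struct HVA32(int A, int B);

static class Layout
{
    // Reinterpret the struct's 8 bytes as one 64-bit integer. On little-endian ARM64
    // this matches what the caller places in the single argument register x0.
    public static long AsPacked(HVA32 value) => Unsafe.As<HVA32, long>(ref value);

    // Recover both fields with register-only operations:
    // truncation keeps the low half (A), an arithmetic shift brings the high half (B) down.
    public static (int A, int B) Unpack(long packed) =>
        ((int)packed, (int)(packed >> 32));
}
```

This is the same shift-and-truncate pair that the desired assembly in the Analysis section performs with `asr` and a 32-bit `add`.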

Configuration

.NET 8.0.100

Regression?

No

Data

Given

static int Sum(HVA32 value) => value.A + value.B;
static long Sum(HVA64 value) => value.A + value.B;

record struct HVA32(int A, int B);
record struct HVA64(long A, long B);

The produced assembly is

Sum(HVA32)

G_M56534_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x20]!
            mov     fp, sp
            str     x0, [fp, #0x18]	// [V00 arg0]
G_M56534_IG02:  ;; offset=0x000C
            ldp     w0, w1, [fp, #0x18]	// [V00 arg0], [V00 arg0+0x04]
            add     w0, w0, w1
G_M56534_IG03:  ;; offset=0x0014
            ldp     fp, lr, [sp], #0x20
            ret     lr
; Total bytes of code: 28

Sum(HVA64)

G_M14860_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
G_M14860_IG02:  ;; offset=0x0008
            add     x0, x0, x1
G_M14860_IG03:  ;; offset=0x000C
            ldp     fp, lr, [sp], #0x10
            ret     lr
; Total bytes of code: 20

Analysis

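In the data above, the HVA64 variant has ideal codegen: the 16-byte struct is passed in two registers (x0 and x1), one per field.
HVA32 is only 8 bytes, so both fields are packed into x0, and the JIT spills x0 to the stack and reloads the two halves with ldp. Ideally, it would be nice to see the following emitted for it instead: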

G_M49852_IG01:  ;; offset=0x0000
            stp     fp, lr, [sp, #-0x10]!
            mov     fp, sp
G_M49852_IG02:  ;; offset=0x0008
            asr     x1, x0, #32
            add     w0, w1, w0
G_M49852_IG03:  ;; offset=0x0010
            ldp     fp, lr, [sp], #0x10
            ret     lr

This can be replicated manually by passing both fields packed into a single long:

static int Sum(long packed)
{
    var a = (int)packed;         // low 32 bits: field A
    var b = (int)(packed >> 32); // high 32 bits: field B

    return a + b;
}

However, this leads to other codegen issues, so it is not usable as a micro-optimization.
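A caller that wants to use the manual workaround with an existing `HVA32` value has to reinterpret the struct at each call site. The sketch below shows one hypothetical way to do that (`Packed`, `SumViaPacked` and `SumDirect` are illustrative names) and only checks that the results match the struct-based `Sum`. It makes no claim about the code the JIT generates: reinterpreting a local through a reference may itself force it to memory.

```csharp
using System;
using System.Runtime.CompilerServices;

Console.WriteLine(Packed.SumViaPacked(new HVA32(3, 4))); // 7

record struct HVA32(int A, int B);

static class Packed
{
    // The manual workaround from the issue: both fields travel in one long.
    static int Sum(long packed)
    {
        var a = (int)packed;         // low 32 bits: A
        var b = (int)(packed >> 32); // high 32 bits: B
        return a + b;
    }

    // Hypothetical call-site adapter. Reinterpreting through a reference
    // to 'value' may cause the JIT to keep it in memory; not verified here.
    public static int SumViaPacked(HVA32 value) => Sum(Unsafe.As<HVA32, long>(ref value));

    // Reference implementation for comparison: the original struct-based Sum.
    public static int SumDirect(HVA32 value) => value.A + value.B;
}
```

Keeping both variants side by side makes it easy to compare their disassembly, for example with the `DOTNET_JitDisasm` environment variable.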

Thanks!

@neon-sunset neon-sunset added the tenet-performance Performance related issue label Dec 30, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 30, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Dec 30, 2023
@ghost

ghost commented Dec 30, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

@EgorBo
Member

EgorBo commented Dec 30, 2023

I think we have discussed independent promotion many times, but I don't see an existing issue tracking it; perhaps @jakobbotsch knows.

@MichalPetryka
Contributor

Related to #89374 I think.

@jakobbotsch
Member

Yeah, this, #11992, #89374, #91517 and likely a bunch of other CQ (code quality) issues would require some new fundamental handling of structs in the backend. It's one of the stretch goals of #93105. I do not expect to work towards that with the old promotion scheme, but long term I do want us to do better here. #92026 was an experiment trying out one representation to work towards it.


4 participants