Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[draft] Use csel for conditional moves #67286

Closed
wants to merge 1 commit into from
Closed

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Mar 29, 2022

Early prototype.

Detect an If-branch and a following block.
Merge the two blocks, and conditionally execute using csel.

Currently only the given example can be optimised.
Else cases and additional conditions within If will not be caught.

Take this C# function:

static void TransformsGtSingle(uint op1, uint op2) {
  if (op1 > 0) {
    op1 = 5;
  }
  Consume(op1, op2);
}

Code before patch:

IN0005: 000000  stp     fp, lr, [sp,#-16]!
IN0006: 000004  mov     fp, sp
IN0001: 000008  cmp     w0, #0
IN0002: 00000C  beq     G_M46575_IG04
IN0003: 000010  mov     w0, #5
IN0004: .....

Code after patch:

IN0009: 000000  stp     fp, lr, [sp,#-16]!
IN000a: 000004  mov     fp, sp
IN0001: 000008  cmp     w0, #0
IN0002: 00000C  mov     w2, #5
IN0003: 000010  csel    w0, w2, w0, ne
IN0004: .....

Lowered IR before patch:

------------ BB01 [000..004) -> BB03 (cond), preds={} succs={BB02,BB03}
N003 (???,???) [000018] ------------                 IL_OFFSET void   INLRT @ 0x000[E-] REG NA
N005 (  1,  1) [000000] ------------         t0 =    LCL_VAR   int    V00 arg0         u:1 x0 REG x0 $80
N007 (  1,  2) [000001] -c----------         t1 =    CNS_INT   int    0 REG NA $40
                                                  /--*  t0     int
                                                  +--*  t1     int
N009 (  3,  4) [000002] N------N----              *  EQ        void   REG NA $100
N011 (  5,  6) [000003] ------------              *  JTRUE     void   REG NA $VN.Void

------------ BB02 [004..007), preds={BB01} succs={BB03}
N015 (???,???) [000019] ------------                 IL_OFFSET void   INLRT @ 0x004[E-] REG NA
N017 (  1,  2) [000008] ------------         t8 =    CNS_INT   int    5 REG x0 $43
                                                  /--*  t8     int
N019 (  1,  3) [000010] DA----------              *  STORE_LCL_VAR int    V00 arg0         d:3 x0 REG x0

------------ BB03 [007..00F) (return), preds={BB01,BB02} succs={}
N023 (???,???) [000020] ------------                 IL_OFFSET void   INLRT @ 0x007[E-] REG NA
....

Lowered IR after patch:

------------ BB01 [000..00F) (return), preds={} succs={}
N003 (???,???) [000018] ------------                 IL_OFFSET void   INLRT @ 0x000[E-] REG NA
N005 (  1,  1) [000000] ------------         t0 =    LCL_VAR   int    V00 arg0         u:1 x0 REG x0 $80
N007 (  1,  2) [000001] -c----------         t1 =    CNS_INT   int    0 REG NA $40
                                                  /--*  t0     int
                                                  +--*  t1     int
N009 (  3,  4) [000002] N------N----              *  EQ        void   REG NA $100
N011 (???,???) [000019] ------------                 IL_OFFSET void   INLRT @ 0x004[E-] REG NA
N013 (  1,  2) [000008] ------------         t8 =    CNS_INT   int    5 REG x2 $43
N015 (???,???) [000022] ------------        t22 =    LCL_VAR   int    V00 arg0          x0 (last use) REG x0
                                                  /--*  t8     int
                                                  +--*  t22    int
N017 (  1,  3) [000010] DA----------              *  CSTORE_LCL_VAR int    V00 arg0         d:3 x0 REG x0
N019 (???,???) [000020] ------------                 IL_OFFSET void   INLRT @ 0x007[E-] REG NA
....

Early prototype.

Detect an If-branch and a following block.
Merge the two blocks, and conditionally execute using csel.

Currently only the given example can be optimised.
Else cases and additional condtions within If will not be caught.

Take this C# function:
static void TransformsGtSingle(uint op1, uint op2) {
  if (op1 > 0) {
    op1 = 5;
  }
  Consume(op1, op2);
}

Code before patch:
IN0005: 000000  stp     fp, lr, [sp,#-16]!
IN0006: 000004  mov     fp, sp
IN0001: 000008  cmp     w0, #0
IN0002: 00000C  beq     G_M46575_IG04
IN0003: 000010  mov     w0, #5
IN0004: .....

Code after patch:
IN0009: 000000  stp     fp, lr, [sp,#-16]!
IN000a: 000004  mov     fp, sp
IN0001: 000008  cmp     w0, #0
IN0002: 00000C  mov     w2, #5
IN0003: 000010  csel    w0, w2, w0, ne
IN0004: .....

Lowered IR before patch:
------------ BB01 [000..004) -> BB03 (cond), preds={} succs={BB02,BB03}
N003 (???,???) [000018] ------------                 IL_OFFSET void   INLRT @ 0x000[E-] REG NA
N005 (  1,  1) [000000] ------------         t0 =    LCL_VAR   int    V00 arg0         u:1 x0 REG x0 $80
N007 (  1,  2) [000001] -c----------         t1 =    CNS_INT   int    0 REG NA $40
                                                  /--*  t0     int
                                                  +--*  t1     int
N009 (  3,  4) [000002] N------N----              *  EQ        void   REG NA $100
N011 (  5,  6) [000003] ------------              *  JTRUE     void   REG NA $VN.Void

------------ BB02 [004..007), preds={BB01} succs={BB03}
N015 (???,???) [000019] ------------                 IL_OFFSET void   INLRT @ 0x004[E-] REG NA
N017 (  1,  2) [000008] ------------         t8 =    CNS_INT   int    5 REG x0 $43
                                                  /--*  t8     int
N019 (  1,  3) [000010] DA----------              *  STORE_LCL_VAR int    V00 arg0         d:3 x0 REG x0

------------ BB03 [007..00F) (return), preds={BB01,BB02} succs={}
N023 (???,???) [000020] ------------                 IL_OFFSET void   INLRT @ 0x007[E-] REG NA
....

Lowered IR after patch:
------------ BB01 [000..00F) (return), preds={} succs={}
N003 (???,???) [000018] ------------                 IL_OFFSET void   INLRT @ 0x000[E-] REG NA
N005 (  1,  1) [000000] ------------         t0 =    LCL_VAR   int    V00 arg0         u:1 x0 REG x0 $80
N007 (  1,  2) [000001] -c----------         t1 =    CNS_INT   int    0 REG NA $40
                                                  /--*  t0     int
                                                  +--*  t1     int
N009 (  3,  4) [000002] N------N----              *  EQ        void   REG NA $100
N011 (???,???) [000019] ------------                 IL_OFFSET void   INLRT @ 0x004[E-] REG NA
N013 (  1,  2) [000008] ------------         t8 =    CNS_INT   int    5 REG x2 $43
N015 (???,???) [000022] ------------        t22 =    LCL_VAR   int    V00 arg0          x0 (last use) REG x0
                                                  /--*  t8     int
                                                  +--*  t22    int
N017 (  1,  3) [000010] DA----------              *  CSTORE_LCL_VAR int    V00 arg0         d:3 x0 REG x0
N019 (???,???) [000020] ------------                 IL_OFFSET void   INLRT @ 0x007[E-] REG NA
....

RUNTIME_BUILD: -s clr.jit

Change-Id: Ie9974f031bd34b8a1bfe77247f8364f026593ce8
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Mar 29, 2022
@dotnet-issue-labeler dotnet-issue-labeler bot added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed community-contribution Indicates that the PR has been added by a community member labels Mar 29, 2022
@ghost
Copy link

ghost commented Mar 29, 2022

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Early prototype.

Detect an If-branch and a following block.
Merge the two blocks, and conditionally execute using csel.

Currently only the given example can be optimised.
Else cases and additional conditions within If will not be caught.

Take this C# function:
static void TransformsGtSingle(uint op1, uint op2) {
if (op1 > 0) {
op1 = 5;
}
Consume(op1, op2);
}

Code before patch:
IN0005: 000000 stp fp, lr, [sp,#-16]!
IN0006: 000004 mov fp, sp
IN0001: 000008 cmp w0, #0
IN0002: 00000C beq G_M46575_IG04
IN0003: 000010 mov w0, #5
IN0004: .....

Code after patch:
IN0009: 000000 stp fp, lr, [sp,#-16]!
IN000a: 000004 mov fp, sp
IN0001: 000008 cmp w0, #0
IN0002: 00000C mov w2, #5
IN0003: 000010 csel w0, w2, w0, ne
IN0004: .....

Lowered IR before patch:
------------ BB01 [000..004) -> BB03 (cond), preds={} succs={BB02,BB03}
N003 (???,???) [000018] ------------ IL_OFFSET void INLRT @ 0x000[E-] REG NA
N005 ( 1, 1) [000000] ------------ t0 = LCL_VAR int V00 arg0 u:1 x0 REG x0 $80
N007 ( 1, 2) [000001] -c---------- t1 = CNS_INT int 0 REG NA $40
/--* t0 int
+--* t1 int
N009 ( 3, 4) [000002] N------N---- * EQ void REG NA $100
N011 ( 5, 6) [000003] ------------ * JTRUE void REG NA $VN.Void

------------ BB02 [004..007), preds={BB01} succs={BB03}
N015 (???,???) [000019] ------------ IL_OFFSET void INLRT @ 0x004[E-] REG NA
N017 ( 1, 2) [000008] ------------ t8 = CNS_INT int 5 REG x0 $43
/--* t8 int
N019 ( 1, 3) [000010] DA---------- * STORE_LCL_VAR int V00 arg0 d:3 x0 REG x0

------------ BB03 [007..00F) (return), preds={BB01,BB02} succs={}
N023 (???,???) [000020] ------------ IL_OFFSET void INLRT @ 0x007[E-] REG NA
....

Lowered IR after patch:
------------ BB01 [000..00F) (return), preds={} succs={}
N003 (???,???) [000018] ------------ IL_OFFSET void INLRT @ 0x000[E-] REG NA
N005 ( 1, 1) [000000] ------------ t0 = LCL_VAR int V00 arg0 u:1 x0 REG x0 $80
N007 ( 1, 2) [000001] -c---------- t1 = CNS_INT int 0 REG NA $40
/--* t0 int
+--* t1 int
N009 ( 3, 4) [000002] N------N---- * EQ void REG NA $100
N011 (???,???) [000019] ------------ IL_OFFSET void INLRT @ 0x004[E-] REG NA
N013 ( 1, 2) [000008] ------------ t8 = CNS_INT int 5 REG x2 $43
N015 (???,???) [000022] ------------ t22 = LCL_VAR int V00 arg0 x0 (last use) REG x0
/--* t8 int
+--* t22 int
N017 ( 1, 3) [000010] DA---------- * CSTORE_LCL_VAR int V00 arg0 d:3 x0 REG x0
N019 (???,???) [000020] ------------ IL_OFFSET void INLRT @ 0x007[E-] REG NA
....

Author: a74nh
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@dnfadmin
Copy link

dnfadmin commented Mar 29, 2022

CLA assistant check
All CLA requirements met.

@a74nh a74nh marked this pull request as draft March 29, 2022 09:34
@a74nh
Copy link
Contributor Author

a74nh commented Mar 29, 2022

@kunalspathak

Comment on lines +3304 to +3305
// This inherits from GenTreeOp because CSTORE_LCL_VAR has 2 ops
struct GenTreeLclVarCommon : public GenTreeOp
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a viable approach. It increases the size of small nodes by one pointer (80 -> 88 bytes on 64 bit hosts).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not a viable approach. It increases the size of small nodes by one pointer (80 -> 88 bytes on 64 bit hosts).

If that's the case then that also gets rid of my idea of just using STORE_LCL_VAR nodes for conditional stores (STORE_LCL_VAR would then just check in the emit if a conditional flag is set).

Ok then, I think I can just make a new class:
struct GenTreeCondLclVar : public GenTreeLclVarCommon

Copy link
Contributor

@SingleAccretion SingleAccretion Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the reason for attaching this to local stores at all?

I would expect you can just define a new GT_SELECT-like oper, that uses (implicitly, like GT_JCC/GT_SETCC) flags to produce a value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My original instinct was to avoid adding new nodes into the graph, as I was assuming that would result in additional instructions being generated (a csel for the new node, then a mov for the store). But, I'm now realising the register allocator would allocate correctly and the lcl var store would end up as no instruction.

Also, I think my code isn't going to work in cases where the STORE_LCL_VAR needed to store to memory (which probably explains why the c# libraries fail to build)

Copy link
Contributor

@SingleAccretion SingleAccretion Mar 29, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But, I'm now realising the register allocator would allocate correctly and the lcl var store would end up as no instruction.

Right. "Combining" stores to locals with instructions, if taken to its logical conclusion, would mean adding local stores for all opers, STORE_LCL_VAR_[ADD|SUB|MUL...], so that's not something we should do.

(Well -- as a little history detour -- we used to have legacy codegen that did that, but it was mainly motivated by the RMW semantics of most x86 instructions)

@a74nh
Copy link
Contributor Author

a74nh commented Mar 29, 2022

Not sure if this should be marked as WIP or draft. Right now I'm just looking for guidance that I'm heading in the right direction (thanks @SingleAccretion for the comments!).

I'll be away for a week from tomorrow, there will be node code updates until after then.

@teo-tsirpanis
Copy link
Contributor

Is it OK to close it @a74nh? It is superseded by #67894.

@a74nh
Copy link
Contributor Author

a74nh commented Apr 12, 2022

Done :)

@a74nh a74nh closed this Apr 12, 2022
@ghost ghost locked as resolved and limited conversation to collaborators Jun 26, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants