-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[draft] Use csel for conditional moves #67286
Conversation
Early prototype. Detect an If-branch and a following block. Merge the two blocks, and conditionally execute using csel. Currently only the given example can be optimised. Else cases and additional condtions within If will not be caught. Take this C# function: static void TransformsGtSingle(uint op1, uint op2) { if (op1 > 0) { op1 = 5; } Consume(op1, op2); } Code before patch: IN0005: 000000 stp fp, lr, [sp,#-16]! IN0006: 000004 mov fp, sp IN0001: 000008 cmp w0, #0 IN0002: 00000C beq G_M46575_IG04 IN0003: 000010 mov w0, #5 IN0004: ..... Code after patch: IN0009: 000000 stp fp, lr, [sp,#-16]! IN000a: 000004 mov fp, sp IN0001: 000008 cmp w0, #0 IN0002: 00000C mov w2, #5 IN0003: 000010 csel w0, w2, w0, ne IN0004: ..... Lowered IR before patch: ------------ BB01 [000..004) -> BB03 (cond), preds={} succs={BB02,BB03} N003 (???,???) [000018] ------------ IL_OFFSET void INLRT @ 0x000[E-] REG NA N005 ( 1, 1) [000000] ------------ t0 = LCL_VAR int V00 arg0 u:1 x0 REG x0 $80 N007 ( 1, 2) [000001] -c---------- t1 = CNS_INT int 0 REG NA $40 /--* t0 int +--* t1 int N009 ( 3, 4) [000002] N------N---- * EQ void REG NA $100 N011 ( 5, 6) [000003] ------------ * JTRUE void REG NA $VN.Void ------------ BB02 [004..007), preds={BB01} succs={BB03} N015 (???,???) [000019] ------------ IL_OFFSET void INLRT @ 0x004[E-] REG NA N017 ( 1, 2) [000008] ------------ t8 = CNS_INT int 5 REG x0 $43 /--* t8 int N019 ( 1, 3) [000010] DA---------- * STORE_LCL_VAR int V00 arg0 d:3 x0 REG x0 ------------ BB03 [007..00F) (return), preds={BB01,BB02} succs={} N023 (???,???) [000020] ------------ IL_OFFSET void INLRT @ 0x007[E-] REG NA .... Lowered IR after patch: ------------ BB01 [000..00F) (return), preds={} succs={} N003 (???,???) [000018] ------------ IL_OFFSET void INLRT @ 0x000[E-] REG NA N005 ( 1, 1) [000000] ------------ t0 = LCL_VAR int V00 arg0 u:1 x0 REG x0 $80 N007 ( 1, 2) [000001] -c---------- t1 = CNS_INT int 0 REG NA $40 /--* t0 int +--* t1 int N009 ( 3, 4) [000002] N------N---- * EQ void REG NA $100 N011 (???,???) [000019] ------------ IL_OFFSET void INLRT @ 0x004[E-] REG NA N013 ( 1, 2) [000008] ------------ t8 = CNS_INT int 5 REG x2 $43 N015 (???,???) [000022] ------------ t22 = LCL_VAR int V00 arg0 x0 (last use) REG x0 /--* t8 int +--* t22 int N017 ( 1, 3) [000010] DA---------- * CSTORE_LCL_VAR int V00 arg0 d:3 x0 REG x0 N019 (???,???) [000020] ------------ IL_OFFSET void INLRT @ 0x007[E-] REG NA .... RUNTIME_BUILD: -s clr.jit Change-Id: Ie9974f031bd34b8a1bfe77247f8364f026593ce8
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsEarly prototype. Detect an If-branch and a following block. Currently only the given example can be optimised. Take this C# function: Code before patch: Code after patch: Lowered IR before patch: ------------ BB02 [004..007), preds={BB01} succs={BB03} ------------ BB03 [007..00F) (return), preds={BB01,BB02} succs={} Lowered IR after patch:
|
// This inherits from GenTreeOp because CSTORE_LCL_VAR has 2 ops | ||
struct GenTreeLclVarCommon : public GenTreeOp |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a viable approach. It increases the size of small nodes by one pointer (80 -> 88 bytes on 64 bit hosts).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not a viable approach. It increases the size of small nodes by one pointer (80 -> 88 bytes on 64 bit hosts).
If that's the case then that also gets rid of my idea of just using STORE_LCL_VAR nodes for conditional stores (STORE_LCL_VAR would then just check in the emit if a conditional flag is set).
Ok then, I think I can just make a new class:
struct GenTreeCondLclVar : public GenTreeLclVarCommon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason for attaching this to local stores at all?
I would expect you can just define a new GT_SELECT
-like oper, that uses (implicitly, like GT_JCC/GT_SETCC
) flags to produce a value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My original instinct was to avoid adding new nodes into the graph, as I was assuming that would result in additional instructions being generated (a csel for the new node, then a mov for the store). But, I'm now realising the register allocator would allocate correctly and the lcl var store would end up as no instruction.
Also, I think my code isn't going to work in cases where the STORE_LCL_VAR needed to store to memory (which probably explains why the c# libraries fail to build)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But, I'm now realising the register allocator would allocate correctly and the lcl var store would end up as no instruction.
Right. "Combining" stores to locals with instructions, if taken to its logical conclusion, would mean adding local stores for all opers, STORE_LCL_VAR_[ADD|SUB|MUL...]
, so that's not something we should do.
(Well -- as a little history detour -- we used to have legacy codegen that did that, but it was mainly motivated by the RMW semantics of most x86 instructions)
Not sure if this should be marked as WIP or draft. Right now I'm just looking for guidance that I'm heading in the right direction (thanks @SingleAccretion for the comments!). I'll be away for a week from tomorrow, there will be node code updates until after then. |
Done :) |
Early prototype.
Detect an If-branch and a following block.
Merge the two blocks, and conditionally execute using csel.
Currently only the given example can be optimised.
Else cases and additional conditions within If will not be caught.
Take this C# function:
Code before patch:
Code after patch:
Lowered IR before patch:
Lowered IR after patch: