JIT: Suppress emitting same-reg zero extending move #22454
Conversation
Add a peephole optimization to suppress emitting zero extending moves if the previous instruction has already done a suitable zero extension. Only implemented for x64 currently. Closes #21923
@BruceForstall PTAL. Jit diffs show:

Sample diff (on the example from #21923), G_M29305_IG02:
mov rax, bword ptr [rdx]
mov edx, dword ptr [rdx+8]
- mov edx, edx
shl rdx, 2
cmp rdx, 0xD1FFAB1E
ja SHORT G_M29305_IG04
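Conceptually, when genIntToIntCast is about to emit the 32-bit same-register mov that implements ZERO_EXTEND_INT, it first asks the emitter whether the previous instruction already left the upper 32 bits of that register zeroed (on x64, any 32-bit register write does this implicitly). A minimal sketch of the call-site shape, using names from the review discussion below (emitAreUpper32bitsZero and the exact emit call are assumptions; the merged code may differ):

// Hedged sketch of the codegen-side check; emitAreUpper32bitsZero is the
// emitter query discussed later in this review, not necessarily the final name.
case GenIntCastDesc::ZERO_EXTEND_INT:
{
    // A same-register 32-bit mov exists only to clear the upper 32 bits,
    // so it can be skipped when the last instruction already guaranteed that.
    bool canSkip = (srcReg == dstReg) && compiler->opts.OptimizationEnabled() &&
                   emit->emitAreUpper32bitsZero(srcReg);

    if (!canSkip)
    {
        emit->emitIns_R_R(INS_mov, EA_4BYTE, dstReg, srcReg);
    }
    break;
}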
Probably fixes https://github.com/dotnet/coreclr/issues/17963 also?

Although that might need a double look back :-/
We'd get 2 out of 3, not too bad.

movzx r8d,word ptr [rax]
add rax,2
mov r8d,r8d ; pointless mov

Looking back further is certainly an option. The first prototype (which was almost certainly a bit too aggressive) was getting about 4x as many cases -- it assumed the upstream producer would properly zero extend.

Thanks, Andy.
src/jit/codegenxarch.cpp
Outdated
@@ -3112,7 +3112,7 @@ void CodeGen::genCodeForCpBlkRepMovs(GenTreeBlk* cpBlkNode)
else
#endif
{
#ifdef _TARGET_X64_
#ifdef _TARGET_AMD64_
This looks unrelated; separate PR?
Sure.
Removed it from this PR.
src/jit/codegenxarch.cpp
Outdated
@@ -6426,6 +6428,14 @@ void CodeGen::genIntToIntCast(GenTreeCast* cast)
break;
#ifdef _TARGET_64BIT_
case GenIntCastDesc::ZERO_EXTEND_INT:
// We can skip emitting this zero extending move if the previous instruction zero extended implicitly
if ((srcReg == dstReg) && (emit->emitCurIGinsCnt > 0) && compiler->opts.OptimizationEnabled())
I'm not fond of leaking emitLastIns out of the emitter. Maybe this should call a function, like emitIsLastInsCall(), say emitIsLastInsZeroExtendingWrite(srcReg). Then this code could be:

canSkip = (srcReg == dstReg) && compiler->opts.OptimizationEnabled() && emit->emitIsLastInsZeroExtendingWrite(srcReg);
Sure, let me update.
emitAreUpper32bitZero might be better; it doesn't matter how they ended up being zero, just that they are.
Agree -- I'll go with that.
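A minimal sketch of what that emitter-side query could look like, assuming it only inspects the last instruction recorded in the current insertion group (emitLastIns / emitCurIGinsCnt, both mentioned above) and defers the per-instruction classification to doesZeroExtendingWrite, which is quoted next. The helper actually added in the PR may differ in name and details:

// Hedged sketch: true when the most recently emitted instruction in the
// current IG is known to have left the upper 32 bits of 'reg' zeroed.
bool emitter::emitAreUpper32bitsZero(regNumber reg)
{
    // Only one instruction of look-back: nothing emitted in this IG yet
    // means there is nothing to reason about.
    if ((emitCurIGinsCnt == 0) || (emitLastIns == nullptr))
    {
        return false;
    }

    // Defer to the per-instruction classification (see the hunk below).
    return doesZeroExtendingWrite(emitLastIns, reg);
}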
src/jit/emitxarch.cpp
Outdated
@@ -145,6 +145,31 @@ bool emitter::IsDstSrcSrcAVXInstruction(instruction ins)
return ((CodeGenInterface::instInfo[ins] & INS_Flags_IsDstSrcSrcAVXInstruction) != 0) && IsAVXInstruction(ins);
}

bool emitter::doesZeroExtendingWrite(instrDesc* id, regNumber reg)
Needs a function header comment
src/jit/codegenxarch.cpp
Outdated
@@ -3112,7 +3112,7 @@ void CodeGen::genCodeForCpBlkRepMovs(GenTreeBlk* cpBlkNode)
else
#endif
{
#ifdef _TARGET_X64_
#ifdef _TARGET_AMD64_
What's up with this change?
Forgot that was in there.
src/jit/emitxarch.cpp
Outdated
case IF_RWR_ARD:

// Can't rely on a "small" movsx as we will over-extend to 8 bytes
return (id->idIns() != INS_movsx) && (id->idReg1() == reg) && (id->idOpSize() != EA_8BYTE);
The size check doesn't seem right - x64 only zero extends 32 bit register writes; mov al, 42 does not zero extend. So the check should be == EA_4BYTE, though that will probably require special casing for movzx, where size has a different meaning.
Good point.

For movzx you mean allow both 4 and 8 byte sizes...?
For movzx and movsx the size indicates the source operand size - 1 or 2 bytes - rather than the destination operand size. So movzx always zeroes out the upper 32 bits, no matter the size.

It's a bit unfortunate. I'm considering changing movzx/movsx to movzxb/movzxw/movsxb/movsxw in the future; that might simplify things and perhaps allow more code to be shared with ARM.
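Putting those two points together - only 32-bit register writes implicitly clear the upper 32 bits on x64 (an 8- or 16-bit write such as mov al, 42 preserves them), while movzx always clears them even though its recorded size describes the 1- or 2-byte source - the classification could be adjusted along these lines. This is a hedged reconstruction from the discussion (and it omits the insertion-format checks such as IF_RWR_ARD shown in the quoted hunk), not the literal code that was merged:

// Hedged sketch of the corrected size handling in doesZeroExtendingWrite.
bool emitter::doesZeroExtendingWrite(instrDesc* id, regNumber reg)
{
    // The last instruction must actually have written 'reg'.
    if (id->idReg1() != reg)
    {
        return false;
    }

    // movzx always zeroes the upper 32 bits of the destination; its recorded
    // operand size (1 or 2 bytes) describes the source, not the destination.
    if (id->idIns() == INS_movzx)
    {
        return true;
    }

    // movsx sign-extends, so it never qualifies. Other register writes only
    // zero-extend when they are 32-bit writes; 8- and 16-bit writes (e.g.
    // "mov al, 42") leave the upper bits of the full register untouched.
    return (id->idIns() != INS_movsx) && (id->idOpSize() == EA_4BYTE);
}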
Updated. Generates same diffs as the first version.

OSX Jenkins hangup, retrying: @dotnet-bot retest OSX10.12 x64 Checked Innerloop Build and Test
LGTM
Looks Good
…2454)

Add a peephole optimization to suppress emitting zero extending moves if the previous instruction has already done a suitable zero extension. Only implemented for x64 currently.

Closes dotnet/coreclr#21923

Commit migrated from dotnet/coreclr@d5f638a