Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

JIT: Suppress emitting same-reg zero extending move #22454

Merged
merged 2 commits into from
Feb 8, 2019

Conversation

AndyAyersMS
Copy link
Member

Add a peephole optimization to suppress emitting zero extending moves
if the previous instruction has already done a suitable zero extension.

Only implemented for x64 currently.

Closes #21923

Add a peephole optimization to suppress emitting zero extending moves
if the previous instruction has already done a suitable zero extension.

Only implemented for x64 currently.

Closes #21923
@AndyAyersMS
Copy link
Member Author

@BruceForstall PTAL
cc @dotnet/jit-contrib

Jit diffs shows:

PMI Diffs for System.Private.CoreLib.dll, framework assemblies for x64 default jit
Summary:
(Lower is better)
Total bytes of diff: -3488 (-0.01% of base)
    diff is an improvement.
Top file improvements by size (bytes):
       -1192 : System.Private.CoreLib.dasm (-0.03% of base)
        -348 : System.Memory.dasm (-0.22% of base)
        -228 : System.Net.Http.dasm (-0.04% of base)
        -208 : System.Net.NetworkInformation.dasm (-0.45% of base)
        -168 : System.Runtime.Numerics.dasm (-0.23% of base)
51 total files with size differences (51 improved, 0 regressed), 78 unchanged.
Top method improvements by size (bytes):
         -44 (-0.62% of base) : System.Memory.dasm - ReadOnlySequenceDebugView`1:.ctor(struct):this (5 methods)
         -36 (-0.94% of base) : System.Memory.dasm - SequenceReader`1:GetNextSpan():this (4 methods)
         -32 (-0.85% of base) : System.Memory.dasm - SequenceReader`1:ResetReader():this (4 methods)
         -32 (-0.72% of base) : System.Memory.dasm - SequenceReader`1:IsNextSlow(struct,bool):bool:this (4 methods)
         -31 (-0.90% of base) : System.Memory.dasm - BuffersExtensions:CopyToMultiSegment(byref,struct) (5 methods)
Top method improvements by size (percentage):
          -2 (-20.00% of base) : System.Private.CoreLib.dasm - UInt32:System.IConvertible.ToInt64(ref):long:this
          -2 (-20.00% of base) : System.Private.CoreLib.dasm - UInt32:System.IConvertible.ToUInt64(ref):long:this
          -4 (-18.18% of base) : System.IO.FileSystem.dasm - FILE_TIME:ToTicks():long:this
          -2 (-18.18% of base) : System.Net.HttpListener.dasm - HttpListenerTimeoutManager:get_MinSendBytesPerSecond():long:this
          -2 (-18.18% of base) : System.Net.NetworkInformation.dasm - SystemIcmpV4Statistics:get_MessagesSent():long:this
868 total methods with size differences (868 improved, 0 regressed), 192460 unchanged.

sample diff (on example from #21923)

 G_M29305_IG02:
        mov      rax, bword ptr [rdx]
        mov      edx, dword ptr [rdx+8]
-       mov      edx, edx
        shl      rdx, 2
        cmp      rdx, 0xD1FFAB1E
        ja       SHORT G_M29305_IG04

@benaadams
Copy link
Member

@benaadams
Copy link
Member

Although that might need a double look back :-/

@AndyAyersMS
Copy link
Member Author

We'd get 2 out of 3, not too bad.

 movzx  r8d,word ptr [rax]  
 add    rax,2  
 mov    r8d,r8d                 ; pointless mov

Looking back further is certainly an option. The first prototype (which was almost certainly a bit too aggressive) was getting about 4x as many cases -- it assumed the upstream producer would properly zero extend.

@stephentoub
Copy link
Member

Thanks, Andy.

@@ -3112,7 +3112,7 @@ void CodeGen::genCodeForCpBlkRepMovs(GenTreeBlk* cpBlkNode)
else
#endif
{
#ifdef _TARGET_X64_
#ifdef _TARGET_AMD64_
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks unrelated; separate PR?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removed it from this PR.

@@ -6426,6 +6428,14 @@ void CodeGen::genIntToIntCast(GenTreeCast* cast)
break;
#ifdef _TARGET_64BIT_
case GenIntCastDesc::ZERO_EXTEND_INT:
// We can skip emitting this zero extending move if the previous instruction zero extended implicitly
if ((srcReg == dstReg) && (emit->emitCurIGinsCnt > 0) && compiler->opts.OptimizationEnabled())
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not fond of leaking emitLastIns out of the emitter. Maybe this should call a function like emitIsLastInsCall(), say emitIsLastInsZeroExtendingWrite(srcReg). Then this code could be:

canSkip = (srcReg == dstReg) && compiler->opts.OptimizationEnabled() && emit->emitIsLastInsZeroExtendingWrite(srcReg);

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure, let me update.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

emitAreUpper32bitZero might be better, it doesn't matter how they ended up being zero, just that they are.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree -- I'll go with that.

@@ -145,6 +145,31 @@ bool emitter::IsDstSrcSrcAVXInstruction(instruction ins)
return ((CodeGenInterface::instInfo[ins] & INS_Flags_IsDstSrcSrcAVXInstruction) != 0) && IsAVXInstruction(ins);
}

bool emitter::doesZeroExtendingWrite(instrDesc* id, regNumber reg)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Needs a function header comment

@@ -3112,7 +3112,7 @@ void CodeGen::genCodeForCpBlkRepMovs(GenTreeBlk* cpBlkNode)
else
#endif
{
#ifdef _TARGET_X64_
#ifdef _TARGET_AMD64_
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's up with this change?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Forgot that was in there.

case IF_RWR_ARD:

// Can't rely on a "small" movsx as we will over-extend to 8 bytes
return (id->idIns() != INS_movsx) && (id->idReg1() == reg) && (id->idOpSize() != EA_8BYTE);
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The size check doesn't seem right - x64 only zero extends 32 bit register writes, mov al, 42 does not zero extend. So the check should be == EA_4BYTE, though that will probably require special casing for movzx where size has a different meaning.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.

For movzx you mean allow both 4 and 8 byte sizes...?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For movzx and movsx the size indicates the source operand size - 1 or 2 bytes - rather than the destination operand size. So movzx always zeroes out the upper 32 bit, no matter the size.

It's a bit unfortunate. I'm considering changing movzx/movsx to movzxb/movzxw/movsxb/movsxw in the future, that might simplify things and perhaps allow more code to be shared with ARM.

@AndyAyersMS
Copy link
Member Author

Updated. Generates same diffs as the first version.

@AndyAyersMS
Copy link
Member Author

OSX jenkins hangup, retrying

@dotnet-bot retest OSX10.12 x64 Checked Innerloop Build and Test

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link

@briansull briansull left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks Good

@AndyAyersMS AndyAyersMS merged commit d5f638a into dotnet:master Feb 8, 2019
@AndyAyersMS AndyAyersMS deleted the SkipZeroExtendingSameRegMov branch February 8, 2019 16:11
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…2454)

Add a peephole optimization to suppress emitting zero extending moves
if the previous instruction has already done a suitable zero extension.

Only implemented for x64 currently.

Closes dotnet/coreclr#21923


Commit migrated from dotnet/coreclr@d5f638a
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants