JIT: Suppress emitting same-reg zero extending move #22454
Conversation
Add a peephole optimization to suppress emitting zero extending moves if the previous instruction has already done a suitable zero extension. Only implemented for x64 currently. Closes #21923
@BruceForstall PTAL. Jit diffs show:

Sample diff (on the example from #21923), G_M29305_IG02:
mov rax, bword ptr [rdx]
mov edx, dword ptr [rdx+8]
- mov edx, edx
shl rdx, 2
cmp rdx, 0xD1FFAB1E
ja SHORT G_M29305_IG04
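Conceptually, when genIntToIntCast is about to emit the 32-bit same-register mov that implements ZERO_EXTEND_INT, it first asks the emitter whether the previous instruction already left the upper 32 bits of that register zeroed (on x64, any 32-bit register write does this implicitly). A minimal sketch of the call-site shape, using names from the review discussion below (emitAreUpper32bitsZero and the exact emit call are assumptions; the merged code may differ):

// Hedged sketch of the codegen-side check; emitAreUpper32bitsZero is the
// emitter query discussed later in this review, not necessarily the final name.
case GenIntCastDesc::ZERO_EXTEND_INT:
{
    // A same-register 32-bit mov exists only to clear the upper 32 bits,
    // so it can be skipped when the last instruction already guaranteed that.
    bool canSkip = (srcReg == dstReg) && compiler->opts.OptimizationEnabled() &&
                   emit->emitAreUpper32bitsZero(srcReg);

    if (!canSkip)
    {
        emit->emitIns_R_R(INS_mov, EA_4BYTE, dstReg, srcReg);
    }
    break;
}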
Probably fixes https://github.com/dotnet/coreclr/issues/17963 also?

Although that might need a double look back :-/
We'd get 2 out of 3, not too bad.

movzx r8d,word ptr [rax]
add rax,2
mov r8d,r8d ; pointless mov

Looking back further is certainly an option. The first prototype (which was almost certainly a bit too aggressive) was getting about 4x as many cases -- it assumed the upstream producer would properly zero extend.

Thanks, Andy.
src/jit/codegenxarch.cpp
Outdated
@@ -3112,7 +3112,7 @@ void CodeGen::genCodeForCpBlkRepMovs(GenTreeBlk* cpBlkNode)
else
#endif
{
#ifdef _TARGET_X64_
#ifdef _TARGET_AMD64_
This looks unrelated; separate PR?
Sure.
Removed it from this PR.
src/jit/codegenxarch.cpp
Outdated
@@ -6426,6 +6428,14 @@ void CodeGen::genIntToIntCast(GenTreeCast* cast)
break;
#ifdef _TARGET_64BIT_
case GenIntCastDesc::ZERO_EXTEND_INT:
// We can skip emitting this zero extending move if the previous instruction zero extended implicitly
if ((srcReg == dstReg) && (emit->emitCurIGinsCnt > 0) && compiler->opts.OptimizationEnabled())
I'm not fond of leaking emitLastIns out of the emitter. Maybe this should call a function, like emitIsLastInsCall(), say emitIsLastInsZeroExtendingWrite(srcReg). Then this code could be:

canSkip = (srcReg == dstReg) && compiler->opts.OptimizationEnabled() && emit->emitIsLastInsZeroExtendingWrite(srcReg);
Sure, let me update.
emitAreUpper32bitZero might be better; it doesn't matter how they ended up being zero, just that they are.
Agree -- I'll go with that.
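A minimal sketch of what that emitter-side query could look like, assuming it only inspects the last instruction recorded in the current insertion group (emitLastIns / emitCurIGinsCnt, both mentioned above) and defers the per-instruction classification to doesZeroExtendingWrite, which is quoted next. The helper actually added in the PR may differ in name and details:

// Hedged sketch: true when the most recently emitted instruction in the
// current IG is known to have left the upper 32 bits of 'reg' zeroed.
bool emitter::emitAreUpper32bitsZero(regNumber reg)
{
    // Only one instruction of look-back: nothing emitted in this IG yet
    // means there is nothing to reason about.
    if ((emitCurIGinsCnt == 0) || (emitLastIns == nullptr))
    {
        return false;
    }

    // Defer to the per-instruction classification (see the hunk below).
    return doesZeroExtendingWrite(emitLastIns, reg);
}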
src/jit/emitxarch.cpp
Outdated
@@ -145,6 +145,31 @@ bool emitter::IsDstSrcSrcAVXInstruction(instruction ins)
return ((CodeGenInterface::instInfo[ins] & INS_Flags_IsDstSrcSrcAVXInstruction) != 0) && IsAVXInstruction(ins);
}

bool emitter::doesZeroExtendingWrite(instrDesc* id, regNumber reg)
Needs a function header comment
src/jit/codegenxarch.cpp
Outdated
@@ -3112,7 +3112,7 @@ void CodeGen::genCodeForCpBlkRepMovs(GenTreeBlk* cpBlkNode)
else
#endif
{
#ifdef _TARGET_X64_
#ifdef _TARGET_AMD64_
What's up with this change?
Forgot that was in there.
src/jit/emitxarch.cpp
Outdated
case IF_RWR_ARD:

// Can't rely on a "small" movsx as we will over-extend to 8 bytes
return (id->idIns() != INS_movsx) && (id->idReg1() == reg) && (id->idOpSize() != EA_8BYTE);
The size check doesn't seem right - x64 only zero extends 32 bit register writes; mov al, 42 does not zero extend. So the check should be == EA_4BYTE, though that will probably require special casing for movzx, where size has a different meaning.
Good point.

For movzx you mean allow both 4 and 8 byte sizes...?
For movzx and movsx the size indicates the source operand size - 1 or 2 bytes - rather than the destination operand size. So movzx always zeroes out the upper 32 bits, no matter the size.

It's a bit unfortunate. I'm considering changing movzx/movsx to movzxb/movzxw/movsxb/movsxw in the future; that might simplify things and perhaps allow more code to be shared with ARM.
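Putting those two points together - only 32-bit register writes implicitly clear the upper 32 bits on x64 (an 8- or 16-bit write such as mov al, 42 preserves them), while movzx always clears them even though its recorded size describes the 1- or 2-byte source - the classification could be adjusted along these lines. This is a hedged reconstruction from the discussion (and it omits the insertion-format checks such as IF_RWR_ARD shown in the quoted hunk), not the literal code that was merged:

// Hedged sketch of the corrected size handling in doesZeroExtendingWrite.
bool emitter::doesZeroExtendingWrite(instrDesc* id, regNumber reg)
{
    // The last instruction must actually have written 'reg'.
    if (id->idReg1() != reg)
    {
        return false;
    }

    // movzx always zeroes the upper 32 bits of the destination; its recorded
    // operand size (1 or 2 bytes) describes the source, not the destination.
    if (id->idIns() == INS_movzx)
    {
        return true;
    }

    // movsx sign-extends, so it never qualifies. Other register writes only
    // zero-extend when they are 32-bit writes; 8- and 16-bit writes (e.g.
    // "mov al, 42") leave the upper bits of the full register untouched.
    return (id->idIns() != INS_movsx) && (id->idOpSize() == EA_4BYTE);
}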
Updated. Generates same diffs as the first version.

OSX Jenkins hangup, retrying: @dotnet-bot retest OSX10.12 x64 Checked Innerloop Build and Test
LGTM
Looks Good
…2454)

Add a peephole optimization to suppress emitting zero extending moves if the previous instruction has already done a suitable zero extension. Only implemented for x64 currently.

Closes dotnet/coreclr#21923

Commit migrated from dotnet/coreclr@d5f638a