This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Improve init/copy block codegen #21711

Merged 1 commit on Nov 5, 2019

@mikedn mikedn commented Dec 29, 2018

This improves block op codegen through better containment of the source/destination address in the unrolled case:

  • Contain a local destination address on block init. This was already done for block copy.
  • Enable block init unroll on ARM32 (it was already enabled on ARM64 and the differences between the 2 architectures are trivial). (Done in Enable block init unroll on ARM32 #27450)
  • Fix block copy local address containment on ARM. Block store lowering didn't mark local addresses as contained yet the codegen did containment on its own by ignoring the address computed in a register and instead emitting load/stores using the frame pointer register directly. (Done in Fix block store local address containment on ARM #27338)
  • Form and contain address modes (on both XARCH and ARM).

Examples. A struct field copy currently generates:

; obj1.val = obj2.val
       4983C018             add      r8, 24
       4883C218             add      rdx, 24
       C4C17A6F00           vmovdqu  xmm0, qword ptr [r8]
       C5FA7F02             vmovdqu  qword ptr [rdx], xmm0
       418B4010             mov      eax, dword ptr [r8+16]
       894210               mov      dword ptr [rdx+16], eax

With this change the following code is generated instead:

       C4C17A6F4018         vmovdqu  xmm0, xmmword ptr [r8+24]
       C5FA7F4218           vmovdqu  xmmword ptr [rdx+24], xmm0
       418B4028             mov      eax, dword ptr [r8+40]
       894228               mov      dword ptr [rdx+40], eax
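
The effect of folding the field offset into the address mode can be sketched with a small model. This is only an illustration: `Chunk`, `PlanUnroll` and `Displacement` are hypothetical names, and the greedy width selection is an assumption, not the JIT's actual unrolling heuristic.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model of one unrolled load/store pair.
struct Chunk
{
    std::size_t offset; // byte offset from the start of the block
    std::size_t width;  // 16 (xmm), 8, 4, 2 or 1 bytes
};

// Greedily split a block of `size` bytes into the widest available chunks.
// Assumed policy for illustration only.
std::vector<Chunk> PlanUnroll(std::size_t size)
{
    static const std::size_t widths[] = {16, 8, 4, 2, 1};
    std::vector<Chunk> chunks;
    std::size_t offset = 0;
    for (std::size_t width : widths)
    {
        while (size - offset >= width)
        {
            chunks.push_back({offset, width});
            offset += width;
        }
    }
    return chunks;
}

// With the field offset contained in the address mode, each access uses
// [base + fieldOffset + chunk.offset], so no separate ADD is required.
std::size_t Displacement(std::size_t fieldOffset, const Chunk& chunk)
{
    return fieldOffset + chunk.offset;
}
```

For a 20-byte struct at field offset 24, this model yields a 16-byte access at displacement 24 and a 4-byte access at displacement 40, which matches the displacements in the listing above.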

This also applies to array elements and local variables. For example, initializing a struct local var may currently generate:

       33C9                 xor      rcx, rcx
       488D442420           lea      rax, bword ptr [rsp+20H]
       C5F857C0             vxorps   xmm0, xmm0
       C5FA7F00             vmovdqu  qword ptr [rax], xmm0
       894810               mov      dword ptr [rax+16], ecx

After this change the generated code is:

       33C9                 xor      ecx, ecx
       C5F857C0             vxorps   xmm0, xmm0
       C5FA7F442420         vmovdqu  xmmword ptr [rsp+20H], xmm0
       894C2430             mov      dword ptr [rsp+30H], ecx
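
The difference between the two listings comes down to whether the local's address is contained in the stores. A minimal sketch of that choice follows; `FormatAddr` and `EmitLocalInit` are hypothetical helpers that only build instruction text for a fixed 20-byte init, and are not part of the JIT.

```cpp
#include <cassert>
#include <cstdio>
#include <string>
#include <vector>

// Format a memory operand as [base] or [base+NNH] (hex displacement).
std::string FormatAddr(const std::string& base, unsigned disp)
{
    if (disp == 0)
        return "[" + base + "]";
    char buf[32];
    std::snprintf(buf, sizeof(buf), "[%s+%XH]", base.c_str(), disp);
    return buf;
}

// Hypothetical stand-in for zeroing a 20-byte stack local:
// 16 bytes through xmm0 followed by a 4-byte tail through ecx.
std::vector<std::string> EmitLocalInit(unsigned frameOffset, bool containAddress)
{
    std::vector<std::string> code;
    std::string base = "rsp";
    unsigned disp = frameOffset;
    code.push_back("xor ecx, ecx");
    if (!containAddress)
    {
        // Not contained: the address is first materialized in a register.
        code.push_back("lea rax, " + FormatAddr("rsp", frameOffset));
        base = "rax";
        disp = 0;
    }
    code.push_back("vxorps xmm0, xmm0");
    code.push_back("vmovdqu xmmword ptr " + FormatAddr(base, disp) + ", xmm0");
    code.push_back("mov dword ptr " + FormatAddr(base, disp + 16) + ", ecx");
    return code;
}
```

Calling `EmitLocalInit(0x20, true)` produces four instructions that address the frame directly, while `EmitLocalInit(0x20, false)` produces five, including the extra `lea`.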

@mikedn mikedn force-pushed the init-blk-contain branch 3 times, most recently from a4a2204 to 512b6ca Compare January 2, 2019 09:20
@mikedn mikedn changed the title [WIP] Improve local block init codegen [WIP] Improve init/copy block codegen Jan 2, 2019

mikedn commented Jan 2, 2019

x64:

Total bytes of diff: -75078 (-0.27% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          70 : System.Collections.Immutable.dasm (0.03% of base)
           5 : NuGet.Protocol.Core.Types.dasm (0.03% of base)
           5 : System.Collections.Concurrent.dasm (0.01% of base)
           2 : System.Security.Claims.dasm (0.01% of base)
Top file improvements by size (bytes):
      -51950 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-1.54% of base)
       -3449 : System.Private.CoreLib.dasm (-0.11% of base)
       -3205 : System.Private.Xml.dasm (-0.10% of base)
       -1853 : System.Linq.Expressions.dasm (-0.07% of base)
       -1520 : System.Linq.Parallel.dasm (-0.26% of base)
87 total files with size differences (83 improved, 4 regressed), 42 unchanged.
Top method regressions by size (bytes):
         130 ( 1.02% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
          99 ( 4.58% of base) : System.Collections.Immutable.dasm - ImmutableExtensions:GetEnumerableDisposable(ref):struct (5 methods)
          88 ( 2.36% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - ConstraintsHelper:RemoveDirectConstraintConflicts(ref,struct,ref,int,ref):struct
          72 ( 2.72% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - SyntaxFactory:.cctor() (2 methods)
          62 ( 3.86% of base) : System.Private.CoreLib.dasm - CustomAttributeData:.ctor(ref,struct,byref):this
Top method improvements by size (bytes):
       -2812 (-2.76% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(ref,ref):this
       -1548 (-0.14% of base) : System.Linq.Expressions.dasm - FuncCallInstruction`3:Run(ref):int:this (3375 methods)
       -1370 (-2.84% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this
       -1094 (-3.01% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(ref,ref):this
        -968 (-2.90% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(ref,ref):this
Top method regressions by size (percentage):
          16 (28.57% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - EventPipeEventSource:ResetCompressedHeader():this
          39 (11.14% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - OverloadResolutionResult:GetBestResult(struct):struct
           2 ( 8.33% of base) : NuGet.Protocol.Core.Types.dasm - SourceCacheContext:set_ListMaxAge(struct):this
           2 ( 8.33% of base) : NuGet.Protocol.Core.Types.dasm - SourceCacheContext:set_NupkgMaxAge(struct):this
           2 ( 8.33% of base) : NuGet.Protocol.Core.Types.dasm - ClonedPackageSearchMetadata:set_Published(struct):this
Top method improvements by size (percentage):
         -13 (-38.24% of base) : System.Net.Http.dasm - RangeItemHeaderValue:.ctor(ref):this
         -21 (-31.34% of base) : System.Net.Http.dasm - ContentRangeHeaderValue:.ctor(ref):this
         -14 (-24.56% of base) : System.Private.CoreLib.dasm - REG_TZI_FORMAT:.ctor(byref):this
          -3 (-23.08% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - StartStopActivity:set_ActivityID(struct):this
          -3 (-23.08% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfTrace:set_UUID(struct):this
10318 total methods with size differences (9869 improved, 449 regressed), 139347 unchanged.

x86:

Total bytes of diff: -38332 (-0.17% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          64 : System.Text.RegularExpressions.dasm (0.05% of base)
          40 : System.Reflection.Metadata.dasm (0.01% of base)
          39 : System.Net.Mail.dasm (0.03% of base)
          10 : System.Threading.Tasks.Dataflow.dasm (0.01% of base)
           8 : System.Diagnostics.DiagnosticSource.dasm (0.04% of base)
Top file improvements by size (bytes):
      -25099 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.92% of base)
       -2766 : System.Private.CoreLib.dasm (-0.11% of base)
       -2368 : System.Private.Xml.dasm (-0.10% of base)
       -1214 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.07% of base)
        -954 : System.Data.Common.dasm (-0.10% of base)
85 total files with size differences (69 improved, 16 regressed), 44 unchanged.
Top method regressions by size (bytes):
        1062 ( 2.57% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - CtfTraceEventSource:InitEventMap():ref
         297 ( 1.85% of base) : System.Net.Http.dasm - Huffman:.cctor()
         283 ( 1.79% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this
         130 ( 1.56% of base) : System.Private.Xml.dasm - BigNumber:.cctor()
          84 ( 0.67% of base) : System.Text.RegularExpressions.dasm - RegexCharClass:.cctor()
Top method improvements by size (bytes):
        -624 (-2.17% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(ref,ref):this
        -592 (-1.97% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(ref,ref):this
        -520 (-2.46% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - AspNetTraceEventParser:EnumerateTemplates(ref,ref):this
        -505 (-21.67% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceEventSession:SetStackTraceIds(int,int,int):int
        -502 (-1.37% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this
Top method regressions by size (percentage):
           4 (23.53% of base) : Microsoft.CodeAnalysis.dasm - <GetTypeDefsOrThrow>d__72:System.Collections.Generic.IEnumerator<Microsoft.CodeAnalysis.PEModule.TypeDefToNamespace>.get_Current():struct:this
           4 (23.53% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - EmbeddedTreeLocation:get_PossiblyEmbeddedOrMySourceSpan():struct:this
           4 (23.53% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MyTemplateLocation:get_PossiblyEmbeddedOrMySourceSpan():struct:this
           8 (23.53% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceProcess:get_ExitStatus():struct:this (2 methods)
           4 (23.53% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - TraceLog:get_UTCOffsetMinutes():struct:this
Top method improvements by size (percentage):
         -10 (-32.26% of base) : System.Net.Http.dasm - RangeItemHeaderValue:.ctor(ref):this
         -14 (-24.56% of base) : System.Net.Http.dasm - ContentRangeHeaderValue:.ctor(ref):this
         -16 (-23.19% of base) : System.IO.FileSystem.AccessControl.dasm - WIN32_FILE_ATTRIBUTE_DATA:PopulateFrom(byref):this
         -16 (-23.19% of base) : System.IO.FileSystem.dasm - WIN32_FILE_ATTRIBUTE_DATA:PopulateFrom(byref):this
         -16 (-23.19% of base) : System.Private.CoreLib.dasm - WIN32_FILE_ATTRIBUTE_DATA:PopulateFrom(byref):this
7747 total methods with size differences (7259 improved, 488 regressed), 141963 unchanged.

arm64:

Total bytes of diff: -149344 (-0.14% of base)
    diff is an improvement.
Top file improvements by size (bytes):
      -68576 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.62% of base)
      -16080 : System.Private.CoreLib.dasm (-0.15% of base)
       -8336 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.06% of base)
       -8160 : System.Private.Xml.dasm (-0.08% of base)
       -7824 : Microsoft.CodeAnalysis.CSharp.dasm (-0.06% of base)
89 total files with size differences (89 improved, 0 regressed), 40 unchanged.
Top method regressions by size (bytes):
         896 ( 0.55% of base) : Microsoft.CodeAnalysis.dasm - DesktopAssemblyIdentityComparer:.cctor() (4 methods)
         688 ( 1.45% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this (4 methods)
          64 ( 0.50% of base) : Microsoft.CodeAnalysis.CSharp.dasm - OverloadResolution:RemoveWorseMembers(ref,ref,byref):this (4 methods)
          16 ( 0.14% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Binder:AnalyzeAnonymousFunction(ref,ref):ref:this (4 methods)
          16 ( 0.44% of base) : System.Security.Cryptography.X509Certificates.dasm - Pkcs10CertificationRequestInfo:ToPkcs10Request(ref,struct):ref:this (2 methods)
Top method improvements by size (bytes):
       -1240 (-1.32% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(ref,ref):this (2 methods)
       -1216 (-4.20% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:.cctor() (2 methods)
       -1184 (-1.19% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrPrivateTraceEventParser:EnumerateTemplates(ref,ref):this (2 methods)
       -1040 (-1.64% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - AspNetTraceEventParser:EnumerateTemplates(ref,ref):this (2 methods)
        -992 (-0.77% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - KernelTraceEventParser:EnumerateTemplates(ref,ref):this (2 methods)
Top method regressions by size (percentage):
         688 ( 1.45% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this (4 methods)
         896 ( 0.55% of base) : Microsoft.CodeAnalysis.dasm - DesktopAssemblyIdentityComparer:.cctor() (4 methods)
          64 ( 0.50% of base) : Microsoft.CodeAnalysis.CSharp.dasm - OverloadResolution:RemoveWorseMembers(ref,ref,byref):this (4 methods)
          16 ( 0.44% of base) : System.Security.Cryptography.X509Certificates.dasm - Pkcs10CertificationRequestInfo:ToPkcs10Request(ref,struct):ref:this (2 methods)
          16 ( 0.14% of base) : Microsoft.CodeAnalysis.CSharp.dasm - Binder:AnalyzeAnonymousFunction(ref,ref):ref:this (4 methods)
Top method improvements by size (percentage):
         -32 (-33.33% of base) : System.Net.Http.dasm - RangeItemHeaderValue:.ctor(ref):this (2 methods)
         -48 (-31.58% of base) : System.Net.Http.dasm - ContentRangeHeaderValue:.ctor(ref):this (2 methods)
         -48 (-21.43% of base) : System.Net.Http.dasm - ContentRangeHeaderValue:System.ICloneable.Clone():ref:this (2 methods)
         -32 (-19.05% of base) : System.Net.Http.dasm - RangeItemHeaderValue:System.ICloneable.Clone():ref:this (2 methods)
         -16 (-15.38% of base) : System.Private.CoreLib.dasm - PropertyValue:.ctor(float):this (2 methods)
7894 total methods with size differences (7889 improved, 5 regressed), 141479 unchanged.

arm32:

Total bytes of diff: -72996 (-0.10% of base)
    diff is an improvement.
Top file regressions by size (bytes):
          52 : System.Runtime.Extensions.dasm (0.03% of base)
Top file improvements by size (bytes):
      -34256 : Microsoft.Diagnostics.Tracing.TraceEvent.dasm (-0.40% of base)
       -8952 : System.Private.CoreLib.dasm (-0.12% of base)
       -4308 : System.Private.Xml.dasm (-0.06% of base)
       -4024 : Microsoft.CodeAnalysis.VisualBasic.dasm (-0.04% of base)
       -3328 : Microsoft.CodeAnalysis.CSharp.dasm (-0.03% of base)
72 total files with size differences (71 improved, 1 regressed), 57 unchanged.
Top method regressions by size (bytes):
        1832 ( 0.97% of base) : Microsoft.CodeAnalysis.dasm - DesktopAssemblyIdentityComparer:.cctor() (4 methods)
         112 ( 0.04% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ApplicationServerTraceEventParser:EnumerateTemplates(ref,ref):this (2 methods)
          80 ( 0.16% of base) : System.Reflection.Metadata.dasm - MetadataReader:InitializeTableReaders(struct,ubyte,ref,ref):this (4 methods)
          72 ( 0.59% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Parser:ParseFromControlVars():struct:this (4 methods)
          72 ( 0.62% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Parser:ParseLetList():struct:this (4 methods)
Top method improvements by size (bytes):
       -1936 (-2.86% of base) : Microsoft.CodeAnalysis.dasm - AttributeDescription:.cctor() (4 methods)
        -680 (-5.72% of base) : System.Private.CoreLib.dasm - OpCodes:.cctor() (2 methods)
        -640 (-3.13% of base) : Microsoft.CSharp.dasm - PredefinedMembers:.cctor() (2 methods)
        -620 (-0.65% of base) : Microsoft.Diagnostics.Tracing.TraceEvent.dasm - ClrTraceEventParser:EnumerateTemplates(ref,ref):this (2 methods)
        -616 (-1.58% of base) : System.Private.Xml.dasm - XsdBuilder:.cctor() (2 methods)
Top method regressions by size (percentage):
           8 (16.67% of base) : Newtonsoft.Json.dasm - JsonSchema:get_Minimum():struct:this (2 methods)
           8 (16.67% of base) : Newtonsoft.Json.dasm - JsonSchema:get_Maximum():struct:this (2 methods)
           8 (16.67% of base) : System.Transactions.Local.dasm - TransactionStatePromotedNonMSDTCBase:get_Identifier(ref):struct:this (2 methods)
           8 (16.67% of base) : System.Transactions.Local.dasm - TransactionStatePromotedNonMSDTCEnded:get_Identifier(ref):struct:this (2 methods)
           8 (11.11% of base) : Newtonsoft.Json.dasm - JsonSchema:set_Minimum(struct):this (2 methods)
Top method improvements by size (percentage):
        -144 (-54.55% of base) : System.Net.Http.dasm - ContentRangeHeaderValue:.ctor(ref):this (2 methods)
         -88 (-52.38% of base) : System.IO.FileSystem.AccessControl.dasm - WIN32_FILE_ATTRIBUTE_DATA:PopulateFrom(byref):this (2 methods)
         -88 (-52.38% of base) : System.IO.FileSystem.dasm - WIN32_FILE_ATTRIBUTE_DATA:PopulateFrom(byref):this (2 methods)
         -88 (-52.38% of base) : System.Private.CoreLib.dasm - WIN32_FILE_ATTRIBUTE_DATA:PopulateFrom(byref):this (2 methods)
         -56 (-43.75% of base) : System.Net.Http.dasm - RangeItemHeaderValue:.ctor(ref):this (2 methods)
6943 total methods with size differences (6854 improved, 89 regressed), 142471 unchanged.

Not sure what Microsoft.Diagnostics.Tracing.TraceEvent is doing. Presumably it has a secret plan to take over the world by copying itself tens of thousands of times :)

@mikedn mikedn force-pushed the init-blk-contain branch 2 times, most recently from c3a516f to 6377fdd Compare January 4, 2019 10:30
@mikedn mikedn force-pushed the init-blk-contain branch 2 times, most recently from 4461b9b to db5d596 Compare June 9, 2019 08:47
@mikedn mikedn force-pushed the init-blk-contain branch 4 times, most recently from 448d186 to be19099 Compare June 18, 2019 15:15
@mikedn mikedn force-pushed the init-blk-contain branch 4 times, most recently from 81dde33 to 49dd036 Compare July 15, 2019 21:20
@mikedn mikedn force-pushed the init-blk-contain branch 4 times, most recently from 2985116 to 1b99f2d Compare August 4, 2019 16:25
@mikedn mikedn force-pushed the init-blk-contain branch 2 times, most recently from 801a7d6 to 04544c7 Compare September 7, 2019 13:39
@mikedn mikedn force-pushed the init-blk-contain branch 2 times, most recently from 3cf090a to 698b556 Compare September 9, 2019 22:22
@mikedn mikedn force-pushed the init-blk-contain branch 5 times, most recently from 4b694f9 to de44e29 Compare September 20, 2019 21:19
@mikedn mikedn force-pushed the init-blk-contain branch 3 times, most recently from e440d2f to c47445e Compare September 29, 2019 18:44
@mikedn mikedn force-pushed the init-blk-contain branch 2 times, most recently from 14604e7 to 44466c2 Compare October 2, 2019 21:19
@mikedn mikedn force-pushed the init-blk-contain branch 2 times, most recently from 2d91c84 to 65e840a Compare October 31, 2019 17:28
@mikedn mikedn changed the title [WIP] Improve init/copy block codegen Improve init/copy block codegen Oct 31, 2019

mikedn commented Oct 31, 2019

Whew, after a bunch of cleanup PRs I think this is done now. There are still CQ improvements that could be done but that's all I have at the moment.

@sandreenko sandreenko requested a review from a team October 31, 2019 20:08
@sandreenko

> Whew, after a bunch of cleanup PRs I think this is done now. There are still CQ improvements that could be done but that's all I have at the moment.

Could you please update the header with what is left in this PR?


mikedn commented Nov 2, 2019

> Could you please update the header with what is left in this PR?

Right, it was still mentioning stuff done in other PRs or stuff that wasn't done at the time of its writing but that now is.

@sandreenko

/azp run coreclr-outerloop

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@sandreenko sandreenko left a comment

LGTM with 1 question.

}
#endif

if (!IsSafeToContainMem(blkNode, addr))

From the description of IsSafeToContainMem it is not obvious that it allows non-immediate children, but the code makes it clear that it does, so calls like ContainBlockStoreAddress(blkNode, size, src->AsIndir()->Addr()); are correct.

mikedn (Author)

Yeah, the parent/child naming doesn't make a lot of sense for IsSafeToContainMem. This more or less checks if it's safe to move a "childNode" forward to the "parentNode" point. The parent/child appeared because it was used for containment.
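
The "move forward" reading described above can be illustrated with a heavily simplified model. `Node` and `IsSafeToMoveForward` are hypothetical, and the only interference considered here is "may write memory"; the real IsSafeToContainMem is more detailed than this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical node: records only whether executing it may write memory.
struct Node
{
    bool mayWriteMemory;
};

// Returns true if a memory read at `childIndex` can be deferred to the point
// of `parentIndex`, i.e. no node strictly between the two in execution order
// may write memory. The child need not be a direct operand of the parent.
bool IsSafeToMoveForward(const std::vector<Node>& order, std::size_t childIndex, std::size_t parentIndex)
{
    for (std::size_t i = childIndex + 1; i < parentIndex; i++)
    {
        if (order[i].mayWriteMemory)
        {
            return false;
        }
    }
    return true;
}
```

Because the check depends only on what executes between the two points, it works equally well for an address that sits several levels below the block node, which is the situation raised in the question.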

return;
}

if (!addr->OperIs(GT_ADD) || addr->gtOverflow() || !addr->AsOp()->gtGetOp2()->OperIs(GT_CNS_INT))

Why can't TryCreateAddrMode be used here, as it is on XARCH?
I see that it does different checks, but it is unclear whether this case is unique and the other calls to TryCreateAddrMode on ARM are correct, or whether all other calls to TryCreateAddrMode need to be changed to check `if ((size >= 2 * REGSIZE_BYTES) && (offset % REGSIZE_BYTES != 0))`, for example.

mikedn (Author)

The address mode situation on ARM is pretty messy. TryCreateAddrMode (well, genCreateAddrMode really) was written for x86 and, despite having tons of ifdefs in it (and I've rarely seen such an abuse of ifdefs), still generates address modes that aren't really valid on ARM and relies on the emitter to sort things out. For example:

if (addr->isContained())

And the emitter sorts things out only in certain cases: only if you use emitInsLoadStoreOp, which works only with GT_IND nodes, does not support LDP/STP, and requires temporary registers. So it's not suitable for block init/copy purposes.

So I preferred to avoid TryCreateAddrMode. It may be possible to use it to create an address mode node and then do extra checks before containing it. But there doesn't seem to be any point in doing that - it's a lot of extra code that needs to run only to produce a LEA that may not be useful and the original ADD should have been left alone.
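
The two conditions quoted in this thread (the GT_ADD / GT_CNS_INT test from the diff and the alignment check from the question) can be combined into a standalone predicate. This is a sketch under assumptions: `Addr` and `CanContainOffset` are hypothetical, REGSIZE_BYTES is taken to be 8 as on ARM64, and any limit on the offset range is deliberately not modeled.

```cpp
#include <cassert>
#include <cstddef>

// Assumed register size in bytes (ARM64 general purpose registers).
const std::size_t REGSIZE_BYTES = 8;

// Hypothetical flattened view of an address expression.
struct Addr
{
    bool isAdd;         // stands in for addr->OperIs(GT_ADD)
    bool overflowCheck; // stands in for addr->gtOverflow()
    bool op2IsConstant; // stands in for op2->OperIs(GT_CNS_INT)
    long offset;        // value of the constant operand, if any
};

// Returns true if the constant offset may be folded into the block's
// load/store instructions under the two conditions quoted in the thread.
bool CanContainOffset(const Addr& addr, std::size_t size)
{
    // Only a non-overflow-checked "base + constant" is a candidate.
    if (!addr.isAdd || addr.overflowCheck || !addr.op2IsConstant)
    {
        return false;
    }
    // Blocks large enough for paired loads/stores need a register-size-aligned
    // offset (assumed to be because LDP/STP use a scaled immediate).
    if ((size >= 2 * REGSIZE_BYTES) && (addr.offset % static_cast<long>(REGSIZE_BYTES) != 0))
    {
        return false;
    }
    return true;
}
```

When the predicate fails, the original ADD is simply left alone and the address is computed in a register, which is the fallback the reply above argues for instead of creating a LEA that may not be usable.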

@sandreenko sandreenko merged commit d572975 into dotnet:master Nov 5, 2019