Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow folding of aligned loads when using the VEX encoding and optimizations are enabled #376

Merged
merged 1 commit into from
Jan 2, 2020

Conversation

tannergooding
Copy link
Member

This resolves https://github.com/dotnet/corefx/issues/33566 by updating the JIT to support folding of aligned loads when using the VEX encoding and when optimizations are enabled.

@tannergooding
Copy link
Member Author

Working on grabbing some diffs.

@sandreenko
Copy link
Contributor

PTAL @dotnet/jit-contrib

I think it should be labeled as area-Codegen.

@BruceForstall BruceForstall added area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI and removed area-System.Runtime.Intrinsics labels Dec 18, 2019
Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. I find the conditions a bit confusing, but I can't offhand think of a better way to describe these except maybe to change canContainXXLoads instead of supportsXXLoads, but I'm not sure that's a great deal clearer, and this was already the case before this change.

@tannergooding
Copy link
Member Author

No changes, just rebased onto current head.

No diffs for corelib, framework, or benchmarks. There are a number of diffs for the tests:

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -2032 (-0.00% of base)
    diff is an improvement.

Top file improvements (bytes):
        -556 : JIT\HardwareIntrinsics\X86\Sse2\Sse2_ro\Sse2_ro.dasm (-0.03% of base)
        -464 : JIT\HardwareIntrinsics\X86\Avx2\Avx2_ro\Avx2_ro.dasm (-0.02% of base)
        -316 : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm (-0.04% of base)
        -204 : JIT\HardwareIntrinsics\X86\Sse41\Sse41_ro\Sse41_ro.dasm (-0.03% of base)
        -128 : JIT\HardwareIntrinsics\X86\Ssse3\Ssse3_ro\Ssse3_ro.dasm (-0.05% of base)
         -88 : JIT\HardwareIntrinsics\X86\Sse\Sse_ro\Sse_ro.dasm (-0.02% of base)
         -48 : JIT\HardwareIntrinsics\X86\Fma_Vector128\Fma_ro\Fma_ro.dasm (-0.02% of base)
         -48 : JIT\HardwareIntrinsics\X86\Fma_Vector256\Fma_ro\Fma_ro.dasm (-0.03% of base)
         -40 : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm (-0.08% of base)
         -36 : JIT\HardwareIntrinsics\X86\Avx2_Vector128\Avx2_ro\Avx2_ro.dasm (-0.02% of base)
         -32 : JIT\HardwareIntrinsics\X86\Avx\ConvertToVector_ro\ConvertToVector_ro.dasm (-0.08% of base)
         -24 : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm (-0.06% of base)
         -24 : JIT\HardwareIntrinsics\X86\Sse3\Sse3_ro\Sse3_ro.dasm (-0.05% of base)
          -8 : JIT\HardwareIntrinsics\X86\Avx_Vector128\Avx_ro\Avx_ro.dasm (-0.02% of base)
          -4 : JIT\HardwareIntrinsics\X86\Sse41\MinHorizontal_ro\MinHorizontal_ro.dasm (-0.04% of base)
          -4 : JIT\HardwareIntrinsics\X86\Sse41\MultipleSumAbsoluteDifferences_ro\MultipleSumAbsoluteDifferences_ro.dasm (-0.04% of base)
          -4 : JIT\HardwareIntrinsics\X86\Sse42\Sse42_ro\Sse42_ro.dasm (-0.04% of base)
          -4 : JIT\Regression\JitBlue\GitHub_19550\GitHub_19550\GitHub_19550.dasm (-0.37% of base)

18 total files with Code Size differences (18 improved, 0 regressed), 3222 unchanged.

Top method improvements (bytes):
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__DecryptByte:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__DecryptLastByte:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__EncryptByte:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__EncryptLastByte:RunBasicScenario_LoadAligned():this
          -4 (-3.45% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesImmOpTest__KeygenAssistByte5:RunBasicScenario_LoadAligned():this
          -4 (-3.48% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesUnaryOpTest__InverseMixColumnsByte:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AddDouble:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AddSingle:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AddSubtractDouble:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AddSubtractSingle:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AndDouble:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AndSingle:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AndNotDouble:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__AndNotSingle:RunBasicScenario_LoadAligned():this
          -4 (-2.52% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleUnaryOpTest__CeilingDouble:RunBasicScenario_LoadAligned():this
          -4 (-2.52% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleUnaryOpTest__CeilingSingle:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__DivideDouble:RunBasicScenario_LoadAligned():this
          -4 (-1.85% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleBinaryOpTest__DivideSingle:RunBasicScenario_LoadAligned():this
          -4 (-2.55% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleUnaryOpTest__DuplicateEvenIndexedDouble:RunBasicScenario_LoadAligned():this
          -4 (-2.55% of base) : JIT\HardwareIntrinsics\X86\Avx\Avx_ro\Avx_ro.dasm - SimpleUnaryOpTest__DuplicateEvenIndexedSingle:RunBasicScenario_LoadAligned():this

Top method improvements (percentages):
          -4 (-3.54% of base) : JIT\HardwareIntrinsics\X86\Sse41\MinHorizontal_ro\MinHorizontal_ro.dasm - SimpleUnaryOpTest__MinHorizontal:RunBasicScenario_LoadAligned():this
          -4 (-3.48% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesUnaryOpTest__InverseMixColumnsByte:RunBasicScenario_LoadAligned():this
          -4 (-3.48% of base) : JIT\HardwareIntrinsics\X86\Avx\ConvertToVector_ro\ConvertToVector_ro.dasm - SimpleUnaryOpTest__ConvertToVector128Int32Double:RunBasicScenario_LoadAligned():this
          -4 (-3.48% of base) : JIT\HardwareIntrinsics\X86\Avx\ConvertToVector_ro\ConvertToVector_ro.dasm - SimpleUnaryOpTest__ConvertToVector128SingleDouble:RunBasicScenario_LoadAligned():this
          -4 (-3.48% of base) : JIT\HardwareIntrinsics\X86\Avx\ConvertToVector_ro\ConvertToVector_ro.dasm - SimpleUnaryOpTest__ConvertToVector128Int32WithTruncationDouble:RunBasicScenario_LoadAligned():this
          -4 (-3.45% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesImmOpTest__KeygenAssistByte5:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__DecryptByte:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__DecryptLastByte:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__EncryptByte:RunBasicScenario_LoadAligned():this
          -4 (-3.15% of base) : JIT\HardwareIntrinsics\X86\Aes\Aes_ro\Aes_ro.dasm - AesBinaryOpTest__EncryptLastByte:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyInt640:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyInt641:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyInt6416:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyInt6417:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyInt64129:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyUInt640:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyUInt641:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyUInt6416:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyUInt6417:RunBasicScenario_LoadAligned():this
          -4 (-3.12% of base) : JIT\HardwareIntrinsics\X86\Pclmulqdq\Pclmulqdq_ro\Pclmulqdq_ro.dasm - PclmulqdqOpTest__CarrylessMultiplyUInt64129:RunBasicScenario_LoadAligned():this

In all cases the diffs are things like:

-       vmovapd  xmm0, xmmword ptr [rax]
-       vaddpd   xmm6, xmm6, xmm0
+       vaddpd   xmm6, xmm6, xmmword ptr [rax]

-- NOTE: I had to make a few modifications to jitutils to work around: dotnet/jitutils#243

@tannergooding
Copy link
Member Author

Are the arm failures known? Only lowerxarch was changed, so this shouldn't impact any arm stuff and should be otherwise ready to merge.

@CarolEidt
Copy link
Contributor

Are the arm failures known?

Yes, the arm CI has been broken for some time now. #1097 is tracking this.

@CarolEidt CarolEidt merged commit bed5e44 into dotnet:master Jan 2, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 11, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants