Adding basic support for recognizing and handling SIMD intrinsics as HW intrinsics #35421

Merged on May 5, 2020 (40 commits)


tannergooding
Member

This makes progress towards #956 and is based on the prototype discussed and implemented here: #9766 (comment)

Rather than reimplementing the SIMD intrinsics in managed code or duplicating a lot of the HWIntrinsic support for containment and the VEX encoding on the SIMD intrinsics, this merely recognizes the SIMD intrinsics during importation via an alternative path and replaces them with equivalent HWIntrinsic nodes.
This lets the SIMD intrinsics pick up, for free, features that have already been added for HWIntrinsics, such as being VEX aware, supporting containment, and other minor optimizations.

This does not cover all of the SIMD intrinsics yet, but does lay a foundational framework for the remaining intrinsics to be ported as well. It does cover x86, x64, and ARM64.
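
To illustrate the idea of recognizing a SIMD method at import time and substituting a hardware intrinsic, here is a minimal C++ sketch. Everything in it is an assumption made for illustration: the enum values, the `SimdAsHWEntry` struct, and `lookupHWIntrinsic` are hypothetical names, and the table contents do not reflect the actual tables added by this PR.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical identifiers for illustration only; not the JIT's real tables.
enum HWIntrinsicId { HW_Illegal, HW_SSE_Add, HW_SSE2_Add, HW_SSE_Multiply };
enum BaseType { BT_Float, BT_Int, BT_Count };

struct SimdAsHWEntry
{
    const char*   methodName;            // managed method being recognized
    HWIntrinsicId perBaseType[BT_Count]; // HW_Illegal means "not ported yet"
};

// One row per recognized method, one column per element type.
static const SimdAsHWEntry simdAsHWTable[] = {
    {"op_Addition", {HW_SSE_Add, HW_SSE2_Add}},
    {"op_Multiply", {HW_SSE_Multiply, HW_Illegal}},
};

// Returns the hardware intrinsic to import instead of the SIMD intrinsic,
// or HW_Illegal when the caller should keep using the older SIMD path.
HWIntrinsicId lookupHWIntrinsic(const char* methodName, BaseType baseType)
{
    assert(baseType < BT_Count);
    for (const SimdAsHWEntry& entry : simdAsHWTable)
    {
        if (std::strcmp(entry.methodName, methodName) == 0)
        {
            return entry.perBaseType[baseType];
        }
    }
    return HW_Illegal;
}
```

In this sketch a method that is missing from the table, or has no entry for a given element type, simply reports `HW_Illegal`, which models "not ported yet" rather than an error.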

@Dotnet-GitSync-Bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) on Apr 24, 2020
@tannergooding
Member Author

CC. @CarolEidt, @echesakovMSFT

Will post a jit diff shortly.

  // Returns the codegen type for a given SIMD size.
- var_types getSIMDTypeForSize(unsigned size)
+ static var_types getSIMDTypeForSize(unsigned size)
Member Author

We could also change GenTreeHWIntrinsic to track the SimdType rather than the SimdSize, but it is a more involved change.


if ((ni > NI_SIMD_AS_HWINTRINSIC_START) && (ni < NI_SIMD_AS_HWINTRINSIC_END))
{
    return impSimdAsHWIntrinsic(ni, clsHnd, method, sig, mustExpand);
Member Author

If the intrinsic isn't currently handled or ends up returning nullptr, we will currently still hit the existing impSIMDIntrinsic path and produce a GT_SIMD node later.
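
The fallback behaviour described in this comment can be sketched as follows. This is a simplified model under stated assumptions: the enum values and the three functions are hypothetical, and the two-stage fallback (which in the JIT happens later, on a separate path) is collapsed into one function for readability.

```cpp
#include <cassert>

// Hypothetical IDs; the START/END markers bracket the range, as in the snippet above.
enum IntrinsicId
{
    ID_SIMD_AS_HWINTRINSIC_START,
    ID_Vector4_Add, // inside the range and handled
    ID_Vector4_Dot, // inside the range but not ported yet
    ID_SIMD_AS_HWINTRINSIC_END,
    ID_Other // outside the range
};

enum NodeKind { NODE_NONE, NODE_HWINTRINSIC, NODE_SIMD };

// Models the new path: yields NODE_NONE (a stand-in for nullptr) when unhandled.
NodeKind importAsHWIntrinsic(IntrinsicId id)
{
    assert((id > ID_SIMD_AS_HWINTRINSIC_START) && (id < ID_SIMD_AS_HWINTRINSIC_END));
    return (id == ID_Vector4_Add) ? NODE_HWINTRINSIC : NODE_NONE;
}

// Models the pre-existing path that always produces a SIMD node.
NodeKind importAsSimd(IntrinsicId id)
{
    (void)id;
    return NODE_SIMD;
}

NodeKind importIntrinsic(IntrinsicId id)
{
    if ((id > ID_SIMD_AS_HWINTRINSIC_START) && (id < ID_SIMD_AS_HWINTRINSIC_END))
    {
        NodeKind node = importAsHWIntrinsic(id);
        if (node != NODE_NONE)
        {
            return node;
        }
    }
    // Unhandled or out-of-range intrinsics still take the older path.
    return importAsSimd(id);
}
```

Because the older path stays in place, porting can proceed one intrinsic at a time without any method losing its intrinsic expansion.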


  GenTree* op1 = node->gtGetOp1();
  GenTree* op2 = node->gtGetOp2();
  GenTree* op3 = nullptr;

- if (!HWIntrinsicInfo::SupportsContainment(intrinsicId))
+ if (!HWIntrinsicInfo::SupportsContainment(intrinsicId) || (simdSize == 8) || (simdSize == 12))
Member Author

We should be able to support containment for simdSize == 12 when the operand is a local, or in one of the other cases where we allocate 16 bytes of storage for it.

Contributor

I believe that morph will retype TYP_SIMD12 locals as TYP_SIMD16 where possible, so I think it's probably reasonable to assume it's not safe here (or fix the cases where we don't widen it if we should).

Member Author

Just noting, it is retyped but the simdSize isn't changed, and even if node is retyped, it doesn't mean op1 or op2 are retyped.
So it will require a few changes to get correct, but shouldn't be too difficult overall.
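
One way to picture the trade-off discussed in this thread is the sketch below. It assumes, for illustration only, that a contained memory operand is read at full vector width (16 bytes for anything up to 16 bytes), so a 12-byte operand is only safe to contain when 16 bytes of storage are actually reserved. The `Operand` struct and both functions are hypothetical and do not correspond to JIT APIs.

```cpp
#include <cassert>

// Hypothetical operand description, not a JIT data structure.
struct Operand
{
    unsigned simdSize;    // logical size in bytes: 8, 12, 16 or 32
    unsigned storageSize; // bytes actually reserved for the value
    bool     isMemory;    // true if the value lives in memory
};

// Conservative rule, mirroring the shape of the check in the diff:
// never contain 8-byte or 12-byte operands.
bool canContainConservative(const Operand& op)
{
    return op.isMemory && (op.simdSize != 8) && (op.simdSize != 12);
}

// Possible refinement: allow containment when a full-width read
// stays within the storage reserved for the operand.
bool canContainIfWidened(const Operand& op)
{
    unsigned readSize = (op.simdSize <= 16) ? 16 : 32;
    return op.isMemory && (op.storageSize >= readSize);
}
```

A 12-byte local that was given 16 bytes of storage passes the refined check, while a 12-byte field packed inside a larger struct does not. The point raised above still applies: the size has to be checked per operand, since retyping the node does not retype `op1` or `op2`.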

@tannergooding
Member Author

tannergooding commented Apr 25, 2020

x64 Windows AVX2 Diff:

Found 271 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -741 (-0.00% of base)
    diff is an improvement.

Top file regressions (bytes):
          21 : diff\System.Text.Encodings.Web.dasm (0.06% of base)

Top file improvements (bytes):
        -754 : diff\System.Private.CoreLib.dasm (-0.02% of base)
          -4 : diff\System.Net.WebSockets.dasm (-0.01% of base)
          -4 : diff\System.Net.WebSockets.WebSocketProtocol.dasm (-0.01% of base)

4 total files with Code Size differences (3 improved, 1 regressed), 262 unchanged.

Top method regressions (bytes):
          14 ( 4.96% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           7 ( 7.22% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
           4 ( 0.33% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
           4 ( 8.00% of base) : diff\System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           4 ( 8.70% of base) : diff\System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float

Top method improvements (bytes):
         -40 (-4.87% of base) : diff\System.Private.CoreLib.dasm - Vector:AndNot(Vector`1,Vector`1):Vector`1 (6 methods)
         -38 (-15.02% of base) : diff\System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -34 (-12.36% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -32 (-11.64% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-9.76% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_UnaryNegation(Vector`1):Vector`1 (6 methods)
         -20 (-4.07% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-3.18% of base) : diff\System.Private.CoreLib.dasm - Vector`1:ConditionalSelect(Vector`1,Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-2.88% of base) : diff\System.Private.CoreLib.dasm - Vector:OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-7.58% of base) : diff\System.Private.CoreLib.dasm - Vector:EqualsAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)
         -20 (-3.15% of base) : diff\System.Private.CoreLib.dasm - Vector:LessThanOrEqualAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)
         -20 (-3.15% of base) : diff\System.Private.CoreLib.dasm - Vector:GreaterThanOrEqualAny(Vector`1,Vector`1):bool (6 methods)

Top method regressions (percentages):
           4 ( 8.70% of base) : diff\System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float
           4 ( 8.00% of base) : diff\System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           7 ( 7.22% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
          14 ( 4.96% of base) : diff\System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           4 ( 0.33% of base) : diff\System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4

Top method improvements (percentages):
          -8 (-24.24% of base) : diff\System.Private.CoreLib.dasm - Vector4:Clamp(Vector4,Vector4,Vector4):Vector4
          -4 (-17.39% of base) : diff\System.Private.CoreLib.dasm - Vector4:op_UnaryNegation(Vector4):Vector4
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : diff\System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Add(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Subtract(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Multiply(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:Divide(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : diff\System.Private.CoreLib.dasm - Vector4:op_Multiply(Vector4,float):Vector4
          -4 (-16.00% of base) : diff\System.Private.CoreLib.dasm - Vector4:op_Multiply(float,Vector4):Vector4
          -2 (-15.38% of base) : diff\System.Private.CoreLib.dasm - Sse42:Crc32(int,ubyte):int
          -5 (-15.15% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(int,Vector`1):Vector`1
         -38 (-15.02% of base) : diff\System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
          -4 (-14.81% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(Vector`1,double):Vector`1
          -2 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Sse42:Crc32(int,ushort):int
          -4 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector`1:op_Multiply(double,Vector`1):Vector`1
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : diff\System.Private.CoreLib.dasm - Vector:Add(Vector`1,Vector`1):Vector`1 (6 methods)

60 total methods with Code Size differences (55 improved, 5 regressed), 244683 unchanged.

1 files had text diffs but no metric diffs.
diff\System.Text.Json.dasm had 16 diffs

ARM64 AdvSIMD diff:

Found 274 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: 0 (0.00% of base)

0 total files with Code Size differences (0 improved, 0 regressed), 266 unchanged.

0 total methods with Code Size differences (0 improved, 0 regressed), 244744 unchanged.

8 files had text diffs but no metric diffs.
System.Private.CoreLib.dasm had 366 diffs
System.Runtime.Numerics.dasm had 48 diffs
xunit.console.dasm had 6 diffs
Microsoft.CSharp.dasm had 4 diffs
System.ComponentModel.Annotations.dasm had 4 diffs
System.Data.Common.dasm had 4 diffs
System.Data.OleDb.dasm had 4 diffs
System.Security.Cryptography.Primitives.dasm had 2 diffs

Contributor

@CarolEidt CarolEidt left a comment

A couple of initial comments. I've only skimmed through the simdashwintrinsic* files. I'd like to see the impact on compile time and table sizes at some point.


@tannergooding
Member Author

> I'd like to see the impact on compile time and table sizes at some point.

What is the best way to collect this information?

@CarolEidt
Contributor

@tannergooding - For the tables, you can just look at file sizes. For JIT time, the best way is to use SuperPMI to compile a bunch of methods. SuperPMI is described here: https://github.com/dotnet/runtime/blob/master/src/coreclr/scripts/superpmi.md and I usually measure using pin. If it's a hassle I can probably measure for you.

@tannergooding
Member Author

tannergooding commented Apr 25, 2020

I should be able to collect the SuperPMI diffs, although I don't see anything related to pin listed in the doc (or other superpmi reference).
I tried running python .\src\coreclr\scripts\superpmi.py asmdiffs D:\tagoo\Repos\runtime_base\artifacts\tests\coreclr\Windows_NT.x64.Checked\Tests\Core_Root\clrjit.dll for the time being and it immediately exits with an assert in superpmi, same as when doing just replay against runtime_base.

As for file sizes:

File              Before            After             Diff
clrjit.dll        1,245,696 bytes   1,252,352 bytes   +6,656 bytes
linuxnonjit.dll   1,047,040 bytes   1,057,280 bytes   +10,240 bytes
protononjit.dll     957,440 bytes     962,560 bytes   +5,120 bytes

We will naturally be able to gain some or all of this back as more intrinsics get implemented and we can start removing the GT_SIMD path.

I've listed the jit-pmi-diff for x64 and ARM64 here: #35421 (comment)

@tannergooding
Member Author

> and it immediately exits with an assert in superpmi, same as when doing just replay against runtime_base.

Looks like it's because it expects something to be 226 bytes but it is actually 228 bytes. I'm guessing the JIT/EE version possibly changed, which I believe means I'll need to do a new collection.

Removing the [Intrinsic] attribute from some Vector2/3/4 methods which aren't intrinsic

There were a couple of operator * methods marked as intrinsic when they actually weren't. This isn't a problem for GT_SIMD since it tracks the expected argument kinds and checks them against the method signature.
I've resolved the issue and updated the jit-pmi-diff entries above.

@tannergooding
Member Author

Need to fix the OpOrEqual comparisons; likewise, division for 8- and 12-byte vectors needs to keep zeroing the upper bits after the operation has completed.
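
The division issue can be shown with a small model. This sketch assumes IEEE 754 float arithmetic and a Vector3 held in a 16-byte register whose unused fourth lane is zero; `Reg128`, `divideAllLanes`, and `divideVector3` are hypothetical names, and the code only demonstrates why the upper lane needs re-zeroing, not how the JIT actually does it.

```cpp
#include <cassert>
#include <cmath>

// Models a 16-byte register; a Vector3 occupies lanes 0..2 and lane 3 is zero.
struct Reg128
{
    float lane[4];
};

// A full-width divide touches every lane, including the unused one.
Reg128 divideAllLanes(const Reg128& a, const Reg128& b)
{
    Reg128 result;
    for (int i = 0; i < 4; i++)
    {
        result.lane[i] = a.lane[i] / b.lane[i];
    }
    return result;
}

// Divide, then restore the invariant that the unused lane is zero.
Reg128 divideVector3(const Reg128& a, const Reg128& b)
{
    Reg128 result  = divideAllLanes(a, b);
    result.lane[3] = 0.0f;
    return result;
}
```

With both upper lanes at zero, the full-width divide computes 0/0 in lane 3, which is NaN under IEEE 754. Addition, subtraction, and multiplication of zeros leave the lane at zero, which is why division is the operation singled out here.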

@tannergooding
Member Author

I believe I've resolved the couple of minor things I found and am collecting new diffs for the Abs methods and additional scenarios covered by the Vector static class being properly handled.

Contributor

@CarolEidt CarolEidt left a comment

I believe you've addressed all my questions and concerns. And I've looked over the recent changes.

@tannergooding
Member Author

Thanks @CarolEidt, I'm just working on ensuring the diffs are correct again now as there were a couple surprises with the Vector static class and its APIs differing from the equivalent APIs on Vector<T> 😄

@tannergooding force-pushed the simd-as-hwintrinsic branch 2 times, most recently from 7127164 to 73f7315, on May 2, 2020 03:54
@tannergooding
Member Author

Diff is back to expected and shows a few additional improvements.

Found 271 files with textual diffs.

Summary of Code Size diffs:
(Lower is better)

Total bytes of diff: -847 (-0.00% of base)
    diff is an improvement.

Top file regressions (bytes):
          21 : System.Text.Encodings.Web.dasm (0.06% of base)

Top file improvements (bytes):
        -852 : System.Private.CoreLib.dasm (-0.02% of base)
          -8 : System.Text.Json.dasm (-0.00% of base)
          -4 : System.Net.WebSockets.dasm (-0.01% of base)
          -4 : System.Net.WebSockets.WebSocketProtocol.dasm (-0.01% of base)

5 total files with Code Size differences (4 improved, 1 regressed), 261 unchanged.

Top method regressions (bytes):
          14 ( 4.96% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           7 ( 7.22% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
           4 ( 0.33% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4
           4 ( 8.00% of base) : System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           4 ( 8.70% of base) : System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float

Top method improvements (bytes):
         -40 (-6.37% of base) : System.Private.CoreLib.dasm - Vector`1:ConditionalSelect(Vector`1,Vector`1,Vector`1):Vector`1 (6 methods)
         -40 (-4.87% of base) : System.Private.CoreLib.dasm - Vector:AndNot(Vector`1,Vector`1):Vector`1 (6 methods)
         -38 (-15.02% of base) : System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : System.Private.CoreLib.dasm - Vector:LessThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -36 (-5.45% of base) : System.Private.CoreLib.dasm - Vector:GreaterThanOrEqual(Vector`1,Vector`1):Vector`1 (10 methods)
         -34 (-12.36% of base) : System.Private.CoreLib.dasm - Vector:GreaterThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -32 (-11.64% of base) : System.Private.CoreLib.dasm - Vector:LessThan(Vector`1,Vector`1):Vector`1 (10 methods)
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
         -22 (-14.97% of base) : System.Private.CoreLib.dasm - Vector:Abs(Vector`1):Vector`1 (6 methods)
         -20 (-9.76% of base) : System.Private.CoreLib.dasm - Vector`1:op_UnaryNegation(Vector`1):Vector`1 (6 methods)
         -20 (-4.07% of base) : System.Private.CoreLib.dasm - Vector`1:op_OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-2.88% of base) : System.Private.CoreLib.dasm - Vector:OnesComplement(Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-7.58% of base) : System.Private.CoreLib.dasm - Vector:EqualsAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : System.Private.CoreLib.dasm - Vector:LessThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)
         -20 (-3.15% of base) : System.Private.CoreLib.dasm - Vector:LessThanOrEqualAny(Vector`1,Vector`1):bool (6 methods)
         -20 (-2.54% of base) : System.Private.CoreLib.dasm - Vector:GreaterThanOrEqualAll(Vector`1,Vector`1):bool (6 methods)

Top method regressions (percentages):
           4 ( 8.70% of base) : System.Private.CoreLib.dasm - Vector3:DistanceSquared(Vector3,Vector3):float
           4 ( 8.00% of base) : System.Private.CoreLib.dasm - Vector3:Distance(Vector3,Vector3):float
           7 ( 7.22% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateAsciiMask(Vector128`1):Vector128`1
          14 ( 4.96% of base) : System.Text.Encodings.Web.dasm - Sse2Helper:CreateEscapingMask_UnsafeRelaxedJavaScriptEncoder(Vector128`1):Vector128`1 (2 methods)
           4 ( 0.33% of base) : System.Private.CoreLib.dasm - Matrix4x4:CreateConstrainedBillboard(Vector3,Vector3,Vector3,Vector3,Vector3):Matrix4x4

Top method improvements (percentages):
          -8 (-24.24% of base) : System.Private.CoreLib.dasm - Vector4:Clamp(Vector4,Vector4,Vector4):Vector4
          -4 (-17.39% of base) : System.Private.CoreLib.dasm - Vector4:op_UnaryNegation(Vector4):Vector4
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Min(Vector`1,Vector`1):Vector`1 (6 methods)
         -27 (-17.09% of base) : System.Private.CoreLib.dasm - Vector:Max(Vector`1,Vector`1):Vector`1 (6 methods)
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Add(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Subtract(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Multiply(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:Divide(Vector4,Vector4):Vector4
          -4 (-16.67% of base) : System.Private.CoreLib.dasm - Vector4:op_Multiply(Vector4,float):Vector4
          -4 (-16.00% of base) : System.Private.CoreLib.dasm - Vector4:op_Multiply(float,Vector4):Vector4
          -2 (-15.38% of base) : System.Private.CoreLib.dasm - Sse42:Crc32(int,ubyte):int
          -5 (-15.15% of base) : System.Private.CoreLib.dasm - Vector`1:op_Multiply(int,Vector`1):Vector`1
         -38 (-15.02% of base) : System.Private.CoreLib.dasm - Vector:Equals(Vector`1,Vector`1):Vector`1 (10 methods)
         -22 (-14.97% of base) : System.Private.CoreLib.dasm - Vector:Abs(Vector`1):Vector`1 (6 methods)
          -4 (-14.81% of base) : System.Private.CoreLib.dasm - Vector`1:op_Multiply(Vector`1,double):Vector`1
          -2 (-14.29% of base) : System.Private.CoreLib.dasm - Sse42:Crc32(int,ushort):int
          -4 (-14.29% of base) : System.Private.CoreLib.dasm - Vector`1:op_Multiply(double,Vector`1):Vector`1
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseAnd(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:BitwiseOr(Vector`1,Vector`1):Vector`1 (6 methods)
         -20 (-14.29% of base) : System.Private.CoreLib.dasm - Vector:Xor(Vector`1,Vector`1):Vector`1 (6 methods)

@tannergooding
Member Author

Will wait for @echesakovMSFT to review before merging

@echesakov
Contributor

> Will wait for @echesakovMSFT to review before merging

I am taking a look now, sorry for the wait

Contributor

@echesakov echesakov left a comment

Overall, looks good - I left a couple of questions/suggestions.
Do we need to update simdashwintrinsiclistarm64.h as further progress on Arm64 intrinsics is made?

@@ -207,7 +207,7 @@ void CodeGen::genHWIntrinsic(GenTreeHWIntrinsic* node)
  }
  else
  {
-     emitSize = EA_SIZE(node->gtSIMDSize);
+     emitSize = emitActualTypeSize(Compiler::getSIMDTypeForSize(node->gtSIMDSize));
Contributor

Why is this needed? To support Vector3 on Arm64?

Member Author

Yes, it's required to support Vector3, as it has size = 12 but actualSize = 16.
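
The distinction between the logical size and the actual size can be sketched like this. The enum and both functions are simplified stand-ins chosen for illustration (they are not the JIT's `var_types`, `getSIMDTypeForSize`, or `emitActualTypeSize`); the only fact taken from the discussion is that a 12-byte Vector3 has an actual size of 16 bytes.

```cpp
#include <cassert>

// Simplified stand-in for the JIT's SIMD types.
enum SimdType { SIMD8, SIMD12, SIMD16, SIMD32 };

// Maps a logical SIMD size in bytes to its type.
SimdType simdTypeForSize(unsigned size)
{
    switch (size)
    {
        case 8:  return SIMD8;
        case 12: return SIMD12;
        case 16: return SIMD16;
        case 32: return SIMD32;
        default:
            assert(false && "unexpected SIMD size");
            return SIMD16;
    }
}

// Maps a type to the number of bytes an instruction actually operates on.
// A 12-byte Vector3 is processed as a full 16-byte vector.
unsigned actualTypeSize(SimdType type)
{
    switch (type)
    {
        case SIMD8:  return 8;
        case SIMD12: return 16;
        case SIMD16: return 16;
        case SIMD32: return 32;
    }
    return 0;
}
```

Going through the type rather than using the raw size means a 12-byte node ends up with a 16-byte emit size, which is the case the changed line has to handle.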

}

assert(id != NI_AVX_CompareGreaterThan);
return static_cast<int>(FloatComparisonMode::OrderedLessThanSignaling);
Contributor

This could be confusing to someone who doesn't know that we expect later in the JIT to swap the intrinsic arguments.

Contributor

IMO, it would be clearer to leave these as special import intrinsics and move all the plumbing related to opportunisticallyDependsOnAVX to one place.

Member Author

I can add a comment, but the entire point of moving it to lowering is so the rest of the JIT doesn't need to care that AVX supports proper GreaterThan while Pre-AVX emulates it.

As we continue adding more JIT optimizations around HWIntrinsics, the distinction doesn't matter to anything except for codegen and so handling the fixup in lowering makes this trivial for everything else.
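
As a rough model of the fixup being discussed, the sketch below rewrites a greater-than comparison into a less-than comparison with swapped operands when the target cannot express greater-than directly. The `CompareNode` type, the `hasAvx` flag, and both functions are hypothetical and scalar; they only illustrate the identity `a > b == b < a` that such a lowering step relies on.

```cpp
#include <cassert>
#include <utility>

enum CompareOp { GreaterThan, LessThan };

// Hypothetical scalar stand-in for a comparison node with two operands.
struct CompareNode
{
    CompareOp op;
    float     op1;
    float     op2;
};

// Lowering-time fixup: earlier phases only ever see GreaterThan.
void lowerCompare(CompareNode& node, bool hasAvx)
{
    if ((node.op == GreaterThan) && !hasAvx)
    {
        // Without a direct greater-than, emit less-than with swapped operands.
        node.op = LessThan;
        std::swap(node.op1, node.op2);
    }
}

// Evaluates the node, so the rewrite can be checked for equivalence.
bool evaluate(const CompareNode& node)
{
    return (node.op == GreaterThan) ? (node.op1 > node.op2) : (node.op1 < node.op2);
}
```

Keeping the rewrite in one late phase means any optimization that runs earlier can treat `GreaterThan` uniformly, which is the argument made in the comment above.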

Member Author

Will submit a follow-up PR that adds the comment.

@@ -4143,7 +4143,7 @@ GenTree* Compiler::impIntrinsic(GenTree* newobjThis,
  case NI_System_MathF_FusedMultiplyAdd:
  {
  #ifdef TARGET_XARCH
-     if (compExactlyDependsOn(InstructionSet_FMA))
+     if (compExactlyDependsOn(InstructionSet_FMA) && supportSIMDTypes())
Contributor

Not related to this change, but I wonder if this optimization could be implemented without requiring SIMD type support.
