-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Implement scalar Sse2 hardware intrinsics #16237
Conversation
src/jit/instrsxarch.h
Outdated
INST3( movd, "movd" , 0, IUM_WR, 0, 0, PCKDBL(0x7E), BAD_CODE, PCKDBL(0x6E)) | ||
INST3( movq, "movq" , 0, IUM_WR, 0, 0, PCKDBL(0xD6), BAD_CODE, SSEFLT(0x7E)) | ||
INST3( movd, "movd" , 0, IUM_WR, 0, 0, PCKDBL(0x6E), BAD_CODE, PCKDBL(0xFE)) // Move doubleword from r/m32 to xmm or move doubleword from xmm register to r/m32 | ||
INST3( movq, "movq" , 0, IUM_WR, 0, 0, PCKDBL(0xD6), BAD_CODE, SSEFLT(0x7E)) // Move quadword from r/m64 to xmm or move quadword from xmm register to r/m64 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Bad merge - will fix
Just a general nit. It is helpful to break these PRs into two logical commits (one containing the product code changes and one containing the test changes). The GitHub UI (and even various external code review tools) tend to not do so well with large commits, and making it easier to review the product code is generally desirable (IMO). |
src/jit/emitxarch.cpp
Outdated
@@ -267,9 +268,12 @@ bool emitter::Is4ByteSSE4Instruction(instruction ins) | |||
bool emitter::TakesVexPrefix(instruction ins) | |||
{ | |||
// special case vzeroupper as it requires 2-byte VEX prefix | |||
if (ins == INS_vzeroupper) | |||
switch (ins) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this seems to be in a weird state. The sfence and prefetch instructions are missing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, rebasing to latest head which includes commit containing memory instructions - will fix
src/jit/emitxarch.cpp
Outdated
{ | ||
return true; | ||
case INS_cvttsd2si: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cvttss2si is missing.
Additionally, given that you aren't adding anything new here, it would be generally desirable to move it into its own refactoring PR.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
break; | ||
} | ||
|
||
case NI_SSE2_CompareEqualOrderedScalar: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These should be merged with the SSE implementations.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agree, but since there quite a lot refactoring opportunities still coming I have created issue to track this work item #16014 and I plan to work on it once all SSE2 and other similar SSE intrinsics are in.
baseType = JITtype2varType(corType); | ||
|
||
#ifdef _TARGET_X86_ | ||
if (varTypeIsLong(JITtype2varType(corType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should already be handled by impHWIntrinsic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should already be handled by impHWIntrinsic
Currently, impHWIntrinsic
only handle r64
for return types. Maybe we can make the change latter.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a bug tracking this yet?
Do we also have any mechanism for determining when the intrinsic takes/returns a 64-bit value but also works on 32-bit machines (there is at least a couple of them)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we have a bug tracking this yet?
I do not think this is a bug. We can introduce more flags to hnit the framework that the intrinsic may have an r64
oprand on first/second/.. position.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let me make this change after this PR merged.
Do you prefer to break this PR or next ones? |
Breaking this one into two should be fairly trivial, since the split is only along a directory:
|
43bf732
to
dc4e7cd
Compare
src/jit/emitxarch.cpp
Outdated
======= | ||
// these are the only ISA extension instructions taking zero operands | ||
|| ins == INS_vzeroupper | ||
>>>>>>> Implement scalar Sse2 hardware intrinsics |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please solve this conflict.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks will fix - next bad merge
baseType = JITtype2varType(corType); | ||
|
||
#ifdef _TARGET_X86_ | ||
if (varTypeIsLong(JITtype2varType(corType))) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This case should already be handled by impHWIntrinsic
Currently, impHWIntrinsic
only handle r64
for return types. Maybe we can make the change latter.
src/jit/hwintrinsicxarch.cpp
Outdated
|
||
if (baseType == TYP_STRUCT) | ||
{ | ||
baseType = getBaseTypeOfSIMDType(argClass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe that the baseType is always double
on Sse2.ConvertScalarToVector128Double
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, that's the case however I try to write code which will be easy to refactor or abstract to handle more cases later see issue #16014. If it is acceptable I would prefer to defer to refactoring PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Never mind. CVTSS2SD xmm, xmm/m32
and CVTSI2SD xmm, reg/m64
need different encoding.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In this case I use argType as baseType to use it later during instruction emission
src/jit/hwintrinsicxarch.cpp
Outdated
CorInfoType corType = | ||
strip(info.compCompHnd->getArgType(sig, argList, &argClass)); // type of the second argument | ||
|
||
baseType = getBaseTypeOfSIMDType(argClass); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The baseType is float
(retType's base type) on ConvertScalarToVector128Single
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I need arg types to use correct encoding later (r64 versus r32) during instruction emitting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you talking about ConvertScalarToVector128Double
?
Sse2.ConvertScalarToVector128Single
just has one signature Vector128<float> ConvertScalarToVector128Single(Vector128<float> upper, Vector128<double> value)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Once I will work on refactoring I will have overloaded - where I need to use second argType as base type
ConvertScalarToVector128Single(Vector128<float> upper, int value)
ConvertScalarToVector128Single(Vector128<float> upper, long value)
If you prefer to simplify it for this PR and not defer changes to refactoring I will do it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. However, I do not think we should combine Sse.ConvertScalarToVector128Single
and Sse2.ConvertScalarToVector128Single
because they have different signatures and codgen consideration.
Furthermore, combining HW intrinsic code cross-ISA should rely on the current category/flag mechanism rather than "intrinsic names". Meanwhile, a little bit duplicated code is fine to me, it may be not worthwhile to complicate the framework just for a few special cases. We can implement these "special cases" individually, then add categories/flags to combine them if there are some obvious patterns.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK than I will simplify it and leave any changes during refactoring to the result of discussion of its scope.
assert(sig->numArgs == 1); | ||
op1 = impSIMDPopStack(TYP_SIMD16); | ||
retType = JITtype2varType(sig->retType); | ||
baseType = getBaseTypeOfSIMDType(info.compCompHnd->getArgClass(sig, sig->args)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We can directly use the return type as base type.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There are overloads where it is necessary to select different instruction based on argument type - I use baseType to handle this. AFAIR during debugging I have seen problems with baseType detection and that's why I have explicitly recovered it from sig->retType
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah, I see there are two special guys
/// <summary>
/// int _mm_cvtsd_si32 (__m128d a)
/// CVTSD2SI r32, xmm/m64
/// </summary>
public static int ConvertToInt32(Vector128<double> value) => ConvertToInt32(value);
/// <summary>
/// int _mm_cvtsi128_si32 (__m128i a)
/// MOVD reg/m32, xmm
/// </summary>
public static int ConvertToInt32(Vector128<int> value) => ConvertToInt32(value);
/// <summary>
/// __int64 _mm_cvtsd_si64 (__m128d a)
/// CVTSD2SI r64, xmm/m64
/// </summary>
public static long ConvertToInt64(Vector128<double> value) => ConvertToInt64(value);
/// <summary>
/// __int64 _mm_cvtsi128_si64 (__m128i a)
/// MOVQ reg/m64, xmm
/// </summary>
public static long ConvertToInt64(Vector128<long> value) => ConvertToInt64(value);
Perhasps, we can separate these two groups
case NI_SSE2_ConvertToInt32:
case NI_SSE2_ConvertToInt64:
// get baseType from arg
case NI_SSE2_ConvertToUInt32:
case NI_SSE2_ConvertToUInt64:
// use the return type as base type
because calling getBaseTypeOfSIMDType
and info.compCompHnd->getArgClass
are relatively expensive
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, you are right. Will fix
1b76e1a
to
b7d95c7
Compare
case NI_SSE2_ConvertScalarToVector128Int64: | ||
case NI_SSE2_ConvertScalarToVector128UInt64: | ||
{ | ||
assert(sig->numArgs == 1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Due to above I have had to handle above conversion intrinsics as HW_Category_Special
a7c3f15
to
14fab5b
Compare
@tannergooding There are strange test results. Could you help verify if I should report them as issues to resolve or dig in and look for the reasons myself? All OSX and Ubuntu stress jobs failed due to the following failure:
While
|
Most of these appear to be @jkotas, are you aware of any changes that went in recently that could impact this? |
No idea what can be causing this. |
8a1c97f
to
0f0e6a4
Compare
@dotnet-bot test Alpine.3.6 x64 Debug Build please |
@dotnet-bot test OSX10.12 x64 Checked Innerloop Build and Test please |
emit->emitIns_SIMD_R_R_R_I(ins, emitTypeSize(TYP_SIMD16), targetReg, op1Reg, op2Reg, ival); | ||
|
||
break; | ||
} | ||
|
||
case NI_SSE2_CompareEqualOrderedScalar: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would be good to have an explicit bug that tracks merging this with the SSE implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The issue #16330 tracks this work item. Could you assign it to me within hardware intrinsics project?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
assert(op2 != nullptr); | ||
op2Reg = op2->gtRegNum; | ||
instruction ins = Compiler::insOfHWIntrinsic(intrinsicID, baseType); | ||
emit->emitIns_SIMD_R_R_R(ins, emitTypeSize(TYP_SIMD16), targetReg, op1Reg, op2Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't this one go through genHWIntrinsic_R_R_RM
?
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
assert(op2 != nullptr); | ||
op2Reg = op2->gtRegNum; | ||
instruction ins = Compiler::insOfHWIntrinsic(intrinsicID, baseType); | ||
emit->emitIns_SIMD_R_R_R(ins, emitTypeSize(TYP_SIMD16), targetReg, op1Reg, op2Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same here, genHWIntrinsic_R_R_RM
?
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
} | ||
else | ||
{ | ||
emit->emitIns_R_R(INS_mov_xmm2i, emitActualTypeSize(baseType), op1Reg, targetReg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why isn't this one using insOfHWIntrinsic
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's due to #16322 which I plan to fix in #16329
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure I follow. We are using INS_mov_xmm2i
in both the TYP_LONG
and TYP_ULONG
case, so it shouldn't matter which base type is used in the lookup...
That is, if only TYP_LONG
is being set, we can fill that column of the ConvertToUInt64
table entry with a TODO-XArch-Bug
note that it is a workaround.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if only
TYP_LONG
is being set, we can fill that column of theConvertToUInt64
table entry with aTODO-XArch-Bug
note that it is a workaround
It should be done for TYP_UINT
as well. In principle proposed change is equivalent workaround to that used in current code as we always select INS_mov_xmm2i
. The workaround currently requires one change but in case we would do it in table we should change instruction selection for multiple intrinsics having unsigned integral base types.
If you agree that current solution is simpler I will add TODO-XArch-Bug
comment here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks like this problem is from baseType = JITtype2varType(sig->retType);
. We should have a new sign-awared function (e.g., JITtype2BaseVarType
, or other name) that only used for HW intrinsics.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should have a new sign-awared function (e.g., JITtype2BaseVarType, or other name) that only used for HW intrinsics.
Yes, it is tracked by #16329
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tannergooding Could you reply to my comment #16237 (comment)
src/jit/lsraxarch.cpp
Outdated
@@ -2283,9 +2283,13 @@ void LinearScan::BuildHWIntrinsic(GenTreeHWIntrinsic* intrinsicTree) | |||
switch (intrinsicID) | |||
{ | |||
case NI_SSE_CompareEqualOrderedScalar: | |||
case NI_SSE2_CompareEqualOrderedScalar: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: It might be better to keep these grouped as NI_SSE
, then NI_SSE2
2947cb9
to
8d8ea6b
Compare
@dotnet-bot test Tizen armel Cross Checked Innerloop Build and Test please |
src/jit/hwintrinsiccodegenxarch.cpp
Outdated
break; | ||
} | ||
|
||
case NI_SSE2_ConvertScalarToVector128Single: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: This path could be merged with ConvertScalarToVector128Double
above. The only difference is the baseType
assertion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
} | ||
#endif // _TARGET_X86_ | ||
|
||
if (baseType == TYP_STRUCT) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we could simplify this to use getArgForHWIntrinsic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Will do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately, getArgForHWIntrinsic
does not handle this logic properly
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What about it isn't being handled properly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It returns TYP_STRUCT
instead of TYP_FLOAT
- didn't dig deeper to solve it. Plan to have closer look while working on #16329
Couple more pieces of feedback/questions. Will probably be good for sign-off after the next round. |
8d8ea6b
to
cb767d2
Compare
All OSX and Ubuntu jitstress jobs timed out. |
test Ubuntu x64 Checked jitincompletehwintrinsic |
Again 2 Ubuntu jitstress jobs timeouts |
@@ -33,7 +33,7 @@ static const wchar_t *coreCLRDll = W("CoreCLR.dll"); | |||
static const wchar_t *coreCLRInstallDirectory = W("%windir%\\system32\\"); | |||
|
|||
// Encapsulates the environment that CoreCLR will run in, including the TPALIST | |||
class HostEnvironment |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file only contains whitespace changes and isn't related to the PR. It should probably be dropped.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, it is my workflow error, as I use DebugBreak() in corerun.cpp to start debugging and my editor automatically corrects all whitespace errors. Just forgot to undo changes before commit. Will revert.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall this LGTM.
It would be good to undo the corerun.cpp changes, since they are unrelated and whitespace changes only.
It would also probably be good (long term) to have explicit comments as to why particular intrinsics aren't/can't be table driven.
cb767d2
to
f8fc2fb
Compare
test Windows_NT x64 Checked jitincompletehwintrinsic test Windows_NT x86 Checked jitincompletehwintrinsic test Ubuntu x64 Checked jitincompletehwintrinsic test OSX10.12 x64 Checked jitincompletehwintrinsic |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Thanks @4creators and also thanks @tannergooding for the reviews!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you so much for the work. I logged the PNSE issue at https://github.com/dotnet/coreclr/issues/16342.
test Windows_NT x86 Checked jitx86hwintrinsicnosimd |
test Windows_NT x64 Checked jitx86hwintrinsicnosimd |
Thanks @4creators |
No description provided.