Skip to content
This repository was archived by the owner on Jan 23, 2023. It is now read-only.

Updating the HWIntrinsic codegen to support marking LoadVector128 and LoadAlignedVector128 as contained. #16095

Merged
merged 2 commits into from
Feb 3, 2018
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
4 changes: 4 additions & 0 deletions src/jit/codegenlinear.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1297,6 +1297,10 @@ void CodeGen::genConsumeRegs(GenTree* tree)
{
genConsumeReg(tree->gtGetOp1());
}
else if (tree->OperIsHWIntrinsic())

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like this would be a good place to assert that HW_Flag_NoContainment is not set, or that IsContainableHWIntrinsicOp(tree, tree->gtGetOp1()) but perhaps that's overkill.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think asserting is a good idea. I'll update.

This is also pending #16114 and #16115. The former to fixup the various flags and the latter to make it easier to add some tests with this change (we currently don't have any tests that use Load, LoadScalar, or LoadAligned in a way that they could be contained).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, actually, I can't easily check IsContainableHWIntrinsicOp.

It is a member of Lowering and depends on IsContainableMemoryOp, which itself depends on m_lsra.

I would need to make m_pLowering visible from compiler somewhere to use that check.

The other check, flagsOfHWIntrinsic, currently requires friend access to the compiler, but I don't see any reason why those can't be public.

@CarolEidt, what would you recommend here?

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't recall why IsContainableMemoryOp has its actual implementation on LinearScan when all the containment analysis is done in Lowering (even before we moved the TreeNodeInfoInit to LSRA). In any case, it seems like something we might want to check during code generation, so we may want to expose it in some way. That said, I don't feel that it's critical.

On the other hand, the flagsOfHwIntrinsic I think should definitely be public so that they are accessible from codegen.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I can't recall why IsContainableMemoryOp has its actual implementation on LinearScan when all the containment analysis is done in Lowering (even before we moved the TreeNodeInfoInit to LSRA).

Looks to be one use in LSRA, for SIMDIntrinsicGetItem: https://github.com/dotnet/coreclr/blob/master/src/jit/lsraxarch.cpp#L3026

In any case, it seems like something we might want to check during code generation, so we may want to expose it in some way. That said, I don't feel that it's critical.

Exposing a getLowering() method in the compiler is easy enough, but it also requires us to #include "lower.h" somewhere where codegen can pick it up (probably just in codegen.h). I've commented out this particular assert with a TODO and will finish looking tomorrow if I get the chance.

On the other hand, the flagsOfHwIntrinsic I think should definitely be public so that they are accessible from codegen.

It is already available to codgen, I just forgot to put the assert in an #ifdef _TARGET_XARCH_ since its in codegenlinear. 😄

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both asserts I tried to add here would have been invalid. tree is the node which was contained, we would want to be doing the assert on the node which contains tree.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moved the asserts to hwintrinsiccodegenxarch.cpp.

{
genConsumeReg(tree->gtGetOp1());
}
else
{
#ifdef FEATURE_SIMD
Expand Down
63 changes: 63 additions & 0 deletions src/jit/emitxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -4009,6 +4009,28 @@ void emitter::emitIns_R_A_I(instruction ins, emitAttr attr, regNumber reg1, GenT
emitCurIGsize += sz;
}

void emitter::emitIns_R_AR_I(instruction ins, emitAttr attr, regNumber reg1, regNumber base, int offs, int ival)
{
noway_assert(emitVerifyEncodable(ins, EA_SIZE(attr), reg1));
assert(IsSSEOrAVXInstruction(ins));

instrDesc* id = emitNewInstrAmdCns(attr, offs, ival);

id->idIns(ins);
id->idReg1(reg1);

id->idInsFmt(IF_RRW_ARD_CNS);
id->idAddr()->iiaAddrMode.amBaseReg = base;
id->idAddr()->iiaAddrMode.amIndxReg = REG_NA;

// Plus one for the 1-byte immediate (ival)
UNATIVE_OFFSET sz = emitInsSizeAM(id, insCodeRM(ins)) + emitGetVexPrefixAdjustedSize(ins, attr, insCodeRM(ins)) + 1;
id->idCodeSize(sz);

dispIns(id);
emitCurIGsize += sz;
}

void emitter::emitIns_R_C_I(
instruction ins, emitAttr attr, regNumber reg1, CORINFO_FIELD_HANDLE fldHnd, int offs, int ival)
{
Expand Down Expand Up @@ -4202,6 +4224,30 @@ void emitter::emitIns_R_R_A_I(
dispIns(id);
emitCurIGsize += sz;
}

void emitter::emitIns_R_R_AR_I(
instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, regNumber base, int offs, int ival)
{
assert(IsSSEOrAVXInstruction(ins));
assert(IsThreeOperandAVXInstruction(ins));

instrDesc* id = emitNewInstrAmdCns(attr, offs, ival);

id->idIns(ins);
id->idReg1(reg1);
id->idReg2(reg2);

id->idInsFmt(IF_RWR_RRD_ARD_CNS);
id->idAddr()->iiaAddrMode.amBaseReg = base;
id->idAddr()->iiaAddrMode.amIndxReg = REG_NA;

// Plus one for the 1-byte immediate (ival)
UNATIVE_OFFSET sz = emitInsSizeAM(id, insCodeRM(ins)) + emitGetVexPrefixAdjustedSize(ins, attr, insCodeRM(ins)) + 1;
id->idCodeSize(sz);

dispIns(id);
emitCurIGsize += sz;
}
#endif // !LEGACY_BACKEND

void emitter::emitIns_R_R_C_I(
Expand Down Expand Up @@ -5396,6 +5442,23 @@ void emitter::emitIns_SIMD_R_R_A_I(
}
}

void emitter::emitIns_SIMD_R_R_AR_I(
instruction ins, emitAttr attr, regNumber reg, regNumber reg1, regNumber base, int ival)
{
if (UseVEXEncoding())
{
emitIns_R_R_AR_I(ins, attr, reg, reg1, base, 0, ival);
}
else
{
if (reg1 != reg)
{
emitIns_R_R(INS_movaps, attr, reg, reg1);
}
emitIns_R_AR_I(ins, attr, reg, base, 0, ival);
}
}

void emitter::emitIns_SIMD_R_R_C_I(
instruction ins, emitAttr attr, regNumber reg, regNumber reg1, CORINFO_FIELD_HANDLE fldHnd, int offs, int ival)
{
Expand Down
5 changes: 5 additions & 0 deletions src/jit/emitxarch.h
Original file line number Diff line number Diff line change
Expand Up @@ -386,6 +386,8 @@ void emitIns_R_A(instruction ins, emitAttr attr, regNumber reg1, GenTreeIndir* i

void emitIns_R_A_I(instruction ins, emitAttr attr, regNumber reg1, GenTreeIndir* indir, int ival);

void emitIns_R_AR_I(instruction ins, emitAttr attr, regNumber reg1, regNumber base, int offs, int ival);

void emitIns_R_C_I(instruction ins, emitAttr attr, regNumber reg1, CORINFO_FIELD_HANDLE fldHnd, int offs, int ival);

void emitIns_R_S_I(instruction ins, emitAttr attr, regNumber reg1, int varx, int offs, int ival);
Expand All @@ -405,6 +407,8 @@ void emitIns_R_R_R(instruction ins, emitAttr attr, regNumber reg1, regNumber reg
#ifndef LEGACY_BACKEND
void emitIns_R_R_A_I(
instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, GenTreeIndir* indir, int ival, insFormat fmt);
void emitIns_R_R_AR_I(
instruction ins, emitAttr attr, regNumber reg1, regNumber reg2, regNumber base, int offs, int ival);
#endif // !LEGACY_BACKEND

void emitIns_R_R_C_I(
Expand Down Expand Up @@ -475,6 +479,7 @@ void emitIns_AX_R(instruction ins, emitAttr attr, regNumber ireg, regNumber reg,
#ifdef FEATURE_HW_INTRINSICS
void emitIns_SIMD_R_R_AR(instruction ins, emitAttr attr, regNumber reg, regNumber reg1, regNumber base);
void emitIns_SIMD_R_R_A_I(instruction ins, emitAttr attr, regNumber reg, regNumber reg1, GenTreeIndir* indir, int ival);
void emitIns_SIMD_R_R_AR_I(instruction ins, emitAttr attr, regNumber reg, regNumber reg1, regNumber base, int ival);
void emitIns_SIMD_R_R_C_I(
instruction ins, emitAttr attr, regNumber reg, regNumber reg1, CORINFO_FIELD_HANDLE fldHnd, int offs, int ival);
void emitIns_SIMD_R_R_R_I(instruction ins, emitAttr attr, regNumber reg, regNumber reg1, regNumber reg2, int ival);
Expand Down
29 changes: 22 additions & 7 deletions src/jit/hwintrinsiccodegenxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -215,6 +215,9 @@ void CodeGen::genHWIntrinsic_R_R_RM(GenTreeHWIntrinsic* node, instruction ins)

if (op2->isContained() || op2->isUsedFromSpillTemp())
{
assert((Compiler::flagsOfHWIntrinsic(node->gtHWIntrinsicId) & HW_Flag_NoContainment) == 0);
assert(compiler->m_pLowering->IsContainableHWIntrinsicOp(node, op2) || op2->IsRegOptional());

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!


TempDsc* tmpDsc = nullptr;
unsigned varNum = BAD_VAR_NUM;
unsigned offset = (unsigned)-1;
Expand All @@ -229,6 +232,11 @@ void CodeGen::genHWIntrinsic_R_R_RM(GenTreeHWIntrinsic* node, instruction ins)

compiler->tmpRlsTemp(tmpDsc);
}
else if (op2->OperIsHWIntrinsic())
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Similar code is needed in genHWIntrinsic_R_R_RM_I.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed.

{
emit->emitIns_SIMD_R_R_AR(ins, simdSize, targetReg, op1Reg, op2->gtGetOp1()->gtRegNum);
return;
}
else if (op2->isIndir())
{
GenTreeIndir* memIndir = op2->AsIndir();
Expand All @@ -242,7 +250,6 @@ void CodeGen::genHWIntrinsic_R_R_RM(GenTreeHWIntrinsic* node, instruction ins)
offset = 0;

// Ensure that all the GenTreeIndir values are set to their defaults.
assert(memBase->gtRegNum == REG_NA);
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was found to be incorrect in the larger emitInsBinary refactoring, but wasn't also removed from here.

assert(!memIndir->HasIndex());
assert(memIndir->Scale() == 1);
assert(memIndir->Offset() == 0);
Expand Down Expand Up @@ -310,6 +317,7 @@ void CodeGen::genHWIntrinsic_R_R_RM_I(GenTreeHWIntrinsic* node, instruction ins)
regNumber targetReg = node->gtRegNum;
GenTree* op1 = node->gtGetOp1();
GenTree* op2 = node->gtGetOp2();
emitAttr simdSize = (emitAttr)(node->gtSIMDSize);
int ival = Compiler::ivalOfHWIntrinsic(node->gtHWIntrinsicId);
emitter* emit = getEmitter();

Expand All @@ -323,6 +331,9 @@ void CodeGen::genHWIntrinsic_R_R_RM_I(GenTreeHWIntrinsic* node, instruction ins)

if (op2->isContained() || op2->isUsedFromSpillTemp())
{
assert((Compiler::flagsOfHWIntrinsic(node->gtHWIntrinsicId) & HW_Flag_NoContainment) == 0);
assert(compiler->m_pLowering->IsContainableHWIntrinsicOp(node, op2) || op2->IsRegOptional());

TempDsc* tmpDsc = nullptr;
unsigned varNum = BAD_VAR_NUM;
unsigned offset = (unsigned)-1;
Expand All @@ -337,6 +348,11 @@ void CodeGen::genHWIntrinsic_R_R_RM_I(GenTreeHWIntrinsic* node, instruction ins)

compiler->tmpRlsTemp(tmpDsc);
}
else if (op2->OperIsHWIntrinsic())
{
emit->emitIns_SIMD_R_R_AR_I(ins, simdSize, targetReg, op1Reg, op2->gtGetOp1()->gtRegNum, ival);
return;
}
else if (op2->isIndir())
{
GenTreeIndir* memIndir = op2->AsIndir();
Expand All @@ -350,7 +366,6 @@ void CodeGen::genHWIntrinsic_R_R_RM_I(GenTreeHWIntrinsic* node, instruction ins)
offset = 0;

// Ensure that all the GenTreeIndir values are set to their defaults.
assert(memBase->gtRegNum == REG_NA);
assert(!memIndir->HasIndex());
assert(memIndir->Scale() == 1);
assert(memIndir->Offset() == 0);
Expand All @@ -360,14 +375,14 @@ void CodeGen::genHWIntrinsic_R_R_RM_I(GenTreeHWIntrinsic* node, instruction ins)

case GT_CLS_VAR_ADDR:
{
emit->emitIns_SIMD_R_R_C_I(ins, emitTypeSize(targetType), targetReg, op1Reg,
memBase->gtClsVar.gtClsVarHnd, 0, ival);
emit->emitIns_SIMD_R_R_C_I(ins, simdSize, targetReg, op1Reg, memBase->gtClsVar.gtClsVarHnd, 0,
ival);
return;
}

default:
{
emit->emitIns_SIMD_R_R_A_I(ins, emitTypeSize(targetType), targetReg, op1Reg, memIndir, ival);
emit->emitIns_SIMD_R_R_A_I(ins, simdSize, targetReg, op1Reg, memIndir, ival);
return;
}
}
Expand Down Expand Up @@ -405,11 +420,11 @@ void CodeGen::genHWIntrinsic_R_R_RM_I(GenTreeHWIntrinsic* node, instruction ins)
assert((varNum != BAD_VAR_NUM) || (tmpDsc != nullptr));
assert(offset != (unsigned)-1);

emit->emitIns_SIMD_R_R_S_I(ins, emitTypeSize(targetType), targetReg, op1Reg, varNum, offset, ival);
emit->emitIns_SIMD_R_R_S_I(ins, simdSize, targetReg, op1Reg, varNum, offset, ival);
}
else
{
emit->emitIns_SIMD_R_R_R_I(ins, emitTypeSize(targetType), targetReg, op1Reg, op2->gtRegNum, ival);
emit->emitIns_SIMD_R_R_R_I(ins, simdSize, targetReg, op1Reg, op2->gtRegNum, ival);
}
}

Expand Down
6 changes: 3 additions & 3 deletions src/jit/hwintrinsiclistxarch.h
Original file line number Diff line number Diff line change
Expand Up @@ -176,7 +176,7 @@ HARDWARE_INTRINSIC(SSE41_BlendVariable, "BlendVaria

// SSE42 Intrinsics
HARDWARE_INTRINSIC(SSE42_IsSupported, "get_IsSupported", SSE42, -1, 0, 0, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IsSupportedProperty, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(SSE42_Crc32, "Crc32", SSE42, -1, 0, 2, {INS_invalid, INS_crc32, INS_invalid, INS_crc32, INS_invalid, INS_crc32, INS_invalid, INS_crc32, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(SSE42_Crc32, "Crc32", SSE42, -1, 0, 2, {INS_invalid, INS_crc32, INS_invalid, INS_crc32, INS_invalid, INS_crc32, INS_invalid, INS_crc32, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed)

// AVX Intrinsics
// TODO-XArch When implementing SetZeroVector256 add case to switch table in gentree.cpp
Expand Down Expand Up @@ -207,14 +207,14 @@ HARDWARE_INTRINSIC(FMA_IsSupported, "get_IsSupp

// LZCNT Intrinsics
HARDWARE_INTRINSIC(LZCNT_IsSupported, "get_IsSupported", LZCNT, -1, 0, 0, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IsSupportedProperty, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(LZCNT_LeadingZeroCount, "LeadingZeroCount", LZCNT, -1, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_lzcnt, INS_invalid, INS_lzcnt, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(LZCNT_LeadingZeroCount, "LeadingZeroCount", LZCNT, -1, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_lzcnt, INS_invalid, INS_lzcnt, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed)

// PCLMULQDQ Intrinsics
HARDWARE_INTRINSIC(PCLMULQDQ_IsSupported, "get_IsSupported", PCLMULQDQ, -1, 0, 0, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IsSupportedProperty, HW_Flag_NoFlag)

// POPCNT Intrinsics
HARDWARE_INTRINSIC(POPCNT_IsSupported, "get_IsSupported", POPCNT, -1, 0, 0, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_invalid}, HW_Category_IsSupportedProperty, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(POPCNT_PopCount, "PopCount", POPCNT, -1, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_popcnt, INS_invalid, INS_popcnt, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFlag)
HARDWARE_INTRINSIC(POPCNT_PopCount, "PopCount", POPCNT, -1, 0, 1, {INS_invalid, INS_invalid, INS_invalid, INS_invalid, INS_popcnt, INS_invalid, INS_popcnt, INS_invalid, INS_invalid, INS_invalid}, HW_Category_Scalar, HW_Flag_NoFloatingPointUsed)
#endif // FEATURE_HW_INTRINSIC

#undef HARDWARE_INTRINSIC
Expand Down
8 changes: 8 additions & 0 deletions src/jit/hwintrinsicxarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -420,6 +420,7 @@ GenTree* Compiler::impHWIntrinsic(NamedIntrinsic intrinsic,
int numArgs = sig->numArgs;
var_types retType = JITtype2varType(sig->retType);
var_types baseType = TYP_UNKNOWN;

if (retType == TYP_STRUCT && featureSIMD)
{
unsigned int sizeBytes;
Expand Down Expand Up @@ -482,6 +483,13 @@ GenTree* Compiler::impHWIntrinsic(NamedIntrinsic intrinsic,
}
}

if ((flags & HW_Flag_NoFloatingPointUsed) == 0)
{
// Set `compFloatingPointUsed` to cover the scenario where an intrinsic is being on SIMD fields, but
// where no SIMD local vars are in use. This is the same logic as is used for FEATURE_SIMD.
compFloatingPointUsed = true;
}

// table-driven importer of simple intrinsics
if (impIsTableDrivenHWIntrinsic(category, flags))
{
Expand Down
15 changes: 10 additions & 5 deletions src/jit/lower.h
Original file line number Diff line number Diff line change
Expand Up @@ -319,11 +319,6 @@ class Lowering : public Phase
public:
static bool IndirsAreEquivalent(GenTree* pTreeA, GenTree* pTreeB);

private:
static bool NodesAreEquivalentLeaves(GenTree* candidate, GenTree* storeInd);

bool AreSourcesPossiblyModifiedLocals(GenTree* addr, GenTree* base, GenTree* index);

// return true if 'childNode' is an immediate that can be contained
// by the 'parentNode' (i.e. folded into an instruction)
// for example small enough and non-relocatable
Expand All @@ -335,6 +330,16 @@ class Lowering : public Phase
return m_lsra->isContainableMemoryOp(node);
}

#ifdef FEATURE_HW_INTRINSICS
// Return true if 'node' is a containable HWIntrinsic op.
bool IsContainableHWIntrinsicOp(GenTreeHWIntrinsic* containingNode, GenTree* node);
#endif // FEATURE_HW_INTRINSICS

private:
static bool NodesAreEquivalentLeaves(GenTree* candidate, GenTree* storeInd);

bool AreSourcesPossiblyModifiedLocals(GenTree* addr, GenTree* base, GenTree* index);

// Makes 'childNode' contained in the 'parentNode'
void MakeSrcContained(GenTree* parentNode, GenTree* childNode);

Expand Down
Loading