Commit 7e849fd

[X86] LowerFunnelShift - allow non-constant vXi8 unpack(y,x) << zext(z) lowering pre-AVX512
Without AVX512 (which can efficiently extend/truncate to vXi16/vXi32), unpacking/packing to vXi16 is more efficient than relying on the (uops-heavy) PBLENDV shift expansion.
1 parent a6cabd9 commit 7e849fd

3 files changed: +259 -288 lines changed

llvm/lib/Target/X86/X86ISelLowering.cpp

Lines changed: 1 addition & 1 deletion
@@ -29915,7 +29915,7 @@ static SDValue LowerFunnelShift(SDValue Op, const X86Subtarget &Subtarget,
   }

   // Attempt to fold per-element (ExtVT) shift as unpack(y,x) << zext(z)
-  if ((IsCst && !IsFSHR && EltSizeInBits == 8) ||
+  if (((IsCst || !Subtarget.hasAVX512()) && !IsFSHR && EltSizeInBits == 8) ||
       supportedVectorVarShift(ExtVT, Subtarget, ShiftOpc)) {
     SDValue Z = DAG.getConstant(0, DL, VT);
     SDValue RLo = DAG.getBitcast(ExtVT, getUnpackl(DAG, DL, VT, Op1, Op0));
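
For readers unfamiliar with the `unpack(y,x) << zext(z)` fold, the following is a minimal scalar sketch of what a single i8 lane computes. It is not the SelectionDAG code from this commit. The function names `fshl8_ref`, `fshl8_via_unpack` and `check_all_lanes` are hypothetical, and the final "take the high byte" step is an assumption about how the widened result is narrowed back to i8, since the diff above only shows the unpack.

```cpp
#include <cassert>
#include <cstdint>

// Reference funnel shift left on i8: concatenate x:y, shift left by
// (z mod 8), and keep the top 8 bits.
inline uint8_t fshl8_ref(uint8_t x, uint8_t y, uint8_t z) {
  unsigned s = z & 7u;
  if (s == 0)
    return x; // avoid a shift by 8 on the y operand
  return static_cast<uint8_t>((x << s) | (y >> (8 - s)));
}

// Scalar model of one lane of unpack(y,x) << zext(z):
// interleave y (low byte) and x (high byte) into a 16-bit element,
// shift the widened element, then take its high byte.
inline uint8_t fshl8_via_unpack(uint8_t x, uint8_t y, uint8_t z) {
  uint16_t wide = static_cast<uint16_t>((x << 8) | y); // unpack(y, x)
  unsigned s = z & 7u;                                 // zext(z mod 8)
  uint16_t shifted = static_cast<uint16_t>(wide << s); // 16-bit shift
  return static_cast<uint8_t>(shifted >> 8);           // assumed narrowing step
}

// Exhaustively compare both models over every (x, y, z) combination.
inline void check_all_lanes() {
  for (unsigned x = 0; x < 256; ++x)
    for (unsigned y = 0; y < 256; ++y)
      for (unsigned z = 0; z < 256; ++z)
        assert(fshl8_via_unpack(static_cast<uint8_t>(x),
                                static_cast<uint8_t>(y),
                                static_cast<uint8_t>(z)) ==
               fshl8_ref(static_cast<uint8_t>(x),
                         static_cast<uint8_t>(y),
                         static_cast<uint8_t>(z)));
}
```

Because the 16-bit element already holds both operands, one widened shift per lane replaces the separate shifts and blends an i8 expansion would need, which is the motivation the commit message gives for enabling this path on pre-AVX512 targets.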
