[AMDGPU] Define 1024 VGPRs on gfx1250 #156765
Conversation
@llvm/pr-subscribers-llvm-globalisel

Author: Stanislav Mekhanoshin (rampitec)

Changes: This is baseline support; it is not usable yet.

Patch is 266.25 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/156765.diff

32 Files Affected:
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index 93083f2660c2d..5ebdb28645872 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -564,6 +564,14 @@ class AMDGPUOperand : public MCParsedAsmOperand {
return isRegOrInlineNoMods(AMDGPU::VS_32RegClassID, MVT::i32);
}
+ bool isVCSrc_b32_Lo256() const {
+ return isRegOrInlineNoMods(AMDGPU::VS_32_Lo256RegClassID, MVT::i32);
+ }
+
+ bool isVCSrc_b64_Lo256() const {
+ return isRegOrInlineNoMods(AMDGPU::VS_64_Lo256RegClassID, MVT::i64);
+ }
+
bool isVCSrc_b64() const {
return isRegOrInlineNoMods(AMDGPU::VS_64RegClassID, MVT::i64);
}
@@ -2986,7 +2994,12 @@ MCRegister AMDGPUAsmParser::getRegularReg(RegisterKind RegKind, unsigned RegNum,
const MCRegisterInfo *TRI = getContext().getRegisterInfo();
const MCRegisterClass RC = TRI->getRegClass(RCID);
- if (RegIdx >= RC.getNumRegs()) {
+ if (RegIdx >= RC.getNumRegs() || (RegKind == IS_VGPR && RegIdx > 255)) {
+ Error(Loc, "register index is out of range");
+ return AMDGPU::NoRegister;
+ }
+
+ if (RegKind == IS_VGPR && !isGFX1250() && RegIdx + RegWidth / 32 > 256) {
Error(Loc, "register index is out of range");
return MCRegister();
}
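The asm-parser hunk above adds two range checks for VGPR operands. As a minimal sketch (illustrative names only — `isVgprIndexValid`, its parameters, and the standalone form are assumptions, not the parser's actual API), the rule is: the base index written in assembly stays within v0-v255, and on targets without 1024 addressable VGPRs the whole tuple (`RegWidth` bits wide) must also fit below v256:

```cpp
#include <cassert>

// Sketch of the two VGPR range checks added above. RegWidth is in bits,
// so RegWidth / 32 is the number of 32-bit registers the tuple spans.
bool isVgprIndexValid(unsigned RegIdx, unsigned RegWidth, bool IsGfx1250) {
  // The base index itself may not exceed v255 in this path.
  if (RegIdx > 255)
    return false;
  // Without 1024 addressable VGPRs, the whole tuple must fit below v256.
  if (!IsGfx1250 && RegIdx + RegWidth / 32 > 256)
    return false;
  return true;
}
```

For example, `v[255:256]` (a 64-bit tuple at index 255) is rejected on older targets but accepted on gfx1250.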
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index bb9f811683255..97ad457d000bb 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -1223,6 +1223,26 @@ void AMDGPUDisassembler::convertVOP3DPPInst(MCInst &MI) const {
}
}
+// Given a wide tuple \p Reg check if it will overflow 256 registers.
+// \returns \p Reg on success or NoRegister otherwise.
+static unsigned CheckVGPROverflow(unsigned Reg, const MCRegisterClass &RC,
+ const MCRegisterInfo &MRI) {
+ unsigned NumRegs = RC.getSizeInBits() / 32;
+ MCRegister Sub0 = MRI.getSubReg(Reg, AMDGPU::sub0);
+ if (!Sub0)
+ return Reg;
+
+ MCRegister BaseReg;
+ if (MRI.getRegClass(AMDGPU::VGPR_32RegClassID).contains(Sub0))
+ BaseReg = AMDGPU::VGPR0;
+ else if (MRI.getRegClass(AMDGPU::AGPR_32RegClassID).contains(Sub0))
+ BaseReg = AMDGPU::AGPR0;
+
+ assert(BaseReg && "Only vector registers expected");
+
+ return (Sub0 - BaseReg + NumRegs <= 256) ? Reg : AMDGPU::NoRegister;
+}
+
// Note that before gfx10, the MIMG encoding provided no information about
// VADDR size. Consequently, decoded instructions always show address as if it
// has 1 dword, which could be not really so.
@@ -1327,8 +1347,9 @@ void AMDGPUDisassembler::convertMIMGInst(MCInst &MI) const {
MCRegister VdataSub0 = MRI.getSubReg(Vdata0, AMDGPU::sub0);
Vdata0 = (VdataSub0 != 0)? VdataSub0 : Vdata0;
- NewVdata = MRI.getMatchingSuperReg(Vdata0, AMDGPU::sub0,
- &MRI.getRegClass(DataRCID));
+ const MCRegisterClass &NewRC = MRI.getRegClass(DataRCID);
+ NewVdata = MRI.getMatchingSuperReg(Vdata0, AMDGPU::sub0, &NewRC);
+ NewVdata = CheckVGPROverflow(NewVdata, NewRC, MRI);
if (!NewVdata) {
// It's possible to encode this such that the low register + enabled
// components exceeds the register count.
@@ -1347,8 +1368,9 @@ void AMDGPUDisassembler::convertMIMGInst(MCInst &MI) const {
VAddrSA = VAddrSubSA ? VAddrSubSA : VAddrSA;
auto AddrRCID = MCII->get(NewOpcode).operands()[VAddrSAIdx].RegClass;
- NewVAddrSA = MRI.getMatchingSuperReg(VAddrSA, AMDGPU::sub0,
- &MRI.getRegClass(AddrRCID));
+ const MCRegisterClass &NewRC = MRI.getRegClass(AddrRCID);
+ NewVAddrSA = MRI.getMatchingSuperReg(VAddrSA, AMDGPU::sub0, &NewRC);
+ NewVAddrSA = CheckVGPROverflow(NewVAddrSA, NewRC, MRI);
if (!NewVAddrSA)
return;
}
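The disassembler's new `CheckVGPROverflow` reduces to a single arithmetic rule once the sub0 index is known. A sketch of just that rule, under the assumption that `Idx` is the tuple's first register index relative to the file base (`tupleFits` is an illustrative name, not part of the patch):

```cpp
#include <cassert>

// Sketch of the overflow rule in CheckVGPROverflow: a tuple starting at
// register index Idx, spanning SizeInBits / 32 registers, must not run
// past the 256-register encoding window.
bool tupleFits(unsigned Idx, unsigned SizeInBits) {
  unsigned NumRegs = SizeInBits / 32;
  return Idx + NumRegs <= 256;
}
```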
diff --git a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
index 931966b6df1df..7b94ea3ffbf1f 100644
--- a/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
+++ b/llvm/lib/Target/AMDGPU/GCNSubtarget.cpp
@@ -577,6 +577,7 @@ GCNSubtarget::getMaxNumVectorRegs(const Function &F) const {
unsigned MaxNumVGPRs = MaxVectorRegs;
unsigned MaxNumAGPRs = 0;
+ unsigned NumArchVGPRs = has1024AddressableVGPRs() ? 1024 : 256;
// On GFX90A, the number of VGPRs and AGPRs need not be equal. Theoretically,
// a wave may have up to 512 total vector registers combining together both
@@ -589,7 +590,6 @@ GCNSubtarget::getMaxNumVectorRegs(const Function &F) const {
if (hasGFX90AInsts()) {
unsigned MinNumAGPRs = 0;
const unsigned TotalNumAGPRs = AMDGPU::AGPR_32RegClass.getNumRegs();
- const unsigned TotalNumVGPRs = AMDGPU::VGPR_32RegClass.getNumRegs();
const std::pair<unsigned, unsigned> DefaultNumAGPR = {~0u, ~0u};
@@ -614,11 +614,11 @@ GCNSubtarget::getMaxNumVectorRegs(const Function &F) const {
MaxNumAGPRs = std::min(std::max(MinNumAGPRs, MaxNumAGPRs), MaxVectorRegs);
MinNumAGPRs = std::min(std::min(MinNumAGPRs, TotalNumAGPRs), MaxNumAGPRs);
- MaxNumVGPRs = std::min(MaxVectorRegs - MinNumAGPRs, TotalNumVGPRs);
+ MaxNumVGPRs = std::min(MaxVectorRegs - MinNumAGPRs, NumArchVGPRs);
MaxNumAGPRs = std::min(MaxVectorRegs - MaxNumVGPRs, MaxNumAGPRs);
assert(MaxNumVGPRs + MaxNumAGPRs <= MaxVectorRegs &&
- MaxNumAGPRs <= TotalNumAGPRs && MaxNumVGPRs <= TotalNumVGPRs &&
+ MaxNumAGPRs <= TotalNumAGPRs && MaxNumVGPRs <= NumArchVGPRs &&
"invalid register counts");
} else if (hasMAIInsts()) {
// On gfx908 the number of AGPRs always equals the number of VGPRs.
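The `GCNSubtarget` change swaps the VGPR cap from the full `VGPR_32` class size to the subtarget's addressable ArchVGPR count. A sketch of the affected line in isolation (the function name and standalone inputs are illustrative; the real computation also clamps AGPRs afterwards):

```cpp
#include <algorithm>
#include <cassert>

// Sketch: the VGPR budget is the remaining vector-register budget after
// reserving MinNumAGPRs, capped at the addressable ArchVGPR count, which
// is now 256 or 1024 depending on the subtarget.
unsigned maxVgprs(unsigned MaxVectorRegs, unsigned MinNumAGPRs,
                  bool Has1024AddressableVGPRs) {
  unsigned NumArchVGPRs = Has1024AddressableVGPRs ? 1024 : 256;
  return std::min(MaxVectorRegs - MinNumAGPRs, NumArchVGPRs);
}
```

On a pre-gfx1250 target a 512-register budget minus 128 reserved AGPRs still caps out at 256 VGPRs; with 1024 addressable VGPRs the same inputs yield 384.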
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
index fc28b9c96f5af..16e96835ebead 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUMCCodeEmitter.cpp
@@ -402,7 +402,7 @@ void AMDGPUMCCodeEmitter::encodeInstruction(const MCInst &MI,
if (AMDGPU::isGFX10Plus(STI) && isVCMPX64(Desc)) {
assert((Encoding & 0xFF) == 0);
Encoding |= MRI.getEncodingValue(AMDGPU::EXEC_LO) &
- AMDGPU::HWEncoding::REG_IDX_MASK;
+ AMDGPU::HWEncoding::LO256_REG_IDX_MASK;
}
for (unsigned i = 0; i < bytes; i++) {
@@ -551,7 +551,7 @@ void AMDGPUMCCodeEmitter::getAVOperandEncoding(
SmallVectorImpl<MCFixup> &Fixups, const MCSubtargetInfo &STI) const {
MCRegister Reg = MI.getOperand(OpNo).getReg();
unsigned Enc = MRI.getEncodingValue(Reg);
- unsigned Idx = Enc & AMDGPU::HWEncoding::REG_IDX_MASK;
+ unsigned Idx = Enc & AMDGPU::HWEncoding::LO256_REG_IDX_MASK;
bool IsVGPROrAGPR =
Enc & (AMDGPU::HWEncoding::IS_VGPR | AMDGPU::HWEncoding::IS_AGPR);
@@ -593,7 +593,7 @@ void AMDGPUMCCodeEmitter::getMachineOpValue(const MCInst &MI,
const MCSubtargetInfo &STI) const {
if (MO.isReg()){
unsigned Enc = MRI.getEncodingValue(MO.getReg());
- unsigned Idx = Enc & AMDGPU::HWEncoding::REG_IDX_MASK;
+ unsigned Idx = Enc & AMDGPU::HWEncoding::LO256_REG_IDX_MASK;
bool IsVGPROrAGPR =
Enc & (AMDGPU::HWEncoding::IS_VGPR | AMDGPU::HWEncoding::IS_AGPR);
Op = Idx | (IsVGPROrAGPR << 8);
@@ -656,7 +656,7 @@ void AMDGPUMCCodeEmitter::getMachineOpValueT16Lo128(
const MCOperand &MO = MI.getOperand(OpNo);
if (MO.isReg()) {
uint16_t Encoding = MRI.getEncodingValue(MO.getReg());
- unsigned RegIdx = Encoding & AMDGPU::HWEncoding::REG_IDX_MASK;
+ unsigned RegIdx = Encoding & AMDGPU::HWEncoding::LO256_REG_IDX_MASK;
bool IsHi = Encoding & AMDGPU::HWEncoding::IS_HI16;
bool IsVGPR = Encoding & AMDGPU::HWEncoding::IS_VGPR;
assert((!IsVGPR || isUInt<7>(RegIdx)) && "VGPR0-VGPR127 expected!");
diff --git a/llvm/lib/Target/AMDGPU/SIDefines.h b/llvm/lib/Target/AMDGPU/SIDefines.h
index e9dc583f92002..a4d3cc8fdf1f1 100644
--- a/llvm/lib/Target/AMDGPU/SIDefines.h
+++ b/llvm/lib/Target/AMDGPU/SIDefines.h
@@ -354,10 +354,11 @@ enum : unsigned {
// Register codes as defined in the TableGen's HWEncoding field.
namespace HWEncoding {
enum : unsigned {
- REG_IDX_MASK = 0xff,
- IS_VGPR = 1 << 8,
- IS_AGPR = 1 << 9,
- IS_HI16 = 1 << 10,
+ REG_IDX_MASK = 0x3ff,
+ LO256_REG_IDX_MASK = 0xff,
+ IS_VGPR = 1 << 10,
+ IS_AGPR = 1 << 11,
+ IS_HI16 = 1 << 12,
};
} // namespace HWEncoding
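The `SIDefines.h` hunk widens the encoded register index from 8 to 10 bits and shifts the flag bits up. The enum values below are copied from the patch; the `encodeVgpr` helper is an illustrative assumption showing how the masks compose:

```cpp
#include <cassert>

// Layout from the patch: bits 9:0 hold the register index (up to 1023),
// bits 10-12 hold the VGPR/AGPR/hi16 flags. LO256_REG_IDX_MASK keeps the
// old 8-bit view for encodings that can only address v0-v255.
enum : unsigned {
  REG_IDX_MASK = 0x3ff,
  LO256_REG_IDX_MASK = 0xff,
  IS_VGPR = 1 << 10,
  IS_AGPR = 1 << 11,
  IS_HI16 = 1 << 12,
};

// Illustrative helper (not in the patch): build a VGPR encoding value.
unsigned encodeVgpr(unsigned Idx) { return (Idx & REG_IDX_MASK) | IS_VGPR; }
```

Note that masking an index like 300 with `LO256_REG_IDX_MASK` silently truncates it (300 & 0xff = 44), which is why the emitter keeps the narrow mask only on paths limited to the low 256 registers.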
diff --git a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
index 9b348d46fec4f..357b6f0fe2602 100644
--- a/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIFrameLowering.cpp
@@ -1728,7 +1728,9 @@ void SIFrameLowering::determineCalleeSaves(MachineFunction &MF,
"Whole wave functions can use the reg mapped for their i1 argument");
// FIXME: Be more efficient!
- for (MCRegister Reg : AMDGPU::VGPR_32RegClass)
+ unsigned NumArchVGPRs = ST.has1024AddressableVGPRs() ? 1024 : 256;
+ for (MCRegister Reg :
+ AMDGPU::VGPR_32RegClass.getRegisters().take_front(NumArchVGPRs))
if (MF.getRegInfo().isPhysRegModified(Reg)) {
MFI->reserveWWMRegister(Reg);
MF.begin()->addLiveIn(Reg);
diff --git a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
index 2f5a2bc31caf7..1332ef60e15d1 100644
--- a/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
+++ b/llvm/lib/Target/AMDGPU/SIISelLowering.cpp
@@ -16916,7 +16916,7 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
switch (BitWidth) {
case 16:
RC = Subtarget->useRealTrue16Insts() ? &AMDGPU::VGPR_16RegClass
- : &AMDGPU::VGPR_32RegClass;
+ : &AMDGPU::VGPR_32_Lo256RegClass;
break;
default:
RC = TRI->getVGPRClassForBitWidth(BitWidth);
@@ -16963,7 +16963,7 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
auto [Kind, Idx, NumRegs] = AMDGPU::parseAsmConstraintPhysReg(Constraint);
if (Kind != '\0') {
if (Kind == 'v') {
- RC = &AMDGPU::VGPR_32RegClass;
+ RC = &AMDGPU::VGPR_32_Lo256RegClass;
} else if (Kind == 's') {
RC = &AMDGPU::SGPR_32RegClass;
} else if (Kind == 'a') {
@@ -17005,6 +17005,7 @@ SITargetLowering::getRegForInlineAsmConstraint(const TargetRegisterInfo *TRI_,
return std::pair(0U, nullptr);
if (Idx < RC->getNumRegs())
return std::pair(RC->getRegister(Idx), RC);
+ return std::pair(0U, nullptr);
}
}
diff --git a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
index e3a2efdd3856f..b163a274396ff 100644
--- a/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
+++ b/llvm/lib/Target/AMDGPU/SIInsertWaitcnts.cpp
@@ -152,7 +152,7 @@ static constexpr StringLiteral WaitEventTypeName[] = {
// We reserve a fixed number of VGPR slots in the scoring tables for
// special tokens like SCMEM_LDS (needed for buffer load to LDS).
enum RegisterMapping {
- SQ_MAX_PGM_VGPRS = 1024, // Maximum programmable VGPRs across all targets.
+ SQ_MAX_PGM_VGPRS = 2048, // Maximum programmable VGPRs across all targets.
AGPR_OFFSET = 512, // Maximum programmable ArchVGPRs across all targets.
SQ_MAX_PGM_SGPRS = 128, // Maximum programmable SGPRs across all targets.
// Artificial register slots to track LDS writes into specific LDS locations
@@ -831,7 +831,6 @@ RegInterval WaitcntBrackets::getRegInterval(const MachineInstr *MI,
MCRegister MCReg = AMDGPU::getMCReg(Op.getReg(), *Context->ST);
unsigned RegIdx = TRI->getHWRegIndex(MCReg);
- assert(isUInt<8>(RegIdx));
const TargetRegisterClass *RC = TRI->getPhysRegBaseClass(Op.getReg());
unsigned Size = TRI->getRegSizeInBits(*RC);
@@ -839,7 +838,7 @@ RegInterval WaitcntBrackets::getRegInterval(const MachineInstr *MI,
// AGPRs/VGPRs are tracked every 16 bits, SGPRs by 32 bits
if (TRI->isVectorRegister(*MRI, Op.getReg())) {
unsigned Reg = RegIdx << 1 | (AMDGPU::isHi16Reg(MCReg, *TRI) ? 1 : 0);
- assert(Reg < AGPR_OFFSET);
+ assert(!Context->ST->hasMAIInsts() || Reg < AGPR_OFFSET);
Result.first = Reg;
if (TRI->isAGPR(*MRI, Op.getReg()))
Result.first += AGPR_OFFSET;
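The waitcnt scoring-table change doubles `SQ_MAX_PGM_VGPRS` to 2048 because vector registers are tracked per 16-bit half: the slot index is the register index doubled, with the low bit selecting the hi16 half, and AGPRs offset by `AGPR_OFFSET`. A sketch of that mapping (`slotFor` is an illustrative name; the real code folds this into `getRegInterval`):

```cpp
#include <cassert>

// Sketch of the scoring-table slot mapping: one slot per 16-bit half of
// each vector register, AGPR slots shifted by AGPR_OFFSET. With 1024
// ArchVGPRs, v1023.h lands at slot 2047, hence SQ_MAX_PGM_VGPRS = 2048.
enum { AGPR_OFFSET = 512 };

unsigned slotFor(unsigned RegIdx, bool IsHi16, bool IsAGPR) {
  unsigned Slot = (RegIdx << 1) | (IsHi16 ? 1u : 0u);
  if (IsAGPR)
    Slot += AGPR_OFFSET;
  return Slot;
}
```

This also explains the relaxed assertion: on a target with 1024 VGPRs and no MAI instructions, VGPR slots legitimately exceed `AGPR_OFFSET`.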
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
index d86f9a016d743..a1fcf26eab27b 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.cpp
@@ -3273,6 +3273,10 @@ StringRef SIRegisterInfo::getRegAsmName(MCRegister Reg) const {
return AMDGPUInstPrinter::getRegisterName(Reg);
}
+unsigned SIRegisterInfo::getHWRegIndex(MCRegister Reg) const {
+ return getEncodingValue(Reg) & AMDGPU::HWEncoding::REG_IDX_MASK;
+}
+
unsigned AMDGPU::getRegBitWidth(const TargetRegisterClass &RC) {
return getRegBitWidth(RC.getID());
}
@@ -3353,6 +3357,40 @@ SIRegisterInfo::getVGPRClassForBitWidth(unsigned BitWidth) const {
: getAnyVGPRClassForBitWidth(BitWidth);
}
+const TargetRegisterClass *
+SIRegisterInfo::getAlignedLo256VGPRClassForBitWidth(unsigned BitWidth) const {
+ if (BitWidth <= 32)
+ return &AMDGPU::VGPR_32_Lo256RegClass;
+ if (BitWidth <= 64)
+ return &AMDGPU::VReg_64_Lo256_Align2RegClass;
+ if (BitWidth <= 96)
+ return &AMDGPU::VReg_96_Lo256_Align2RegClass;
+ if (BitWidth <= 128)
+ return &AMDGPU::VReg_128_Lo256_Align2RegClass;
+ if (BitWidth <= 160)
+ return &AMDGPU::VReg_160_Lo256_Align2RegClass;
+ if (BitWidth <= 192)
+ return &AMDGPU::VReg_192_Lo256_Align2RegClass;
+ if (BitWidth <= 224)
+ return &AMDGPU::VReg_224_Lo256_Align2RegClass;
+ if (BitWidth <= 256)
+ return &AMDGPU::VReg_256_Lo256_Align2RegClass;
+ if (BitWidth <= 288)
+ return &AMDGPU::VReg_288_Lo256_Align2RegClass;
+ if (BitWidth <= 320)
+ return &AMDGPU::VReg_320_Lo256_Align2RegClass;
+ if (BitWidth <= 352)
+ return &AMDGPU::VReg_352_Lo256_Align2RegClass;
+ if (BitWidth <= 384)
+ return &AMDGPU::VReg_384_Lo256_Align2RegClass;
+ if (BitWidth <= 512)
+ return &AMDGPU::VReg_512_Lo256_Align2RegClass;
+ if (BitWidth <= 1024)
+ return &AMDGPU::VReg_1024_Lo256_Align2RegClass;
+
+ return nullptr;
+}
+
static const TargetRegisterClass *
getAnyAGPRClassForBitWidth(unsigned BitWidth) {
if (BitWidth == 64)
@@ -3547,7 +3585,17 @@ bool SIRegisterInfo::isSGPRReg(const MachineRegisterInfo &MRI,
const TargetRegisterClass *
SIRegisterInfo::getEquivalentVGPRClass(const TargetRegisterClass *SRC) const {
unsigned Size = getRegSizeInBits(*SRC);
- const TargetRegisterClass *VRC = getVGPRClassForBitWidth(Size);
+
+ switch (SRC->getID()) {
+ default:
+ break;
+ case AMDGPU::VS_32_Lo256RegClassID:
+ case AMDGPU::VS_64_Lo256RegClassID:
+ return getAllocatableClass(getAlignedLo256VGPRClassForBitWidth(Size));
+ }
+
+ const TargetRegisterClass *VRC =
+ getAllocatableClass(getVGPRClassForBitWidth(Size));
assert(VRC && "Invalid register class size");
return VRC;
}
@@ -4005,7 +4053,12 @@ SIRegisterInfo::getSubRegAlignmentNumBits(const TargetRegisterClass *RC,
unsigned SIRegisterInfo::getNumUsedPhysRegs(const MachineRegisterInfo &MRI,
const TargetRegisterClass &RC,
bool IncludeCalls) const {
- for (MCPhysReg Reg : reverse(RC.getRegisters()))
+ unsigned NumArchVGPRs = ST.has1024AddressableVGPRs() ? 1024 : 256;
+ ArrayRef<MCPhysReg> Registers =
+ (RC.getID() == AMDGPU::VGPR_32RegClassID)
+ ? RC.getRegisters().take_front(NumArchVGPRs)
+ : RC.getRegisters();
+ for (MCPhysReg Reg : reverse(Registers))
if (MRI.isPhysRegUsed(Reg, /*SkipRegMaskTest=*/!IncludeCalls))
return getHWRegIndex(Reg) + 1;
return 0;
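`getNumUsedPhysRegs` above scans a register list from the top and reports one past the highest used index, now trimming `VGPR_32` to the addressable prefix first. A sketch of the scan with a plain bool array standing in for `MachineRegisterInfo` (names and the array form are assumptions for illustration):

```cpp
#include <cassert>

// Sketch: walk the (possibly trimmed) register list from the highest
// index down and return index + 1 of the first used register, or 0 if
// none are used.
unsigned numUsed(const bool *Used, unsigned Count) {
  for (unsigned I = Count; I-- > 0;)
    if (Used[I])
      return I + 1;
  return 0;
}
```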
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 5508f07b1b5ff..eeefef1116aa3 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -200,13 +200,14 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
StringRef getRegAsmName(MCRegister Reg) const override;
// Pseudo regs are not allowed
- unsigned getHWRegIndex(MCRegister Reg) const {
- return getEncodingValue(Reg) & 0xff;
- }
+ unsigned getHWRegIndex(MCRegister Reg) const;
LLVM_READONLY
const TargetRegisterClass *getVGPRClassForBitWidth(unsigned BitWidth) const;
+ LLVM_READONLY const TargetRegisterClass *
+ getAlignedLo256VGPRClassForBitWidth(unsigned BitWidth) const;
+
LLVM_READONLY
const TargetRegisterClass *getAGPRClassForBitWidth(unsigned BitWidth) const;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 9d5b3560074ac..1aa3f5d87f729 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -76,17 +76,17 @@ class SIRegisterTuples<list<SubRegIndex> Indices, RegisterClass RC,
//===----------------------------------------------------------------------===//
// Declarations that describe the SI registers
//===----------------------------------------------------------------------===//
-class SIReg <string n, bits<8> regIdx = 0, bit isVGPR = 0,
+class SIReg <string n, bits<10> regIdx = 0, bit isVGPR = 0,
bit isAGPR = 0, bit isHi16 = 0> : Register<n> {
let Namespace = "AMDGPU";
// These are generic helper values we use to form actual register
// codes. They should not be assumed to match any particular register
// encodings on any particular subtargets.
- let HWEncoding{7-0} = regIdx;
- let HWEncoding{8} = isVGPR;
- let HWEncoding{9} = isAGPR;
- let HWEncoding{10} = isHi16;
+ let HWEncoding{9-0} = regIdx;
+ let HWEncoding{10} = isVGPR;
+ let HWEncoding{11} = isAGPR;
+ let HWEncoding{12} = isHi16;
int Index = !cast<int>(regIdx);
}
@@ -128,7 +128,7 @@ class SIRegisterClass <string n, list<ValueType> rTypes, int Align, dag rList>
}
-multiclass SIRegLoHi16 <string n, bits<8> regIdx, bit ArtificialHigh = 1,
+multiclass SIRegLoHi16 <string n, bits<10> regIdx, bit ArtificialHigh = 1,
bit isVGPR = 0, bit isAGPR = 0,
list<int> DwarfEncodings = [-1, -1]> {
def _LO16 : SIReg<n#".l", regIdx, isVGPR, isAGPR>;
@@ -142,9 +142,10 @@ multiclass SIRegLoHi16 <string n, bits<8> regIdx, bit ArtificialHigh = 1,
let Namespace = "AMDGPU";
let SubRegIndices = [lo16, hi16];
let CoveredBySubRegs = !not(ArtificialHigh);
- let HWEncoding{7-0} = regIdx;
- let HWEncoding{8} = isVGPR;
- let HWEncoding{9} = isAGPR;
+
+ let HWEncoding{9-0} = regIdx;
+ let HWEncoding{10} = isVGPR;
+ let HWEncoding{11} = isAGPR;
int Index = !cast<int>(regIdx);
}
@@ -225,7 +226,7 @@ def SGPR_NULL64 :
// the high 32 bits. The lower 32 bits are always zero (for base) or
// -1 (for limit). Since we cannot access the high 32 bits, when we
// need them, we need to do a 64 bit load and extract the bits manually.
-multiclass ApertureRegister<string name, bits<8> regIdx> {
+multiclass ApertureRegister<string name, bits<10> regIdx> {
let isConstant = true in {
// FIXME: We shouldn't need to define subregisters for these (nor add them to any 16 bit
// register classes), but if we don't it seems to confuse the TableGen
@@ -313,7 +314,7 @@ foreach Index = 0...15 in {
defm TTMP#Index : SIRegLoHi16<"ttmp"#Index, 0>;
}
-multiclass FLAT_SCR_LOHI_m <string n, bits<8> ci_e, bits<8> vi_e> {
+multiclass FLAT_SCR_LOHI_m <string n, bits<10> ci_e, bits<10> vi_e> {
defm _ci : SIRegLoHi16<n, ci_e>;
defm _vi : SIRegLoHi16<n, vi_e>;
defm "" : SIRegLoHi16<n, 0>;
@@ -343,11 +344,12 @@ foreach Index = 0...105 in {
}
// VGPR registers
-foreach Index = 0...255 in {
+foreach Index = 0...1023 in {
defm VGPR#Index :
SIRegLoHi16 <"v"#Index, Index, /*ArtificialHigh=*/ 0,
/*isVGPR=*/ 1, /*isAGPR=*/ 0, /*DwarfEncodings=*/
- [!add(Index, 2560), !add(Index, 1536)]>;
+ [!if(!le(Index, 511), !add(Index, 2560), -1),
+ !if(!le(Index, 511), !add(Index, 1536), !add(Index, !sub(3584, 512)))]>;
}
// AccVGPR registers
@@ -604,15 +606,15 @@ def Reg512Types : Re...
[truncated]
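The VGPR `foreach` in the `SIRegisterInfo.td` hunk assigns each register two DWARF numbers via `!if`: the first column keeps its +2560 base only for v0-v511 and is -1 above that, while the second column uses +1536 for v0-v511 and rebases v512-v1023 at 3584. The patch does not say which column is wave32 versus wave64, so the helpers below are named by position only (a sketch, not the TableGen semantics):

```cpp
#include <cassert>

// Sketch of the two DWARF-number columns from the TableGen !if expressions.
int dwarfFirst(int Index) { return Index <= 511 ? Index + 2560 : -1; }
int dwarfSecond(int Index) {
  // v512 onward is rebased: 3584 - 512 = 3072, so v512 -> 3584.
  return Index <= 511 ? Index + 1536 : Index + (3584 - 512);
}
```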
@llvm/pr-subscribers-backend-amdgpu

Author: Stanislav Mekhanoshin (rampitec)
+ }
+
+ const TargetRegisterClass *VRC =
+ getAllocatableClass(getVGPRClassForBitWidth(Size));
assert(VRC && "Invalid register class size");
return VRC;
}
@@ -4005,7 +4053,12 @@ SIRegisterInfo::getSubRegAlignmentNumBits(const TargetRegisterClass *RC,
unsigned SIRegisterInfo::getNumUsedPhysRegs(const MachineRegisterInfo &MRI,
const TargetRegisterClass &RC,
bool IncludeCalls) const {
- for (MCPhysReg Reg : reverse(RC.getRegisters()))
+ unsigned NumArchVGPRs = ST.has1024AddressableVGPRs() ? 1024 : 256;
+ ArrayRef<MCPhysReg> Registers =
+ (RC.getID() == AMDGPU::VGPR_32RegClassID)
+ ? RC.getRegisters().take_front(NumArchVGPRs)
+ : RC.getRegisters();
+ for (MCPhysReg Reg : reverse(Registers))
if (MRI.isPhysRegUsed(Reg, /*SkipRegMaskTest=*/!IncludeCalls))
return getHWRegIndex(Reg) + 1;
return 0;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
index 5508f07b1b5ff..eeefef1116aa3 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.h
@@ -200,13 +200,14 @@ class SIRegisterInfo final : public AMDGPUGenRegisterInfo {
StringRef getRegAsmName(MCRegister Reg) const override;
// Pseudo regs are not allowed
- unsigned getHWRegIndex(MCRegister Reg) const {
- return getEncodingValue(Reg) & 0xff;
- }
+ unsigned getHWRegIndex(MCRegister Reg) const;
LLVM_READONLY
const TargetRegisterClass *getVGPRClassForBitWidth(unsigned BitWidth) const;
+ LLVM_READONLY const TargetRegisterClass *
+ getAlignedLo256VGPRClassForBitWidth(unsigned BitWidth) const;
+
LLVM_READONLY
const TargetRegisterClass *getAGPRClassForBitWidth(unsigned BitWidth) const;
diff --git a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
index 9d5b3560074ac..1aa3f5d87f729 100644
--- a/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
+++ b/llvm/lib/Target/AMDGPU/SIRegisterInfo.td
@@ -76,17 +76,17 @@ class SIRegisterTuples<list<SubRegIndex> Indices, RegisterClass RC,
//===----------------------------------------------------------------------===//
// Declarations that describe the SI registers
//===----------------------------------------------------------------------===//
-class SIReg <string n, bits<8> regIdx = 0, bit isVGPR = 0,
+class SIReg <string n, bits<10> regIdx = 0, bit isVGPR = 0,
bit isAGPR = 0, bit isHi16 = 0> : Register<n> {
let Namespace = "AMDGPU";
// These are generic helper values we use to form actual register
// codes. They should not be assumed to match any particular register
// encodings on any particular subtargets.
- let HWEncoding{7-0} = regIdx;
- let HWEncoding{8} = isVGPR;
- let HWEncoding{9} = isAGPR;
- let HWEncoding{10} = isHi16;
+ let HWEncoding{9-0} = regIdx;
+ let HWEncoding{10} = isVGPR;
+ let HWEncoding{11} = isAGPR;
+ let HWEncoding{12} = isHi16;
int Index = !cast<int>(regIdx);
}
@@ -128,7 +128,7 @@ class SIRegisterClass <string n, list<ValueType> rTypes, int Align, dag rList>
}
-multiclass SIRegLoHi16 <string n, bits<8> regIdx, bit ArtificialHigh = 1,
+multiclass SIRegLoHi16 <string n, bits<10> regIdx, bit ArtificialHigh = 1,
bit isVGPR = 0, bit isAGPR = 0,
list<int> DwarfEncodings = [-1, -1]> {
def _LO16 : SIReg<n#".l", regIdx, isVGPR, isAGPR>;
@@ -142,9 +142,10 @@ multiclass SIRegLoHi16 <string n, bits<8> regIdx, bit ArtificialHigh = 1,
let Namespace = "AMDGPU";
let SubRegIndices = [lo16, hi16];
let CoveredBySubRegs = !not(ArtificialHigh);
- let HWEncoding{7-0} = regIdx;
- let HWEncoding{8} = isVGPR;
- let HWEncoding{9} = isAGPR;
+
+ let HWEncoding{9-0} = regIdx;
+ let HWEncoding{10} = isVGPR;
+ let HWEncoding{11} = isAGPR;
int Index = !cast<int>(regIdx);
}
@@ -225,7 +226,7 @@ def SGPR_NULL64 :
// the high 32 bits. The lower 32 bits are always zero (for base) or
// -1 (for limit). Since we cannot access the high 32 bits, when we
// need them, we need to do a 64 bit load and extract the bits manually.
-multiclass ApertureRegister<string name, bits<8> regIdx> {
+multiclass ApertureRegister<string name, bits<10> regIdx> {
let isConstant = true in {
// FIXME: We shouldn't need to define subregisters for these (nor add them to any 16 bit
// register classes), but if we don't it seems to confuse the TableGen
@@ -313,7 +314,7 @@ foreach Index = 0...15 in {
defm TTMP#Index : SIRegLoHi16<"ttmp"#Index, 0>;
}
-multiclass FLAT_SCR_LOHI_m <string n, bits<8> ci_e, bits<8> vi_e> {
+multiclass FLAT_SCR_LOHI_m <string n, bits<10> ci_e, bits<10> vi_e> {
defm _ci : SIRegLoHi16<n, ci_e>;
defm _vi : SIRegLoHi16<n, vi_e>;
defm "" : SIRegLoHi16<n, 0>;
@@ -343,11 +344,12 @@ foreach Index = 0...105 in {
}
// VGPR registers
-foreach Index = 0...255 in {
+foreach Index = 0...1023 in {
defm VGPR#Index :
SIRegLoHi16 <"v"#Index, Index, /*ArtificialHigh=*/ 0,
/*isVGPR=*/ 1, /*isAGPR=*/ 0, /*DwarfEncodings=*/
- [!add(Index, 2560), !add(Index, 1536)]>;
+ [!if(!le(Index, 511), !add(Index, 2560), -1),
+ !if(!le(Index, 511), !add(Index, 1536), !add(Index, !sub(3584, 512)))]>;
}
// AccVGPR registers
@@ -604,15 +606,15 @@ def Reg512Types : Re...
[truncated]
This is baseline support; it is not usable yet.
Force-pushed: 1cf9107 to b880853 (Compare)
LLVM Buildbot has detected a new failure on a builder. Full details are available at: https://lab.llvm.org/buildbot/#/builders/200/builds/16427. Here is the relevant piece of the build log for reference:
Clean and reconfigure. It is solved by GLIBCXX_USE_CXX11_ABI=ON in the CMake configuration. All clean builds are fast.
* main: (1483 commits)
  [clang] fix error recovery for invalid nested name specifiers (llvm#156772)
  Revert "[lldb] Add count for errors of DWO files in statistics and combine DWO file count functions" (llvm#156777)
  AMDGPU: Add agpr variants of multi-data DS instructions (llvm#156420)
  [libc][NFC] disable localtime on aarch64/baremetal (llvm#156776)
  [win/asan] Improve SharedReAlloc with HEAP_REALLOC_IN_PLACE_ONLY. (llvm#132558)
  [LLDB] Make internal shell the default for running LLDB lit tests. (llvm#156729)
  [lldb][debugserver] Max response size for qSpeedTest (llvm#156099)
  [AMDGPU] Define 1024 VGPRs on gfx1250 (llvm#156765)
  [flang] Check for BIND(C) name conflicts with alternate entries (llvm#156563)
  [RISCV] Add exhausted_gprs_fprs test to calling-conv-half.ll. NFC (llvm#156586)
  [NFC] Remove trailing whitespaces from `clang/include/clang/Basic/AttrDocs.td`
  [lldb] Mark scripted frames as synthetic instead of artificial (llvm#153117)
  [docs] Refine some of the wording in the quality developer policy (llvm#156555)
  [MLIR] Apply clang-tidy fixes for readability-identifier-naming in TransformOps.cpp (NFC)
  [MLIR] Add LDBG() tracing to VectorTransferOpTransforms.cpp (NFC)
  [NFC] Apply clang-format to PPCInstrFutureMMA.td (llvm#156749)
  [libc] implement template functions for localtime (llvm#110363)
  [llvm-objcopy][COFF] Update .symidx values after stripping (llvm#153322)
  Add documentation on debugging LLVM.
  [lldb] Add count for errors of DWO files in statistics and combine DWO file count functions (llvm#155023)
  ...
This is baseline support; it is not usable yet.