Arm64EC entry/exit thunks, consolidated. #79067
Conversation
@llvm/pr-subscribers-backend-aarch64 @llvm/pr-subscribers-clang-codegen

Author: Eli Friedman (efriedma-quic)

Changes

This combines the previously posted patches with some additional work I've done to more closely match MSVC output.

Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of AArch64Arm64ECCallLowering is to take the "normal" IR we'd generate for other targets and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs.

The other changes are supporting changes, to handle the new constructs generated by that pass:

- There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before then.
- There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native.
- There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX.

I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here.

I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently.

Thanks to @dpaoliello for extensive testing and suggestions.

(Originally posted as https://reviews.llvm.org/D157547 .)

Patch is 130.00 KiB, truncated to 20.00 KiB below; full version: https://github.com/llvm/llvm-project/pull/79067.diff

31 Files Affected:
diff --git a/clang/lib/CodeGen/CGCXX.cpp b/clang/lib/CodeGen/CGCXX.cpp
index 110e21f7cb6d19f..e95a735f92f74b5 100644
--- a/clang/lib/CodeGen/CGCXX.cpp
+++ b/clang/lib/CodeGen/CGCXX.cpp
@@ -40,6 +40,11 @@ bool CodeGenModule::TryEmitBaseDestructorAsAlias(const CXXDestructorDecl *D) {
if (getCodeGenOpts().OptimizationLevel == 0)
return true;
+ // Disable this optimization for ARM64EC. FIXME: This probably should work,
+ // but getting the symbol table correct is complicated.
+ if (getTarget().getTriple().isWindowsArm64EC())
+ return true;
+
// If sanitizing memory to check for use-after-dtor, do not emit as
// an alias, unless this class owns no members.
if (getCodeGenOpts().SanitizeMemoryUseAfterDtor &&
diff --git a/llvm/include/llvm/IR/CallingConv.h b/llvm/include/llvm/IR/CallingConv.h
index 3a522c239ad59eb..bca31b2572eb4b2 100644
--- a/llvm/include/llvm/IR/CallingConv.h
+++ b/llvm/include/llvm/IR/CallingConv.h
@@ -251,6 +251,16 @@ namespace CallingConv {
/// Used by GraalVM. Two additional registers are reserved.
GRAAL = 107,
+ /// Calling convention used in the ARM64EC ABI to implement calls between
+ /// x64 code and thunks. This is basically the x64 calling convention using
+ /// ARM64 register names. The first parameter is mapped to x9.
+ ARM64EC_Thunk_X64 = 108,
+
+ /// Calling convention used in the ARM64EC ABI to implement calls between
+ /// ARM64 code and thunks. This is just the ARM64 calling convention,
+ /// except that the first parameter is mapped to x9.
+ ARM64EC_Thunk_Native = 109,
+
/// The highest possible ID. Must be some 2^k - 1.
MaxID = 1023
};
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 7df1c82bf357f60..29da2b1c29f837e 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -2892,6 +2892,39 @@ bool AsmPrinter::emitSpecialLLVMGlobal(const GlobalVariable *GV) {
GV->hasAvailableExternallyLinkage())
return true;
+ if (GV->getName() == "llvm.arm64ec.symbolmap") {
+ // For ARM64EC, print the table that maps between symbols and the
+ // corresponding thunks to translate between x64 and AArch64 code.
+ // This table is generated by AArch64Arm64ECCallLowering.
+ OutStreamer->switchSection(OutContext.getCOFFSection(
+ ".hybmp$x", COFF::IMAGE_SCN_LNK_INFO, SectionKind::getMetadata()));
+ auto *Arr = cast<ConstantArray>(GV->getInitializer());
+ for (auto &U : Arr->operands()) {
+ auto *C = cast<Constant>(U);
+ auto *Src = cast<Function>(C->getOperand(0)->stripPointerCasts());
+ auto *Dst = cast<Function>(C->getOperand(1)->stripPointerCasts());
+ int Kind = cast<ConstantInt>(C->getOperand(2))->getZExtValue();
+
+ if (Src->hasDLLImportStorageClass()) {
+ // For now, we assume dllimport functions aren't directly called.
+ // (We might change this later to match MSVC.)
+ OutStreamer->emitCOFFSymbolIndex(
+ OutContext.getOrCreateSymbol("__imp_" + Src->getName()));
+ OutStreamer->emitCOFFSymbolIndex(getSymbol(Dst));
+ OutStreamer->emitInt32(Kind);
+ } else {
+ // FIXME: For non-dllimport functions, MSVC emits the same entry
+ // twice, for reasons I don't understand. I have to assume the linker
+ // ignores the redundant entry; there aren't any reasonable semantics
+ // to attach to it.
+ OutStreamer->emitCOFFSymbolIndex(getSymbol(Src));
+ OutStreamer->emitCOFFSymbolIndex(getSymbol(Dst));
+ OutStreamer->emitInt32(Kind);
+ }
+ }
+ return true;
+ }
+
if (!GV->hasAppendingLinkage()) return false;
assert(GV->hasInitializer() && "Not a special LLVM global!");
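To illustrate the shape of the table this AsmPrinter code walks, here is a hypothetical sketch of an llvm.arm64ec.symbolmap global. The per-entry layout (source function, thunk, integer kind) is inferred from the getOperand(0..2) accesses above; the symbol names, linkage, and kind value are illustrative assumptions, not taken from this patch:

```llvm
; Hypothetical example only: one (src, dst, kind) entry, matching the
; three operand accesses in the AsmPrinter loop above. The function
; names, the linkage, and the kind value 1 are illustrative guesses.
@llvm.arm64ec.symbolmap = appending global [1 x { ptr, ptr, i32 }]
    [{ ptr, ptr, i32 } { ptr @some_func,
                         ptr @"$ientry_thunk$cdecl$v$v",
                         i32 1 }]
```

The AsmPrinter then emits each entry into .hybmp$x as two COFF symbol-table indexes followed by the 32-bit kind, which is why this can only happen once symbol indexes exist.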
diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index d20ef63a72e8f62..f7d81f42ef5d8ee 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -71,6 +71,7 @@ FunctionPass *createAArch64PostSelectOptimize();
FunctionPass *createAArch64StackTaggingPass(bool IsOptNone);
FunctionPass *createAArch64StackTaggingPreRAPass();
ModulePass *createAArch64GlobalsTaggingPass();
+ModulePass *createAArch64Arm64ECCallLoweringPass();
void initializeAArch64A53Fix835769Pass(PassRegistry&);
void initializeAArch64A57FPLoadBalancingPass(PassRegistry&);
@@ -109,6 +110,7 @@ void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);
void initializeLDTLSCleanupPass(PassRegistry&);
void initializeSMEABIPass(PassRegistry &);
void initializeSVEIntrinsicOptsPass(PassRegistry &);
+void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
} // end namespace llvm
#endif
diff --git a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
new file mode 100644
index 000000000000000..11248bb7aef31f2
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
@@ -0,0 +1,769 @@
+//===-- AArch64Arm64ECCallLowering.cpp - Lower Arm64EC calls ----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains the IR transform to lower external or indirect calls for
+/// the ARM64EC calling convention. Such calls must go through the runtime, so
+/// we can translate the calling convention for calls into the emulator.
+///
+/// This subsumes Control Flow Guard handling.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AArch64.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/IR/CallingConv.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/TargetParser/Triple.h"
+
+using namespace llvm;
+
+using OperandBundleDef = OperandBundleDefT<Value *>;
+
+#define DEBUG_TYPE "arm64eccalllowering"
+
+STATISTIC(Arm64ECCallsLowered, "Number of Arm64EC calls lowered");
+
+static cl::opt<bool> LowerDirectToIndirect("arm64ec-lower-direct-to-indirect",
+ cl::Hidden, cl::init(true));
+static cl::opt<bool> GenerateThunks("arm64ec-generate-thunks", cl::Hidden,
+ cl::init(true));
+
+namespace {
+
+class AArch64Arm64ECCallLowering : public ModulePass {
+public:
+ static char ID;
+ AArch64Arm64ECCallLowering() : ModulePass(ID) {
+ initializeAArch64Arm64ECCallLoweringPass(*PassRegistry::getPassRegistry());
+ }
+
+ Function *buildExitThunk(FunctionType *FnTy, AttributeList Attrs);
+ Function *buildEntryThunk(Function *F);
+ void lowerCall(CallBase *CB);
+ Function *buildGuestExitThunk(Function *F);
+ bool processFunction(Function &F, SetVector<Function *> &DirectCalledFns);
+ bool runOnModule(Module &M) override;
+
+private:
+ int cfguard_module_flag = 0;
+ FunctionType *GuardFnType = nullptr;
+ PointerType *GuardFnPtrType = nullptr;
+ Constant *GuardFnCFGlobal = nullptr;
+ Constant *GuardFnGlobal = nullptr;
+ Module *M = nullptr;
+
+ Type *PtrTy;
+ Type *I64Ty;
+ Type *VoidTy;
+
+ void getThunkType(FunctionType *FT, AttributeList AttrList, bool EntryThunk,
+ raw_ostream &Out, FunctionType *&Arm64Ty,
+ FunctionType *&X64Ty);
+ void getThunkRetType(FunctionType *FT, AttributeList AttrList,
+ raw_ostream &Out, Type *&Arm64RetTy, Type *&X64RetTy,
+ SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool &HasSretPtr);
+ void getThunkArgTypes(FunctionType *FT, AttributeList AttrList,
+ raw_ostream &Out,
+ SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool HasSretPtr);
+ void canonicalizeThunkType(Type *T, Align Alignment, bool Ret,
+ uint64_t ArgSizeBytes, raw_ostream &Out,
+ Type *&Arm64Ty, Type *&X64Ty);
+};
+
+} // end anonymous namespace
+
+void AArch64Arm64ECCallLowering::getThunkType(FunctionType *FT,
+ AttributeList AttrList,
+ bool EntryThunk, raw_ostream &Out,
+ FunctionType *&Arm64Ty,
+ FunctionType *&X64Ty) {
+ Out << (EntryThunk ? "$ientry_thunk$cdecl$" : "$iexit_thunk$cdecl$");
+
+ Type *Arm64RetTy;
+ Type *X64RetTy;
+
+ SmallVector<Type *> Arm64ArgTypes;
+ SmallVector<Type *> X64ArgTypes;
+
+ // The first argument to a thunk is the called function, stored in x9.
+ // For exit thunks, we pass the called function down to the emulator;
+ // for entry thunks, we just call the Arm64 function directly.
+ if (!EntryThunk)
+ Arm64ArgTypes.push_back(PtrTy);
+ X64ArgTypes.push_back(PtrTy);
+
+ bool HasSretPtr = false;
+ getThunkRetType(FT, AttrList, Out, Arm64RetTy, X64RetTy, Arm64ArgTypes,
+ X64ArgTypes, HasSretPtr);
+
+ getThunkArgTypes(FT, AttrList, Out, Arm64ArgTypes, X64ArgTypes, HasSretPtr);
+
+ Arm64Ty = FunctionType::get(Arm64RetTy, Arm64ArgTypes, false);
+ X64Ty = FunctionType::get(X64RetTy, X64ArgTypes, false);
+}
+
+void AArch64Arm64ECCallLowering::getThunkArgTypes(
+ FunctionType *FT, AttributeList AttrList, raw_ostream &Out,
+ SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool HasSretPtr) {
+
+ Out << "$";
+ if (FT->isVarArg()) {
+ // We treat the variadic function's thunk as a normal function
+ // with the following type on the ARM side:
+ // rettype exitthunk(
+ // ptr x9, ptr x0, i64 x1, i64 x2, i64 x3, ptr x4, i64 x5)
+ //
+ // which can cover all types of variadic functions.
+ // As in a normal exit thunk, x9 holds the called function.
+ // x0-x3 are the arguments passed in registers.
+ // x4 is the address of the arguments on the stack.
+ // x5 is the size of the arguments on the stack.
+ //
+ // On the x64 side, it's the same except that x5 isn't set.
+ //
+ // If both the ARM and X64 sides are sret, there are only three
+ // arguments in registers.
+ //
+ // If the X64 side is sret, but the ARM side isn't, we pass an extra value
+ // to/from the X64 side, and let SelectionDAG transform it into a memory
+ // location.
+ Out << "varargs";
+
+ // x0-x3
+ for (int i = HasSretPtr ? 1 : 0; i < 4; i++) {
+ Arm64ArgTypes.push_back(I64Ty);
+ X64ArgTypes.push_back(I64Ty);
+ }
+
+ // x4
+ Arm64ArgTypes.push_back(PtrTy);
+ X64ArgTypes.push_back(PtrTy);
+ // x5
+ Arm64ArgTypes.push_back(I64Ty);
+ // FIXME: x5 isn't actually passed/used by the x64 side; revisit once we
+ // have proper isel for varargs
+ X64ArgTypes.push_back(I64Ty);
+ return;
+ }
+
+ unsigned I = 0;
+ if (HasSretPtr)
+ I++;
+
+ if (I == FT->getNumParams()) {
+ Out << "v";
+ return;
+ }
+
+ for (unsigned E = FT->getNumParams(); I != E; ++I) {
+ Align ParamAlign = AttrList.getParamAlignment(I).valueOrOne();
+#if 0
+ // FIXME: Need more information about argument size; see
+ // https://reviews.llvm.org/D132926
+ uint64_t ArgSizeBytes = AttrList.getParamArm64ECArgSizeBytes(I);
+#else
+ uint64_t ArgSizeBytes = 0;
+#endif
+ Type *Arm64Ty, *X64Ty;
+ canonicalizeThunkType(FT->getParamType(I), ParamAlign,
+ /*Ret*/ false, ArgSizeBytes, Out, Arm64Ty, X64Ty);
+ Arm64ArgTypes.push_back(Arm64Ty);
+ X64ArgTypes.push_back(X64Ty);
+ }
+}
+
+void AArch64Arm64ECCallLowering::getThunkRetType(
+ FunctionType *FT, AttributeList AttrList, raw_ostream &Out,
+ Type *&Arm64RetTy, Type *&X64RetTy, SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool &HasSretPtr) {
+ Type *T = FT->getReturnType();
+#if 0
+ // FIXME: Need more information about argument size; see
+ // https://reviews.llvm.org/D132926
+ uint64_t ArgSizeBytes = AttrList.getRetArm64ECArgSizeBytes();
+#else
+ int64_t ArgSizeBytes = 0;
+#endif
+ if (T->isVoidTy()) {
+ if (FT->getNumParams()) {
+ auto SRetAttr = AttrList.getParamAttr(0, Attribute::StructRet);
+ auto InRegAttr = AttrList.getParamAttr(0, Attribute::InReg);
+ if (SRetAttr.isValid() && InRegAttr.isValid()) {
+ // sret+inreg indicates a call that returns a C++ class value. This is
+ // actually equivalent to just passing and returning a void* pointer
+ // as the first argument. Translate it that way, instead of trying
+ // to model "inreg" in the thunk's calling convention, to simplify
+ // the rest of the code.
+ Out << "i8";
+ Arm64RetTy = I64Ty;
+ X64RetTy = I64Ty;
+ return;
+ }
+ if (SRetAttr.isValid()) {
+ // FIXME: Sanity-check the sret type; if it's an integer or pointer,
+ // we'll get screwy mangling/codegen.
+ // FIXME: For large struct types, mangle as an integer argument and
+ // integer return, so we can reuse more thunks, instead of "m" syntax.
+ // (MSVC mangles this case as an integer return with no argument, but
+ // that's a miscompile.)
+ Type *SRetType = SRetAttr.getValueAsType();
+ Align SRetAlign = AttrList.getParamAlignment(0).valueOrOne();
+ Type *Arm64Ty, *X64Ty;
+ canonicalizeThunkType(SRetType, SRetAlign, /*Ret*/ true, ArgSizeBytes,
+ Out, Arm64Ty, X64Ty);
+ Arm64RetTy = VoidTy;
+ X64RetTy = VoidTy;
+ Arm64ArgTypes.push_back(FT->getParamType(0));
+ X64ArgTypes.push_back(FT->getParamType(0));
+ HasSretPtr = true;
+ return;
+ }
+ }
+
+ Out << "v";
+ Arm64RetTy = VoidTy;
+ X64RetTy = VoidTy;
+ return;
+ }
+
+ canonicalizeThunkType(T, Align(), /*Ret*/ true, ArgSizeBytes, Out, Arm64RetTy,
+ X64RetTy);
+ if (X64RetTy->isPointerTy()) {
+ // If the X64 type is canonicalized to a pointer, that means it's
+ // passed/returned indirectly. For a return value, that means it's an
+ // sret pointer.
+ X64ArgTypes.push_back(X64RetTy);
+ X64RetTy = VoidTy;
+ }
+}
+
+void AArch64Arm64ECCallLowering::canonicalizeThunkType(
+ Type *T, Align Alignment, bool Ret, uint64_t ArgSizeBytes, raw_ostream &Out,
+ Type *&Arm64Ty, Type *&X64Ty) {
+ if (T->isFloatTy()) {
+ Out << "f";
+ Arm64Ty = T;
+ X64Ty = T;
+ return;
+ }
+
+ if (T->isDoubleTy()) {
+ Out << "d";
+ Arm64Ty = T;
+ X64Ty = T;
+ return;
+ }
+
+ if (T->isFloatingPointTy()) {
+ report_fatal_error(
+ "Only 32 and 64 bit floating points are supported for ARM64EC thunks");
+ }
+
+ auto &DL = M->getDataLayout();
+
+ if (auto *StructTy = dyn_cast<StructType>(T))
+ if (StructTy->getNumElements() == 1)
+ T = StructTy->getElementType(0);
+
+ if (T->isArrayTy()) {
+ Type *ElementTy = T->getArrayElementType();
+ uint64_t ElementCnt = T->getArrayNumElements();
+ uint64_t ElementSizePerBytes = DL.getTypeSizeInBits(ElementTy) / 8;
+ uint64_t TotalSizeBytes = ElementCnt * ElementSizePerBytes;
+ if (ElementTy->isFloatTy() || ElementTy->isDoubleTy()) {
+ Out << (ElementTy->isFloatTy() ? "F" : "D") << TotalSizeBytes;
+ if (Alignment.value() >= 8 && !T->isPointerTy())
+ Out << "a" << Alignment.value();
+ Arm64Ty = T;
+ if (TotalSizeBytes <= 8) {
+ // Arm64 returns small structs of float/double in float registers;
+ // X64 uses RAX.
+ X64Ty = llvm::Type::getIntNTy(M->getContext(), TotalSizeBytes * 8);
+ } else {
+ // Struct is passed directly on Arm64, but indirectly on X64.
+ X64Ty = PtrTy;
+ }
+ return;
+ } else if (T->isFloatingPointTy()) {
+ report_fatal_error("Only 32 and 64 bit floating points are supported for "
+ "ARM64EC thunks");
+ }
+ }
+
+ if ((T->isIntegerTy() || T->isPointerTy()) && DL.getTypeSizeInBits(T) <= 64) {
+ Out << "i8";
+ Arm64Ty = I64Ty;
+ X64Ty = I64Ty;
+ return;
+ }
+
+ unsigned TypeSize = ArgSizeBytes;
+ if (TypeSize == 0)
+ TypeSize = DL.getTypeSizeInBits(T) / 8;
+ Out << "m";
+ if (TypeSize != 4)
+ Out << TypeSize;
+ if (Alignment.value() >= 8 && !T->isPointerTy())
+ Out << "a" << Alignment.value();
+ // FIXME: Try to canonicalize Arm64Ty more thoroughly?
+ Arm64Ty = T;
+ if (TypeSize == 1 || TypeSize == 2 || TypeSize == 4 || TypeSize == 8) {
+ // Pass directly in an integer register
+ X64Ty = llvm::Type::getIntNTy(M->getContext(), TypeSize * 8);
+ } else {
+ // Passed directly on Arm64, but indirectly on X64.
+ X64Ty = PtrTy;
+ }
+}
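To make the mangling above concrete, here is a hedged Python sketch of the scalar-type subset of the name mangling implemented by getThunkType, getThunkRetType, and getThunkArgTypes (the helper name is mine, not part of the patch; structs, arrays, varargs, sret, and the "m"/alignment cases are all omitted):

```python
# Sketch of the thunk-name mangling for simple scalar signatures only,
# inferred from the functions above. Assumes every integer/pointer type
# is <= 64 bits, so it canonicalizes to an i64 slot mangled as "i8"
# (8 bytes); "f"/"d"/"v" cover float, double, and void.
SCALAR = {"float": "f", "double": "d", "void": "v"}

def mangle_thunk_name(ret, args, entry=True):
    prefix = "$ientry_thunk$cdecl$" if entry else "$iexit_thunk$cdecl$"
    def one(ty):
        return SCALAR.get(ty, "i8")
    # getThunkArgTypes writes "$" then each argument's mangling, or "v"
    # when there are no (non-sret) parameters.
    arg_part = "".join(one(a) for a in args) if args else "v"
    return prefix + one(ret) + "$" + arg_part

print(mangle_thunk_name("void", []))      # $ientry_thunk$cdecl$v$v
print(mangle_thunk_name("int", ["int"]))  # $ientry_thunk$cdecl$i8$i8
```

Because the name encodes only the canonicalized signature, thunks for ABI-identical signatures share one linkonce_odr definition, which is why buildExitThunk below first checks M->getFunction(ExitThunkName).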
+
+// This function builds the "exit thunk", a function which translates
+// arguments and return values when calling x64 code from AArch64 code.
+Function *AArch64Arm64ECCallLowering::buildExitThunk(FunctionType *FT,
+ AttributeList Attrs) {
+ SmallString<256> ExitThunkName;
+ llvm::raw_svector_ostream ExitThunkStream(ExitThunkName);
+ FunctionType *Arm64Ty, *X64Ty;
+ getThunkType(FT, Attrs, /*EntryThunk*/ false, ExitThunkStream, Arm64Ty,
+ X64Ty);
+ if (Function *F = M->getFunction(ExitThunkName))
+ return F;
+
+ Function *F = Function::Create(Arm64Ty, GlobalValue::LinkOnceODRLinkage, 0,
+ ExitThunkName, M);
+ F->setCallingConv(CallingConv::ARM64EC_Thunk_Native);
+ F->setSection(".wowthk$aa");
+ F->setComdat(M->getOrInsertComdat(ExitThunkName));
+ // Copy MSVC, and always set up a frame pointer. (Maybe this isn't necessary.)
+ F->addFnAttr("frame-pointer", "all");
+ // Only copy sret from the first argument. For C++ instance methods, clang can
+ // stick an sret marking on a later argument, but it doesn't actually affect
+ // the ABI, so we can omit it. This avoids triggering a verifier assertion.
+ if (FT->getNumParams()) {
+ auto SRet = Attrs.getParamAttr(0, Attribute::StructRet);
+ auto InReg = Attrs.getParamAttr(0, Attribute::InReg);
+ if (SRet.isValid() && !InReg.isValid())
+ F->addParamAttr(1, SRet);
+ }
+ // FIXME: Copy anything other than sret? Shouldn't be necessary for normal
+ // C ABI, but might show up in other cases.
+ BasicBlock *BB = BasicBlock::Create(M->getContext(), "", F);
+ IRBuilder<> IRB(BB);
+ Value *CalleePtr =
+ M->getOrInsertGlobal("__os_arm64x_dispatch_call_no_redirect", PtrTy);
+ Value *Callee = IRB.CreateLoad(PtrTy, CalleePtr);
+ auto &DL = M->getDataLayout();
+ SmallVector<Value *> Args;
+
+ // Pass the called function in x9.
+ Args.push_back(F->arg_begin());
+
+ Type *RetTy = Arm64Ty->getReturnType();
+ if (RetTy != X64Ty->getReturnType()) {
+ // If the return type is an array or struct, translate it. Values of size
+ // 8 or less go into RAX; bigger values go into memory, and we pass a
+ // pointer.
+ if (DL.getTypeStoreSize(RetTy) > 8) {
+ Args.push_back(IRB.CreateAlloca(RetTy));
+ }
+ }
+
+ for (auto &Arg : make_range(F->arg_begin() + 1, F->arg_end())) {
+ // Translate arguments from AArch64 calling convention to x86 calling
+ // convention.
+ //
+ // For simple types, we don't need to do any translation: they're
+ // represented the same way. (Implicit sign extension is not part of
+ // either convention.)
+ //
+ // The big thing we have to worry about is struct types... but
+ // fortunately AArch64 clang is pretty friendly here: the cases that need
+ // translation are always passed as a struct or array. (If we run into
+ // some cases where this doesn't work, we can teach clang to mark it up
+ // with an attribute.)
+ //
+ // The first argument is the called function, stored in x9.
+ if (Arg.getType()->isArrayTy() || Arg.getType()->isStructTy() ||
+ DL.getTypeStoreSize(Arg.getType()) > 8) {
+ Value *Mem = IRB.CreateAlloca(Arg.getType());
+ IRB.CreateStore(&Arg, Mem);
+ if (DL....
[truncated]
@llvm/pr-subscribers-clang Author: Eli Friedman (efriedma-quic) ChangesThis combines the previously posted patches with some additional work I've done to more closely match MSVC output. Most of the important logic here is implemented in AArch64Arm64ECCallLowering. The purpose of the AArch64Arm64ECCallLowering is to take "normal" IR we'd generate for other targets, and generate most of the Arm64EC-specific bits: generating thunks, mangling symbols, generating aliases, and generating the .hybmp$x table. This is all done late for a few reasons: to consolidate the logic as much as possible, and to ensure the IR exposed to optimization passes doesn't contain complex arm64ec-specific constructs. The other changes are supporting changes, to handle the new constructs generated by that pass. There's a global llvm.arm64ec.symbolmap representing the .hybmp$x entries for the thunks. This gets handled directly by the AsmPrinter because it needs symbol indexes that aren't available before that. There are two new calling conventions used to represent calls to and from thunks: ARM64EC_Thunk_X64 and ARM64EC_Thunk_Native. There are a few changes to handle the associated exception-handling info, SEH_SaveAnyRegQP and SEH_SaveAnyRegQPX. I've intentionally left out handling for structs with small non-power-of-two sizes, because that's easily separated out. The rest of my current work is here. I squashed my current patches because they were split in ways that didn't really make sense. Maybe I could split out some bits, but it's hard to meaningfully test most of the parts independently. Thanks to @dpaoliello for extensive testing and suggestions. (Originally posted as https://reviews.llvm.org/D157547 .) Patch is 130.00 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/79067.diff 31 Files Affected:
diff --git a/clang/lib/CodeGen/CGCXX.cpp b/clang/lib/CodeGen/CGCXX.cpp
index 110e21f7cb6d19f..e95a735f92f74b5 100644
--- a/clang/lib/CodeGen/CGCXX.cpp
+++ b/clang/lib/CodeGen/CGCXX.cpp
@@ -40,6 +40,11 @@ bool CodeGenModule::TryEmitBaseDestructorAsAlias(const CXXDestructorDecl *D) {
if (getCodeGenOpts().OptimizationLevel == 0)
return true;
+ // Disable this optimization for ARM64EC. FIXME: This probably should work,
+ // but getting the symbol table correct is complicated.
+ if (getTarget().getTriple().isWindowsArm64EC())
+ return true;
+
// If sanitizing memory to check for use-after-dtor, do not emit as
// an alias, unless this class owns no members.
if (getCodeGenOpts().SanitizeMemoryUseAfterDtor &&
diff --git a/llvm/include/llvm/IR/CallingConv.h b/llvm/include/llvm/IR/CallingConv.h
index 3a522c239ad59eb..bca31b2572eb4b2 100644
--- a/llvm/include/llvm/IR/CallingConv.h
+++ b/llvm/include/llvm/IR/CallingConv.h
@@ -251,6 +251,16 @@ namespace CallingConv {
/// Used by GraalVM. Two additional registers are reserved.
GRAAL = 107,
+ /// Calling convention used in the ARM64EC ABI to implement calls between
+ /// x64 code and thunks. This is basically the x64 calling convention using
+ /// ARM64 register names. The first parameter is mapped to x9.
+ ARM64EC_Thunk_X64 = 108,
+
+ /// Calling convention used in the ARM64EC ABI to implement calls between
+ /// ARM64 code and thunks. This is just the ARM64 calling convention,
+ /// except that the first parameter is mapped to x9.
+ ARM64EC_Thunk_Native = 109,
+
/// The highest possible ID. Must be some 2^k - 1.
MaxID = 1023
};
diff --git a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
index 7df1c82bf357f60..29da2b1c29f837e 100644
--- a/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
+++ b/llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp
@@ -2892,6 +2892,39 @@ bool AsmPrinter::emitSpecialLLVMGlobal(const GlobalVariable *GV) {
GV->hasAvailableExternallyLinkage())
return true;
+ if (GV->getName() == "llvm.arm64ec.symbolmap") {
+ // For ARM64EC, print the table that maps between symbols and the
+ // corresponding thunks to translate between x64 and AArch64 code.
+ // This table is generated by AArch64Arm64ECCallLowering.
+ OutStreamer->switchSection(OutContext.getCOFFSection(
+ ".hybmp$x", COFF::IMAGE_SCN_LNK_INFO, SectionKind::getMetadata()));
+ auto *Arr = cast<ConstantArray>(GV->getInitializer());
+ for (auto &U : Arr->operands()) {
+ auto *C = cast<Constant>(U);
+ auto *Src = cast<Function>(C->getOperand(0)->stripPointerCasts());
+ auto *Dst = cast<Function>(C->getOperand(1)->stripPointerCasts());
+ int Kind = cast<ConstantInt>(C->getOperand(2))->getZExtValue();
+
+ if (Src->hasDLLImportStorageClass()) {
+ // For now, we assume dllimport functions aren't directly called.
+ // (We might change this later to match MSVC.)
+ OutStreamer->emitCOFFSymbolIndex(
+ OutContext.getOrCreateSymbol("__imp_" + Src->getName()));
+ OutStreamer->emitCOFFSymbolIndex(getSymbol(Dst));
+ OutStreamer->emitInt32(Kind);
+ } else {
+ // FIXME: For non-dllimport functions, MSVC emits the same entry
+ // twice, for reasons I don't understand. I have to assume the linker
+ // ignores the redundant entry; there aren't any reasonable semantics
+ // to attach to it.
+ OutStreamer->emitCOFFSymbolIndex(getSymbol(Src));
+ OutStreamer->emitCOFFSymbolIndex(getSymbol(Dst));
+ OutStreamer->emitInt32(Kind);
+ }
+ }
+ return true;
+ }
+
if (!GV->hasAppendingLinkage()) return false;
assert(GV->hasInitializer() && "Not a special LLVM global!");
diff --git a/llvm/lib/Target/AArch64/AArch64.h b/llvm/lib/Target/AArch64/AArch64.h
index d20ef63a72e8f62..f7d81f42ef5d8ee 100644
--- a/llvm/lib/Target/AArch64/AArch64.h
+++ b/llvm/lib/Target/AArch64/AArch64.h
@@ -71,6 +71,7 @@ FunctionPass *createAArch64PostSelectOptimize();
FunctionPass *createAArch64StackTaggingPass(bool IsOptNone);
FunctionPass *createAArch64StackTaggingPreRAPass();
ModulePass *createAArch64GlobalsTaggingPass();
+ModulePass *createAArch64Arm64ECCallLoweringPass();
void initializeAArch64A53Fix835769Pass(PassRegistry&);
void initializeAArch64A57FPLoadBalancingPass(PassRegistry&);
@@ -109,6 +110,7 @@ void initializeFalkorMarkStridedAccessesLegacyPass(PassRegistry&);
void initializeLDTLSCleanupPass(PassRegistry&);
void initializeSMEABIPass(PassRegistry &);
void initializeSVEIntrinsicOptsPass(PassRegistry &);
+void initializeAArch64Arm64ECCallLoweringPass(PassRegistry &);
} // end namespace llvm
#endif
diff --git a/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
new file mode 100644
index 000000000000000..11248bb7aef31f2
--- /dev/null
+++ b/llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp
@@ -0,0 +1,769 @@
+//===-- AArch64Arm64ECCallLowering.cpp - Lower Arm64EC calls ----*- C++ -*-===//
+//
+// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
+// See https://llvm.org/LICENSE.txt for license information.
+// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
+//
+//===----------------------------------------------------------------------===//
+///
+/// \file
+/// This file contains the IR transform to lower external or indirect calls for
+/// the ARM64EC calling convention. Such calls must go through the runtime, so
+/// we can translate the calling convention for calls into the emulator.
+///
+/// This subsumes Control Flow Guard handling.
+///
+//===----------------------------------------------------------------------===//
+
+#include "AArch64.h"
+#include "llvm/ADT/SetVector.h"
+#include "llvm/ADT/SmallString.h"
+#include "llvm/ADT/SmallVector.h"
+#include "llvm/ADT/Statistic.h"
+#include "llvm/IR/CallingConv.h"
+#include "llvm/IR/IRBuilder.h"
+#include "llvm/IR/Instruction.h"
+#include "llvm/InitializePasses.h"
+#include "llvm/Pass.h"
+#include "llvm/Support/CommandLine.h"
+#include "llvm/TargetParser/Triple.h"
+
+using namespace llvm;
+
+using OperandBundleDef = OperandBundleDefT<Value *>;
+
+#define DEBUG_TYPE "arm64eccalllowering"
+
+STATISTIC(Arm64ECCallsLowered, "Number of Arm64EC calls lowered");
+
+static cl::opt<bool> LowerDirectToIndirect("arm64ec-lower-direct-to-indirect",
+ cl::Hidden, cl::init(true));
+static cl::opt<bool> GenerateThunks("arm64ec-generate-thunks", cl::Hidden,
+ cl::init(true));
+
+namespace {
+
+class AArch64Arm64ECCallLowering : public ModulePass {
+public:
+ static char ID;
+ AArch64Arm64ECCallLowering() : ModulePass(ID) {
+ initializeAArch64Arm64ECCallLoweringPass(*PassRegistry::getPassRegistry());
+ }
+
+ Function *buildExitThunk(FunctionType *FnTy, AttributeList Attrs);
+ Function *buildEntryThunk(Function *F);
+ void lowerCall(CallBase *CB);
+ Function *buildGuestExitThunk(Function *F);
+ bool processFunction(Function &F, SetVector<Function *> &DirectCalledFns);
+ bool runOnModule(Module &M) override;
+
+private:
+ int cfguard_module_flag = 0;
+ FunctionType *GuardFnType = nullptr;
+ PointerType *GuardFnPtrType = nullptr;
+ Constant *GuardFnCFGlobal = nullptr;
+ Constant *GuardFnGlobal = nullptr;
+ Module *M = nullptr;
+
+ Type *PtrTy;
+ Type *I64Ty;
+ Type *VoidTy;
+
+ void getThunkType(FunctionType *FT, AttributeList AttrList, bool EntryThunk,
+ raw_ostream &Out, FunctionType *&Arm64Ty,
+ FunctionType *&X64Ty);
+ void getThunkRetType(FunctionType *FT, AttributeList AttrList,
+ raw_ostream &Out, Type *&Arm64RetTy, Type *&X64RetTy,
+ SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool &HasSretPtr);
+ void getThunkArgTypes(FunctionType *FT, AttributeList AttrList,
+ raw_ostream &Out,
+ SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool HasSretPtr);
+ void canonicalizeThunkType(Type *T, Align Alignment, bool Ret,
+ uint64_t ArgSizeBytes, raw_ostream &Out,
+ Type *&Arm64Ty, Type *&X64Ty);
+};
+
+} // end anonymous namespace
+
+void AArch64Arm64ECCallLowering::getThunkType(FunctionType *FT,
+ AttributeList AttrList,
+ bool EntryThunk, raw_ostream &Out,
+ FunctionType *&Arm64Ty,
+ FunctionType *&X64Ty) {
+ Out << (EntryThunk ? "$ientry_thunk$cdecl$" : "$iexit_thunk$cdecl$");
+
+ Type *Arm64RetTy;
+ Type *X64RetTy;
+
+ SmallVector<Type *> Arm64ArgTypes;
+ SmallVector<Type *> X64ArgTypes;
+
+ // The first argument to a thunk is the called function, stored in x9.
+ // For exit thunks, we pass the called function down to the emulator;
+ // for entry thunks, we just call the Arm64 function directly.
+ if (!EntryThunk)
+ Arm64ArgTypes.push_back(PtrTy);
+ X64ArgTypes.push_back(PtrTy);
+
+ bool HasSretPtr = false;
+ getThunkRetType(FT, AttrList, Out, Arm64RetTy, X64RetTy, Arm64ArgTypes,
+ X64ArgTypes, HasSretPtr);
+
+ getThunkArgTypes(FT, AttrList, Out, Arm64ArgTypes, X64ArgTypes, HasSretPtr);
+
+ Arm64Ty = FunctionType::get(Arm64RetTy, Arm64ArgTypes, false);
+ X64Ty = FunctionType::get(X64RetTy, X64ArgTypes, false);
+}
+
+void AArch64Arm64ECCallLowering::getThunkArgTypes(
+ FunctionType *FT, AttributeList AttrList, raw_ostream &Out,
+ SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool HasSretPtr) {
+
+ Out << "$";
+ if (FT->isVarArg()) {
+ // We treat the variadic function's thunk as a normal function
+ // with the following type on the ARM side:
+ // rettype exitthunk(
+ // ptr x9, ptr x0, i64 x1, i64 x2, i64 x3, ptr x4, i64 x5)
+ //
+ // that can cover all types of variadic functions.
+ // x9 stores the called function, as in a normal exit thunk.
+ // x0-x3 are the arguments passed in registers.
+ // x4 is the address of the arguments on the stack.
+ // x5 is the size of the arguments on the stack.
+ //
+ // On the x64 side, it's the same except that x5 isn't set.
+ //
+ // If both the ARM and X64 sides are sret, there are only three
+ // arguments in registers.
+ //
+ // If the X64 side is sret, but the ARM side isn't, we pass an extra value
+ // to/from the X64 side, and let SelectionDAG transform it into a memory
+ // location.
+ Out << "varargs";
+
+ // x0-x3
+ for (int i = HasSretPtr ? 1 : 0; i < 4; i++) {
+ Arm64ArgTypes.push_back(I64Ty);
+ X64ArgTypes.push_back(I64Ty);
+ }
+
+ // x4
+ Arm64ArgTypes.push_back(PtrTy);
+ X64ArgTypes.push_back(PtrTy);
+ // x5
+ Arm64ArgTypes.push_back(I64Ty);
+ // FIXME: x5 isn't actually passed/used by the x64 side; revisit once we
+ // have proper isel for varargs
+ X64ArgTypes.push_back(I64Ty);
+ return;
+ }
+
+ unsigned I = 0;
+ if (HasSretPtr)
+ I++;
+
+ if (I == FT->getNumParams()) {
+ Out << "v";
+ return;
+ }
+
+ for (unsigned E = FT->getNumParams(); I != E; ++I) {
+ Align ParamAlign = AttrList.getParamAlignment(I).valueOrOne();
+#if 0
+ // FIXME: Need more information about argument size; see
+ // https://reviews.llvm.org/D132926
+ uint64_t ArgSizeBytes = AttrList.getParamArm64ECArgSizeBytes(I);
+#else
+ uint64_t ArgSizeBytes = 0;
+#endif
+ Type *Arm64Ty, *X64Ty;
+ canonicalizeThunkType(FT->getParamType(I), ParamAlign,
+ /*Ret*/ false, ArgSizeBytes, Out, Arm64Ty, X64Ty);
+ Arm64ArgTypes.push_back(Arm64Ty);
+ X64ArgTypes.push_back(X64Ty);
+ }
+}
+
+void AArch64Arm64ECCallLowering::getThunkRetType(
+ FunctionType *FT, AttributeList AttrList, raw_ostream &Out,
+ Type *&Arm64RetTy, Type *&X64RetTy, SmallVectorImpl<Type *> &Arm64ArgTypes,
+ SmallVectorImpl<Type *> &X64ArgTypes, bool &HasSretPtr) {
+ Type *T = FT->getReturnType();
+#if 0
+ // FIXME: Need more information about argument size; see
+ // https://reviews.llvm.org/D132926
+ uint64_t ArgSizeBytes = AttrList.getRetArm64ECArgSizeBytes();
+#else
+ int64_t ArgSizeBytes = 0;
+#endif
+ if (T->isVoidTy()) {
+ if (FT->getNumParams()) {
+ auto SRetAttr = AttrList.getParamAttr(0, Attribute::StructRet);
+ auto InRegAttr = AttrList.getParamAttr(0, Attribute::InReg);
+ if (SRetAttr.isValid() && InRegAttr.isValid()) {
+ // sret+inreg indicates a call that returns a C++ class value. This is
+ // actually equivalent to just passing and returning a void* pointer
+ // as the first argument. Translate it that way, instead of trying
+ // to model "inreg" in the thunk's calling convention, to simplify
+ // the rest of the code.
+ Out << "i8";
+ Arm64RetTy = I64Ty;
+ X64RetTy = I64Ty;
+ return;
+ }
+ if (SRetAttr.isValid()) {
+ // FIXME: Sanity-check the sret type; if it's an integer or pointer,
+ // we'll get screwy mangling/codegen.
+ // FIXME: For large struct types, mangle as an integer argument and
+ // integer return, so we can reuse more thunks, instead of "m" syntax.
+ // (MSVC mangles this case as an integer return with no argument, but
+ // that's a miscompile.)
+ Type *SRetType = SRetAttr.getValueAsType();
+ Align SRetAlign = AttrList.getParamAlignment(0).valueOrOne();
+ Type *Arm64Ty, *X64Ty;
+ canonicalizeThunkType(SRetType, SRetAlign, /*Ret*/ true, ArgSizeBytes,
+ Out, Arm64Ty, X64Ty);
+ Arm64RetTy = VoidTy;
+ X64RetTy = VoidTy;
+ Arm64ArgTypes.push_back(FT->getParamType(0));
+ X64ArgTypes.push_back(FT->getParamType(0));
+ HasSretPtr = true;
+ return;
+ }
+ }
+
+ Out << "v";
+ Arm64RetTy = VoidTy;
+ X64RetTy = VoidTy;
+ return;
+ }
+
+ canonicalizeThunkType(T, Align(), /*Ret*/ true, ArgSizeBytes, Out, Arm64RetTy,
+ X64RetTy);
+ if (X64RetTy->isPointerTy()) {
+ // If the X64 type is canonicalized to a pointer, that means it's
+ // passed/returned indirectly. For a return value, that means it's an
+ // sret pointer.
+ X64ArgTypes.push_back(X64RetTy);
+ X64RetTy = VoidTy;
+ }
+}
+
+void AArch64Arm64ECCallLowering::canonicalizeThunkType(
+ Type *T, Align Alignment, bool Ret, uint64_t ArgSizeBytes, raw_ostream &Out,
+ Type *&Arm64Ty, Type *&X64Ty) {
+ if (T->isFloatTy()) {
+ Out << "f";
+ Arm64Ty = T;
+ X64Ty = T;
+ return;
+ }
+
+ if (T->isDoubleTy()) {
+ Out << "d";
+ Arm64Ty = T;
+ X64Ty = T;
+ return;
+ }
+
+ if (T->isFloatingPointTy()) {
+ report_fatal_error(
+ "Only 32 and 64 bit floating points are supported for ARM64EC thunks");
+ }
+
+ auto &DL = M->getDataLayout();
+
+ if (auto *StructTy = dyn_cast<StructType>(T))
+ if (StructTy->getNumElements() == 1)
+ T = StructTy->getElementType(0);
+
+ if (T->isArrayTy()) {
+ Type *ElementTy = T->getArrayElementType();
+ uint64_t ElementCnt = T->getArrayNumElements();
+ uint64_t ElementSizePerBytes = DL.getTypeSizeInBits(ElementTy) / 8;
+ uint64_t TotalSizeBytes = ElementCnt * ElementSizePerBytes;
+ if (ElementTy->isFloatTy() || ElementTy->isDoubleTy()) {
+ Out << (ElementTy->isFloatTy() ? "F" : "D") << TotalSizeBytes;
+ if (Alignment.value() >= 8 && !T->isPointerTy())
+ Out << "a" << Alignment.value();
+ Arm64Ty = T;
+ if (TotalSizeBytes <= 8) {
+ // Arm64 returns small structs of float/double in float registers;
+ // X64 uses RAX.
+ X64Ty = llvm::Type::getIntNTy(M->getContext(), TotalSizeBytes * 8);
+ } else {
+ // Struct is passed directly on Arm64, but indirectly on X64.
+ X64Ty = PtrTy;
+ }
+ return;
+ } else if (T->isFloatingPointTy()) {
+ report_fatal_error("Only 32 and 64 bit floating points are supported for "
+ "ARM64EC thunks");
+ }
+ }
+
+ if ((T->isIntegerTy() || T->isPointerTy()) && DL.getTypeSizeInBits(T) <= 64) {
+ Out << "i8";
+ Arm64Ty = I64Ty;
+ X64Ty = I64Ty;
+ return;
+ }
+
+ unsigned TypeSize = ArgSizeBytes;
+ if (TypeSize == 0)
+ TypeSize = DL.getTypeSizeInBits(T) / 8;
+ Out << "m";
+ if (TypeSize != 4)
+ Out << TypeSize;
+ if (Alignment.value() >= 8 && !T->isPointerTy())
+ Out << "a" << Alignment.value();
+ // FIXME: Try to canonicalize Arm64Ty more thoroughly?
+ Arm64Ty = T;
+ if (TypeSize == 1 || TypeSize == 2 || TypeSize == 4 || TypeSize == 8) {
+ // Pass directly in an integer register
+ X64Ty = llvm::Type::getIntNTy(M->getContext(), TypeSize * 8);
+ } else {
+ // Passed directly on Arm64, but indirectly on X64.
+ X64Ty = PtrTy;
+ }
+}
+
+// This function builds the "exit thunk", a function which translates
+// arguments and return values when calling x64 code from AArch64 code.
+Function *AArch64Arm64ECCallLowering::buildExitThunk(FunctionType *FT,
+ AttributeList Attrs) {
+ SmallString<256> ExitThunkName;
+ llvm::raw_svector_ostream ExitThunkStream(ExitThunkName);
+ FunctionType *Arm64Ty, *X64Ty;
+ getThunkType(FT, Attrs, /*EntryThunk*/ false, ExitThunkStream, Arm64Ty,
+ X64Ty);
+ if (Function *F = M->getFunction(ExitThunkName))
+ return F;
+
+ Function *F = Function::Create(Arm64Ty, GlobalValue::LinkOnceODRLinkage, 0,
+ ExitThunkName, M);
+ F->setCallingConv(CallingConv::ARM64EC_Thunk_Native);
+ F->setSection(".wowthk$aa");
+ F->setComdat(M->getOrInsertComdat(ExitThunkName));
+ // Copy MSVC, and always set up a frame pointer. (Maybe this isn't necessary.)
+ F->addFnAttr("frame-pointer", "all");
+ // Only copy sret from the first argument. For C++ instance methods, clang can
+ // stick an sret marking on a later argument, but it doesn't actually affect
+ // the ABI, so we can omit it. This avoids triggering a verifier assertion.
+ if (FT->getNumParams()) {
+ auto SRet = Attrs.getParamAttr(0, Attribute::StructRet);
+ auto InReg = Attrs.getParamAttr(0, Attribute::InReg);
+ if (SRet.isValid() && !InReg.isValid())
+ F->addParamAttr(1, SRet);
+ }
+ // FIXME: Copy anything other than sret? Shouldn't be necessary for normal
+ // C ABI, but might show up in other cases.
+ BasicBlock *BB = BasicBlock::Create(M->getContext(), "", F);
+ IRBuilder<> IRB(BB);
+ Value *CalleePtr =
+ M->getOrInsertGlobal("__os_arm64x_dispatch_call_no_redirect", PtrTy);
+ Value *Callee = IRB.CreateLoad(PtrTy, CalleePtr);
+ auto &DL = M->getDataLayout();
+ SmallVector<Value *> Args;
+
+ // Pass the called function in x9.
+ Args.push_back(F->arg_begin());
+
+ Type *RetTy = Arm64Ty->getReturnType();
+ if (RetTy != X64Ty->getReturnType()) {
+ // If the return type is an array or struct, translate it. Values of size
+ // 8 or less go into RAX; bigger values go into memory, and we pass a
+ // pointer.
+ if (DL.getTypeStoreSize(RetTy) > 8) {
+ Args.push_back(IRB.CreateAlloca(RetTy));
+ }
+ }
+
+ for (auto &Arg : make_range(F->arg_begin() + 1, F->arg_end())) {
+ // Translate arguments from AArch64 calling convention to x86 calling
+ // convention.
+ //
+ // For simple types, we don't need to do any translation: they're
+ // represented the same way. (Implicit sign extension is not part of
+ // either convention.)
+ //
+ // The big thing we have to worry about is struct types... but
+ // fortunately AArch64 clang is pretty friendly here: the cases that need
+ // translation are always passed as a struct or array. (If we run into
+ // some cases where this doesn't work, we can teach clang to mark it up
+ // with an attribute.)
+ //
+ // The first argument is the called function, stored in x9.
+ if (Arg.getType()->isArrayTy() || Arg.getType()->isStructTy() ||
+ DL.getTypeStoreSize(Arg.getType()) > 8) {
+ Value *Mem = IRB.CreateAlloca(Arg.getType());
+ IRB.CreateStore(&Arg, Mem);
+ if (DL....
[truncated]
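For readers unfamiliar with the `$ientry_thunk$cdecl$…`/`$iexit_thunk$cdecl$…` names produced by `getThunkType` above, the scheme can be illustrated with a simplified sketch. This is Python pseudocode for illustration only, not part of the patch: it mirrors just the scalar cases visible in the diff (void, float, double, and integer/pointer types of 64 bits or less, which all canonicalize to i64 and mangle as `i8`), and omits the struct/array (`m`/`F`/`D`), sret, and `varargs` handling.

```python
def mangle_scalar(kind: str) -> str:
    # Simplified mapping mirroring canonicalizeThunkType for scalars:
    # float -> "f", double -> "d", and any integer or pointer of 64 bits
    # or less canonicalizes to i64, mangled as "i8" (8 bytes).
    return {"float": "f", "double": "d", "int": "i8", "ptr": "i8"}[kind]

def mangle_thunk(ret: str, params: list[str], entry: bool) -> str:
    # Thunk names start with $ientry_thunk$cdecl$ or $iexit_thunk$cdecl$,
    # followed by the mangled return type, then "$", then the mangled
    # parameter types ("v" stands in for "no parameters" or "void").
    out = "$ientry_thunk$cdecl$" if entry else "$iexit_thunk$cdecl$"
    out += "v" if ret == "void" else mangle_scalar(ret)
    out += "$"
    out += "".join(mangle_scalar(p) for p in params) if params else "v"
    return out

# e.g. an exit thunk for "int f(int)" is named "$iexit_thunk$cdecl$i8$i8"
print(mangle_thunk("int", ["int"], entry=False))
```

Because the name encodes only the canonicalized signature, thunks are emitted with linkonce_odr linkage in a comdat (see `buildExitThunk` above), so identical signatures across translation units share one thunk.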
You can test this locally with the following command:

git-clang-format --diff 2d373143ad69910c56bbc7161224d365813a95b0 b2d4afa14c22d6e8e7ffd457c85f00724b01110e -- llvm/lib/Target/AArch64/AArch64Arm64ECCallLowering.cpp clang/lib/CodeGen/CGCXX.cpp llvm/include/llvm/IR/CallingConv.h llvm/lib/CodeGen/AsmPrinter/AsmPrinter.cpp llvm/lib/Target/AArch64/AArch64.h llvm/lib/Target/AArch64/AArch64AsmPrinter.cpp llvm/lib/Target/AArch64/AArch64CallingConvention.h llvm/lib/Target/AArch64/AArch64FastISel.cpp llvm/lib/Target/AArch64/AArch64FrameLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.cpp llvm/lib/Target/AArch64/AArch64ISelLowering.h llvm/lib/Target/AArch64/AArch64InstrInfo.cpp llvm/lib/Target/AArch64/AArch64MCInstLower.cpp llvm/lib/Target/AArch64/AArch64MCInstLower.h llvm/lib/Target/AArch64/AArch64RegisterInfo.cpp llvm/lib/Target/AArch64/AArch64Subtarget.cpp llvm/lib/Target/AArch64/AArch64Subtarget.h llvm/lib/Target/AArch64/AArch64TargetMachine.cpp llvm/lib/Target/AArch64/GISel/AArch64CallLowering.cpp llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h

View the diff from clang-format here.

diff --git a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
index 332fb37655..2b6ed0b679 100644
--- a/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
+++ b/llvm/lib/Target/AArch64/AArch64ISelLowering.cpp
@@ -6585,10 +6585,10 @@ CCAssignFn *AArch64TargetLowering::CCAssignFnForCall(CallingConv::ID CC,
case CallingConv::AArch64_SME_ABI_Support_Routines_PreserveMost_From_X0:
case CallingConv::AArch64_SME_ABI_Support_Routines_PreserveMost_From_X2:
return CC_AArch64_AAPCS;
- case CallingConv::ARM64EC_Thunk_X64:
- return CC_AArch64_Arm64EC_Thunk;
- case CallingConv::ARM64EC_Thunk_Native:
- return CC_AArch64_Arm64EC_Thunk_Native;
+ case CallingConv::ARM64EC_Thunk_X64:
+ return CC_AArch64_Arm64EC_Thunk;
+ case CallingConv::ARM64EC_Thunk_Native:
+ return CC_AArch64_Arm64EC_Thunk_Native;
}
}
diff --git a/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp b/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
index 1e12cf545f..3357d889f5 100644
--- a/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
+++ b/llvm/lib/Target/AArch64/AArch64MCInstLower.cpp
@@ -93,10 +93,8 @@ MCSymbol *AArch64MCInstLower::GetGlobalValueSymbol(const GlobalValue *GV,
SmallString<128> Name;
- if ((TargetFlags & AArch64II::MO_DLLIMPORT) &&
- TheTriple.isWindowsArm64EC() &&
- !(TargetFlags & AArch64II::MO_ARM64EC_CALLMANGLE) &&
- isa<Function>(GV)) {
+ if ((TargetFlags & AArch64II::MO_DLLIMPORT) && TheTriple.isWindowsArm64EC() &&
+ !(TargetFlags & AArch64II::MO_ARM64EC_CALLMANGLE) && isa<Function>(GV)) {
// __imp_aux is specific to arm64EC; it represents the actual address of
// an imported function without any thunks.
//
diff --git a/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h b/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
index 10e69655f7..f2f5c9c669 100644
--- a/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
+++ b/llvm/lib/Target/AArch64/Utils/AArch64BaseInfo.h
@@ -742,93 +742,93 @@ namespace AArch64PRCTX {
namespace AArch64II {
/// Target Operand Flag enum.
- enum TOF {
- //===------------------------------------------------------------------===//
- // AArch64 Specific MachineOperand flags.
-
- MO_NO_FLAG,
-
- MO_FRAGMENT = 0x7,
-
- /// MO_PAGE - A symbol operand with this flag represents the pc-relative
- /// offset of the 4K page containing the symbol. This is used with the
- /// ADRP instruction.
- MO_PAGE = 1,
-
- /// MO_PAGEOFF - A symbol operand with this flag represents the offset of
- /// that symbol within a 4K page. This offset is added to the page address
- /// to produce the complete address.
- MO_PAGEOFF = 2,
-
- /// MO_G3 - A symbol operand with this flag (granule 3) represents the high
- /// 16-bits of a 64-bit address, used in a MOVZ or MOVK instruction
- MO_G3 = 3,
-
- /// MO_G2 - A symbol operand with this flag (granule 2) represents the bits
- /// 32-47 of a 64-bit address, used in a MOVZ or MOVK instruction
- MO_G2 = 4,
-
- /// MO_G1 - A symbol operand with this flag (granule 1) represents the bits
- /// 16-31 of a 64-bit address, used in a MOVZ or MOVK instruction
- MO_G1 = 5,
-
- /// MO_G0 - A symbol operand with this flag (granule 0) represents the bits
- /// 0-15 of a 64-bit address, used in a MOVZ or MOVK instruction
- MO_G0 = 6,
-
- /// MO_HI12 - This flag indicates that a symbol operand represents the bits
- /// 13-24 of a 64-bit address, used in a arithmetic immediate-shifted-left-
- /// by-12-bits instruction.
- MO_HI12 = 7,
-
- /// MO_COFFSTUB - On a symbol operand "FOO", this indicates that the
- /// reference is actually to the ".refptr.FOO" symbol. This is used for
- /// stub symbols on windows.
- MO_COFFSTUB = 0x8,
-
- /// MO_GOT - This flag indicates that a symbol operand represents the
- /// address of the GOT entry for the symbol, rather than the address of
- /// the symbol itself.
- MO_GOT = 0x10,
-
- /// MO_NC - Indicates whether the linker is expected to check the symbol
- /// reference for overflow. For example in an ADRP/ADD pair of relocations
- /// the ADRP usually does check, but not the ADD.
- MO_NC = 0x20,
-
- /// MO_TLS - Indicates that the operand being accessed is some kind of
- /// thread-local symbol. On Darwin, only one type of thread-local access
- /// exists (pre linker-relaxation), but on ELF the TLSModel used for the
- /// referee will affect interpretation.
- MO_TLS = 0x40,
-
- /// MO_DLLIMPORT - On a symbol operand, this represents that the reference
- /// to the symbol is for an import stub. This is used for DLL import
- /// storage class indication on Windows.
- MO_DLLIMPORT = 0x80,
-
- /// MO_S - Indicates that the bits of the symbol operand represented by
- /// MO_G0 etc are signed.
- MO_S = 0x100,
-
- /// MO_PREL - Indicates that the bits of the symbol operand represented by
- /// MO_G0 etc are PC relative.
- MO_PREL = 0x200,
-
- /// MO_TAGGED - With MO_PAGE, indicates that the page includes a memory tag
- /// in bits 56-63.
- /// On a FrameIndex operand, indicates that the underlying memory is tagged
- /// with an unknown tag value (MTE); this needs to be lowered either to an
- /// SP-relative load or store instruction (which do not check tags), or to
- /// an LDG instruction to obtain the tag value.
- MO_TAGGED = 0x400,
-
- /// MO_ARM64EC_CALLMANGLE - Operand refers to the Arm64EC-mangled version
- /// of a symbol, not the original. For dllimport symbols, this means it
- /// uses "__imp_aux". For other symbols, this means it uses the mangled
- /// ("#" prefix for C) name.
- MO_ARM64EC_CALLMANGLE = 0x800,
- };
+enum TOF {
+ //===------------------------------------------------------------------===//
+ // AArch64 Specific MachineOperand flags.
+
+ MO_NO_FLAG,
+
+ MO_FRAGMENT = 0x7,
+
+ /// MO_PAGE - A symbol operand with this flag represents the pc-relative
+ /// offset of the 4K page containing the symbol. This is used with the
+ /// ADRP instruction.
+ MO_PAGE = 1,
+
+ /// MO_PAGEOFF - A symbol operand with this flag represents the offset of
+ /// that symbol within a 4K page. This offset is added to the page address
+ /// to produce the complete address.
+ MO_PAGEOFF = 2,
+
+ /// MO_G3 - A symbol operand with this flag (granule 3) represents the high
+ /// 16-bits of a 64-bit address, used in a MOVZ or MOVK instruction
+ MO_G3 = 3,
+
+ /// MO_G2 - A symbol operand with this flag (granule 2) represents the bits
+ /// 32-47 of a 64-bit address, used in a MOVZ or MOVK instruction
+ MO_G2 = 4,
+
+ /// MO_G1 - A symbol operand with this flag (granule 1) represents the bits
+ /// 16-31 of a 64-bit address, used in a MOVZ or MOVK instruction
+ MO_G1 = 5,
+
+ /// MO_G0 - A symbol operand with this flag (granule 0) represents the bits
+ /// 0-15 of a 64-bit address, used in a MOVZ or MOVK instruction
+ MO_G0 = 6,
+
+ /// MO_HI12 - This flag indicates that a symbol operand represents the bits
+ /// 13-24 of a 64-bit address, used in a arithmetic immediate-shifted-left-
+ /// by-12-bits instruction.
+ MO_HI12 = 7,
+
+ /// MO_COFFSTUB - On a symbol operand "FOO", this indicates that the
+ /// reference is actually to the ".refptr.FOO" symbol. This is used for
+ /// stub symbols on windows.
+ MO_COFFSTUB = 0x8,
+
+ /// MO_GOT - This flag indicates that a symbol operand represents the
+ /// address of the GOT entry for the symbol, rather than the address of
+ /// the symbol itself.
+ MO_GOT = 0x10,
+
+ /// MO_NC - Indicates whether the linker is expected to check the symbol
+ /// reference for overflow. For example in an ADRP/ADD pair of relocations
+ /// the ADRP usually does check, but not the ADD.
+ MO_NC = 0x20,
+
+ /// MO_TLS - Indicates that the operand being accessed is some kind of
+ /// thread-local symbol. On Darwin, only one type of thread-local access
+ /// exists (pre linker-relaxation), but on ELF the TLSModel used for the
+ /// referee will affect interpretation.
+ MO_TLS = 0x40,
+
+ /// MO_DLLIMPORT - On a symbol operand, this represents that the reference
+ /// to the symbol is for an import stub. This is used for DLL import
+ /// storage class indication on Windows.
+ MO_DLLIMPORT = 0x80,
+
+ /// MO_S - Indicates that the bits of the symbol operand represented by
+ /// MO_G0 etc are signed.
+ MO_S = 0x100,
+
+ /// MO_PREL - Indicates that the bits of the symbol operand represented by
+ /// MO_G0 etc are PC relative.
+ MO_PREL = 0x200,
+
+ /// MO_TAGGED - With MO_PAGE, indicates that the page includes a memory tag
+ /// in bits 56-63.
+ /// On a FrameIndex operand, indicates that the underlying memory is tagged
+ /// with an unknown tag value (MTE); this needs to be lowered either to an
+ /// SP-relative load or store instruction (which do not check tags), or to
+ /// an LDG instruction to obtain the tag value.
+ MO_TAGGED = 0x400,
+
+ /// MO_ARM64EC_CALLMANGLE - Operand refers to the Arm64EC-mangled version
+ /// of a symbol, not the original. For dllimport symbols, this means it
+ /// uses "__imp_aux". For other symbols, this means it uses the mangled
+ /// ("#" prefix for C) name.
+ MO_ARM64EC_CALLMANGLE = 0x800,
+};
} // end namespace AArch64II
//===----------------------------------------------------------------------===//
efriedma-quic force-pushed from 484da22 to b2d4afa
(I'm going to ignore the indentation issues here, and address them in a followup.)
Thanks! FWIW, I tested and reviewed the new version and it works and looks good to me.