Emit aliases to FP16 conversion routines #45649
Arrg for whatever reason one can't create aliases to things outside the compilation unit... that's disappointing. One alternative is |
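One way around the restriction described above, and the one the patches further down take, is to emit a small weak *definition* under the compiler-rt name that simply calls the runtime's own function. The sketch below shows the idea in plain C++ rather than through the LLVM API. It rests on several assumptions: `runtime_h2f` and `crt_h2f` are hypothetical stand-ins for `julia__gnu_h2f_ieee` and `__gnu_h2f_ieee`, the conversion body is a placeholder, and `__attribute__((weak))` assumes GCC or Clang on an ELF-style target.

```cpp
#include <cstdint>

// Hypothetical stand-in for the runtime's own conversion routine
// (julia__gnu_h2f_ieee in the patches). The body is a placeholder,
// not a real half-to-float conversion.
extern "C" float runtime_h2f(uint16_t bits)
{
    return static_cast<float>(bits);
}

// Hypothetical stand-in for the compiler-rt name (__gnu_h2f_ieee).
// This is a weak definition that forwards to the runtime function.
// If another object supplies a strong definition of the same symbol,
// the linker uses that one instead of reporting a duplicate.
extern "C" __attribute__((weak)) float crt_h2f(uint16_t bits)
{
    return runtime_h2f(bits);
}
```

When nothing else defines `crt_h2f`, a call such as `crt_h2f(42)` ends up in `runtime_h2f(42)`. The cost compared with a true alias is one extra call per conversion.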
Okay the error on AArch64 is interesting:
I am testing here explicitly the |
This looks good to me. @vtjnash would appreciate a quick review |
Buildbot Windows disagrees? |
One of them is an OOM, and the other one feels like an OOM as well https://build.julialang.org/#/builders/72/builds/7343 |
I'm not sure why we would be OOM'ing, the machines have 32GB of memory available. Looking at the memory graphs of I'm not trying hard to attribute one particular dip with another here; just showing that overall, while we are using significant amounts of memory, we aren't in OOM territory yet. I'm willing to bet that the win32 OOM is an address space exhaustion more than an OOM (similar to what we've been seeing elsewhere on linux32) and that win64 is something else entirely. |
sigh Thanks Elliot for checking. |
Can't reproduce the windows failure locally :/ |
Fixes #45433, thanks. |
TODO: update these with latest changes, see below.

1.6.6
--- src/APInt-C.cpp 2022-06-29 15:37:58.943951000 +0000
+++ src/APInt-C.cpp 2022-06-29 15:39:56.742904521 +0000
@@ -316,7 +316,7 @@
void LLVMFPtoInt(unsigned numbits, void *pa, unsigned onumbits, integerPart *pr, bool isSigned, bool *isExact) {
double Val;
if (numbits == 16)
- Val = __gnu_h2f_ieee(*(uint16_t*)pa);
+ Val = julia__gnu_h2f_ieee(*(uint16_t*)pa);
else if (numbits == 32)
Val = *(float*)pa;
else if (numbits == 64)
@@ -391,7 +391,7 @@
val = a.roundToDouble(true);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
@@ -408,7 +408,7 @@
val = a.roundToDouble(false);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
--- src/aotcompile.cpp 2022-06-29 15:37:58.943951000 +0000
+++ src/aotcompile.cpp 2022-06-29 16:27:31.074065990 +0000
@@ -51,6 +51,7 @@
#include <llvm/Support/CodeGen.h>
#endif
+#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LegacyPassManagers.h>
#include <llvm/Transforms/Utils/Cloning.h>
@@ -276,6 +277,24 @@
*ci_out = codeinst;
}
+static void injectCRTAlias(Module &M, StringRef name, StringRef alias, FunctionType *FT)
+{
+ Function *target = M.getFunction(alias);
+ if (!target) {
+ target = Function::Create(FT, Function::ExternalLinkage, alias, M);
+ }
+ // Weak so that this does not get discarded
+ // maybe use llvm.compiler.used instead?
+ Function *interposer = Function::Create(FT, Function::WeakAnyLinkage, name, M);
+
+ llvm::IRBuilder<> builder(BasicBlock::Create(M.getContext(), "top", interposer));
+ SmallVector<Value *, 4> CallArgs;
+ for (auto &arg : interposer->args())
+ CallArgs.push_back(&arg);
+ auto val = builder.CreateCall(target, CallArgs);
+ builder.CreateRet(val);
+}
+
// takes the running content that has collected in the shadow module and dump it to disk
// this builds the object file portion of the sysimage files for fast startup, and can
// also be used be extern consumers like GPUCompiler.jl to obtain a module containing
@@ -554,6 +573,20 @@
"jl_RTLD_DEFAULT_handle_pointer"));
}
+ // We would like to emit an alias or a weakref alias to redirect these symbols
+ // but LLVM doesn't let us emit a GlobalAlias to a declaration...
+ // So for now we inject a definition of these functions that calls our runtime functions.
+ injectCRTAlias(*data->M, "__gnu_h2f_ieee", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(*data->M, "__extendhfsf2", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(*data->M, "__gnu_f2h_ieee", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(*data->M, "__truncsfhf2", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(*data->M, "__truncdfhf2", "julia__truncdfhf2",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getDoubleTy(Context) }, false));
+
// do the actual work
auto add_output = [&] (Module &M, StringRef unopt_bc_Name, StringRef bc_Name, StringRef obj_Name, StringRef asm_Name) {
PM.run(M);
--- src/julia.expmap 2022-06-29 15:37:58.987952000 +0000
+++ src/julia.expmap 2022-06-29 15:40:28.643715568 +0000
@@ -42,12 +42,6 @@
environ;
__progname;
- /* compiler run-time intrinsics */
- __gnu_h2f_ieee;
- __extendhfsf2;
- __gnu_f2h_ieee;
- __truncdfhf2;
-
local:
*;
};
--- src/julia_internal.h 2022-06-29 15:37:58.991953000 +0000
+++ src/julia_internal.h 2022-06-29 15:42:47.155284019 +0000
@@ -1363,8 +1363,9 @@
#define JL_GC_ASSERT_LIVE(x) (void)(x)
#endif
-float __gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
-uint16_t __gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__truncdfhf2(double param) JL_NOTSAFEPOINT;
#ifdef __cplusplus
}
--- src/runtime_intrinsics.c 2022-06-29 15:37:59.003953000 +0000
+++ src/runtime_intrinsics.c 2022-06-29 15:43:48.056873802 +0000
@@ -169,9 +169,9 @@
}
#define fp_select(a, func) \
- sizeof(a) == sizeof(float) ? func##f((float)a) : func(a)
+ sizeof(a) <= sizeof(float) ? func##f((float)a) : func(a)
#define fp_select2(a, b, func) \
- sizeof(a) == sizeof(float) ? func##f(a, b) : func(a, b)
+ sizeof(a) <= sizeof(float) ? func##f(a, b) : func(a, b)
// fast-function generators //
@@ -215,11 +215,11 @@
static inline void name(unsigned osize, void *pa, void *pr) JL_NOTSAFEPOINT \
{ \
uint16_t a = *(uint16_t*)pa; \
- float A = __gnu_h2f_ieee(a); \
+ float A = julia__gnu_h2f_ieee(a); \
if (osize == 16) { \
float R; \
OP(&R, A); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
} else { \
OP((uint16_t*)pr, A); \
} \
@@ -243,11 +243,11 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
float R = OP(A, B); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
// float or integer inputs, bool output
@@ -268,8 +268,8 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
return OP(A, B); \
}
@@ -309,12 +309,12 @@
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
uint16_t c = *(uint16_t*)pc; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
- float C = __gnu_h2f_ieee(c); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
+ float C = julia__gnu_h2f_ieee(c); \
runtime_nbits = 16; \
float R = OP(A, B, C); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
@@ -832,7 +832,7 @@
fpiseq_n(float, 32)
fpiseq_n(double, 64)
#define fpiseq(a,b) \
- sizeof(a) == sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
+ sizeof(a) <= sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
#define fpislt_n(c_type, nbits) \
static inline int fpislt##nbits(c_type a, c_type b) JL_NOTSAFEPOINT \
@@ -903,7 +903,7 @@
if (!(osize < 8 * sizeof(a))) \
jl_error("fptrunc: output bitsize must be < input bitsize"); \
else if (osize == 16) \
- *(uint16_t*)pr = __gnu_f2h_ieee(a); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(a); \
else if (osize == 32) \
*(float*)pr = a; \
else if (osize == 64) \
--- src/jitlayers.cpp 2022-06-29 15:37:58.975952000 +0000
+++ src/jitlayers.cpp 2022-06-29 15:45:50.344097088 +0000
@@ -737,12 +737,26 @@
}
JD.addToLinkOrder(GlobalJD, orc::JITDylibLookupFlags::MatchExportedSymbolsOnly);
+
+ orc::SymbolAliasMap jl_crt = {
+ { mangle("__gnu_h2f_ieee"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__extendhfsf2"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__gnu_f2h_ieee"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncsfhf2"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncdfhf2"), { mangle("julia__truncdfhf2"), JITSymbolFlags::Exported } }
+ };
+ cantFail(GlobalJD.define(orc::symbolAliases(jl_crt)));
}
-void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+orc::SymbolStringPtr JuliaOJIT::mangle(StringRef Name)
{
std::string MangleName = getMangledName(Name);
- cantFail(JD.define(orc::absoluteSymbols({{ES.intern(MangleName), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
+ return ES.intern(MangleName);
+}
+
+void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+{
+ cantFail(JD.define(orc::absoluteSymbols({{mangle(Name), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
}
void JuliaOJIT::addModule(std::unique_ptr<Module> M)
--- src/jitlayers.h 2022-06-29 15:37:58.975952000 +0000
+++ src/jitlayers.h 2022-06-29 15:46:24.985016703 +0000
@@ -185,6 +185,7 @@
const object::ObjectFile &Obj,
const RuntimeDyld::LoadedObjectInfo &LoadedObjectInfo);
#endif
+ orc::SymbolStringPtr mangle(StringRef Name);
void addGlobalMapping(StringRef Name, uint64_t Addr);
void addModule(std::unique_ptr<Module> M);
#if JL_LLVM_VERSION < 120000
--- src/intrinsics.cpp 2022-06-29 16:28:06.923128000 +0000
+++ src/intrinsics.cpp 2022-06-29 16:30:30.343357962 +0000
@@ -1476,22 +1476,17 @@
#if !defined(_OS_DARWIN_) // xcode already links compiler-rt
-extern "C" JL_DLLEXPORT float __gnu_h2f_ieee(uint16_t param)
+extern "C" JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param)
{
return half_to_float(param);
}
-extern "C" JL_DLLEXPORT float __extendhfsf2(uint16_t param)
-{
- return half_to_float(param);
-}
-
-extern "C" JL_DLLEXPORT uint16_t __gnu_f2h_ieee(float param)
+extern "C" JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param)
{
return float_to_half(param);
}
-extern "C" JL_DLLEXPORT uint16_t __truncdfhf2(double param)
+extern "C" JL_DLLEXPORT uint16_t julia__truncdfhf2(double param)
{
return float_to_half((float)param);
}
--- test/intrinsics.jl 2022-06-29 15:37:59.139956000 +0000
+++ test/intrinsics.jl 2022-06-29 15:49:07.285356548 +0000
@@ -152,3 +152,27 @@
@test_intrinsic Core.Intrinsics.fptosi Int Float16(3.3) 3
@test_intrinsic Core.Intrinsics.fptoui UInt Float16(3.3) UInt(3)
end
+
+if Sys.ARCH == :aarch64
+ # On AArch64 we are following the `_Float16` ABI. But these functions expect `Int16`.
+ # TODO: Should we have `Chalf == Int16` and `Cfloat16 == Float16`?
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ truncsfhf2(x::Float32) = reinterpret(Float16, ccall("extern __truncsfhf2", llvmcall, Int16, (Float32,), x))
+ gnu_f2h_ieee(x::Float32) = reinterpret(Float16, ccall("extern __gnu_f2h_ieee", llvmcall, Int16, (Float32,), x))
+ truncdfhf2(x::Float64) = reinterpret(Float16, ccall("extern __truncdfhf2", llvmcall, Int16, (Float64,), x))
+else
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Float16,), x)
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Float16,), x)
+ truncsfhf2(x::Float32) = ccall("extern __truncsfhf2", llvmcall, Float16, (Float32,), x)
+ gnu_f2h_ieee(x::Float32) = ccall("extern __gnu_f2h_ieee", llvmcall, Float16, (Float32,), x)
+ truncdfhf2(x::Float64) = ccall("extern __truncdfhf2", llvmcall, Float16, (Float64,), x)
+end
+
+@testset "Float16 intrinsics (crt)" begin
+ @test extendhfsf2(Float16(3.3)) == 3.3007812f0
+ @test gnu_h2f_ieee(Float16(3.3)) == 3.3007812f0
+ @test truncsfhf2(3.3f0) == Float16(3.3)
+ @test gnu_f2h_ieee(3.3f0) == Float16(3.3)
+ @test truncdfhf2(3.3) == Float16(3.3)
+end

1.7.3
--- src/APInt-C.cpp 2022-06-29 15:38:07.412161000 +0000
+++ src/APInt-C.cpp 2022-06-29 16:03:07.396275264 +0000
@@ -316,7 +316,7 @@
void LLVMFPtoInt(unsigned numbits, void *pa, unsigned onumbits, integerPart *pr, bool isSigned, bool *isExact) {
double Val;
if (numbits == 16)
- Val = __gnu_h2f_ieee(*(uint16_t*)pa);
+ Val = julia__gnu_h2f_ieee(*(uint16_t*)pa);
else if (numbits == 32)
Val = *(float*)pa;
else if (numbits == 64)
@@ -391,7 +391,7 @@
val = a.roundToDouble(true);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
@@ -408,7 +408,7 @@
val = a.roundToDouble(false);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
--- src/aotcompile.cpp 2022-06-29 15:38:07.416161000 +0000
+++ src/aotcompile.cpp 2022-06-29 16:36:32.101927553 +0000
@@ -50,6 +50,7 @@
#include <llvm/MC/MCCodeEmitter.h>
#include <llvm/Support/CodeGen.h>
+#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LegacyPassManagers.h>
#include <llvm/Transforms/Utils/Cloning.h>
@@ -446,6 +447,24 @@
jl_safe_printf("ERROR: failed to emit output file %s\n", err.c_str());
}
+static void injectCRTAlias(Module &M, StringRef name, StringRef alias, FunctionType *FT)
+{
+ Function *target = M.getFunction(alias);
+ if (!target) {
+ target = Function::Create(FT, Function::ExternalLinkage, alias, M);
+ }
+ // Weak so that this does not get discarded
+ // maybe use llvm.compiler.used instead?
+ Function *interposer = Function::Create(FT, Function::WeakAnyLinkage, name, M);
+
+ llvm::IRBuilder<> builder(BasicBlock::Create(M.getContext(), "top", interposer));
+ SmallVector<Value *, 4> CallArgs;
+ for (auto &arg : interposer->args())
+ CallArgs.push_back(&arg);
+ auto val = builder.CreateCall(target, CallArgs);
+ builder.CreateRet(val);
+}
+
// takes the running content that has collected in the shadow module and dump it to disk
// this builds the object file portion of the sysimage files for fast startup
@@ -551,6 +570,20 @@
"jl_RTLD_DEFAULT_handle_pointer"));
}
+ // We would like to emit an alias or a weakref alias to redirect these symbols
+ // but LLVM doesn't let us emit a GlobalAlias to a declaration...
+ // So for now we inject a definition of these functions that calls our runtime functions.
+ injectCRTAlias(*data->M, "__gnu_h2f_ieee", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(*data->M, "__extendhfsf2", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(*data->M, "__gnu_f2h_ieee", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(*data->M, "__truncsfhf2", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(*data->M, "__truncdfhf2", "julia__truncdfhf2",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getDoubleTy(Context) }, false));
+
// do the actual work
auto add_output = [&] (Module &M, StringRef unopt_bc_Name, StringRef bc_Name, StringRef obj_Name, StringRef asm_Name) {
PM.run(M);
--- src/julia.expmap 2022-06-29 15:38:07.444162000 +0000
+++ src/julia.expmap 2022-06-29 16:02:38.471479518 +0000
@@ -42,12 +42,6 @@
environ;
__progname;
- /* compiler run-time intrinsics */
- __gnu_h2f_ieee;
- __extendhfsf2;
- __gnu_f2h_ieee;
- __truncdfhf2;
-
local:
*;
};
--- src/julia_internal.h 2022-06-29 15:38:07.448162000 +0000
+++ src/julia_internal.h 2022-06-29 16:03:58.453680503 +0000
@@ -1427,8 +1427,9 @@
#define JL_GC_ASSERT_LIVE(x) (void)(x)
#endif
-float __gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
-uint16_t __gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__truncdfhf2(double param) JL_NOTSAFEPOINT;
#ifdef __cplusplus
}
--- src/runtime_intrinsics.c 2022-06-29 15:38:07.456162000 +0000
+++ src/runtime_intrinsics.c 2022-06-29 16:05:46.116645907 +0000
@@ -338,9 +338,9 @@
}
#define fp_select(a, func) \
- sizeof(a) == sizeof(float) ? func##f((float)a) : func(a)
+ sizeof(a) <= sizeof(float) ? func##f((float)a) : func(a)
#define fp_select2(a, b, func) \
- sizeof(a) == sizeof(float) ? func##f(a, b) : func(a, b)
+ sizeof(a) <= sizeof(float) ? func##f(a, b) : func(a, b)
// fast-function generators //
@@ -384,11 +384,11 @@
static inline void name(unsigned osize, void *pa, void *pr) JL_NOTSAFEPOINT \
{ \
uint16_t a = *(uint16_t*)pa; \
- float A = __gnu_h2f_ieee(a); \
+ float A = julia__gnu_h2f_ieee(a); \
if (osize == 16) { \
float R; \
OP(&R, A); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
} else { \
OP((uint16_t*)pr, A); \
} \
@@ -412,11 +412,11 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
float R = OP(A, B); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
// float or integer inputs, bool output
@@ -437,8 +437,8 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
return OP(A, B); \
}
@@ -478,12 +478,12 @@
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
uint16_t c = *(uint16_t*)pc; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
- float C = __gnu_h2f_ieee(c); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
+ float C = julia__gnu_h2f_ieee(c); \
runtime_nbits = 16; \
float R = OP(A, B, C); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
@@ -1001,7 +1001,7 @@
fpiseq_n(float, 32)
fpiseq_n(double, 64)
#define fpiseq(a,b) \
- sizeof(a) == sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
+ sizeof(a) <= sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
bool_fintrinsic(eq,eq_float)
bool_fintrinsic(ne,ne_float)
@@ -1050,7 +1050,7 @@
if (!(osize < 8 * sizeof(a))) \
jl_error("fptrunc: output bitsize must be < input bitsize"); \
else if (osize == 16) \
- *(uint16_t*)pr = __gnu_f2h_ieee(a); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(a); \
else if (osize == 32) \
*(float*)pr = a; \
else if (osize == 64) \
--- src/jitlayers.cpp 2022-06-29 15:38:07.440162000 +0000
+++ src/jitlayers.cpp 2022-06-29 16:38:19.841056942 +0000
@@ -728,12 +728,26 @@
}
JD.addToLinkOrder(GlobalJD, orc::JITDylibLookupFlags::MatchExportedSymbolsOnly);
+
+ orc::SymbolAliasMap jl_crt = {
+ { mangle("__gnu_h2f_ieee"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__extendhfsf2"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__gnu_f2h_ieee"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncsfhf2"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncdfhf2"), { mangle("julia__truncdfhf2"), JITSymbolFlags::Exported } }
+ };
+ cantFail(GlobalJD.define(orc::symbolAliases(jl_crt)));
}
-void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+orc::SymbolStringPtr JuliaOJIT::mangle(StringRef Name)
{
std::string MangleName = getMangledName(Name);
- cantFail(JD.define(orc::absoluteSymbols({{ES.intern(MangleName), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
+ return ES.intern(MangleName);
+}
+
+void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+{
+ cantFail(JD.define(orc::absoluteSymbols({{mangle(Name), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
}
void JuliaOJIT::addModule(std::unique_ptr<Module> M)
--- src/jitlayers.h 2022-06-29 15:38:07.440162000 +0000
+++ src/jitlayers.h 2022-06-29 16:08:04.044478978 +0000
@@ -182,6 +182,7 @@
const object::ObjectFile &Obj,
const RuntimeDyld::LoadedObjectInfo &LoadedObjectInfo);
#endif
+ orc::SymbolStringPtr mangle(StringRef Name);
void addGlobalMapping(StringRef Name, uint64_t Addr);
void addModule(std::unique_ptr<Module> M);
#if JL_LLVM_VERSION < 120000
--- src/intrinsics.cpp 2022-06-29 16:26:53.104938000 +0000
+++ src/intrinsics.cpp 2022-06-29 16:31:32.729189496 +0000
@@ -1635,22 +1635,17 @@
#if !defined(_OS_DARWIN_) // xcode already links compiler-rt
-extern "C" JL_DLLEXPORT float __gnu_h2f_ieee(uint16_t param)
+extern "C" JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param)
{
return half_to_float(param);
}
-extern "C" JL_DLLEXPORT float __extendhfsf2(uint16_t param)
-{
- return half_to_float(param);
-}
-
-extern "C" JL_DLLEXPORT uint16_t __gnu_f2h_ieee(float param)
+extern "C" JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param)
{
return float_to_half(param);
}
-extern "C" JL_DLLEXPORT uint16_t __truncdfhf2(double param)
+extern "C" JL_DLLEXPORT uint16_t julia__truncdfhf2(double param)
{
float res = (float)param;
uint32_t resi;
--- test/intrinsics.jl 2022-06-29 15:38:07.584165000 +0000
+++ test/intrinsics.jl 2022-06-29 16:56:50.640396691 +0000
@@ -284,3 +284,27 @@
@test r2 isa IntWrap && r2.x === 103 === r[].x && r2 !== r[]
end
end)()
+
+if Sys.ARCH == :aarch64
+ # On AArch64 we are following the `_Float16` ABI. But these functions expect `Int16`.
+ # TODO: Should we have `Chalf == Int16` and `Cfloat16 == Float16`?
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ truncsfhf2(x::Float32) = reinterpret(Float16, ccall("extern __truncsfhf2", llvmcall, Int16, (Float32,), x))
+ gnu_f2h_ieee(x::Float32) = reinterpret(Float16, ccall("extern __gnu_f2h_ieee", llvmcall, Int16, (Float32,), x))
+ truncdfhf2(x::Float64) = reinterpret(Float16, ccall("extern __truncdfhf2", llvmcall, Int16, (Float64,), x))
+else
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Float16,), x)
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Float16,), x)
+ truncsfhf2(x::Float32) = ccall("extern __truncsfhf2", llvmcall, Float16, (Float32,), x)
+ gnu_f2h_ieee(x::Float32) = ccall("extern __gnu_f2h_ieee", llvmcall, Float16, (Float32,), x)
+ truncdfhf2(x::Float64) = ccall("extern __truncdfhf2", llvmcall, Float16, (Float64,), x)
+end
+
+@testset "Float16 intrinsics (crt)" begin
+ @test extendhfsf2(Float16(3.3)) == 3.3007812f0
+ @test gnu_h2f_ieee(Float16(3.3)) == 3.3007812f0
+ @test truncsfhf2(3.3f0) == Float16(3.3)
+ @test gnu_f2h_ieee(3.3f0) == Float16(3.3)
+ @test truncdfhf2(3.3) == Float16(3.3)
+end

1.8.0-rc1
--- src/aotcompile.cpp 2022-06-29 15:38:07.416161000 +0000
+++ src/aotcompile.cpp 2022-06-29 16:36:32.101927553 +0000
@@ -50,6 +50,7 @@
#include <llvm/MC/MCCodeEmitter.h>
#include <llvm/Support/CodeGen.h>
+#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LegacyPassManagers.h>
#include <llvm/Transforms/Utils/Cloning.h>
@@ -446,6 +447,24 @@
jl_safe_printf("ERROR: failed to emit output file %s\n", err.c_str());
}
+static void injectCRTAlias(Module &M, StringRef name, StringRef alias, FunctionType *FT)
+{
+ Function *target = M.getFunction(alias);
+ if (!target) {
+ target = Function::Create(FT, Function::ExternalLinkage, alias, M);
+ }
+ // Weak so that this does not get discarded
+ // maybe use llvm.compiler.used instead?
+ Function *interposer = Function::Create(FT, Function::WeakAnyLinkage, name, M);
+
+ llvm::IRBuilder<> builder(BasicBlock::Create(M.getContext(), "top", interposer));
+ SmallVector<Value *, 4> CallArgs;
+ for (auto &arg : interposer->args())
+ CallArgs.push_back(&arg);
+ auto val = builder.CreateCall(target, CallArgs);
+ builder.CreateRet(val);
+}
+
// takes the running content that has collected in the shadow module and dump it to disk
// this builds the object file portion of the sysimage files for fast startup
@@ -551,6 +570,20 @@
"jl_RTLD_DEFAULT_handle_pointer"));
}
+ // We would like to emit an alias or a weakref alias to redirect these symbols
+ // but LLVM doesn't let us emit a GlobalAlias to a declaration...
+ // So for now we inject a definition of these functions that calls our runtime functions.
+ injectCRTAlias(*data->M, "__gnu_h2f_ieee", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(*data->M, "__extendhfsf2", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(*data->M, "__gnu_f2h_ieee", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(*data->M, "__truncsfhf2", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(*data->M, "__truncdfhf2", "julia__truncdfhf2",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getDoubleTy(Context) }, false));
+
// do the actual work
auto add_output = [&] (Module &M, StringRef unopt_bc_Name, StringRef bc_Name, StringRef obj_Name, StringRef asm_Name) {
PM.run(M);
--- src/jitlayers.cpp 2022-06-29 15:38:07.440162000 +0000
+++ src/jitlayers.cpp 2022-06-29 16:38:19.841056942 +0000
@@ -728,12 +728,26 @@
}
JD.addToLinkOrder(GlobalJD, orc::JITDylibLookupFlags::MatchExportedSymbolsOnly);
+
+ orc::SymbolAliasMap jl_crt = {
+ { mangle("__gnu_h2f_ieee"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__extendhfsf2"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__gnu_f2h_ieee"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncsfhf2"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncdfhf2"), { mangle("julia__truncdfhf2"), JITSymbolFlags::Exported } }
+ };
+ cantFail(GlobalJD.define(orc::symbolAliases(jl_crt)));
}
-void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+orc::SymbolStringPtr JuliaOJIT::mangle(StringRef Name)
{
std::string MangleName = getMangledName(Name);
- cantFail(JD.define(orc::absoluteSymbols({{ES.intern(MangleName), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
+ return ES.intern(MangleName);
+}
+
+void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+{
+ cantFail(JD.define(orc::absoluteSymbols({{mangle(Name), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
}
void JuliaOJIT::addModule(std::unique_ptr<Module> M)
--- src/jitlayers.h 2022-06-29 18:41:05.689863399 +0200
+++ src/jitlayers.h 2022-06-29 18:45:27.071795560 +0200
@@ -204,6 +204,7 @@
void RegisterJITEventListener(JITEventListener *L);
#endif
+ orc::SymbolStringPtr mangle(StringRef Name);
void addGlobalMapping(StringRef Name, uint64_t Addr);
void addModule(std::unique_ptr<Module> M);
--- test/intrinsics.jl 2022-06-29 15:38:07.584165000 +0000
+++ test/intrinsics.jl 2022-06-29 16:56:50.640396691 +0000
@@ -284,3 +284,27 @@
@test r2 isa IntWrap && r2.x === 103 === r[].x && r2 !== r[]
end
end)()
+
+if Sys.ARCH == :aarch64
+ # On AArch64 we are following the `_Float16` ABI. But these functions expect `Int16`.
+ # TODO: Should we have `Chalf == Int16` and `Cfloat16 == Float16`?
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ truncsfhf2(x::Float32) = reinterpret(Float16, ccall("extern __truncsfhf2", llvmcall, Int16, (Float32,), x))
+ gnu_f2h_ieee(x::Float32) = reinterpret(Float16, ccall("extern __gnu_f2h_ieee", llvmcall, Int16, (Float32,), x))
+ truncdfhf2(x::Float64) = reinterpret(Float16, ccall("extern __truncdfhf2", llvmcall, Int16, (Float64,), x))
+else
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Float16,), x)
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Float16,), x)
+ truncsfhf2(x::Float32) = ccall("extern __truncsfhf2", llvmcall, Float16, (Float32,), x)
+ gnu_f2h_ieee(x::Float32) = ccall("extern __gnu_f2h_ieee", llvmcall, Float16, (Float32,), x)
+ truncdfhf2(x::Float64) = ccall("extern __truncdfhf2", llvmcall, Float16, (Float64,), x)
+end
+
+@testset "Float16 intrinsics (crt)" begin
+ @test extendhfsf2(Float16(3.3)) == 3.3007812f0
+ @test gnu_h2f_ieee(Float16(3.3)) == 3.3007812f0
+ @test truncsfhf2(3.3f0) == Float16(3.3)
+ @test gnu_f2h_ieee(3.3f0) == Float16(3.3)
+ @test truncdfhf2(3.3) == Float16(3.3)
+end |
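The expected values in the test set above can be checked by hand: `Float16(3.3)` rounds to the bit pattern `0x429A`, which widens to exactly 3.30078125 (the value written as `3.3007812f0` in the tests). The following is a minimal sketch of an IEEE 754 binary16 to binary32 widening in C++. `half_bits_to_float` is a hypothetical helper written for illustration and is not Julia's `half_to_float`.

```cpp
#include <cmath>
#include <cstdint>
#include <limits>

// Hypothetical helper, for illustration only: widen an IEEE 754 binary16
// bit pattern to float by decoding sign, exponent and mantissa.
float half_bits_to_float(uint16_t h)
{
    const int sign = (h >> 15) & 0x1;
    const int exponent = (h >> 10) & 0x1F;  // 5 exponent bits, bias 15
    const int mantissa = h & 0x3FF;         // 10 explicit mantissa bits
    float magnitude;
    if (exponent == 0) {
        // Zero or subnormal: no implicit leading 1, fixed exponent of -14,
        // so the value is mantissa / 1024 * 2^-14 = mantissa * 2^-24.
        magnitude = std::ldexp(static_cast<float>(mantissa), -24);
    } else if (exponent == 0x1F) {
        // All-ones exponent encodes infinity (mantissa 0) or NaN.
        magnitude = mantissa == 0 ? std::numeric_limits<float>::infinity()
                                  : std::numeric_limits<float>::quiet_NaN();
    } else {
        // Normal number: (1 + mantissa / 1024) * 2^(exponent - 15),
        // written as (1024 + mantissa) * 2^(exponent - 25).
        magnitude = std::ldexp(static_cast<float>(mantissa + 1024), exponent - 25);
    }
    return sign ? -magnitude : magnitude;
}
```

For `0x429A` the exponent field is 16 and the mantissa field is 666, so the result is 1690 * 2^-9 = 3.30078125. Every binary16 value is exactly representable as a float, so this direction needs no rounding.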
When using weak symbols, the WinCOFFObjectWriter keeps a list (`WeakDefaults`) that's used to make names unique. This list should be reset when the object writer is reset, because otherwise reuse of the object writer can result in freed symbols being accessed. With some added output, this becomes clear when using `llc` in `--compile-twice` mode:

```
$ ./llc --compile-twice -mtriple=x86_64-pc-win32 trivial.ll -filetype=obj
DefineSymbol::WeakDefaults
  - .weak.foo.default
  - .weak.bar.default
DefineSymbol::WeakDefaults
  - .weak.foo.default
  - áÑJij⌂ p§┼Ø┐☺
  - .debug_macinfo.dw
  - .weak.bar.default
```

This does not seem to leak into the output object file though, so I couldn't come up with a test. I added one that just does `--compile-twice` (and verified that it does access freed memory), which should result in detecting the invalid memory accesses when running under ASAN.

Observed in a Julia PR where we started using weak symbols: JuliaLang/julia#45649

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D129840
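The failure mode in this commit message is a general one: an object that is reused across runs must clear all of its per-run state when it is reset. Below is a minimal sketch of that pattern in C++. `ToyObjectWriter` and its members are hypothetical and much simpler than LLVM's writer. In particular, the toy stores strings by value, so it only shows stale entries surviving a reset, not the access to freed symbols that the real list of references can cause.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical, much simplified stand-in for an object writer that is
// reused across compilations. None of these names are LLVM's.
class ToyObjectWriter {
public:
    // Record a weak symbol. In this toy model the writer only stores the
    // generated ".weak.<name>.default" names.
    void defineWeak(const std::string &name)
    {
        weakDefaults_.push_back(".weak." + name + ".default");
    }

    // Reset every piece of per-compilation state. Forgetting to clear
    // weakDefaults_ here would let entries from the previous run survive
    // into the next one.
    void reset()
    {
        weakDefaults_.clear();
    }

    std::size_t weakCount() const { return weakDefaults_.size(); }

private:
    std::vector<std::string> weakDefaults_;
};
```

After `defineWeak("foo")`, `defineWeak("bar")` and `reset()`, the writer reports zero weak defaults again, so a second run starts from a clean list.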
CI failures:
Crucially, both Windows bots are happy 🎉 So this looks good to go for me. |
Thanks Tim for getting this across the finish line! LGTM! |
- Put the interposer in llvm.compiler.used.
- Injecting the aliases after optimization: Our multiversioning pass interacts badly with the llvm.compiler.used gvar.

Co-authored-by: Tim Besard <tim.besard@gmail.com>
Co-authored-by: Valentin Churavy <v.churavy@gmail.com>
LGTM
@KristofferC, if you need it, here is a manual backport of the merged PR for all concerned versions, with tests passing:

1.6.7
--- src/APInt-C.cpp 2022-06-29 17:37:58.943951000 +0200
+++ src/APInt-C.cpp 2022-06-29 17:39:56.742904521 +0200
@@ -316,7 +316,7 @@
void LLVMFPtoInt(unsigned numbits, void *pa, unsigned onumbits, integerPart *pr, bool isSigned, bool *isExact) {
double Val;
if (numbits == 16)
- Val = __gnu_h2f_ieee(*(uint16_t*)pa);
+ Val = julia__gnu_h2f_ieee(*(uint16_t*)pa);
else if (numbits == 32)
Val = *(float*)pa;
else if (numbits == 64)
@@ -391,7 +391,7 @@
val = a.roundToDouble(true);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
@@ -408,7 +408,7 @@
val = a.roundToDouble(false);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
--- src/aotcompile.cpp 2022-06-29 17:37:58.943951000 +0200
+++ src/aotcompile.cpp 2022-07-22 10:09:59.465318017 +0200
@@ -51,8 +51,10 @@
#include <llvm/Support/CodeGen.h>
#endif
+#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LegacyPassManagers.h>
#include <llvm/Transforms/Utils/Cloning.h>
+#include <llvm/Transforms/Utils/ModuleUtils.h>
using namespace llvm;
@@ -276,6 +278,23 @@
*ci_out = codeinst;
}
+static void injectCRTAlias(Module &M, StringRef name, StringRef alias, FunctionType *FT)
+{
+ Function *target = M.getFunction(alias);
+ if (!target) {
+ target = Function::Create(FT, Function::ExternalLinkage, alias, M);
+ }
+ Function *interposer = Function::Create(FT, Function::WeakAnyLinkage, name, M);
+ appendToCompilerUsed(M, {interposer});
+
+ llvm::IRBuilder<> builder(BasicBlock::Create(M.getContext(), "top", interposer));
+ SmallVector<Value *, 4> CallArgs;
+ for (auto &arg : interposer->args())
+ CallArgs.push_back(&arg);
+ auto val = builder.CreateCall(target, CallArgs);
+ builder.CreateRet(val);
+}
+
// takes the running content that has collected in the shadow module and dump it to disk
// this builds the object file portion of the sysimage files for fast startup, and can
// also be used be extern consumers like GPUCompiler.jl to obtain a module containing
@@ -556,7 +575,22 @@
// do the actual work
auto add_output = [&] (Module &M, StringRef unopt_bc_Name, StringRef bc_Name, StringRef obj_Name, StringRef asm_Name) {
+ // We would like to emit an alias or a weakref alias to redirect these symbols
+ // but LLVM doesn't let us emit a GlobalAlias to a declaration...
+ // So for now we inject a definition of these functions that calls our runtime functions.
+ injectCRTAlias(M, "__gnu_h2f_ieee", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(M, "__extendhfsf2", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(M, "__gnu_f2h_ieee", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(M, "__truncsfhf2", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(M, "__truncdfhf2", "julia__truncdfhf2",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getDoubleTy(Context) }, false));
+
PM.run(M);
+
if (unopt_bc_fname)
emit_result(unopt_bc_Archive, unopt_bc_Buffer, unopt_bc_Name, outputs);
if (bc_fname)
--- src/julia.expmap 2022-06-29 17:37:58.987952000 +0200
+++ src/julia.expmap 2022-06-29 17:40:28.643715568 +0200
@@ -42,12 +42,6 @@
environ;
__progname;
- /* compiler run-time intrinsics */
- __gnu_h2f_ieee;
- __extendhfsf2;
- __gnu_f2h_ieee;
- __truncdfhf2;
-
local:
*;
};
--- src/julia_internal.h 2022-06-29 17:37:58.991953000 +0200
+++ src/julia_internal.h 2022-06-29 17:42:47.155284019 +0200
@@ -1363,8 +1363,9 @@
#define JL_GC_ASSERT_LIVE(x) (void)(x)
#endif
-float __gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
-uint16_t __gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__truncdfhf2(double param) JL_NOTSAFEPOINT;
#ifdef __cplusplus
}
--- src/runtime_intrinsics.c 2022-06-29 17:37:59.003953000 +0200
+++ src/runtime_intrinsics.c 2022-07-19 18:37:28.928908192 +0200
@@ -169,9 +169,9 @@
}
#define fp_select(a, func) \
- sizeof(a) == sizeof(float) ? func##f((float)a) : func(a)
+ sizeof(a) <= sizeof(float) ? func##f((float)a) : func(a)
#define fp_select2(a, b, func) \
- sizeof(a) == sizeof(float) ? func##f(a, b) : func(a, b)
+ sizeof(a) <= sizeof(float) ? func##f(a, b) : func(a, b)
// fast-function generators //
@@ -215,11 +215,11 @@
static inline void name(unsigned osize, void *pa, void *pr) JL_NOTSAFEPOINT \
{ \
uint16_t a = *(uint16_t*)pa; \
- float A = __gnu_h2f_ieee(a); \
+ float A = julia__gnu_h2f_ieee(a); \
if (osize == 16) { \
float R; \
OP(&R, A); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
} else { \
OP((uint16_t*)pr, A); \
} \
@@ -243,11 +243,11 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
float R = OP(A, B); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
// float or integer inputs, bool output
@@ -268,8 +268,8 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
return OP(A, B); \
}
@@ -309,12 +309,12 @@
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
uint16_t c = *(uint16_t*)pc; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
- float C = __gnu_h2f_ieee(c); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
+ float C = julia__gnu_h2f_ieee(c); \
runtime_nbits = 16; \
float R = OP(A, B, C); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
@@ -832,7 +832,7 @@
fpiseq_n(float, 32)
fpiseq_n(double, 64)
#define fpiseq(a,b) \
- sizeof(a) == sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
+ sizeof(a) <= sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
#define fpislt_n(c_type, nbits) \
static inline int fpislt##nbits(c_type a, c_type b) JL_NOTSAFEPOINT \
@@ -903,7 +903,7 @@
if (!(osize < 8 * sizeof(a))) \
jl_error("fptrunc: output bitsize must be < input bitsize"); \
else if (osize == 16) \
- *(uint16_t*)pr = __gnu_f2h_ieee(a); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(a); \
else if (osize == 32) \
*(float*)pr = a; \
else if (osize == 64) \
--- src/jitlayers.cpp 2022-06-29 17:37:58.975952000 +0200
+++ src/jitlayers.cpp 2022-06-29 17:45:50.344097088 +0200
@@ -737,12 +737,26 @@
}
JD.addToLinkOrder(GlobalJD, orc::JITDylibLookupFlags::MatchExportedSymbolsOnly);
+
+ orc::SymbolAliasMap jl_crt = {
+ { mangle("__gnu_h2f_ieee"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__extendhfsf2"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__gnu_f2h_ieee"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncsfhf2"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncdfhf2"), { mangle("julia__truncdfhf2"), JITSymbolFlags::Exported } }
+ };
+ cantFail(GlobalJD.define(orc::symbolAliases(jl_crt)));
}
-void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+orc::SymbolStringPtr JuliaOJIT::mangle(StringRef Name)
{
std::string MangleName = getMangledName(Name);
- cantFail(JD.define(orc::absoluteSymbols({{ES.intern(MangleName), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
+ return ES.intern(MangleName);
+}
+
+void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+{
+ cantFail(JD.define(orc::absoluteSymbols({{mangle(Name), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
}
void JuliaOJIT::addModule(std::unique_ptr<Module> M)
--- src/jitlayers.h 2022-06-29 17:37:58.975952000 +0200
+++ src/jitlayers.h 2022-06-29 17:46:24.985016703 +0200
@@ -185,6 +185,7 @@
const object::ObjectFile &Obj,
const RuntimeDyld::LoadedObjectInfo &LoadedObjectInfo);
#endif
+ orc::SymbolStringPtr mangle(StringRef Name);
void addGlobalMapping(StringRef Name, uint64_t Addr);
void addModule(std::unique_ptr<Module> M);
#if JL_LLVM_VERSION < 120000
--- src/intrinsics.cpp 2022-06-29 18:28:06.923128000 +0200
+++ src/intrinsics.cpp 2022-06-29 18:30:30.343357962 +0200
@@ -1476,22 +1476,17 @@
#if !defined(_OS_DARWIN_) // xcode already links compiler-rt
-extern "C" JL_DLLEXPORT float __gnu_h2f_ieee(uint16_t param)
+extern "C" JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param)
{
return half_to_float(param);
}
-extern "C" JL_DLLEXPORT float __extendhfsf2(uint16_t param)
-{
- return half_to_float(param);
-}
-
-extern "C" JL_DLLEXPORT uint16_t __gnu_f2h_ieee(float param)
+extern "C" JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param)
{
return float_to_half(param);
}
-extern "C" JL_DLLEXPORT uint16_t __truncdfhf2(double param)
+extern "C" JL_DLLEXPORT uint16_t julia__truncdfhf2(double param)
{
return float_to_half((float)param);
}
--- test/intrinsics.jl 2022-06-29 17:37:59.139956000 +0200
+++ test/intrinsics.jl 2022-06-29 17:49:07.285356548 +0200
@@ -152,3 +152,27 @@
@test_intrinsic Core.Intrinsics.fptosi Int Float16(3.3) 3
@test_intrinsic Core.Intrinsics.fptoui UInt Float16(3.3) UInt(3)
end
+
+if Sys.ARCH == :aarch64
+ # On AArch64 we are following the `_Float16` ABI. But these functions expect `Int16`.
+ # TODO: Should we have `Chalf == Int16` and `Cfloat16 == Float16`?
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ truncsfhf2(x::Float32) = reinterpret(Float16, ccall("extern __truncsfhf2", llvmcall, Int16, (Float32,), x))
+ gnu_f2h_ieee(x::Float32) = reinterpret(Float16, ccall("extern __gnu_f2h_ieee", llvmcall, Int16, (Float32,), x))
+ truncdfhf2(x::Float64) = reinterpret(Float16, ccall("extern __truncdfhf2", llvmcall, Int16, (Float64,), x))
+else
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Float16,), x)
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Float16,), x)
+ truncsfhf2(x::Float32) = ccall("extern __truncsfhf2", llvmcall, Float16, (Float32,), x)
+ gnu_f2h_ieee(x::Float32) = ccall("extern __gnu_f2h_ieee", llvmcall, Float16, (Float32,), x)
+ truncdfhf2(x::Float64) = ccall("extern __truncdfhf2", llvmcall, Float16, (Float64,), x)
+end
+
+@testset "Float16 intrinsics (crt)" begin
+ @test extendhfsf2(Float16(3.3)) == 3.3007812f0
+ @test gnu_h2f_ieee(Float16(3.3)) == 3.3007812f0
+ @test truncsfhf2(3.3f0) == Float16(3.3)
+ @test gnu_f2h_ieee(3.3f0) == Float16(3.3)
+ @test truncdfhf2(3.3) == Float16(3.3)
+end
1.7.3
--- src/APInt-C.cpp 2022-06-29 17:38:07.412161000 +0200
+++ src/APInt-C.cpp 2022-06-29 18:03:07.396275264 +0200
@@ -316,7 +316,7 @@
void LLVMFPtoInt(unsigned numbits, void *pa, unsigned onumbits, integerPart *pr, bool isSigned, bool *isExact) {
double Val;
if (numbits == 16)
- Val = __gnu_h2f_ieee(*(uint16_t*)pa);
+ Val = julia__gnu_h2f_ieee(*(uint16_t*)pa);
else if (numbits == 32)
Val = *(float*)pa;
else if (numbits == 64)
@@ -391,7 +391,7 @@
val = a.roundToDouble(true);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
@@ -408,7 +408,7 @@
val = a.roundToDouble(false);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
--- src/aotcompile.cpp 2022-06-29 17:38:07.416161000 +0200
+++ src/aotcompile.cpp 2022-07-22 10:08:25.371800696 +0200
@@ -50,8 +50,10 @@
#include <llvm/MC/MCCodeEmitter.h>
#include <llvm/Support/CodeGen.h>
+#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LegacyPassManagers.h>
#include <llvm/Transforms/Utils/Cloning.h>
+#include <llvm/Transforms/Utils/ModuleUtils.h>
using namespace llvm;
@@ -446,6 +448,23 @@
jl_safe_printf("ERROR: failed to emit output file %s\n", err.c_str());
}
+static void injectCRTAlias(Module &M, StringRef name, StringRef alias, FunctionType *FT)
+{
+ Function *target = M.getFunction(alias);
+ if (!target) {
+ target = Function::Create(FT, Function::ExternalLinkage, alias, M);
+ }
+ Function *interposer = Function::Create(FT, Function::WeakAnyLinkage, name, M);
+ appendToCompilerUsed(M, {interposer});
+
+ llvm::IRBuilder<> builder(BasicBlock::Create(M.getContext(), "top", interposer));
+ SmallVector<Value *, 4> CallArgs;
+ for (auto &arg : interposer->args())
+ CallArgs.push_back(&arg);
+ auto val = builder.CreateCall(target, CallArgs);
+ builder.CreateRet(val);
+}
+
// takes the running content that has collected in the shadow module and dump it to disk
// this builds the object file portion of the sysimage files for fast startup
@@ -553,7 +572,22 @@
// do the actual work
auto add_output = [&] (Module &M, StringRef unopt_bc_Name, StringRef bc_Name, StringRef obj_Name, StringRef asm_Name) {
+ // We would like to emit an alias or a weakref alias to redirect these symbols
+ // but LLVM doesn't let us emit a GlobalAlias to a declaration...
+ // So for now we inject a definition of these functions that calls our runtime functions.
+ injectCRTAlias(M, "__gnu_h2f_ieee", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(M, "__extendhfsf2", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(M, "__gnu_f2h_ieee", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(M, "__truncsfhf2", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(M, "__truncdfhf2", "julia__truncdfhf2",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getDoubleTy(Context) }, false));
+
PM.run(M);
+
if (unopt_bc_fname)
emit_result(unopt_bc_Archive, unopt_bc_Buffer, unopt_bc_Name, outputs);
if (bc_fname)
--- src/julia.expmap 2022-06-29 17:38:07.444162000 +0200
+++ src/julia.expmap 2022-06-29 18:02:38.471479518 +0200
@@ -42,12 +42,6 @@
environ;
__progname;
- /* compiler run-time intrinsics */
- __gnu_h2f_ieee;
- __extendhfsf2;
- __gnu_f2h_ieee;
- __truncdfhf2;
-
local:
*;
};
--- src/julia_internal.h 2022-06-29 17:38:07.448162000 +0200
+++ src/julia_internal.h 2022-06-29 18:03:58.453680503 +0200
@@ -1427,8 +1427,9 @@
#define JL_GC_ASSERT_LIVE(x) (void)(x)
#endif
-float __gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
-uint16_t __gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__truncdfhf2(double param) JL_NOTSAFEPOINT;
#ifdef __cplusplus
}
--- src/runtime_intrinsics.c 2022-06-29 17:38:07.456162000 +0200
+++ src/runtime_intrinsics.c 2022-06-29 18:05:46.116645907 +0200
@@ -338,9 +338,9 @@
}
#define fp_select(a, func) \
- sizeof(a) == sizeof(float) ? func##f((float)a) : func(a)
+ sizeof(a) <= sizeof(float) ? func##f((float)a) : func(a)
#define fp_select2(a, b, func) \
- sizeof(a) == sizeof(float) ? func##f(a, b) : func(a, b)
+ sizeof(a) <= sizeof(float) ? func##f(a, b) : func(a, b)
// fast-function generators //
@@ -384,11 +384,11 @@
static inline void name(unsigned osize, void *pa, void *pr) JL_NOTSAFEPOINT \
{ \
uint16_t a = *(uint16_t*)pa; \
- float A = __gnu_h2f_ieee(a); \
+ float A = julia__gnu_h2f_ieee(a); \
if (osize == 16) { \
float R; \
OP(&R, A); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
} else { \
OP((uint16_t*)pr, A); \
} \
@@ -412,11 +412,11 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
float R = OP(A, B); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
// float or integer inputs, bool output
@@ -437,8 +437,8 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
return OP(A, B); \
}
@@ -478,12 +478,12 @@
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
uint16_t c = *(uint16_t*)pc; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
- float C = __gnu_h2f_ieee(c); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
+ float C = julia__gnu_h2f_ieee(c); \
runtime_nbits = 16; \
float R = OP(A, B, C); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
@@ -1001,7 +1001,7 @@
fpiseq_n(float, 32)
fpiseq_n(double, 64)
#define fpiseq(a,b) \
- sizeof(a) == sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
+ sizeof(a) <= sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
bool_fintrinsic(eq,eq_float)
bool_fintrinsic(ne,ne_float)
@@ -1050,7 +1050,7 @@
if (!(osize < 8 * sizeof(a))) \
jl_error("fptrunc: output bitsize must be < input bitsize"); \
else if (osize == 16) \
- *(uint16_t*)pr = __gnu_f2h_ieee(a); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(a); \
else if (osize == 32) \
*(float*)pr = a; \
else if (osize == 64) \
--- src/jitlayers.cpp 2022-06-29 17:38:07.440162000 +0200
+++ src/jitlayers.cpp 2022-06-29 18:38:19.841056942 +0200
@@ -728,12 +728,26 @@
}
JD.addToLinkOrder(GlobalJD, orc::JITDylibLookupFlags::MatchExportedSymbolsOnly);
+
+ orc::SymbolAliasMap jl_crt = {
+ { mangle("__gnu_h2f_ieee"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__extendhfsf2"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__gnu_f2h_ieee"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncsfhf2"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncdfhf2"), { mangle("julia__truncdfhf2"), JITSymbolFlags::Exported } }
+ };
+ cantFail(GlobalJD.define(orc::symbolAliases(jl_crt)));
}
-void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+orc::SymbolStringPtr JuliaOJIT::mangle(StringRef Name)
{
std::string MangleName = getMangledName(Name);
- cantFail(JD.define(orc::absoluteSymbols({{ES.intern(MangleName), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
+ return ES.intern(MangleName);
+}
+
+void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+{
+ cantFail(JD.define(orc::absoluteSymbols({{mangle(Name), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
}
void JuliaOJIT::addModule(std::unique_ptr<Module> M)
--- src/jitlayers.h 2022-06-29 17:38:07.440162000 +0200
+++ src/jitlayers.h 2022-06-29 18:08:04.044478978 +0200
@@ -182,6 +182,7 @@
const object::ObjectFile &Obj,
const RuntimeDyld::LoadedObjectInfo &LoadedObjectInfo);
#endif
+ orc::SymbolStringPtr mangle(StringRef Name);
void addGlobalMapping(StringRef Name, uint64_t Addr);
void addModule(std::unique_ptr<Module> M);
#if JL_LLVM_VERSION < 120000
--- src/intrinsics.cpp 2022-06-29 18:26:53.104938000 +0200
+++ src/intrinsics.cpp 2022-06-29 18:31:32.729189496 +0200
@@ -1635,22 +1635,17 @@
#if !defined(_OS_DARWIN_) // xcode already links compiler-rt
-extern "C" JL_DLLEXPORT float __gnu_h2f_ieee(uint16_t param)
+extern "C" JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param)
{
return half_to_float(param);
}
-extern "C" JL_DLLEXPORT float __extendhfsf2(uint16_t param)
-{
- return half_to_float(param);
-}
-
-extern "C" JL_DLLEXPORT uint16_t __gnu_f2h_ieee(float param)
+extern "C" JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param)
{
return float_to_half(param);
}
-extern "C" JL_DLLEXPORT uint16_t __truncdfhf2(double param)
+extern "C" JL_DLLEXPORT uint16_t julia__truncdfhf2(double param)
{
float res = (float)param;
uint32_t resi;
--- test/intrinsics.jl 2022-06-29 17:38:07.584165000 +0200
+++ test/intrinsics.jl 2022-06-29 18:56:50.640396691 +0200
@@ -284,3 +284,27 @@
@test r2 isa IntWrap && r2.x === 103 === r[].x && r2 !== r[]
end
end)()
+
+if Sys.ARCH == :aarch64
+ # On AArch64 we are following the `_Float16` ABI. But these functions expect `Int16`.
+ # TODO: Should we have `Chalf == Int16` and `Cfloat16 == Float16`?
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ truncsfhf2(x::Float32) = reinterpret(Float16, ccall("extern __truncsfhf2", llvmcall, Int16, (Float32,), x))
+ gnu_f2h_ieee(x::Float32) = reinterpret(Float16, ccall("extern __gnu_f2h_ieee", llvmcall, Int16, (Float32,), x))
+ truncdfhf2(x::Float64) = reinterpret(Float16, ccall("extern __truncdfhf2", llvmcall, Int16, (Float64,), x))
+else
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Float16,), x)
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Float16,), x)
+ truncsfhf2(x::Float32) = ccall("extern __truncsfhf2", llvmcall, Float16, (Float32,), x)
+ gnu_f2h_ieee(x::Float32) = ccall("extern __gnu_f2h_ieee", llvmcall, Float16, (Float32,), x)
+ truncdfhf2(x::Float64) = ccall("extern __truncdfhf2", llvmcall, Float16, (Float64,), x)
+end
+
+@testset "Float16 intrinsics (crt)" begin
+ @test extendhfsf2(Float16(3.3)) == 3.3007812f0
+ @test gnu_h2f_ieee(Float16(3.3)) == 3.3007812f0
+ @test truncsfhf2(3.3f0) == Float16(3.3)
+ @test gnu_f2h_ieee(3.3f0) == Float16(3.3)
+ @test truncdfhf2(3.3) == Float16(3.3)
+end
1.8.0-rc3
--- src/APInt-C.cpp 2022-06-29 15:38:07.412161000 +0000
+++ src/APInt-C.cpp 2022-06-29 16:03:07.396275264 +0000
@@ -316,7 +316,7 @@
void LLVMFPtoInt(unsigned numbits, void *pa, unsigned onumbits, integerPart *pr, bool isSigned, bool *isExact) {
double Val;
if (numbits == 16)
- Val = __gnu_h2f_ieee(*(uint16_t*)pa);
+ Val = julia__gnu_h2f_ieee(*(uint16_t*)pa);
else if (numbits == 32)
Val = *(float*)pa;
else if (numbits == 64)
@@ -391,7 +391,7 @@
val = a.roundToDouble(true);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
@@ -408,7 +408,7 @@
val = a.roundToDouble(false);
}
if (onumbits == 16)
- *(uint16_t*)pr = __gnu_f2h_ieee(val);
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(val);
else if (onumbits == 32)
*(float*)pr = val;
else if (onumbits == 64)
--- src/aotcompile.cpp 2022-06-29 15:38:07.416161000 +0000
+++ src/aotcompile.cpp 2022-07-19 16:43:52.586543207 +0000
@@ -50,8 +50,10 @@
#include <llvm/MC/MCCodeEmitter.h>
#include <llvm/Support/CodeGen.h>
+#include <llvm/IR/IRBuilder.h>
#include <llvm/IR/LegacyPassManagers.h>
#include <llvm/Transforms/Utils/Cloning.h>
+#include <llvm/Transforms/Utils/ModuleUtils.h>
using namespace llvm;
@@ -446,6 +448,23 @@
jl_safe_printf("ERROR: failed to emit output file %s\n", err.c_str());
}
+static void injectCRTAlias(Module &M, StringRef name, StringRef alias, FunctionType *FT)
+{
+ Function *target = M.getFunction(alias);
+ if (!target) {
+ target = Function::Create(FT, Function::ExternalLinkage, alias, M);
+ }
+ Function *interposer = Function::Create(FT, Function::WeakAnyLinkage, name, M);
+ appendToCompilerUsed(M, {interposer});
+
+ llvm::IRBuilder<> builder(BasicBlock::Create(M.getContext(), "top", interposer));
+ SmallVector<Value *, 4> CallArgs;
+ for (auto &arg : interposer->args())
+ CallArgs.push_back(&arg);
+ auto val = builder.CreateCall(target, CallArgs);
+ builder.CreateRet(val);
+}
+
// takes the running content that has collected in the shadow module and dump it to disk
// this builds the object file portion of the sysimage files for fast startup
@@ -553,7 +572,22 @@
// do the actual work
auto add_output = [&] (Module &M, StringRef unopt_bc_Name, StringRef bc_Name, StringRef obj_Name, StringRef asm_Name) {
+ // We would like to emit an alias or a weakref alias to redirect these symbols
+ // but LLVM doesn't let us emit a GlobalAlias to a declaration...
+ // So for now we inject a definition of these functions that calls our runtime functions.
+ injectCRTAlias(M, "__gnu_h2f_ieee", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(M, "__extendhfsf2", "julia__gnu_h2f_ieee",
+ FunctionType::get(Type::getFloatTy(Context), { Type::getHalfTy(Context) }, false));
+ injectCRTAlias(M, "__gnu_f2h_ieee", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(M, "__truncsfhf2", "julia__gnu_f2h_ieee",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getFloatTy(Context) }, false));
+ injectCRTAlias(M, "__truncdfhf2", "julia__truncdfhf2",
+ FunctionType::get(Type::getHalfTy(Context), { Type::getDoubleTy(Context) }, false));
+
PM.run(M);
+
if (unopt_bc_fname)
emit_result(unopt_bc_Archive, unopt_bc_Buffer, unopt_bc_Name, outputs);
if (bc_fname)
--- src/julia.expmap 2022-06-29 15:38:07.444162000 +0000
+++ src/julia.expmap 2022-06-29 16:02:38.471479518 +0000
@@ -42,12 +42,6 @@
environ;
__progname;
- /* compiler run-time intrinsics */
- __gnu_h2f_ieee;
- __extendhfsf2;
- __gnu_f2h_ieee;
- __truncdfhf2;
-
local:
*;
};
--- src/julia_internal.h 2022-06-29 15:38:07.448162000 +0000
+++ src/julia_internal.h 2022-06-29 16:03:58.453680503 +0000
@@ -1427,8 +1427,9 @@
#define JL_GC_ASSERT_LIVE(x) (void)(x)
#endif
-float __gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
-uint16_t __gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param) JL_NOTSAFEPOINT;
+JL_DLLEXPORT uint16_t julia__truncdfhf2(double param) JL_NOTSAFEPOINT;
#ifdef __cplusplus
}
--- src/runtime_intrinsics.c 2022-06-29 15:38:07.456162000 +0000
+++ src/runtime_intrinsics.c 2022-06-29 16:05:46.116645907 +0000
@@ -188,22 +188,17 @@
return h;
}
-JL_DLLEXPORT float __gnu_h2f_ieee(uint16_t param)
+JL_DLLEXPORT float julia__gnu_h2f_ieee(uint16_t param)
{
return half_to_float(param);
}
-JL_DLLEXPORT float __extendhfsf2(uint16_t param)
-{
- return half_to_float(param);
-}
-
-JL_DLLEXPORT uint16_t __gnu_f2h_ieee(float param)
+JL_DLLEXPORT uint16_t julia__gnu_f2h_ieee(float param)
{
return float_to_half(param);
}
-JL_DLLEXPORT uint16_t __truncdfhf2(double param)
+JL_DLLEXPORT uint16_t julia__truncdfhf2(double param)
{
float res = (float)param;
uint32_t resi;
@@ -338,9 +338,9 @@
}
#define fp_select(a, func) \
- sizeof(a) == sizeof(float) ? func##f((float)a) : func(a)
+ sizeof(a) <= sizeof(float) ? func##f((float)a) : func(a)
#define fp_select2(a, b, func) \
- sizeof(a) == sizeof(float) ? func##f(a, b) : func(a, b)
+ sizeof(a) <= sizeof(float) ? func##f(a, b) : func(a, b)
// fast-function generators //
@@ -384,11 +384,11 @@
static inline void name(unsigned osize, void *pa, void *pr) JL_NOTSAFEPOINT \
{ \
uint16_t a = *(uint16_t*)pa; \
- float A = __gnu_h2f_ieee(a); \
+ float A = julia__gnu_h2f_ieee(a); \
if (osize == 16) { \
float R; \
OP(&R, A); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
} else { \
OP((uint16_t*)pr, A); \
} \
@@ -412,11 +412,11 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
float R = OP(A, B); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
// float or integer inputs, bool output
@@ -437,8 +437,8 @@
{ \
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
runtime_nbits = 16; \
return OP(A, B); \
}
@@ -478,12 +478,12 @@
uint16_t a = *(uint16_t*)pa; \
uint16_t b = *(uint16_t*)pb; \
uint16_t c = *(uint16_t*)pc; \
- float A = __gnu_h2f_ieee(a); \
- float B = __gnu_h2f_ieee(b); \
- float C = __gnu_h2f_ieee(c); \
+ float A = julia__gnu_h2f_ieee(a); \
+ float B = julia__gnu_h2f_ieee(b); \
+ float C = julia__gnu_h2f_ieee(c); \
runtime_nbits = 16; \
float R = OP(A, B, C); \
- *(uint16_t*)pr = __gnu_f2h_ieee(R); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(R); \
}
@@ -1001,7 +1001,7 @@
fpiseq_n(float, 32)
fpiseq_n(double, 64)
#define fpiseq(a,b) \
- sizeof(a) == sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
+ sizeof(a) <= sizeof(float) ? fpiseq32(a, b) : fpiseq64(a, b)
bool_fintrinsic(eq,eq_float)
bool_fintrinsic(ne,ne_float)
@@ -1050,7 +1050,7 @@
if (!(osize < 8 * sizeof(a))) \
jl_error("fptrunc: output bitsize must be < input bitsize"); \
else if (osize == 16) \
- *(uint16_t*)pr = __gnu_f2h_ieee(a); \
+ *(uint16_t*)pr = julia__gnu_f2h_ieee(a); \
else if (osize == 32) \
*(float*)pr = a; \
else if (osize == 64) \
--- src/jitlayers.cpp 2022-06-29 15:38:07.440162000 +0000
+++ src/jitlayers.cpp 2022-06-29 16:38:19.841056942 +0000
@@ -728,12 +728,26 @@
}
JD.addToLinkOrder(GlobalJD, orc::JITDylibLookupFlags::MatchExportedSymbolsOnly);
+
+ orc::SymbolAliasMap jl_crt = {
+ { mangle("__gnu_h2f_ieee"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__extendhfsf2"), { mangle("julia__gnu_h2f_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__gnu_f2h_ieee"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncsfhf2"), { mangle("julia__gnu_f2h_ieee"), JITSymbolFlags::Exported } },
+ { mangle("__truncdfhf2"), { mangle("julia__truncdfhf2"), JITSymbolFlags::Exported } }
+ };
+ cantFail(GlobalJD.define(orc::symbolAliases(jl_crt)));
}
-void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+orc::SymbolStringPtr JuliaOJIT::mangle(StringRef Name)
{
std::string MangleName = getMangledName(Name);
- cantFail(JD.define(orc::absoluteSymbols({{ES.intern(MangleName), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
+ return ES.intern(MangleName);
+}
+
+void JuliaOJIT::addGlobalMapping(StringRef Name, uint64_t Addr)
+{
+ cantFail(JD.define(orc::absoluteSymbols({{mangle(Name), JITEvaluatedSymbol::fromPointer((void*)Addr)}})));
}
void JuliaOJIT::addModule(std::unique_ptr<Module> M)
--- src/jitlayers.h 2022-06-29 18:41:05.689863399 +0200
+++ src/jitlayers.h 2022-06-29 18:45:27.071795560 +0200
@@ -204,6 +204,7 @@
void RegisterJITEventListener(JITEventListener *L);
#endif
+ orc::SymbolStringPtr mangle(StringRef Name);
void addGlobalMapping(StringRef Name, uint64_t Addr);
void addModule(std::unique_ptr<Module> M);
--- test/intrinsics.jl 2022-06-29 15:38:07.584165000 +0000
+++ test/intrinsics.jl 2022-06-29 16:56:50.640396691 +0000
@@ -284,3 +284,27 @@
@test r2 isa IntWrap && r2.x === 103 === r[].x && r2 !== r[]
end
end)()
+
+if Sys.ARCH == :aarch64
+ # On AArch64 we are following the `_Float16` ABI. But these functions expect `Int16`.
+ # TODO: Should we have `Chalf == Int16` and `Cfloat16 == Float16`?
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Int16,), reinterpret(Int16, x))
+ truncsfhf2(x::Float32) = reinterpret(Float16, ccall("extern __truncsfhf2", llvmcall, Int16, (Float32,), x))
+ gnu_f2h_ieee(x::Float32) = reinterpret(Float16, ccall("extern __gnu_f2h_ieee", llvmcall, Int16, (Float32,), x))
+ truncdfhf2(x::Float64) = reinterpret(Float16, ccall("extern __truncdfhf2", llvmcall, Int16, (Float64,), x))
+else
+ extendhfsf2(x::Float16) = ccall("extern __extendhfsf2", llvmcall, Float32, (Float16,), x)
+ gnu_h2f_ieee(x::Float16) = ccall("extern __gnu_h2f_ieee", llvmcall, Float32, (Float16,), x)
+ truncsfhf2(x::Float32) = ccall("extern __truncsfhf2", llvmcall, Float16, (Float32,), x)
+ gnu_f2h_ieee(x::Float32) = ccall("extern __gnu_f2h_ieee", llvmcall, Float16, (Float32,), x)
+ truncdfhf2(x::Float64) = ccall("extern __truncdfhf2", llvmcall, Float16, (Float64,), x)
+end
+
+@testset "Float16 intrinsics (crt)" begin
+ @test extendhfsf2(Float16(3.3)) == 3.3007812f0
+ @test gnu_h2f_ieee(Float16(3.3)) == 3.3007812f0
+ @test truncsfhf2(3.3f0) == Float16(3.3)
+ @test gnu_f2h_ieee(3.3f0) == Float16(3.3)
+ @test truncdfhf2(3.3) == Float16(3.3)
+end
EDIT: I missed #46110, sorry. |
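The tests in the backports above expect `Float16(3.3)` to widen to `3.3007812f0`. The following sketch decodes IEEE 754 binary16 bits by hand to show where that value comes from. `demo_half_to_float` and `demo_half_self_check` are hypothetical helpers written for this illustration, and this is not Julia's `half_to_float` implementation.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <limits>

// Decodes IEEE 754 binary16 bits: 1 sign bit, 5 exponent bits (bias 15),
// 10 mantissa bits.
float demo_half_to_float(std::uint16_t h) {
    const std::uint32_t sign = (h >> 15) & 0x1u;
    const std::uint32_t exp = (h >> 10) & 0x1Fu;
    const std::uint32_t man = h & 0x3FFu;

    float value;
    if (exp == 0) {
        // Zero or subnormal: man * 2^-24.
        value = std::ldexp(static_cast<float>(man), -24);
    } else if (exp == 31) {
        // Infinity (mantissa zero) or NaN (mantissa non-zero).
        value = man ? std::numeric_limits<float>::quiet_NaN()
                    : std::numeric_limits<float>::infinity();
    } else {
        // Normal: (1024 + man) * 2^(exp - 15 - 10).
        value = std::ldexp(static_cast<float>(man | 0x400u),
                           static_cast<int>(exp) - 25);
    }
    return sign ? -value : value;
}

// Worked value: 3.3 rounds to the binary16 bit pattern 0x429A
// (exponent 16, mantissa 666), i.e. 1690 * 2^-9 = 3.30078125.
inline void demo_half_self_check() {
    assert(demo_half_to_float(0x429A) == 3.30078125f);
}
```

The exact value 3.30078125 is what Julia prints, rounded to shortest form, as `3.3007812f0`.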
There's already a backport PR for 1.8: #46110. I didn't think we were going to backport this to 1.7 or 1.6, though? It would also need a backport of https://reviews.llvm.org/D129840 to all relevant LLVM branches, which hasn't happened yet. |
Backport "Emit aliases for FP16 conversion routines" (#45649) to 1.8
When using weak symbols, the WinCOFFObjectWriter keeps a list (`WeakDefaults`) that's used to make names unique. This list should be reset when the object writer is reset, because otherwise reuse of the object writer can result in freed symbols being accessed. With some added output, this becomes clear when using `llc` in `--run-twice` mode:

```
$ ./llc --compile-twice -mtriple=x86_64-pc-win32 trivial.ll -filetype=obj

DefineSymbol::WeakDefaults
  - .weak.foo.default
  - .weak.bar.default

DefineSymbol::WeakDefaults
  - .weak.foo.default
  - áÑJij⌂ p§┼Ø┐☺
  - .debug_macinfo.dw
  - .weak.bar.default
```

This does not seem to leak into the output object file though, so I couldn't come up with a test. I added one that just does `--run-twice` (and verified that it does access freed memory), which should result in detecting the invalid memory accesses when running under ASAN.

Observed in a Julia PR where we started using weak symbols: JuliaLang/julia#45649

Reviewed By: mstorsjo

Differential Revision: https://reviews.llvm.org/D129840
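The failure mode described in this commit message can be illustrated with a small, self-contained C++ sketch. `ToyObjectWriter` is a hypothetical stand-in written for this note, not LLVM's `WinCOFFObjectWriter`. It stores names by value, so it shows only the stale-state aspect (entries surviving a reset), not the dangling-pointer access seen in the real writer.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical toy writer (assumption, not LLVM code): tracks the default
// names generated for weak symbols so that they can be kept unique.
class ToyObjectWriter {
public:
    // Record the default name generated for a weak symbol.
    void defineWeak(const std::string &name) {
        weakDefaults_.push_back(".weak." + name + ".default");
    }

    // Prepare the writer for reuse. Clearing the list here is the point of
    // the sketch: without it, entries from the previous run would remain.
    void reset() {
        weakDefaults_.clear();
    }

    const std::vector<std::string> &weakDefaults() const {
        return weakDefaults_;
    }

private:
    std::vector<std::string> weakDefaults_;
};
```

With `reset()` clearing the list, a second compilation through the same writer starts from an empty `weakDefaults()`. Without the `clear()` call, the second run would see the first run's entries as well, which in the real writer refer to symbols that have already been freed.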
This was marked for backporting onto 1.6, but we ran into errors because, as Tim said, the LLVM backport has not happened. If someone wants this backported to 1.6, they will need to get a proper |
Instead of replacing them late in codegen, let LLVM emit these symbols,
but intercept them in the ORC JIT.
I haven't had a chance to test this properly, and it is likely that
we will also need to emit these aliases into the system image, since
loading it will not see the aliases defined here.
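As a rough illustration of the "intercept at lookup" idea, the sketch below redirects a runtime-library symbol name to a locally provided implementation before resolution. It uses only the C++ standard library and none of the ORC API. `fp16_aliases` and `resolve_symbol` are hypothetical names. The `julia__gnu_h2f_ieee` target appears in the backport diff earlier on this page, while mapping `__extendhfsf2` to the same target is an assumption made for the example.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical alias table (illustration only): names the compiler may emit
// calls to, mapped to the implementation that should actually be used.
inline const std::unordered_map<std::string, std::string> &fp16_aliases() {
    static const std::unordered_map<std::string, std::string> table = {
        {"__gnu_h2f_ieee", "julia__gnu_h2f_ieee"},
        // Assumption: both half-to-float entry points share one implementation.
        {"__extendhfsf2", "julia__gnu_h2f_ieee"},
    };
    return table;
}

// Return the name to resolve: the alias target if one is registered,
// otherwise the requested name unchanged.
inline std::string resolve_symbol(const std::string &name) {
    auto it = fp16_aliases().find(name);
    return it == fp16_aliases().end() ? name : it->second;
}
```

For example, `resolve_symbol("__gnu_h2f_ieee")` yields `"julia__gnu_h2f_ieee"`, while an unrelated name such as `"memcpy"` passes through untouched. A table like this exists only inside the running process, which is one way to read the concern above about code loaded from the system image.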