-
Notifications
You must be signed in to change notification settings - Fork 12.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[libc] Fix linking of AMDGPU device runtime control constants for math #65676
Conversation
Is there anything we can do better than used? I don't want these to make it into the final binary |
So, the other option would be to compile this and then Another solution would be to emit an internal alias to these variables, that would allow it to be stripped by LTO. |
048c9c9
to
9e81d06
Compare
I updated it to use an internal alias. This causes clang to emit the symbol long enough and with the correct linkage to link against it. The resulting symbols can then be trivially stripped once we optimize the program, see https://godbolt.org/z/66K8T3orW. |
Cmake cannot and should not be trying to link these. -mlink-builtin-bitcode is the only correct way to link device libs |
Sorry, I wasn't specific enough. I was talking about compiling this header as a bitcode file, then using |
It's sad that the device libs need awkward linking but also very long standing. Either link them awkwardly, don't use them, or link them incorrectly and hope the failure modes don't show up in bug reports. I favour don't use them for libc. |
We need to use them right now since we're using |
So technically ocml functions are now subtarget independent. It's ockl that's still problematic. They still could have the base wave size and other ABI incompatibilities. The ISA versions are only used for detecting if FMA is fast or not, which barely matters anymore. We could find a way to avoid that |
Yeah, none of the |
The wavesize incompatibility is deeper than depends on the feature. It's a full ABI incompatibility and really should have 2 separate builds. For the other math options, I have all the patches to eliminate them stuck in review |
So, this approach is mostly just a quick solution to port the |
Ping |
Summary: Currently, `libc` temporarily provides math by linking against existing vendor implementations. To use the AMDGPU DeviceRTL we need to define a handful of control constants that alter behaviour for architecture specific things. Previously these were marked `extern const` because they must be present when we link-in the vendor bitcode library. However, this causes linker errors if more than one math function was used. This patch fixes the issue by marking these functions as used and inline on top of being external. This means that they are linkable, but it gives us `linkonce_odr` semantics. The downside is that these globals won't be optimized out, but it allows us to perform constant propagation on them unlike using `weak`.
9e81d06
to
d11fae1
Compare
@llvm/pr-subscribers-backend-amdgpu ChangesSummary: Currently, `libc` temporarily provides math by linking against existing vendor implementations. To use the AMDGPU DeviceRTL we need to define a handful of control constants that alter behaviour for architecture specific things. Previously these were marked `extern const` because they must be present when we link-in the vendor bitcode library. However, this causes linker errors if more than one math function was used.This patch fixes the issue by marking these functions as used and inline -- 1 Files Affected:
diff --git a/libc/src/math/gpu/vendor/amdgpu/platform.h b/libc/src/math/gpu/vendor/amdgpu/platform.h index 6ec47c24a93a2d7..8e2c23743127f48 100644 --- a/libc/src/math/gpu/vendor/amdgpu/platform.h +++ b/libc/src/math/gpu/vendor/amdgpu/platform.h @@ -9,6 +9,8 @@ #ifndef LLVM_LIBC_SRC_MATH_GPU_AMDGPU_PLATFORM_H #define LLVM_LIBC_SRC_MATH_GPU_AMDGPU_PLATFORM_H +#include "src/__support/macros/attributes.h" + #include <stdint.h> namespace __llvm_libc { @@ -19,96 +21,106 @@ namespace __llvm_libc { extern "C" { // Disable unsafe math optimizations in the implementation. -extern const uint8_t __oclc_unsafe_math_opt = 0; +extern const LIBC_INLINE uint8_t __oclc_unsafe_math_opt = 0; // Disable denormalization at zero optimizations in the implementation. -extern const uint8_t __oclc_daz_opt = 0; +extern const LIBC_INLINE uint8_t __oclc_daz_opt = 0; // Disable rounding optimizations for 32-bit square roots. -extern const uint8_t __oclc_correctly_rounded_sqrt32 = 1; +extern const LIBC_INLINE uint8_t __oclc_correctly_rounded_sqrt32 = 1; // Disable finite math optimizations. -extern const uint8_t __oclc_finite_only_opt = 0; +extern const LIBC_INLINE uint8_t __oclc_finite_only_opt = 0; #if defined(__gfx700__) -extern const uint32_t __oclc_ISA_version = 7000; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7000; #elif defined(__gfx701__) -extern const uint32_t __oclc_ISA_version = 7001; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7001; #elif defined(__gfx702__) -extern const uint32_t __oclc_ISA_version = 7002; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7002; #elif defined(__gfx703__) -extern const uint32_t __oclc_ISA_version = 7003; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7003; #elif defined(__gfx704__) -extern const uint32_t __oclc_ISA_version = 7004; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7004; #elif defined(__gfx705__) -extern const uint32_t __oclc_ISA_version = 7005; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7005; #elif defined(__gfx801__) -extern const uint32_t __oclc_ISA_version = 8001; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8001; #elif defined(__gfx802__) -extern const uint32_t __oclc_ISA_version = 8002; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8002; #elif defined(__gfx803__) -extern const uint32_t __oclc_ISA_version = 8003; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8003; #elif defined(__gfx805__) -extern const uint32_t __oclc_ISA_version = 8005; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8005; #elif defined(__gfx810__) -extern const uint32_t __oclc_ISA_version = 8100; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8100; #elif defined(__gfx900__) -extern const uint32_t __oclc_ISA_version = 9000; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9000; #elif defined(__gfx902__) -extern const uint32_t __oclc_ISA_version = 9002; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9002; #elif defined(__gfx904__) -extern const uint32_t __oclc_ISA_version = 9004; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9004; #elif defined(__gfx906__) -extern const uint32_t __oclc_ISA_version = 9006; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9006; #elif defined(__gfx908__) -extern const uint32_t __oclc_ISA_version = 9008; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9008; #elif defined(__gfx909__) -extern const uint32_t __oclc_ISA_version = 9009; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9009; #elif defined(__gfx90a__) -extern const uint32_t __oclc_ISA_version = 9010; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9010; #elif defined(__gfx90c__) -extern const uint32_t __oclc_ISA_version = 9012; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9012; #elif defined(__gfx940__) -extern const uint32_t __oclc_ISA_version = 9400; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9400; #elif defined(__gfx1010__) -extern const uint32_t __oclc_ISA_version = 10100; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10100; #elif defined(__gfx1011__) -extern const uint32_t __oclc_ISA_version = 10101; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10101; #elif defined(__gfx1012__) -extern const uint32_t __oclc_ISA_version = 10102; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10102; #elif defined(__gfx1013__) -extern const uint32_t __oclc_ISA_version = 10103; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10103; #elif defined(__gfx1030__) -extern const uint32_t __oclc_ISA_version = 10300; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10300; #elif defined(__gfx1031__) -extern const uint32_t __oclc_ISA_version = 10301; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10301; #elif defined(__gfx1032__) -extern const uint32_t __oclc_ISA_version = 10302; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10302; #elif defined(__gfx1033__) -extern const uint32_t __oclc_ISA_version = 10303; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10303; #elif defined(__gfx1034__) -extern const uint32_t __oclc_ISA_version = 10304; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10304; #elif defined(__gfx1035__) -extern const uint32_t __oclc_ISA_version = 10305; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10305; #elif defined(__gfx1036__) -extern const uint32_t __oclc_ISA_version = 10306; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10306; #elif defined(__gfx1100__) -extern const uint32_t __oclc_ISA_version = 11000; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11000; #elif defined(__gfx1101__) -extern const uint32_t __oclc_ISA_version = 11001; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11001; #elif defined(__gfx1102__) -extern const uint32_t __oclc_ISA_version = 11002; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11002; #elif defined(__gfx1103__) -extern const uint32_t __oclc_ISA_version = 11003; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11003; #elif defined(__gfx1150__) -extern const uint32_t __oclc_ISA_version = 11500; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11500; #elif defined(__gfx1151__) -extern const uint32_t __oclc_ISA_version = 11501; +extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11501; #else #error "Unknown AMDGPU architecture" #endif } +// These aliases cause clang to emit the control constants with ODR linkage. +// This allows us to link against the symbols without preventing them from being +// optimized out or causing symbol collisions. +[[gnu::alias("__oclc_unsafe_math_opt")]] const uint8_t __oclc_unsafe_math_opt__; +[[gnu::alias("__oclc_daz_opt")]] const uint8_t __oclc_daz_opt__; +[[gnu::alias("__oclc_correctly_rounded_sqrt32")]] const uint8_t + __oclc_correctly_rounded_sqrt32__; +[[gnu::alias("__oclc_finite_only_opt")]] const uint8_t __oclc_finite_only_opt__; +[[gnu::alias("__oclc_ISA_version")]] const uint32_t __oclc_ISA_version__; + } // namespace __llvm_libc #endif // LLVM_LIBC_SRC_MATH_GPU_AMDGPU_PLATFORM_H |
@arsenm are there any issues with this? It's fixing something that's broken upstream right now without changing any semantics so I'd like to get it in if possible. |
Ping. This is a bug fix that prevents this from actually being used and doesn't change any existing behavior. |
Summary:
Currently,
libc
temporarily provides math by linking against existingvendor implementations. To use the AMDGPU DeviceRTL we need to define a
handful of control constants that alter behaviour for architecture
specific things. Previously these were marked
extern const
becausethey must be present when we link-in the vendor bitcode library.
However, this causes linker errors if more than one math function was
used.
This patch fixes the issue by marking these functions as used and inline
on top of being external. This means that they are linkable, but it
gives us
linkonce_odr
semantics. The downside is that these globalswon't be optimized out, but it allows us to perform constant propagation
on them unlike using
weak
.