Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[libc] Fix linking of AMDGPU device runtime control constants for math #65676

Merged
merged 3 commits into from
Oct 7, 2023

Conversation

jhuber6
Copy link
Contributor

@jhuber6 jhuber6 commented Sep 7, 2023

Summary:
Currently, libc temporarily provides math by linking against existing
vendor implementations. To use the AMDGPU DeviceRTL we need to define a
handful of control constants that alter behaviour for architecture
specific things. Previously these were marked extern const because
they must be present when we link-in the vendor bitcode library.
However, this causes linker errors if more than one math function was
used.

This patch fixes the issue by marking these functions as used and inline
on top of being external. This means that they are linkable, but it
gives us linkonce_odr semantics. The downside is that these globals
won't be optimized out, but it allows us to perform constant propagation
on them unlike using weak.

@jhuber6 jhuber6 requested a review from a team as a code owner September 7, 2023 21:13
@github-actions github-actions bot added the libc label Sep 7, 2023
@arsenm
Copy link
Contributor

arsenm commented Sep 8, 2023

Is there anything we can do better than used? I don't want these to make it into the final binary

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 8, 2023

Is there anything we can do better than used? I don't want these to make it into the final binary

So, the other option would be to compile this and then -mlink-builtin-bitcode it. However the CMake isn't really built for this so it would be a lot of custom hackery.

Another solution would be to emit an internal alias to these variables, that would allow it to be stripped by LTO.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 8, 2023

I updated it to use an internal alias. This causes clang to emit the symbol long enough and with the correct linkage to link against it. The resulting symbols can then be trivially stripped once we optimize the program, see https://godbolt.org/z/66K8T3orW.

@arsenm
Copy link
Contributor

arsenm commented Sep 8, 2023

Cmake cannot and should not be trying to link these. -mlink-builtin-bitcode is the only correct way to link device libs

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 8, 2023

Cmake cannot and should not be trying to link these. -mlink-builtin-bitcode is the only correct way to link device libs

Sorry, I wasn't specific enough. I was talking about compiling this header as a bitcode file, then using -mlink-builtin-bitcode on the device libs. That would be a nightmare of CMake since we'd need to somehow only pass the flag to the correct architecture.

@JonChesterfield
Copy link
Collaborator

It's sad that the device libs need awkward linking but also very long standing. Either link them awkwardly, don't use them, or link them incorrectly and hope the failure modes don't show up in bug reports. I favour don't use them for libc.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 8, 2023

It's sad that the device libs need awkward linking but also very long standing. Either link them awkwardly, don't use them, or link them incorrectly and hope the failure modes don't show up in bug reports. I favour don't use them for libc.

We need to use them right now since we're using ocml.bc to generate the math libraries. The end goal is to remove this completely however, this is merely a stopgap.

@arsenm
Copy link
Contributor

arsenm commented Sep 9, 2023

So technically ocml functions are now subtarget independent. It's ockl that's still problematic. They still could have the base wave size and other ABI incompatibilities. The ISA versions are only used for detecting if FMA is fast or not, which barely matters anymore. We could find a way to avoid that

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 9, 2023

So technically ocml functions are now subtarget independent. It's ockl that's still problematic. They still could have the base wave size and other ABI incompatibilities. The ISA versions are only used for detecting if FMA is fast or not, which barely matters anymore. We could find a way to avoid that

Yeah, none of the ocml functions use the wavesize. We'd still need to set the math optimization flags though.

@arsenm
Copy link
Contributor

arsenm commented Sep 9, 2023

The wavesize incompatibility is deeper than depends on the feature. It's a full ABI incompatibility and really should have 2 separate builds.

For the other math options, I have all the patches to eliminate them stuck in review

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 9, 2023

The wavesize incompatibility is deeper than depends on the feature. It's a full ABI incompatibility and really should have 2 separate builds.

For the other math options, I have all the patches to eliminate them stuck in review

So, this approach is mostly just a quick solution to port the ocml libraries to this multi-architecture build I'm doing in libc. I think long-term I'd really like to move the builds to be common, since there's generally only a few builtins we could probably just make errors in the backend if they're unsupported once the user provides an architecture. The wavesize attribute is definitely the hardest one, but at least for my purposes we may be able to allocate 64 sized buffers always and use the intrinsic to set the upper limit.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 14, 2023

Ping

Summary:
Currently, `libc` temporarily provides math by linking against existing
vendor implementations. To use the AMDGPU DeviceRTL we need to define a
handful of control constants that alter behaviour for architecture
specific things. Previously these were marked `extern const` because
they must be present when we link-in the vendor bitcode library.
However, this causes linker errors if more than one math function was
used.

This patch fixes the issue by marking these functions as used and inline
on top of being external. This means that they are linkable, but it
gives us `linkonce_odr` semantics. The downside is that these globals
won't be optimized out, but it allows us to perform constant propagation
on them unlike using `weak`.
@llvmbot
Copy link
Member

llvmbot commented Sep 14, 2023

@llvm/pr-subscribers-backend-amdgpu

Changes Summary: Currently, `libc` temporarily provides math by linking against existing vendor implementations. To use the AMDGPU DeviceRTL we need to define a handful of control constants that alter behaviour for architecture specific things. Previously these were marked `extern const` because they must be present when we link-in the vendor bitcode library. However, this causes linker errors if more than one math function was used.

This patch fixes the issue by marking these functions as used and inline
on top of being external. This means that they are linkable, but it
gives us linkonce_odr semantics. The downside is that these globals
won't be optimized out, but it allows us to perform constant propagation
on them unlike using weak.

--
Full diff: https://github.com/llvm/llvm-project/pull/65676.diff

1 Files Affected:

  • (modified) libc/src/math/gpu/vendor/amdgpu/platform.h (+53-41)
diff --git a/libc/src/math/gpu/vendor/amdgpu/platform.h b/libc/src/math/gpu/vendor/amdgpu/platform.h
index 6ec47c24a93a2d7..8e2c23743127f48 100644
--- a/libc/src/math/gpu/vendor/amdgpu/platform.h
+++ b/libc/src/math/gpu/vendor/amdgpu/platform.h
@@ -9,6 +9,8 @@
 #ifndef LLVM_LIBC_SRC_MATH_GPU_AMDGPU_PLATFORM_H
 #define LLVM_LIBC_SRC_MATH_GPU_AMDGPU_PLATFORM_H
 
+#include "src/__support/macros/attributes.h"
+
 #include <stdint.h>
 
 namespace __llvm_libc {
@@ -19,96 +21,106 @@ namespace __llvm_libc {
 extern "C" {
 
 // Disable unsafe math optimizations in the implementation.
-extern const uint8_t __oclc_unsafe_math_opt = 0;
+extern const LIBC_INLINE uint8_t __oclc_unsafe_math_opt = 0;
 
 // Disable denormalization at zero optimizations in the implementation.
-extern const uint8_t __oclc_daz_opt = 0;
+extern const LIBC_INLINE uint8_t __oclc_daz_opt = 0;
 
 // Disable rounding optimizations for 32-bit square roots.
-extern const uint8_t __oclc_correctly_rounded_sqrt32 = 1;
+extern const LIBC_INLINE uint8_t __oclc_correctly_rounded_sqrt32 = 1;
 
 // Disable finite math optimizations.
-extern const uint8_t __oclc_finite_only_opt = 0;
+extern const LIBC_INLINE uint8_t __oclc_finite_only_opt = 0;
 
 #if defined(__gfx700__)
-extern const uint32_t __oclc_ISA_version = 7000;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7000;
 #elif defined(__gfx701__)
-extern const uint32_t __oclc_ISA_version = 7001;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7001;
 #elif defined(__gfx702__)
-extern const uint32_t __oclc_ISA_version = 7002;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7002;
 #elif defined(__gfx703__)
-extern const uint32_t __oclc_ISA_version = 7003;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7003;
 #elif defined(__gfx704__)
-extern const uint32_t __oclc_ISA_version = 7004;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7004;
 #elif defined(__gfx705__)
-extern const uint32_t __oclc_ISA_version = 7005;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 7005;
 #elif defined(__gfx801__)
-extern const uint32_t __oclc_ISA_version = 8001;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8001;
 #elif defined(__gfx802__)
-extern const uint32_t __oclc_ISA_version = 8002;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8002;
 #elif defined(__gfx803__)
-extern const uint32_t __oclc_ISA_version = 8003;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8003;
 #elif defined(__gfx805__)
-extern const uint32_t __oclc_ISA_version = 8005;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8005;
 #elif defined(__gfx810__)
-extern const uint32_t __oclc_ISA_version = 8100;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 8100;
 #elif defined(__gfx900__)
-extern const uint32_t __oclc_ISA_version = 9000;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9000;
 #elif defined(__gfx902__)
-extern const uint32_t __oclc_ISA_version = 9002;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9002;
 #elif defined(__gfx904__)
-extern const uint32_t __oclc_ISA_version = 9004;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9004;
 #elif defined(__gfx906__)
-extern const uint32_t __oclc_ISA_version = 9006;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9006;
 #elif defined(__gfx908__)
-extern const uint32_t __oclc_ISA_version = 9008;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9008;
 #elif defined(__gfx909__)
-extern const uint32_t __oclc_ISA_version = 9009;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9009;
 #elif defined(__gfx90a__)
-extern const uint32_t __oclc_ISA_version = 9010;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9010;
 #elif defined(__gfx90c__)
-extern const uint32_t __oclc_ISA_version = 9012;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9012;
 #elif defined(__gfx940__)
-extern const uint32_t __oclc_ISA_version = 9400;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 9400;
 #elif defined(__gfx1010__)
-extern const uint32_t __oclc_ISA_version = 10100;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10100;
 #elif defined(__gfx1011__)
-extern const uint32_t __oclc_ISA_version = 10101;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10101;
 #elif defined(__gfx1012__)
-extern const uint32_t __oclc_ISA_version = 10102;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10102;
 #elif defined(__gfx1013__)
-extern const uint32_t __oclc_ISA_version = 10103;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10103;
 #elif defined(__gfx1030__)
-extern const uint32_t __oclc_ISA_version = 10300;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10300;
 #elif defined(__gfx1031__)
-extern const uint32_t __oclc_ISA_version = 10301;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10301;
 #elif defined(__gfx1032__)
-extern const uint32_t __oclc_ISA_version = 10302;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10302;
 #elif defined(__gfx1033__)
-extern const uint32_t __oclc_ISA_version = 10303;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10303;
 #elif defined(__gfx1034__)
-extern const uint32_t __oclc_ISA_version = 10304;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10304;
 #elif defined(__gfx1035__)
-extern const uint32_t __oclc_ISA_version = 10305;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10305;
 #elif defined(__gfx1036__)
-extern const uint32_t __oclc_ISA_version = 10306;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 10306;
 #elif defined(__gfx1100__)
-extern const uint32_t __oclc_ISA_version = 11000;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11000;
 #elif defined(__gfx1101__)
-extern const uint32_t __oclc_ISA_version = 11001;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11001;
 #elif defined(__gfx1102__)
-extern const uint32_t __oclc_ISA_version = 11002;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11002;
 #elif defined(__gfx1103__)
-extern const uint32_t __oclc_ISA_version = 11003;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11003;
 #elif defined(__gfx1150__)
-extern const uint32_t __oclc_ISA_version = 11500;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11500;
 #elif defined(__gfx1151__)
-extern const uint32_t __oclc_ISA_version = 11501;
+extern const LIBC_INLINE uint32_t __oclc_ISA_version = 11501;
 #else
 #error "Unknown AMDGPU architecture"
 #endif
 }
 
+// These aliases cause clang to emit the control constants with ODR linkage.
+// This allows us to link against the symbols without preventing them from being
+// optimized out or causing symbol collisions.
+[[gnu::alias("__oclc_unsafe_math_opt")]] const uint8_t __oclc_unsafe_math_opt__;
+[[gnu::alias("__oclc_daz_opt")]] const uint8_t __oclc_daz_opt__;
+[[gnu::alias("__oclc_correctly_rounded_sqrt32")]] const uint8_t
+    __oclc_correctly_rounded_sqrt32__;
+[[gnu::alias("__oclc_finite_only_opt")]] const uint8_t __oclc_finite_only_opt__;
+[[gnu::alias("__oclc_ISA_version")]] const uint32_t __oclc_ISA_version__;
+
 } // namespace __llvm_libc
 
 #endif // LLVM_LIBC_SRC_MATH_GPU_AMDGPU_PLATFORM_H

@jhuber6
Copy link
Contributor Author

jhuber6 commented Sep 25, 2023

@arsenm are there any issues with this? It's fixing something that's broken upstream right now without changing any semantics so I'd like to get it in if possible.

@jhuber6
Copy link
Contributor Author

jhuber6 commented Oct 2, 2023

Ping. This is a bug fix that prevents this from actually being used and doesn't change any existing behavior.

@jhuber6 jhuber6 merged commit fa23a23 into llvm:main Oct 7, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants