[libc] Remove remaining GPU architecture dependent instructions #81612

Merged 1 commit on Feb 13, 2024
9 changes: 1 addition & 8 deletions libc/src/__support/GPU/amdgpu/utils.h
@@ -152,14 +152,7 @@ LIBC_INLINE uint64_t processor_clock() { return __builtin_readcyclecounter(); }
 /// Returns a fixed-frequency timestamp. The actual frequency is dependent on
 /// the card and can only be queried via the driver.
 LIBC_INLINE uint64_t fixed_frequency_clock() {
-  if constexpr (LIBC_HAS_BUILTIN(__builtin_amdgcn_s_sendmsg_rtnl))
-    return __builtin_amdgcn_s_sendmsg_rtnl(0x83);
-  else if constexpr (LIBC_HAS_BUILTIN(__builtin_amdgcn_s_memrealtime))
-    return __builtin_amdgcn_s_memrealtime();
-  else if constexpr (LIBC_HAS_BUILTIN(__builtin_amdgcn_s_memtime))
-    return __builtin_amdgcn_s_memtime();
-  else
-    return 0;
+  return __builtin_readsteadycounter();
 }

 /// Terminates execution of the associated wavefront.
6 changes: 2 additions & 4 deletions libc/src/__support/GPU/nvptx/utils.h
@@ -135,13 +135,11 @@ LIBC_INLINE uint32_t get_lane_size() { return 32; }
 }

 /// Returns the current value of the GPU's processor clock.
-LIBC_INLINE uint64_t processor_clock() {
-  return __nvvm_read_ptx_sreg_clock64();
-}
+LIBC_INLINE uint64_t processor_clock() { return __builtin_readcyclecounter(); }

 /// Returns a global fixed-frequency timer at nanosecond frequency.
 LIBC_INLINE uint64_t fixed_frequency_clock() {
-  return __nvvm_read_ptx_sreg_globaltimer();
+  return __builtin_readsteadycounter();
 }

 /// Terminates execution of the calling thread.
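
To illustrate what the two `utils.h` hunks rely on, here is a minimal sketch, not the libc code itself. It assumes a Clang that provides `__builtin_readsteadycounter` when targeting AMDGPU or NVPTX. The helper names `read_steady_counter` and `elapsed_steady_ticks`, and the `std::chrono` host fallback, are illustrative assumptions so that the snippet also builds on a CPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Older compilers have no __has_builtin; treat every builtin as unavailable.
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif

// Hypothetical helper (not part of libc): read a fixed-frequency counter.
inline uint64_t read_steady_counter() {
#if (defined(__AMDGPU__) || defined(__NVPTX__)) &&                             \
    __has_builtin(__builtin_readsteadycounter)
  // GPU build: the backend selects the right instruction for the target.
  return __builtin_readsteadycounter();
#else
  // Host fallback (assumption): nanoseconds from a monotonic clock.
  auto now = std::chrono::steady_clock::now().time_since_epoch();
  return static_cast<uint64_t>(
      std::chrono::duration_cast<std::chrono::nanoseconds>(now).count());
#endif
}

// Hypothetical helper: counter increments since `start`.
inline uint64_t elapsed_steady_ticks(uint64_t start) {
  uint64_t now = read_steady_counter();
  assert(now >= start && "steady counter went backwards");
  return now - start;
}
```

The point of the builtin is that the source no longer names a target-specific instruction, so the same code can be compiled once and specialized by the backend. The unit of the returned value still depends on the target, which is why the frequency is handled separately in `time_utils.h`.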
5 changes: 3 additions & 2 deletions libc/src/__support/RPC/rpc_util.h
@@ -21,8 +21,9 @@ namespace rpc {

 /// Suspend the thread briefly to assist the thread scheduler during busy loops.
 LIBC_INLINE void sleep_briefly() {
-#if defined(LIBC_TARGET_ARCH_IS_NVPTX) && __CUDA_ARCH__ >= 700
-  __nvvm_nanosleep(64);
+#if defined(LIBC_TARGET_ARCH_IS_NVPTX)
+  if (__nvvm_reflect("__CUDA_ARCH") >= 700)
+    LIBC_INLINE_ASM("nanosleep.u32 64;" ::: "memory");
 #elif defined(LIBC_TARGET_ARCH_IS_AMDGPU)
   __builtin_amdgcn_s_sleep(2);
 #elif defined(LIBC_TARGET_ARCH_IS_X86)
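
As a rough picture of how a helper like `sleep_briefly()` is used in a busy loop, here is a host-only sketch. It is an assumption-laden stand-in, not the RPC code: `sleep_briefly_host` and `spin_until` are hypothetical names, and the GPU branches from the hunk are left out because they need a device compiler.

```cpp
#include <cassert>
#include <cstdint>

// Host-only stand-in for sleep_briefly() (hypothetical, not the libc helper).
inline void sleep_briefly_host() {
#if defined(__x86_64__) || defined(__i386__)
  __builtin_ia32_pause(); // x86 spin-loop hint (PAUSE instruction)
#else
  asm volatile("" ::: "memory"); // other hosts: compiler barrier only
#endif
}

// Hypothetical helper: poll `ready` at most `max_polls` times, backing off
// between polls. Returns the number of polls used, or 0 if it never succeeded.
template <typename Pred>
inline uint32_t spin_until(Pred ready, uint32_t max_polls) {
  assert(max_polls > 0 && "need at least one poll");
  for (uint32_t polls = 1; polls <= max_polls; ++polls) {
    if (ready())
      return polls;
    sleep_briefly_host();
  }
  return 0;
}
```

Backing off between polls reduces contention on whatever the predicate reads; the bound on the number of polls is only there to keep the sketch terminating.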
5 changes: 3 additions & 2 deletions libc/src/time/gpu/nanosleep.cpp
@@ -23,14 +23,15 @@ LLVM_LIBC_FUNCTION(int, nanosleep,
   uint64_t tick_rate = TICKS_PER_SEC / GPU_CLOCKS_PER_SEC;

   uint64_t start = gpu::fixed_frequency_clock();
-#if defined(LIBC_TARGET_ARCH_IS_NVPTX) && __CUDA_ARCH__ >= 700
+#if defined(LIBC_TARGET_ARCH_IS_NVPTX)
   uint64_t end = start + (nsecs + tick_rate - 1) / tick_rate;
   uint64_t cur = gpu::fixed_frequency_clock();
   // The NVPTX architecture supports sleeping and guarantees the actual time
   // slept will be somewhere between zero and twice the requested amount. Here
   // we will sleep again if we undershot the time.
   while (cur < end) {
-    __nvvm_nanosleep(static_cast<uint32_t>(nsecs));
+    if (__nvvm_reflect("__CUDA_ARCH") >= 700)
+      LIBC_INLINE_ASM("nanosleep.u32 %0;" ::"r"(nsecs));
Contributor

So __nvvm_nanosleep is not always available but the asm is always supported?

Contributor Author

No, I made some changes to make this well-formed. The __nvvm_reflect pass returns the backend's value of the sm it's compiling with. That used to be an issue if the ASM was invalid and O0 was run, but I made a patch that runs __nvvm_reflect in the backend to trim these branches even at O0 in #81253. So, basically this will only make it to PTX if the backend is compiled with sm_70 or greater.

     cur = gpu::fixed_frequency_clock();
     nsecs -= nsecs > cur - start ? cur - start : 0;
   }
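
The review exchange in this hunk can be condensed into a small sketch of the `__nvvm_reflect` pattern. This is an illustration under assumptions, not the libc source: `sleep_if_supported` is a hypothetical name, the device branch assumes Clang targeting NVPTX with the backend behaviour the author describes, and the host branch exists only so the snippet builds on a CPU.

```cpp
#include <cassert>

// Hypothetical helper: returns true if a hardware sleep was issued.
inline bool sleep_if_supported() {
#if defined(__NVPTX__)
  // This reads like a runtime check, but the call is replaced by a constant
  // for the architecture being compiled (for example 700 for sm_70), so the
  // branch below is dropped entirely when the target is older.
  if (__nvvm_reflect("__CUDA_ARCH") >= 700) {
    asm volatile("nanosleep.u32 64;" ::: "memory");
    return true;
  }
#endif
  // Host build, or a target without nanosleep: nothing was issued.
  return false;
}
```

The check is written directly in the `if` condition rather than behind a wrapper function, so folding the constant is enough to remove the branch without relying on inlining. Compared with the `__CUDA_ARCH__` preprocessor test it replaces, the decision moves from preprocessing time to code generation time.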
21 changes: 5 additions & 16 deletions libc/src/time/gpu/time_utils.h
@@ -15,24 +15,13 @@ namespace LIBC_NAMESPACE {

 #if defined(LIBC_TARGET_ARCH_IS_AMDGPU)
 // AMDGPU does not have a single set frequency. Different architectures and
-// cards can have varying values. Here we default to a few known values, but for
-// complete support the frequency needs to be read from the kernel driver.
-#if defined(__GFX10__) || defined(__GFX11__) || defined(__GFX12__) || \
-    defined(__gfx940__) || defined(__gfx941__) || defined(__gfx942__)
-// These architectures use a 100 MHz fixed frequency clock.
-constexpr uint64_t clock_freq = 100000000;
-#elif defined(__GFX9__)
-// These architectures use a 25 MHz fixed frequency clock except for Vega 10
-// which is actually 27 MHz. We default to 25 MHz in all cases anyway.
-constexpr uint64_t clock_freq = 25000000;
-#else
-// The frequency for these architectures is unknown. We simply default to zero.
-constexpr uint64_t clock_freq = 0;
-#endif
+// cards can have different values. The actual frequency needs to be read from
+// the kernel driver and will be between 25 MHz and 100 MHz on most cards. All
+// cards following the GFX9 ISAs use a 100 MHz clock so we will default to that.
+constexpr uint64_t clock_freq = 100000000UL;

 // We provide an externally visible symbol such that the runtime can set
-// this to the correct value. If it is not set we try to default to the
-// known values.
+// this to the correct value.
 extern "C" [[gnu::visibility("protected")]] uint64_t
     [[clang::address_space(4)]] __llvm_libc_clock_freq;
 #define GPU_CLOCKS_PER_SEC static_cast<clock_t>(__llvm_libc_clock_freq)
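
To make the effect of the default frequency concrete, the sketch below reproduces the arithmetic from the `nanosleep.cpp` hunk (`tick_rate = TICKS_PER_SEC / GPU_CLOCKS_PER_SEC`, then a ceiling division) on the host. It rests on assumptions: `kTicksPerSec` is taken to be one billion, the frequency is passed as a plain parameter instead of being read from `__llvm_libc_clock_freq`, and `nanos_to_counter_ticks` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: TICKS_PER_SEC counts nanoseconds per second.
constexpr uint64_t kTicksPerSec = 1000000000ULL;
// Default taken from the hunk above: 100 MHz.
constexpr uint64_t kDefaultClockFreq = 100000000ULL;

// Hypothetical helper: counter increments needed to cover `nsecs`, rounded up.
inline uint64_t nanos_to_counter_ticks(uint64_t nsecs, uint64_t freq_hz) {
  assert(freq_hz > 0 && freq_hz <= kTicksPerSec);
  uint64_t tick_rate = kTicksPerSec / freq_hz; // nanoseconds per increment
  return (nsecs + tick_rate - 1) / tick_rate;  // ceiling division
}
```

With the 100 MHz default each increment covers 10 ns, so a 95 ns request needs 10 increments. If the card actually runs its counter at 25 MHz and the runtime never overrides the value, the same request would need only 3 increments of 40 ns each, so the computed deadline would be too far in the future.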