
AOMP Release 17.0-2

estewart08 released this on 28 Apr 22:15

These are the release notes for AOMP 17.0-2. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called "amd-stg-open", which lives in a mirror of upstream LLVM at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch changes constantly as AMD merges the upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit IDs and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build because it does not use or require ROCm, with the exception of the kernel module (dkms) and libdrm, which are often part of the Linux distribution. AOMP is isolated from any ROCm installation because it installs into /usr/lib/aomp and uses RPATH on its runtime libraries.

For AOMP 17.0-2, the last trunk commit is 921b45a855f09afe99ea9c0c173794ee4ccd5872 on April 27, 2023. The last AMD-only commit is ad7b5d7a69c62dab21332cba131054d2b8a713cc on April 26, 2023. These commits form a frozen branch now called "aomp-17.0-2". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-2.

The integrated ROCm components for this AOMP release were built with ROCm 5.4.4 sources.
This is the 3rd AOMP release based on LLVM 17 development.
The changes from 17.0-1 to 17.0-2 include:

  • Changed gpurun to set both GPU_MAX_HW_QUEUES and LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES to 1 when multiple MPI ranks share a GPU. Each variable is set to 1 only if the caller has not already set it.
  • Added environment variables LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to control how long the runtime waits in an active state for kernel completion and data transfer completion, respectively. The default is 0, which means the runtime waits indefinitely in a blocked state. If a value is set and that timeout expires, the waiting runtime switches to waiting for the signal in a blocked state.
  • Changed run_babelstream.sh to set LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to improve performance.
  • Fixed the amdgpu nextgen plugin to work with cov5 (code object version 5). The default code object version is cov4.
  • Fixed the amdgpu nextgen plugin to work with OMPT (OpenMP Tools environment).
  • Fixed the amdgpu nextgen plugin to work with multiple architectures supported in the same image. Additional patches are needed to support the device clause on a target region so that it offloads to the correct GPU when different architectures from the same vendor are present.
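The gpurun change boils down to an "only if not already set" default. The Python sketch below illustrates that rule; it is not the gpurun script itself. The function name `apply_shared_gpu_defaults` and the `ranks_per_gpu` parameter (how many MPI ranks share one GPU) are hypothetical, and how gpurun actually detects shared use is not described in these notes.

```python
from typing import MutableMapping

# Variables that gpurun sets to 1 when a GPU is shared (names from the release notes).
QUEUE_VARS = ("GPU_MAX_HW_QUEUES", "LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES")


def apply_shared_gpu_defaults(env: MutableMapping[str, str], ranks_per_gpu: int) -> None:
    """Hypothetical sketch: when several MPI ranks share one GPU, default both
    queue variables to "1" without overriding values the caller already set."""
    if ranks_per_gpu > 1:
        for name in QUEUE_VARS:
            # setdefault only inserts when the key is absent, so caller values win.
            env.setdefault(name, "1")


if __name__ == "__main__":
    env = {"GPU_MAX_HW_QUEUES": "4"}  # the caller's own choice is preserved
    apply_shared_gpu_defaults(env, ranks_per_gpu=2)
    print(env)  # {'GPU_MAX_HW_QUEUES': '4', 'LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES': '1'}
```

Passing `os.environ` instead of a plain dict would apply the same rule to the real process environment before the application is launched.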
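The BUSYWAIT variables describe a hybrid wait: spin actively for up to the configured time, then fall back to a blocking wait, with 0 meaning "block immediately". The Python sketch below models only that policy and is not the plugin's implementation. `threading.Event` stands in for the HSA completion signal, `hybrid_wait` and `busywait_from_env` are hypothetical names, and the timeout is treated as microseconds purely as an assumption, since these notes do not state the unit.

```python
import os
import threading
import time
from typing import Mapping


def busywait_from_env(env: Mapping[str, str], name: str) -> int:
    """Read a busy-wait setting; unset means the default of 0 (block immediately)."""
    return int(env.get(name, "0"))


def hybrid_wait(done: threading.Event, busywait_us: int) -> bool:
    """Wait for `done`, spinning for up to `busywait_us` microseconds (assumed unit)
    before blocking. Returns True if completion was seen while spinning,
    False if it was seen in the blocked phase."""
    if busywait_us > 0:
        deadline = time.monotonic() + busywait_us / 1e6
        while time.monotonic() < deadline:
            if done.is_set():
                return True  # completed during the active (spinning) phase
    done.wait()  # blocked state: sleep until signaled, with no timeout
    return False


if __name__ == "__main__":
    kernel_busywait = busywait_from_env(os.environ, "LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT")
    done = threading.Event()
    threading.Timer(0.01, done.set).start()  # simulated kernel finishing after 10 ms
    spun = hybrid_wait(done, kernel_busywait)
    print("completed while spinning" if spun else "completed while blocked")
```

Spinning trades host CPU time for lower wake-up latency on operations that finish shortly after the wait begins; a value of 0 skips the spin phase entirely.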