Releases: ROCm/aomp
rocm-5.6.1
ROCm release v5.6.1
AOMP Release 17.0-3
These are the release notes for AOMP 17.0-3. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or/while contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and its use of RPATH on runtime libraries.
For AOMP 17.0-3, the last trunk commit is ec6b40ab9b577e6e9bf000ccd19d85a9753b6ca8 on JULY 13, 2023. The last amd-only commit is f959ea5d8d1e5aef4b6d06727a9698316d3d33cd on JULY 14, 2023 . These commits form a frozen branch now called "aomp-17.0-3". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-3.
The integrated ROCm components for this AOMP release were built with ROCM 5.6.0 sources.
This is the 4th AOMP release based on LLVM 17 development.
The changes from 17.0-2 to 17.0-3 include:
- Non-compiler components are built with ROCm 5.6.0 sources
- Support code object version 5. The libomptarget device library is now generated for both code object version 4 and code object version 5.
- flang is no longer a symbolic link to clang. A new binary called flang-legacy has the driver support for flang. This is because the clang driver support for flang is going away. The new driver binary is called flang-legacy which uses a frozen set of driver support from ROCm 5.6 now found in the flang repository.
- Enabled Big Jump Loop by default.
- Improved target teams loop transform.
- Removed the link from flang to clang. Replace it with flang-legacy.
- Implemented dynamic LDS accesses from non-kernel functions.
- Performance improvements for small kernels via lazy HSA queue creation and tracking of busy queues.
- Restored GPU_MAX_HW_QUEUES in AMDGPU nextgen plugin.
- Extended environment variable ompx_apu_maps to MI200.
- Added
--archive
to the clang-offload-packager which repackages the extracted files into a new static library. This allows a fat binary static library to become a static library for a single architecture. - Disabled PIE in llvm until build issues in centos and sles are resolved.
Errata:
- Bug in hip 5.6.0 sources when using code object v5 and -O0 causes program to crash.
- flang compilations require -fPIC (need fix in flang-legacy for 17.0-4)
- Smoke test failures
fprintf (non-deterministic)
complex_reduction (non-deterministic)
schedule (non-deterministic)
flang-274983
flang-274983-2
xteamr
rocm-5.6.0
ROCm release v5.6.0
rocm-5.5.1
ROCm release v5.5.1
rocm-5.5.0
ROCm release v5.5.0
AOMP Release 17.0-2
These are the release notes for AOMP 17.0-2. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and its use of RPATH on runtime libraries.
For AOMP 17.0-2, the last trunk commit is 921b45a855f09afe99ea9c0c173794ee4ccd5872 on April 27, 2023. The last amd-only commit is ad7b5d7a69c62dab21332cba131054d2b8a713cc on April 26, 2023 . These commits forms a frozen branch now called "aomp-17.0-2". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-2.
The integrated ROCm components for this AOMP release were built with ROCM 5.4.4 sources.
This is the 3rd AOMP release based on LLVM 17 development.
These are the changes from 17.0-1 to 17.0-2 include:
- Changed gpurun to set value of both GPU_MAX_HW_QUEUES and LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES to 1 if there is shared use of GPU by multiple mpi ranks. Also, it is set to 1 ONLY if it was not already set by caller.
- Added environment variables LIBOMPTARGET_AMDGPU_ KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to control how much time to wait in an active state for kernel completion and data transfer completion respectively. The default is 0 which means to wait indefinitely in blocked state. If set, and the specified timeout expires, the waiting runtime jumps to waiting for signal in blocked state.
- Changed run_babelstream.sh to set LIBOMPTARGET_AMDGPU_KERNEL_BUSYWAIT and LIBOMPTARGET_AMDGPU_DATA_BUSYWAIT to improve performance.
- Fixed the amdgpu nextgen plugin to work for cov5 (code object version 5). The default code object version is cov4.
- Fixed the amdgpu nextgen plugin to work with OMPT (OpenMP Tools environment).
- Fixed the amdgpu nextgen plugin to work for multiple architectures supported in same image. Additional patches needed to support device clause on target region to properly offload to the correct gpu when using different architectures from the same vendor.
AOMP Release 17.0-1
These are the release notes for AOMP 17.0-1. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and its use of RPATH on runtime libraries.
For AOMP 17.0-1, the last trunk commit is 3712dd73a1d50b76624ee6a520be2b1ca94c02ee on April 11th, 2023. The last amd-only commit is
1d8def5772d16c64652d68daac1b12af99fe3770 on April 12th, 2023 . These commits forms a frozen branch now called "aomp-17.0-1". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-1.
The integrated ROCm components for this AOMP release were built with ROCM 5.4.4 sources.
This is the 2nd AOMP release based on LLVM 17 development.
These are the changes from 17.0-0 to 17.0-1 include:
- Switch to nextgen plugin as default. This has shown significant performance improvements. To revert to the old plugin set LIBOMPTARGET_NEXTGEN_PLUGINS=OFF
- Switch from hostrpc to hostexec. hostexec is a significant rewrite of hostrpc. The device hostexec_invoke is now written in OpenMP for portability to other platforms. The names of the wrapper (stub) to run a host function has changed to hostexec() and hostexec_<ReturnType>() . hostexec also uses a global variable to find the transfer payload buffer instead of AMD implicit kernel args. This will support portability of hostexec, printf, and fprintf to other platforms. The update to this device global is made with global variable services in the nextgen plugin.
- An example on the use of hostexec to run MPI_Send and MPI_Recv in a target region is given. This example demonstrates how library owners can build a supplemental header file to enable transparent host execution of selected library functions within an OpenMP target regions with the same host interface. This eliminates the need for any source changes in the user code when host execution from a target region is desired. Before hostexec, users would typically have to end their target region, execute a host-only function, then start another target region. This feature significantly increases general purpose computing capabilities of OpenMP on GPGPU platforms.
- OMPT target support is incomplete with the nextgen plugin. To use OMPT, set the environment variable LIBOMPTARGET_NEXTGEN_PLUGINS=OFF
- Set GPU_MAX_HW_QUEUES in gpurun to 1 when multiple ranks per GPU. This limits GPU concurrency when the GPU is already getting shared usage. This should only set if caller (of gpurun or mpirun) did not already set it. In other words, this should trust the user if they set a value. This will be fixed in next release. Also, OpenMP nextgen plugin does not use GPU_MAX_HW_QUEUES. It uses env variable LIBOMPTARGET_AMDGPU_NUM_HSA_QUEUES.
- Critical regions created via the critical directive are now more efficient: by relaxing the semantics of locks and combining that with the use of acquire and release fences we can limit the flushing of the GPU caches to every time the lock is acquired instead of at every lock check.
- When inlining functions called from the kernel, move allocas for their arguments in the kernel entry block instead of leaving them at launch point.
- Respect environment variable to force synchronous target region executions. Available via
OMPX_FORCE_SYNC_REGIONS=1
.
Errata:
- smoke test "schedule" occasionally fails with memory fault or wrong ordering
- AMD code object version 5 does not work with nextgen plugin. When testing cov5, use LIBOMPTARGET_NEXTGEN_PLUGINS=OFF
rocm-5.4.4
ROCm release v5.4.4
AOMP Release 17.0-0
These are the release notes for AOMP 17.0-0. AOMP uses AMD developer modifications to the upstream LLVM development trunk. These differences are managed in a branch called the "amd-stg-open". This branch is found in a mirror of upstream LLVM found at https://github.com/RadeonOpenCompute/llvm-project. The amd-stg-open branch is constantly changing as AMD merges upstream development trunk with its internal open development efforts. The AMD modifications are experimental and/or contributions under review for the upstream trunk. AOMP uses a snapshot of amd-stg-open at the commit ids and dates listed below. AOMP also includes builds of related ROCm components. We call AOMP a "standalone" build as it does not use or require ROCm with the exception of the kernel module (dkms) and libdrm which are often part of the Linux distribution. AOMP is isolated from any ROCm installations by installing into /usr/lib/aomp and its use of RPATH on runtime libraries.
For AOMP 17.0-0, the last trunk commit is bd1f7c417fc04f93de6b7bbf8740351e58a90613 on March 5th, 2023. The last amd-only commit is f16add4badfa0f16d62ba025f0565a9e4475e37e on March 4th, 2023 . These commits forms a frozen branch now called "aomp-17.0-0". See https://github.com/RadeonOpenCompute/llvm-project/tree/aomp-17.0-0.
The integrated ROCm components for this AOMP release were built with ROCM 5.4.0 sources.
This is the 1st AOMP release based on LLVM 17 development.
These are the changes from 16.0-3 to 17.0-0 include:
- Add support for amdclang, amdclang++, and amdflang
- Updated build scripts for Kokkos (and updated to Kokkos v 3.7.00)
- Support for multiple blocksizes in Xteam reduction (1024 limit).
- A new execution mode BigJumpLoop for SPMD non-reduction kernels
- Additional support for OMPT function "translate_time"
- Added Centos-9 and SLES 15 SP4 rpms.
- No longer support SLES15 SP1.
Errata:
Smoke test failures:
- managed_memory: segfault, when 2+ devices are present
Hip example failure:
- device-lib
OvO:
- cpp/hierarchical_parallelism/reduction_add-complex_double/target__teams (timeout on gfx908)
- cpp/hierarchical_parallelism/reduction_add-float/target_teams (timeout on gfx908)
rocm-5.4.3
ROCm release v5.4.3