PAPI Component ROCM
The ROCM component exposes numerous performance events on AMD GPUs. The component is an adapter to the ROCm profiling library (ROC-profiler), which is included in the standard ROCm release.
To read ROCM events, the user must link against a PAPI library that was configured with the ROCM component enabled. For example, the following command is sufficient to enable the component:
./configure --with-components="rocm"
Typically, the utility papi_components_avail (available in papi/src/utils/papi_components_avail) will display the components available to the user, whether each is disabled and, if so, why.
For ROCM, PAPI requires one environment variable: PAPI_ROCM_ROOT.
Typically in Linux one would export environment variables (examples are shown below), but some systems have software to manage environment variables (such as modules or spack), so consult your sysadmin if you have such management software.
Besides the PAPI_ROCM_ROOT environment variable, four more environment variables are required at runtime. The component is just an interface to an AMD utility called rocprofiler, and these variables are used by rocprofiler in its operation.
These additional environment variables are typically set as follows, after PAPI_ROCM_ROOT has been exported. An example is provided below, setting PAPI_ROCM_ROOT to a typical standard value:
export PAPI_ROCM_ROOT=/opt/rocm
export ROCP_METRICS=$PAPI_ROCM_ROOT/rocprofiler/lib/metrics.xml
export ROCPROFILER_LOG=1
export HSA_VEN_AMD_AQLPROFILE_LOG=1
export AQLPROFILE_READ_API=1
The first of these, ROCP_METRICS, must point at a file containing the descriptions of metrics; the standard location is shown above. The final three are fixed settings.
For a standard installation, these are the only environment variables that need to be set, at both compile time and runtime.
Within PAPI_ROCM_ROOT, we expect the following standard directories:
PAPI_ROCM_ROOT/include
PAPI_ROCM_ROOT/include/hsa
PAPI_ROCM_ROOT/lib
PAPI_ROCM_ROOT/rocprofiler/lib
PAPI_ROCM_ROOT/rocprofiler/include
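The directory layout listed above can be checked before configuring PAPI. The following is a minimal sketch, not part of PAPI or ROCm: check_rocm_dirs is a hypothetical helper name, and it only tests that the five directories exist under the given root (defaulting to PAPI_ROCM_ROOT).

```shell
# Sketch: report which of the expected ROCm directories are missing.
# check_rocm_dirs is a hypothetical helper, not a PAPI or ROCm command.
check_rocm_dirs() {
    # Use the first argument as the root, falling back to PAPI_ROCM_ROOT.
    root="${1:-${PAPI_ROCM_ROOT:-}}"
    if [ -z "$root" ]; then
        echo "PAPI_ROCM_ROOT is not set" >&2
        return 2
    fi
    missing=0
    for sub in include include/hsa lib rocprofiler/lib rocprofiler/include; do
        if [ ! -d "$root/$sub" ]; then
            echo "missing: $root/$sub" >&2
            missing=1
        fi
    done
    # 0 if every directory exists, 1 if at least one is missing.
    return "$missing"
}
# Example call: check_rocm_dirs /opt/rocm
```

A nonzero return value suggests that PAPI_ROCM_ROOT points at a non-standard installation.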
For the ROCM component to be operational, it must find the dynamic libraries libhsa-runtime64.so and librocprofiler64.so. These are normally found in the above standard directories, or in one of the Linux default directories listed by /etc/ld.so.conf, usually /usr/lib64, /lib64, /usr/lib and /lib. If these libraries are not found (or are not functional), the component will be listed as "disabled" with a reason explaining the problem. If the libraries were not found, they are not in any of the expected places, and their location must be supplied explicitly.
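When the component is reported as disabled, it can help to see where (or whether) the two libraries exist on disk. The sketch below assumes a hypothetical helper, find_rocm_libs, that simply tests a list of candidate directories passed as arguments. The real dynamic loader also consults its own cache, so a "not found" here is only a hint.

```shell
# Sketch: look for the two required libraries in candidate directories.
# find_rocm_libs is a hypothetical helper, not a PAPI or ROCm command.
find_rocm_libs() {
    status=0
    for lib in libhsa-runtime64.so librocprofiler64.so; do
        found=""
        for dir in "$@"; do
            if [ -e "$dir/$lib" ]; then
                found="$dir/$lib"
                break
            fi
        done
        if [ -n "$found" ]; then
            echo "$lib: $found"
        else
            echo "$lib: not found"
            status=1
        fi
    done
    # 0 if both libraries were located, 1 otherwise.
    return "$status"
}
# Example call, using the directories named in the text:
# find_rocm_libs "$PAPI_ROCM_ROOT/lib" "$PAPI_ROCM_ROOT/rocprofiler/lib" \
#     /usr/lib64 /lib64 /usr/lib /lib
```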
The system will also search the directories listed in LD_LIBRARY_PATH, separated by colons (:). This can be set using export; e.g.
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/WhereALibraryCanBeFound
Be careful to repeat the existing path (the $LD_LIBRARY_PATH part), because it may contain paths needed by other packages. The current path can be viewed with echo $LD_LIBRARY_PATH.
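One detail of the export form is worth noting: if LD_LIBRARY_PATH is unset or empty, prepending $LD_LIBRARY_PATH: leaves a leading colon, and the dynamic loader treats an empty entry as the current working directory. The following sketch avoids that and also skips directories that are already listed. append_ld_path is a hypothetical helper name, and the paths in the example call are only illustrative.

```shell
# Sketch: append a directory to LD_LIBRARY_PATH without creating an
# empty entry. append_ld_path is a hypothetical helper name.
append_ld_path() {
    case ":${LD_LIBRARY_PATH:-}:" in
        *":$1:"*)
            # Already present: leave the path unchanged.
            ;;
        *)
            # Insert the ":" separator only when the path is non-empty.
            LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+$LD_LIBRARY_PATH:}$1"
            ;;
    esac
    export LD_LIBRARY_PATH
}
# Example call: append_ld_path /opt/rocm/rocprofiler/lib
```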
Known problems and limitations:
- If creation/destruction of EventSets is repeated dozens of times, the AMD portion of the software refuses further creation. Perhaps a limit is reached, or we are not performing some necessary housekeeping.
- Only sets of metrics and events that can be gathered in a single pass are supported.
- Although AMD metrics may be floating point, all values are recast and returned as long long integers. Users may have to recast as double for display purposes.