PAPI Releases
- PAPI 7.2.0b1 Release August 30, 2024
- PAPI 7.1.0 Release December 20, 2023
- PAPI 7.0.1 Release March 13, 2023
- PAPI 7.0.0 Release November 15, 2022
- PAPI 6.0.0.1 Bug Fix Release April 3, 2020
- PAPI 6.0.0 Release March 4, 2020
- PAPI 5.7.0 Release March 4, 2019
- PAPI 5.6.0 Release December 20, 2017
- PAPI 5.5.1 Release November 18, 2016
PAPI 7.2.0b1 is now available as a beta release. This release introduces a new component, "rocp_sdk", which supports AMD GPUs/APUs through the ROCprofiler-SDK interface; the component is still under development and testing. The release also includes general improvements to the design and functionality of the PAPI code, as well as various bug fixes.
Additional major changes:
- Preliminary support for AMD ROCprofiler-SDK events
- AMD Zen5 L3 PMU support
- AMD Zen5 core PMU support
- Preset support for Zen5
- Preset support for Ice Lake ICL
- Basic support for the RISC-V architecture (no events yet)
- Initial heterogeneous CPU support: Alderlake and Raptorlake can now enumerate events for both Performance and Efficiency cores on heterogeneous systems
- Intel AlderLake Gracemont (E-Core) core PMU support
- Intel AlderLake Goldencove (P-Core) core PMU support
- Intel Raptorlake PMU support: Enables support for Raptorlake, Raptorlake P, Raptorlake S
- Intel GraniteRapids core PMU support
- Intel SapphireRapids uncore PMU support for:
  - Coherence and Home Agent (CHA)
  - Ultra Path Interconnect PMU (UPI)
  - Memory Controller PMU (IMC)
- Intel IcelakeX uncore PMU support for:
  - Mesh to IIO PMU (M2PCIE)
  - UBOX PMU (UBOX)
  - Mesh to UPI PMU (M3UPI)
  - Ultra Path Interconnect PMU (UPI)
  - Power Control Unit PMU (PCU)
  - Mesh to Memory PMU (M2M)
  - PCIe IIO Ring Port PMU (IRP)
  - PCIe I/O controller PMU (IIO)
  - Memory Controller PMU (IMC)
  - Coherency and Home Agent (CHA)
- Sysdetect: support for ARM Neoverse V2
- SDE: support for ntv_code_to_info functionality
- Removed the obsolete bundled perfctr and libpfm-3.y code
This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Peinan Zhang, Rashawn Knapp, and Phil Mucci.
To verify the integrity of the download, check that md5sum papi-7.2.0b1.tar.gz reports the MD5 hash e81521450fab24e7d49c952bfc347935.
PAPI 7.1.0 is now available. This release includes support for Intel Sapphire Rapids and AMD Zen4 preset events. The release also includes general improvements to the PAPI code in terms of design and functionality. Furthermore, the Counter Analysis Toolkit (CAT) and the Software-Defined Events (SDE) library have also been updated.
Major Changes:
- Support for Intel Sapphire Rapids native and preset events
- Support for AMD Zen4 native and preset events
- Support for event qualifiers in the ROCm component
- New 'template' component
- Integration into the Spack package manager
- Integration into the Extreme-Scale Scientific Software Stack (E4S)
- Refactored cuda component with multi-thread and multi-GPU support
- Support for ARM Neoverse V1 and V2
This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Peinan Zhang, Rashawn Knapp, John Rodgers, John Linford, Bert Wesarg, Josh Minor, Kamil Iskra, Florian Weimer, Lukas Alt, William Y. Phan, Aurelian Melinte, and Phil Mucci.
To verify the integrity of the download, check that md5sum papi-7.1.0.tar.gz reports the MD5 hash 0f3a940795b2dce430551142e8f938f2.
PAPI 7.0.1 has been shipped. This is a minor release of PAPI introducing the following changes:
- Support for AMD Zen4 CPUs in libpfm4
- Support for ARM Neoverse V1 and V2 in libpfm4
- Fix a build error encountered when building the library with gcc 10 and later
- Resolve build warnings across different components
- Fix a bug in the ROCm component when monitoring multiple GPUs in sampling mode
- Refactor the ROCm component to simplify code and prepare it for rocmtools support
- Refactor the ROCm SMI component and support XGMI events
To verify the integrity of the download, check that md5sum papi-7.0.1.tar.gz reports the MD5 hash 14bb2b09dab28232911f929ef4e4b98b.
Just in time for Supercomputing 2022, PAPI 7.0.0 is now available.
This is a major release of PAPI that offers several new components, including "intel_gpu", with monitoring capabilities on Intel GPUs, and "sysdetect" (along with a new user API), for detecting details of the available hardware on a given compute system. It also brings a significant revision of the "rocm" component for AMD GPUs and extends the "cuda" component to enable performance monitoring on NVIDIA GPUs with compute capability 7.0 and beyond. Furthermore, PAPI 7.0.0 ships with a standalone "libsde" library and a new C++ API for software developers to define software-defined events from within their applications.
For specific and detailed information on changes made for this release, see ChangeLogP700.txt for filenames or keywords of interest and change summaries, or go directly to the PAPI git repository.
- A new "intel_gpu" component with monitoring support for Intel GPUs, including GPU hardware events and memory performance metrics (e.g., bytes read/written/transferred from/to L3). The PAPI "intel_gpu" component offers two collection modes: (1) "Time-based Collection Mode," where metrics can be read at any given time during the execution of kernels, and (2) "Kernel-based Collection Mode," where performance counter data is available once the kernel execution is finished.
- A new "sysdetect" component for detecting a machine's architectural details, including the hardware's topology, specific aspects of the memory hierarchy, the number and type of GPUs and CPUs on a node, thread affinity to NUMA nodes and GPU devices, etc. Additionally, PAPI offers a new API that enables users to get "sysdetect" details from within their application.
- A major redesign of the "rocm" component for advanced monitoring features on the latest AMD GPUs. The PAPI "rocm" component is now thread-safe and offers two collection modes: "sampling" and "kernel intercept" mode.
- Support for NVIDIA compute capability 7.0 and greater. This implies support for CUPTI's new Profiling and Perfworks APIs. The PAPI CUDA component has been refactored to work equally for NVIDIA compute capabilities < 7.0 and >= 7.0.
- A significant redesign of the "sde" component into two separate entities: (1) a standalone library "libsde" with a new API for software developers to define software-based metrics from within their applications, and (2) the PAPI "sde" component that enables monitoring of these new software-based events.
- A new C++ interface for "libsde," which enables software developers to define software-defined events from within their C++ applications.
- New Counter Analysis Toolkit (CAT) benchmarks and refinements of PAPI's CAT data analysis, specifically, the extension of PAPI's CAT with MPI and "distributed memory"-aware benchmarks and analysis to stress all cores per node.
- Support for Fugaku's A64FX Arm architecture, including monitoring capabilities for memory bandwidth and other node-wide metrics.
This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Peinan Zhang, John Rodgers, Yamada Masahiko, Thomas Richter, and Phil Mucci.
To verify the integrity of the download, check that md5sum papi-7.0.0.tar.gz reports the MD5 hash 71602266f8523e97a30b1a556adde1ba.
PAPI 6.0.0.1 was released April 3, 2020. This release fixes a bug for static builds that caused an undefined reference to "pthread_self". Furthermore, a bug with "make -j" (parallel make) has been fixed.
To verify the integrity of the download, check that md5sum papi-6.0.0.1.tar.gz reports the MD5 hash 34c536f3c4a6ad4b5615de23018503ad.
PAPI 6.0.0 was released March 4, 2020. This release includes a new API for SDEs (Software Defined Events), a major revision of the 'high-level API', and several new components, including ROCM and ROCM_SMI (for AMD GPUs), powercap_ppc and sensors_ppc (for IBM Power9 and later), SDE, and the IO component (exposes I/O statistics exported by the Linux kernel). Furthermore, PAPI 6.0 ships CAT, a new Counter Analysis Toolkit that assists with native performance counter disambiguation through micro-benchmarks.
For specific and detailed information on changes made for this release, see ChangeLogP600.txt for filenames or keywords of interest and change summaries, or go directly to the PAPI git repository.
- Added the rocm component to support performance counters on AMD GPUs.
- Added the rocm_smi component; SMI (System Management Interface) monitors power usage on AMD GPUs and is also writable by the user, e.g., to reduce power consumption during non-critical operations.
- Added 'io' component to expose I/O statistics exported by the Linux kernel (/proc/self/io).
- Added 'SDE' component, Software Defined Events, which allows HPC software layers to expose internal performance-critical behavior via Software Defined Events (SDEs) through the PAPI interface.
- Added 'SDE API' to register performance-critical events that originate from HPC software layers, and which are recognized as 'PAPI counters' and, thus, can be monitored with the standard PAPI interface.
- Added powercap_ppc component to support monitoring and capping of power usage on IBM PowerPC architectures (Power9 and later) using the powercap interface exposed through the Linux kernel.
- Added 'sensors_ppc' component to support monitoring of system metrics on IBM PowerPC architectures (Power9 and later) using the opal/exports sysfs interface.
- Retired the infiniband_umad component; it is superseded by the infiniband component.
- Revived PAPI's 'high-level API' to make it more intuitive and effective for novice users and quick event reporting.
- Added 'counter_analysis_toolkit' sub-directory (CAT): A tool to assist with native performance counter disambiguation through micro-benchmarks, which are used to probe different important aspects of modern CPUs, to aid the classification of native performance events.
- Standardized our environment variables and implemented a simplified, unified approach for specifying libraries necessary for components, with overrides possible for special circumstances. Eliminated component level 'configure' requirements.
- Corrected Thread Local Storage (TLS) issues and race conditions.
- Several bug fixes, documentation fixes and enhancements, improvements to README files for user instruction and code comments.
This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Phil Mucci, Kevin Huck, Yunqiang Su, Carl Love, Andreas Beckmann, Al Grant, and Evgeny Shcherbakov.
To verify the integrity of the download, check that md5sum papi-6.0.0.tar.gz reports the MD5 hash 67d06f70fca62f4fcc95672f197638a2.
PAPI 5.7.0 was released March 4, 2019. This release includes a new component, called "pcp", which interfaces to the Performance Co-Pilot (PCP). It enables PAPI users to monitor IBM POWER9 hardware performance events, particularly shared “NEST” events without root access.
This release also upgrades the PAPI “nvml” component, previously read-only, with write access to the information and controls exposed via the NVIDIA Management Library. The PAPI “nvml” component now supports both measuring and capping power usage on recent NVIDIA GPU architectures (e.g., V100).
We have added power monitoring as well as PMU support for recent Intel architectures such as Cascade Lake, Kaby Lake, Skylake, and Knights Mill (KNM). Furthermore, measuring power usage for AMD Fam17h chips is now available via the “rapl” component.
For specific and detailed information on changes made for this release, see ChangeLogP570.txt for filenames or keywords of interest and change summaries, or go directly to the PAPI git repository.
- Added the PCP (Performance Co-Pilot) component, which allows access to PCP events, such as IBM POWER9 nest events, via the PAPI interface.
- Added support for IBM POWER9 processors.
- Added power monitoring support for AMD Fam17h architectures via RAPL.
- Added power capping support for NVIDIA GPUs.
- Added benchmarks and testing for the “nvml” component, which allows power-management (reporting and setting) for NVIDIA GPUs.
- Re-implementation of the “cuda” component to better handle GPU events, metrics (values computed from multiple events), and NVLink events, each of which has different handling requirements and may require separate read groupings.
- Enhanced NVLink support, and added additional tests and example code for NVLink (high-speed GPU interconnect).
- Extension of the test suite with more advanced testing: attach_cpu_sys_validate, attach_cpu_validate, event_destroy test, openmp.F test, attach_validate test (rdpmc issue).
- ARM64 configuration now works with newer Linux kernels (>=3.19).
- As part of the “cuda” component, expanded CUPTI-only tests to distinguish between PAPI or non-PAPI issues with NVIDIA events and metrics.
- Many memory leaks have been corrected, though not all: some third-party library code still exhibits memory leaks.
- Better reporting and error handling of bugs. Changes to “infiniband_umad” name reporting to distinguish it from the “infiniband” component.
- Cleaning up of the source code, added documentation and test/utility files.
This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Stephane Eranian (for libpfm4), William Cohen, Steve Kaufmann, Phil Mucci, and Konstantin Stefanov.
To verify the integrity of the download, check that md5sum papi-5.7.0.tar.gz reports the MD5 hash 0e7468d61c279614ff6f39488ac3600d.
PAPI 5.6.0 was released December 20, 2017. It contains a major cleanup of the source code and the build system to have consistent code structure, eliminate errors, and reduce redundancies. A number of validation tests have been added to PAPI to verify the PAPI preset events. Improvements and changes to multiple PAPI components have been made, varying from supporting new events to fixes in the component testing.
For specific and detailed information on changes made in this release, see ChangeLogP560.txt for keywords of interest or go directly to the PAPI git repository.
- Validation tests: A substantial effort to add validation tests to PAPI to check and detect problems in the definition of PAPI preset events.
- Event testing: Thorough cleanup of code in the C and Fortran testing to add processor support, cleanup output and make the testing behavior consistent.
- CUDA component: Updated and rewritten to support CUPTI Metric API (combinations of basic events). This component now supports NVLink information through the Metric API. Updated testing for the component.
- NVML component: Updated to support power management limits and improved event names. Other minor bug fixes.
- RAPL component: Added support for Intel Atom models Goldmont / Gemini Lake / Denverton, and for Skylake-X / Kabylake.
- PAPI preset events: Many updates to the PAPI preset event mappings; Skylake X support, initial AMD fam17h, fix AMD fam16h, added more Power8 events, initial Power9 events.
- Updated man and help pages for papi_avail and papi_native_avail.
- Powercap component: Added test for setting power caps via PAPI powercap component.
- Infiniband component: Bugfix for infiniband_umad component.
- Uncore component: Updated to support recent processors.
- Lmsensors component: Updated to support correct runtime linking and better event names, plus a number of bug fixes.
- Updated and fixed timer support for multiple architectures.
- All components: Cleanup and standardize testing behavior in the components.
- Build system: Much needed cleanup of configure and make scripts.
- Support for C++ was enhanced.
- Added optional support for reading events using perfevent-rdpmc on recent Linux kernels; when enabled, it can speed up PAPI_read() by a factor of 5.
- Pthread testing limited to avoid excessive CPU consumption on highly parallel machines.
This release is the result of efforts from many people. The PAPI team would like to express special thanks to Vince Weaver, Phil Mucci, Steve Kaufmann, William Cohen, Will Schmidt, and Stephane Eranian (for libpfm4).
To verify the integrity of the download, check that md5sum papi-5.6.0.tar.gz reports the MD5 hash fdd075860b2bc4b8de8b8b5c3abf594a.
PAPI 5.5.1 was released November 18, 2016. This is a point release intended primarily to add support for uncore performance monitoring events on Intel Xeon Phi Knights Landing (KNL). Other minor bug fixes have also been made.
For specific and detailed information on changes made in this release, see ChangeLogP551.txt for keywords of interest or go directly to the PAPI git repository.
- Added Knights Landing (KNL) uncore event support via libpfm4.
- Fix some possible string termination problems.
- Clean up lustre and mx components.
- Enable RAPL for Broadwell-EP.
To verify the integrity of the download, check that md5sum papi-5.5.1.tar.gz reports the MD5 hash 86a8a6f3d0f34cd83251da3514aae15d.