For a threaded code it is important to call the following sequence of function calls from the serial part of the program:
LIKWID_MARKER_INIT;
[...]
LIKWID_MARKER_CLOSE;
However, if any OpenMP region is opened before the LIKWID_MARKER_INIT call, the internal data structures are incorrect (or at least might be, depending on the underlying CPU/node architecture), and counters are read incorrectly.
E.g. on A64FX with 4 ranks and 6 threads, reading EA_L2 results in rank 0 / thread 0 reading the counter (so far so good), but rank 1 / threads 0+1, rank 2 / threads 0+1, and rank 3 / threads 0+1 also read the same counter. Thread 1 should not read it, but does so because of an incorrectly created internal topology data structure.
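For anyone trying to reproduce or avoid this, here is a minimal sketch of the ordering that works: LIKWID_MARKER_INIT in the serial part, before the first OpenMP region is opened. The helper name marked_sum and the region tag "sum" are made up for illustration. The fallback macros follow the pattern commonly used so the code also builds without -DLIKWID_PERFMON. The header name likwid-marker.h assumes a recent LIKWID release; older releases may need likwid.h instead.

```c
#include <stdio.h>

/* With -DLIKWID_PERFMON the real marker macros are used; otherwise they
 * expand to nothing so the code still builds and runs without LIKWID. */
#ifdef LIKWID_PERFMON
#include <likwid-marker.h>   /* assumption: recent LIKWID; older releases use <likwid.h> */
#else
#define LIKWID_MARKER_INIT
#define LIKWID_MARKER_START(regionTag)
#define LIKWID_MARKER_STOP(regionTag)
#define LIKWID_MARKER_CLOSE
#endif

/* Hypothetical helper: sums 1..n inside a marked region.
 * Intended to be called once per process, from the serial part. */
double marked_sum(int n)
{
    double sum = 0.0;

    /* Serial part: no OpenMP region has been opened yet, so the marker
     * API still sees the full CPUset of the application. */
    LIKWID_MARKER_INIT;

    #pragma omp parallel
    {
        /* Each thread starts and stops the region for itself. */
        LIKWID_MARKER_START("sum");

        #pragma omp for reduction(+:sum)
        for (int i = 1; i <= n; i++) {
            sum += (double)i;
        }

        LIKWID_MARKER_STOP("sum");
    }

    /* Serial part again: write out the results. */
    LIKWID_MARKER_CLOSE;
    return sum;
}
```

Calling marked_sum(100) from main returns 5050.0 regardless of thread count. Moving LIKWID_MARKER_INIT below any earlier parallel region would be the ordering that triggers the problem described above.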
The bug with multiple threads reading/reporting counters (marker API only) which they should not access seems to go away when a topology file, generated via likwid-genTopoCfg, is present on the node. I assume the topology parser (used when there is no topology file) has some bugs which need to be fixed, or the topology should not be recreated for threads within the marker ROI. Anyhow, if you want to reproduce the issue, I suggest starting with this command on an A64FX (or another node with multiple NUMA domains):
The issue comes from changed CPUsets in both cases. When an application is started through LIKWID, the application initially has a CPUset containing all selected HWthreads. If LIKWID_MARKER_INIT is called at this point, it "sees" all potential HWthreads taking part in the computation. As soon as a Pthread thread is started (e.g. by OpenMP), LIKWID's pinning library pins the application (the master thread) to the first HWthread and the workers to consecutive HWthreads in the CPUset. If LIKWID_MARKER_INIT is executed afterwards, it "sees" only its single-core CPUset.
If the topology file is provided, the application as well as all started threads read their topology from the file. This includes the CPUset (commonly all HWthreads are allowed, because likwid-genTopoCfg is rarely executed in environments with a limited CPUset).
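To check what a thread "sees" at a given point, the CPUset can be queried directly on Linux. The following is only a sketch of that check and not what LIKWID does internally. The helper name visible_hwthreads is made up for illustration.

```c
#define _GNU_SOURCE          /* needed for cpu_set_t and the CPU_* macros in glibc */
#include <sched.h>

/* Hypothetical helper: number of HWthreads in the calling thread's CPUset,
 * or -1 if the affinity mask cannot be read (e.g. more CPUs than cpu_set_t holds). */
int visible_hwthreads(void)
{
    cpu_set_t set;

    CPU_ZERO(&set);
    /* pid 0 means "the calling thread" */
    if (sched_getaffinity(0, sizeof(set), &set) != 0) {
        return -1;
    }
    return CPU_COUNT(&set);
}
```

Printing the result once in the serial part before the first OpenMP region and once after it should show the difference described above when the program runs under LIKWID's pinning: all selected HWthreads before, a single HWthread afterwards.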
For reference, the calling sequence quoted at the top of this issue comes from the wiki (https://github.com/RRZE-HPC/likwid/wiki/likwid-perfctr#using-the-marker-api).