SkylakeSP
Intel® Skylake X Performance groups
The input file for the events on Intel® Skylake X can be found here.
- Core-local counters
- Socket-wide counters
- Energy counters
- Uncore management fixed-purpose counter
- Uncore management general-purpose counters
- Last Level cache counters
- Power control unit fixed-purpose counters
- Power control unit general-purpose counters
- Memory controller fixed-purpose counters
- Memory controller general-purpose counters
- UPI Link Layer counters
- M3UPI counters
- IIO general-purpose counters
- IIO fixed-purpose counters
- IRP general-purpose counters
Since the Core2 microarchitecture, Intel® has provided a set of fixed-purpose counters. Each can measure only one specific event.
Counter name | Event name |
---|---|
FIXC0 | INSTR_RETIRED_ANY |
FIXC1 | CPU_CLK_UNHALTED_CORE |
FIXC2 | CPU_CLK_UNHALTED_REF |
Option | Argument | Description | Comment |
---|---|---|---|
anythread | N | Set bit 2+(index*4) in config register | |
kernel | N | Set bit (index*4) in config register | |
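The bit positions in the table above depend on the index of the fixed counter. As a minimal sketch of that arithmetic (the function name `fixed_option_bits` is hypothetical and not part of LIKWID), the option bits for FIXC0 to FIXC2 could be computed like this:

```python
def fixed_option_bits(index, kernel=False, anythread=False):
    """Return the option bits for fixed counter FIXC<index>.

    Only the two options from the table are modeled; any other bits
    of the config register are left at zero.
    """
    if index not in (0, 1, 2):
        raise ValueError("only FIXC0, FIXC1 and FIXC2 are listed")
    value = 0
    if kernel:
        value |= 1 << (index * 4)        # kernel: bit (index*4)
    if anythread:
        value |= 1 << (2 + index * 4)    # anythread: bit 2+(index*4)
    return value


print(hex(fixed_option_bits(1, anythread=True)))
```

For FIXC1, the `anythread` option lands on bit 6, and for FIXC2 the `kernel` option lands on bit 8.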
The Intel® Skylake X microarchitecture provides four general-purpose counters, or eight when HyperThreading is disabled. Each consists of a config register and a counter register.
Counter name | Event name |
---|---|
PMC0 | * |
PMC1 | * |
PMC2 | * |
PMC3 | * |
PMC4 | * (only available without HyperThreading) |
PMC5 | * (only available without HyperThreading) |
PMC6 | * (only available without HyperThreading) |
PMC7 | * (only available without HyperThreading) |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
kernel | N | Set bit 17 in config register | |
anythread | N | Set bit 21 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
invert | N | Set bit 23 in config register | |
in_transaction | N | Set bit 32 in config register | Only available if Intel® Transactional Synchronization Extensions are available |
in_transaction_aborted | N | Set bit 33 in config register | Only counter PMC2 and only if Intel® Transactional Synchronization Extensions are available |
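To illustrate how the options in the table above combine, here is a minimal sketch that packs them into a single integer. The function name `pmc_option_bits` is an assumption made for illustration. It models only the option bits and leaves out the event and umask fields of the config register.

```python
def pmc_option_bits(edgedetect=False, kernel=False, anythread=False,
                    threshold=0, invert=False,
                    in_transaction=False, in_transaction_aborted=False):
    """Combine the PMC counter options into their config register bits."""
    if not 0 <= threshold <= 0xFF:
        raise ValueError("threshold must fit into 8 bits")
    value = threshold << 24              # threshold: bits 24-31
    if kernel:
        value |= 1 << 17                 # kernel: bit 17
    if edgedetect:
        value |= 1 << 18                 # edgedetect: bit 18
    if anythread:
        value |= 1 << 21                 # anythread: bit 21
    if invert:
        value |= 1 << 23                 # invert: bit 23
    if in_transaction:
        value |= 1 << 32                 # in_transaction: bit 32
    if in_transaction_aborted:
        value |= 1 << 33                 # in_transaction_aborted: bit 33 (PMC2 only)
    return value


print(hex(pmc_option_bits(threshold=0x10, invert=True)))
```

The sketch does not check whether Intel® Transactional Synchronization Extensions are available or whether the counter is PMC2. Those restrictions from the table still apply.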
The Intel® Skylake X microarchitecture supports measuring offcore events with the PMC counters. To do so, the stream of offcore events must be filtered using the OFFCORE_RESPONSE registers. The microarchitecture has two of those registers. LIKWID defines some events that perform the filtering according to the event name. Although many bitmasks are possible, LIKWID natively provides only the ones with response type ANY. Custom filtering can be applied with the OFFCORE_RESPONSE_0_OPTIONS and OFFCORE_RESPONSE_1_OPTIONS events. Two more counter options are available for those events only:
Option | Argument | Description | Comment |
---|---|---|---|
match0 | 16 bit hex value | Input value masked with 0x8FFF and written to bits 0-15 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/SKL. |
match1 | 22 bit hex value | Input value is written to bits 16-37 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/SKL. |
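The two options above end up in a single register value. The following sketch shows the combination as described in the table. The function name `offcore_response_value` is hypothetical. The meaning of the individual bits has to be taken from the Intel® documentation referenced in the table.

```python
def offcore_response_value(match0=0, match1=0):
    """Build an OFFCORE_RESPONSE register value from match0 and match1."""
    if not 0 <= match0 <= 0xFFFF:
        raise ValueError("match0 must fit into 16 bits")
    if not 0 <= match1 <= 0x3FFFFF:
        raise ValueError("match1 must fit into 22 bits")
    low = match0 & 0x8FFF                # match0 is masked with 0x8FFF, bits 0-15
    high = match1 << 16                  # match1 is written to bits 16-37
    return low | high


print(hex(offcore_response_value(match0=0x0001, match1=0x1)))
```

Because of the 0x8FFF mask, bits 12-14 of `match0` are dropped silently in this sketch.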
The Intel® Skylake X microarchitecture provides one register for the current core temperature.
Counter name | Event name |
---|---|
TMP0 | TEMP_CORE |
The Intel® Skylake X microarchitecture provides measurements of the current energy consumption through the RAPL interface.
Counter name | Event name |
---|---|
PWR0 | PWR_PKG_ENERGY |
PWR1 | PWR_PP0_ENERGY |
PWR2 | PWR_PP1_ENERGY |
PWR3 | PWR_DRAM_ENERGY |
The Intel® Skylake X microarchitecture provides measurements of the LLC coherency engine in the uncore. The description from Intel®:
The LLC coherence engine and Home agent (CHA) merges the caching agent and home
agent (HA) responsibilities of the chip into a single block. In its capacity as a caching
agent the CHA manages the interface between the core the IIO devices and the last
level cache (LLC). In its capacity as a home agent the CHA manages the interface
between the LLC and the rest of the UPI coherent fabric as well as the on die memory
controller.
The LLC hardware performance counters are exposed to the operating system through the MSR interface. The maximum number of coherency engines supported by the Intel® Skylake X microarchitecture is 28. Your system may not have all CBOXes. LIKWID skips the unavailable ones in the setup phase. The name CBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
CBOX<0-27>C0 | * |
CBOX<0-27>C1 | * |
CBOX<0-27>C2 | * |
CBOX<0-27>C3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
tid | 9 bit hex value | Set bits 0-8 in MSR_UNC_C<0-27>_PMON_BOX_FILTER register and bit 19 in the config register | Bits 0-2 specify the thread, bits 3-8 the core. LIKWID 5.0 incorrectly used an 8 bit hex value in bits 0-7. |
state | 10 bit hex value | Set bits 17-26 in MSR_UNC_C<0-27>_PMON_BOX_FILTER register | LLC F: 0x80, LLC M: 0x40, LLC E: 0x20, LLC S: 0x10, SF H: 0x08, SF E: 0x04, SF S: 0x02, LLC I: 0x01 |
opcode | 20 bit hex value | Set bits 9-28 in MSR_UNC_C<0-27>_PMON_BOX_FILTER1 register | A list of valid opcodes can be found in the Intel® Xeon SP uncore Manual, section 3.1.1. Bits 17, 18, 27 and 28 must all be 1. LIKWID sets them automatically. |
match0 | 2 bit hex address | Set bits 30-31 in MSR_UNC_C<0-27>_PMON_BOX_FILTER1 register | See the Intel® Xeon SP uncore Manual for more information. |
match1 | 6 bit hex value with filter mask 0x33 | Set bits 0-5 in MSR_UNC_C<0-27>_PMON_BOX_FILTER1 register | See the Intel® Xeon SP uncore Manual for more information. |
The Intel® Skylake X microarchitecture provides an event LLC_LOOKUP which can be filtered with the 'state' option. If no 'state' is set, LIKWID sets the state to 0x3FF, the default value to measure all lookups.
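As a rough sketch of how the `tid` and `state` options map onto the MSR_UNC_C<0-27>_PMON_BOX_FILTER register, the helpers below encode a core/thread pair and place both fields. The names `cbox_tid` and `cbox_filter_value` are assumptions made for illustration. The default of 0x3FF mirrors what LIKWID applies for LLC_LOOKUP when no state is given.

```python
def cbox_tid(core, thread):
    """Encode a tid value: bits 0-2 hold the thread, bits 3-8 the core."""
    if not 0 <= thread <= 0x7:
        raise ValueError("thread must fit into 3 bits")
    if not 0 <= core <= 0x3F:
        raise ValueError("core must fit into 6 bits")
    return (core << 3) | thread


def cbox_filter_value(tid=0, state=0x3FF):
    """Place tid in bits 0-8 and state in bits 17-26 of the filter register."""
    if not 0 <= tid <= 0x1FF:
        raise ValueError("tid must fit into 9 bits")
    if not 0 <= state <= 0x3FF:
        raise ValueError("state must fit into 10 bits")
    return tid | (state << 17)


# LLC M (0x40) and LLC E (0x20) lookups for thread 1 of core 5
print(hex(cbox_filter_value(tid=cbox_tid(core=5, thread=1), state=0x40 | 0x20)))
```

The `tid` option additionally sets bit 19 in the config register, which this sketch does not model.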
The Intel® Skylake X microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for Intel® Xeon® Processor
Scalable Memory Family
In this capacity, the UBox acts as the central unit for a variety of functions:
- The master for reading and writing physically distributed registers across using the Message Channel.
- The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
- The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® UPI bus lock).
The single fixed-purpose counter counts the clock cycles of the uncore clock source. The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
UBOXFIX | UNCORE_CLOCK |
The Intel® Skylake X microarchitecture provides measurements of the management box in the uncore. The description from Intel®:
The UBox serves as the system configuration controller for Intel® Xeon® Processor
Scalable Memory Family
In this capacity, the UBox acts as the central unit for a variety of functions:
- The master for reading and writing physically distributed registers across using the Message Channel.
- The UBox is the intermediary for interrupt traffic, receiving interrupts from the system and dispatching interrupts to the appropriate core.
- The UBox serves as the system lock master used when quiescing the platform (e.g., Intel® UPI bus lock).
The uncore management performance counters are exposed to the operating system through the MSR interface. The name UBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
UBOX0 | * |
UBOX1 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |
The Intel® Skylake X microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the Intel® Xeon® Processor Scalable
Memory Family die, responsible for distributing power to core/uncore components and
thermal management. It runs in firmware on an internal micro-controller and
coordinates the socket’s power states.
Note: Many power saving features are tracked as events in their respective units. For
example, Intel® QPI Link Power saving states and Memory CKE statistics are captured
in the Intel® QPI Perfmon and IMC Perfmon respectively.
The PCU offers four fixed-purpose counters that report the cycles the CPU cores stay in the states C3, C6, P3 and P6. The PCU fixed-purpose counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
WBOX0FIX | CORES_IN_C3 |
WBOX1FIX | CORES_IN_C6 |
WBOX2FIX | CORES_IN_P3 |
WBOX3FIX | CORES_IN_P6 |
The Intel® Skylake X microarchitecture provides measurements of the power control unit (PCU) in the uncore. The description from Intel®:
The PCU is the primary Power Controller for the Intel® Xeon® Processor Scalable
Memory Family die, responsible for distributing power to core/uncore components and
thermal management. It runs in firmware on an internal micro-controller and
coordinates the socket’s power states.
Note: Many power saving features are tracked as events in their respective units. For
example, Intel® QPI Link Power saving states and Memory CKE statistics are captured
in the Intel® QPI Perfmon and IMC Perfmon respectively.
The PCU performance counters are exposed to the operating system through the MSR interface. The name WBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
WBOX0 | * |
WBOX1 | * |
WBOX2 | * |
WBOX3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 5 bit hex value | Set bits 24-28 in config register | |
occupancy_filter | 32 bit hex value | Set bits 0-31 in MSR_UNC_PCU_PMON_BOX_FILTER register | Band0: bits 0-7, Band1: bits 8-15, Band2: bits 16-23, Band3: bits 24-31 |
occupancy | 2 bit hex value | Set bits 14-15 in config register | Cores in C0: 0x1, in C3: 0x2, in C6: 0x3 |
occ_edgedetect | N | Set bit 31 in config register | |
occ_invert | N | Set bit 30 in config register | |
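The `occupancy_filter` value packs four 8 bit bands into one 32 bit word. A minimal sketch of that packing follows. The function name `pcu_occupancy_filter` is hypothetical, and the meaning of each band value has to be taken from the Intel® uncore manual.

```python
def pcu_occupancy_filter(band0=0, band1=0, band2=0, band3=0):
    """Pack four 8 bit band values into the 32 bit occupancy_filter argument."""
    value = 0
    for position, band in enumerate((band0, band1, band2, band3)):
        if not 0 <= band <= 0xFF:
            raise ValueError(f"band{position} must fit into 8 bits")
        value |= band << (8 * position)  # Band<n> occupies bits 8n to 8n+7
    return value


print(hex(pcu_occupancy_filter(band0=0x0C, band1=0x10, band2=0x14, band3=0x18)))
```

The band values in the call are arbitrary numbers chosen to make the byte positions visible in the result.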
The Intel® Skylake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Intel® Xeon® Processor Scalable Memory Family integrated Memory Controller
provides the interface to DRAM and communicates to the rest of the Uncore through
the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, lockstep,
memory access retry, memory scrubbing, thermal throttling, mirroring, and rank
sparing.
The integrated Memory Controllers performance counters are exposed to the operating system through PCI interfaces. There may be two memory controllers in the system. There are four different PCI devices per memory controller, each covering one memory channel. Each channel has one fixed counter for the DRAM clock. The four channels of the first memory controller are MBOX0-3, the four channels of the second memory controller (if available) are named MBOX4-7. The name MBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
MBOX<0-7>FIX | DRAM_CLOCKTICKS |
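The MBOX numbering described above can be turned back into a memory controller and channel index. The helper below is a small sketch of that mapping. The name `mbox_location` is an assumption and not a LIKWID function.

```python
def mbox_location(index):
    """Return (memory controller, channel) for MBOX<index>.

    MBOX0-3 are the four channels of the first memory controller,
    MBOX4-7 the four channels of the second one (if available).
    """
    if not 0 <= index <= 7:
        raise ValueError("MBOX index must be in the range 0-7")
    return divmod(index, 4)


controller, channel = mbox_location(5)
print(f"MBOX5 is channel {channel} of memory controller {controller}")
```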
The Intel® Skylake X microarchitecture provides measurements of the integrated Memory Controllers (iMC) in the uncore. The description from Intel®:
The Intel® Xeon® Processor Scalable Memory Family integrated Memory Controller
provides the interface to DRAM and communicates to the rest of the Uncore through
the Mesh2Mem block.
The memory controller also provides a variety of RAS features, such as ECC, lockstep,
memory access retry, memory scrubbing, thermal throttling, mirroring, and rank
sparing.
The integrated Memory Controllers performance counters are exposed to the operating system through PCI interfaces. There may be two memory controllers in the system. There are four different PCI devices per memory controller, each covering one memory channel. Each channel has four different general-purpose counters. The four channels of the first memory controller are MBOX0-3, the four channels of the second memory controller (if available) are named MBOX4-7. The name MBOX originates from the Nehalem EX uncore monitoring.
Counter name | Event name |
---|---|
MBOX<0-7>C0 | * |
MBOX<0-7>C1 | * |
MBOX<0-7>C2 | * |
MBOX<0-7>C3 | * |
Option | Argument | Operation | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
The Intel® Skylake X microarchitecture uses a new intersocket communication
network. The name changed from QPI to UPI. The description from Intel®:
Intel® Xeon® Processor Scalable Memory Family uses a new coherent interconnect
for scaling to multiple sockets known as Intel® Ultra Path Interconnect (Intel
UPI). Intel® UPI technology provides a cache coherent socket to socket
external communication interface between processors. The processor
implements 2 Intel® UPI links on -EP or 3 Intel® UPI links on -EX. Figures
below show a 2-socket or 4-socket server systems, as examples. Intel® UPI
is also used as a coherent communication interface between processors and
OEM 3rd party Node Controllers (XNC).
The SBOX hardware performance counters are exposed to the operating system
through PCI devices. There are 3 possible links, each providing four
general-purpose counters. The name SBOX was first used in Nehalem
architectures.
Description from Intel®: There are two Intel® UPI agents that share a
single mesh stop and a third agent in the EX part with its own mesh stop. These
links can be connected to a single destination (such as in DP), or can be
connected to two separate destinations (4s Ring or sDP). Therefore, it will be
necessary to count Intel® UPI statistics for each agent separately.
Counter name | Event name |
---|---|
SBOX<0-2>C0 | * |
SBOX<0-2>C1 | * |
SBOX<0-2>C2 | * |
SBOX<0-2>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
nid | 4 bit hex value | Set bits 40-43 in config register and mandatory bit 45 | Note: Node 0 has value 0x1 |
match0 | 8 bit hex address | Set bits 32-39 in config register | See the Intel® Xeon® Processor Scalable Family Uncore Reference Manual for more information. |
match1 | 10 bit hex address | Set bits 46-55 in config register | See the Intel® Xeon® Processor Scalable Family Uncore Reference Manual for more information. |
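The `nid`, `match0` and `match1` options occupy the upper half of the 64 bit config register. The sketch below places them as listed in the table. The function name `upi_option_bits` is hypothetical, and the lower option bits (edgedetect, invert, threshold) are left out for brevity.

```python
def upi_option_bits(nid=None, match0=0, match1=0):
    """Pack the nid, match0 and match1 options into their config register bits."""
    if not 0 <= match0 <= 0xFF:
        raise ValueError("match0 must fit into 8 bits")
    if not 0 <= match1 <= 0x3FF:
        raise ValueError("match1 must fit into 10 bits")
    value = (match0 << 32) | (match1 << 46)   # match0: bits 32-39, match1: bits 46-55
    if nid is not None:
        if not 0 <= nid <= 0xF:
            raise ValueError("nid must fit into 4 bits")
        value |= nid << 40                    # nid: bits 40-43
        value |= 1 << 45                      # mandatory bit 45 that comes with nid
    return value


# Node 0 is selected with the value 0x1, as noted in the table
print(hex(upi_option_bits(nid=0x1)))
```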
The Intel® Skylake X microarchitecture uses a new intrasocket communication
network. Instead of a ring topology, it now uses a mesh topology. The description from Intel®:
M3UPI is the interface between the mesh and the Intel® UPI Link Layer. It is
responsible for translating between mesh protocol packets and flits that are used for
transmitting data across the Intel® UPI interface. It performs credit checking between
the local Intel® UPI LL, the remote Intel® UPI LL and other agents on the local mesh.
The RBOX hardware performance counters are exposed to the operating system
through PCI devices. There are up to 3 devices, each providing three
general-purpose counters. The name RBOX was first used in Nehalem
architectures.
Counter name | Event name |
---|---|
RBOX<0-2>C0 | * |
RBOX<0-2>C1 | * |
RBOX<0-2>C2 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |
- IBOX0*, IBAND0* and IUTIL0* are part of the CBDMA stack
- IBOX<1-3>*, IBAND<1-3>* and IUTIL<1-3>* are part of PCIe stacks 0-2
- IBOX4 is part of the MCP stack 0
Counter name | Event name |
---|---|
IBOX<0-4>C0 | * |
IBOX<0-4>C1 | * |
IBOX<0-4>C2 | * |
IBOX<0-4>C3 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 12 bit hex value | Set bits 24-35 in config register | |
mask0 | 8 bit hex mask | Channel mask filter, sets bits 36-43 in config register | Check Intel® Xeon® Processor Scalable Family Uncore Reference Manual for bit fields. |
mask1 | 3 bit hex address | FC mask, sets bits 44-46 in config register | Check Intel® Xeon® Processor Scalable Family Uncore Reference Manual for bit fields. |
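The IIO general-purpose counters use a wider threshold field than the other units, followed by the two mask fields. A minimal sketch of the layout from the table is shown below. The function name `iio_option_bits` is an assumption, and the valid mask bit fields are documented in the Intel® uncore manual.

```python
def iio_option_bits(edgedetect=False, invert=False, threshold=0, mask0=0, mask1=0):
    """Pack the IIO counter options into their config register bits."""
    if not 0 <= threshold <= 0xFFF:
        raise ValueError("threshold must fit into 12 bits")
    if not 0 <= mask0 <= 0xFF:
        raise ValueError("mask0 must fit into 8 bits")
    if not 0 <= mask1 <= 0x7:
        raise ValueError("mask1 must fit into 3 bits")
    value = threshold << 24              # threshold: bits 24-35
    value |= mask0 << 36                 # channel mask filter: bits 36-43
    value |= mask1 << 44                 # FC mask: bits 44-46
    if edgedetect:
        value |= 1 << 18                 # edgedetect: bit 18
    if invert:
        value |= 1 << 23                 # invert: bit 23
    return value


print(hex(iio_option_bits(threshold=0x1, mask0=0x1, mask1=0x7)))
```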
- IBOX0*, IBAND0* and IUTIL0* are part of the CBDMA stack
- IBOX<1-3>*, IBAND<1-3>* and IUTIL<1-3>* are part of PCIe stacks 0-2
- IBOX4 is part of the MCP stack 0
Counter name | Event name |
---|---|
IBOX<0-4>CLK | IUNIT_CLOCKTICKS |
IBAND<0-4>PI0 | BANDWIDTH_PORT0_IN |
IBAND<0-4>PI1 | BANDWIDTH_PORT1_IN |
IBAND<0-4>PI2 | BANDWIDTH_PORT2_IN |
IBAND<0-4>PI3 | BANDWIDTH_PORT3_IN |
IBAND<0-4>PO0 | BANDWIDTH_PORT0_OUT |
IBAND<0-4>PO1 | BANDWIDTH_PORT1_OUT |
IBAND<0-4>PO2 | BANDWIDTH_PORT2_OUT |
IBAND<0-4>PO3 | BANDWIDTH_PORT3_OUT |
IUTIL<0-4>PI0 | UTLILIZATION_PORT0_IN |
IUTIL<0-4>PI1 | UTLILIZATION_PORT1_IN |
IUTIL<0-4>PI2 | UTLILIZATION_PORT2_IN |
IUTIL<0-4>PI3 | UTLILIZATION_PORT3_IN |
IUTIL<0-4>PO0 | UTLILIZATION_PORT0_OUT |
IUTIL<0-4>PO1 | UTLILIZATION_PORT1_OUT |
IUTIL<0-4>PO2 | UTLILIZATION_PORT2_OUT |
IUTIL<0-4>PO3 | UTLILIZATION_PORT3_OUT |
- IRP0* is part of the CBDMA stack
- IRP<1-3>* are part of PCIe stacks 0-2
- IRP<4-5>* are part of MCP stacks 0-1
Counter name | Event name |
---|---|
IRP<0-5>C0 | * |
IRP<0-5>C1 | * |
Option | Argument | Description | Comment |
---|---|---|---|
edgedetect | N | Set bit 18 in config register | |
invert | N | Set bit 23 in config register | |
threshold | 8 bit hex value | Set bits 24-31 in config register | |