NehalemEX

Thomas.Roehl edited this page Nov 5, 2015 · 6 revisions

Architecture-specific notes for Intel Nehalem EX

Performance groups

Intel Nehalem EX Performance groups

Events

The input file for the events on Intel Nehalem EX can be found here.

Counters

Core-local counters

Fixed-purpose counters

Since the Core2 microarchitecture, Intel® has provided a set of fixed-purpose counters. Each can measure only one specific event.

Counters

| Counter name | Event name |
| --- | --- |
| FIXC0 | INSTR_RETIRED_ANY |
| FIXC1 | CPU_CLK_UNHALTED_CORE |
| FIXC2 | CPU_CLK_UNHALTED_REF |

Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| anythread | N | Set bit 2+(index*4) in config register | |
| kernel | N | Set bit (index*4) in config register | |
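
The bit positions of the two fixed-counter options depend on the counter index. The following minimal sketch only illustrates that arithmetic; the function name `fixed_counter_option_bits` is hypothetical and not part of LIKWID, and the sketch assumes that `index` is the number in the counter name (0 for FIXC0, 1 for FIXC1, 2 for FIXC2).

```python
def fixed_counter_option_bits(index: int, kernel: bool = False,
                              anythread: bool = False) -> int:
    """Return only the option bits for FIXC<index> (hypothetical helper).

    kernel    -> bit (index*4)
    anythread -> bit 2+(index*4)
    """
    if index not in (0, 1, 2):
        raise ValueError("this page lists FIXC0, FIXC1 and FIXC2 only")
    bits = 0
    if kernel:
        bits |= 1 << (index * 4)
    if anythread:
        bits |= 1 << (2 + index * 4)
    return bits


if __name__ == "__main__":
    # Option bits for FIXC1 with both options set.
    print(hex(fixed_counter_option_bits(1, kernel=True, anythread=True)))
```

The helper returns only the bits contributed by the options; any other bits of the config register are outside the scope of this sketch.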

General-purpose counters

The Intel® Nehalem EX microarchitecture provides 4 general-purpose counters, each consisting of a config register and a counter register.

Counters

| Counter name | Event name |
| --- | --- |
| PMC0 | * |
| PMC1 | * |
| PMC2 | * |
| PMC3 | * |

Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| kernel | N | Set bit 17 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |

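
As a minimal sketch of how the general-purpose counter options map to config register bits, the function below combines them into one integer. The name `pmc_option_bits` is hypothetical; only the bit positions are taken from the option list above, and the event and umask fields of the register are deliberately left out.

```python
def pmc_option_bits(edgedetect: bool = False, kernel: bool = False,
                    invert: bool = False, threshold: int = 0) -> int:
    """Return the option bits for a PMC config register (hypothetical helper)."""
    if not 0 <= threshold <= 0xFF:
        raise ValueError("threshold must fit in 8 bits")
    bits = threshold << 24          # threshold -> bits 24-31
    if edgedetect:
        bits |= 1 << 18             # edgedetect -> bit 18
    if kernel:
        bits |= 1 << 17             # kernel -> bit 17
    if invert:
        bits |= 1 << 23             # invert -> bit 23
    return bits


if __name__ == "__main__":
    print(hex(pmc_option_bits(edgedetect=True, threshold=0x2)))
```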
Special handling for events

The Intel® Nehalem EX microarchitecture can measure offcore events in the PMC counters. To do so, the stream of offcore events must be filtered using the OFFCORE_RESPONSE registers, of which the microarchitecture has two. Custom filtering can be applied with the OFFCORE_RESPONSE_0_OPTIONS event. Two additional counter options are available only for these events:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 8 bit hex value | Input value masked with 0xFF and written to bits 0-7 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/NHM-EX. |
| mask0 | 8 bit hex value | Input value masked with 0xF7 and written to bits 8-15 in the OFFCORE_RESPONSE register | Check the Intel® Software Developer System Programming Manual, Vol. 3, Chapter Performance Monitoring and https://download.01.org/perfmon/NHM-EX. |
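
The masking described for the two offcore options can be sketched as follows. This is an illustration only: the function `offcore_response_value` and its parameter names are hypothetical, with `low_value` standing for the option written to bits 0-7 and `high_value` for the option written to bits 8-15.

```python
def offcore_response_value(low_value: int = 0, high_value: int = 0) -> int:
    """Combine the two option values into one register value (hypothetical helper)."""
    low = low_value & 0xFF            # first option: masked with 0xFF, bits 0-7
    high = (high_value & 0xF7) << 8   # second option: masked with 0xF7, bits 8-15
    return high | low


if __name__ == "__main__":
    # Bits outside the masks are silently dropped.
    print(hex(offcore_response_value(0x1FF, 0xFF)))
```

Note that the 0xF7 mask clears bit 3 of the second value, so after the shift bit 11 of the register value is always zero in this sketch.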

Socket-wide counters

Memory controller counters

The Intel® Nehalem EX microarchitecture provides measurements of the memory controllers in the Uncore. The description from Intel®:
The memory controller interfaces to the Intel® 7500 Scalable Memory Buffers and translates read and write commands into specific Intel® Scalable Memory Interconnect (Intel® SMI) operations. Intel SMI is based on the FB-DIMM architecture, but the Intel 7500 Scalable Memory Buffer is not an AMB2 device and has significant exceptions to the FB-DIMM2 architecture. The memory controller also provides a variety of RAS features, such as ECC, memory scrubbing, thermal throttling, mirroring, and DIMM sparing. Each socket has two independent memory controllers, and each memory controller has two Intel SMI channels that operate in lockstep.
The Intel® Nehalem EX microarchitecture has 2 memory controllers, each with 6 general-purpose counters. They are exposed through the MSR interface to the operating system kernel. The MBOX and RBOX setup routines are taken from LIKWID 3. They are not as flexible as the newer setup routines, but programming the MBOXes and RBOXes is tedious for Westmere EX. It is not possible to specify an FVID (Fill Victim Index) for the MBOX or the IPERF option for RBOXes.

Counters

| Counter name | Event name |
| --- | --- |
| MBOX<0,1>C0 | * |
| MBOX<0,1>C1 | * |
| MBOX<0,1>C2 | * |
| MBOX<0,1>C3 | * |
| MBOX<0,1>C4 | * |
| MBOX<0,1>C5 | * |

Special handling for events

For the events DRAM_CMD_ALL and DRAM_CMD_ILLEGAL two counter options are available:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 34 bit address | Set bits 0-33 in MSR_M<0,1>_PMON_ADDR_MATCH register | |
| mask0 | 60 bit hex value | Extract bits 6-33 from address and set bits 0-27 in MSR_M<0,1>_PMON_ADDR_MASK register | |
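
The bit extraction for the two MBOX address options can be sketched as shown below. The function names are hypothetical and the sketch covers only the arithmetic stated in the option descriptions, not the actual register write.

```python
def mbox_addr_match_value(address: int) -> int:
    """Keep bits 0-33 of the address for the ADDR_MATCH register (hypothetical helper)."""
    return address & ((1 << 34) - 1)


def mbox_addr_mask_value(value: int) -> int:
    """Move bits 6-33 of the input to bits 0-27 for the ADDR_MASK register (hypothetical helper)."""
    return (value >> 6) & ((1 << 28) - 1)


if __name__ == "__main__":
    print(hex(mbox_addr_match_value(0xFFFFFFFFFF)))
    print(hex(mbox_addr_mask_value(0x40)))
```

Bits 0-5 of the mask input do not reach the register in this sketch, which would correspond to 64-byte granularity if the input is a byte address; that interpretation is an assumption.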

For the events THERM_TRP_DN and THERM_TRP_UP you cannot measure events for all DIMMs and for one specific DIMM simultaneously, because both variants program the same filter register MSR_M<0,1>_PMON_MSC_THR with conflicting configurations.

Although the events FVC_EV<0-3> are available to measure multiple memory events, some of them overlap and cannot be measured simultaneously, because they program the same filter register MSR_M<0,1>_PMON_ZDP with conflicting configurations. One example is the FVC_EV<0-3>_BBOX_CMDS_READS and FVC_EV<0-3>_BBOX_CMDS_WRITES events: they measure memory reads and writes, respectively, but cannot be measured at the same time.
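
The restriction above amounts to a simple rule: two events on the same MBOX must not require different settings of the same filter register. A minimal sketch of such a check follows. The function `find_filter_conflicts`, the tuple layout, and the placeholder configurations `"reads"` and `"writes"` are all assumptions made for illustration; the real register values are not given on this page.

```python
def find_filter_conflicts(requests):
    """Return conflicting event pairs (hypothetical helper).

    requests: iterable of (box, event, filter_register, filter_config).
    Two requests conflict if they target the same box and filter register
    but need different configurations.
    """
    first_seen = {}   # (box, register) -> (event, config)
    conflicts = []
    for box, event, register, config in requests:
        key = (box, register)
        if key in first_seen and first_seen[key][1] != config:
            conflicts.append((first_seen[key][0], event, box, register))
        else:
            first_seen.setdefault(key, (event, config))
    return conflicts


if __name__ == "__main__":
    requests = [
        ("MBOX0", "FVC_EV0_BBOX_CMDS_READS", "MSR_M0_PMON_ZDP", "reads"),
        ("MBOX0", "FVC_EV1_BBOX_CMDS_WRITES", "MSR_M0_PMON_ZDP", "writes"),
        ("MBOX1", "FVC_EV0_BBOX_CMDS_READS", "MSR_M1_PMON_ZDP", "reads"),
    ]
    for conflict in find_filter_conflicts(requests):
        print(conflict)
```

In this example only the two MBOX0 requests are reported, since the MBOX1 request uses a different instance of the filter register.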

Home Agent counters

The Intel® Nehalem EX microarchitecture provides measurements of the Home Agent in the Uncore. The description from Intel®:
The B-Box is responsible for the protocol side of memory interactions, including coherent and non-coherent home agent protocols (as defined in the Intel® QuickPath Interconnect Specification). Additionally, the B-Box is responsible for ordering memory reads/writes to a given address such that the M-Box does not have to perform this conflict checking. All requests for memory attached to the coupled M-Box must first be ordered through the B-Box.
The memory traffic in an Intel® Nehalem EX system is controlled by the Home Agents. Each MBOX has a corresponding BBOX. Each BBOX offers 4 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

Counters

| Counter name | Event name |
| --- | --- |
| BBOX<0,1>C0 | * |
| BBOX<0,1>C1 | * |
| BBOX<0,1>C2 | * |
| BBOX<0,1>C3 | * |

Special handling for events

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 60 bit hex value | Set bits 0-59 in MSR_B<0,1>_PMON_MATCH register | For register layout and valid settings see Intel® Xeon® Processor 7500 Series Uncore Programming Guide |
| mask0 | 60 bit hex value | Set bits 0-59 in MSR_B<0,1>_PMON_MASK register | For register layout and valid settings see Intel® Xeon® Processor 7500 Series Uncore Programming Guide |
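
A minimal sketch of how the two BBOX options relate to the per-box filter registers is shown below. The function `bbox_filter_registers` is hypothetical, and truncating oversized values to 60 bits (rather than rejecting them) is a design choice of this sketch, not documented behavior. The meaning of the individual bits is described in the Uncore Programming Guide and is not modeled here.

```python
def bbox_filter_registers(box: int, match0: int, mask0: int) -> dict:
    """Map the match0/mask0 options to register names and values (hypothetical helper)."""
    if box not in (0, 1):
        raise ValueError("this page lists BBOX0 and BBOX1 only")
    width_mask = (1 << 60) - 1   # both options cover bits 0-59
    return {
        f"MSR_B{box}_PMON_MATCH": match0 & width_mask,
        f"MSR_B{box}_PMON_MASK": mask0 & width_mask,
    }


if __name__ == "__main__":
    for name, value in bbox_filter_registers(1, 0x5, 0xF).items():
        print(name, hex(value))
```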

Crossbar router counters

The Intel® Nehalem EX microarchitecture provides measurements of the crossbar router in the Uncore. The description from Intel®:
The Crossbar Router (R-Box) is a 8 port switch/router implementing the Intel® QuickPath Interconnect Link and Routing layers. The R-Box is responsible for routing and transmitting all intra- and inter-processor communication.
The Intel® Nehalem EX microarchitecture has two interfaces to the RBOX although each socket contains only one crossbar router. Each RBOX offers 8 general-purpose counters. They are exposed through the MSR interface to the operating system kernel. The RBOX setup routine is taken from LIKWID 3.

Counters

| Counter name | Event name |
| --- | --- |
| RBOX<0,1>C0 | * |
| RBOX<0,1>C1 | * |
| RBOX<0,1>C2 | * |
| RBOX<0,1>C3 | * |
| RBOX<0,1>C4 | * |
| RBOX<0,1>C5 | * |
| RBOX<0,1>C6 | * |
| RBOX<0,1>C7 | * |
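
The counter tables on this page use a compact notation such as `RBOX<0,1>C7` or `CBOX<0-7>C0`, where the part in angle brackets lists the unit instances. The following sketch expands that notation into individual counter names. The function `expand_counter_names` is hypothetical and only handles the two forms that appear on this page (a comma-separated list or a single range).

```python
import re


def expand_counter_names(pattern: str) -> list:
    """Expand e.g. 'RBOX<0,1>C7' into ['RBOX0C7', 'RBOX1C7'] (hypothetical helper)."""
    match = re.fullmatch(r"(\w+)<([0-9,\-]+)>(\w*)", pattern)
    if match is None:
        return [pattern]             # plain names such as 'PMC0' stay unchanged
    prefix, spec, suffix = match.groups()
    if "-" in spec:
        first, last = spec.split("-")
        indices = range(int(first), int(last) + 1)
    else:
        indices = [int(part) for part in spec.split(",")]
    return [f"{prefix}{index}{suffix}" for index in indices]


if __name__ == "__main__":
    print(expand_counter_names("RBOX<0,1>C7"))
    print(expand_counter_names("CBOX<0-7>C0"))
```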

Last Level cache counters

The Intel® Nehalem EX microarchitecture provides measurements of the LLC coherency engine in the Uncore. The description from Intel®:
For the Intel Xeon Processor 7500 Series, the LLC coherence engine (C-Box) manages the interface between the core and the last level cache (LLC). All core transactions that access the LLC are directed from the core to a C-Box via the ring interconnect. The C-Box is responsible for managing data delivery from the LLC to the requesting core. It is also responsible for maintaining coherence between the cores within the socket that share the LLC; generating snoops and collecting snoop responses to the local cores when the MESI protocol requires it.
The C-Box is also the gate keeper for all Intel® QuickPath Interconnect (Intel® QPI) messages that originate in the core and is responsible for ensuring that all Intel QuickPath Interconnect messages that pass through the socket’s LLC remain coherent.

The Intel® Nehalem EX microarchitecture has 8 CBOX instances. Each CBOX offers 6 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

Counters

| Counter name | Event name |
| --- | --- |
| CBOX<0-7>C0 | * |
| CBOX<0-7>C1 | * |
| CBOX<0-7>C2 | * |
| CBOX<0-7>C3 | * |
| CBOX<0-7>C4 | * |
| CBOX<0-7>C5 | * |

Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| threshold | 5 bit hex value | Set bits 24-28 in config register | |
| invert | N | Set bit 23 in config register | |

LLC-to-QPI interface counters

The Intel® Nehalem EX microarchitecture provides measurements of the LLC-to-QPI interface in the Uncore. The description from Intel®:
The S-Box represents the interface between the last level cache and the system interface. It manages flow control between the C and R & B-Boxes. The S-Box is broken into system bound (ring to Intel® QPI) and ring bound (Intel® QPI to ring) connections.
As such, it shares responsibility with the C-Box(es) as the Intel® QPI caching agent(s). It is responsible for converting C-box requests to Intel® QPI messages (i.e. snoop generation and data response messages from the snoop response) as well as converting/forwarding ring messages to Intel® QPI packets and vice versa.

The Intel® Nehalem EX microarchitecture has 2 SBOX instances. Each SBOX offers 4 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

Counters

| Counter name | Event name |
| --- | --- |
| SBOX<0,1>C0 | * |
| SBOX<0,1>C1 | * |
| SBOX<0,1>C2 | * |
| SBOX<0,1>C3 | * |

Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |

Special handling for events

Only for the TO_R_PROG_EV events two counter options are available:

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| match0 | 64 bit hex value | Set bits 0-63 in MSR_S<0,1>_PMON_MATCH register | For register layout and valid settings see Intel® Xeon® Processor 7500 Series Uncore Programming Guide |
| mask0 | 39 bit hex value | Set bits 0-38 in MSR_S<0,1>_PMON_MASK register | For register layout and valid settings see Intel® Xeon® Processor 7500 Series Uncore Programming Guide |

Power control unit fixed-purpose counters

The Intel® Nehalem EX microarchitecture provides measurements of the power controller in the Uncore. The description from Intel®:
The W-Box is the primary Power Controller for the Intel® Xeon® Processor 7500 Series.
It provides one fixed-purpose counter to measure the clock frequency of the Uncore.

Counters

| Counter name | Event name |
| --- | --- |
| WBOXFIX | UNCORE_CLOCKTICKS |

Power control unit general-purpose counters

The Intel® Nehalem EX microarchitecture provides measurements of the power controller in the Uncore. The description from Intel®:
The W-Box is the primary Power Controller for the Intel® Xeon® Processor 7500 Series.
The Intel® Nehalem EX microarchitecture has one WBOX and it offers 4 general-purpose counters. They are exposed through the MSR interface to the operating system kernel.

Counters

| Counter name | Event name |
| --- | --- |
| WBOX0 | * |
| WBOX1 | * |
| WBOX2 | * |
| WBOX3 | * |

Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |
| threshold | 8 bit hex value | Set bits 24-31 in config register | |
| invert | N | Set bit 23 in config register | |

Uncore management counters

The Intel® Nehalem EX microarchitecture provides measurements of the system configuration controller in the Uncore. The description from Intel®:
The U-Box serves as the system configuration controller for the Intel® Xeon® Processor E7 Family.
The Intel® Nehalem EX microarchitecture has one UBOX and it offers a single general-purpose counter. It is exposed through the MSR interface to the operating system kernel.

Counters

| Counter name | Event name |
| --- | --- |
| UBOX0 | * |

Available Options

| Option | Argument | Description | Comment |
| --- | --- | --- | --- |
| edgedetect | N | Set bit 18 in config register | |