
Linux NTB HOWTO test

fancer edited this page Dec 6, 2017 · 19 revisions

Introduction

Several test drivers have been created to verify that the NTB hardware and software basics work correctly on NTB-interconnected systems:

  • NTB Ping-Pong test (drivers/ntb/test/ntb_pingpong.c)
  • NTB Debugging Tool (drivers/ntb/test/ntb_tool.c)
  • NTB Raw Performance test (drivers/ntb/test/ntb_perf.c)

Additionally, a bash script automates running the tests sequentially and verifying their results:

  • NTB test shell-script (tools/testing/selftests/ntb/ntb_test.sh)

The script, however, requires the NTB-interconnected systems either to be also connected over Ethernet or to be located within a single PCIe bus domain.


NTB Ping-Pong test

The Ping-Pong test has been developed to verify the NTB Link, Doorbell and Scratchpad/Message interfaces. The main idea is to run an infinite loop of data send/receive operations. A remote node is notified of incoming data (the so-called "pong") by the corresponding NTB Doorbell bit being set.

The test starts on the NTB Link-up event, which triggers an initial "ping": a counter value is written to the NTB Scratchpad/Message registers and the remote device's NTB Doorbell bit is set. When an incoming NTB Doorbell event is detected, the local driver reads the NTB Scratchpad/Message register and starts an hrtimer with a predefined timeout. When the timer expires, the "ping" operation described above is performed. Each NTB port node determines its unique global index, which assigns it a dedicated NTB Doorbell notification bit. A sketch of the algorithm looks as follows:

-> Node 0 (pong, timeout, ping) -> Node 1 (pong, timeout, ping) -> ... -> Node 0 

This design produces an infinite sequential Ping-Pong loop running through all the NTB-interconnected systems.
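The ring order sketched above can be modelled in a few lines of POSIX shell. This is a toy model. It assumes that every node simply forwards to the node with the next global index, wrapping around at the end. The function name pingpong_order is hypothetical, and the real driver works with doorbell bits and an hrtimer rather than a loop.

```shell
# Toy model of the ping-pong visiting order only (hypothetical helper).
# Assumption: after its "pong" and the delay_ms timeout, node i "pings"
# node (i + 1) % N. Nothing here talks to NTB hardware.
pingpong_order() {
    local nodes=$1 hops=$2 cur=0 i=0 order=0
    while [ "$i" -lt "$hops" ]; do
        cur=$(( (cur + 1) % nodes ))   # next node in the ring
        order="$order -> $cur"
        i=$(( i + 1 ))
    done
    echo "$order"
}
```

For example, `pingpong_order 3 3` prints `0 -> 1 -> 2 -> 0`, which is one full turn of a three-node ring.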

NTB Ping-Pong test driver installation and usage

local # modprobe ntb_pingpong delay_ms=1000 unsafe=1

where delay_ms is the timeout in milliseconds between the "pong" and "ping" events, and unsafe is a flag that permits running the test even if the NTB Doorbell or Scratchpads can't be safely used. The number of "pongs" the local NTB node has received can be retrieved by reading the DebugFS file:

local # cat /sys/kernel/debug/ntb_pingpong/0000:01:00.0/count 
238
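A quick way to confirm that the loop is still running is to read the counter twice. The helper below is a hypothetical sketch; pp_progress is not part of the kernel. It only assumes that the "count" file holds a plain decimal number, as in the listing above. If the loop behaves as sketched earlier and each hop takes about delay_ms, a node should see roughly one pong per N × delay_ms, where N is the number of nodes in the ring.

```shell
# Hypothetical helper: reads the ntb_pingpong "count" file, waits
# <interval> seconds, reads it again and prints the difference.
# Returns 0 only if the counter grew, i.e. pongs are still arriving.
pp_progress() {
    local count_file=$1 interval=${2:-5} before after
    before=$(cat "$count_file") || return 2
    sleep "$interval"
    after=$(cat "$count_file") || return 2
    echo "pongs received in ${interval}s: $(( after - before ))"
    [ "$after" -gt "$before" ]
}
```

Usage: `pp_progress /sys/kernel/debug/ntb_pingpong/0000:01:00.0/count 10`.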

NTB Debugging Tool

The NTB API is fairly complex. It consists of about 47 kernel methods, three callbacks executed in IRQ context and a few other conventional functions. Together they cover almost all NTB hardware facilities, so client driver developers don't need to know what kind of device is hidden behind the API.

Since the NTB API is used only within the kernel, it's handy to have a user-space test tool that helps make sure an NTB hardware driver consistently implements all the necessary API callbacks. Such a driver has been created and named the NTB Debugging Tool.

Since the NTB Debugging Tool interface is fully based on the NTB API and mirrors its functionality, it's highly recommended to study the API first. A detailed description of the API can be found in the kernel header file "include/linux/ntb.h". In general, the NTB API can be divided into the following logical groups of operations:

  1. NTB device ports mapping information
  2. NTB Doorbell and Doorbell mask bits access methods
  3. NTB Scratchpad access methods (optional)
  4. NTB Message registers and Message status register access methods (optional)
  5. NTB Link enable/disable methods
  6. NTB Inbound/Outbound Memory Windows settings

NTB Debugging Tool installation and basic usage

The only requirement is to load the driver on at least two NTB-interconnected systems.

local # modprobe ntb_tool

Once it's loaded, a lot of files appear in the driver's DebugFS directory:

local # ls /sys/kernel/debug/ntb_tool/0000:01:00.0/
db             msg1           msg_outbits    peer4/         spad0
db_event       msg2           msg_sts        peer5/         spad1
db_mask        msg3           peer0/         peer6/         spad2
db_valid_mask  msg_event      peer1/         peer_db        spad3
link           msg_inbits     peer2/         peer_db_mask
msg0           msg_mask       peer3/         port
local # ls /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0
link           mw_trans2      peer_mw_trans0 peer_mw_trans8
link_event     mw_trans3      peer_mw_trans1 peer_mw_trans9
msg0           mw_trans4      peer_mw_trans2 port
msg1           mw_trans5      peer_mw_trans3 spad0
msg2           mw_trans6      peer_mw_trans4 spad1
msg3           mw_trans7      peer_mw_trans5 spad2
mw_trans0      mw_trans8      peer_mw_trans6 spad3
mw_trans1      mw_trans9      peer_mw_trans7

Each file corresponds to a specific NTB API group of methods, and a read or write operation on it results in the invocation of a certain method. As the listing above shows, the abstract NTB device supports:

  • one NTB Doorbell register
  • four NTB Message registers and four Scratchpad registers
  • up to seven remote systems connected to the device ports
  • ten NTB Inbound/Outbound Memory Windows

In general, the number of NTB Memory Windows can differ from port to port depending on the device's internal configuration.
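The counts quoted above can be derived directly from the directory layout. The following hypothetical helpers, count_glob and ntb_tool_summary, simply count matching DebugFS entries. They assume the file naming shown in the listing, so they report what the driver exposes, not what the hardware could support.

```shell
# Hypothetical helper: count existing paths among the arguments
# (an unmatched glob stays literal and fails the -e test).
count_glob() {
    local c=0 f
    for f in "$@"; do
        [ -e "$f" ] && c=$(( c + 1 ))
    done
    echo "$c"
}

# Hypothetical helper: summarize an ntb_tool device directory.
ntb_tool_summary() {
    local dev=$1
    echo "peers: $(count_glob "$dev"/peer[0-9]*)"
    echo "scratchpads: $(count_glob "$dev"/spad[0-9]*)"
    echo "messages: $(count_glob "$dev"/msg[0-9]*)"
    echo "peer0 memory windows: $(count_glob "$dev"/peer0/mw_trans[0-9]*)"
}
```

Usage: `ntb_tool_summary /sys/kernel/debug/ntb_tool/0000:01:00.0`.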

Some of the listed files are read-only or write-only, while others permit both operations. Naturally, the written data must be formatted according to the corresponding NTB API group. A detailed description of the implemented interface is provided in the following sections.

NTB Port Debugging Tool Interface

In general, the NTB API implies that multiple systems can be connected to a single NTB device. For instance:

  • IDT PCIe switches can have up to eight NTB functions activated and assigned to the device ports.
  • Microsemi Switchtec devices can have as many as 48 NTB functions assigned to up to 48 ports.
  • Intel/AMD hardware can't interconnect more than two systems via a single PCIe function.

It's also implied that each system is connected to a single NTB device port, so the port number is used to distinguish one peer from another on a specific device. Since the ports can be enumerated non-contiguously (as in IDT PCIe switches), the NTB API provides a port index abstraction, which simply enumerates all the available ports, excluding the local one, in ascending order. The rest of the API methods accept only the port index (pidx).

To get the local and peer port numbers, one can simply read the DebugFS "port" files, as shown in the following listing.

local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/port 
12
local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/port 
0
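To print the whole pidx-to-port mapping at once, one can loop over the peer directories. This sketch assumes the peer<N>/port layout shown above; ntb_port_map is a hypothetical name. The glob sorts names lexically, so on devices with more than ten peers, peer10 would be listed before peer2.

```shell
# Hypothetical helper: print the local port and the port behind each pidx.
ntb_port_map() {
    local dev=$1 p
    echo "local port: $(cat "$dev/port")"
    for p in "$dev"/peer[0-9]*; do
        [ -d "$p" ] || continue
        # "${p##*/peer}" strips everything up to the last "/peer", leaving the pidx
        echo "pidx ${p##*/peer} -> port $(cat "$p/port")"
    done
}
```

For the listing above, `ntb_port_map /sys/kernel/debug/ntb_tool/0000:01:00.0` would start with `local port: 12` and `pidx 0 -> port 0`.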

NTB Doorbell Debugging Tool Interface

NTB hardware is required to have an NTB Doorbell register. The NTB Doorbell is usually used to notify a local or peer device of specific events defined by the client driver's logic. NTB Doorbells are defined as 64-bit registers in which individual bits can be set and cleared. Setting a peer NTB Doorbell bit raises an interrupt request on the remote system. Individual NTB Doorbell bits can be masked to prevent them from generating NTB Doorbell IRQs.

Since different NTB devices can have different NTB Doorbell register formats, a special API method can be used to retrieve the mask of valid NTB Doorbell bits:

local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/db_valid_mask
0xffffffff

Each set bit in the output corresponds to a valid NTB Doorbell bit.
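Counting the valid doorbell bits takes only simple shell arithmetic. The hypothetical db_bit_count helper below tests each of the 64 bit positions in turn rather than shifting until zero. This avoids depending on how a given shell treats a mask with bit 63 set, because shell arithmetic is signed.

```shell
# Hypothetical helper: count the set bits of a (hex) mask such as db_valid_mask.
db_bit_count() {
    local mask=$(( $1 )) i=0 c=0
    while [ "$i" -lt 64 ]; do
        c=$(( c + ((mask >> i) & 1) ))   # add bit i (0 or 1)
        i=$(( i + 1 ))
    done
    echo "$c"
}
```

Usage: `db_bit_count "$(cat /sys/kernel/debug/ntb_tool/0000:01:00.0/db_valid_mask)"`. For the 0xffffffff value above this prints 32, meaning the device exposes 32 usable doorbell bits.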

The NTB Doorbell and Doorbell mask bits can be set by writing a command of the form "s 0x<bits>", where <bits> is a hexadecimal bit mask, to the corresponding file ("db", "peer_db", "db_mask", "peer_db_mask"). The command to clear bits has the form "c 0x<bits>". The following listing shows three steps:

  1. The peer reads its NTB Doorbell register.
  2. The local system sets the very first bit of the peer NTB Doorbell.
  3. The peer reads and clears its NTB Doorbell register.

peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/db
0x0
local # echo "s 0x1" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer_db
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/db
0x1
peer # echo "c 0x1" > /sys/kernel/debug/ntb_tool/0000:01:00.0/db
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/db
0x0
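Setting a doorbell by bit number rather than by mask avoids computing hex values by hand. The sketch below uses a hypothetical helper, ntb_db_bit. It only builds the "s 0x<bits>" or "c 0x<bits>" command used in the listing and writes it to whichever file is passed: db, peer_db, db_mask or peer_db_mask.

```shell
# Hypothetical helper: set (s) or clear (c) a single doorbell bit by index.
ntb_db_bit() {
    local file=$1 op=$2 bit=$3
    case $op in
        s|c) ;;
        *) echo "op must be s or c" >&2; return 1 ;;
    esac
    # e.g. "s 0x1" for bit 0, "c 0x10" for bit 4
    printf '%s 0x%x\n' "$op" "$(( 1 << bit ))" > "$file"
}
```

Usage: `ntb_db_bit /sys/kernel/debug/ntb_tool/0000:01:00.0/peer_db s 0` is equivalent to the `echo "s 0x1"` command in the listing.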

There is also a way to block the console until an NTB Doorbell event is raised. First, the corresponding NTB Doorbell IRQ mask bits should be cleared ("db_mask"). Then the desired NTB Doorbell bits should be written to the "db_event" file. The write to "db_event" returns once at least one of the unmasked NTB Doorbell bits is set. See the next listing for details.

peer # echo "c 0x1" > /sys/kernel/debug/ntb_tool/0000:01:00.0/db_mask
local # echo "s 0x1" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer_db
peer # echo "0x1" > /sys/kernel/debug/ntb_tool/0000:01:00.0/db_event && echo "Got NTB Doorbell event"
Got NTB Doorbell event
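Because the "db_event" write blocks until a doorbell arrives, a script may prefer to bound the wait. The following is a minimal sketch using coreutils timeout(1); ntb_wait_db is a hypothetical name. It assumes the blocked write can be interrupted by the SIGTERM that timeout sends. As described above, the relevant bits must already be cleared in "db_mask".

```shell
# Hypothetical helper: write <bits> to db_event, giving up after <secs>.
# Returns the write's status, or 124 (timeout's exit code) if no event arrived.
ntb_wait_db() {
    local dev=$1 bits=$2 secs=${3:-10}
    timeout "$secs" sh -c 'echo "$1" > "$2"' sh "$bits" "$dev/db_event"
}
```

Usage: `ntb_wait_db /sys/kernel/debug/ntb_tool/0000:01:00.0 0x1 30 && echo "Got NTB Doorbell event"`.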

NTB Scratchpads Debugging Tool Interface

The NTB API treats NTB Scratchpads as plain 32-bit registers that both the local and the remote side can read and write. In general, they can be used for arbitrary data exchange.

The NTB Scratchpad interface is quite simple: it consists of just read and write operations on the local and peer registers.

peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/spad0
0x0
local # echo "0x01020304" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/spad0
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/spad0
0x01020304

Since all the peer devices can read and write the local NTB Scratchpad registers, it's up to the client driver to implement access arbitration to prevent race conditions between the remote nodes.

NTB Messages Debugging Tool Interface

The NTB Message registers are a bit more complex than the NTB Scratchpads, even though they are usually used for the same purpose. Aside from data exchange, they provide a notification and arbitration interface implemented as an additional NTB Message status register. The NTB Inbound and Outbound Message registers are also 32 bits wide.

An incoming data event (which also acts as the register full flag) or an outgoing data transfer failure is immediately reflected in the NTB Message status register: the corresponding bit is set, and the NTB Message status interrupt is raised unless it is masked. No new incoming data can be received until the corresponding incoming data bits (full flags) are cleared in the NTB Message status register. If the outgoing message failure bit is set on the local system, the last write to the corresponding outbound NTB Message register has failed.

Since the NTB Message status register mixes incoming and outgoing status bits, special NTB API methods are provided to distinguish one type of bits from the other:

local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_inbits 
0xf0000
local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_outbits 
0xf

As one can see, the 0xf0000 bits of the NTB Message status register denote incoming data events, while the 0xf bits designate outgoing data transfer failures.
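Given the two masks, a raw msg_sts value can be decoded per message register. This sketch assumes, consistently with the listings in this section, that the lowest bit of each mask belongs to msg0, the next one to msg1, and so on. msg_sts_decode is a hypothetical helper, and the bit layout of real hardware may differ.

```shell
# Hypothetical helper: msg_sts_decode <msg_sts> <msg_inbits> <msg_outbits>
# Assumes each mask is contiguous and its lowest bit corresponds to msg0.
msg_sts_decode() {
    local sts=$(( $1 )) inb=$(( $2 )) outb=$(( $3 ))
    local in_lo=$(( inb & -inb )) out_lo=$(( outb & -outb )) i=0
    while [ "$i" -lt 32 ]; do
        [ $(( sts & (in_lo << i) & inb )) -ne 0 ] && echo "msg$i: incoming data"
        [ $(( sts & (out_lo << i) & outb )) -ne 0 ] && echo "msg$i: outgoing failure"
        i=$(( i + 1 ))
    done
    return 0
}
```

With the masks above, `msg_sts_decode 0x10000 0xf0000 0xf` prints `msg0: incoming data`, and `msg_sts_decode 0x1 0xf0000 0xf` prints `msg0: outgoing failure`.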

The next listing demonstrates how the NTB Message registers are used:

  1. Check the NTB Message status registers of the local and peer devices.
  2. Send data to the device with local port index 0.
  3. Check both status registers again.
  4. Read the incoming data from the peer NTB Message register.

The data received should be identical to the data that was sent.

local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts
0x0
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts
0x0
local # echo "0x01020304" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/msg0
local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts 
0x0
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts 
0x10000
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg0
0x01020304<-4

As one can see, the transmission succeeded. The peer was notified of the pending incoming data by the corresponding status flag being set. The "<-4" suffix shows that the data was received from the system connected to the peer's NTB port with index 4.

If we try to send new data to the same peer, the operation fails, since the peer hasn't cleared its incoming NTB Message register full bit yet.

local # echo "0x04030201" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/msg0 
local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts 
0x1
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg0
0x01020304<-4

Once the peer clears that bit, the transfer succeeds.

peer # echo "c 0x10000" > /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts 
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg_sts 
0x0
local # echo "0x04030201" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/msg0
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/msg0
0x04030201<-4
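The receive-and-clear sequence above can be wrapped into one helper. This sketch assumes the "<value><-<pidx>" output format and the msg_sts "c 0x<bits>" clear command shown in the listings. It also assumes that the lowest msg_inbits bit corresponds to msg0. ntb_msg_recv is a hypothetical name.

```shell
# Hypothetical helper: read msg<midx>, clear its msg_sts incoming (full) bit
# so the sender can transmit again, and print the value and source pidx.
ntb_msg_recv() {
    local dev=$1 midx=$2 raw inbits bit value pidx
    raw=$(cat "$dev/msg$midx") || return 1
    inbits=$(( $(cat "$dev/msg_inbits") ))
    bit=$(( (inbits & -inbits) << midx ))   # e.g. 0x10000 for msg0
    printf 'c 0x%x\n' "$bit" > "$dev/msg_sts"
    value=${raw%%"<-"*}                     # text before "<-"
    pidx=${raw##*"<-"}                      # text after "<-"
    echo "value=$value pidx=$pidx"
}
```

On the peer in the listing above, `ntb_msg_recv /sys/kernel/debug/ntb_tool/0000:01:00.0 0` would print `value=0x01020304 pidx=4` and issue the same `c 0x10000` clear.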

NTB Link Settings Debugging Tool Interface

The main capability of NTB devices is performing read/write operations on the remote system's memory through an MMIO area. Before it can be used, however, the NTB Link must be configured accordingly. The NTB API hides all the potentially complex configuration behind simple NTB Link enable/disable methods.

The NTB Debugging Tool driver provides DebugFS files for manually setting the link up or down by writing a boolean value (Y/N) to them. Similarly, reading a peer's "link" file shows whether the NTB Link is up for that peer device.

local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/link
N
local # echo Y > /sys/kernel/debug/ntb_tool/0000:01:00.0/link
peer # echo Y > /sys/kernel/debug/ntb_tool/0000:01:00.0/link
local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/link
Y
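When scripting, it is convenient to wait until the link actually comes up after both sides have written Y. The following polling sketch uses a hypothetical helper, ntb_wait_link. It checks a peer "link" file once per second until it reads Y or the timeout expires.

```shell
# Hypothetical helper: poll a peer "link" file; return 0 once it reads Y,
# or 1 after <secs> seconds without the link coming up.
ntb_wait_link() {
    local file=$1 secs=${2:-30} i=0
    while :; do
        [ "$(cat "$file")" = Y ] && return 0
        [ "$i" -ge "$secs" ] && return 1
        sleep 1
        i=$(( i + 1 ))
    done
}
```

Usage: `ntb_wait_link /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/link 30 || echo "link is still down"`.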

NTB Memory Windows Debugging Tool Interface

Finally, once the NTB Link is up, the NTB Memory Windows can be configured and used. NTB Memory Windows are configurable windows used to perform I/O operations on a remote DMA memory buffer. The memory buffer size and address restrictions are NTB hardware specific, and the NTB API provides special methods to meet them and properly set the NTB Memory Windows up. The general algorithm of NTB Memory Window configuration is:

  1. Allocate a memory buffer in accordance with the NTB Memory Window restrictions (allocate the inbound memory window)
  2. Set the DMA translation address of the allocated buffer (it can be set on the local or remote system depending on the NTB hardware specifics)
  3. Map the NTB Memory Window (map the outbound memory window)

Since the second step is hardware specific, the NTB API provides methods for both cases: ntb_mw_set_trans() sets the translation address on the local side, and ntb_peer_mw_set_trans() sets it on the peer side. To keep client driver code portable, one should support both of these methods.

The NTB Debugging Tool hides most of the NTB Memory Window configuration complexity behind a set of DebugFS files. The user only needs to allocate an inbound memory buffer by writing its size to the corresponding "mw_trans" file. The user then reads the buffer's translation (xlat) address and uses it on the peer side to set up the peer's outbound NTB Memory Window. See the next listing for details.

local # echo 65536 > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/mw_trans0 
local # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/mw_trans0 
Inbound MW      0
Port            0 (0)
Window Address  0xa6030000
DMA Address     0x0000000006030000
Window Size     0x0000000000010000[p]
Alignment       0x0000000000000004[p]
Size Alignment  0x0000000000000001[p]
Size Max        0x0000000000100000[p]
peer # echo 0x0000000006030000:65536 > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer4/peer_mw_trans0 
peer # cat /sys/kernel/debug/ntb_tool/0000:01:00.0/peer4/peer_mw_trans0 
Outbound MW:            0
Port attached           12 (4)
Virtual address         0xae000000
Phys Address            0x000000000e000000[p]
Mapping Size            0x0000000000100000[p]
Translation Address     0x0000000006030000
Window Size             0x0000000000010000[p]
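The DMA address has to be carried from the local dump over to the peer. The following hypothetical helpers extract it with awk, assuming the "DMA Address" line format shown above. They then build the "<addr>:<size>" argument for the peer's "peer_mw_trans" file.

```shell
# Hypothetical helper: print the DMA address from an mw_trans<N> dump.
mw_dma_addr() {
    awk '/^DMA Address/ { print $3; exit }' "$1"
}

# Hypothetical helper: build "<dma addr>:<size>" for peer_mw_trans<N>.
mw_peer_arg() {
    local addr
    addr=$(mw_dma_addr "$1")
    [ -n "$addr" ] || return 1
    echo "$addr:$2"
}
```

On the local side, after `echo 65536 > .../peer0/mw_trans0`, running `mw_peer_arg .../peer0/mw_trans0 65536` would print `0x0000000006030000:65536`. That string is what the peer writes to `peer4/peer_mw_trans0`.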

From now on, the peer can perform I/O operations on the locally allocated DMA memory buffer. The NTB Debugging Tool also exposes DebugFS files ("mw<N>" on the local side, "peer_mw<N>" on the peer side) for reading and writing the data. The next listing shows how this can be done.

local # head -c 13 /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/mw0 | hexdump -C
*
local # echo "Hello, peer!" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/mw0 
local # head -c 13 /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/mw0 | hexdump -C
00000000  48 65 6c 6c 6f 2c 20 70  65 65 72 21 0a           |Hello, peer!.|
peer # head -c 13 /sys/kernel/debug/ntb_tool/0000:01:00.0/peer4/peer_mw0 | hexdump -C
00000000  48 65 6c 6c 6f 2c 20 70  65 65 72 21 0a           |Hello, peer!.|
peer # echo "Hello, local!" > /sys/kernel/debug/ntb_tool/0000:01:00.0/peer4/peer_mw0 
peer # head -c 14 /sys/kernel/debug/ntb_tool/0000:01:00.0/peer4/peer_mw0 | hexdump -C
00000000  48 65 6c 6c 6f 2c 20 6c  6f 63 61 6c 21 0a        |Hello, local!.|
local # head -c 14 /sys/kernel/debug/ntb_tool/0000:01:00.0/peer0/mw0 | hexdump -C
00000000  48 65 6c 6c 6f 2c 20 6c  6f 63 61 6c 21 0a        |Hello, local!.|
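The same check can be scripted on both sides: one side writes a string into its window file, and the other compares the bytes it reads back. The helper names below are hypothetical. ${#data} equals the byte count only for ASCII strings, so the sketch assumes ASCII test data.

```shell
# Hypothetical helper: write <data> (no trailing newline) into a window file.
mw_write() {
    printf '%s' "$2" > "$1"
}

# Hypothetical helper: return 0 if the first ${#expected} bytes of the
# window file equal <expected>.
mw_check() {
    local file=$1 expected=$2 got
    got=$(head -c "${#expected}" "$file")
    [ "$got" = "$expected" ]
}
```

For example, run `mw_write .../peer0/mw0 "Hello, peer!"` locally, then `mw_check .../peer4/peer_mw0 "Hello, peer!"` on the peer.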

NTB Raw Performance test

Once NTB Memory Windows are configured, the first question a developer would ask is how fast the I/O operations actually are. The NTB Raw Performance test driver has been developed to answer this question by measuring the speed of write operations only. There is no data integrity verification, which means only posted PCIe write requests are issued. So the test is simple: measure the time of raw sequential writes to the outbound NTB Memory Windows.

NTB Raw Performance Test driver installation and usage

local # modprobe ntb_perf max_mw_size=0 chunk_order=19 total_order=30 use_dma=0

where:

  • max_mw_size is the maximum size of the allocated memory window.
  • chunk_order is the size order of the data written in one iteration (2^chunk_order bytes).
  • total_order is the order of the total amount of data written (2^total_order bytes).
  • use_dma is a flag to use a DMA engine instead of the CPU-based memcpy_toio() function.
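The two order parameters translate into sizes as powers of two. The following small sketch uses a hypothetical function, perf_sizes, to compute the values for the command line above. With total_order=30 the total is 1073741824 bytes, which matches the amount each thread reports copying in the listings that follow.

```shell
# Hypothetical helper: sizes implied by ntb_perf's chunk_order/total_order.
perf_sizes() {
    local chunk_order=$1 total_order=$2
    echo "chunk=$(( 1 << chunk_order )) total=$(( 1 << total_order ))" \
         "chunks=$(( 1 << (total_order - chunk_order) ))"
}
```

`perf_sizes 19 30` prints `chunk=524288 total=1073741824 chunks=2048`. These are 512 KiB chunks, 1 GiB in total, and 2048 iterations.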

To start the NTB Raw Performance test, write the peer index to the driver's "run" DebugFS file. The index can be found in the output of the "info" file. See the next listing for an example.

local # cat /sys/kernel/debug/ntb_perf/0000:01:00.0/info 
    Performance measuring tool info:

Local port 12, Global index 5
Test status: idle
Port 0 (0), Global index 0:
        Link status: up
        Out buffer addr 0xaa000000
        Out buffer size 0x0000000000100000
        Out buffer xlat 0x0000000000000000[p]
        In buffer addr 0xa6100000
        In buffer size 0x0000000000100000
        In buffer xlat 0x0000000006100000[p]
...
local # echo 0 > /sys/kernel/debug/ntb_perf/0000:01:00.0/run
local # cat /sys/kernel/debug/ntb_perf/0000:01:00.0/run 
    Peer 0 test statistics:
0: copied 1073741824 bytes in 34256902 usecs, 31 MBytes/s

Additionally, more than one kernel thread can perform the write operations simultaneously. The number of threads is set by writing it to the DebugFS file "threads_count".

local # echo 2 > /sys/kernel/debug/ntb_perf/0000:01:00.0/threads_count
local # echo 0 > /sys/kernel/debug/ntb_perf/0000:01:00.0/run
local # cat /sys/kernel/debug/ntb_perf/0000:01:00.0/run 
    Peer 0 test statistics:
0: copied 1073741824 bytes in 68211718 usecs, 15 MBytes/s
1: copied 1073741824 bytes in 68254379 usecs, 15 MBytes/s
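The MBytes/s column is consistent with bytes divided by microseconds (10^6 bytes per second), truncated to an integer: 1073741824 / 34256902 ≈ 31.3. The hypothetical awk-based helpers below recompute the rate per thread. They also compute an aggregate (total bytes over the longest thread time), which assumes the threads run concurrently. For the two-thread run above, the aggregate works out to about 31 MBytes/s, the same as with a single thread.

```shell
# Hypothetical helper: rate of one "N: copied B bytes in U usecs, ..." line.
perf_rate() {
    echo "$1" | awk '$2 == "copied" { printf "%d\n", $3 / $6 }'
}

# Hypothetical helper: aggregate rate of all threads read from stdin
# (sum of bytes divided by the longest per-thread time).
perf_aggregate() {
    awk '$2 == "copied" { b += $3; if ($6 + 0 > u) u = $6 + 0 }
         END { if (u > 0) printf "%d\n", b / u }'
}
```

Usage: `perf_aggregate < /sys/kernel/debug/ntb_perf/0000:01:00.0/run`.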

NTB Test shell-script

A bash script sequentially executes almost all of the previously described tests. It starts with NTB Debugging Tool tests covering all the standard NTB interfaces, then runs the NTB Ping-Pong and NTB Raw Performance tests. The script is located in the kernel source tree at "tools/testing/selftests/ntb/ntb_test.sh".

NTB Test shell-script installation and usage

The script must first be copied to the target system. Then it should be executed as shown in the next listing.

./ntb_test.sh 0000:01:00.0 0000:03:00.0

where the first and second arguments are the NTB device names on the local and remote systems. By default, the script assumes that both devices are available on the local system, in which case no network connection needs to be configured. If you are going to test NTB devices located on separate hosts, however, they need to be connected over an Ethernet network with ssh key-based authentication configured. In that case, a typical command to run the script looks as follows:

./ntb_test.sh -r 192.168.0.2 0000:01:00.0 0000:01:00.0
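Before launching the script against a remote host, a quick pre-flight check can prevent a confusing failure partway through the run. The sketch below is hypothetical and not part of ntb_test.sh. It checks that the device names look like PCI addresses (domain:bus:device.function). If a remote host is given, it also checks that ssh logs in without prompting (BatchMode=yes).

```shell
# Hypothetical helper: does the argument look like a PCI address?
is_bdf() {
    printf '%s\n' "$1" | grep -Eq '^[0-9a-fA-F]{4}:[0-9a-fA-F]{2}:[0-9a-fA-F]{2}\.[0-7]$'
}

# Hypothetical helper: ntb_preflight <remote host or ""> <local dev> <remote dev>
ntb_preflight() {
    local remote=$1 local_dev=$2 remote_dev=$3 d
    for d in "$local_dev" "$remote_dev"; do
        is_bdf "$d" || { echo "bad device name: $d" >&2; return 1; }
    done
    if [ -n "$remote" ]; then
        # BatchMode=yes makes ssh fail instead of asking for a password
        ssh -o BatchMode=yes "$remote" true ||
            { echo "non-interactive ssh to $remote failed" >&2; return 1; }
    fi
}
```

Usage: `ntb_preflight 192.168.0.2 0000:01:00.0 0000:01:00.0 && ./ntb_test.sh -r 192.168.0.2 0000:01:00.0 0000:01:00.0`.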

The script accepts other arguments as well, mostly related to the NTB test driver options.

./ntb_test.sh -m 65536 -p 30 -w 65536 0000:01:00.0 0000:03:00.0
Starting ntb_tool tests...
Running port tests on: 0000:01:00.0 / 0000:03:00.0
Local port 12 with index 4 on remote host
Peer port 0 with index 0 on local host
   Passed
Running link tests on: 0000:01:00.0 / 0000:03:00.0
   Passed
Running link tests on: 0000:03:00.0 / 0000:01:00.0
   Passed
Running db tests on: 0000:01:00.0 / 0000:03:00.0
   Passed
Running db tests on: 0000:03:00.0 / 0000:01:00.0
   Passed
Running spad tests on: 0000:01:00.0 / 0000:03:00.0
   Passed
Running spad tests on: 0000:03:00.0 / 0000:01:00.0
   Passed
Running msg tests on: 0000:01:00.0 / 0000:03:00.0
   Passed
Running msg tests on: 0000:03:00.0 / 0000:01:00.0
   Passed
Running mw0 test on: 0000:01:00.0 / 0000:03:00.0
   Passed
Running mw1 test on: 0000:01:00.0 / 0000:03:00.0
   Passed
...
Running mw0 test on: 0000:03:00.0 / 0000:01:00.0
   Passed
Running mw1 test on: 0000:03:00.0 / 0000:01:00.0
   Passed
...
Starting ntb_pingpong tests...
Running ping pong tests on: 0000:01:00.0 / 0000:03:00.0
    Passed
Starting ntb_perf tests...
Running local perf test with DMA
0: copied 1073741824 bytes in 34288110 usecs, 31 MBytes/s
    Passed
Running remote perf test with DMA
0: copied 1073741824 bytes in 34288110 usecs, 31 MBytes/s
    Passed

A description of the passed parameters can be found in the script's help message (-h).