4. Kernel
This section covers the technical background, functionality, and testing of the SMDK kernel. It explains how the kernel recognizes a CXL memory device at boot by interpreting information provided by the BIOS, and how the CXL device is then exposed through the memory interfaces: System RAM, Swap, and DAX.
First, the base address and size of the attached CXL device must be provided by the BIOS and/or the device through SRAT, CEDT, and/or DVSEC. In addition, the CXL memory range presented in the EFI memory map must be typed as soft reserved, not as usable. Details are described below.
For the CXL device to be detected and to function properly, the OS must be able to retrieve the base address and size of the CXL device from the SRAT (System Resource Affinity Table). If a CXL device is not detected or does not operate normally on your system, check whether the SRAT contains an entry with the CXL device information such as affinity, base address, and size.
The first step is to inspect the SRAT. The SRAT is one of the ACPI (Advanced Configuration and Power Interface) tables, so dump the ACPI tables from the system, extract the SRAT from the dump, and disassemble it into a human-readable form.
# /path/to/SMDK/src/test/system/extract_system_info.sh
# Install packages
$ sudo apt install acpica-tools
# Extract ACPI Tables
$ sudo acpidump -o acpidump.out
# Separate Dumped files by tables
$ acpixtract -a acpidump.out
# Change raw data's format to human-readable through parser
$ iasl -d srat.dat
# Find the result
$ ls srat.dsl
srat.dsl
You can now check the details in the srat.dsl file. It lists entries such as Processor Local Affinity and Memory Affinity, which are subtable types of the SRAT. In a system where the CXL device is initialized normally, the CXL memory range should be included as a Memory Affinity entry as follows. In the example below, the Base Address of the CXL memory region is 0x2380000000 and the Address Length is 0x2000000000, that is, 128GB. In addition, the Proximity Domain of the CXL memory area is 1. The OS uses this value to assign the NUMA node ID during kernel boot.
[78C0h 30912 1] Subtable Type : 01 [Memory Affinity]
[78C1h 30913 1] Length : 28
[78C2h 30914 4] Proximity Domain : 00000001
[78C6h 30918 2] Reserved1 : 0000
[78C8h 30920 8] Base Address : 0000002380000000
[78D0h 30928 8] Address Length : 0000002000000000
[78D8h 30936 4] Reserved2 : 00000000
[78DCh 30940 4] Flags (decoded below) : 00000001
Enabled : 1
Hot Pluggable : 0
Non-Volatile : 0
[78E0h 30944 8] Reserved3 : 0000000000000000
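As a quick sanity check, the inclusive end address and the size of the range can be derived from the Base Address and Address Length fields. The helper below is a minimal sketch; the function name `srat_range` is hypothetical and is not part of SMDK.

```shell
# srat_range BASE LENGTH
# Prints the inclusive address range and the size in GiB of a
# Memory Affinity entry. srat_range is an illustrative name only.
srat_range() {
    printf '0x%x-0x%x %dGiB\n' "$(( $1 ))" "$(( $1 + $2 - 1 ))" "$(( $2 >> 30 ))"
}

# Base Address and Address Length taken from the srat.dsl example.
srat_range 0x2380000000 0x2000000000
```

For the example entry this prints `0x2380000000-0x437fffffff 128GiB`, which is the same range the kernel reports for PXM 1 in the boot log.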
If there are multiple CXL memory devices, there would be multiple Memory Affinities in the SRAT table, and different values of the proximity domain will be assigned. If the CXL memory range is included as Memory Affinity, the SRAT Table is parsed and CXL memory is added to NUMA node during kernel booting as follows. You can check the following log using the $ dmesg command. In the example below, the CXL memory area with Proximity Domain (PXM) 1 is registered as NUMA Node 1.
$ dmesg
...
[ 0.012865] ACPI: SRAT: Node 0 PXM 0 [mem 0x00000000-0x7fffffff]
[ 0.012868] ACPI: SRAT: Node 0 PXM 0 [mem 0x100000000-0x107fffffff]
[ 0.012877] ACPI: SRAT: Node 1 PXM 1 [mem 0x2380000000-0x437fffffff]
...
If you cannot extract srat.dat file, it means your BIOS has not published the SRAT table to your OS. So the BIOS option to support SRAT table needs to be enabled. On the other hand, even though the srat.dat file is extracted, if there is no Memory Affinity for the CXL memory in srat.dsl file, there may be a need to update the BIOS to add the information to the SRAT table.
The other mechanisms the SMDK kernel uses to register a CXL device are the CEDT (CXL Early Discovery Table) and DVSEC (Designated Vendor-Specific Extended Capability). DVSEC is a structure defined in the CXL specification that describes the capabilities of the CXL device that the vendor supports. In particular, the PCIe DVSEC for CXL Device (DVSEC ID=0) contains the base address and size of the CXL device. The CEDT enables the OS to locate CXL Host Bridges and the Host Bridge registers during the boot process. Both CEDT and DVSEC contain the base address and size information. SMDK registers CXL devices as system memory using one of three sources: SRAT, CEDT, or DVSEC.
It is necessary to verify that the CXL memory range is registered as soft reserved in the EFI memory map. The memory map can be found in the kernel boot log; see the example below. The BIOS-e820 prefix indicates e820 memory map information received from the BIOS, and each line shows a memory range and its type.
$ dmesg
...
[ 0.000000] BIOS-e820: [mem 0x0000002380000000-0x000000437fffffff] soft reserved
...
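The lookup in the boot log can be scripted. The sketch below filters boot-log lines for the range that starts at a given address and prints its type; `e820_type` is a hypothetical helper name, and the sample line is the one from the example above.

```shell
# e820_type START_ADDR
# Reads boot-log lines on stdin and prints the e820 type of the range
# that starts at START_ADDR (hex digits without the 0x prefix).
# e820_type is an illustrative name, not an SMDK tool.
e820_type() {
    grep 'BIOS-e820' | grep -i "\[mem 0x0*$1-" | sed 's/.*\] //'
}

# Sample line from the example; on a live system: dmesg | e820_type 2380000000
printf '%s\n' '[ 0.000000] BIOS-e820: [mem 0x0000002380000000-0x000000437fffffff] soft reserved' |
    e820_type 2380000000
```

If the helper prints `usable` instead of `soft reserved` for the CXL range, the EFI_MEMORY_SP attribute is missing for that range.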
If the BIOS did not set the specific-purpose memory attribute (EFI_MEMORY_SP) for the range, the area is recognized as usable. To have it recognized as soft reserved, you can set the EFI_MEMORY_SP attribute by adding efi_fake_mem to the kernel command line (e.g., efi_fake_mem=<size>@<start address>:<memory attribute>). This kernel parameter sets a memory attribute for a specific memory range. During system boot, you can edit the kernel command line by pressing 'e' on the GRUB kernel selection screen. Please refer to the Installation Guide for an example of the boot screen.
Below is an example of the efi_fake_mem setting to add to the kernel command line when the CXL memory region is recognized as usable in the BIOS memory map. In the example, the base address is 0x2380000000, the size of the CXL memory area is 128GB, and the memory attribute to be added is 0x40000 (=EFI_MEMORY_SP).
efi_fake_mem=128G@0x2380000000:0x40000
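The parameter string can be assembled from the SRAT values instead of being typed by hand. This is a minimal sketch under two assumptions: the helper name `efi_fake_mem_param` is hypothetical, and the region length is a whole number of GiB.

```shell
# efi_fake_mem_param BASE LENGTH
# Builds the efi_fake_mem kernel parameter for a CXL range.
# Assumes LENGTH is a multiple of 1 GiB; the helper name is illustrative.
efi_fake_mem_param() {
    # 0x40000 is the EFI_MEMORY_SP attribute value given in the text.
    printf 'efi_fake_mem=%dG@0x%x:0x40000\n' "$(( $2 >> 30 ))" "$(( $1 ))"
}

# Base Address and Address Length from the SRAT example.
efi_fake_mem_param 0x2380000000 0x2000000000
```

With the example values the helper prints the same string as shown above.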
After adding the efi_fake_mem parameter and rebooting your system, check the e820 memory map for the CXL memory region in the boot log again. If the CXL memory region is recognized as soft reserved, the CXL/kmem extension driver of SMDK will register it as a movable memory node.
Once the SMDK kernel is booted, the CXL memory channel(s) in the system are registered as movable memory nodes by default. Later, a system administrator can change the grouping policy through the CXL-CLI or the sysfs interface with root permission.
# cd /path/to/SMDK/lib/cxl_cli/build/cxl/
# ./cxl <Options>
SMDK supports two grouping policies: noop and node. You can change SMDK memory partition with CXL-CLI. Please refer to CXL-CLI Guide section for more details.
# ./cxl create-region -V -G node ("noop" or "node")
By default, the grouping policy of the SMDK Memory Partition is noop (each CXL memory device is represented as an independent node). See the table below for details of each policy.
Value | Desc. | Example: 3ch of CXL devices @Socket 0 |
---|---|---|
node | Node Partition: Represent CXL memories as independent nodes. CXL Memories : Node = N : 1 | node 0 : CPU #1 + DDR Memory #1; node 1 : CXL #1, #2, #3 |
noop | Single Node: Represent a CXL Memory as an independent node. CXL Memories : Node = 1 : 1 | node 0 : CPU #1 + DDR Memory #1; node 1 : CXL #1; node 2 : CXL #2; node 3 : CXL #3 |
online/offline
- Offline: CXL memory is not recognized as a system RAM but as a soft reserved area.
Node 0, zone DMA 1 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 3 8 3 4 4 5 3 5 4 4 436
Node 0, zone Normal 1897 128 43 108 119 76 37 13 5 1 0 45128
- Online: CXL is mapped to an independent memory node.
Node 0, zone DMA 1 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 3 8 3 4 4 5 3 5 4 4 436
Node 0, zone Normal 2238 196 75 40 17 31 29 10 5 2 43595
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
SMDK Memory Partition
- noop: All CXL memory devices are added as different nodes from normal DDR memory. Every single CXL device becomes a separate node.
Node 0, zone DMA 1 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 3 8 3 4 4 5 3 5 4 4 436
Node 0, zone Normal 224 705 660 443 166 103 50 31 16 15 44047
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 3, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
- node: All CXL memory devices are grouped by the installed socket, and devices of each socket are added as separate nodes.
Node 0, zone DMA 1 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 3 8 3 4 4 5 3 5 4 4 436
Node 0, zone Normal 33615 309 94 63 38 3 4 1 2 2 43597
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 98304
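Each column of /proc/buddyinfo is the number of free blocks of order 0, 1, 2, and so on, so the free memory of a zone is the sum of count × 2^order pages. The sketch below converts buddyinfo lines into MiB per node and zone. It assumes 4 KiB pages, and `buddy_free_mib` is a hypothetical helper name.

```shell
# buddy_free_mib
# Reads /proc/buddyinfo-style lines on stdin and prints
# "<node> <zone> <free MiB>" for each line. Assumes 4 KiB pages;
# the helper name is illustrative.
buddy_free_mib() {
    awk '$1 == "Node" {
        pages = 0
        # Fields 5..NF hold the free block counts for order 0, 1, 2, ...
        for (i = 5; i <= NF; i++)
            pages += $i * 2 ^ (i - 5)
        sub(",", "", $2)
        printf "%s %s %d\n", $2, $4, pages * 4 / 1024
    }'
}

# Movable-zone lines from the noop and node examples.
buddy_free_mib <<'EOF'
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 98304
EOF
```

The first line evaluates to 131072 MiB (one 128 GiB device under noop), and the second to 393216 MiB (three devices grouped into one node under the node policy). On a live system, run `buddy_free_mib < /proc/buddyinfo`.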
If the memory of the CXL device is in use, the change operation of online/offline and memory partition is canceled. You can check the result of the memory partition change using the command below.
# ./cxl list -V <--list_node | --list_dev>
In addition to noop and node options of CXL-CLI, you can freely configure CXL node partitions through create-region -V(--soft_interleaving) commands. Assuming that noop partition policy has been applied as shown in the example right above, you can combine Node 2 and Node 3 into one through the command below.
# ./cxl create-region -V --target_node 2 --ways 1 cxl2
# cat /proc/buddyinfo
Node 0, zone DMA 1 0 0 1 2 1 1 0 1 1 3
Node 0, zone DMA32 3 8 3 4 4 5 3 5 4 4 436
Node 0, zone Normal 2907 1686 828 4416 2038 913 407 192 106 58 13430
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 65536
# ./cxl list -V --list_node
[
{
"node_id" : -1,
"devices" : [ ]
}
{
"node_id" : 0,
"devices" : [ ]
}
{
"node_id" : 1,
"devices" : [ "cxl0" ]
}
{
"node_id" : 2,
"devices" : [ "cxl1" "cxl2" ] # Node 2 consists of cxl dev 1 and 2.
}
]
Please check CXL-CLI Guide for more details.
To use CXL RAM regions as System RAM, the regions should be mapped as a MOVABLE node. SMDK provides an extended KMEM DAX driver that adds CXL memory regions marked "Soft Reserved" by platform firmware to the core kernel memory services as a movable node. The SMDK kernel therefore recognizes the CXL memory range as a movable node at boot by default.
To check if the DAX device has successfully bound the CXL memory range, check /proc/iomem. Below is an example of a system equipped with a single channel of CXL memory expander device.
$ sudo cat /proc/iomem
880000000-287fffffff : Soft Reserved
880000000-287fffffff : dax0.0
880000000-287fffffff : System RAM (kmem)
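The /proc/iomem check can be scripted as a heuristic: if a "System RAM (kmem)" entry with the same range follows the DAX device entry, the range is in use as System RAM; if the DAX device entry appears alone, it is treated as devdax. This is a sketch only, and `dax_mode` is a hypothetical helper name.

```shell
# dax_mode DEVICE
# Reads /proc/iomem-style lines on stdin and prints "system-ram",
# "devdax", or "absent" for the given DAX device name.
# Heuristic sketch; dax_mode is an illustrative name.
dax_mode() {
    awk -v dev="$1" '
        $NF == dev { range = $1; found = 1; next }
        found && $1 == range && /System RAM \(kmem\)/ { kmem = 1 }
        END {
            if (!found) print "absent"
            else if (kmem) print "system-ram"
            else print "devdax"
        }'
}

# Lines from the example; on a live system: sudo cat /proc/iomem | dax_mode dax0.0
dax_mode dax0.0 <<'EOF'
880000000-287fffffff : Soft Reserved
880000000-287fffffff : dax0.0
880000000-287fffffff : System RAM (kmem)
EOF
```

Reading the addresses in /proc/iomem requires root; without it the ranges are shown as zeros, and the comparison is still made on whatever range string is printed.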
If you want to bind it to the device DAX driver, take the CXL device offline using the sysfs interface and reconfigure the DAX device to devdax mode using the daxctl CLI tool.
$ echo -1 | sudo tee /sys/kernel/cxl/devices/cxl0/node_id
$ cd /path/to/smdk/lib/cxl_cli/build/daxctl/
$ sudo ./daxctl reconfigure-device --mode=devdax dax0.0
$ sudo cat /proc/iomem
880000000-287fffffff : Soft Reserved
880000000-287fffffff : dax0.0
Now, this DAX device can be used through fio benchmark, etc. For more information, refer to Test section below.
If you want to unbind the CXL memory range from the DEVICE DAX device and register it as the MOVABLE node again, execute the command below.
$ cd /path/to/smdk/lib/cxl_cli/build/daxctl/
$ sudo ./daxctl reconfigure-device --mode=system-ram dax0.0
CXL Swap is another memory interface for userspace applications. It allows a CXL device to function as a swap interface and, unlike zswap, it avoids the (de)compression overhead and latency fluctuations caused by spending host CPU cycles while swapping pages out and in. When swapping takes place, CXL Swap intervenes in the Linux swap path before disk I/O is issued, and stores and retrieves the swap pages in a ZONE_MOVABLE memory pool that expands and shrinks dynamically.
In terms of deployment, CXL Swap is a built-in kernel module, so you don't need to insert a separate module; just turn on CONFIG_CXLSWAP in $ make menuconfig. After the system boots, you can enable the CXL Swap feature as shown below.
echo 1 > /sys/module/cxlswap/parameters/enabled
Other parameters can be found in the following Configurations.
Please note that it is recommended to use zswap and CXL Swap exclusively, that is, one or the other, because the two modules target different trade-offs (CPU usage versus memory density).
Note: The following configurations are located in /sys/module/cxlswap/parameters/ and can be modified by writing values in the corresponding files or using CXL-CLI. Root privileges are required to change the settings.
Config. | Desc. | Default | Note |
---|---|---|---|
accept_threshold_percent | The threshold at which cxlswap would start accepting pages again after it became full. | 90 | |
cxlpool | The memory pool for cxlswap that grows on demand and shrinks as pages are freed. | cxlbud | |
enabled | Enable or disable cxlswap at runtime. | N | |
flush (experimental) | Flush all pages in cxlpool. CXL Swap should be disabled before executing flush. | N/A | |
max_pool_percent | The maximum percentage of memory that the cxlpool can occupy. | 20 | |
same_filled_pages_enabled | Identify same-value filled pages (i.e. contents of the page have same value or repetitive pattern) during store operation, and if true, the length of the page is set to zero and the pattern or same-filled value is stored. | Y | |
non_same_filled_pages_enabled | If the attribute is disabled, the handling of non-same-value pages by cxlswap is disabled. | Y |
# echo 1 | sudo tee /sys/module/cxlswap/parameters/enabled
# cat /sys/module/cxlswap/parameters/enabled
Y
# echo 0 | sudo tee /sys/module/cxlswap/parameters/enabled
# cat /sys/module/cxlswap/parameters/enabled
N
# echo 1 | sudo tee /sys/module/cxlswap/parameters/flush
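To review all settings at once, the parameter files can be read in a loop. The sketch below prints `name=value` for every regular file in a parameters directory; `show_params` is a hypothetical helper name, and a file that cannot be read (for example a write-only trigger) is printed with an empty value.

```shell
# show_params DIR
# Prints name=value for each readable regular file in DIR.
# Illustrative helper; not part of SMDK or CXL-CLI.
show_params() {
    for f in "$1"/*; do
        [ -f "$f" ] && [ -r "$f" ] || continue
        printf '%s=%s\n' "${f##*/}" "$(cat "$f" 2>/dev/null)"
    done
}

# Prints nothing if the directory does not exist on this system.
show_params /sys/module/cxlswap/parameters
```

The same helper can be pointed at any other module's parameters directory under /sys/module.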
CXL Cache is one of the memory interfaces provided by the SMDK kernel. It allows CXL devices to be used as a second-level page cache in the OS. CXL Cache stores page cache pages that are selected as victims by the Page Frame Reclaim Algorithm (PFRA). The pages stored in the ZONE_MOVABLE memory pool are returned to the page cache when a file read occurs, reducing the number of disk reads.
CXL Cache is a built-in kernel module, so you need to turn on CONFIG_CXLCACHE when you build the SMDK kernel. After booting, you can enable CXL Cache with the command below.
echo 1 > /sys/module/cxlcache/parameters/enabled
Other parameters can be found in the following Configurations.
Note: The following configurations are located in /sys/module/cxlcache/parameters/ and can be modified by writing values in the following files or using CXL-CLI. Root privileges are required to change the settings.
Config. | Desc. | Default | Note |
---|---|---|---|
accept_threshold_percent | The threshold at which cxlcache would start accepting pages again after it became full. | 90 | |
cxlpool | The memory pool for cxlcache that grows on demand and shrinks as pages are freed. | cxlbud | |
enabled | Enable or disable cxlcache at runtime. | N | |
flush (experimental) | Flush all pages in cxlpool. CXL Cache should be disabled before executing flush. | N/A | |
max_pool_percent | The maximum percentage of memory that the cxlpool can occupy. | 20 |
# echo 1 | sudo tee /sys/module/cxlcache/parameters/enabled
# cat /sys/module/cxlcache/parameters/enabled
Y
# echo 0 | sudo tee /sys/module/cxlcache/parameters/enabled
# cat /sys/module/cxlcache/parameters/enabled
N
# echo 1 | sudo tee /sys/module/cxlcache/parameters/flush
If CONFIG_DEBUG_FS is turned on in $ make menuconfig, CXL Cache can be monitored via debugfs in the /sys/kernel/debug/cxlcache directory. The effectiveness of cxlcache can be measured (across all filesystems) with:
Metrics. | Desc. | Note |
---|---|---|
evicted_pages | The number of pages evicted because CXL Cache is full. | |
pool_limit_hit | The number of times CXL Cache reached its maximum size set by module parameter. | |
pool_total_size | The size of pages currently stored in CXL Cache. | |
put_pages | The number of pages currently stored in CXL Cache. | |
reject_alloc_fail | The number of page put failures due to allocation failure from CXL pool. | |
reject_kmemcache_fail | The number of page put failures due to Slab memory allocation failure. | |
reject_reclaim_fail | The number of CXL Cache eviction failures using work queue. |
Below is a set of test cases and examples to verify the operation of the SMDK kernel.
You can build the binaries required for the tests by running the make command at /path/to/SMDK/src/test once.
$ cd /path/to/SMDK/src/test
$ sudo make
This test case checks whether UEFI BIOS properly provides the CXL device related information to the kernel.
The test case checks the following:
- SRAT table contains the Memory Affinity information of the CXL memory.
- CXL memory range is included in the EFI memory map on dmesg, and it is recognized as soft reserved.
- The CXL memory range of /proc/iomem is recognized as system RAM.
- The CXL memory is recognized as Movable node in /proc/buddyinfo.
Command lines
$ cd /path/to/SMDK/src/test/system
$ ./extract_system_info.sh <CXL memory start address>
(Example) $ ./extract_system_info.sh 2080000000
Result
1. SRAT table:
[7830h 30768 1] Subtable Type : 01 [Memory Affinity]
[7831h 30769 1] Length : 28
[7832h 30770 4] Proximity Domain : 00000002
[7836h 30774 2] Reserved1 : 0000
[7838h 30776 8] Base Address : 0000002080000000
[7840h 30784 8] Address Length : 0000004000000000
[7848h 30792 4] Reserved2 : 00000000
[784Ch 30796 4] Flags (decoded below) : 00000001
Enabled : 1
Hot Pluggable : 0
Non-Volatile : 0
[7850h 30800 8] Reserved3 : 0000000000000000
2. e820 memory map:
[ 0.000000] BIOS-e820: [mem 0x0000000000000000-0x000000000009efff] usable
[ 0.000000] BIOS-e820: [mem 0x000000000009f000-0x00000000000fffff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000000100000-0x000000007356bfff] usable
[ 0.000000] BIOS-e820: [mem 0x000000007356c000-0x0000000073f16fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000073f17000-0x00000000772b2fff] ACPI NVS
[ 0.000000] BIOS-e820: [mem 0x00000000772b3000-0x00000000777fefff] ACPI data
[ 0.000000] BIOS-e820: [mem 0x00000000777ff000-0x00000000777fffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000000077800000-0x000000008fffffff] reserved
[ 0.000000] BIOS-e820: [mem 0x00000000fe010000-0x00000000fe010fff] reserved
[ 0.000000] BIOS-e820: [mem 0x0000000100000000-0x000000207fffffff] usable
[ 0.000000] BIOS-e820: [mem 0x0000002080000000-0x000000607fffffff] soft reserved
3. /proc/iomem:
2080000000-607fffffff : System RAM (kmem)
4. /proc/buddyinfo:
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 1 2
Node 0, zone DMA32 8 7 5 5 6 4 5 5 6 3 436
Node 0, zone Normal 324 74 239 501 369 136 15 3 16 8 14318
Node 1, zone Normal 131 867 2164 1348 976 500 165 65 17 1 15904
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 65536
The SMDK kernel includes the Subzone architecture for efficient memory management, and this test script verifies the operation of the subzone function. For a detailed description of the subzone architecture, refer to the Memory Partition of SMDK Architecture.
In this test, a single thread issues memory allocation requests of size 1KiB, 4KiB, 128KiB, or 4MiB, totaling 4GiB per thread. Then 10 threads allocate memory in the same way.
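The iteration count of each test case follows from dividing the 4GiB total by the request size, which is where names such as test_malloc_1K_bytes_4M_times come from. The arithmetic can be sketched as follows; `iterations_for` is a hypothetical helper and is not taken from the test script.

```shell
# iterations_for SIZE_BYTES
# Number of allocations of SIZE_BYTES needed to reach 4 GiB.
# Illustrative helper; not part of run_4GB_malloc_test.sh.
iterations_for() {
    echo "$(( (4 << 30) / $1 ))"
}

for size in 1024 4096 131072 4194304; do
    printf '%d bytes x %d times\n' "$size" "$(iterations_for "$size")"
done
```

This gives 4194304 (4M) iterations for 1KiB, 1048576 (1M) for 4KiB, 32768 for 128KiB, and 1024 for 4MiB.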
Command lines
$ cd /path/to/SMDK/src/test/subzone
$ ./run_4GB_malloc_test.sh
Result
Single Thread Testcases
TC test_malloc_1K_bytes_4M_times starts
cxl: set node, size: 1.0K bytes, iteration: 4.0M times, cxl region: cmd_create_region: created 1 region
elapsed time: ......
cxl: set noop, size: 1.0K bytes, iteration: 4.0M times, cxl region: cmd_create_region: created 1 region
elapsed time: ......
......
TC test_malloc_1K_bytes_4M_times done
TC test_malloc_4K_bytes_1M_times starts
......
TC test_malloc_4GB_10_threads_4M_unit done
This script is similar to the one above (run_4GB_malloc_test.sh), but the requested memory size is random, i.e., the allocation size changes on every request. The total amount of memory requested is 4GiB.
Command lines
$ cd /path/to/SMDK/src/test/subzone
$ ./run_random_malloc_test.sh
Result
Allocation size: 4294990785
This is a test case to check whether the online/offline change and node id change of the CXL device work normally.
Command lines
$ cd /path/to/SMDK/src/test/driver
$ ./run_functional_test.sh
Result
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 7127 8050 7715 7933 4266 1563 246 16 393 99 453
Node 1, zone Movable 19 15 17 16 19 14 14 11 12 12 32755
Node 2, zone Movable 13 17 12 8 13 12 11 11 10 10 32757
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online
[OFFLINE TEST]
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 39 4620 7788 7729 4254 1589 254 18 393 99 453
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: -1
socket_id: 0
state: offline
PASS
[ONLINE TEST]
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 344 3445 6659 7927 4265 1565 247 16 394 99 452
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online
PASS
[NODE CHANGE TEST]
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 58 3564 6563 7919 4280 1565 248 17 394 99 452
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 65536
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 2
socket_id: 0
state: online
PASS
[KOBJECT RELEASE TEST]
kobject is released
PASS
[SYMLINK CHECK]
memdev: /sys/devices/pci0000:3d/0000:3d:02.0/0000:3e:00.0/mem0
PASS
This is a test case to check whether the state of the device remains unchanged when attempting to change to offline/online in the case of a CXL device that is in use or bound to DAX.
Command lines
$ cd /path/to/SMDK/src/test/driver
$ ./run_rollback_test.sh
Result
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 799 3530 6456 7984 4348 1587 253 21 402 99 447
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online
[online rollback test]
addr[0x7fea88c1b000]
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 1299 4184 6384 7893 4382 1612 280 47 394 98 442
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: 1
socket_id: 0
state: online
PASS
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 2431 3929 6618 8005 4369 1601 268 29 403 113 950
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: -1
socket_id: 0
state: offline
[offline rollback test]
./run_rollback_test.sh: line 98: echo: write error: Invalid argument
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 7 5 6 5 4 7 5 4 6 5 437
Node 0, zone Normal 1865 3845 6406 7796 4254 1558 253 22 394 99 838
Node 2, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
[[Device Info]]
start_address: 0x480000000
size: 0x2000000000
node_id: -1
socket_id: 0
state: offline
./run_rollback_test.sh: line 108: echo: write error: Invalid argument
PASS
This test checks that CXL Swap works well in various swap out/in scenarios and verifies the functionality of CXL Swap flush.
The detailed description and prerequisites for each test are written in each script file. Please read the comments in the script first, before running test.
Test basic swap out/in data to/from CXL Swap. The data before swap out and the data after swap in must be the same.
Command lines
$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_storeload_test.sh
Result
Store Load Test Start
=======Test Info======
Process ID : 2034 / CXL Swap Enabled : Y
Test Name : store_load
Total Memory Size 512.00M / Memory Limit to 460.80M
======= RESULT =======
CXL Swap Stored Pages Before Swap : 81380
CXL Swap Stored Pages After Swap : 104871
====== PASS ======
...
Test swap out/in of data to/from CXL Swap from multiple threads. Regardless of the thread, the data before swap out and the data after swap in must be the same.
Command lines
$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_multithread_test.sh
Result
Multi Thread Test Start
=======Test Info======
Process ID : 2072 / CXL Swap Enabled : Y
Test Name : multi_thread
Total Memory Size 1.00G / Memory Limit to 921.60M
======= RESULT =======
Elapsed Time 0.688225 using 10 threads
CXL Swap Stored Pages Before Swap : 81384
CXL Swap Stored Pages After Swap : 104861
====== PASS ======
...
Test swap out/in shared data to/from CXL Swap. The data before swap out and the data after swap in must be the same even using shared memory.
Command lines
$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_sharedmemory_test.sh
Result
Shared Memory Test Start
=======Test Info======
Process ID : 1980 / CXL Swap Enabled : Y
Test Name : shared_memory
Total Memory Size 512.00M / Memory Limit to 460.80M
Process 1980 Initialize Data [Shmid 0]...
Process 1992 Check Initialized Data [Shmid 0]...
Process 1992 Check Initialized Data [Shmid 0] Pass
Process 1992 Modify Data [Shmid 0]...
Process 1980 Check Modified Data [Shmid 0]...
Process 1980 Check Modified Data [Shmid 0] Pass
======= RESULT =======
CXL Swap Stored Pages Before Swap : 1
CXL Swap Stored Pages After Swap : 13801
====== PASS ======
...
Test CXL Swap Flush functionality. Note that even after a flush, a few pages may remain in CXL Swap. See the description in the script.
Command lines
$ cd /path/to/SMDK/src/test/cxlswap/
$ ./run_cxlswap_flush_test.sh
Result
Flush Test Start
Before Flush : 81373
After Flush : 5
Flush Test Finish
This test checks that CXL Cache works well in various page put/get scenarios and verifies the functionality of CXL Cache flush.
The detailed description and prerequisites for each test are written in each script file. Please read comments in the script first, before running the tests.
Test page put/get to/from CXL Cache when using CXL page as page cache. The data before put and the data after get must be the same.
Command lines
$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_put_cxl_page_test.sh
Result
Put CXL Page Test Start
======Test Info======
Process ID : 10076 / CXL Cache Enabled : Y
Test Name : put_cxl_page
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 256.24M
CXL Cache Succ Get Pages After Caching : 256.41M
====== PASS ======
======Test Info======
Process ID : 10088 / CXL Cache Enabled : Y
Test Name : put_cxl_page
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.24M
CXL Cache Succ Get Pages After Caching : 512.41M
====== PASS ======
Put CXL Test Succ.
Test basic page put/get to/from CXL Cache. The data before put and the data after get must be the same.
Command lines
$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_put_get_correctness_test.sh
Result
Put Get Correctness Test Start
======Test Info======
Process ID : 59846 / CXL Cache Enabled : Y
Test Name : put_get_correctness
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 256.00M
CXL Cache Succ Get Pages After Caching : 256.00M
====== PASS ======
======Test Info======
Process ID : 59857 / CXL Cache Enabled : Y
Test Name : put_get_correctness
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.00M
CXL Cache Succ Get Pages After Caching : 512.00M
====== PASS ======
======Test Info======
Process ID : 59892 / CXL Cache Enabled : Y
Test Name : put_get_correctness
Test File Size 1.00G
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.00G
CXL Cache Succ Get Pages After Caching : 1.00G
====== PASS ======
Put Get Correctness Test Succ.
Test page put/get to/from CXL Cache when a file that already exists in CXL Cache is modified. The data before put and the data after get must be the same.
Command lines
$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_modify_put_get_correctness_test.sh
Result
Modify Put Get Correctness Test Start
======Test Info======
Process ID : 59963 / CXL Cache Enabled : Y
Test Name : modify_put_get_correctness
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.01M
CXL Cache Succ Get Pages After Caching : 512.01M
====== PASS ======
======Test Info======
Process ID : 59985 / CXL Cache Enabled : Y
Test Name : modify_put_get_correctness
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.00G
CXL Cache Succ Get Pages After Caching : 1.00G
====== PASS ======
======Test Info======
Process ID : 60028 / CXL Cache Enabled : Y
Test Name : modify_put_get_correctness
Test File Size 1.00G
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 2.00G
CXL Cache Succ Get Pages After Caching : 2.00G
====== PASS ======
Modify Put Get Correctness Test Succ.
This test case checks data integrity while many put/get operations are issued simultaneously from multiple threads.
Command lines
$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_multithread_test.sh
Result
Multi Thread Test Start
======Test Info======
Process ID : 60104 / CXL Cache Enabled : Y
Test Name : multi_thread
Test File Size 128.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.75G
CXL Cache Succ Get Pages After Caching : 1.17G
====== PASS ======
======Test Info======
Process ID : 60172 / CXL Cache Enabled : Y
Test Name : multi_thread
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 4.74G
CXL Cache Succ Get Pages After Caching : 2.47G
====== PASS ======
Multi Thread Test Succ.
Test shared file page put/get to/from CXL Cache from multiple processes. Even when the file is shared, the data before put and the data after get must be the same.
Command lines
$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_multiprocess_test.sh
Result
Multi Process Test Start
======Test Info======
Process ID : 60366 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 256.00M
Test File Size Limit is 2.00G
======Test Info======
Process ID : 60366 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 256.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 512.01M
CXL Cache Succ Get Pages After Caching : 512.01M
====== PASS ======
======Test Info======
Process ID : 60379 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 512.00M
Test File Size Limit is 2.00G
======Test Info======
Process ID : 60379 / CXL Cache Enabled : Y
Test Name : multi_process
Test File Size 512.00M
Test File Size Limit is 2.00G
====== RESULT ======
CXL Cache Succ Put Pages After Caching : 1.00G
CXL Cache Succ Get Pages After Caching : 1.00G
====== PASS ======
Multi Process Test Succ.
Test CXL Cache Flush functionality. Note that this test fails if no data remains in the CXL Cache.
Command lines
$ cd /path/to/SMDK/src/test/cxlcache/
$ ./run_cxlcache_flush_test.sh
Result
Flush Test Start
Before Flush : 82450
After Flush : 0
Flush Test Finish
This test checks that registered CXL devices work well as DAX devices. The script releases the CXL device memory area from system memory, binds it to the DAX device, and uses fio to check that it operates as a DAX device. The number of devices and the device addresses in the script should be modified to match your system.
Note: In order to run the script below, you need to install fio in your system first. Please refer to fio GitHub for information related to the installation and usage of it.
Command lines
$ cd /path/to/SMDK/src/test/dax
$ vi ./run_dax_test.sh
# Change the number of devices and the device addresses according to your system.
NUM_DEVICE=3
ADDRESS=("1080000000-307fffffff" "3080000000-507fffffff" "5080000000-707fffffff")
# If you are not sure about it, leave NUM_DEVICE as 0 to detect automatically
NUM_DEVICE=0
ADDRESS=()
# Download fio from https://github.com/axboe/fio.git
# Change FIO_PATH from /path/to to your system's path
FIO_PATH=/path/to/fio/
# After modifying the script
$ ./run_dax_test.sh
Result
IOMEM
480000000-f43fffffff : CXL Window 0
480000000-247fffffff : region0
480000000-247fffffff : Soft Reserved
480000000-247fffffff : dax0.0
480000000-247fffffff : System RAM (kmem)
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 5 3 6 6 4 7 5 5 7 4 437
Node 0, zone Normal 38 250 183 105 71 28 15 31 39 12 2218
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
-----------------------------------------------------------------
[
{
"chardev":"dax0.0",
"size":137438953472,
"target_node":1,
"align":2097152,
"mode":"devdax"
}
]
reconfigured 1 device
IOMEM
480000000-f43fffffff : CXL Window 0
480000000-247fffffff : region0
480000000-247fffffff : Soft Reserved
480000000-247fffffff : dax0.0
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 5 3 6 6 4 7 5 5 7 4 437
Node 0, zone Normal 24 20 13 19 17 12 1 2 20 10 2608
-----------------------------------------------------------------
FIO TEST
dev-dax-write: (g=0): rw=randwrite, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=dev-dax, iodepth=1
...
dev-dax-read: (g=1): rw=randread, bs=(R) 2048KiB-2048KiB, (W) 2048KiB-2048KiB, (T) 2048KiB-2048KiB, ioengine=dev-dax, iodepth=1
[
{
"chardev":"dax0.0",
"size":137438953472,
"target_node":1,
"align":2097152,
"mode":"system-ram",
"online_memblocks":1024,
"total_memblocks":1024,
"movable":true
}
]
reconfigured 1 device
IOMEM
480000000-f43fffffff : CXL Window 0
480000000-247fffffff : region0
480000000-247fffffff : Soft Reserved
480000000-247fffffff : dax0.0
480000000-247fffffff : System RAM (kmem)
[[Buddy Info]]
Node 0, zone DMA 0 0 0 0 0 0 0 0 1 2 2
Node 0, zone DMA32 5 3 6 6 4 7 5 5 7 4 437
Node 0, zone Normal 408 1223 1358 792 287 482 278 41 23 10 2211
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 32768
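When filling in ADDRESS by hand, it helps to confirm that each range has the size you expect. The sketch below is an illustration only; `range_bytes` and `range_gib` are hypothetical helper names. It treats the end address as inclusive, as /proc/iomem does.

```shell
#!/usr/bin/env bash
# Sketch: size of a "start-end" hex range as written in ADDRESS or shown
# in /proc/iomem. The end address is inclusive, hence the "+ 1".
range_bytes() {
    local start=${1%%-*} end=${1##*-}
    echo $(( 0x$end - 0x$start + 1 ))
}

range_gib() {
    echo $(( $(range_bytes "$1") >> 30 ))
}

for r in "1080000000-307fffffff" "480000000-247fffffff"; do
    echo "$r : $(range_bytes "$r") bytes, $(range_gib "$r") GiB"
done
```

Both ranges come out at 137438953472 bytes (128 GiB), which matches the "size" field reported for dax0.0 in the result above.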
You can use QEMU to emulate a CXL system (4.2.6.1) and SMDK functionality (4.2.6.2).
First, build the QEMU.
$ cd /path/to/SMDK/lib/
$ ./build_lib.sh qemu
After downloading an Ubuntu ISO image file, update the ISO file path, UBUNTU_ISO, in /path/to/SMDK/lib/qemu/create_gui_image.sh, then run the script.
$ cd /path/to/SMDK/lib/qemu/
$ vi create_gui_image.sh # Update UBUNTU_ISO file path.
$ ./create_gui_image.sh
When the Ubuntu installation is finished, run the following command to boot to Ubuntu.
$ cd /path/to/SMDK/lib/qemu/
$ ./setup_gui_ssh.sh
After booting, update the APT repository if necessary and install the required packages, e.g., with $ sudo apt update and $ sudo apt install <packages e.g., openssh-server>.
With the SMDK repository cloned from GitHub, build and install the SMDK kernel. You can then emulate the SMDK kernel with the following script.
$ cd /path/to/SMDK/lib/qemu/
$ ./run_cxl_emu_gui.sh # default setting: 6 cores, 8GB RAM. (KVM hardware acceleration is not enabled)
Note: run_cxl_emu_gui.sh disables KVM (Kernel-based Virtual Machine) to support load/store to CXL emulation memory. If you don't need this feature, you can add the '-enable-kvm' option to speed up QEMU emulation.
You can connect to the QEMU virtual machine through the QEMU monitor (port: 45454) and sshd (port: 2242) with the scripts below.
# Connect to QEMU Monitor
$ cd /path/to/SMDK/lib/qemu/
$ ./connect_monitor.sh
# Connect to sshd
$ cd /path/to/SMDK/lib/qemu/
$ ./connect_ssh.sh
QEMU has supported CXL type3 volatile-memory emulation since v8.1.0. SMDK has supported QEMU emulation since v1.5.1, which allows using userspace plugins (library, cli, BM, testcases) and OS interfaces (swap and cache).
Step 1: Configure & Build the kernel
The build procedure of the SMDK kernel for CXL emulation is identical to the regular SMDK kernel build procedure. However, region creation will fail because QEMU does not support invalidation for memregion. To avoid the failure, 'CXL_REGION_INVALIDATION_TEST' must be enabled (CONFIG_CXL_REGION_INVALIDATION_TEST=y) during kernel configuration.
Once you have booted the SMDK kernel on QEMU, you can use the test script below to check whether CXL emulation is working properly. If the test succeeds, CXL memory will be registered as System RAM and available for use.
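Before booting, you may want to confirm that the option really ended up in the kernel configuration. This is a minimal sketch under assumptions: `config_enabled` is a hypothetical helper, and the /boot/config-$(uname -r) location is a common distribution convention that may not apply to your build (the .config file in the kernel build tree works the same way).

```shell
#!/usr/bin/env bash
# Sketch: succeed when a kernel config file contains the line SYMBOL=y.
# Usage: config_enabled <config-file> <SYMBOL>
config_enabled() {
    grep -qx "$2=y" "$1"
}

# Hypothetical usage; the config path depends on your build or distribution.
cfg="/boot/config-$(uname -r)"
if [ -r "$cfg" ]; then
    if config_enabled "$cfg" CONFIG_CXL_REGION_INVALIDATION_TEST; then
        echo "CONFIG_CXL_REGION_INVALIDATION_TEST is enabled"
    else
        echo "CONFIG_CXL_REGION_INVALIDATION_TEST is not enabled in $cfg"
    fi
fi
```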
$ cd /path/to/SMDK/src/test/qemu/
$ sudo ./run_qemu_test.sh
Alternatively, you can perform the procedure below manually, step by step.
Step 2: Create CXL region
$ cat /proc/buddyinfo
Node 0, zone DMA 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA32 7 5 5 7 6 5 8 4 5 3 475
Node 0, zone Normal 218 135 85 37 11 32 3 2 1 2 1152
$ sudo cat /proc/iomem
890000000-c8fffffff : CXL Window 0
$ cd /path/to/SMDK/lib/
$ ./build_lib.sh cxl_cli
# 16GB CXL memory region
$ sudo ./cxl_cli/build/cxl/cxl create-region -d decoder0.0 -s $((16*1024*1024*1024)) -t ram
{
"region":"region0",
"resource":"0x890000000",
"size":"16.00 GiB (17.18 GB)",
"type":"ram",
"interleave_ways":1,
"interleave_granularity":256,
"decode_state":"commit",
"mappings":[
{
"position":0,
"memdev":"mem0",
"decoder":"decoder2.0"
}
],
}
cxl region: cmd_create_region: created 1 region
$ cat /proc/buddyinfo
Node 0, zone DMA 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA32 2 2 2 4 4 1 1 2 3 4 487
Node 0, zone Normal 1678 1189 630 235 137 62 55 14 18 11 1045
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 4096
$ sudo cat /proc/iomem
...
890000000-c8fffffff : CXL Window 0
890000000-c8fffffff : region0
890000000-c8fffffff : dax0.0
890000000-c8fffffff : System RAM (kmem)
...
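To confirm that the new node exposes the full region, the /proc/buddyinfo line for Node 1 can be converted into a free-memory figure. The sketch below is illustrative; `buddy_free_mib` is a hypothetical helper and it assumes a 4 KiB page size, so that the k-th count column (k = 0..10) represents free blocks of 2^k pages.

```shell
#!/usr/bin/env bash
# Sketch: free memory in MiB for each /proc/buddyinfo line read on stdin.
# Fields 1-4 are "Node N, zone NAME"; field 5 onward are per-order counts.
# Assumption: 4 KiB pages.
buddy_free_mib() {
    awk -v page_kib=4 '{
        kib = 0
        for (i = 5; i <= NF; i++)
            kib += $i * (2 ^ (i - 5)) * page_kib
        printf "%d\n", kib / 1024
    }'
}

# Node 1 line from the output above: 4096 blocks of 4 MiB each.
echo "Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 4096" | buddy_free_mib   # prints 16384
```

16384 MiB is 16 GiB, the size passed to create-region. On a live system, something like grep 'Node 1' /proc/buddyinfo | buddy_free_mib would give the current value.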
When step 2 completes normally, the SMDK plugins and interfaces are ready to use. Please refer to the separate pages that explain how to use them: https://github.com/OpenMPDK/SMDK/wiki/5.-Plugin and https://github.com/OpenMPDK/SMDK/wiki/4.-Kernel
Limitations
- MLC BW tool and PMU related SW are not working due to the CPU dependency.
Step 3: Use CXL memory
$ cd /path/to/SMDK/
$ cd lib/ && ./build_lib.sh numactl && cd -
$ cd src/test/mmap && make && cd -
$ ./lib/numactl-2.0.16/numactl -m 1 ./src/test/mmap/test_mmap_cxl
addr[0x7f0125c07010], one='1' zero='0'
addr[0x7f0125206010], one='1' zero='0'
addr[0x7f0124805010], one='1' zero='0'
addr[0x7f0123e04010], one='1' zero='0'
addr[0x7f0123403010], one='1' zero='0'
...
$ cat /proc/buddyinfo
Node 0, zone DMA 0 0 0 0 0 0 0 0 0 1 3
Node 0, zone DMA32 5 6 3 4 5 5 5 7 7 7 475
Node 0, zone Normal 881 873 624 179 400 249 127 60 27 7 1028
Node 1, zone Movable 0 0 0 0 0 0 0 0 0 0 4096
Step 4: Destroy CXL region
$ cd /path/to/SMDK/lib
$ sudo ./cxl_cli/build/daxctl/daxctl reconfigure-device --mode=devdax dax0.0 -f
[
{
"chardev":"dax0.0",
"size":17179869184,
"target_node":1,
"align":2097152,
"mode":"devdax"
}
]
reconfigured 1 device
$ sudo ./cxl_cli/build/cxl/cxl destroy-region region0 -f
cxl region: cmd_destroy_region: destroyed 1 region
$ sudo cat /proc/iomem
890000000-c8fffffff : CXL Window 0
$ cat /proc/buddyinfo
Node 0, zone DMA 0 0 0 1 0 0 0 0 2 2 2
Node 0, zone DMA32 9 8 10 10 7 10 12 8 7 11 469
Node 0, zone Normal 231 344 222 134 111 63 12 2 3 2 1119
Note: If you do not change to devdax mode before the region is destroyed, the region will be deleted, but the memory area will not be removed, and the kernel reports messages such as the following.
removing memory fails, because memory [0x0000000890000000-0x0000000c8fffffff] is onlined
kmem dax0.0: mapping0: 0x890000000-0xc8fffffff cannot be hotremoved until the next reboot
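Because the order of the two commands in Step 4 matters, it can be convenient to wrap them together. The following is only a sketch: `run`, `destroy_region`, `DRY_RUN`, and `SMDK_LIB` are names introduced here for illustration, and the path is the placeholder used throughout this guide. With DRY_RUN=1 (the default in this sketch) the commands are printed rather than executed.

```shell
#!/usr/bin/env bash
# Sketch: tear down a CXL region in the order described above.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
SMDK_LIB=${SMDK_LIB:-/path/to/SMDK/lib}   # placeholder path from this guide

run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

destroy_region() {
    local dax=$1 region=$2
    # 1) Switch the DAX device back to devdax mode so the memory is offlined.
    run sudo "$SMDK_LIB/cxl_cli/build/daxctl/daxctl" reconfigure-device --mode=devdax "$dax" -f || return 1
    # 2) Only then destroy the region.
    run sudo "$SMDK_LIB/cxl_cli/build/cxl/cxl" destroy-region "$region" -f
}

destroy_region dax0.0 region0
```

Stopping after a failed reconfigure step avoids destroying the region while its memory is still online.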
Step 5: Recreate CXL region
$ cd /path/to/SMDK/lib/
$ sudo ./cxl_cli/build/cxl/cxl create-region -d decoder0.0 -s 17179869184 -t ram
$ sudo ./cxl_cli/build/daxctl/daxctl reconfigure-device --mode=system-ram dax0.0 -f
If you successfully created a CXL memory region, you can simulate the CHMU (CXL Hotness Monitoring Unit).
The emulation pushes a random address (representing a virtual hot entry) into the CHMU's hotlist every 0.1 seconds.
When the hotlist is full, it sends an interrupt to the CHMU driver so that the driver receives the hot entry information.
You can check the hotlist through kernel messages ($ dmesg).
Step 1: Set CHMU Registers
# Each bitmap covers 8GB of CXL memory (this example uses a 16GB region).
$ echo 0xFFFFFFFFFFFFFFFF | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/bitmap/bitmap0
$ echo 0xFFFFFFFFFFFFFFFF | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/bitmap/bitmap1
# Track R/W requests.
$ echo 3 | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/config/m2s_req_to_track
# Hotness tracking features (2: interrupt on hotlist overflow).
$ echo 2 | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/config/flags
# Set the hotness tracking unit size to 256MB(2^28).
$ echo 28 | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/config/unit_size
# Enable one of the reporting modes that the device supports.
$ echo 1 | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/config/reporting_mode
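For a region of a different size, the number of bitmap files to write and the byte size behind a unit_size value can be derived with shell arithmetic. This is a sketch under the assumption stated in the comment above, namely that one bitmap covers 8GB; `bitmaps_needed` and `unit_bytes` are hypothetical helper names.

```shell
#!/usr/bin/env bash
# Sketch: derive CHMU settings from a region size.
# Assumption taken from the comment above: one bitmap covers 8 GiB.
BITMAP_COVERAGE=$(( 8 << 30 ))

# Number of bitmapN files to fill for a region of the given size in bytes.
bitmaps_needed() {
    echo $(( ($1 + BITMAP_COVERAGE - 1) / BITMAP_COVERAGE ))
}

# Tracking unit size in bytes for a given unit_size exponent.
unit_bytes() {
    echo $(( 1 << $1 ))
}

region=$(( 16 << 30 ))
echo "bitmaps: $(bitmaps_needed "$region")"          # 2 for a 16 GiB region
echo "unit:    $(( $(unit_bytes 28) >> 20 )) MiB"    # 256 MiB for unit_size=28
```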
Step 2: Turn the CHMU On/Off
# Enable CHMU.
$ echo 1 | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/config/control
# Disable CHMU.
$ echo 0 | sudo tee /sys/bus/cxl/devices/hmu_mem0.0/config/control
# When CHMU is running, we can check it through kernel messages.
$ dmesg -w
...
[ 49.395460] cxl_mem mem0: hmu interrupt: 34
[ 49.395467] cxl_mem mem0: hmu0 hotlist register: 0x80
[ 49.395510] cxl_mem mem0: head: 0, tail: 63
[ 49.395534] cxl_mem mem0: dpa: 0x2f0000000, hpa: 0xd80000000
[ 49.395546] cxl_mem mem0: dpa: 0x1f0000000, hpa: 0xc80000000
...
[ 49.396450] cxl_mem mem0: dpa: 0xc0000000, hpa: 0xb50000000
[ 49.396465] cxl_mem mem0: dpa: 0x200000000, hpa: 0xc90000000
[ 49.396480] Device's head: 63, tail: 63
...
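The dpa/hpa pairs in these messages can be extracted for further inspection. The sketch below is illustrative and relies on the message format shown above, which may differ between versions; `hot_entries` is a hypothetical helper. For each hotlist entry it prints the DPA, the HPA, and their difference.

```shell
#!/usr/bin/env bash
# Sketch: print "dpa hpa offset" for each hotlist entry read on stdin.
# Lines without a "dpa: ..., hpa: ..." pair are ignored.
hot_entries() {
    sed -n 's/.*dpa: \(0x[0-9a-fA-F]*\), hpa: \(0x[0-9a-fA-F]*\).*/\1 \2/p' |
    while read -r dpa hpa; do
        printf '%s %s 0x%x\n' "$dpa" "$hpa" "$(( $hpa - $dpa ))"
    done
}

# Example with two lines from the output above.
hot_entries <<'EOF'
[ 49.395534] cxl_mem mem0: dpa: 0x2f0000000, hpa: 0xd80000000
[ 49.395546] cxl_mem mem0: dpa: 0x1f0000000, hpa: 0xc80000000
EOF
```

For the sample messages the difference is 0xa90000000 for every entry. On a live system the input could come from dmesg, for example sudo dmesg | hot_entries.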