Will the performance of io_uring be better than that of spdk under the same traffic model in the latest kernel? #1153
-
That job should not be at 500% CPU; it should be 1 SQPOLL thread probably running 100% of the time, and then a userspace thread running anywhere from 0..100% of the time. In other words, a max load of 2. Here's running that job on a fast drive:
and everything else mostly idle, which is as expected, and it's doing about 1.5M IOPS here. If you're seeing 500% CPU usage, which I'm assuming is 5 things running all the time, then you have something else running. In general, for peak performance, you don't want sqpoll. You either want hipri=1, which is polled IO, or just plain IRQ-driven IO (don't set anything but ioengine=io_uring and direct=1). For hipri, you want to configure your nvme drive(s) with polled queues, using the nvme module parameter poll_queues. If you look in dmesg, you should see something like:
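The exact default/read counts below are placeholders and will differ per system; the last value is the poll queue count:

nvme nvme7: 48/0/16 default/read/poll queues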
If configured correctly, the above shows nvme7 has 16 poll queues. This is important for polled IO. Outside of that, using fixedbufs=1 will help, and so will registerfiles=1; the former is the more important of the two.
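Putting those suggestions together, a polled-IO job might look roughly like the following sketch. The device path, block size, queue depth and runtime are placeholders mirroring the job quoted elsewhere in this thread; there is deliberately no sqthread_poll=1, and hipri=1 only works once poll_queues is configured as above:

[global]
ioengine=io_uring
direct=1
# polled completions; needs the nvme driver loaded with poll_queues > 0
hipri=1
# pre-register IO buffers and the file descriptor
fixedbufs=1
registerfiles=1
bs=4k
iodepth=64
rw=randwrite
time_based
runtime=60

[job1]
filename=/dev/nvme1n1
name=drive1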
-
I don't know what 'W' means in terms of IOPS. I think you may have a crappy nvme device with a low queue depth. Can you try and paste the output of:
as the only explanation here would be that the IO depth exceeds the device's queue depth.
-
What kernel are you using?
-
I conducted an experiment: when I set iodepth=1, the CPU usage drops accordingly. So I speculate that the extra threads are used for kernel-mode or user-mode processing to complete the IO, i.e. threads that poll for IO completion? But I'm not sure how to set the number of those threads.
-
I conducted another experiment and wrote a demo program that calls the library interface to write to the disk. I configured params.flags = IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF, and called io_uring_register_iowq_aff(&ring1, sizeof(mask), &mask) and io_uring_register_iowq_max_workers(&ring1, &threadnum). When I open the device with fd = open(disk_filename, O_RDWR | O_DIRECT), the CPU usage is less than 300%, which is in line with expectations (the SQ thread occupies one core, the worker occupies one, and the demo thread one). But with everything else unchanged, when I set fd = open(disk_filename, O_RDWR), the larger the iodepth, the more threads there are; the demo thread and the SQ thread each stay constant, and I am not sure what type of thread the remaining ones are. When I changed the configuration to fd = open(disk_filename, O_RDWR) with params.flags = IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF, plus io_uring_register_iowq_aff(&ring1, sizeof(mask), &mask) and io_uring_register_iowq_max_workers(&ring1, &threadnum), the number of threads is still as expected even if the iodepth increases, and the CPU ratio is also correct. In summary, I guess the threads that change with iodepth are user-mode (the program querying CQEs in the CQ queue) or kernel-mode (SQE -> CQE) threads used to complete the IO.
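For reference, a minimal sketch of that setup against liburing follows. The flag combination, registration calls and variable names mirror the description above; the device path, queue depth, CPU numbers and worker caps are placeholders, and the actual submit/completion loop is omitted:

#define _GNU_SOURCE
#include <fcntl.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <liburing.h>

int main(void)
{
    struct io_uring ring1;
    struct io_uring_params params;

    memset(&params, 0, sizeof(params));
    /* Kernel SQ polling with the SQ thread pinned, plus polled completions. */
    params.flags = IORING_SETUP_IOPOLL | IORING_SETUP_SQPOLL | IORING_SETUP_SQ_AFF;
    params.sq_thread_cpu = 2;                 /* placeholder CPU for the SQPOLL thread */

    if (io_uring_queue_init_params(64, &ring1, &params) < 0) {
        perror("io_uring_queue_init_params");
        return 1;
    }

    /* Pin the io-wq workers to one CPU ... */
    cpu_set_t mask;
    CPU_ZERO(&mask);
    CPU_SET(3, &mask);                        /* placeholder CPU for io-wq workers */
    io_uring_register_iowq_aff(&ring1, sizeof(mask), &mask);

    /* ... and cap them; the API takes a two-element array (bounded, unbounded). */
    unsigned int threadnum[2] = { 1, 1 };
    io_uring_register_iowq_max_workers(&ring1, threadnum);

    /* IOPOLL only works with O_DIRECT IO; the experiment above compares
     * O_RDWR | O_DIRECT against plain (buffered) O_RDWR. */
    int fd = open("/dev/nvme1n1", O_RDWR | O_DIRECT);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    /* ... prepare SQEs with io_uring_prep_write(), io_uring_submit(), then
     * reap CQEs with io_uring_peek_cqe()/io_uring_wait_cqe() ... */

    close(fd);
    io_uring_queue_exit(&ring1);
    return 0;
}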
-
Re-open if necessary once you're on a more recent kernel.
-
I tested io_uring and spdk using the same traffic model and found that the performance of io_uring was slightly higher than that of spdk. However, when testing io_uring, the CPU usage reached nearly 500%, even though I only set one job.
The traffic model is as follows:
[global]
ioengine=io_uring
sqthread_poll=1
direct=1
time_based
group_reporting
bs=4k
rw=randwrite
rwmixread=70
numjobs=1
iodepth=64
runtime=1800
ramp_time=60
[job1]
filename=/dev/nvme1n1
name=drive1