Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

nvme: Add support for dsm command latency option #2149

Closed
wants to merge 10 commits into from

Conversation

ikegami-t
Copy link
Contributor

No description provided.

@ikegami-t
Copy link
Contributor Author

Now I am thinking to add the latency command option for all other nvme commands also.

@igaw
Copy link
Collaborator

igaw commented Dec 12, 2023

I am not so keen on adding performance and debugging code everywhere. Makes the code base bigger/... for small gain. Someone pointed out to me that we could try to make use of user trace libraries such as lttng-ust which then lives outside of the main code base. Haven't really spend time playing with this yet. But I think this would better for these cases someone really wants to trace things.

@ikegami-t
Copy link
Contributor Author

ikegami-t commented Dec 12, 2023

The purpose is not for debugging but for adding performance statistics since currently the latency option is only added for passthru() (called by io_passthru() and admin_passthru()) and submit_io() (called by compare(), read_cmd() and write_cmd()) so tried to add for the dsm command also. By the way I thought the option can be implemented as same with the verbose and output-format as common option (I think this can be reduced the code size). Still should we use the tool lttng-ust or etc.?

(Add)
For debugging purpose -v option is also already output latency value as below so duplicated values but the values are not same. Anyway should be considered more so let me do it.

tokunori@tokunori-desktop:~/nvme-cli$ sudo .build/nvme dsm /dev/nvme0n1 -b 256 -t -v
opcode       : 09
flags        : 00
rsvd1        : 0000
nsid         : 00000001
cdw2         : 00000000
cdw3         : 00000000
data_len     : 00000010
metadata_len : 00000000
addr         : 556955f96000
metadata     : 0
cdw10        : 00000000
cdw11        : 00000000
cdw12        : 00000000
cdw13        : 00000000
cdw14        : 00000000
cdw15        : 00000000
timeout_ms   : 00000000
result       : 00000000
err          : 0
latency      : 72 us
NVMe DSM: success
 latency: 131 us

@ikegami-t ikegami-t force-pushed the nvme-dsm-latency branch 2 times, most recently from 4186b1e to 8dd1e71 Compare December 18, 2023 17:17
The purpose to add is for adding performance statistics.
Currently the latency option is only added for passthru() and submit_io().
So add for the dsm command also.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Also change passthru short option -T to -t as same with submit_io and dsm.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Since duplicated some other short options -t.
  Note: Changed get-log command lpo short option -L to -O also.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
@ikegami-t
Copy link
Contributor Author

ikegami-t commented Dec 27, 2023

For the build check needed libnvme update to add add nvme_cli_set_latency().

Previously the latency is output by libnvme debug feature.
So combine the nvme-cli and libnvme latency outputs.
For the latency output added API in libnvme to split from debug.
Note: Needed the libnvme update to add the API to output latency.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
The latency flag set by parse_and_open() but not used by the commands.
Since argconfig_parse() used instead so set the flag by the commands.
  Note: This fix sets the debugging flag also for the commands.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Since the nvme commands using NVME_ARGS() macro added the latency option.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
Note: Commands using NVME_ARGS are added the option as default.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
The functions are added in libnvme so merge the duplicated functions.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
This includes some minor changes related.

Signed-off-by: Tokunori Ikegami <ikegami.t@gmail.com>
@igaw
Copy link
Collaborator

igaw commented Jan 18, 2024

The purpose is not for debugging but for adding performance statistics

As I pointed out, please try to figure out first if we can't do this without adding printf code on every single function. We have really great tools for gathering performance stats, no need for NIH. There are tracing frameworks and great tools like bpftrace for exactly this functionality.

@ikegami-t
Copy link
Contributor Author

ikegami-t commented Jan 20, 2024

Thank you so much for your advice! Just confirmed the tool bpftrace works as expected as below.

  1. dsm command latency option: 3,480 us
root@tokunori-desktop:/home/tokunori/nvme-cli-2# .build/nvme dsm /dev/nvme0n1 -b 0x100000,0x100000,0x100000,0x100000 -s 0x0,0x100000,0x200000,0x300000 -L -d
ioctl: 4e40 (ID) latency: 3 us
IO Command opcode: 09 (Dataset Management) latency: 3480 us
NVMe DSM: success
  1. bpftrace: 3,456 us
root@tokunori-desktop:/home/tokunori# bpftrace -e 'kprobe:nvme_submit_user_cmd { @start = nsecs; } tracepoint:nvme:nvme_complete_rq /@start != 0/ { printf("latency: %u usec\n", (nsecs - @start) / 1e3); @start = 0; }'
Attaching 2 probes...
latency: 3456 usec
^C

@start: 0

Do you mean we can add the bpftrace tools bt script file for the latency checking instead of the option? By the way seems the tool depended on the kernel version environment but okay for using to check the nvme-cli latency? Also currently the latecny option is used by the commonds calling submit_io() and passthru() but okay to remain them?

(Add)
Note: The latency checking is not for the nvme-cli itself performance statisics purpose but for the nvme device performence stastics. (Probably you have already understood this but to make sure let me do again.)

Note 2: The above bpftrace program possibly does not work corretly if other command tracepoint:nvme:nvme_complete_rq traced after the nvme-cli command kprobe:nvme_submit_user_cmd probed as incorrectly output the latency as short time.

@igaw
Copy link
Collaborator

igaw commented Jan 22, 2024

Do you mean we can add the bpftrace tools bt script file for the latency checking instead of the option?

Yes, that is what I had in mind.

By the way seems the tool depended on the kernel version environment but okay for using to check the nvme-cli latency?

I know bpftrace has a bit of dependencies. But for who is this feature anyway? What is the use case? I don't think you need this feature for embedded device.

performance statisics purpose but for the nvme device performence stastics

I assume this is primarily interesting for the device developers. I am pretty sure they have way better tools to measure their device latencies and measuring the ioctl includes the whole kernel stack with all the noise it introduces. Also, measuring admin commands is not so interesting. The read/writes commands are interesting ones and nvme-cli is the wrong tool for this.

@igaw
Copy link
Collaborator

igaw commented Mar 7, 2024

With #2234 merged, you can get this info with increasing the verbosity level.

@igaw igaw closed this Mar 7, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants