Small reads in IOCP don't put enough pressure on the disk, so results can be misleading. #118
Comments
This is a fairly well-known and normal effect, usually described as being CPU limited. The single thread in this case is unable to sustain the target queue depth given the CPU speed and the cost of issuing IO operations, coupled with the speed of the storage device. I would strongly hope CrystalDiskMark is capable of recognizing when this happens: all the information is there.
What you're asking for could be represented as a per-thread distribution of the number of times an IO was issued/completed with a given number of IOs outstanding. In the full Windows ecosystem, the Windows Performance Analyzer makes these directly visible per disk IO in the trace. In the performance counters, it's available in aggregate as the average queue depth at PhysicalDisk via Little's Law.
In fact, Little's Law should generally work to fact-check the results. It's simply the relationship between latency (available per thread) and IOPS. I am hesitant to add more derived statistics to what DISKSPD reports, preferring to focus on directly measured quantities. For instance, I have some data in a result I just saw for a run using -t<some number> -o2. Thread 0 produced:
31,120 IOPS @ 0.064ms each -> 31120 * 0.000064 =~ 1.992 average queue depth.
Another thread produced:
33,848 IOPS @ 0.059ms each -> 33848 * 0.000059 =~ 1.997.
These obviously agree quite well with -o2. I suspect that if you do the same with your results, you'll see the average queue depth begin to decouple from the target queue depth right where you expect it.
https://en.wikipedia.org/wiki/Little%27s_law
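As a quick worked version of that check (a minimal sketch, not part of DISKSPD; the inputs are the two per-thread figures quoted above):

```cpp
// Little's Law sanity check: average queue depth = IOPS * average latency (seconds).
// The inputs are the two per-thread results from the -o2 run quoted above.
#include <cstdio>

int main()
{
    const double iops[]      = { 31120.0, 33848.0 };   // per-thread IOPS
    const double latencyMs[] = { 0.064,   0.059 };     // per-thread average latency, ms

    for (int i = 0; i < 2; ++i) {
        double depth = iops[i] * (latencyMs[i] / 1000.0);   // L = lambda * W
        std::printf("thread %d: achieved average queue depth ~= %.3f\n", i, depth);
    }
    // Both values land near 2.0, matching -o2; a value well below the target
    // queue depth indicates the thread could not keep the device that busy.
    return 0;
}
```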
The distribution of queue depth at initiation/completion could be interesting, but there I'd be hesitant to add additional measurement cost for something which has very limited known use cases and which is directly measurable using ecosystem tooling. A runtime warning might be appropriate if one or more threads appear not to have issued the full target queue depth … hmm.
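For illustration, the per-thread bookkeeping for such a distribution might look roughly like this (a hypothetical sketch, not an existing DISKSPD feature; the class and method names are invented):

```cpp
// Hypothetical per-thread histogram of the queue depth observed at IO
// initiation/completion. Not a DISKSPD feature; names are illustrative.
#include <cstdint>
#include <cstdio>
#include <vector>

class DepthHistogram {
public:
    explicit DepthHistogram(int targetDepth)
        : issuedAt_(targetDepth + 1, 0), completedAt_(targetDepth + 1, 0) {}

    // Call immediately before issuing an IO, with the thread's current in-flight count.
    void OnIssue(int outstanding)    { Bump(issuedAt_, outstanding); }
    // Call immediately after reaping a completion, with the count prior to the decrement.
    void OnComplete(int outstanding) { Bump(completedAt_, outstanding); }

    void Print() const {
        for (std::size_t d = 0; d < issuedAt_.size(); ++d)
            std::printf("depth %zu: issued %llu times, completed %llu times\n", d,
                        (unsigned long long)issuedAt_[d],
                        (unsigned long long)completedAt_[d]);
    }

private:
    static void Bump(std::vector<std::uint64_t>& hist, int depth) {
        if (depth >= 0 && (std::size_t)depth < hist.size()) ++hist[depth];
    }
    std::vector<std::uint64_t> issuedAt_, completedAt_;
};
```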
On Monday, September 9, 2019, Dmytro Ivanov wrote:
That information comes from my private research on IOCP performance, but looking at how diskspd is implemented, it may well be prone to similar issues.
Given 4k/8k/16k/32k reads and an NVMe drive with high enough performance (tested on a Samsung 970 Pro), ReadFile(Ex) takes more time than GetQueuedCompletionStatus, meaning that operations complete faster than one thread can practically schedule them.
This can be observed by making two threads, one calling ReadFileEx and incrementing an atomic counter (queue depth), the other calling GetQueuedCompletionStatus and decrementing the same counter. The counter rarely gets above 1.
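A minimal sketch of that measurement, assuming ReadFile with OVERLAPPED structures bound to an IO completion port rather than ReadFileEx completion routines (the file name, block size, and IO count below are illustrative placeholders, not from the original report):

```cpp
// Sketch of the two-thread measurement described above: one thread issues
// overlapped reads and increments an atomic in-flight counter; a second
// thread drains the completion port and decrements it. "testfile.dat",
// the 4K block size, and the IO count are assumptions; the file must be
// at least 4 MiB and live on the device under test.
#include <windows.h>
#include <malloc.h>     // _aligned_malloc (MSVC CRT)
#include <atomic>
#include <cstdio>
#include <thread>

int main()
{
    const DWORD blockSize = 4096;
    const int   ioCount   = 100000;

    HANDLE file = CreateFileW(L"testfile.dat", GENERIC_READ, FILE_SHARE_READ, nullptr,
                              OPEN_EXISTING, FILE_FLAG_OVERLAPPED | FILE_FLAG_NO_BUFFERING,
                              nullptr);
    if (file == INVALID_HANDLE_VALUE) { std::printf("open failed: %lu\n", GetLastError()); return 1; }

    HANDLE port = CreateIoCompletionPort(file, nullptr, 0, 0);

    std::atomic<int>       inFlight{0};
    std::atomic<long long> depthSamples{0};
    std::atomic<bool>      done{false};

    // Completion thread: reap completions and drop the in-flight count.
    std::thread completer([&] {
        DWORD bytes; ULONG_PTR key; OVERLAPPED* ov;
        while (!done.load()) {
            GetQueuedCompletionStatus(port, &bytes, &key, &ov, 100);
            if (ov) {                       // an IO completed (successfully or not)
                inFlight.fetch_sub(1);
                delete ov;
            }
        }
    });

    // Issue thread: unbuffered IO needs a sector-aligned buffer.
    void* buffer = _aligned_malloc(blockSize, blockSize);

    for (int i = 0; i < ioCount; ++i) {
        OVERLAPPED* ov = new OVERLAPPED{};
        ov->Offset = (DWORD)((i % 1024) * blockSize);   // cycle over the first 4 MiB
        inFlight.fetch_add(1);
        if (!ReadFile(file, buffer, blockSize, nullptr, ov) &&
            GetLastError() != ERROR_IO_PENDING) {
            inFlight.fetch_sub(1);                      // immediate failure, never queued
            delete ov;
            continue;
        }
        depthSamples.fetch_add(inFlight.load());        // sample depth seen at issue time
    }

    done = true;                                        // any stragglers are ignored in this sketch
    completer.join();
    std::printf("mean queue depth observed at issue ~= %.2f\n",
                (double)depthSamples.load() / ioCount);

    _aligned_free(buffer);
    CloseHandle(port);
    CloseHandle(file);
    return 0;
}
```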
Graph ("tried schedule depth" vs. "actual mean read queue depth"): https://user-images.githubusercontent.com/1333661/64521840-055a3b80-d2f9-11e9-834b-d097f869775b.png
Only the 64k/128k/1024k requests actually scale with schedule depth, while the smaller ones do not.
Given that CrystalDiskMark actively uses diskspd in the 4K Q32T1 configuration, the results can be hugely misleading, because there is very little difference between Q2T1 and Q32T1 when the actual disk pressure barely exceeds 1. In my opinion diskspd should report the mean queue depth so tools like CrystalDiskMark can better visualize what is going on.
Yes, it is completely normal behavior; it's just that there is no warning/feedback.