Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix IO might be hanging in batch completion storage system. #1466

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

lausaa
Copy link

@lausaa lausaa commented Sep 8, 2022

Some storage system could not complete the IOs until an entire batch of IOs have been submitted. Therefore, the in-flight IOs will be hanging in io_u_quiesce if they are less than a batch, and the next IO cannot be scheduled. For this case, the option iodepth_batch_complete_omit for the number of in-flight IOs that need not be retrieved, could be used to break the hanging.

Some storage system could not complete the IOs until an entire batch
of IOs have been submitted. Therefore, the in-flight IOs will be hanging
in io_u_quiesce if they are less than a batch, and the next IO cannot be
scheduled. For this case, the option iodepth_batch_complete_omit for the
number of in-flight IOs that need not be retrieved, could be used to break
the hanging.

Signed-off-by: Fanghao Sha <shafanghao@gmail.com>
@axboe
Copy link
Owner

axboe commented Sep 8, 2022

I don't think this option makes a lot of sense, and I'm skeptical on what kind of storage system would never complete any ios until a certain amount have been submitted? That sounds pretty broken. But if that's the case, just run with a lower batch setting?

@lausaa
Copy link
Author

lausaa commented Sep 9, 2022

I don't think this option makes a lot of sense, and I'm skeptical on what kind of storage system would never complete any ios until a certain amount have been submitted? That sounds pretty broken. But if that's the case, just run with a lower batch setting?

This is a tiered system. IO requests write to the SSD firstly, and dump to the backend HDD on async mode. Generally it is based on two conditions:

  1. a certain amount of total size;
  2. timeout;

One of the above conditions can trigger to dump. It prefers a continuous stream, and the timeout is set to larger seconds...
Thus, I think it need a feature that ignore the IO completion rates to some extent when using rate limit.
I understand the io_u_quiesce is for the in-flight IO latency considered. But there may be relatively large jitter when we use rate limit in a linear fashion with fixed delays.
Uhhh... I suddenly realized that maybe we can do rate limit through the token method. :-)

@zhouyong-zy
Copy link

zhouyong-zy commented Sep 11, 2022

I don't think this option makes a lot of sense, and I'm skeptical on what kind of storage system would never complete any ios until a certain amount have been submitted? That sounds pretty broken. But if that's the case, just run with a lower batch setting?

@axboe I've read this patch, and find it maybe useful in some case:

  1. We expect to merge some IOs to write together, for better ssd/hdd performance.
  2. We are wiling to compute Erasure code, and need to merge some IOs for stripe length.

Yes, We could set lower batch for this merge. But it may effect the purpose to get better batch IO bandwidth and EC penalty.

So how do you the the mechanism behind io_u_quiesce for such cases? do you suggest we need another option to control?
If not, could you give some guidance how to handle it?

@axboe
Copy link
Owner

axboe commented Sep 11, 2022

We expect to merge some IOs to write together, for better ssd/hdd performance.

But then you expect all of those to complete at the same time anyway, so it won't really change much (if anything) in terms of how the device behaves in completions.

We are wiling to compute Erasure code, and need to merge some IOs for stripe length.

But that happens behind the scenes, not IO that fio has issued nor would be waiting for.

I think I'd have less of a problem with this patch if it was more intuitive what it does. It's also not documented at all in fio.1 or the HOWTO. On top of that, it has the potential to skew timing as well. We're not quiescing anymore, only partially, if that is set.

@lausaa
Copy link
Author

lausaa commented Sep 12, 2022

On top of that, it has the potential to skew timing as well. We're not quiescing anymore, only partially, if that is set.

Yes, you are right. I think this patch could make the rate limit more even and stable, but at the cost of skewing some of the latency, if it is set.
But in our cases, maybe we have no a better way to neither skew the latency nor hang for a long time on quiescing. In other words, this option is suitable for the scenarios that are insensitive to latency and prefer a smooth data flow.
I'll document these.

@lausaa
Copy link
Author

lausaa commented Sep 14, 2022

@axboe I have updated the fio.1 and HOWTO. But could the changes to the documents cause the CI AppVeyor to fail?
-- CI test is successful after a forced push of a typo fix. ;-))

Introduce the effect and scenes of the option.

Signed-off-by: Fanghao Sha <shafanghao@gmail.com>
@lausaa
Copy link
Author

lausaa commented Sep 16, 2022

Could we proceed with this PR? @axboe

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants