Fix IO might be hanging in batch completion storage system. #1466
base: master
Some storage systems cannot complete IOs until an entire batch of IOs has been submitted. When fewer than a batch of IOs are in flight, they therefore hang in io_u_quiesce, and the next IO cannot be scheduled. For this case, the option iodepth_batch_complete_omit, which sets the number of in-flight IOs that need not be retrieved, can be used to break the hang. Signed-off-by: Fanghao Sha <shafanghao@gmail.com>
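To make the described hang concrete, here is a minimal toy model in Python. It is not fio code: `BatchDevice`, `quiesce`, and the `max_polls` cutoff are hypothetical names invented for illustration, and the loop condition (wait while more than `omit` IOs are in flight) is an assumption about how the option behaves, based only on the description above.

```python
from dataclasses import dataclass


@dataclass
class BatchDevice:
    """Toy device that completes IOs only once a full batch has been submitted."""
    batch: int
    pending: int = 0  # IOs submitted but not yet completable

    def submit(self) -> None:
        self.pending += 1

    def reap(self) -> int:
        """Return how many IOs complete right now (whole batches only)."""
        done = (self.pending // self.batch) * self.batch
        self.pending -= done
        return done


def quiesce(dev: BatchDevice, in_flight: int, omit: int = 0, max_polls: int = 1000):
    """Wait until at most `omit` IOs remain in flight.

    Returns (remaining_in_flight, hung). `max_polls` exists only so the toy
    model can report a hang instead of looping forever.
    """
    polls = 0
    while in_flight > omit:
        done = dev.reap()
        in_flight -= done
        if done == 0:
            polls += 1
            if polls >= max_polls:
                return in_flight, True
    return in_flight, False


dev = BatchDevice(batch=8)
for _ in range(5):          # fewer IOs than one batch
    dev.submit()

print(quiesce(dev, in_flight=5, omit=0))  # full quiesce never finishes
print(quiesce(dev, in_flight=5, omit=5))  # partial quiesce returns at once
```

With `omit=0` the loop can never drain the five in-flight IOs, because the toy device will not complete them until three more arrive. With `omit=5` the wait returns immediately, so further IOs can be submitted and the batch eventually completes.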
I don't think this option makes a lot of sense, and I'm skeptical about what kind of storage system would never complete any IOs until a certain number have been submitted. That sounds pretty broken. But if that's the case, just run with a lower batch setting?
This is a tiered system. IO requests are written to the SSD first and dumped to the backend HDD asynchronously. Generally, the dump is based on two conditions:
Either of the above conditions can trigger a dump. The system prefers a continuous stream, and the timeout is set to a larger number of seconds...
@axboe I've read this patch and think it may be useful in some cases:
Yes, we could set a lower batch for this merge. But that may undermine the goal of better batch IO bandwidth and EC penalty. So how do you see the mechanism behind io_u_quiesce for such cases? Do you suggest we need another option to control it?
But then you expect all of those to complete at the same time anyway, so it won't really change much (if anything) in terms of how the device behaves in completions.
But that happens behind the scenes; it's not IO that fio has issued or would be waiting for. I think I'd have less of a problem with this patch if it was more intuitive what it does. It's also not documented at all in fio.1 or the HOWTO. On top of that, it has the potential to skew timing as well. We're not quiescing anymore, only partially, if that is set.
Yes, you are right. I think this patch could make the rate limit more even and stable, but at the cost of skewing some of the latency figures when it is set.
@axboe I have updated fio.1 and the HOWTO. But could the documentation changes be causing the AppVeyor CI to fail?
Describe the effect and use cases of the option. Signed-off-by: Fanghao Sha <shafanghao@gmail.com>
Could we proceed with this PR? @axboe