ZTS: Fix zpool_reopen_001_pos #9680
Update the vdev_disk_open() retry logic to use a specified number of milliseconds to be more robust. Additionally, on failure log both the time waited and requested timeout to the internal log. The default maximum allowed open retry time has been increased from 500ms to 1000ms. Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
With the debugging from the PR and the CI logs I was able to verify that the 500ms timeout was in fact reached. Given that, I've increased the timeout to 1000ms to see if the issue can still be reproduced.
Codecov Report
@@ Coverage Diff @@
## master #9680 +/- ##
========================================
+ Coverage 79% 79% +<1%
========================================
Files 418 418
Lines 123572 123575 +3
========================================
+ Hits 97956 98193 +237
+ Misses 25616 25382 -234
We can't be 100% sure this solves the bug in all edge cases (can we ever be 100% certain?), but I think it is at least more robust, so LGTM.
@behlendorf If you mark it "not draft, ready for review as real PR", maybe remove the "WIP" tag and replace it with "review requested"?
Doubling the timeout appears to have resolved the issue. After running this PR many times I haven't seen this failure. At a minimum it does appear to reduce the frequency.
Update the vdev_disk_open() retry logic to use a specified number of milliseconds to be more robust. Additionally, on failure log both the time waited and requested timeout to the internal log. The default maximum allowed open retry time has been increased from 500ms to 1000ms. Reviewed-by: Kjeld Schouten <kjeld@schouten-lebbing.nl> Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov> Closes openzfs#9680 Conflicts:
Motivation and Context
Debug CI failures of zpool_reopen_001_pos. I wasn't able to reproduce
this locally, so I've updated the code to provide additional debugging.
http://build.zfsonlinux.org/builders/Ubuntu%2018.04%20x86_64%20%28TEST%29/builds/6326
The default timeout values have not yet been changed since I'd like to
reproduce this issue if possible.
Note: Hitting the open timeout has only been observed with tests that
use a scsi_debug device. Other occasional test failures appear to be
encountering the same issue.
Description
Update the vdev_disk_open() retry logic to use a specified number
of milliseconds to be more robust. Additionally, log both the
time waited and requested timeout to the internal log for debugging.
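For readers who want the shape of a time-bounded retry, here is a minimal user-space sketch. It is not the vdev_disk_open() code from this PR. The names open_with_timeout(), fake_open(), now_ms() and sleep_ms() are hypothetical, and the 10 ms pause between attempts and the retry-only-on-ENOENT rule are assumptions made for the illustration. It bounds the loop by elapsed milliseconds rather than by an attempt count, and on failure it reports both the time waited and the requested timeout.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for clock_gettime() and nanosleep() */
#endif

#include <errno.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical callback: returns 0 on success or a negative errno. */
typedef int (*open_fn_t)(void *arg);

/* Monotonic clock in milliseconds, unaffected by wall-clock changes. */
static uint64_t
now_ms(void)
{
	struct timespec ts;

	clock_gettime(CLOCK_MONOTONIC, &ts);
	return ((uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000);
}

static void
sleep_ms(unsigned int ms)
{
	struct timespec ts = {
		.tv_sec = ms / 1000,
		.tv_nsec = (long)(ms % 1000) * 1000000L
	};

	nanosleep(&ts, NULL);
}

/*
 * Retry open_fn() while it reports -ENOENT (device not present yet) and
 * less than timeout_ms has elapsed. The elapsed time is returned through
 * waited_ms so the caller can log or inspect it.
 */
static int
open_with_timeout(open_fn_t open_fn, void *arg, uint64_t timeout_ms,
    uint64_t *waited_ms)
{
	uint64_t start = now_ms();
	int error;

	for (;;) {
		error = open_fn(arg);
		*waited_ms = now_ms() - start;
		if (error != -ENOENT || *waited_ms >= timeout_ms)
			break;
		sleep_ms(10);	/* assumed pause between attempts */
	}

	if (error != 0) {
		fprintf(stderr, "open returned %d, waited %" PRIu64
		    " ms, timeout %" PRIu64 " ms\n", error, *waited_ms,
		    timeout_ms);
	}
	return (error);
}

/* Hypothetical stand-in for a device that appears after N attempts. */
static int
fake_open(void *arg)
{
	int *remaining = arg;

	if (*remaining > 0) {
		(*remaining)--;
		return (-ENOENT);
	}
	return (0);
}
```

Calling open_with_timeout(fake_open, &n, 1000, &waited) with n set to 3 should succeed after a few short sleeps, while a very large n with a 50 ms timeout should return -ENOENT once the time budget is spent. Measuring elapsed time keeps the upper bound stable even if an individual attempt or sleep takes longer than expected.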
How Has This Been Tested?
Locally by running the zpool_reopen ZTS tests. Pending full CI run.
Types of changes
Checklist:
Signed-off-by
.