Add delay between zpool add zvol and zpool destroy #14052
Conversation
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
We must never hit this in the CI because the ZED is always stopped when running the test suite and then started/stopped as needed by specific test cases. Out of curiosity, are there other test cases which are a problem? I'd think having multiple instances of the ZED running could potentially cause other unexpected failures.
I'm fine with applying this workaround, but let's go ahead and leave the original issue open until the root cause can also be resolved.
I will run a full test suite and report back.
I ran the full suite twice, once on a VM with root on ext4 and once on a VM with root on zfs; no hang was observed.
@youzhongyang good to know, thanks.
As investigated in openzfs#14026, the zpool_add_004_pos test can reliably hang if the timing is not right. This is caused by a race condition between zed doing zpool reopen (due to the zvol being added to the zpool) and the command zpool destroy. This change adds a delay between zpool add zvol and zpool destroy to avoid this issue, but does not address the underlying problem. Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov> Signed-off-by: Youzhong Yang <yyang@mathworks.com> Issue openzfs#14026 Closes openzfs#14052
I suspect that a less racy solution would be to start a scrub and use the zpool wait command to wait until the scrub has finished. In theory, that should close the race without relying on a sleep. I have not looked at the test to see whether that would work for its purposes; I am just sharing my initial thought for others who might have more time to pursue this. @youzhongyang I might get to this at some point, but that would be at least a couple of weeks away. Feel free to consider the idea before then and, if it holds up, implement it in a follow-up patch.
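A minimal sketch of that idea (pool name is a placeholder; untested against this test case):

```sh
# Hypothetical sketch of the suggestion: start a scrub, then use
# `zpool wait -t scrub` to block until it completes, so the destroy
# that follows should no longer race with zed's reopen.
zpool scrub testpool
zpool wait -t scrub testpool
zpool destroy testpool
```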
@youzhongyang I investigated this and the issue seems to be related to the zvol rather than zed; however, zed is now able to reproduce this issue because we post a change event once a vdev is added (55c1272), which causes zed to call zpool reopen.
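A minimal sketch of a reproducer loop of that shape (a reconstruction, not the exact snippet; device, pool, and zvol names are placeholders):

```sh
# Reconstruction of the reproducer: repeatedly create a pool backed by
# a zvol, race `zpool reopen` (run in the background, standing in for
# zed's reopen) against `zpool destroy`, then tear everything down.
while true; do
    zpool create -f tank /dev/sdb
    zfs create -V 1G tank/vol
    zpool create -f tank2 /dev/zvol/tank/vol
    zpool reopen tank2 &
    zpool destroy tank2
    zpool destroy tank
done
```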
The above while loop reproduces the same issue in just 1-2 iterations with zed disabled.
Signed-off-by: Youzhong Yang <yyang@mathworks.com>
Motivation and Context
As investigated in #14026, the zpool_add_004_pos test can reliably hang if the timing is not right. It is caused by a race condition between zed doing zpool reopen (due to the zvol being added to the zpool) and the command zpool destroy.
Description
This PR adds a delay between zpool add zvol and zpool destroy.
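For illustration, the workaround amounts to something of this shape in the test script (a sketch, not the exact diff; variable names follow the test suite's conventions):

```sh
# Sketch of the workaround: sleep after adding the zvol so that zed's
# "zpool reopen", triggered by the vdev-add change event, can finish
# before the pool is destroyed.
log_must zpool add $TESTPOOL1 $ZVOL_DEVDIR/$TESTPOOL/$TESTVOL
sleep 1
log_must zpool destroy -f $TESTPOOL1
```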
How Has This Been Tested?
Manually ran the zpool_add_004_pos test case. Without the fix, it hangs easily; with the fix, the hang is no longer reproducible.