zpool clear hang when resuming suspended pool #6709
Comments
Please don't remove the issue template.
This should be fixed in 3f759c0, trying to …
If the disk gets back online I can reopen the pool.
@loli10K apologies, I hadn't posted an issue in a while and didn't read the template instructions carefully enough. I've attempted to resolve that in my original post. Thanks for the response. #6399 seemed to indicate that the fix you referenced was in 0.7.1, but the commit does appear to be tagged 0.7.2. If that's correct, then I'll just upgrade to 0.7.2, see what happens, and we can close the issue.
3f759c0 landed in zfs-0.7.0, you should have that commit. When …
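For reference, a quick way to check whether an installed release already contains that commit is to ask git which tags include it (a generic git check, assuming a local clone of the zfs repository):

```sh
# Clone the repository (URL shown for illustration) and list release tags containing the fix
git clone https://github.com/zfsonlinux/zfs.git
cd zfs
git tag --contains 3f759c0
```

If zfs-0.7.0 and later tags appear in the output, any 0.7.x package should include the fix.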
Got it, I'll do that.
Ok, when it happened this time, a "zpool clear backup" just gave me "cannot clear errors for backup: I/O error". Is that right? I thought we'd be able to clear and remove pools with this issue.
@bjquinn that's expected. Removing a suspended pool from the system without reattaching the USB device is not supported at this time, see #5242.
Ok thanks. In this case, I did have the same device reattached, and still got the I/O error. It took a reboot to clear up the issue. |
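For anyone following along, the recovery flow that is supposed to work after reattaching the device looks roughly like this (a sketch only; backup is the pool name used in this thread, and as reported above it does not always succeed):

```sh
# With the USB device reattached, check that the pool is reported as suspended/unavailable
zpool status backup

# Ask ZFS to retry the failed I/O and resume the pool
zpool clear backup

# Confirm the pool state afterwards
zpool status backup
```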
Not for me. Using 0.7.5, I've just encountered a case where, after the kernel message, …
Setup: three-way mirror on top of VeraCrypt volumes.
Result: …
Interestingly this happened to me again this week. Initially, a zpool clear immediately recovered the pool, which at that point said it had some errors and started a scrub. That's the first time I've seen a pool recover from a suspended state like that. I don't know what the result of the scrub would have been, since it ran for a couple hours, and then the zpool commands started hanging again and I had to do a hard reboot. I'm still on 0.7.1.
I believe this has already been reported and is being tracked in #6649.
I believe this only allows zpool clear to not block, but it doesn't fix the real issue: when the device is reattached, pool I/O cannot be resumed. It is most likely because the reattached device ends up with a different minor number (e.g. sdb instead of sda). How does it get "reattached" to the pool then? There needs to be a way either to replace the device with the new instance or to export and import the pool without rebooting (preferably both).
@zviratko Yes, I believe you're right that this doesn't fix the real issue. Additionally, I have definitely run into the issue where zpool clear doesn't hang anymore, but the device still won't reattach even though I'm sure it came up as the same device name (sdc/sdd, etc.), and anyway I'm obviously not using the /dev/sdb device names to create the pool.
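A sketch of the two approaches mentioned above, as one would normally attempt them (hedged: the device paths and pool name are only examples, and neither path reliably works while the pool is suspended, which is exactly what this issue is about):

```sh
# Attempt 1: tell ZFS the disk is back, or swap in the new device instance
zpool online backup /dev/disk/by-id/usb-EXAMPLE_DISK-0:0-part1
zpool replace backup /dev/disk/by-id/usb-OLD_DISK-0:0-part1 /dev/disk/by-id/usb-NEW_DISK-0:0-part1

# Attempt 2: drop and re-import the pool without rebooting
zpool export -f backup
zpool import -d /dev/disk/by-id backup
```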
@zviratko the root cause is that the ZFS code lacks the ability to drop a suspended pool from memory on demand, and in my understanding that's tricky to accomplish. You need to follow this issue; only when that issue is solved will all the other suspended/unavailable pool related issues be resolved.
Mine's whatever ships with Ubuntu 18.04.1 on 4.15.0-36-generic: libzfs2linux/bionic-updates,now 0.7.5-1ubuntu16.4 amd64 [installed,automatic]
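For comparison, a couple of ways to confirm which ZFS version is actually in use on a Debian/Ubuntu system (illustrative commands only):

```sh
# Version of the loaded kernel module
cat /sys/module/zfs/version

# Installed userland packages
dpkg -l | grep -E 'zfs|libzfs'
```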
System information
Describe the problem you're observing
Similar to #3256. A flaky USB enclosure (I assume) causes the USB drive to temporarily disconnect and reconnect (in this specific case, it disconnects as sdd and reconnects as sde). The pool is built with by-id names. Making sure the drive is reconnected and issuing a zpool clear just hangs, even if the drive is disconnected long enough to bring it back up as sdd again.
Opening a new bug, as that was what was suggested in #3256 if this issue recurred with 0.7.1. I would really prefer zpool export -F for suspended pools, but zpool clear would be sufficient if it worked.
EDIT: I should add that the current state is still an improvement over 0.6.5.x: a system in this state can run zpool/zfs commands on other pools, as well as zpool list or zpool status against the suspended pool, which was not always the case when this happened previously. Also, for the first time, I experienced a clean, normal reboot that didn't hang, whereas previously a hard reboot was always required because shutdown hung while trying to unmount the suspended pool.
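To illustrate the failure mode (a sketch; the pool name backup and the sdd/sde names are taken from the description above, and the comments describe reported behaviour rather than captured output):

```sh
# Drive drops out mid-backup; the pool suspends and all I/O against it blocks
zpool status backup     # on 0.7.x this still responds and shows the pool as suspended

# Drive reconnects (sometimes under a new name, e.g. sde instead of sdd)
zpool clear backup      # hangs indefinitely instead of resuming the pool
```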
Describe how to reproduce the problem
I have a backup script that runs fairly heavy zfs send and rsync jobs to an external SATA drive in a USB enclosure. Once every month or two, the drive will drop out during the backup process and get into the state described above. When this happens, you can see something like the following logged at the console --
... which is an indication that the drive dropped, but came back.
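The backup job is roughly of this shape (a minimal sketch, not the actual script; the dataset names, mountpoints, and backup pool name are assumptions):

```sh
#!/bin/sh
# Hypothetical nightly backup to a pool living on the external USB/SATA drive.
SNAP="tank/data@backup-$(date +%Y%m%d)"

zfs snapshot "$SNAP"

# Heavy sequential write to the USB pool; this is typically when the drive drops out
zfs send "$SNAP" | zfs receive -F backup/data

# Plus an rsync of non-ZFS data onto the same pool
rsync -a /srv/other/ /backup/other/
```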
Include any warning/errors/backtraces from the system logs