
zfs import -f -F failed after Mirror died. #3089

Closed
hvenzke opened this issue Feb 8, 2015 · 6 comments
Labels
Status: Stale (no recent activity for issue)
Status: Understood (the root cause of the issue is known)
Type: Defect (incorrect behavior, e.g. crash, hang)
Type: Documentation (indicates a requested change to the documentation)

Comments

@hvenzke

hvenzke commented Feb 8, 2015

@brian, I need your help please.

ad2:/home/rnot # zpool import

   pool: data
     id: 13401730686227579694
  state: UNAVAIL
 status: One or more devices are missing from the system.
 action: The pool cannot be imported. Attach the missing
        devices and try again.
   see: http://zfsonlinux.org/msg/ZFS-8000-6X
 config:

        data                                                         UNAVAIL  missing device
          raidz1-0                                                   ONLINE
            usb-SanDisk_Cruzer_Glide_200426056002B690AF7F-0:0-part1  ONLINE
            usb-SanDisk_Cruzer_Glide_200426056102B690AF7F-0:0-part1  ONLINE
            usb-SanDisk_Cruzer_Glide_200435136002B690ADD8-0:0-part1  ONLINE
          usb-SanDisk_Cruzer_Glide_20043513631437D2ECEA-0:0          ONLINE
          usb-SanDisk_Cruzer_Glide_20042401031437D2ECE8-0:0          ONLINE

        Additional devices are known to be part of this pool, though their
        exact configuration cannot be determined.

The mirror was called mirror1-0 and all of its devices died last year.
Now the pool is offline, but the main "raidz1-0" devices are still there and plenty of free disk space is available in the pool.

The import fails either way:
ad2:/home/rnot # zpool import -d /dev/disk/by-id/ data
cannot import 'data': one or more devices is currently unavailable

ad2:/home/rnot # zpool import -f -F data
cannot import 'data': one or more devices is currently unavailable

ad2:/home/rnot # rpm -qa | grep zfs
zfs-dkms-0.6.2+git.1387576353-22.1.noarch
zfs-0.6.2+git.1387576353-21.1.armv6hl
zfs-devel-0.6.2+git.1387576353-21.1.armv6hl
zfs-dracut-0.6.2+git.1387576353-21.1.armv6hl
libzfs2-0.6.2+git.1387576353-21.1.armv6hl
zfs-test-0.6.2+git.1387576353-21.1.armv6hl

.. I know I should probably update; coming soon.
But the error is probably important to track down.

Any valid ideas are welcome...

@behlendorf
Contributor

If I understand correctly, your pool used to contain a top-level raidz1-0 vdev and a top-level mirror1-0 vdev. The raidz1 vdev looks fully intact, but all the drives which were part of the mirror1-0 vdev have failed. This situation would cause the issue you're describing. Basically, raidz1-0 knows there should be an additional top-level vdev present, but because it doesn't exist the pool can't be imported.

Issue #852 was proposed as a feature request to attempt to handle this case. But if you really have lost both sides of that mirror, the best-case recovery here is just recovering all the data from the remaining raidz1-0 vdev. If you could recreate the ZFS labels for the now-missing mirror, you could probably successfully import the pool as described in #852.
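As a first step, a quick way to see which on-disk labels actually survive (a sketch only; the path below is just one of the raidz1 members from your zpool import output) would be:

zdb -l /dev/disk/by-id/usb-SanDisk_Cruzer_Glide_200426056002B690AF7F-0:0-part1

Each member normally carries four copies of the label; if they are readable on all the raidz1 members but gone on the old mirror members, that confirms the missing top-level vdev is what is blocking the import.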

@hvenzke
Author

hvenzke commented Feb 9, 2015

Hello Brian,
Thanks for your view and hints. I replaced the 2 USB hubs with ones that have more power and plugged them into my ad2 NAS RPi.

One small step forward:

ad2:~ # zpool import
   pool: data
     id: 13401730686227579694
  state: ONLINE
 status: One or more devices contains corrupted data.
 action: The pool can be imported using its name or numeric identifier.
   see: http://zfsonlinux.org/msg/ZFS-8000-4J
 config:
        data                                                         ONLINE
          raidz1-0                                                   ONLINE
            usb-SanDisk_Cruzer_Glide_200426056002B690AF7F-0:0-part1  ONLINE
            usb-SanDisk_Cruzer_Glide_200426056102B690AF7F-0:0-part1  ONLINE
            usb-SanDisk_Cruzer_Glide_200435136002B690ADD8-0:0-part1  ONLINE
          usb-SanDisk_Cruzer_Glide_20043513631437D2ECEA-0:0          ONLINE
          usb-SanDisk_Cruzer_Glide_20042401031437D2ECE8-0:0          ONLINE
          mirror-3                                                   ONLINE
            usb-SanDisk_Cruzer_Glide_2004432030163E92D56F-0:0        UNAVAIL  corrupted data
            5004468179912598650                                      UNAVAIL  corrupted data

.. I was away from home for nearly 10 months, far far away... damn.
I assume power outage(s) caused that (found some hints in the logs).

I think we need some kind of tooling here that allows deeper management of
"disaster recovery handling" of ZFS pools.

I already missed such a thing as a Solaris SA, really, in extreme worst cases:

  • force split off a dead mirror at all costs, unless ONE mirror member is still intact
  • force replace "named" devices of dead mirrors at all costs, unless ONE mirror member is still intact

@hvenzke
Author

hvenzke commented Feb 9, 2015

pool hardware :

64G USB sticks with the real data on them; total real pool size: 128G
            usb-SanDisk_Cruzer_Glide_200426056002B690AF7F-0:0-part1  ONLINE
            usb-SanDisk_Cruzer_Glide_200426056102B690AF7F-0:0-part1  ONLINE
            usb-SanDisk_Cruzer_Glide_200435136002B690ADD8-0:0-part1  ONLINE
128G sticks, unused
          usb-SanDisk_Cruzer_Glide_20043513631437D2ECEA-0:0          ONLINE
          usb-SanDisk_Cruzer_Glide_20042401031437D2ECE8-0:0          ONLINE
          mirror-3                                                   ONLINE
128G sticks
            usb-SanDisk_Cruzer_Glide_2004432030163E92D56F-0:0        UNAVAIL  corrupted data
            5004468179912598650                                      UNAVAIL  corrupted data

@hvenzke
Author

hvenzke commented Feb 9, 2015

I think we need something equivalent to what has existed for years in LVM2...
The ZFS ZIL needs admin "help" since it can't self-heal;
we need a way to tell it how to GO ON without data loss.

.. such a thing would NOT help if the devices on BOTH sides of the mirror are dead.

0 zfsdr --examine --assemble --scan /dev/sdb1 /dev/sdb2 /dev/sdb3 /dev/NNN --detail /etc/mdadm/mdadm-dr.conf

.. such a tool should analyze the disks, try to build/enable the raid, and write the info to a config file
... analysis equivalent to what "mdadm" does.

1 zfsdr -ay /etc/mdadm/mdadm-dr.conf
.. enables the VGs

2 zfsdr mirror-split
.. would try to split off the dead mirror, then run a zfs import -F afterwards.

3 zfsdr diskreplace

  • tries to look up the OFFLINE ZIL on the pool's healthy disks, then attaches & replaces the defective disk

On Linux LVM it is possible to replace a defective disk on NOT USED VGs / MDs.
Such a thing does not exist yet in ZFS, or I am totally blind :)
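For comparison, the closest existing equivalents with stock tools would be roughly the following (a sketch only; it assumes the dead mirror-3 really is a separate log vdev, the device paths are taken from the pool listing above, and NEWDEVICE is just a placeholder):

0 zpool import -d /dev/disk/by-id/
.. scans the given directory for importable pools, similar to mdadm --examine --scan

1 zdb -l /dev/disk/by-id/usb-SanDisk_Cruzer_Glide_200435136002B690ADD8-0:0-part1
.. dumps the on-disk ZFS labels of a single member, similar to mdadm --examine

2 zpool import -m data
.. imports while ignoring a missing log mirror; there is no separate mirror-split tool

3 zpool replace data 5004468179912598650 /dev/disk/by-id/NEWDEVICE
.. replaces a failed member with a new device once the pool is online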

@behlendorf
Contributor

@remsnet if you've only lost ZIL devices then you can absolutely import the pool and move on. I misunderstood and thought the top-level vdev you lost was part of the primary pool. This functionality has existed for a long time and is described in the man page. It would probably be nice if the zpool import failure suggested this as a possible solution. Obviously you'll lose the last few seconds of data which were stored on the ZIL, but if BOTH devices have failed then there's no alternative.

From zpool(8):

           -m      Allows a pool to import when there is a missing log device.

Try this:

zpool import -m data
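If the import succeeds, the usual follow-up (again just a sketch; mirror-3 is the log vdev name from your listing above, and the new device paths are placeholders) would be:

zpool status data
zpool remove data mirror-3
zpool add data log mirror /dev/disk/by-id/NEWDEV1 /dev/disk/by-id/NEWDEV2

zpool remove works for log vdevs, and zpool add ... log lets you attach a fresh mirrored log later if you still want one.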

@behlendorf behlendorf added the Type: Documentation Indicates a requested change to the documentation label Feb 10, 2015
@stale

stale bot commented Aug 25, 2020

This issue has been automatically marked as "stale" because it has not had any activity for a while. It will be closed in 90 days if no further activity occurs. Thank you for your contributions.

@stale stale bot added the Status: Stale No recent activity for issue label Aug 25, 2020
@stale stale bot closed this as completed Nov 23, 2020