ZFS attach fails for disk referenced by-id #4144

Closed
CRWdog193 opened this issue Dec 26, 2015 · 5 comments
@CRWdog193

Arch Linux with zfs-git 0.6.5.3_r0_g9aaf60b_4.2.5_1-1 (archzfs-git).

Migrating to new disks. Trying to convert (temporary!) single-disk vdevs to mirrors, I noted:

[root] # zpool attach -f data ata-ST2000LM003_HN-M201RAD_S377J9AGA05161 ata-ST2000LM003_HN-M201RAD_S377J9AGA05162
cannot attach ata-ST2000LM003_HN-M201RAD_S377J9AGA05162 to ata-ST2000LM003_HN-M201RAD_S377J9AGA05161: no such pool or dataset

Tried various other combinations (e.g. prefixing with /dev/disk/by-id/, etc.). No change.

However:

[root]# zpool attach -f data /dev/disk/by-id/ata-ST2000LM003_HN-M201RAD_S377J9AGA05161 /dev/sde
[root]#

[root]# zpool status -v
pool: data
state: ONLINE
status: One or more devices is currently being resilvered. The pool will
continue to function, possibly in a degraded state.
action: Wait for the resilver to complete.
scan: resilver in progress since Sat Dec 26 21:58:45 2015
103G scanned out of 5.12T at 391M/s, 3h44m to go
25.8G resilvered, 1.97% done
config:

NAME                                                   STATE     READ WRITE CKSUM
data                                                   ONLINE       0     0     0
  ata-ST2000LM003_HN-M201RAD_S377J9EGA08207            ONLINE       0     0     0
  mirror-1                                             ONLINE       0     0     0
    ata-ST2000LM003_HN-M201RAD_S377J9AGA05161          ONLINE       0     0     0
    sde                                                ONLINE       0     0     0  (resilvering)
  ata-ST2000LM003_HN-M201RAD_S377J9AGA05144            ONLINE       0     0     0
  ata-ST2000LM003_HN-M201RAD_S377J9EGA08198            ONLINE       0     0     0
logs
  ata-Samsung_SSD_850_PRO_128GB_S24ZNSAG401432A-part1  ONLINE       0     0     0
  ata-Samsung_SSD_850_PRO_128GB_S24ZNSAG406042N-part1  ONLINE       0     0     0
cache
  ata-Samsung_SSD_850_PRO_128GB_S24ZNSAG401432A-part2  ONLINE       0     0     0
  ata-Samsung_SSD_850_PRO_128GB_S24ZNSAG406042N-part2  ONLINE       0     0     0

errors: No known data errors

Somewhat unexpected...

@dasjoe
Contributor

dasjoe commented Jan 1, 2016

Possible duplicate of #4077

@ilovezfs
Contributor

ilovezfs commented Jan 1, 2016

Someone in IRC had the same issue and a reboot "fixed" it since it's some kind of udev race. Have you tried a reboot yet?

@CRWdog193
Author

> Someone in IRC had the same issue and a reboot "fixed" it since it's some kind of udev race. Have you tried a reboot yet?

Yup, pretty much had to; until I rebooted, I was unable to re-import the pool by-id; I got the same "no such pool or dataset" message, IIRC.

The export seemed to work fine - based merely on a quick glance at the strings in zpool.cache, it seemed to contain all the expected "by-id" drive identifiers. But the pool wouldn't import by-id.

After the reboot, the pool could be exported and imported as expected.

I went through a similar exercise with 6.3.2 or thereabouts (because I originally built the pool with the drive identifiers). I didn't see this problem then; a straight export/import took care of it.

The only difference (other than the ZFS version) is that the first time, all the drives were already loaded. This time around, I (initially) populated only the necessary slots in the hot-swap bays to build the pool, by hot-plugging the drives; in retrospect, a udev race would make a lot of sense...
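
A minimal sketch of the "straight export/import" step mentioned above, using the pool name from this thread; the -d option tells zpool import which directory to scan for devices, so the vdevs come back under their by-id names:

[root]# zpool export data
[root]# zpool import -d /dev/disk/by-id data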

@ilovezfs
Contributor

ilovezfs commented Jan 1, 2016

> The export seemed to work fine - based merely on a quick glance at the strings in zpool.cache, it seemed to contain all the expected "by-id" drive identifiers. But the pool wouldn't import by-id.

As an aside, in the future, to dump zpool.cache in human-readable format, just run zdb with no arguments.

Also, it's worth mentioning that your description of the zpool.cache contents and the idea that "export worked fine" contradict each other. Upon any export, the zpool.cache file is rewritten with all the information regarding the just exported pool expunged.
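
A minimal illustration of the zdb suggestion above (pool name data taken from this thread; the actual output varies, so it is omitted here):

[root]# zdb                    # dumps the cached config from /etc/zfs/zpool.cache
[root]# zpool export data
[root]# zdb                    # the just-exported pool should no longer be listed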

@dasjoe
Contributor

dasjoe commented Mar 14, 2016

Duplicate of #3708.

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At that point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  The function waits until the
registered system device acknowledges that it is fully initialized.
Once fully initialized, all device links are checked and allowed
to settle for 50ms.  This makes it far more certain that all the
device nodes will exist when the kernel modules need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was implemented which includes a settle time.  In addition, the
kernel modules were updated to include retry logic for the ENOENT
case.  Due to the improved checks in the utilities it is unlikely
this logic will be invoked; however, in the rare event it is
needed it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3708
Issue openzfs#4077
Issue openzfs#4144
Issue openzfs#4214
Issue openzfs#4517
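
The commit message above amounts to: after partitioning, do not assume the by-id links already exist; wait for udev to finish processing and give the links a moment to settle before opening the device. A rough shell-level analogue of that idea, using udevadm rather than libudev (these commands are illustrative and are not the actual zpool_label_disk_wait() change):

[root]# udevadm settle --timeout=30              # wait for udev to drain its event queue
[root]# udevadm info --query=property --name=/dev/sde | grep ID_SERIAL
[root]# ls -l /dev/disk/by-id/ | grep sde        # confirm the by-id links are in place before relying on them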
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 22, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many
links must be created, the complexity of the rules, etc.  Complicating
the situation further, it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At that point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  The function waits until the
registered system device acknowledges that it is fully initialized.
Once fully initialized, all device links are checked and allowed
to settle for 50ms.  This makes it far more likely that all the
device nodes will exist when the kernel modules need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed
it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
ryao pushed a commit to ClusterHQ/zfs that referenced this issue Jun 7, 2016