
zpool add fails with disks referenced by id #4077

Closed
hadees opened this issue Dec 8, 2015 · 4 comments

hadees commented Dec 8, 2015

@ryao told me to file a bug on this.

I ran the command

zpool add -f -o ashift=12 tank raidz2 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar1 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar2 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar3 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar4 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar5 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar6 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar7 /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-foobar8

It then failed with

cannot add to 'tank': one or more devices is currently unavailable

However, all the devices were available, and the command appears to have partitioned them anyway. On @dasjoe's advice I reran it with plain /dev/sd* names and it worked:

zpool add -f -o ashift=12 tank raidz2 /dev/sds /dev/sdt /dev/sdu /dev/sdv /dev/sdw /dev/sdx /dev/sdy /dev/sdz

After that I was able to export and reimport by id.
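
That is, roughly this sequence (a sketch; the exact flags may have differed, and -d just points zpool import at the by-id directory):

zpool export tank
zpool import -d /dev/disk/by-id tank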

According to @dasjoe there seems to be a race condition, but he can speak to that better than I can. Let me know if you need further information.

ryao (Contributor) commented Dec 8, 2015

From IRC:

22:59 < hadees> For some reason I can't add a new vdev to my pool, it keeps saying "cannot add to 'tank': one or more devices is currently unavailable"
22:59 < hadees> All the devices seem to be there and aren't doing anything
23:00 <+ryao> hadees: What is the command?
23:00 < hadees> zpool add -f -o ashift=12 tank raidz2 /dev/disk/by-id/foobar etc.
23:01 < hadees> ryao: obviously with my own 8 disks at the end
23:01 < hadees> these are new disks if that matters
23:01 < dasjoe> hadees: try with /dev/sdX instead of by-id/
23:02 <+ryao> hadees: This sounds like a bug. 
23:02 < dasjoe> ryao: there seems to be a race condition with udev, I hit it a few days ago
23:02 <+ryao> Which distribution is this?
23:02 < dasjoe> ryao: I *think* by-id/ gets populated too late after zfs puts a GPT on a disk
23:03 < dasjoe> ryao: there are countermeasures against this, but as far as I understand the code ZFS waits until links to /dev/sdX appear, not for -part1 to become available
23:03 < dasjoe> s,code ZFS,code in ZFS,
23:04 <+ryao> dasjoe: that would be a race and makes sense.
23:05 < hadees> dasjoe: that seems to have done it
23:06 < dasjoe> ryao: I didn't read the code very carefully due to getting distracted, so my interpretation may be wrong
23:06 < hadees> I then exported and reimported by id
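
If dasjoe's theory holds, one crude way to dodge the race by hand (a sketch, assuming the same by-id names as in the report above) is to let udev drain its event queue and confirm the -part1 links actually exist before retrying the by-id command:

udevadm settle
ls -l /dev/disk/by-id/ata-WDC_WD30EFRX-68EUZN0_WD-*-part1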

hadees (Author) commented Dec 8, 2015

Here is my zfs get all output:

NAME  PROPERTY              VALUE                  SOURCE
tank  type                  filesystem             -
tank  creation              Thu Mar  5  0:27 2015  -
tank  used                  29.5T                  -
tank  available             15.4T                  -
tank  referenced            29.5T                  -
tank  compressratio         1.00x                  -
tank  mounted               yes                    -
tank  quota                 none                   default
tank  reservation           none                   default
tank  recordsize            128K                   default
tank  mountpoint            /mnt/raid              local
tank  sharenfs              off                    local
tank  checksum              on                     default
tank  compression           off                    local
tank  atime                 on                     default
tank  devices               on                     default
tank  exec                  on                     default
tank  setuid                on                     default
tank  readonly              off                    default
tank  zoned                 off                    default
tank  snapdir               hidden                 default
tank  aclinherit            restricted             default
tank  canmount              on                     default
tank  xattr                 on                     default
tank  copies                1                      default
tank  version               5                      -
tank  utf8only              off                    -
tank  normalization         none                   -
tank  casesensitivity       sensitive              -
tank  vscan                 off                    default
tank  nbmand                off                    default
tank  sharesmb              off                    default
tank  refquota              none                   default
tank  refreservation        none                   default
tank  primarycache          all                    default
tank  secondarycache        all                    default
tank  usedbysnapshots       0                      -
tank  usedbydataset         29.5T                  -
tank  usedbychildren        2.69G                  -
tank  usedbyrefreservation  0                      -
tank  logbias               latency                default
tank  dedup                 off                    default
tank  mlslabel              none                   default
tank  sync                  standard               default
tank  refcompressratio      1.00x                  -
tank  written               29.5T                  -
tank  logicalused           29.4T                  -
tank  logicalreferenced     29.4T                  -
tank  filesystem_limit      none                   default
tank  snapshot_limit        none                   default
tank  filesystem_count      none                   default
tank  snapshot_count        none                   default
tank  snapdev               hidden                 default
tank  acltype               off                    default
tank  context               none                   default
tank  fscontext             none                   default
tank  defcontext            none                   default
tank  rootcontext           none                   default
tank  relatime              on                     temporary
tank  redundant_metadata    all                    default
tank  overlay               off                    default

dasjoe (Contributor) commented Mar 14, 2016

Actually, duplicate of #3708.

behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 20, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more certain
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was implemented which includes a settle time.  In addition, the
kernel modules were updated to include retry logic for this
ENOENT case.  Due to the improved checks in the utilities it
is unlikely this logic will be invoked; however, in the rare
event it is needed, it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Issue openzfs#3708
Issue openzfs#4077
Issue openzfs#4144
Issue openzfs#4214
Issue openzfs#4517
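
For anyone who wants to watch the link churn described in this commit message on a live system, udev's event monitor prints each block-device event as the rules are processed (an illustration only; run it in a second terminal while repeating the failing zpool add):

udevadm monitor --udev --subsystem-match=block
# the partition change events arrive here while the matching
# /dev/disk/by-id/...-part1 links are removed and recreated
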
behlendorf added a commit to behlendorf/zfs that referenced this issue Apr 22, 2016
nedbass pushed a commit to nedbass/zfs that referenced this issue May 6, 2016
When ZFS partitions a block device it must wait for udev to create
both a device node and all the device symlinks.  This process takes
a variable length of time and depends on factors such as how many links
must be created, the complexity of the rules, etc.  Complicating
the situation further it is not uncommon for udev to create and
then remove a link multiple times while processing the udev rules.

Given the above, the existing scheme of waiting for an expected
partition to appear by name isn't 100% reliable.  At this point
udev may still remove and recreate the link, resulting in the
kernel modules being unable to open the device.

In order to address this the zpool_label_disk_wait() function
has been updated to use libudev.  Until the registered system
device acknowledges that it is fully initialized, the function
will wait.  Once fully initialized all device links are checked
and allowed to settle for 50ms.  This makes it far more likely
that all the device nodes will exist when the kernel modules
need to open them.

For systems without libudev an alternate zpool_label_disk_wait()
was updated to include a settle time.  In addition, the kernel
modules were updated to include retry logic for this ENOENT case.
Due to the improved checks in the utilities it is unlikely this
logic will be invoked.  However, in the rare event it is needed,
it will prevent a failure.

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tony Hutter <hutter2@llnl.gov>
Signed-off-by: Richard Laager <rlaager@wiktel.com>
Closes openzfs#4523
Closes openzfs#3708
Closes openzfs#4077
Closes openzfs#4144
Closes openzfs#4214
Closes openzfs#4517
ryao pushed a commit to ClusterHQ/zfs that referenced this issue Jun 7, 2016

jonxor commented Jan 27, 2017

I believe this may have been re-introduced as of 0.6.5.6-0ubuntu8 on Ubuntu 16.04.
I have a raidz2 array where a member failed. When I attempt to replace it, the command creates the partitions but then fails with "no such pool or dataset".
I also tried referencing the old disk by GUID, which hit the same issue but then worked after a few tries.

This can be reproduced by running "zpool replace -f poolname oldmember /dev/disk/by-id/newmember". It does not appear to occur when run without the -f flag. This means, however, that we must manually clear out the partitions and EFI labels that the first zpool replace attempt put on the drive.
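
A minimal sketch of that manual cleanup, assuming the half-prepared disk is /dev/sdX (a hypothetical name, so double-check the target first; sgdisk comes from the gdisk package):

zpool labelclear -f /dev/sdX1   # clear any ZFS label left on the data partition
sgdisk --zap-all /dev/sdX       # wipe the GPT/EFI label structures from the disk
udevadm settle                  # let udev catch up before retrying zpool replace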
