deadlock running zpool attach or zpool replace #9256

Closed
gyakovlev opened this issue Aug 30, 2019 · 4 comments

Labels
Type: Defect (Incorrect behavior, e.g. crash, hang)

Comments


gyakovlev commented Aug 30, 2019

System information

Type                 | Version/Name
---------------------|---------------------
Distribution Name    | Gentoo
Distribution Version | ~ppc64le
Linux Kernel         | 5.2.10-gentoo
Architecture         | ppc64le
ZFS Version          | 0.8.0-217_ge6cebbf8
SPL Version          | 0.8.0-217_ge6cebbf8

Describe the problem you're observing

The system is ppc64le, running with 64k pages and 512G of memory.

I have a mirrored root pool consisting of two SATA SSDs, and a couple of NVMe drives I'd like to attach to the pool.
After attaching the NVMe devices, the plan is to remove the SATA devices from the pool:

zpool attach zroot /dev/sda3 /dev/nvme0n1p3
zpool attach zroot /dev/sdb3 /dev/nvme1n1p3
zpool remove zroot /dev/sda3
zpool remove zroot /dev/sdb3

But as soon as I run zpool attach zroot /dev/sda3 /dev/nvme0n1p3 (not the actual command; I use by-id links, simplified for readability), all I/O on the system hangs.
I can still write to non-ZFS filesystems, but eventually every read hangs, because the system root is on that pool.

Describe how to reproduce the problem

Running zpool attach zroot /dev/sda3 /dev/nvme0n1p3 or
zpool replace zroot /dev/old /dev/new is enough to completely deadlock the system.

Include any warning/errors/backtraces from the system logs

Nothing ends up in zpool history.

NAME                                                       USED  AVAIL     REFER  MOUNTPOINT
zroot                                                     18.5G   105G       96K  none

The pool is at the 0.8.1 feature set; nothing newer is enabled.

NAME   PROPERTY                       VALUE                          SOURCE
zroot  size                           127G                           -
zroot  capacity                       14%                            -
zroot  altroot                        -                              default
zroot  health                         ONLINE                         -
zroot  guid                           14075524484036841250           -
zroot  version                        -                              default
zroot  bootfs                         -                              default
zroot  delegation                     on                             default
zroot  autoreplace                    off                            default
zroot  cachefile                      -                              default
zroot  failmode                       wait                           default
zroot  listsnapshots                  off                            default
zroot  autoexpand                     off                            default
zroot  dedupratio                     1.00x                          -
zroot  free                           108G                           -
zroot  allocated                      18.5G                          -
zroot  readonly                       off                            -
zroot  ashift                         12                             local
zroot  comment                        -                              default
zroot  expandsize                     -                              -
zroot  freeing                        0                              -
zroot  fragmentation                  14%                            -
zroot  leaked                         0                              -
zroot  multihost                      off                            default
zroot  checkpoint                     -                              -
zroot  load_guid                      18211364347063391026           -
zroot  autotrim                       on                             local
zroot  feature@async_destroy          enabled                        local
zroot  feature@empty_bpobj            active                         local
zroot  feature@lz4_compress           active                         local
zroot  feature@multi_vdev_crash_dump  enabled                        local
zroot  feature@spacemap_histogram     active                         local
zroot  feature@enabled_txg            active                         local
zroot  feature@hole_birth             active                         local
zroot  feature@extensible_dataset     active                         local
zroot  feature@embedded_data          active                         local
zroot  feature@bookmarks              enabled                        local
zroot  feature@filesystem_limits      enabled                        local
zroot  feature@large_blocks           enabled                        local
zroot  feature@large_dnode            active                         local
zroot  feature@sha512                 enabled                        local
zroot  feature@skein                  enabled                        local
zroot  feature@edonr                  enabled                        local
zroot  feature@userobj_accounting     active                         local
zroot  feature@encryption             enabled                        local
zroot  feature@project_quota          active                         local
zroot  feature@device_removal         enabled                        local
zroot  feature@obsolete_counts        enabled                        local
zroot  feature@zpool_checkpoint       enabled                        local
zroot  feature@spacemap_v2            active                         local
zroot  feature@allocation_classes     enabled                        local
zroot  feature@resilver_defer         enabled                        local
zroot  feature@bookmark_v2            enabled                        local
zroot  feature@redaction_bookmarks    disabled                       local
zroot  feature@redacted_datasets      disabled                       local
zroot  feature@bookmark_written       disabled                       local
zroot  feature@log_spacemap           disabled                       local
zroot  feature@livelist               disabled                       local
INFO: task txg_sync:2541 blocked for more than 122 seconds.
      Tainted: P           O    T 5.2.10-gentoo #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync        D    0  2540      2 0x00000808
Call Trace:
[c000003fce40b5c0] [c000003fce40b610] 0xc000003fce40b610 (unreliable)
[c000003fce40b7a0] [c00000000001ec9c] __switch_to+0x2ec/0x460
[c000003fce40b800] [c0000000007850f0] __schedule+0x230/0x640
[c000003fce40b8c0] [c00000000078553c] schedule+0x3c/0x100
[c000003fce40b8f0] [c00800000dd90694] cv_wait_common+0x23c/0x450 [spl]
[c000003fce40b9c0] [c008000013430480] spa_config_enter+0x1e8/0x350 [zfs]
[c000003fce40ba80] [c00800001343a138] spa_txg_history_fini_io+0x70/0x348 [zfs]
[c000003fce40bb80] [c00800001344090c] txg_sync_thread+0x484/0x670 [zfs]
[c000003fce40bd20] [c00800000dd9f748] thread_generic_wrapper+0xb0/0x130 [spl]
[c000003fce40bdb0] [c0000000000e94ac] kthread+0x18c/0x1a0
[c000003fce40be20] [c00000000000bc94] ret_from_kernel_thread+0x5c/0x68
INFO: task mmp:2541 blocked for more than 122 seconds.
      Tainted: P           O    T 5.2.10-gentoo #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
mmp             D    0  2541      2 0x00000800
Call Trace:
[c000003fce41b870] [c00000000001ec9c] __switch_to+0x2ec/0x460
[c000003fce41b8d0] [c0000000007850f0] __schedule+0x230/0x640
[c000003fce41b990] [c00000000078553c] schedule+0x3c/0x100
[c000003fce41b9c0] [c00800000dd90694] cv_wait_common+0x23c/0x450 [spl]
[c000003fce41ba90] [c008000013430480] spa_config_enter+0x1e8/0x350 [zfs]
[c000003fce41bb50] [c008000013445490] vdev_count_leaves+0x38/0x80 [zfs]
[c000003fce41bb90] [c0080000133fb398] mmp_thread+0x370/0xab0 [zfs]
[c000003fce41bd20] [c00800000dd9f748] thread_generic_wrapper+0xb0/0x130 [spl]
[c000003fce41bdb0] [c0000000000e94ac] kthread+0x18c/0x1a0
[c000003fce41be20] [c00000000000bc94] ret_from_kernel_thread+0x5c/0x68
INFO: task zed:7338 blocked for more than 122 seconds.
      Tainted: P           O    T 5.2.10-gentoo #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zed             D    0  7338      1 0x00040000
Call Trace:
[c000003f741878d0] [c00000000001ec9c] __switch_to+0x2ec/0x460
[c000003f74187930] [c0000000007850f0] __schedule+0x230/0x640
[c000003f741879f0] [c00000000078553c] schedule+0x3c/0x100
[c000003f74187a20] [c000000000785ae8] schedule_preempt_disabled+0x18/0x30
[c000003f74187a40] [c000000000787d6c] __mutex_lock.isra.0+0x2dc/0x710
[c000003f74187ae0] [c008000013428674] spa_all_configs+0x7c/0x260 [zfs]
[c000003f74187b80] [c0080000134b4030] zfs_ioc_pool_configs+0x28/0xd0 [zfs]
[c000003f74187bb0] [c0080000134bcef4] zfsdev_ioctl+0xb9c/0xf90 [zfs]
[c000003f74187d00] [c0000000002ee53c] do_vfs_ioctl+0x9ac/0xc60
[c000003f74187db0] [c0000000002ee8a4] ksys_ioctl+0xb4/0x100
[c000003f74187e00] [c0000000002ee910] sys_ioctl+0x20/0x80
[c000003f74187e20] [c00000000000b8ac] system_call+0x5c/0x70
INFO: task zpool:128406 blocked for more than 122 seconds.
      Tainted: P           O    T 5.2.10-gentoo #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool           D    0 128406 126688 0x00040008
Call Trace:
[c000002e0b5e6730] [c00000000001ec9c] __switch_to+0x2ec/0x460
[c000002e0b5e6790] [c0000000007850f0] __schedule+0x230/0x640
[c000002e0b5e6850] [c00000000078553c] schedule+0x3c/0x100
[c000002e0b5e6880] [c00800000dd90694] cv_wait_common+0x23c/0x450 [spl]
[c000002e0b5e6950] [c008000013430480] spa_config_enter+0x1e8/0x350 [zfs]
[c000002e0b5e6a10] [c0080000134f07e4] zfs_blkptr_verify+0x33c/0x4f0 [zfs]
[c000002e0b5e6ab0] [c0080000134f8fc4] zio_read+0x6c/0x140 [zfs]
[c000002e0b5e6ba0] [c008000013338688] arc_read+0x8d0/0x20b0 [zfs]
[c000002e0b5e6d00] [c008000013353c84] dbuf_read_impl.constprop.0+0x2dc/0xea0 [zfs]
[c000002e0b5e6e80] [c008000013354afc] dbuf_read+0x2b4/0x7f0 [zfs]
[c000002e0b5e6f60] [c008000013367aac] dmu_buf_hold_array_by_dnode+0x194/0x710 [zfs]
[c000002e0b5e7050] [c00800001336a158] dmu_read_uio_dnode+0x70/0x1c0 [zfs]
[c000002e0b5e7110] [c00800001336a324] dmu_read_uio_dbuf+0x7c/0xc0 [zfs]
[c000002e0b5e7150] [c0080000134d9b28] zfs_read+0x170/0x5e0 [zfs]
[c000002e0b5e7240] [c00800001350e964] zpl_read_common_iovec+0xac/0x1d0 [zfs]
[c000002e0b5e7320] [c00800001350eb9c] zpl_iter_read+0x114/0x1e0 [zfs]
[c000002e0b5e7400] [c0000000002cc6a4] new_sync_read+0x164/0x1f0
[c000002e0b5e74b0] [c0000000002cf66c] vfs_read+0xfc/0x1e0
[c000002e0b5e7500] [c0000000002cf7a0] kernel_read+0x50/0x90
[c000002e0b5e7530] [c00800000dda2934] vn_rdwr+0x10c/0x210 [spl]
[c000002e0b5e75d0] [c00800000dd97448] kobj_read_file+0x60/0xc0 [spl]
[c000002e0b5e7660] [c00800000dd921ac] zone_get_hostid+0x104/0x180 [spl]
[c000002e0b5e76f0] [c008000013438844] spa_get_hostid+0x1c/0x38 [zfs]
[c000002e0b5e7710] [c0080000134289f8] spa_config_generate+0x1a0/0x610 [zfs]
[c000002e0b5e77e0] [c008000013464210] vdev_label_init+0x1b8/0xc80 [zfs]
[c000002e0b5e7910] [c0080000134640f8] vdev_label_init+0xa0/0xc80 [zfs]
[c000002e0b5e7a40] [c008000013452490] vdev_create+0x98/0xe0 [zfs]
[c000002e0b5e7a80] [c0080000134247e4] spa_vdev_attach+0x14c/0xb40 [zfs]
[c000002e0b5e7b50] [c0080000134b0404] zfs_ioc_vdev_attach+0xec/0x120 [zfs]
[c000002e0b5e7bb0] [c0080000134bcef4] zfsdev_ioctl+0xb9c/0xf90 [zfs]
[c000002e0b5e7d00] [c0000000002ee53c] do_vfs_ioctl+0x9ac/0xc60
[c000002e0b5e7db0] [c0000000002ee8a4] ksys_ioctl+0xb4/0x100
[c000002e0b5e7e00] [c0000000002ee910] sys_ioctl+0x20/0x80
[c000002e0b5e7e20] [c00000000000b8ac] system_call+0x5c/0x70
@gyakovlev

Just to add some info: at the time of the attempt the system is 99.999% idle, with no serious I/O beyond the usual background daemons (syslog, cron, etc.).
The system has 44 cores, 176 threads.

The error triggers every time; it is very reproducible.
If I create a pool on the NVMe drives, it operates normally.
I will try attaching devices to a non-root pool and see if it hangs.


loli10K commented Sep 1, 2019

It seems this was accidentally introduced with dc04a8c: we take a read lock in zfs_blkptr_verify() while already holding a write lock taken in spa_vdev_attach() -> spa_vdev_enter() -> spa_vdev_config_enter():

INFO: task zpool:128406 blocked for more than 122 seconds.
      Tainted: P           O    T 5.2.10-gentoo #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zpool           D    0 128406 126688 0x00040008
Call Trace:
[c000002e0b5e6730] [c00000000001ec9c] __switch_to+0x2ec/0x460
[c000002e0b5e6790] [c0000000007850f0] __schedule+0x230/0x640
[c000002e0b5e6850] [c00000000078553c] schedule+0x3c/0x100
[c000002e0b5e6880] [c00800000dd90694] cv_wait_common+0x23c/0x450 [spl]
[c000002e0b5e6950] [c008000013430480] spa_config_enter+0x1e8/0x350 [zfs]
[c000002e0b5e6a10] [c0080000134f07e4] zfs_blkptr_verify+0x33c/0x4f0 [zfs]   <--- trying read lock
[c000002e0b5e6ab0] [c0080000134f8fc4] zio_read+0x6c/0x140 [zfs]
[c000002e0b5e6ba0] [c008000013338688] arc_read+0x8d0/0x20b0 [zfs]
[c000002e0b5e6d00] [c008000013353c84] dbuf_read_impl.constprop.0+0x2dc/0xea0 [zfs]
[c000002e0b5e6e80] [c008000013354afc] dbuf_read+0x2b4/0x7f0 [zfs]
[c000002e0b5e6f60] [c008000013367aac] dmu_buf_hold_array_by_dnode+0x194/0x710 [zfs]
[c000002e0b5e7050] [c00800001336a158] dmu_read_uio_dnode+0x70/0x1c0 [zfs]
[c000002e0b5e7110] [c00800001336a324] dmu_read_uio_dbuf+0x7c/0xc0 [zfs]
[c000002e0b5e7150] [c0080000134d9b28] zfs_read+0x170/0x5e0 [zfs]
[c000002e0b5e7240] [c00800001350e964] zpl_read_common_iovec+0xac/0x1d0 [zfs]
[c000002e0b5e7320] [c00800001350eb9c] zpl_iter_read+0x114/0x1e0 [zfs]
[c000002e0b5e7400] [c0000000002cc6a4] new_sync_read+0x164/0x1f0
[c000002e0b5e74b0] [c0000000002cf66c] vfs_read+0xfc/0x1e0
[c000002e0b5e7500] [c0000000002cf7a0] kernel_read+0x50/0x90
[c000002e0b5e7530] [c00800000dda2934] vn_rdwr+0x10c/0x210 [spl]
[c000002e0b5e75d0] [c00800000dd97448] kobj_read_file+0x60/0xc0 [spl]
[c000002e0b5e7660] [c00800000dd921ac] zone_get_hostid+0x104/0x180 [spl]
[c000002e0b5e76f0] [c008000013438844] spa_get_hostid+0x1c/0x38 [zfs]
[c000002e0b5e7710] [c0080000134289f8] spa_config_generate+0x1a0/0x610 [zfs]
[c000002e0b5e77e0] [c008000013464210] vdev_label_init+0x1b8/0xc80 [zfs]
[c000002e0b5e7910] [c0080000134640f8] vdev_label_init+0xa0/0xc80 [zfs]
[c000002e0b5e7a40] [c008000013452490] vdev_create+0x98/0xe0 [zfs]
[c000002e0b5e7a80] [c0080000134247e4] spa_vdev_attach+0x14c/0xb40 [zfs]    <--- grabbed write lock
[c000002e0b5e7b50] [c0080000134b0404] zfs_ioc_vdev_attach+0xec/0x120 [zfs]
[c000002e0b5e7bb0] [c0080000134bcef4] zfsdev_ioctl+0xb9c/0xf90 [zfs]
[c000002e0b5e7d00] [c0000000002ee53c] do_vfs_ioctl+0x9ac/0xc60
[c000002e0b5e7db0] [c0000000002ee8a4] ksys_ioctl+0xb4/0x100
[c000002e0b5e7e00] [c0000000002ee910] sys_ioctl+0x20/0x80
[c000002e0b5e7e20] [c00000000000b8ac] system_call+0x5c/0x70
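
In other words, the `zpool attach` thread already holds the pool config lock as a writer, then ends up reading /etc/hostid from a dataset in that same pool while generating the vdev label; the read path requests the same lock as a reader and waits forever. Below is a minimal C sketch of that lock behavior; the type and function names are simplified stand-ins, not the actual OpenZFS spa_config_lock_t code.

```c
/*
 * Simplified illustration of the self-deadlock; names and fields are
 * stand-ins, not the real OpenZFS spa_config_lock_t implementation.
 */
#include <pthread.h>
#include <stdbool.h>

typedef struct {
	pthread_mutex_t mtx;
	pthread_cond_t  cv;
	bool            writer_held;   /* write side taken (e.g. by the attach path) */
	pthread_t       writer;
	int             readers;
} cfg_lock_t;

/* Write side, as taken via spa_vdev_enter() before attaching the new vdev. */
void
cfg_enter_writer(cfg_lock_t *cl)
{
	pthread_mutex_lock(&cl->mtx);
	while (cl->writer_held || cl->readers > 0)
		pthread_cond_wait(&cl->cv, &cl->mtx);
	cl->writer_held = true;
	cl->writer = pthread_self();
	pthread_mutex_unlock(&cl->mtx);
}

/*
 * Read side, as requested by zfs_blkptr_verify() when the hostid lookup
 * turns into a ZFS read on the same pool.  The wait condition only asks
 * "is any writer present", not "am I that writer", so the attach thread
 * blocks on itself and never wakes up.
 */
void
cfg_enter_reader(cfg_lock_t *cl)
{
	pthread_mutex_lock(&cl->mtx);
	while (cl->writer_held)         /* true even when writer == pthread_self() */
		pthread_cond_wait(&cl->cv, &cl->mtx);
	cl->readers++;
	pthread_mutex_unlock(&cl->mtx);
}
```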

@gyakovlev you should be able to work around this temporarily by booting with spl.spl_hostid=<your-hostid> added to the kernel cmdline.


gyakovlev commented Sep 1, 2019

@loli10K I confirm the workaround works; I was able to attach the partition normally after booting with spl.spl_hostid=0x<id>.
Thanks!

behlendorf added the Type: Defect label Sep 3, 2019
@behlendorf

@loli10K thanks for looking into this. I've proposed a fix for the issue in #9285. @gyakovlev if it's not too much to ask, it would be great if you could verify it resolves the issue without the need for the suggested workaround.

mattmacy pushed a commit to zfsonfreebsd/ZoF that referenced this issue Sep 10, 2019
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify().  A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool.  This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer.  For example, to perform a
`zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property.  The cached
value is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#9256 
Closes openzfs#9285
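
The commit above caches the hostid on the spa_t so that later lookups never have to read /etc/hostid while the config lock is held. A rough sketch of that approach follows; field and function names are illustrative, not necessarily those used in PR #9285.

```c
/*
 * Rough sketch of the caching approach described in the commit message.
 * Names are illustrative; the real change is in openzfs/zfs PR #9285.
 */
#include <stdint.h>

/* SPL helper visible in the stack trace above; it may read /etc/hostid. */
extern uint32_t zone_get_hostid(void *zone);

typedef struct spa {
	/* ... existing pool state ... */
	uint32_t spa_hostid;    /* cached copy of the system hostid */
} spa_t;

/*
 * Called with no config lock held, when the spa_t is set up or when the
 * multihost property is changed, so the file read cannot deadlock.
 */
static void
spa_cache_hostid(spa_t *spa)
{
	spa->spa_hostid = zone_get_hostid(NULL);
}

/*
 * Later callers such as spa_config_generate() and the MMP thread use the
 * cached value instead of re-reading /etc/hostid under the config lock.
 */
uint32_t
spa_get_hostid(spa_t *spa)
{
	return (spa->spa_hostid);
}
```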
tonyhutter pushed commits to tonyhutter/zfs that referenced this issue on Sep 17, Sep 18, Sep 19, and Sep 23, 2019 (same commit message as above).
tonyhutter pushed a commit that referenced this issue Sep 26, 2019 (same commit message as above; Closes #9256, Closes #9285).