Fix /etc/hostid on root pool deadlock #9285

Merged: behlendorf merged 1 commit into openzfs:master on Sep 10, 2019

Conversation

behlendorf (Contributor) commented Sep 4, 2019

Motivation and Context

Resolve a regression reported in #9256.

Description

Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify(). A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool. This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer, for example when performing
a `zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property. The cached value
is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock
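
To make the fix concrete, here is a minimal C sketch of the caching
approach described above. This is an illustration rather than the
verbatim patch: the spa_hostid field name and the spa_activate_hostid()
helper are assumptions, while zone_get_hostid() is the SPL routine that
appears in the stack traces later in this thread.

    /*
     * Minimal sketch (assumed names, not the verbatim patch).  The
     * hostid is read once, while no spa_config_lock is held, and all
     * later callers use the cached copy.
     */
    typedef struct spa {
            /* ... existing spa_t fields ... */
            uint64_t        spa_hostid;     /* cached at spa_t init */
    } spa_t;

    static void
    spa_activate_hostid(spa_t *spa)         /* hypothetical helper */
    {
            /*
             * Safe point to read /etc/hostid: this thread does not
             * hold SCL_VDEV as a writer here, so the read cannot
             * deadlock even if the file lives on this same pool.
             */
            spa->spa_hostid = zone_get_hostid(NULL);
    }

    uint64_t
    spa_get_hostid(spa_t *spa)
    {
            /* No filesystem access; just return the cached value. */
            return (spa->spa_hostid);
    }

The cached value would also be refreshed when the multihost property is
modified, so enabling multihost still picks up the current hostid.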

How Has This Been Tested?

New test case added which exercises the relevant code paths
when an /etc/hostid file exists in a ZFS dataset. Additionally, all
existing mmp tests were run and pass.

Pending CI results for a full run of the test suite.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

behlendorf added the Status: Code Review Needed label on Sep 4, 2019
codecov bot commented Sep 5, 2019

Codecov Report

Merging #9285 into master will decrease coverage by 0.1%.
The diff coverage is 91.66%.


@@            Coverage Diff             @@
##           master    #9285      +/-   ##
==========================================
- Coverage   79.12%   79.02%   -0.11%     
==========================================
  Files         401      401              
  Lines      122063   122066       +3     
==========================================
- Hits        96587    96462     -125     
- Misses      25476    25604     +128
Flag      Coverage Δ
#kernel   79.75% <91.66%> (ø) ⬆️
#user     66.7% <50%> (-0.07%) ⬇️

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

loli10K (Contributor) commented Sep 6, 2019

Caching the hostid seems like a nice way to handle this. I had prepared a builder with root-on-ZFS to investigate this issue, but unfortunately I am away and can't test or review this until next Monday. IIRC I had trouble reproducing the issue without a ZFS rootfs; I would like to verify whether the new test case triggers the issue on a system without the fix applied.

behlendorf (Contributor, Author) commented
No problem, whenever you get to it would be great. Thanks for making the time to look at this.

> IIRC I had trouble reproducing the issue without a ZFS rootfs,

You're not alone, I ended up running into this same issue. For some reason, I also wasn't able to reproduce the issue using the test case included in this PR but without the fix applied. I ran out of time to investigate further, but felt it was worth including the test case regardless and opening the PR for review. It would be ideal if we could tweak the test case or code in such a way as to reproduce the reported issue.

loli10K (Contributor) commented Sep 9, 2019

Found out why the reproducer does not work on my builder: when we go to read the hostid from a ZFS dataset, it may already be cached in memory. In that case we take the (db->db_state == DB_CACHED) branch in dbuf_read() and never call dbuf_read_impl() -> ... -> zfs_blkptr_verify(), which prevents the deadlock condition:

(gdb) bt
#0  dbuf_read (db=0xffff8801399455f0, zio=0xffff8800b8bcf550, flags=30) at /usr/src/zfs/module/zfs/dbuf.c:1557
#1  0xffffffffc0246ec8 in dmu_buf_hold_array_by_dnode (dn=0xffff8801399455f0, offset=<optimized out>, length=<optimized out>, read=B_FALSE, tag=<optimized out>, numbufsp=<optimized out>, 
    dbpp=0xffff880138a778a0, flags=0) at /usr/src/zfs/module/zfs/dmu.c:545
#2  0xffffffffc0249524 in dmu_read_uio_dnode (dn=<optimized out>, uio=0xffff880138a77998, size=4) at /usr/src/zfs/module/zfs/dmu.c:1330
#3  0xffffffffc0249636 in dmu_read_uio_dbuf (zdb=0xffff8800b85077e8, uio=0xffff880138a77998, size=4) at /usr/src/zfs/module/zfs/dmu.c:1395
#4  0xffffffffc0356b62 in zfs_read (ip=<optimized out>, uio=<optimized out>, ioflag=<optimized out>, cr=<optimized out>) at /usr/src/zfs/module/zfs/zfs_vnops.c:543
#5  0xffffffffc038bfca in zpl_read_common_iovec (ip=<optimized out>, iovp=<optimized out>, count=4, nr_segs=<optimized out>, ppos=0xffff880138a77ab0, segment=<optimized out>, flags=0, cr=0xffff88003679e340, 
    skip=<optimized out>) at /usr/src/zfs/module/zfs/zpl_file.c:262
#6  0xffffffffc038c188 in zpl_iter_read_common (skip=<optimized out>, seg=<optimized out>, count=<optimized out>, nr_segs=<optimized out>, iovp=<optimized out>, kiocb=<optimized out>)
    at /usr/src/zfs/module/zfs/zpl_file.c:299
#7  zpl_iter_read (kiocb=0xffff8801399455f0, to=0xffff880138a77ad0) at /usr/src/zfs/module/zfs/zpl_file.c:330
#8  0xffffffff811eebd6 in new_sync_read (ppos=<optimized out>, len=<optimized out>, buf=<optimized out>, filp=<optimized out>) at fs/read_write.c:422
#9  __vfs_read (file=0xffff880139376f00, buf=<optimized out>, count=<optimized out>, pos=0xffff880138a77b60) at fs/read_write.c:434
#10 0xffffffff811ef116 in vfs_read (file=0xffff880139376f00, buf=0xffff880138a77bf4 "", count=<optimized out>, pos=0xffff880138a77b60) at fs/read_write.c:454
#11 0xffffffffc00f34da in spl_kernel_read (pos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>) at /usr/src/zfs/module/spl/spl-vnode.c:108
#12 vn_rdwr (uio=<optimized out>, vp=<optimized out>, addr=0xffff880138a77bf4, len=4, off=<optimized out>, seg=<optimized out>, ioflag=0, x2=0, x3=0x0 <irq_stack_union>, residp=0xffff880138a77bc8)
    at /usr/src/zfs/module/spl/spl-vnode.c:303
#13 0xffffffffc00ec16c in kobj_read_file (file=<optimized out>, buf=<optimized out>, size=4, off=<optimized out>) at /usr/src/zfs/module/spl/spl-kobj.c:64
#14 0xffffffffc00e8b58 in hostid_read (hostid=<optimized out>) at /usr/src/zfs/module/spl/spl-generic.c:588
#15 zone_get_hostid (zone=<optimized out>) at /usr/src/zfs/module/spl/spl-generic.c:615
#16 0xffffffffc02e69d0 in spa_get_hostid () at /usr/src/zfs/module/zfs/spa_misc.c:2569
#17 0xffffffffc02db4d6 in spa_config_generate (spa=0xffff8801399455f0, vd=0xffff8800b1b34000, txg=30, getstats=0) at /usr/src/zfs/module/zfs/spa_config.c:460
#18 0xffffffffc02d41d3 in spa_open_common (pool=<optimized out>, spapp=0xffff8800b8bcf550, tag=<optimized out>, nvpolicy=<optimized out>, config=0xffff8801399455f0) at /usr/src/zfs/module/zfs/spa.c:5089
#19 0xffffffffc02d4697 in spa_get_stats (name=0xffff88013a3f0000 "testpool", config=0xffff880138a77dc8, altroot=0xffff88013a3f1030 "", buflen=<optimized out>) at /usr/src/zfs/module/zfs/spa.c:5354
#20 0xffffffffc033f929 in zfs_ioc_pool_stats (zc=0xffff88013a3f0000) at /usr/src/zfs/module/zfs/zfs_ioctl.c:1677
#21 0xffffffffc03494a3 in zfsdev_ioctl (filp=<optimized out>, cmd=<optimized out>, arg=<optimized out>) at /usr/src/zfs/module/zfs/zfs_ioctl.c:7502
#22 0xffffffff81202d74 in vfs_ioctl (arg=<optimized out>, cmd=<optimized out>, filp=<optimized out>) at fs/ioctl.c:43
#23 do_vfs_ioctl (filp=0xffff880139376a00, fd=<optimized out>, cmd=<optimized out>, arg=140737391931840) at fs/ioctl.c:607
#24 0xffffffff81202fc9 in SYSC_ioctl (arg=<optimized out>, cmd=<optimized out>, fd=<optimized out>) at fs/ioctl.c:622
#25 SyS_ioctl (fd=3, cmd=23045, arg=140737391931840) at fs/ioctl.c:613
#26 0xffffffff818096f6 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:185
#27 0x00007ffffa40e7bc in ?? ()
#28 0x0000000001563f90 in ?? ()
#29 0x0000000001562060 in ?? ()
#30 0x0000000001562060 in ?? ()
#31 0x00007ffffa40b170 in ?? ()
#32 0x00007ffffa40b1c0 in ?? ()
#33 0x0000000000000202 in irq_stack_union ()
(gdb) p db->db_state
$1 = DB_CACHED
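
In other words, a heavily simplified paraphrase of the branching in
dbuf_read() (not the verbatim dbuf.c source; arguments and error
handling are simplified):

    /* Simplified paraphrase of dbuf_read(); extra arguments elided. */
    int
    dbuf_read(dmu_buf_impl_t *db, zio_t *zio, uint32_t flags)
    {
            if (db->db_state == DB_CACHED) {
                    /*
                     * Block already in memory: no physical read is
                     * issued, so zfs_blkptr_verify() and its SCL_VDEV
                     * reader grab are never reached.  A warm cache
                     * therefore masks the deadlock.
                     */
                    return (0);
            }

            /*
             * Uncached path: dbuf_read_impl() -> arc_read() ->
             * zio_read() -> zfs_blkptr_verify(), which is where the
             * hang occurs.
             */
            return (dbuf_read_impl(db, zio, flags));
    }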

Forcing primarycache=none and secondarycache=none on the dataset containing the hostid file seems to trigger the original issue reliably: zfs_blkptr_verify() is called with config_held=B_FALSE while spa->spa_config_lock[0]->scl_writer->comm == "zpool", which results in a deadlock:

(gdb) bt
#0  zfs_blkptr_verify (spa=0xffff8800b8584000, bp=0xffff880139d6b5f0, config_held=B_FALSE) at /usr/src/zfs/module/zfs/zio.c:886
#1  0xffffffffc036f3a2 in zio_read (pio=0xffff8800b80bfa98, spa=0xffff8800b8584000, bp=0xffff880139d6b5f0, data=0xffff8800b80c1598, size=<optimized out>, done=<optimized out>, private=0xffff8800b87ad438, 
    priority=ZIO_PRIORITY_SYNC_READ, flags=ZIO_FLAG_CANFAIL, zb=0xffff880139d6b5d0) at /usr/src/zfs/module/zfs/zio.c:1017
#2  0xffffffffc0217ec3 in arc_read (pio=0xffff8800b80bfa98, spa=0xffff8800b8584000, bp=0xffff880139d6b5f0, done=0xffffffffc0229200 <dbuf_read_done>, private=0xffff8800361009e0, priority=ZIO_PRIORITY_SYNC_READ, 
    zio_flags=128, arc_flags=0xffff880139d6b5cc, zb=0xffff880139d6b5d0) at /usr/src/zfs/module/zfs/arc.c:6619
#3  0xffffffffc022cdab in dbuf_read_impl (db=0xffff8800361009e0, zio=0xffff8800b80bfa98, flags=0, dblt=DLT_PARENT, tag=<optimized out>) at /usr/src/zfs/module/zfs/dbuf.c:1473
#4  0xffffffffc022d84c in dbuf_read (db=0xffff8800361009e0, zio=0xffff8800b80bfa98, flags=30) at /usr/src/zfs/module/zfs/dbuf.c:1626
#5  0xffffffffc023bec8 in dmu_buf_hold_array_by_dnode (dn=0xffff8800b8584000, offset=<optimized out>, length=<optimized out>, read=(unknown: 3087799704), tag=<optimized out>, numbufsp=<optimized out>, 
    dbpp=0xffff880139d6b7b8, flags=0) at /usr/src/zfs/module/zfs/dmu.c:545
#6  0xffffffffc023e524 in dmu_read_uio_dnode (dn=<optimized out>, uio=0xffff880139d6b8b0, size=4) at /usr/src/zfs/module/zfs/dmu.c:1330
#7  0xffffffffc023e636 in dmu_read_uio_dbuf (zdb=0xffff880139c089e0, uio=0xffff880139d6b8b0, size=4) at /usr/src/zfs/module/zfs/dmu.c:1395
#8  0xffffffffc034bb62 in zfs_read (ip=<optimized out>, uio=<optimized out>, ioflag=<optimized out>, cr=<optimized out>) at /usr/src/zfs/module/zfs/zfs_vnops.c:543
#9  0xffffffffc0380fca in zpl_read_common_iovec (ip=<optimized out>, iovp=<optimized out>, count=4, nr_segs=<optimized out>, ppos=0xffff880139d6b9c8, segment=<optimized out>, flags=0, cr=0xffff880139eec700, 
    skip=<optimized out>) at /usr/src/zfs/module/zfs/zpl_file.c:262
#10 0xffffffffc0381188 in zpl_iter_read_common (skip=<optimized out>, seg=<optimized out>, count=<optimized out>, nr_segs=<optimized out>, iovp=<optimized out>, kiocb=<optimized out>)
    at /usr/src/zfs/module/zfs/zpl_file.c:299
#11 zpl_iter_read (kiocb=0xffff8800b8584000, to=0xffff880139d6b9e8) at /usr/src/zfs/module/zfs/zpl_file.c:330
#12 0xffffffff811eebd6 in new_sync_read (ppos=<optimized out>, len=<optimized out>, buf=<optimized out>, filp=<optimized out>) at fs/read_write.c:422
#13 __vfs_read (file=0xffff8800ba4c8200, buf=<optimized out>, count=<optimized out>, pos=0xffff880139d6ba78) at fs/read_write.c:434
#14 0xffffffff811ef116 in vfs_read (file=0xffff8800ba4c8200, buf=0xffff880139d6bb0c "", count=<optimized out>, pos=0xffff880139d6ba78) at fs/read_write.c:454
#15 0xffffffffc00d04da in spl_kernel_read (pos=<optimized out>, count=<optimized out>, buf=<optimized out>, file=<optimized out>) at /usr/src/zfs/module/spl/spl-vnode.c:108
#16 vn_rdwr (uio=<optimized out>, vp=<optimized out>, addr=0xffff880139d6bb0c, len=4, off=<optimized out>, seg=<optimized out>, ioflag=0, x2=0, x3=0x0 <irq_stack_union>, residp=0xffff880139d6bae0)
    at /usr/src/zfs/module/spl/spl-vnode.c:303
#17 0xffffffffc00c916c in kobj_read_file (file=<optimized out>, buf=<optimized out>, size=4, off=<optimized out>) at /usr/src/zfs/module/spl/spl-kobj.c:64
#18 0xffffffffc00c5b58 in hostid_read (hostid=<optimized out>) at /usr/src/zfs/module/spl/spl-generic.c:588
#19 zone_get_hostid (zone=<optimized out>) at /usr/src/zfs/module/spl/spl-generic.c:615
#20 0xffffffffc02db9d0 in spa_get_hostid () at /usr/src/zfs/module/zfs/spa_misc.c:2569
#21 0xffffffffc02d04d6 in spa_config_generate (spa=0xffff8800b8584000, vd=0xffff880139568000, txg=0, getstats=-1207167592) at /usr/src/zfs/module/zfs/spa_config.c:460
#22 0xffffffffc02fbfcb in vdev_label_init (vd=0xffff880139568000, crtxg=15, reason=VDEV_LABEL_CREATE) at /usr/src/zfs/module/zfs/vdev_label.c:1089
#23 0xffffffffc02fbee2 in vdev_label_init (vd=0xffff8800b8b68000, crtxg=15, reason=VDEV_LABEL_CREATE) at /usr/src/zfs/module/zfs/vdev_label.c:973
#24 0xffffffffc02ee9fe in vdev_create (vd=0xffff8800b8b68000, txg=15, isreplacing=B_FALSE) at /usr/src/zfs/module/zfs/vdev.c:2318
#25 0xffffffffc02cd76b in spa_vdev_attach (spa=0xffff8800b8584000, guid=<optimized out>, nvroot=<optimized out>, replacing=0) at /usr/src/zfs/module/zfs/spa.c:6557
#26 0xffffffffc033519c in zfs_ioc_vdev_attach (zc=0xffff8800b87e4000) at /usr/src/zfs/module/zfs/zfs_ioctl.c:2001
#27 0xffffffffc033e4a3 in zfsdev_ioctl (filp=<optimized out>, cmd=<optimized out>, arg=<optimized out>) at /usr/src/zfs/module/zfs/zfs_ioctl.c:7502
#28 0xffffffff81202d74 in vfs_ioctl (arg=<optimized out>, cmd=<optimized out>, filp=<optimized out>) at fs/ioctl.c:43
#29 do_vfs_ioctl (filp=0xffff880036046d00, fd=<optimized out>, cmd=<optimized out>, arg=140722226413312) at fs/ioctl.c:607
#30 0xffffffff81202fc9 in SYSC_ioctl (arg=<optimized out>, cmd=<optimized out>, fd=<optimized out>) at fs/ioctl.c:622
#31 SyS_ioctl (fd=3, cmd=23054, arg=140722226413312) at fs/ioctl.c:613
#32 0xffffffff818096f6 in entry_SYSCALL_64 () at arch/x86/entry/entry_64.S:185
#33 0x0000000000000000 in ?? ()
(gdb) p spa->spa_config_lock[0]->scl_writer->comm 
$10 = "zpool\000\000)\000ot\000\000\000\000"
(gdb) fra 4
#4  0xffffffffc022d84c in dbuf_read (db=0xffff8800361009e0, zio=0xffff8800b80bfa98, flags=30) at /usr/src/zfs/module/zfs/dbuf.c:1626
1626			err = dbuf_read_impl(db, zio, flags, dblt, FTAG);
(gdb) p db->db_state
$11 = DB_READ
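
For reference, an abbreviated paraphrase of the locking that dc04a8c
introduced in zfs_blkptr_verify() (the sanity checks themselves are
elided; this is not the exact source). With config_held=B_FALSE the
reader acquisition below is where this thread parks: it already took
SCL_VDEV as a writer earlier in spa_vdev_attach(), so the reader can
never be granted.

    /* Abbreviated paraphrase of zfs_blkptr_verify(); checks elided. */
    void
    zfs_blkptr_verify(spa_t *spa, const blkptr_t *bp, boolean_t config_held)
    {
            if (!config_held)
                    spa_config_enter(spa, SCL_VDEV, bp, RW_READER);
            /* ^ self-deadlock: this thread holds the writer already */

            /* ... vdev, offset, and DVA sanity checks ... */

            if (!config_held)
                    spa_config_exit(spa, SCL_VDEV, bp);
    }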

Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify().  A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool.  This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer.  For example, to perform a
`zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property.  The cached value
is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock

Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
behlendorf added the Status: Accepted label and removed the Status: Code Review Needed label on Sep 10, 2019
behlendorf merged commit 25f06d6 into openzfs:master on Sep 10, 2019
mattmacy pushed a commit to zfsonfreebsd/ZoF that referenced this pull request Sep 10, 2019
Accidentally introduced by dc04a8c which now takes the SCL_VDEV lock
as a reader in zfs_blkptr_verify().  A deadlock can occur if the
/etc/hostid file resides on a dataset in the same pool.  This is
because reading the /etc/hostid file may occur while the caller is
holding the SCL_VDEV lock as a writer.  For example, to perform a
`zpool attach` as shown in the abbreviated stack below.

To resolve the issue we cache the system's hostid when initializing
the spa_t, or when modifying the multihost property.  The cached
value is then relied upon for subsequent accesses.

Call Trace:
    spa_config_enter+0x1e8/0x350 [zfs]
    zfs_blkptr_verify+0x33c/0x4f0 [zfs] <--- trying read lock
    zio_read+0x6c/0x140 [zfs]
    ...
    vfs_read+0xfc/0x1e0
    kernel_read+0x50/0x90
    ...
    spa_get_hostid+0x1c/0x38 [zfs]
    spa_config_generate+0x1a0/0x610 [zfs]
    vdev_label_init+0xa0/0xc80 [zfs]
    vdev_create+0x98/0xe0 [zfs]
    spa_vdev_attach+0x14c/0xb40 [zfs] <--- grabbed write lock

Reviewed-by: loli10K <ezomori.nozomu@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#9256 
Closes openzfs#9285
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 17, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 19, 2019
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 23, 2019
tonyhutter pushed a commit that referenced this pull request Sep 26, 2019
behlendorf deleted the issue-9265 branch on April 19, 2021 19:25