Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix: segmentation fault of write throttle while import #1

Closed
wants to merge 2 commits into from

Conversation

GeLiXin
Copy link
Owner

@GeLiXin GeLiXin commented May 16, 2016

The issue
Async writes may be issued before the pool finish the process of initialization, which means we may access a NULL point of spa->spa_dsl_pool in function vdev_queue_max_async_writes illegality. typically this happens when the self-healing zio was issued by mirror while importing.

The analysis
Pool always imported with flag of FWRITE | FREAD, when we open the meta objset with ZIO error in one of our multi-DVAS(Both silent and “noisy” data corruption), the logical vdev which have mirror ops will try to use the good data it have in hand to repair damaged children. When the self-healing write zio issues to the leaf vdev, it encount the write throttle strategy which demand the statistics of dirty data in spa->spa_dsl_pool->dp_dirty_total. Unfortunately, the pointer of spa->spa_dsl_pool will keep NULL until we open the meta objset success, and then, segmentation fault comes up.

stack below will be help to understand what I say(maybe different from the newest version):
Thread 126 (Thread 4542):
#0 0x00007f5008fa016c in pthread_cond_wait@@GLIBC_2.3.2 () from /lib/libpthread.so.0
#1 0x00007f500e069bae in cv_wait () from /version/lib/lib_naslfs_dll.so
#2 0x00007f500e19ab53 in zio_wait () from /version/lib/lib_naslfs_dll.so
#3 0x00007f500e1218b0 in arc_read() from /version/lib/lib_naslfs_dll.so
#4 0x00007f500e12fda9 in dmu_objset_open_impl () from /version/lib/lib_naslfs_dll.so
#5 0x00007f500e150762 in dsl_pool_init () from /version/lib/lib_naslfs_dll.so
openzfs#6 0x00007f500e16570f in spa_load_impl.clone.10 () from /version/lib/lib_naslfs_dll.so
openzfs#7 0x00007f500e1664bf in spa_load () from /version/lib/lib_naslfs_dll.so
openzfs#8 0x00007f500e166c77 in spa_load_best () from /version/lib/lib_naslfs_dll.so
openzfs#9 0x00007f500e167724 in spa_import () from /version/lib/lib_naslfs_dll.so

Thread 115 (Thread 4531):
#0 0x00007f5008fa3cd7 in do_sigwait () from /lib/libpthread.so.0
#1 0x00007f5008fa3d57 in sigwait () from /lib/libpthread.so.0
#2 0x00007f5009532d2c in Vos_SuspendTask () from /version/lib/lib_com_dll.so
#3 signal handler called
#4 0x00007f500e182a0e in vdev_queue_io_to_issue () from /version/lib/lib_naslfs_dll.so
#5 0x00007f500e1833a6 in vdev_queue_io () from /version/lib/lib_naslfs_dll.so
openzfs#6 0x00007f500e19c315 in zio_vdev_io_start () from /version/lib/lib_naslfs_dll.so
openzfs#7 0x00007f500e197cea in zio_execute () from /version/lib/lib_naslfs_dll.so
openzfs#8 0x00007f500e181914 in vdev_mirror_io_done () from /version/lib/lib_naslfs_dll.so
openzfs#9 0x00007f500e198475 in zio_vdev_io_done () from /version/lib/lib_naslfs_dll.so
openzfs#10 0x00007f500e197cea in zio_execute () from /version/lib/lib_naslfs_dll.so
openzfs#11 0x00007f500e19ca93 in zio_done () from /version/lib/lib_naslfs_dll.so
openzfs#12 0x00007f500e197cea in zio_execute () from /version/lib/lib_naslfs_dll.so
openzfs#13 0x00007f500e070f79 in taskq_thread () from /version/lib/lib_naslfs_dll.so

The patch
Do not ask for the dirty data statistics anymore in vdev_queue_max_async_writes before the pool pointer get initialized. Instead we push data out as fast as possible to speed up the self-healing process triggered by import.

@GeLiXin GeLiXin force-pushed the GeLiXin-patch-writethrottle branch 5 times, most recently from 1e34428 to 47a660a Compare May 21, 2016 04:05
GeLiXin added 2 commits May 21, 2016 12:07
Signed-off-by: Ge Lixin <47034221@qq.com>
@GeLiXin GeLiXin closed this Jul 22, 2016
GeLiXin pushed a commit that referenced this pull request Aug 2, 2016
DMU_MAX_ACCESS should be cast to a uint64_t otherwise the
multiplication of DMU_MAX_ACCESS with spa_asize_inflation will
be 32 bit and may lead to an overflow. Currently DMU_MAX_ACCESS
is 64 * 1024 * 1024, so spa_asize_inflation being 64 or more will
lead to an overflow.

Found by static analysis with CoverityScan 0.8.5

CID 150942 (#1 of 1): Unintentional integer overflow
  (OVERFLOW_BEFORE_WIDEN)
overflow_before_widen: Potentially overflowing expression
  67108864 * spa_asize_inflation with type int (32 bits, signed)
  is evaluated using 32-bit arithmetic, and then used in a context
  that expects an expression of type uint64_t (64 bits, unsigned).

Signed-off-by: Colin Ian King <colin.king@canonical.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#4889
GeLiXin pushed a commit that referenced this pull request Aug 2, 2016
Leaks reported by using AddressSanitizer, GCC 6.1.0

Direct leak of 4097 byte(s) in 1 object(s) allocated from:
    #1 0x414f73 in process_options cmd/ztest/ztest.c:721

Direct leak of 5440 byte(s) in 17 object(s) allocated from:
    #1 0x41bfd5 in umem_alloc ../../lib/libspl/include/umem.h:88
    #2 0x41bfd5 in ztest_zap_parallel cmd/ztest/ztest.c:4659
    #3 0x4163a8 in ztest_execute cmd/ztest/ztest.c:5907

Signed-off-by: Gvozden Neskovic <neskovic@gmail.com>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#4896
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant