txg_sync blocked for more than 120s #3613

@sluitz

All ZFS reads and writes hung. The hang did not clear on its own within an hour, so I had to force a hard reset of the system. It was possibly triggered by a snapshot create or delete. A pool scrub was running when the problem occurred.

Kernel: 2.6.32-504.30.3.el6.x86_64 (CentOS 6.x)
ZFS and SPL RPMs built from git master (yesterday):
ZFS: zfs-dkms-0.6.4-164_g53b1d97.el6.noarch
SPL: spl-dkms-0.6.4-13_g37d7cd9.el6.noarch

System: Intel Atom C2750 with 32G of ECC RAM

This happened after upgrading from the git builds zfs-dkms-0.6.3-155_g7b2d78a.el6.noarch.rpm and spl-dkms-0.6.3-50_g917fef2.el6.noarch.rpm, which had run for many months (since November 2014) without any problems.
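
The package names appear to embed `git describe`-style information (base version, commits since the tag, abbreviated commit hash). If that assumption holds, the commit range between the last good build and the failing build can be recovered for a bisect. A minimal sketch; `parse_build` and `bisect_range` are hypothetical helper names, not part of ZFS or SPL:

```python
import re

# Assumed layout: <name>-<base version>-<commits since tag>_g<abbrev hash>.<dist>...
BUILD_RE = re.compile(
    r"^(?P<name>[a-z-]+?)-(?P<version>\d+(?:\.\d+)+)"
    r"-(?P<count>\d+)_g(?P<commit>[0-9a-f]+)\."
)


def parse_build(package):
    """Split a package name such as zfs-dkms-0.6.4-164_g53b1d97.el6.noarch."""
    match = BUILD_RE.match(package)
    if match is None:
        raise ValueError(f"unrecognized package name: {package!r}")
    return {
        "name": match.group("name"),
        "version": match.group("version"),
        "commits_since_tag": int(match.group("count")),
        "commit": match.group("commit"),
    }


def bisect_range(good_package, bad_package):
    """Return a 'good..bad' revision range usable with git log or git bisect."""
    good = parse_build(good_package)["commit"]
    bad = parse_build(bad_package)["commit"]
    return f"{good}..{bad}"
```

For the ZFS builds named above, `bisect_range` yields a range that can be passed to `git log --oneline` in a zfs checkout to list the commits that separate the working build from the failing one. The same applies to the two SPL builds in an spl checkout.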

Steffen

Kernel log:

INFO: task txg_sync:2417 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.30.3.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
txg_sync D 0000000000000001 0 2417 2 0x00000000
ffff8808123dd220 0000000000000046 0000000000000000 ffff8808123dd1e4
0000000000000001 ffff88082fc24300 00005ceb0411e34c ffff8800283d58c0
0000000000005750 0000000106110f28 ffff8808222ff068 ffff8808123ddfd8
Call Trace:
[] __mutex_lock_slowpath+0x96/0x210
[] mutex_lock+0x2b/0x50
[] cv_wait_common+0xb7/0x130 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] ? buf_hash_find+0x9f/0x180 [zfs]
[] __cv_wait+0x15/0x20 [spl]
[] arc_read+0xb5/0xa70 [zfs]
[] ? read_tsc+0x9/0x20
[] ? getrawmonotonic+0x34/0xb0
[] ? arc_getbuf_func+0x0/0x80 [zfs]
[] dsl_scan_visitbp+0x509/0xb60 [zfs]
[] dsl_scan_visitbp+0x324/0xb60 [zfs]
[] dsl_scan_visitbp+0x324/0xb60 [zfs]
[] dsl_scan_visitbp+0x324/0xb60 [zfs]
[] dsl_scan_visitbp+0x324/0xb60 [zfs]
[] dsl_scan_visitbp+0x324/0xb60 [zfs]
[] dsl_scan_visitbp+0x324/0xb60 [zfs]
[] ? arc_read+0x3e1/0xa70 [zfs]
[] dsl_scan_visitbp+0x83e/0xb60 [zfs]
[] dsl_scan_visitds+0xe2/0x4c0 [zfs]
[] dsl_scan_sync+0x28f/0xbc0 [zfs]
[] spa_sync+0x3c7/0xb10 [zfs]
[] ? __wake_up_common+0x59/0x90
[] ? __wake_up+0x53/0x70
[] ? read_tsc+0x9/0x20
[] txg_sync_thread+0x389/0x620 [zfs]
[] ? account_entity_enqueue+0x7e/0x90
[] ? txg_sync_thread+0x0/0x620 [zfs]
[] ? txg_sync_thread+0x0/0x620 [zfs]
[] thread_generic_wrapper+0x68/0x80 [spl]
[] ? thread_generic_wrapper+0x0/0x80 [spl]
[] kthread+0x9e/0xc0
[] child_rip+0xa/0x20
[] ? kthread+0x0/0xc0
[] ? child_rip+0x0/0x20

INFO: task zfs:25498 blocked for more than 120 seconds.
Tainted: P --------------- 2.6.32-504.30.3.el6.x86_64 #1
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
zfs D 0000000000000001 0 25498 25487 0x00000080
ffff8805c5c3fa28 0000000000000082 ffff8805c5c3f978 ffffffffa022dfce
ffff8805c5c3faf8 ffffffffa02532b7 ffff8805c5c3f998 ffffffff00000000
ffff88072165fb00 ffff88081478e800 ffff88081e9cc5f8 ffff8805c5c3ffd8
Call Trace:
[] ? dmu_buf_rele+0xe/0x10 [zfs]
[] ? dsl_dataset_snapshot_check+0x117/0x3a0 [zfs]
[] ? prepare_to_wait_exclusive+0x4e/0x80
[] cv_wait_common+0x11d/0x130 [spl]
[] ? autoremove_wake_function+0x0/0x40
[] __cv_wait+0x15/0x20 [spl]
[] txg_wait_synced+0x8b/0xd0 [zfs]
[] ? dsl_dataset_snapshot_check+0x0/0x3a0 [zfs]
[] dsl_sync_task+0x16a/0x250 [zfs]
[] ? dsl_dataset_snapshot_sync+0x0/0x1a0 [zfs]
[] ? dsl_dataset_snapshot_check+0x0/0x3a0 [zfs]
[] ? dsl_dataset_snapshot_sync+0x0/0x1a0 [zfs]
[] dsl_dataset_snapshot+0x139/0x2e0 [zfs]
[] ? nvlist_add_common+0x3eb/0x450 [znvpair]
[] ? __kmalloc_node+0x4d/0x60
[] ? spl_kmem_alloc_debug+0x9c/0x1e0 [spl]
[] ? nvlist_lookup_common+0x84/0xd0 [znvpair]
[] zfs_ioc_snapshot+0x249/0x290 [zfs]
[] zfsdev_ioctl+0x1cf/0x4d0 [zfs]
[] vfs_ioctl+0x22/0xa0
[] do_vfs_ioctl+0x84/0x580
[] sys_ioctl+0x81/0xa0
[] system_call_fastpath+0x16/0x1b
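
In the two reports above, the `zfs` snapshot ioctl is waiting in `txg_wait_synced`, while `txg_sync` itself is waiting inside `arc_read`, called from the scrub code path (`dsl_scan_*`). Frames marked with `?` are ones the kernel's unwinder is unsure about, so they can be skipped when reading the chain. A minimal sketch for reducing reports in this format to their reliable frames; `parse_hung_tasks` and `collapse` are hypothetical helper names, and the regular expressions only assume the line layout shown in this log:

```python
import re

# "INFO: task <name>:<pid> blocked for more than <N> seconds."
HEADER_RE = re.compile(
    r"INFO: task (?P<name>\S+):(?P<pid>\d+) blocked for more than \d+ seconds"
)
# "[<addr>] [? ]symbol+0xOFF/0xLEN [module]"; the address may be empty.
FRAME_RE = re.compile(
    r"^\[[^\]]*\]\s+(?P<guess>\?\s+)?(?P<symbol>[A-Za-z_][\w.]*)"
    r"\+0x[0-9a-f]+/0x[0-9a-f]+"
)


def parse_hung_tasks(log_text):
    """Return one dict per hung-task report with its reliable frames."""
    tasks = []
    current = None
    for raw in log_text.splitlines():
        line = raw.strip()
        header = HEADER_RE.search(line)
        if header:
            current = {
                "name": header.group("name"),
                "pid": int(header.group("pid")),
                "frames": [],
            }
            tasks.append(current)
            continue
        frame = FRAME_RE.match(line)
        # Keep only frames that are not marked "?" (unreliable).
        if frame and current is not None and not frame.group("guess"):
            current["frames"].append(frame.group("symbol"))
    return tasks


def collapse(frames):
    """Merge consecutive repeats, e.g. recursive dsl_scan_visitbp calls."""
    merged = []
    for symbol in frames:
        if merged and merged[-1][0] == symbol:
            merged[-1] = (symbol, merged[-1][1] + 1)
        else:
            merged.append((symbol, 1))
    return merged
```

Feeding the saved kernel log to `parse_hung_tasks` and printing `collapse(task["frames"])` for each task gives a compact, innermost-first view of where each thread is blocked.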

Labels: Type: Performance (performance improvement or performance problem)
