filesystems disappearing on reboot #2224

Closed
ldonzis opened this issue Mar 30, 2014 · 23 comments

@ldonzis

ldonzis commented Mar 30, 2014

Sorry if this is a duplicate, but I've searched and can't find anything exactly like this. And sorry for the lack of details, but I think I can reproduce it as it's happened twice.

We have two NAS servers that are sync'ed by almost continuous replication of snapshots, and it works very well.

However, after rebooting the secondary server (the one that's running zfs recv), a couple of filesystems have vanished. (zfs list does not show them.) However, doing a zfs create on one of the missing ones does not return any error, yet the created filesystem still doesn't show up. Attempting to destroy the filesystem says that it doesn't exist. The only recovery I've found so far is to destroy the parent filesystem, then re-create the parent and children, and then do a full replication.
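For concreteness, a sketch of that recovery path with hypothetical names (pool/parent on the backup host, @snap an existing snapshot on the primary); the full replication stream re-creates the child filesystems:

# on the backup (receiving) host: remove the damaged hierarchy
zfs destroy -r pool/parent
zfs create pool/parent
# then, from the primary: re-seed with a full, non-incremental replication
zfs send -R pool/parent@snap | ssh backup zfs receive -F pool/parent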

It's entirely possible/likely that a recv was in progress when a "reboot" command was issued, which may or may not be related. If you have any suggestions for obtaining better information that would be useful, I'm willing to try it.

Thanks,
lew

@behlendorf behlendorf added the Bug label Apr 4, 2014
@behlendorf behlendorf added this to the 0.7.0 milestone Apr 4, 2014
@behlendorf
Contributor

Thanks for taking the time to open the issue so we can track this.

@ldonzis
Author

ldonzis commented Jul 12, 2014

Update... this just happened again today on reboot after upgrading from 0.6.2 to 0.6.3. Again, there was very likely a zfs recv in progress at the time. However, this time, when the system rebooted, instead of a filesystem being missing, it was renamed.

The normal filesystem name is "ds/ds2", but zfs list showed this:

ds/ds2recv-3577-1 1.39T 1.11T 589M /ds2
ds/ds2recv-3577-1/backups 491G 1.11T 491G /share/backups-noarchive
ds/ds2recv-3577-1/vm2 930G 1.11T 930G /share/vm2

I then tried "zfs rename ds/ds2recv-3577-1 ds/ds2", and that succeeded, but then only ds/ds2 existed; in other words, the subordinate filesystems ds/ds2/backups and ds/ds2/vm2 were gone:

ds/ds2 589M 2.50T 589M /ds2

and as you can see, the space used by the missing filesystems was freed up.

Then, I decided to re-create the filesystem, so I ran "zfs destroy -r ds/ds2", followed by "zfs create ds/ds2". However, at that point, "zfs list" did not show ds/ds2 at all.

At that point, running "zfs create" would appear to succeed, but the filesystem was not created. If I run "zfs create ds/ds2" twice in very rapid succession, I get:

cannot create 'ds/ds2': dataset already exists

However, if I pause more than about a second between the two zfs create's, then it appears to succeed, but the filesystem isn't created.

As before, the only apparent way to recover was to destroy the whole pool and re-create everything from scratch.
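Restated as a hypothetical command transcript (reconstructed from the behavior described above, not captured from the affected system):

zfs destroy -r ds/ds2                   # reports success
zfs create ds/ds2                       # also appears to succeed...
zfs list ds/ds2                         # ...but: cannot open 'ds/ds2': dataset does not exist
zfs create ds/ds2; zfs create ds/ds2    # back to back: the second reports "dataset already exists"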

@ldonzis
Author

ldonzis commented Jul 21, 2014

One more update. Sorry for the lack of a crash dump, but the receiving system just became inaccessible, i.e., there was no way to log into it and it was effectively "hung" while doing a recv. After pushing the hardware reset button, it came back up, but one of the ZFS filesystems is missing.

Below is the kernel log from the first indication that it was stuck.

Thanks,
lew

Jul 21 08:38:27 ss2 kernel: [472320.484011] INFO: task kswapd0:37 blocked for more than 120 seconds.
Jul 21 08:38:27 ss2 kernel: [472320.484393] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:38:27 ss2 kernel: [472320.484839] kswapd0 D 0000000000000003 0 37 2 0x00000000
Jul 21 08:38:27 ss2 kernel: [472320.484844] ffff8804131c5bb0 0000000000000046 ffff8803e4bc3cb8 ffff8803ee8c8470
Jul 21 08:38:27 ss2 kernel: [472320.484848] ffff8804131c5fd8 ffff8804131c5fd8 ffff8804131c5fd8 00000000000137c0
Jul 21 08:38:27 ss2 kernel: [472320.484852] ffff880401e4ae00 ffff880414e55c00 ffff8804131c5b90 ffff8803eea68470
Jul 21 08:38:27 ss2 kernel: [472320.484856] Call Trace:
Jul 21 08:38:27 ss2 kernel: [472320.484866] [] schedule+0x3f/0x60
Jul 21 08:38:27 ss2 kernel: [472320.484869] [] __mutex_lock_slowpath+0xd7/0x150
Jul 21 08:38:27 ss2 kernel: [472320.484872] [] mutex_lock+0x2a/0x50
Jul 21 08:38:27 ss2 kernel: [472320.484929] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
Jul 21 08:38:27 ss2 kernel: [472320.484934] [] prune_super+0x74/0x1a0
Jul 21 08:38:27 ss2 kernel: [472320.484939] [] shrink_slab+0xa9/0x300
Jul 21 08:38:27 ss2 kernel: [472320.484942] [] balance_pgdat+0x50a/0x6d0
Jul 21 08:38:27 ss2 kernel: [472320.484944] [] kswapd+0x121/0x220
Jul 21 08:38:27 ss2 kernel: [472320.484947] [] ? balance_pgdat+0x6d0/0x6d0
Jul 21 08:38:27 ss2 kernel: [472320.484951] [] kthread+0x8c/0xa0
Jul 21 08:38:27 ss2 kernel: [472320.484956] [] kernel_thread_helper+0x4/0x10
Jul 21 08:38:27 ss2 kernel: [472320.484959] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:38:27 ss2 kernel: [472320.484962] [] ? gs_change+0x13/0x13
Jul 21 08:38:27 ss2 kernel: [472320.484967] INFO: task spl_kmem_cache/:347 blocked for more than 120 seconds.
Jul 21 08:38:27 ss2 kernel: [472320.485374] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:38:27 ss2 kernel: [472320.485820] spl_kmem_cache/ D ffffffff81806280 0 347 2 0x00000000
Jul 21 08:38:27 ss2 kernel: [472320.485823] ffff880414efb800 0000000000000046 ffff880414efb840 ffffffffa00d9491
Jul 21 08:38:27 ss2 kernel: [472320.485827] ffff880414efbfd8 ffff880414efbfd8 ffff880414efbfd8 00000000000137c0
Jul 21 08:38:27 ss2 kernel: [472320.485831] ffffffff81c0d020 ffff880410490000 ffff8804153168b0 ffff8803eea68470
Jul 21 08:38:27 ss2 kernel: [472320.485834] Call Trace:
Jul 21 08:38:27 ss2 kernel: [472320.485846] [] ? spl_slab_reclaim+0x271/0x440 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485849] [] schedule+0x3f/0x60
Jul 21 08:38:27 ss2 kernel: [472320.485852] [] __mutex_lock_slowpath+0xd7/0x150
Jul 21 08:38:27 ss2 kernel: [472320.485854] [] ? wake_up_bit+0x2e/0x40
Jul 21 08:38:27 ss2 kernel: [472320.485857] [] mutex_lock+0x2a/0x50
Jul 21 08:38:27 ss2 kernel: [472320.485882] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
Jul 21 08:38:27 ss2 kernel: [472320.485885] [] prune_super+0x74/0x1a0
Jul 21 08:38:27 ss2 kernel: [472320.485887] [] shrink_slab+0xa9/0x300
Jul 21 08:38:27 ss2 kernel: [472320.485890] [] do_try_to_free_pages+0x268/0x2d0
Jul 21 08:38:27 ss2 kernel: [472320.485893] [] try_to_free_pages+0xab/0x170
Jul 21 08:38:27 ss2 kernel: [472320.485897] [] __alloc_pages_nodemask+0x59e/0x8f0
Jul 21 08:38:27 ss2 kernel: [472320.485901] [] alloc_pages_current+0xb6/0x120
Jul 21 08:38:27 ss2 kernel: [472320.485905] [] __vmalloc_area_node+0x103/0x1c0
Jul 21 08:38:27 ss2 kernel: [472320.485911] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485914] [] __vmalloc_node_range+0x86/0xd0
Jul 21 08:38:27 ss2 kernel: [472320.485920] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485923] [] __vmalloc_node+0x35/0x40
Jul 21 08:38:27 ss2 kernel: [472320.485929] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485932] [] __vmalloc+0x22/0x30
Jul 21 08:38:27 ss2 kernel: [472320.485938] [] kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485944] [] spl_cache_grow_work+0x40/0x430 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485951] [] taskq_thread+0x236/0x4b0 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485956] [] ? try_to_wake_up+0x200/0x200
Jul 21 08:38:27 ss2 kernel: [472320.485963] [] ? task_done+0x160/0x160 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.485965] [] kthread+0x8c/0xa0
Jul 21 08:38:27 ss2 kernel: [472320.485968] [] kernel_thread_helper+0x4/0x10
Jul 21 08:38:27 ss2 kernel: [472320.485971] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:38:27 ss2 kernel: [472320.485974] [] ? gs_change+0x13/0x13
Jul 21 08:38:27 ss2 kernel: [472320.485990] INFO: task txg_quiesce:2183 blocked for more than 120 seconds.
Jul 21 08:38:27 ss2 kernel: [472320.486373] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:38:27 ss2 kernel: [472320.486821] txg_quiesce D ffffffff81806280 0 2183 2 0x00000000
Jul 21 08:38:27 ss2 kernel: [472320.486824] ffff8803fdb8fd40 0000000000000046 ffff8803fdb8fde0 ffffffffa00d2973
Jul 21 08:38:27 ss2 kernel: [472320.486828] ffff8803fdb8ffd8 ffff8803fdb8ffd8 ffff8803fdb8ffd8 00000000000137c0
Jul 21 08:38:27 ss2 kernel: [472320.486832] ffff880415561700 ffff880401e84500 ffff8803fdb8fd50 ffffc90009ea8450
Jul 21 08:38:27 ss2 kernel: [472320.486835] Call Trace:
Jul 21 08:38:27 ss2 kernel: [472320.486842] [] ? spl_debug_msg+0x483/0x8e0 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.486845] [] schedule+0x3f/0x60
Jul 21 08:38:27 ss2 kernel: [472320.486853] [] cv_wait_common+0xfd/0x1b0 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.486855] [] ? add_wait_queue+0x60/0x60
Jul 21 08:38:27 ss2 kernel: [472320.486863] [] __cv_wait+0x15/0x20 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.486888] [] txg_quiesce_thread+0x24b/0x3a0 [zfs]
Jul 21 08:38:27 ss2 kernel: [472320.486913] [] ? txg_thread_wait.isra.2+0x40/0x40 [zfs]
Jul 21 08:38:27 ss2 kernel: [472320.486920] [] thread_generic_wrapper+0x78/0x90 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.486927] [] ? __thread_create+0x300/0x300 [spl]
Jul 21 08:38:27 ss2 kernel: [472320.486930] [] kthread+0x8c/0xa0
Jul 21 08:38:27 ss2 kernel: [472320.486933] [] kernel_thread_helper+0x4/0x10
Jul 21 08:38:27 ss2 kernel: [472320.486936] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:38:27 ss2 kernel: [472320.486938] [] ? gs_change+0x13/0x13
Jul 21 08:40:27 ss2 kernel: [472440.484018] INFO: task kswapd0:37 blocked for more than 120 seconds.
Jul 21 08:40:27 ss2 kernel: [472440.484378] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:40:27 ss2 kernel: [472440.484826] kswapd0 D 0000000000000003 0 37 2 0x00000000
Jul 21 08:40:27 ss2 kernel: [472440.484830] ffff8804131c5bb0 0000000000000046 ffff8803e4bc3cb8 ffff8803ee8c8470
Jul 21 08:40:27 ss2 kernel: [472440.484835] ffff8804131c5fd8 ffff8804131c5fd8 ffff8804131c5fd8 00000000000137c0
Jul 21 08:40:27 ss2 kernel: [472440.484839] ffff880401e4ae00 ffff880414e55c00 ffff8804131c5b90 ffff8803eea68470
Jul 21 08:40:27 ss2 kernel: [472440.484843] Call Trace:
Jul 21 08:40:27 ss2 kernel: [472440.484852] [] schedule+0x3f/0x60
Jul 21 08:40:27 ss2 kernel: [472440.484855] [] __mutex_lock_slowpath+0xd7/0x150
Jul 21 08:40:27 ss2 kernel: [472440.484858] [] mutex_lock+0x2a/0x50
Jul 21 08:40:27 ss2 kernel: [472440.484915] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
Jul 21 08:40:27 ss2 kernel: [472440.484921] [] prune_super+0x74/0x1a0
Jul 21 08:40:27 ss2 kernel: [472440.484925] [] shrink_slab+0xa9/0x300
Jul 21 08:40:27 ss2 kernel: [472440.484928] [] balance_pgdat+0x50a/0x6d0
Jul 21 08:40:27 ss2 kernel: [472440.484930] [] kswapd+0x121/0x220
Jul 21 08:40:27 ss2 kernel: [472440.484933] [] ? balance_pgdat+0x6d0/0x6d0
Jul 21 08:40:27 ss2 kernel: [472440.484938] [] kthread+0x8c/0xa0
Jul 21 08:40:27 ss2 kernel: [472440.484943] [] kernel_thread_helper+0x4/0x10
Jul 21 08:40:27 ss2 kernel: [472440.484946] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:40:27 ss2 kernel: [472440.484948] [] ? gs_change+0x13/0x13
Jul 21 08:40:27 ss2 kernel: [472440.484954] INFO: task spl_kmem_cache/:347 blocked for more than 120 seconds.
Jul 21 08:40:27 ss2 kernel: [472440.485362] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:40:27 ss2 kernel: [472440.485808] spl_kmem_cache/ D ffffffff81806280 0 347 2 0x00000000
Jul 21 08:40:27 ss2 kernel: [472440.485811] ffff880414efb800 0000000000000046 ffff880414efb840 ffffffffa00d9491
Jul 21 08:40:27 ss2 kernel: [472440.485815] ffff880414efbfd8 ffff880414efbfd8 ffff880414efbfd8 00000000000137c0
Jul 21 08:40:27 ss2 kernel: [472440.485819] ffffffff81c0d020 ffff880410490000 ffff8804153168b0 ffff8803eea68470
Jul 21 08:40:27 ss2 kernel: [472440.485822] Call Trace:
Jul 21 08:40:27 ss2 kernel: [472440.485834] [] ? spl_slab_reclaim+0x271/0x440 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485838] [] schedule+0x3f/0x60
Jul 21 08:40:27 ss2 kernel: [472440.485840] [] __mutex_lock_slowpath+0xd7/0x150
Jul 21 08:40:27 ss2 kernel: [472440.485843] [] ? wake_up_bit+0x2e/0x40
Jul 21 08:40:27 ss2 kernel: [472440.485845] [] mutex_lock+0x2a/0x50
Jul 21 08:40:27 ss2 kernel: [472440.485870] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
Jul 21 08:40:27 ss2 kernel: [472440.485873] [] prune_super+0x74/0x1a0
Jul 21 08:40:27 ss2 kernel: [472440.485876] [] shrink_slab+0xa9/0x300
Jul 21 08:40:27 ss2 kernel: [472440.485879] [] do_try_to_free_pages+0x268/0x2d0
Jul 21 08:40:27 ss2 kernel: [472440.485881] [] try_to_free_pages+0xab/0x170
Jul 21 08:40:27 ss2 kernel: [472440.485886] [] __alloc_pages_nodemask+0x59e/0x8f0
Jul 21 08:40:27 ss2 kernel: [472440.485889] [] alloc_pages_current+0xb6/0x120
Jul 21 08:40:27 ss2 kernel: [472440.485893] [] __vmalloc_area_node+0x103/0x1c0
Jul 21 08:40:27 ss2 kernel: [472440.485899] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485902] [] __vmalloc_node_range+0x86/0xd0
Jul 21 08:40:27 ss2 kernel: [472440.485909] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485911] [] __vmalloc_node+0x35/0x40
Jul 21 08:40:27 ss2 kernel: [472440.485917] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485920] [] __vmalloc+0x22/0x30
Jul 21 08:40:27 ss2 kernel: [472440.485926] [] kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485932] [] spl_cache_grow_work+0x40/0x430 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485940] [] taskq_thread+0x236/0x4b0 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485944] [] ? try_to_wake_up+0x200/0x200
Jul 21 08:40:27 ss2 kernel: [472440.485951] [] ? task_done+0x160/0x160 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.485953] [] kthread+0x8c/0xa0
Jul 21 08:40:27 ss2 kernel: [472440.485956] [] kernel_thread_helper+0x4/0x10
Jul 21 08:40:27 ss2 kernel: [472440.485959] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:40:27 ss2 kernel: [472440.485962] [] ? gs_change+0x13/0x13
Jul 21 08:40:27 ss2 kernel: [472440.485984] INFO: task txg_quiesce:2183 blocked for more than 120 seconds.
Jul 21 08:40:27 ss2 kernel: [472440.486367] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:40:27 ss2 kernel: [472440.486815] txg_quiesce D ffffffff81806280 0 2183 2 0x00000000
Jul 21 08:40:27 ss2 kernel: [472440.486818] ffff8803fdb8fd40 0000000000000046 ffff8803fdb8fde0 ffffffffa00d2973
Jul 21 08:40:27 ss2 kernel: [472440.486822] ffff8803fdb8ffd8 ffff8803fdb8ffd8 ffff8803fdb8ffd8 00000000000137c0
Jul 21 08:40:27 ss2 kernel: [472440.486825] ffff880415561700 ffff880401e84500 ffff8803fdb8fd50 ffffc90009ea8450
Jul 21 08:40:27 ss2 kernel: [472440.486829] Call Trace:
Jul 21 08:40:27 ss2 kernel: [472440.486835] [] ? spl_debug_msg+0x483/0x8e0 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.486838] [] schedule+0x3f/0x60
Jul 21 08:40:27 ss2 kernel: [472440.486846] [] cv_wait_common+0xfd/0x1b0 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.486849] [] ? add_wait_queue+0x60/0x60
Jul 21 08:40:27 ss2 kernel: [472440.486856] [] __cv_wait+0x15/0x20 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.486882] [] txg_quiesce_thread+0x24b/0x3a0 [zfs]
Jul 21 08:40:27 ss2 kernel: [472440.486907] [] ? txg_thread_wait.isra.2+0x40/0x40 [zfs]
Jul 21 08:40:27 ss2 kernel: [472440.486914] [] thread_generic_wrapper+0x78/0x90 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.486920] [] ? __thread_create+0x300/0x300 [spl]
Jul 21 08:40:27 ss2 kernel: [472440.486923] [] kthread+0x8c/0xa0
Jul 21 08:40:27 ss2 kernel: [472440.486926] [] kernel_thread_helper+0x4/0x10
Jul 21 08:40:27 ss2 kernel: [472440.486929] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:40:27 ss2 kernel: [472440.486932] [] ? gs_change+0x13/0x13
Jul 21 08:42:27 ss2 kernel: [472560.484017] INFO: task kswapd0:37 blocked for more than 120 seconds.
Jul 21 08:42:27 ss2 kernel: [472560.484376] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:42:27 ss2 kernel: [472560.484824] kswapd0 D 0000000000000003 0 37 2 0x00000000
Jul 21 08:42:27 ss2 kernel: [472560.484829] ffff8804131c5bb0 0000000000000046 ffff8803e4bc3cb8 ffff8803ee8c8470
Jul 21 08:42:27 ss2 kernel: [472560.484833] ffff8804131c5fd8 ffff8804131c5fd8 ffff8804131c5fd8 00000000000137c0
Jul 21 08:42:27 ss2 kernel: [472560.484837] ffff880401e4ae00 ffff880414e55c00 ffff8804131c5b90 ffff8803eea68470
Jul 21 08:42:27 ss2 kernel: [472560.484841] Call Trace:
Jul 21 08:42:27 ss2 kernel: [472560.484850] [] schedule+0x3f/0x60
Jul 21 08:42:27 ss2 kernel: [472560.484853] [] __mutex_lock_slowpath+0xd7/0x150
Jul 21 08:42:27 ss2 kernel: [472560.484856] [] mutex_lock+0x2a/0x50
Jul 21 08:42:27 ss2 kernel: [472560.484914] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.484919] [] prune_super+0x74/0x1a0
Jul 21 08:42:27 ss2 kernel: [472560.484922] [] shrink_slab+0xa9/0x300
Jul 21 08:42:27 ss2 kernel: [472560.484925] [] balance_pgdat+0x50a/0x6d0
Jul 21 08:42:27 ss2 kernel: [472560.484928] [] kswapd+0x121/0x220
Jul 21 08:42:27 ss2 kernel: [472560.484931] [] ? balance_pgdat+0x6d0/0x6d0
Jul 21 08:42:27 ss2 kernel: [472560.484935] [] kthread+0x8c/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.484940] [] kernel_thread_helper+0x4/0x10
Jul 21 08:42:27 ss2 kernel: [472560.484943] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.484946] [] ? gs_change+0x13/0x13
Jul 21 08:42:27 ss2 kernel: [472560.484951] INFO: task spl_kmem_cache/:347 blocked for more than 120 seconds.
Jul 21 08:42:27 ss2 kernel: [472560.485359] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:42:27 ss2 kernel: [472560.485804] spl_kmem_cache/ D ffffffff81806280 0 347 2 0x00000000
Jul 21 08:42:27 ss2 kernel: [472560.485807] ffff880414efb800 0000000000000046 ffff880414efb840 ffffffffa00d9491
Jul 21 08:42:27 ss2 kernel: [472560.485811] ffff880414efbfd8 ffff880414efbfd8 ffff880414efbfd8 00000000000137c0
Jul 21 08:42:27 ss2 kernel: [472560.485815] ffffffff81c0d020 ffff880410490000 ffff8804153168b0 ffff8803eea68470
Jul 21 08:42:27 ss2 kernel: [472560.485819] Call Trace:
Jul 21 08:42:27 ss2 kernel: [472560.485831] [] ? spl_slab_reclaim+0x271/0x440 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485834] [] schedule+0x3f/0x60
Jul 21 08:42:27 ss2 kernel: [472560.485836] [] __mutex_lock_slowpath+0xd7/0x150
Jul 21 08:42:27 ss2 kernel: [472560.485839] [] ? wake_up_bit+0x2e/0x40
Jul 21 08:42:27 ss2 kernel: [472560.485842] [] mutex_lock+0x2a/0x50
Jul 21 08:42:27 ss2 kernel: [472560.485866] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.485869] [] prune_super+0x74/0x1a0
Jul 21 08:42:27 ss2 kernel: [472560.485872] [] shrink_slab+0xa9/0x300
Jul 21 08:42:27 ss2 kernel: [472560.485875] [] do_try_to_free_pages+0x268/0x2d0
Jul 21 08:42:27 ss2 kernel: [472560.485878] [] try_to_free_pages+0xab/0x170
Jul 21 08:42:27 ss2 kernel: [472560.485882] [] __alloc_pages_nodemask+0x59e/0x8f0
Jul 21 08:42:27 ss2 kernel: [472560.485886] [] alloc_pages_current+0xb6/0x120
Jul 21 08:42:27 ss2 kernel: [472560.485889] [] __vmalloc_area_node+0x103/0x1c0
Jul 21 08:42:27 ss2 kernel: [472560.485896] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485899] [] __vmalloc_node_range+0x86/0xd0
Jul 21 08:42:27 ss2 kernel: [472560.485905] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485907] [] __vmalloc_node+0x35/0x40
Jul 21 08:42:27 ss2 kernel: [472560.485913] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485916] [] __vmalloc+0x22/0x30
Jul 21 08:42:27 ss2 kernel: [472560.485922] [] kv_alloc.isra.9+0x59/0x60 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485928] [] spl_cache_grow_work+0x40/0x430 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485936] [] taskq_thread+0x236/0x4b0 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485940] [] ? try_to_wake_up+0x200/0x200
Jul 21 08:42:27 ss2 kernel: [472560.485947] [] ? task_done+0x160/0x160 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.485950] [] kthread+0x8c/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.485953] [] kernel_thread_helper+0x4/0x10
Jul 21 08:42:27 ss2 kernel: [472560.485955] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.485958] [] ? gs_change+0x13/0x13
Jul 21 08:42:27 ss2 kernel: [472560.485980] INFO: task txg_quiesce:2183 blocked for more than 120 seconds.
Jul 21 08:42:27 ss2 kernel: [472560.486363] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:42:27 ss2 kernel: [472560.486811] txg_quiesce D ffffffff81806280 0 2183 2 0x00000000
Jul 21 08:42:27 ss2 kernel: [472560.486814] ffff8803fdb8fd40 0000000000000046 ffff8803fdb8fde0 ffffffffa00d2973
Jul 21 08:42:27 ss2 kernel: [472560.486818] ffff8803fdb8ffd8 ffff8803fdb8ffd8 ffff8803fdb8ffd8 00000000000137c0
Jul 21 08:42:27 ss2 kernel: [472560.486822] ffff880415561700 ffff880401e84500 ffff8803fdb8fd50 ffffc90009ea8450
Jul 21 08:42:27 ss2 kernel: [472560.486825] Call Trace:
Jul 21 08:42:27 ss2 kernel: [472560.486832] [] ? spl_debug_msg+0x483/0x8e0 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.486835] [] schedule+0x3f/0x60
Jul 21 08:42:27 ss2 kernel: [472560.486843] [] cv_wait_common+0xfd/0x1b0 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.486846] [] ? add_wait_queue+0x60/0x60
Jul 21 08:42:27 ss2 kernel: [472560.486853] [] __cv_wait+0x15/0x20 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.486878] [] txg_quiesce_thread+0x24b/0x3a0 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.486903] [] ? txg_thread_wait.isra.2+0x40/0x40 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.486910] [] thread_generic_wrapper+0x78/0x90 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.486917] [] ? __thread_create+0x300/0x300 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.486920] [] kthread+0x8c/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.486923] [] kernel_thread_helper+0x4/0x10
Jul 21 08:42:27 ss2 kernel: [472560.486926] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.486928] [] ? gs_change+0x13/0x13
Jul 21 08:42:27 ss2 kernel: [472560.486932] INFO: task nfsd:2370 blocked for more than 120 seconds.
Jul 21 08:42:27 ss2 kernel: [472560.487291] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Jul 21 08:42:27 ss2 kernel: [472560.487737] nfsd D ffffffff81806280 0 2370 2 0x00000000
Jul 21 08:42:27 ss2 kernel: [472560.487740] ffff8803f90dd740 0000000000000046 0000000000000030 ffffffffa00d735a
Jul 21 08:42:27 ss2 kernel: [472560.487744] ffff8803f90ddfd8 ffff8803f90ddfd8 ffff8803f90ddfd8 00000000000137c0
Jul 21 08:42:27 ss2 kernel: [472560.487747] ffff880415561700 ffff8803ff21c500 ffff8803f90dd750 ffff8803d8357e38
Jul 21 08:42:27 ss2 kernel: [472560.487751] Call Trace:
Jul 21 08:42:27 ss2 kernel: [472560.487758] [] ? kmem_alloc_debug+0x14a/0x3d0 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.487761] [] schedule+0x3f/0x60
Jul 21 08:42:27 ss2 kernel: [472560.487768] [] cv_wait_common+0xfd/0x1b0 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.487772] [] ? default_spin_lock_flags+0x9/0x10
Jul 21 08:42:27 ss2 kernel: [472560.487775] [] ? add_wait_queue+0x60/0x60
Jul 21 08:42:27 ss2 kernel: [472560.487782] [] __cv_wait+0x15/0x20 [spl]
Jul 21 08:42:27 ss2 kernel: [472560.487808] [] zfs_range_lock+0x294/0x5c0 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487815] [] ? uio_prefaultpages+0xd2/0x110 [zcommon]
Jul 21 08:42:27 ss2 kernel: [472560.487840] [] zfs_write+0x85b/0xbe0 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487843] [] ? mutex_lock+0x1d/0x50
Jul 21 08:42:27 ss2 kernel: [472560.487863] [] ? dnode_rele+0x54/0x90 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487867] [] ? _raw_spin_lock+0xe/0x20
Jul 21 08:42:27 ss2 kernel: [472560.487870] [] ? rw_copy_check_uvector+0x62/0x130
Jul 21 08:42:27 ss2 kernel: [472560.487895] [] zpl_write_common+0x52/0x90 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487920] [] zpl_write+0x68/0xa0 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487945] [] ? zpl_write_common+0x90/0x90 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487948] [] do_loop_readv_writev+0x59/0x90
Jul 21 08:42:27 ss2 kernel: [472560.487950] [] do_readv_writev+0x1b6/0x1d0
Jul 21 08:42:27 ss2 kernel: [472560.487975] [] ? zpl_open+0x60/0x80 [zfs]
Jul 21 08:42:27 ss2 kernel: [472560.487978] [] ? _raw_spin_lock+0xe/0x20
Jul 21 08:42:27 ss2 kernel: [472560.487982] [] ? vfsmount_lock_local_unlock+0x1e/0x30
Jul 21 08:42:27 ss2 kernel: [472560.487985] [] ? mntput_no_expire+0x30/0xf0
Jul 21 08:42:27 ss2 kernel: [472560.487988] [] vfs_writev+0x3c/0x50
Jul 21 08:42:27 ss2 kernel: [472560.487997] [] nfsd_vfs_write.isra.13+0xf4/0x2a0 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488023] [] ? fh_verify+0x15b/0x220 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488030] [] nfsd_write+0xf8/0x110 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488038] [] ? nfsd_cache_lookup+0x369/0x420 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488047] [] nfsd3_proc_write+0xb1/0x140 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488054] [] nfsd_dispatch+0xeb/0x230 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488072] [] svc_process_common+0x345/0x690 [sunrpc]
Jul 21 08:42:27 ss2 kernel: [472560.488078] [] ? try_to_wake_up+0x200/0x200
Jul 21 08:42:27 ss2 kernel: [472560.488088] [] svc_process+0x102/0x150 [sunrpc]
Jul 21 08:42:27 ss2 kernel: [472560.488095] [] nfsd+0xbd/0x160 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488101] [] ? nfsd_startup+0xf0/0xf0 [nfsd]
Jul 21 08:42:27 ss2 kernel: [472560.488105] [] kthread+0x8c/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.488110] [] kernel_thread_helper+0x4/0x10
Jul 21 08:42:27 ss2 kernel: [472560.488114] [] ? flush_kthread_worker+0xa0/0xa0
Jul 21 08:42:27 ss2 kernel: [472560.488119] [] ? gs_change+0x13/0x13

@ldonzis
Author

ldonzis commented Jul 25, 2014

I think we're getting closer to being able to reproduce this. In our experience, 0.6.3 has been somewhat less stable than 0.6.2, having "stopped" a few times. When this happens, the system is still running, but any attempt to run a "zfs" command, such as creating or destroying a snapshot, will hang, and the "zfs receive" in progress is totally stuck. The system won't do a clean reboot; we have to push the reset button.

This happened a couple of days ago, and I made sure to check that after rebooting, all of the filesystems were intact and nothing appeared to be missing. The system in question is the target of replication, so the source system was still hung running "zfs send", and in fact, the "rsh zfs receive" was also still hung. After killing those processes and restarting the synchronization, that is when an entire filesystem went missing and the space was freed. Re-creating the filesystem and re-sync'ing, as usual, restored normal operation.

Anyway, the thing I wanted to mention is that the system appeared ok on reboot, but a subsequent zfs receive somehow destroyed the target filesystem.

I admit, though, these are slightly different symptoms than reported at the beginning of this thread, where a filesystem would disappear and it would then be impossible to re-create it without destroying the whole pool and starting over.

@ldonzis
Author

ldonzis commented Jul 25, 2014

Last night, we updated the kernel and rebooted the system, and it ran less than 12 hours before having "hang" problems. I'll run through the reboot procedure, but in the meantime here is the kernel log.

[39840.452012] INFO: task kswapd0:37 blocked for more than 120 seconds.
[39840.452380] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39840.452811] kswapd0 D ffffffff81806240 0 37 2 0x00000000
[39840.452816] ffff8804131afbb0 0000000000000046 ffff8800532aeac8 ffff8800567d9c00
[39840.452820] ffff8804131affd8 ffff8804131affd8 ffff8804131affd8 00000000000137c0
[39840.452824] ffff880415561700 ffff880414e55c00 ffff8804131afb90 ffff8803ee6d0470
[39840.452828] Call Trace:
[39840.452837] [] schedule+0x3f/0x60
[39840.452840] [] __mutex_lock_slowpath+0xd7/0x150
[39840.452843] [] mutex_lock+0x2a/0x50
[39840.452901] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[39840.452906] [] prune_super+0x74/0x1a0
[39840.452910] [] shrink_slab+0xa9/0x300
[39840.452913] [] balance_pgdat+0x50a/0x6d0
[39840.452916] [] kswapd+0x121/0x220
[39840.452919] [] ? balance_pgdat+0x6d0/0x6d0
[39840.452923] [] kthread+0x8c/0xa0
[39840.452928] [] kernel_thread_helper+0x4/0x10
[39840.452931] [] ? flush_kthread_worker+0xa0/0xa0
[39840.452934] [] ? gs_change+0x13/0x13
[39840.452937] INFO: task spl_kmem_cache/:337 blocked for more than 120 seconds.
[39840.453330] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39840.453764] spl_kmem_cache/ D ffffffff81806240 0 337 2 0x00000000
[39840.453768] ffff88041175f800 0000000000000046 ffff88041175f840 ffffffffa00e2555
[39840.453771] ffff88041175ffd8 ffff88041175ffd8 ffff88041175ffd8 00000000000137c0
[39840.453775] ffff880415581700 ffff880410f1dc00 ffff8804124078b0 ffff8803ee6d0470
[39840.453779] Call Trace:
[39840.453790] [] ? spl_slab_reclaim+0x335/0x440 [spl]
[39840.453794] [] schedule+0x3f/0x60
[39840.453796] [] __mutex_lock_slowpath+0xd7/0x150
[39840.453799] [] ? wake_up_bit+0x2e/0x40
[39840.453801] [] mutex_lock+0x2a/0x50
[39840.453826] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[39840.453829] [] prune_super+0x74/0x1a0
[39840.453831] [] shrink_slab+0xa9/0x300
[39840.453834] [] do_try_to_free_pages+0x268/0x2d0
[39840.453837] [] try_to_free_pages+0xab/0x170
[39840.453841] [] __alloc_pages_nodemask+0x59e/0x8f0
[39840.453845] [] alloc_pages_current+0xb6/0x120
[39840.453849] [] __vmalloc_area_node+0x103/0x1c0
[39840.453856] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[39840.453859] [] __vmalloc_node_range+0x86/0xd0
[39840.453865] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[39840.453868] [] __vmalloc_node+0x35/0x40
[39840.453874] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[39840.453876] [] __vmalloc+0x22/0x30
[39840.453882] [] kv_alloc.isra.9+0x59/0x60 [spl]
[39840.453888] [] spl_cache_grow_work+0x40/0x430 [spl]
[39840.453893] [] ? __wake_up+0x53/0x70
[39840.453901] [] taskq_thread+0x236/0x4b0 [spl]
[39840.453905] [] ? try_to_wake_up+0x200/0x200
[39840.453911] [] ? task_done+0x160/0x160 [spl]
[39840.453914] [] kthread+0x8c/0xa0
[39840.453917] [] kernel_thread_helper+0x4/0x10
[39840.453920] [] ? flush_kthread_worker+0xa0/0xa0
[39840.453923] [] ? gs_change+0x13/0x13
[39960.452017] INFO: task kswapd0:37 blocked for more than 120 seconds.
[39960.452019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39960.452020] kswapd0 D ffffffff81806240 0 37 2 0x00000000
[39960.452024] ffff8804131afbb0 0000000000000046 ffff8800532aeac8 ffff8800567d9c00
[39960.452028] ffff8804131affd8 ffff8804131affd8 ffff8804131affd8 00000000000137c0
[39960.452031] ffff880415561700 ffff880414e55c00 ffff8804131afb90 ffff8803ee6d0470
[39960.452033] Call Trace:
[39960.452040] [] schedule+0x3f/0x60
[39960.452043] [] __mutex_lock_slowpath+0xd7/0x150
[39960.452045] [] mutex_lock+0x2a/0x50
[39960.452098] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[39960.452101] [] prune_super+0x74/0x1a0
[39960.452104] [] shrink_slab+0xa9/0x300
[39960.452107] [] balance_pgdat+0x50a/0x6d0
[39960.452109] [] kswapd+0x121/0x220
[39960.452111] [] ? balance_pgdat+0x6d0/0x6d0
[39960.452114] [] kthread+0x8c/0xa0
[39960.452118] [] kernel_thread_helper+0x4/0x10
[39960.452120] [] ? flush_kthread_worker+0xa0/0xa0
[39960.452122] [] ? gs_change+0x13/0x13
[39960.452126] INFO: task spl_kmem_cache/:337 blocked for more than 120 seconds.
[39960.452127] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[39960.452129] spl_kmem_cache/ D ffffffff81806240 0 337 2 0x00000000
[39960.452131] ffff88041175f800 0000000000000046 ffff88041175f840 ffffffffa00e2555
[39960.452134] ffff88041175ffd8 ffff88041175ffd8 ffff88041175ffd8 00000000000137c0
[39960.452137] ffff880415581700 ffff880410f1dc00 ffff8804124078b0 ffff8803ee6d0470
[39960.452139] Call Trace:
[39960.452149] [] ? spl_slab_reclaim+0x335/0x440 [spl]
[39960.452152] [] schedule+0x3f/0x60
[39960.452154] [] __mutex_lock_slowpath+0xd7/0x150
[39960.452156] [] ? wake_up_bit+0x2e/0x40
[39960.452158] [] mutex_lock+0x2a/0x50
[39960.452182] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[39960.452185] [] prune_super+0x74/0x1a0
[39960.452187] [] shrink_slab+0xa9/0x300
[39960.452189] [] do_try_to_free_pages+0x268/0x2d0
[39960.452192] [] try_to_free_pages+0xab/0x170
[39960.452195] [] __alloc_pages_nodemask+0x59e/0x8f0
[39960.452198] [] alloc_pages_current+0xb6/0x120
[39960.452201] [] __vmalloc_area_node+0x103/0x1c0
[39960.452207] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[39960.452210] [] __vmalloc_node_range+0x86/0xd0
[39960.452216] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[39960.452218] [] __vmalloc_node+0x35/0x40
[39960.452224] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[39960.452226] [] __vmalloc+0x22/0x30
[39960.452231] [] kv_alloc.isra.9+0x59/0x60 [spl]
[39960.452237] [] spl_cache_grow_work+0x40/0x430 [spl]
[39960.452240] [] ? __wake_up+0x53/0x70
[39960.452247] [] taskq_thread+0x236/0x4b0 [spl]
[39960.452251] [] ? try_to_wake_up+0x200/0x200
[39960.452257] [] ? task_done+0x160/0x160 [spl]
[39960.452259] [] kthread+0x8c/0xa0
[39960.452262] [] kernel_thread_helper+0x4/0x10
[39960.452265] [] ? flush_kthread_worker+0xa0/0xa0
[39960.452267] [] ? gs_change+0x13/0x13
[40080.452017] INFO: task kswapd0:37 blocked for more than 120 seconds.
[40080.452019] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[40080.452021] kswapd0 D ffffffff81806240 0 37 2 0x00000000
[40080.452026] ffff8804131afbb0 0000000000000046 ffff8800532aeac8 ffff8800567d9c00
[40080.452029] ffff8804131affd8 ffff8804131affd8 ffff8804131affd8 00000000000137c0
[40080.452032] ffff880415561700 ffff880414e55c00 ffff8804131afb90 ffff8803ee6d0470
[40080.452035] Call Trace:
[40080.452045] [] schedule+0x3f/0x60
[40080.452048] [] __mutex_lock_slowpath+0xd7/0x150
[40080.452050] [] mutex_lock+0x2a/0x50
[40080.452129] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[40080.452134] [] prune_super+0x74/0x1a0
[40080.452137] [] shrink_slab+0xa9/0x300
[40080.452140] [] balance_pgdat+0x50a/0x6d0
[40080.452142] [] kswapd+0x121/0x220
[40080.452145] [] ? balance_pgdat+0x6d0/0x6d0
[40080.452148] [] kthread+0x8c/0xa0
[40080.452153] [] kernel_thread_helper+0x4/0x10
[40080.452155] [] ? flush_kthread_worker+0xa0/0xa0
[40080.452158] [] ? gs_change+0x13/0x13
[40080.452163] INFO: task spl_kmem_cache/:337 blocked for more than 120 seconds.
[40080.452164] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[40080.452165] spl_kmem_cache/ D ffffffff81806240 0 337 2 0x00000000
[40080.452168] ffff88041175f800 0000000000000046 ffff88041175f840 ffffffffa00e2555
[40080.452171] ffff88041175ffd8 ffff88041175ffd8 ffff88041175ffd8 00000000000137c0
[40080.452174] ffff880415581700 ffff880410f1dc00 ffff8804124078b0 ffff8803ee6d0470
[40080.452176] Call Trace:
[40080.452193] [] ? spl_slab_reclaim+0x335/0x440 [spl]
[40080.452196] [] schedule+0x3f/0x60
[40080.452198] [] __mutex_lock_slowpath+0xd7/0x150
[40080.452200] [] ? wake_up_bit+0x2e/0x40
[40080.452202] [] mutex_lock+0x2a/0x50
[40080.452226] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[40080.452229] [] prune_super+0x74/0x1a0
[40080.452231] [] shrink_slab+0xa9/0x300
[40080.452234] [] do_try_to_free_pages+0x268/0x2d0
[40080.452236] [] try_to_free_pages+0xab/0x170
[40080.452240] [] __alloc_pages_nodemask+0x59e/0x8f0
[40080.452244] [] alloc_pages_current+0xb6/0x120
[40080.452247] [] __vmalloc_area_node+0x103/0x1c0
[40080.452253] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[40080.452256] [] __vmalloc_node_range+0x86/0xd0
[40080.452261] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[40080.452264] [] __vmalloc_node+0x35/0x40
[40080.452270] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[40080.452272] [] __vmalloc+0x22/0x30
[40080.452277] [] kv_alloc.isra.9+0x59/0x60 [spl]
[40080.452283] [] spl_cache_grow_work+0x40/0x430 [spl]
[40080.452287] [] ? __wake_up+0x53/0x70
[40080.452294] [] taskq_thread+0x236/0x4b0 [spl]
[40080.452298] [] ? try_to_wake_up+0x200/0x200
[40080.452304] [] ? task_done+0x160/0x160 [spl]
[40080.452307] [] kthread+0x8c/0xa0
[40080.452309] [] kernel_thread_helper+0x4/0x10
[40080.452312] [] ? flush_kthread_worker+0xa0/0xa0
[40080.452314] [] ? gs_change+0x13/0x13
[40200.452016] INFO: task kswapd0:37 blocked for more than 120 seconds.
[40200.452018] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[40200.452021] kswapd0 D ffffffff81806240 0 37 2 0x00000000
[40200.452025] ffff8804131afbb0 0000000000000046 ffff8800532aeac8 ffff8800567d9c00
[40200.452028] ffff8804131affd8 ffff8804131affd8 ffff8804131affd8 00000000000137c0
[40200.452031] ffff880415561700 ffff880414e55c00 ffff8804131afb90 ffff8803ee6d0470
[40200.452034] Call Trace:
[40200.452044] [] schedule+0x3f/0x60
[40200.452046] [] __mutex_lock_slowpath+0xd7/0x150
[40200.452049] [] mutex_lock+0x2a/0x50
[40200.452127] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[40200.452131] [] prune_super+0x74/0x1a0
[40200.452135] [] shrink_slab+0xa9/0x300
[40200.452138] [] balance_pgdat+0x50a/0x6d0
[40200.452140] [] kswapd+0x121/0x220
[40200.452143] [] ? balance_pgdat+0x6d0/0x6d0
[40200.452146] [] kthread+0x8c/0xa0
[40200.452151] [] kernel_thread_helper+0x4/0x10
[40200.452153] [] ? flush_kthread_worker+0xa0/0xa0
[40200.452155] [] ? gs_change+0x13/0x13
[40200.452160] INFO: task spl_kmem_cache/:337 blocked for more than 120 seconds.
[40200.452162] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[40200.452163] spl_kmem_cache/ D ffffffff81806240 0 337 2 0x00000000
[40200.452166] ffff88041175f800 0000000000000046 ffff88041175f840 ffffffffa00e2555
[40200.452169] ffff88041175ffd8 ffff88041175ffd8 ffff88041175ffd8 00000000000137c0
[40200.452171] ffff880415581700 ffff880410f1dc00 ffff8804124078b0 ffff8803ee6d0470
[40200.452174] Call Trace:
[40200.452190] [] ? spl_slab_reclaim+0x335/0x440 [spl]
[40200.452192] [] schedule+0x3f/0x60
[40200.452195] [] __mutex_lock_slowpath+0xd7/0x150
[40200.452197] [] ? wake_up_bit+0x2e/0x40
[40200.452199] [] mutex_lock+0x2a/0x50
[40200.452223] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[40200.452226] [] prune_super+0x74/0x1a0
[40200.452228] [] shrink_slab+0xa9/0x300
[40200.452230] [] do_try_to_free_pages+0x268/0x2d0
[40200.452233] [] try_to_free_pages+0xab/0x170
[40200.452237] [] __alloc_pages_nodemask+0x59e/0x8f0
[40200.452240] [] alloc_pages_current+0xb6/0x120
[40200.452244] [] __vmalloc_area_node+0x103/0x1c0
[40200.452250] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[40200.452253] [] __vmalloc_node_range+0x86/0xd0
[40200.452258] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[40200.452261] [] __vmalloc_node+0x35/0x40
[40200.452266] [] ? kv_alloc.isra.9+0x59/0x60 [spl]
[40200.452268] [] __vmalloc+0x22/0x30
[40200.452274] [] kv_alloc.isra.9+0x59/0x60 [spl]
[40200.452280] [] spl_cache_grow_work+0x40/0x430 [spl]
[40200.452284] [] ? __wake_up+0x53/0x70
[40200.452291] [] taskq_thread+0x236/0x4b0 [spl]
[40200.452295] [] ? try_to_wake_up+0x200/0x200
[40200.452301] [] ? task_done+0x160/0x160 [spl]
[40200.452303] [] kthread+0x8c/0xa0
[40200.452306] [] kernel_thread_helper+0x4/0x10
[40200.452308] [] ? flush_kthread_worker+0xa0/0xa0
[40200.452311] [] ? gs_change+0x13/0x13
[40200.452334] INFO: task nfsd:2362 blocked for more than 120 seconds.
[40200.452335] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[40200.452337] nfsd D ffffffff81806240 0 2362 2 0x00000000
[40200.452339] ffff8803f8aed960 0000000000000046 ffff880058338a08 ffff88005543d300
[40200.452342] ffff8803f8aedfd8 ffff8803f8aedfd8 ffff8803f8aedfd8 00000000000137c0
[40200.452345] ffffffff81c0d020 ffff880402981700 000000000007aff8 ffff8803ee6d0470
[40200.452348] Call Trace:
[40200.452350] [] schedule+0x3f/0x60
[40200.452352] [] __mutex_lock_slowpath+0xd7/0x150
[40200.452355] [] mutex_lock+0x2a/0x50
[40200.452379] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[40200.452382] [] prune_super+0x74/0x1a0
[40200.452384] [] shrink_slab+0xa9/0x300
[40200.452387] [] do_try_to_free_pages+0x268/0x2d0
[40200.452389] [] try_to_free_pages+0xab/0x170
[40200.452392] [] __alloc_pages_nodemask+0x59e/0x8f0
[40200.452395] [] alloc_pages_current+0xb6/0x120
[40200.452423] [] svc_recv+0xdc/0x740 [sunrpc]
[40200.452432] [] ? svc_xprt_put+0x19/0x20 [sunrpc]
[40200.452441] [] ? svc_xprt_release+0xad/0xe0 [sunrpc]
[40200.452444] [] ? try_to_wake_up+0x200/0x200
[40200.452458] [] nfsd+0xa5/0x160 [nfsd]
[40200.452462] [] ? nfsd_startup+0xf0/0xf0 [nfsd]
[40200.452465] [] kthread+0x8c/0xa0
[40200.452467] [] kernel_thread_helper+0x4/0x10
[40200.452470] [] ? flush_kthread_worker+0xa0/0xa0
[40200.452472] [] ? gs_change+0x13/0x13
[40200.452474] INFO: task nfsd:2363 blocked for more than 120 seconds.
[40200.452475] "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
[40200.452477] nfsd D ffffffff81806240 0 2363 2 0x00000000
[40200.452479] ffff8803f8bef960 0000000000000046 ffff880265966ac8 ffff88025fca13c0
[40200.452482] ffff8803f8beffd8 ffff8803f8beffd8 ffff8803f8beffd8 00000000000137c0
[40200.452485] ffff880415561700 ffff8803fdc9ae00 ffff8803f8bef940 ffff8803ee6d0470
[40200.452487] Call Trace:
[40200.452490] [] schedule+0x3f/0x60
[40200.452492] [] __mutex_lock_slowpath+0xd7/0x150
[40200.452494] [] mutex_lock+0x2a/0x50
[40200.452518] [] zpl_nr_cached_objects+0x2c/0x50 [zfs]
[40200.452521] [] prune_super+0x74/0x1a0
[40200.452523] [] shrink_slab+0xa9/0x300
[40200.452526] [] do_try_to_free_pages+0x268/0x2d0
[40200.452528] [] try_to_free_pages+0xab/0x170
[40200.452531] [] __alloc_pages_nodemask+0x59e/0x8f0
[40200.452533] [] ? free_pcppages_bulk+0x2b3/0x3f0
[40200.452536] [] alloc_pages_current+0xb6/0x120
[40200.452545] [] svc_recv+0xdc/0x740 [sunrpc]
[40200.452554] [] ? svc_xprt_put+0x19/0x20 [sunrpc]
[40200.452563] [] ? svc_xprt_release+0xad/0xe0 [sunrpc]
[40200.452566] [] ? try_to_wake_up+0x200/0x200
[40200.452571] [] nfsd+0xa5/0x160 [nfsd]
[40200.452575] [] ? nfsd_startup+0xf0/0xf0 [nfsd]
[40200.452577] [] kthread+0x8c/0xa0
[40200.452580] [] kernel_thread_helper+0x4/0x10
[40200.452583] [] ? flush_kthread_worker+0xa0/0xa0
[40200.452585] [] ? gs_change+0x13/0x13

@ldonzis
Author

ldonzis commented Jul 25, 2014

After reboot, everything looked fine on both systems, and both had the same set of snapshots. However, as soon as I started the zfs send from the primary to the backup, the target filesystem was destroyed. Here's the stdout from the sending side:

send from @replication-2014-07-25-09:11:03.238718041 to ds/ds2@replication-2014-07-25-09:11:25.722351302 estimated size is 0
send from @replication-2014-07-25-09:11:03.238718041 to ds/ds2/backups@replication-2014-07-25-09:11:25.722351302 estimated size is 0
send from @replication-2014-07-25-09:11:03.238718041 to ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302 estimated size is 97.5M
total estimated size is 97.5M
TIME SENT SNAPSHOT
TIME SENT SNAPSHOT
TIME SENT SNAPSHOT
09:11:27 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:28 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:29 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:30 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:31 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:32 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:33 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:34 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:35 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:36 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:37 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
09:11:38 146K ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302
cannot receive incremental stream: destination 'ds/ds2/vm2' does not exist
warning: cannot send 'ds/ds2/vm2@replication-2014-07-25-09:11:25.722351302': Broken pipe

At that point, ds/ds2/vm2 on the target system had been destroyed.

@dweeezil
Contributor

@ldonzis Could you please post the output of zpool history -i?

@ldonzis
Author

ldonzis commented Jul 25, 2014

Certainly, but it's more than 280 MB. Is that OK to post, or should I put it on an FTP server?

Here are the entries up to where it hung this morning, followed by the reboot and restarting of the sync operation (ds1 and ds2 are running separate/independent zfs send/recv operations).

2014-07-25.07:05:25 zfs receive -F ds/ds2
2014-07-25.07:05:25 [txg:3702108] destroy ds/ds2@replication-2014-07-25-07:05:17.661563677 (596930)
2014-07-25.07:05:25 [txg:3702109] destroy ds/ds2/backups@replication-2014-07-25-07:05:17.661563677 (596941)
2014-07-25.07:05:25 [txg:3702110] destroy ds/ds2/vm2@replication-2014-07-25-07:05:17.661563677 (596952)
2014-07-25.07:05:25 [txg:3702111] destroy ds/ds1@replication-2014-07-25-07:04:29.736428302 (596506)
2014-07-25.07:05:26 [txg:3702112] receive ds/ds2/%recv (596999)
2014-07-25.07:05:26 [txg:3702113] finish receiving ds/ds2/%recv (596999) snap=replication-2014-07-25-07:05:24.784125694
2014-07-25.07:05:26 [txg:3702113] clone swap ds/ds2/%recv (596999) parent=ds2
2014-07-25.07:05:26 [txg:3702113] snapshot ds/ds2@replication-2014-07-25-07:05:24.784125694 (597004)
2014-07-25.07:05:26 [txg:3702113] destroy ds/ds2/%recv (596999)
2014-07-25.07:05:26 [txg:3702114] receive ds/ds2/backups/%recv (597010)
2014-07-25.07:05:26 [txg:3702115] destroy ds/ds1/vm@replication-2014-07-25-07:04:29.736428302 (596588)
2014-07-25.07:05:26 [txg:3702116] finish receiving ds/ds2/backups/%recv (597010) snap=replication-2014-07-25-07:05:24.784125694
2014-07-25.07:05:26 [txg:3702116] clone swap ds/ds2/backups/%recv (597010) parent=backups
2014-07-25.07:05:26 [txg:3702116] snapshot ds/ds2/backups@replication-2014-07-25-07:05:24.784125694 (597022)
2014-07-25.07:05:26 [txg:3702116] destroy ds/ds2/backups/%recv (597010)
2014-07-25.07:05:26 [txg:3702117] receive ds/ds1/%recv (597028)
2014-07-25.09:02:21 [txg:3702120] open pool version 5000; software version 5000/5; uts ss2 3.2.0-67-generic #101-Ubuntu SMP Tue Jul 15 17:46:11 UTC 2014 x86_64
2014-07-25.09:02:21 [txg:3702120] destroy ds/ds1/%recv (597028)
2014-07-25.09:10:47 [txg:3702222] destroy ds/ds2@replication-2014-07-25-07:05:21.343329018 (596963)
2014-07-25.09:10:48 [txg:3702223] destroy ds/ds2/backups@replication-2014-07-25-07:05:21.343329018 (596975)
2014-07-25.09:10:48 [txg:3702224] destroy ds/ds2/vm2@replication-2014-07-25-07:05:21.343329018 (596988)
2014-07-25.09:10:48 [txg:3702228] destroy ds/ds2/vm2 (538804)
2014-07-25.09:10:50 [txg:3702230] receive ds/ds2/%recv (593943)
2014-07-25.09:10:50 [txg:3702230] receive ds/ds1/%recv (593954)
2014-07-25.09:10:53 [txg:3702232] finish receiving ds/ds2/%recv (593943) snap=replication-2014-07-25-09:10:47.919816498
2014-07-25.09:10:53 [txg:3702232] clone swap ds/ds2/%recv (593943) parent=ds2
2014-07-25.09:10:53 [txg:3702232] snapshot ds/ds2@replication-2014-07-25-09:10:47.919816498 (593966)
2014-07-25.09:10:53 [txg:3702232] destroy ds/ds2/%recv (593943)
2014-07-25.09:10:56 [txg:3702234] receive ds/ds2/backups/%recv (593977)
2014-07-25.09:11:01 [txg:3702236] finish receiving ds/ds2/backups/%recv (593977) snap=replication-2014-07-25-09:10:47.919816498
2014-07-25.09:11:01 [txg:3702236] clone swap ds/ds2/backups/%recv (593977) parent=backups
2014-07-25.09:11:01 [txg:3702236] snapshot ds/ds2/backups@replication-2014-07-25-09:10:47.919816498 (593983)
2014-07-25.09:11:01 [txg:3702236] destroy ds/ds2/backups/%recv (593977)
2014-07-25.09:11:06 [txg:3702238] destroy ds/ds2@replication-2014-07-25-07:05:24.784125694 (597004)
2014-07-25.09:11:11 [txg:3702240] destroy ds/ds2/backups@replication-2014-07-25-07:05:24.784125694 (597022)
2014-07-25.09:11:16 [txg:3702242] receive ds/ds2/%recv (593990)
2014-07-25.09:11:19 [txg:3702244] finish receiving ds/ds2/%recv (593990) snap=replication-2014-07-25-09:11:03.238718041
2014-07-25.09:11:19 [txg:3702244] clone swap ds/ds2/%recv (593990) parent=ds2
2014-07-25.09:11:19 [txg:3702244] snapshot ds/ds2@replication-2014-07-25-09:11:03.238718041 (593998)
2014-07-25.09:11:19 [txg:3702244] destroy ds/ds2/%recv (593990)
2014-07-25.09:11:19 [txg:3702244] finish receiving ds/ds1/%recv (593954) snap=replication-2014-07-25-09:10:48.551690737

@behlendorf
Contributor

@ldonzis It appears you've uncovered an unlikely but certainly possible deadlock which can occur in the memory reclaim code. If you're comfortable rebuilding from source, I strongly suspect you'll be able to resolve the issue by undefining HAVE_NR_CACHED_OBJECTS. You can go about this in a few ways.

  • After the ./configure step (but before make), manually comment it out in the zfs_config.h header:
--- zfs_config.h~   2014-07-18 12:31:30.130388876 -0700
+++ zfs_config.h    2014-07-25 11:11:27.309866627 -0700
@@ -269,7 +269,7 @@
 #define HAVE_MOUNT_NODEV 1

 /* sops->nr_cached_objects() exists */
-#define HAVE_NR_CACHED_OBJECTS 1
+/* #undef HAVE_NR_CACHED_OBJECTS */

 /* open_bdev_exclusive() is available */
 /* #undef HAVE_OPEN_BDEV_EXCLUSIVE */
  • or, disable it at the top of module/zfs/zpl_super.c directly, like this:
diff --git a/module/zfs/zpl_super.c b/module/zfs/zpl_super.c
index 45639a6..14693a9 100644
--- a/module/zfs/zpl_super.c
+++ b/module/zfs/zpl_super.c
@@ -29,6 +29,10 @@
 #include <sys/zfs_ctldir.h>
 #include <sys/zpl.h>

+#ifdef HAVE_NR_CACHED_OBJECTS
+#undef HAVE_NR_CACHED_OBJECTS
+#endif
+

 static struct inode *
 zpl_inode_alloc(struct super_block *sb)

We'll look into a proper long-term fix for this.
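For reference, a rough sketch of the first option as shell commands, assuming a release tarball or an already-configured checkout of the zfs tree (paths and the install step vary by distribution):

cd zfs
./configure                # generates zfs_config.h
# comment out the detected feature before building
sed -i 's|^#define HAVE_NR_CACHED_OBJECTS 1|/* #undef HAVE_NR_CACHED_OBJECTS */|' zfs_config.h
make
sudo make install          # then reload the spl/zfs modules (or reboot)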

@behlendorf behlendorf modified the milestones: 0.6.4, 0.7.0 Jul 25, 2014
@ldonzis
Author

ldonzis commented Jul 25, 2014

At the moment, it's installed on Ubuntu via apt-get from the repository, so I'll read through the instructions on how to build it from source.

Just out of curiosity, is this something recent? My sense is that it was very stable on 0.6.2. The thing with disappearing filesystems still occurred, but it didn't hang, or at least it doesn't seem like it did.

Thanks!

@ldonzis
Author

ldonzis commented Jul 25, 2014

ok, I modified the source and recompiled it, so we'll see how it goes.

Sounds like there are two unrelated things going on, though: one is that it can hang, and the other is zfs send/recv destroying the filesystem.

Thanks,
lew

@behlendorf
Contributor

@ldonzis The HAVE_NR_CACHED_OBJECTS support isn't recent, but the lockup is pretty subtle, so it was likely introduced accidentally by some refactoring in 0.6.3. That's why we test, but unlikely cases like this can still be missed. You're the first to report this, and 0.6.3 has been out since June 12th.

@ldonzis
Author

ldonzis commented Jul 25, 2014

Here's some news... I can reproduce the "lost filesystem" thing pretty much at will. I created two VMs and found that if I reboot the target while it's in the middle of a receive, then a subsequent receive destroys one or more filesystems.

I think at this point, I could produce a couple of scripts that would demonstrate it pretty readily, and/or package up the VMs themselves if that's easier.

@behlendorf
Contributor

@ldonzis If you could provide a script as a test case, that would be ideal.

@dweeezil
Contributor

@ldonzis Are the file systems mounted on the receiving end? If so, does the problem happen when receiving if they're not mounted?

@ldonzis
Author

ldonzis commented Jul 26, 2014

Here's a script that appears to demonstrate the problem pretty reliably. It works by creating a filesystem with several descendent filesystems, replicating that to a destination system, and then looping on sending incremental streams. If you reboot the destination host while a send/receive is in progress, then when the source host attempts the next transfer after the destination comes back up, that's when the filesystem(s) get destroyed.

(You can also reproduce the problem without rebooting the destination. While the send/receive is in process, just Ctrl-Z to suspend the process and kill %1 to terminate the job.)

After reducing this to a simpler test, this whole problem may be a simple matter of sending an incremental stream to a filesystem which doesn't have the origin snapshot.

It appears that the entire send/receive is not completely atomic, i.e., when sending an incremental recursive stream, as each filesystem receives its part of the stream, it gets updated with the latest snapshot right away. So if the process is interrupted somehow (by rebooting, the system crashing, etc.), the destination system has the latest snapshot on some of the descendent filesystems but not all of them. If a new incremental send/receive is then attempted, the filesystems that don't have the origin snapshot get destroyed entirely. This is exacerbated somewhat by the fact that "zfs send" doesn't exit with a non-zero return code when it gets an error; otherwise the script would have aborted.

This non-atomic behavior is slightly unfortunate outside of this exercise, and I don't know whether it's proper since I’ve never seen documentation that guarantees an all-or-nothing result. If you gracefully abort a send/receive operation, then it does appear to back out all of the work that had been done, but that doesn’t help with the unexpected case of a machine rebooting.

Interestingly, if the source system still has the older snapshot that does exist on all descendent filesystems at the destination, then creating a third snapshot and attempting to send an incremental stream from the second to the third (where the second doesn't exist on all descendent filesystems) results in an error:

cannot receive incremental stream: most recent snapshot of tank/fs2 does not match incremental source

However, if the first snapshot is destroyed, then we see the problem of it destroying the filesystem on the destination:

cannot receive incremental stream: destination 'tank/fs2' does not exist

For example, if we have:

source:
tank
tank@snap1
tank@snap2
tank@snap3
tank/fs1
tank/fs1@snap1
tank/fs1@snap2
tank/fs1@snap3
tank/fs2
tank/fs2@snap1
tank/fs2@snap2
tank/fs2@snap3

destination:
tank
tank@snap1
tank@snap2
tank/fs1
tank/fs1@snap1
tank/fs1@snap2
tank/fs2
tank/fs2@snap1

(notice that tank/fs2@snap2 does not exist because a previous receive was aborted ungracefully.)

Now if we attempt “zfs send -R -I tank@snap2 tank@snap3 | ssh xxx zfs receive -F tank”, then we’ll get an error that the destination doesn’t have the proper origin snapshot, and no harm is done.

However, if we run "zfs destroy -r tank@snap1" and then attempt the same transfer, tank/fs2 on the destination will get destroyed.

At least that’s my observation.

Here's the script:

#!/bin/bash

ssh=rsh
dst=206.210.221.103

echo "Creating local pool and filesystems"

# Initialize local stuff

if ! zpool destroy tank 2>/dev/null ||
   ! zpool create tank sdb ||
   ! zfs create tank/fs1 ||
   ! zfs create tank/fs2 ||
   ! zfs create tank/fs3 ||
   ! zfs create tank/fs4 ||
   ! zfs create tank/fs5; then
    exit 1
fi

# Initialize the remote side

echo "Creating remote pool and filesystems"
if ! $ssh $dst zpool destroy tank 2>/dev/null ||
   ! $ssh $dst zpool create tank sdb ||
   ! $ssh $dst zfs create tank/fs1 ||
   ! $ssh $dst zfs create tank/fs2 ||
   ! $ssh $dst zfs create tank/fs3 ||
   ! $ssh $dst zfs create tank/fs4 ||
   ! $ssh $dst zfs create tank/fs5; then
    exit 1
fi

# If we want the remote un-mounted
#if ! $ssh $dst zfs umount -a; then exit 1; fi

while true; do
    echo "Writing data"
    dd if=/dev/zero of=/tank/zeroes bs=1024k count=100
    dd if=/dev/zero of=/tank/fs1/zeroes bs=1024k count=100
    dd if=/dev/zero of=/tank/fs2/zeroes bs=1024k count=100
    dd if=/dev/zero of=/tank/fs3/zeroes bs=1024k count=100
    dd if=/dev/zero of=/tank/fs4/zeroes bs=1024k count=100
    dd if=/dev/zero of=/tank/fs5/zeroes bs=1024k count=100
    new="$(date +%F-%T.%N)"
    echo "Creating snapshot $new"
    if ! zfs snapshot -r tank@$new; then break; fi
    if [ -n "$old" ]; then
        echo "Sending $old -> $new"
        if ! zfs send -v -R -I "tank@$old" "tank@$new" | $ssh $dst zfs receive -F tank; then break; fi
        echo "Removing old snapshot $old"
        zfs destroy -r "tank@$old"
    else
        echo "Sending initial sync to $new"
        if ! zfs send -v -R "tank@$new" | $ssh $dst zfs receive -F tank; then break; fi
    fi
    old="$new"
done
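Since "zfs send" doesn't return a useful exit code on a partial failure (as noted above), one possible hardening of the loop is sketched below. It uses a hypothetical helper that reuses the script's $ssh and $dst variables and checks, before destroying the old snapshot on the source, that the new snapshot actually exists on every descendent dataset at the destination:

verify_remote_snapshot() {
    local snap="$1" fs
    # every dataset under tank on the destination must have fs@$snap
    for fs in $($ssh $dst zfs list -H -o name -r tank); do
        if ! $ssh $dst zfs list -H "$fs@$snap" >/dev/null 2>&1; then
            echo "missing $fs@$snap on $dst" >&2
            return 1
        fi
    done
}

# in the loop, after a successful incremental send and before
# "zfs destroy -r tank@$old":
#     if ! verify_remote_snapshot "$new"; then break; fi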

@ldonzis
Author

ldonzis commented Jul 26, 2014

To dweeezil's question, it appears to make no difference whether the destination filesystems are mounted or not.

@ldonzis
Author

ldonzis commented Jul 26, 2014

And I stand corrected: it appears that just Ctrl-C'ing the send also leaves the destination with some filesystems updated to the latest snapshot and some not, so a graceful abort is no different after all.

It just seems like, if you attempt to send an incremental stream and the destination doesn't have the origin snapshot, it shouldn't just destroy the filesystem? I note that the description of receive -F states that snapshots and filesystems that don't exist on the source are destroyed at the destination, but that doesn't seem to match this case: here it's destroying a filesystem that does exist on the source; the destination just doesn't have the origin snapshot.

@dweeezil
Contributor

@ldonzis In my testing, the problem seems to happen when a file system on the receiving end has no snapshots at all. Apparently if the "does not have fromsnap" error is never triggered, the destination file system is deleted even if it exists on the source. This behavior matches your example above in which you did zfs destroy -r tank@snap1, destroying the last snapshot on tank/fs2.

As to your question of atomicity: a complete zfs send -R operation is definitely not atomic. All the various sub-parts of the work are done in user space (by the zfs program). The reception of a single send stream, however, is atomic.
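For what it's worth, a minimal single-host sketch of that trigger, using two throw-away pools and hypothetical devices sdb/sdc; the destructive step is the final incremental receive, issued after the child's only snapshot has been removed on the receiving side:

zpool create src sdb
zpool create dst sdc
zfs create src/fs1
zfs snapshot -r src@snap1
zfs send -R src@snap1 | zfs receive -F dst/copy   # full replication creates dst/copy and dst/copy/fs1
zfs destroy dst/copy/fs1@snap1                    # leave the child with no snapshots at all
zfs snapshot -r src@snap2
zfs send -R -I src@snap1 src@snap2 | zfs receive -F dst/copy
zfs list -r dst/copy   # on the affected 0.6.x builds, dst/copy/fs1 is gone; an error would be expected instead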

@ldonzis
Author

ldonzis commented Jul 26, 2014

Ah, that's a very good point... certainly an even simpler test case!

I presume, by the way, that this is not "good" behavior, i.e., this isn't my fault for attempting to do such a thing.

I think I understand what you're saying about atomicity... that for a given filesystem, a send/receive is guaranteed atomic, but when you are using -R with descendent filesystems, the entire operation is not atomic. That all sounds logical, and is pretty much what I had suspected. I guess it would be super cool if the whole operation could be guaranteed atomic, because then my scripts wouldn't have to consider a partially completed transfer. But anyway, it's not that big of a deal, as long as we understand it.

Thanks!

@dweeezil
Contributor

I'd say this behavior is, at best, a violation of the POLA (principle of least astonishment) and, at worst, an outright bug. This problem, of course, doesn't have anything to do with the deadlock condition you discovered.

I suppose this should be split into two issues: one for the "zfs receive -F erroneously destroys file systems" problem and the other for the deadlock.

@ldonzis
Author

ldonzis commented Jul 27, 2014

lol. Yes, in retrospect, the two obviously have nothing to do with each other. Brian already commented about working around the deadlock, so we're running with that change applied.

I am curious as to whether this is specifically a ZFS on Linux issue or if it would happen on any port. If it's a general ZFS issue, I'd be especially "astonished". (Mainly because ZFS is just so fantastically cool and practically always does what you expect.)

Another (third) possible issue is the one I mentioned at the very beginning of the thread: there was apparently some way for a filesystem to be destroyed such that "zfs create" was rendered inoperative. It didn't give any error; it's just that the filesystem wasn't created, and if you tried to destroy it, it said it didn't exist. However, I haven't seen this behavior in any of the recent testing with 0.6.3, so maybe it's no longer a problem.

@behlendorf behlendorf removed this from the 0.6.4 milestone Oct 30, 2014
@behlendorf
Contributor

Closing as out of date.
