
recovery import (zpool import -F) issue with encrypted pool #6916

Closed
phreaker0 opened this issue Dec 1, 2017 · 16 comments · Fixed by #7163
@phreaker0

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version 16.04.3 LTS
Linux Kernel 4.14.0-pf4+
Architecture x86_64
ZFS Version v0.7.0-198_g56d8d8a
SPL Version v0.7.0-21_ged19bcc

Describe the problem you're observing

I'm using a ZFS pool on an external USB disk for backups and data. Everything is encrypted via a top-level dataset with encryption enabled (aes-256-gcm). Sometimes the USB disk disconnects (a loose cable, etc.), in which case pool I/O is suspended. After reconnecting the drive I could normally resume the pool with a zpool command (I think I always used zpool clear). The last time it disconnected, I reconnected the drive and ran the command again to resume pool I/O, but it just hung. After a day or so I rebooted the system (all zfs/zpool commands were stuck at that point).
I then tried to import the pool, but the import failed because of corrupted metadata, and zpool suggested a recovery import to roll back the last transactions. Attempting that freezes the import command, and nothing happens after some panics in the log. (I also tried with the zfs_recover=1 option.)

What worked is importing the pool read-only with recovery: zpool import -F -N -o readonly=on ext
I don't care about the broken pool itself, since in read-only mode I could send all the snapshots of the data to another pool, and most of the backup data too (one dataset's snapshot send/recv failed to complete; I don't know why). But maybe there is an issue in the recovery code path when encryption is involved?
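For the record, the working recovery path can be sketched as a short command sequence. The pool name `ext` is from this report; the snapshot name `@last` and destination pool `tank` are hypothetical placeholders:

```shell
# Recovery import without touching the write path: -F rolls back the
# last few transactions, -N skips mounting, and readonly=on keeps ZFS
# from issuing the writes that trigger the zio_encrypt() panic.
zpool import -F -N -o readonly=on ext

# Load the encryption keys so the datasets become readable:
zfs load-key -r ext

# Copy the data to a healthy pool using existing snapshots (a read-only
# pool cannot take new ones); "@last" and "tank" are placeholders:
zfs send -R ext/data@last | zfs receive -F tank/data-rescue
```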

I always use the latest kernel and ZFS master version on this machine (the root filesystem is also on a ZFS SSD pool).

Describe how to reproduce the problem

Not easily reproducible, I guess.

Include any warning/errors/backtraces from the system logs

root@token:~# zpool import
   pool: ext
     id: 15081949372965998973
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

        ext                                      FAULTED  corrupted data
          usb-ORICO_6518US3_0123456789ABCDE-0:0  ONLINE
root@token:~# zpool import ext
cannot import 'ext': I/O error
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Mon 27 Nov 2017 09:05:11 CET
        should correct the problem.  Approximately 61 seconds of data
        must be discarded, irreversibly.  Recovery can be attempted
        by executing 'zpool import -F ext'.  A scrub of the pool
        is strongly recommended after recovery.
root@token:~# zpool import -N -F ext

Nov 30 11:48:25 token kernel: SPL: Loaded module v0.7.0-21_ged19bcc
Nov 30 11:48:26 token kernel: ZFS: Loaded module v0.7.0-198_g56d8d8a, ZFS pool version 5000, ZFS filesystem version 5
Nov 30 11:49:06 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:06 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:06 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:06 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:48 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:48 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:48 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:48 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Nov 30 11:49:53 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, zio->io_bookmark.zb_objset, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Nov 30 11:49:53 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, zio->io_bookmark.zb_objset, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Nov 30 11:49:53 token kernel: PANIC at zio.c:3705:zio_encrypt()
Nov 30 11:49:53 token kernel: Showing stack for process 3705
Nov 30 11:49:53 token kernel: CPU: 0 PID: 3705 Comm: z_wr_iss Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:49:53 token kernel: PANIC at zio.c:3705:zio_encrypt()
Nov 30 11:49:53 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Nov 30 11:49:53 token kernel: Call Trace:
Nov 30 11:49:53 token kernel: Showing stack for process 3704
Nov 30 11:49:53 token kernel:  dump_stack+0x63/0x8b
Nov 30 11:49:53 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Nov 30 11:49:53 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, zio->io_bookmark.zb_objset, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Nov 30 11:49:53 token kernel: PANIC at zio.c:3705:zio_encrypt()
Nov 30 11:49:53 token kernel: Showing stack for process 3703
Nov 30 11:49:53 token kernel:  spl_panic+0xc8/0x110 [spl]
Nov 30 11:49:53 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Nov 30 11:49:53 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Nov 30 11:49:53 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Nov 30 11:49:53 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Nov 30 11:49:53 token kernel:  ? __schedule+0x3ca/0x890
Nov 30 11:49:53 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Nov 30 11:49:53 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Nov 30 11:49:53 token kernel:  ? wake_up_q+0x70/0x70
Nov 30 11:49:53 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Nov 30 11:49:53 token kernel:  kthread+0x108/0x140
Nov 30 11:49:53 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 30 11:49:53 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:49:53 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:49:53 token kernel:  ret_from_fork+0x25/0x30
Nov 30 11:49:53 token kernel: CPU: 1 PID: 3704 Comm: z_wr_iss Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:49:53 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Nov 30 11:49:53 token kernel: Call Trace:
Nov 30 11:49:53 token kernel:  dump_stack+0x63/0x8b
Nov 30 11:49:53 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Nov 30 11:49:53 token kernel:  spl_panic+0xc8/0x110 [spl]
Nov 30 11:49:53 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Nov 30 11:49:53 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Nov 30 11:49:53 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Nov 30 11:49:53 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Nov 30 11:49:53 token kernel:  ? __schedule+0x3ca/0x890
Nov 30 11:49:53 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Nov 30 11:49:53 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Nov 30 11:49:53 token kernel:  ? wake_up_q+0x70/0x70
Nov 30 11:49:53 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Nov 30 11:49:53 token kernel:  kthread+0x108/0x140
Nov 30 11:49:53 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 30 11:49:53 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:49:53 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:49:53 token kernel:  ret_from_fork+0x25/0x30
Nov 30 11:49:53 token kernel: CPU: 2 PID: 3703 Comm: z_wr_iss Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:49:53 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Nov 30 11:49:53 token kernel: Call Trace:
Nov 30 11:49:53 token kernel:  dump_stack+0x63/0x8b
Nov 30 11:49:53 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Nov 30 11:49:53 token kernel:  spl_panic+0xc8/0x110 [spl]
Nov 30 11:49:53 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Nov 30 11:49:53 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Nov 30 11:49:53 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Nov 30 11:49:53 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Nov 30 11:49:53 token kernel:  ? __schedule+0x3ca/0x890
Nov 30 11:49:53 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Nov 30 11:49:53 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Nov 30 11:49:53 token kernel:  ? wake_up_q+0x70/0x70
Nov 30 11:49:53 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Nov 30 11:49:53 token kernel:  kthread+0x108/0x140
Nov 30 11:49:53 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 30 11:49:53 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:49:53 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:49:53 token kernel:  ret_from_fork+0x25/0x30
Nov 30 11:53:37 token kernel: INFO: task zpool:3532 blocked for more than 120 seconds.
Nov 30 11:53:37 token kernel:       Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:53:37 token kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 11:53:37 token kernel: zpool           D    0  3532   2766 0x00000000
Nov 30 11:53:37 token kernel: Call Trace:
Nov 30 11:53:37 token kernel:  __schedule+0x3c2/0x890
Nov 30 11:53:37 token kernel:  schedule+0x36/0x80
Nov 30 11:53:37 token kernel:  cv_wait_common+0x101/0x140 [spl]
Nov 30 11:53:37 token kernel:  ? wait_woken+0x80/0x80
Nov 30 11:53:37 token kernel:  __cv_wait+0x15/0x20 [spl]
Nov 30 11:53:37 token kernel:  txg_wait_synced+0xdc/0x130 [zfs]
Nov 30 11:53:37 token kernel:  spa_load+0x1920/0x20d0 [zfs]
Nov 30 11:53:37 token kernel:  spa_load_best+0x182/0x280 [zfs]
Nov 30 11:53:37 token kernel:  spa_import+0x208/0x740 [zfs]
Nov 30 11:53:37 token kernel:  zfs_ioc_pool_import+0x130/0x140 [zfs]
Nov 30 11:53:37 token kernel:  zfsdev_ioctl+0x590/0x620 [zfs]
Nov 30 11:53:37 token kernel:  do_vfs_ioctl+0xa1/0x5e0
Nov 30 11:53:37 token kernel:  ? putname+0x53/0x60
Nov 30 11:53:37 token kernel:  SyS_ioctl+0x79/0x90
Nov 30 11:53:37 token kernel:  entry_SYSCALL_64_fastpath+0x1e/0xa9
Nov 30 11:53:37 token kernel: RIP: 0033:0x7fdf0bcdcf07
Nov 30 11:53:37 token kernel: RSP: 002b:00007fff01d9fbe8 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
Nov 30 11:53:37 token kernel: RAX: ffffffffffffffda RBX: 00000000017e7420 RCX: 00007fdf0bcdcf07
Nov 30 11:53:37 token kernel: RDX: 00007fff01d9fc50 RSI: 0000000000005a02 RDI: 0000000000000003
Nov 30 11:53:37 token kernel: RBP: 0000000000000000 R08: 00000000017f6230 R09: 00000000017f4de0
Nov 30 11:53:37 token kernel: R10: 000000000000027a R11: 0000000000000246 R12: 0000000000000000
Nov 30 11:53:37 token kernel: R13: 000000000000002d R14: 00000000017e7910 R15: 00000000017e7420
Nov 30 11:53:37 token kernel: INFO: task z_wr_iss:3703 blocked for more than 120 seconds.
Nov 30 11:53:37 token kernel:       Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:53:37 token kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 11:53:37 token kernel: z_wr_iss        D    0  3703      2 0x80000000
Nov 30 11:53:37 token kernel: Call Trace:
Nov 30 11:53:37 token kernel:  __schedule+0x3c2/0x890
Nov 30 11:53:37 token kernel:  schedule+0x36/0x80
Nov 30 11:53:37 token kernel:  spl_panic+0xfa/0x110 [spl]
Nov 30 11:53:37 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Nov 30 11:53:37 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Nov 30 11:53:37 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Nov 30 11:53:37 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Nov 30 11:53:37 token kernel:  ? __schedule+0x3ca/0x890
Nov 30 11:53:37 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Nov 30 11:53:37 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Nov 30 11:53:37 token kernel:  ? wake_up_q+0x70/0x70
Nov 30 11:53:37 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Nov 30 11:53:37 token kernel:  kthread+0x108/0x140
Nov 30 11:53:37 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ret_from_fork+0x25/0x30
Nov 30 11:53:37 token kernel: INFO: task z_wr_iss:3704 blocked for more than 120 seconds.
Nov 30 11:53:37 token kernel:       Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:53:37 token kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 11:53:37 token kernel: z_wr_iss        D    0  3704      2 0x80000000
Nov 30 11:53:37 token kernel: Call Trace:
Nov 30 11:53:37 token kernel:  __schedule+0x3c2/0x890
Nov 30 11:53:37 token kernel:  schedule+0x36/0x80
Nov 30 11:53:37 token kernel:  spl_panic+0xfa/0x110 [spl]
Nov 30 11:53:37 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Nov 30 11:53:37 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Nov 30 11:53:37 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Nov 30 11:53:37 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Nov 30 11:53:37 token kernel:  ? __schedule+0x3ca/0x890
Nov 30 11:53:37 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Nov 30 11:53:37 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Nov 30 11:53:37 token kernel:  ? wake_up_q+0x70/0x70
Nov 30 11:53:37 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Nov 30 11:53:37 token kernel:  kthread+0x108/0x140
Nov 30 11:53:37 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ret_from_fork+0x25/0x30
Nov 30 11:53:37 token kernel: INFO: task z_wr_iss:3705 blocked for more than 120 seconds.
Nov 30 11:53:37 token kernel:       Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:53:37 token kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 11:53:37 token kernel: z_wr_iss        D    0  3705      2 0x80000000
Nov 30 11:53:37 token kernel: Call Trace:
Nov 30 11:53:37 token kernel:  __schedule+0x3c2/0x890
Nov 30 11:53:37 token kernel:  schedule+0x36/0x80
Nov 30 11:53:37 token kernel:  spl_panic+0xfa/0x110 [spl]
Nov 30 11:53:37 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Nov 30 11:53:37 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Nov 30 11:53:37 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Nov 30 11:53:37 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Nov 30 11:53:37 token kernel:  ? __schedule+0x3ca/0x890
Nov 30 11:53:37 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Nov 30 11:53:37 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Nov 30 11:53:37 token kernel:  ? wake_up_q+0x70/0x70
Nov 30 11:53:37 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Nov 30 11:53:37 token kernel:  kthread+0x108/0x140
Nov 30 11:53:37 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ret_from_fork+0x25/0x30
Nov 30 11:53:37 token kernel: INFO: task txg_sync:3850 blocked for more than 120 seconds.
Nov 30 11:53:37 token kernel:       Tainted: P           OE   4.14.0-pf4+ #1
Nov 30 11:53:37 token kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
Nov 30 11:53:37 token kernel: txg_sync        D    0  3850      2 0x80000000
Nov 30 11:53:37 token kernel: Call Trace:
Nov 30 11:53:37 token kernel:  __schedule+0x3c2/0x890
Nov 30 11:53:37 token kernel:  ? zio_nowait+0xb6/0x150 [zfs]
Nov 30 11:53:37 token kernel:  schedule+0x36/0x80
Nov 30 11:53:37 token kernel:  io_schedule+0x16/0x40
Nov 30 11:53:37 token kernel:  cv_wait_common+0xb2/0x140 [spl]
Nov 30 11:53:37 token kernel:  ? wait_woken+0x80/0x80
Nov 30 11:53:37 token kernel:  __cv_wait_io+0x18/0x20 [spl]
Nov 30 11:53:37 token kernel:  zio_wait+0xfd/0x1b0 [zfs]
Nov 30 11:53:37 token kernel:  dsl_pool_sync+0x1d1/0x420 [zfs]
Nov 30 11:53:37 token kernel:  spa_sync+0x41d/0xda0 [zfs]
Nov 30 11:53:37 token kernel:  txg_sync_thread+0x2d4/0x4a0 [zfs]
Nov 30 11:53:37 token kernel:  ? txg_delay+0x170/0x170 [zfs]
Nov 30 11:53:37 token kernel:  thread_generic_wrapper+0x72/0x80 [spl]
Nov 30 11:53:37 token kernel:  kthread+0x108/0x140
Nov 30 11:53:37 token kernel:  ? __thread_exit+0x20/0x20 [spl]
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ? kthread_create_on_node+0x70/0x70
Nov 30 11:53:37 token kernel:  ret_from_fork+0x25/0x30
@tcaputi
Contributor

tcaputi commented Dec 3, 2017

I'll try to get a USB drive on Monday and look into this. The error here is that the code is attempting to encrypt new data before you've loaded your keys. The zpool import -F code must be attempting to write some data somewhere as part of the recovery process. I'll see if I can figure out what's going on.

@tcaputi
Contributor

tcaputi commented Dec 4, 2017

I think I know what's going on. This should be a simple one-line fix. Unfortunately, the problem is very hard for me to reproduce and I have not been able to test whether it works yet. I will add it to #6864.

@phreaker0
Author

That sounds good, thanks for your work. I still have the original pool and also a dd backup of it, so I can test any patch you provide.

@tcaputi
Contributor

tcaputi commented Dec 4, 2017

If you have it and have some time to test it: https://pastebin.com/wBJV59a8

@phreaker0
Author

I will, but the disk is at work so it takes till tomorrow.

@phreaker0
Author

Unfortunately I get the same panic again with the patch applied:

Dez 05 09:19:53 token kernel: SPL: Loaded module v0.7.0-21_ged19bcc
Dez 05 09:19:53 token kernel: icp: module license 'CDDL' taints kernel.
Dez 05 09:19:53 token kernel: Disabling lock debugging due to kernel taint
Dez 05 09:19:54 token kernel: ZFS: Loaded module v0.7.0-204_g7b34070, ZFS pool version 5000, ZFS filesystem version 5
Dez 05 09:21:01 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Dez 05 09:21:08 token kernel: ZFS: Unable to set "noop" scheduler for /dev/disk/by-id/usb-ORICO_6518US3_0123456789ABCDE-0:0-part1 (sdb): 256
Dez 05 09:22:20 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, zio->io_bookmark.zb_objset, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Dez 05 09:22:20 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, zio->io_bookmark.zb_objset, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Dez 05 09:22:20 token kernel: PANIC at zio.c:3705:zio_encrypt()
Dez 05 09:22:20 token kernel: Showing stack for process 3575
Dez 05 09:22:20 token kernel: CPU: 2 PID: 3575 Comm: z_wr_iss Tainted: P           OE   4.14.0-pf5+ #1
Dez 05 09:22:20 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Dez 05 09:22:20 token kernel: PANIC at zio.c:3705:zio_encrypt()
Dez 05 09:22:20 token kernel: Call Trace:
Dez 05 09:22:20 token kernel: Showing stack for process 3574
Dez 05 09:22:20 token kernel:  dump_stack+0x63/0x8b
Dez 05 09:22:20 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Dez 05 09:22:20 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, zio->io_bookmark.zb_objset, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Dez 05 09:22:20 token kernel: PANIC at zio.c:3705:zio_encrypt()
Dez 05 09:22:20 token kernel: Showing stack for process 3576
Dez 05 09:22:20 token kernel:  spl_panic+0xc8/0x110 [spl]
Dez 05 09:22:20 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Dez 05 09:22:20 token kernel:  zio_encrypt+0x5d0/0x6b0 [zfs]
Dez 05 09:22:20 token kernel:  ? zio_buf_free+0x58/0x60 [zfs]
Dez 05 09:22:20 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Dez 05 09:22:20 token kernel:  ? __schedule+0x3ca/0x890
Dez 05 09:22:20 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Dez 05 09:22:20 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Dez 05 09:22:20 token kernel:  ? wake_up_q+0x70/0x70
Dez 05 09:22:20 token kernel:  ? zio_reexecute+0x3d0/0x3d0 [zfs]
Dez 05 09:22:20 token kernel:  kthread+0x108/0x140
Dez 05 09:22:20 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Dez 05 09:22:20 token kernel:  ? kthread_create_on_node+0x70/0x70
Dez 05 09:22:20 token kernel:  ret_from_fork+0x25/0x30
...

@tcaputi
Contributor

tcaputi commented Dec 5, 2017

Hmmm. And you're sure that you are using the new module? That's unfortunate; I was fairly sure this would have fixed the problem. I'll try to come up with some steps to reproduce it.

@phreaker0
Author

I just recompiled to make sure I was, but same result. I'm using the DKMS method and I verified that the
source for dkms (/usr/src/zfs-0.7.0/module/zfs/zil.c) contains the patch.

@tcaputi
Contributor

tcaputi commented Dec 5, 2017

OK. And I'm assuming you re-inserted the module after installing? DKMS doesn't actually do this for you, so otherwise you would still be running the old code. But if you rebooted, it should have loaded the newly installed version.

I will see if I can reproduce the issue locally.
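A quick way to confirm which module build is actually loaded, independent of what DKMS has in its source tree (standard ZFS-on-Linux paths, shown as a suggestion):

```shell
# Version of the zfs module currently loaded in the kernel:
cat /sys/module/zfs/version

# Should match the version logged at module load time:
dmesg | grep -m1 'ZFS: Loaded module'
```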

@phreaker0
Author

Yeah, I rebooted to make sure.

tcaputi pushed a commit to datto/zfs that referenced this issue Dec 6, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
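The flag logic the commit message describes can be modeled with a tiny standalone sketch. This is NOT the actual ZFS code; `write_zil_header` and its arguments are simplified stand-ins borrowed from the commit text, purely to illustrate why the missing flag trips the assert:

```shell
#!/bin/sh
# Toy model of the os_next_write_raw logic from the commit message.
# write_zil_header KEYS_LOADED NEXT_WRITE_RAW
# Exit status: 0 = write succeeds, 2 = models the VERIFY3 failure (0 == 2).
write_zil_header() {
    keys_loaded=$1
    next_write_raw=$2
    if [ "$next_write_raw" = yes ]; then
        return 0    # written raw; no key material needed
    fi
    if [ "$keys_loaded" = no ]; then
        return 2    # re-authentication attempted without keys -> assert
    fi
    return 0        # normal re-encrypt path, keys available
}

# Buggy path during 'zpool import -F': keys not loaded, flag not set.
write_zil_header no no;  echo "buggy path -> $?"   # prints: buggy path -> 2
# Fixed path: flag set before writing the zil_header_t.
write_zil_header no yes; echo "fixed path -> $?"   # prints: fixed path -> 0
```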
tcaputi pushed a commit to datto/zfs that referenced this issue Dec 21, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Dec 30, 2017
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Jan 2, 2018
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
prometheanfire pushed a commit to prometheanfire/zfs that referenced this issue Jan 2, 2018
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Signed-off-by: Tom Caputi <tcaputi@datto.com>
@phreaker0 (Author)

I tested with the latest master version (v0.7.0-277_g0d23f5e) and unfortunately I still have the same issue:

root@token:~$ zpool import
   pool: ext
     id: 15081949372965998973
  state: FAULTED
 status: The pool metadata is corrupted.
 action: The pool cannot be imported due to damaged devices or data.
        The pool may be active on another system, but can be imported using
        the '-f' flag.
   see: http://zfsonlinux.org/msg/ZFS-8000-72
 config:

        ext         FAULTED  corrupted data
          sdb       ONLINE
root@token:~$ zpool import ext
cannot import 'ext': I/O error
        Recovery is possible, but will result in some data loss.
        Returning the pool to its state as of Mo 27 Nov 2017 09:05:11 CET
        should correct the problem.  Approximately 61 seconds of data
        must be discarded, irreversibly.  Recovery can be attempted
        by executing 'zpool import -F ext'.  A scrub of the pool
        is strongly recommended after recovery.
root@token:~$ zpool import -N -F ext
Feb 05 10:26:42 token kernel:  sdb: sdb1 sdb9
Feb 05 10:26:42 token kernel: sd 6:0:0:0: [sdb] Attached SCSI disk
Feb 05 10:26:42 token kernel: SPL: Loaded module v0.7.0-24_g23602fd
Feb 05 10:26:42 token kernel: icp: module license 'CDDL' taints kernel.
Feb 05 10:26:42 token kernel: Disabling lock debugging due to kernel taint
Feb 05 10:26:44 token kernel: ZFS: Loaded module v0.7.0-277_g0d23f5e, ZFS pool version 5000, ZFS filesystem version 5
Feb 05 10:27:15 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:28:07 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:28:07 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:28:07 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:28:07 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:29:01 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:29:01 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:29:01 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:29:02 token kernel: ZFS: Unable to set "noop" scheduler for /dev/sdb1 (sdb): 256
Feb 05 10:29:10 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, dsobj, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Feb 05 10:29:10 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, dsobj, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Feb 05 10:29:10 token kernel: PANIC at zio.c:3844:zio_encrypt()
Feb 05 10:29:10 token kernel: Showing stack for process 26845
Feb 05 10:29:10 token kernel: PANIC at zio.c:3844:zio_encrypt()
Feb 05 10:29:10 token kernel: CPU: 3 PID: 26845 Comm: z_wr_iss Tainted: P           OE    4.15.0-pf1+ #1
Feb 05 10:29:10 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Feb 05 10:29:10 token kernel: Showing stack for process 26846
Feb 05 10:29:10 token kernel: Call Trace:
Feb 05 10:29:10 token kernel:  dump_stack+0x63/0x8b
Feb 05 10:29:10 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Feb 05 10:29:10 token kernel:  spl_panic+0xc8/0x110 [spl]
Feb 05 10:29:10 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Feb 05 10:29:10 token kernel:  ? spl_kmem_cache_free+0x139/0x1c0 [spl]
Feb 05 10:29:10 token kernel:  zio_encrypt+0x68a/0x770 [zfs]
Feb 05 10:29:10 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Feb 05 10:29:10 token kernel:  ? __schedule+0x3cd/0x890
Feb 05 10:29:10 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Feb 05 10:29:10 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Feb 05 10:29:10 token kernel:  ? wake_up_q+0x70/0x70
Feb 05 10:29:10 token kernel:  ? zio_reexecute+0x430/0x430 [zfs]
Feb 05 10:29:10 token kernel:  kthread+0x102/0x140
Feb 05 10:29:10 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 05 10:29:10 token kernel:  ? kthread_associate_blkcg+0x90/0x90
Feb 05 10:29:10 token kernel:  ? kthread_associate_blkcg+0x90/0x90
Feb 05 10:29:10 token kernel:  ret_from_fork+0x35/0x40
Feb 05 10:29:10 token kernel: CPU: 1 PID: 26846 Comm: z_wr_iss Tainted: P           OE    4.15.0-pf1+ #1
Feb 05 10:29:10 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Feb 05 10:29:10 token kernel: Call Trace:
Feb 05 10:29:10 token kernel:  dump_stack+0x63/0x8b
Feb 05 10:29:10 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Feb 05 10:29:10 token kernel:  spl_panic+0xc8/0x110 [spl]
Feb 05 10:29:10 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Feb 05 10:29:10 token kernel:  ? spl_kmem_cache_free+0x139/0x1c0 [spl]
Feb 05 10:29:10 token kernel:  zio_encrypt+0x68a/0x770 [zfs]
Feb 05 10:29:10 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Feb 05 10:29:10 token kernel:  ? __schedule+0x3cd/0x890
Feb 05 10:29:10 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Feb 05 10:29:10 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Feb 05 10:29:10 token kernel:  ? wake_up_q+0x70/0x70
Feb 05 10:29:10 token kernel:  ? zio_reexecute+0x430/0x430 [zfs]
Feb 05 10:29:10 token kernel:  kthread+0x102/0x140
Feb 05 10:29:10 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 05 10:29:10 token kernel:  ? kthread_associate_blkcg+0x90/0x90
Feb 05 10:29:10 token kernel:  ret_from_fork+0x35/0x40
Feb 05 10:29:10 token kernel: VERIFY3(0 == spa_do_crypt_abd(B_TRUE, spa, dsobj, bp, zio->io_txg, psize, zio->io_abd, eabd, iv, mac, salt, &no_crypt)) failed (0 == 2)
Feb 05 10:29:10 token kernel: PANIC at zio.c:3844:zio_encrypt()
Feb 05 10:29:10 token kernel: Showing stack for process 26847
Feb 05 10:29:10 token kernel: CPU: 1 PID: 26847 Comm: z_wr_iss Tainted: P           OE    4.15.0-pf1+ #1
Feb 05 10:29:10 token kernel: Hardware name: Acer Aspire V5-572G/Dazzle_CX , BIOS V2.07 07/12/2013
Feb 05 10:29:10 token kernel: Call Trace:
Feb 05 10:29:10 token kernel:  dump_stack+0x63/0x8b
Feb 05 10:29:10 token kernel:  spl_dumpstack+0x42/0x50 [spl]
Feb 05 10:29:10 token kernel:  spl_panic+0xc8/0x110 [spl]
Feb 05 10:29:10 token kernel:  ? spa_do_crypt_abd+0x80/0x2f0 [zfs]
Feb 05 10:29:10 token kernel:  ? spl_kmem_cache_free+0x139/0x1c0 [spl]
Feb 05 10:29:10 token kernel:  zio_encrypt+0x68a/0x770 [zfs]
Feb 05 10:29:10 token kernel:  ? zio_write_compress+0x496/0x6a0 [zfs]
Feb 05 10:29:10 token kernel:  ? __wake_up_common_lock+0x8e/0xc0
Feb 05 10:29:10 token kernel:  zio_execute+0x8a/0xf0 [zfs]
Feb 05 10:29:10 token kernel:  taskq_thread+0x2ab/0x4d0 [spl]
Feb 05 10:29:10 token kernel:  ? wake_up_q+0x70/0x70
Feb 05 10:29:10 token kernel:  ? zio_reexecute+0x430/0x430 [zfs]
Feb 05 10:29:10 token kernel:  kthread+0x102/0x140
Feb 05 10:29:10 token kernel:  ? taskq_thread_should_stop+0x70/0x70 [spl]
Feb 05 10:29:10 token kernel:  ? kthread_associate_blkcg+0x90/0x90
Feb 05 10:29:10 token kernel:  ret_from_fork+0x35/0x40

@tcaputi tcaputi added this to the 0.8.0 milestone Feb 9, 2018
@tcaputi tcaputi reopened this Feb 10, 2018
@tcaputi (Contributor) commented Feb 10, 2018

@phreaker0
Hmmmm. Can I ask how large this pool is and how sensitive the data on it is? At this point, I have checked all of the code paths I can think of related to pool import and I can't find anything that might be causing this. Without being able to reproduce the issue myself, I'm a bit lost on what to do. If I could poke around with it I might be able to figure it out.

Feel free to email me at tcaputi@datto.com as well.

@phreaker0 (Author)

@tcaputi
The pool is 1TB in size with about 80% used. It only contains encrypted datasets, so I have no problem if you need access to it, but I think it is too big to send the whole thing. Maybe I can share the disk as a block device via iSCSI or something (I've never used iSCSI, but it sounds like it would be useful here), or do you think that would be too slow?

@tcaputi (Contributor) commented Feb 12, 2018

iSCSI will almost certainly not work over a WAN. The only way I can think to do this is if you can plug the drive into a spare machine somewhere that you don't mind me having root access to (maybe passed through to a VM). You could then expose the port over the WAN and privately email me SSH credentials to the VM so I can access it. I will need to be able to reboot this VM a few times to figure out what's going on. I understand if this is too much to ask.

@phreaker0 (Author)

No problem, I will set up a VM and email you the details when it's ready.

@tcaputi (Contributor) commented Feb 12, 2018

OK, I have found the cause of the issue and will be opening a PR for it shortly. Basically, ZIL claiming was accidentally causing user accounting to occur even though the keys weren't loaded yet. Thanks for the help with this, @phreaker0.

Nasf-Fan pushed a commit to Nasf-Fan/zfs that referenced this issue Feb 13, 2018
When performing zil_claim() at pool import time, it is
important that encrypted datasets set os_next_write_raw
before writing to the zil_header_t. This prevents the code
from attempting to re-authenticate the objset_phys_t when
it writes it out, which is unnecessary because the
zil_header_t is not protected by either objset MAC and
impossible since the keys aren't loaded yet. Unfortunately,
one of the code paths did not set this flag, which causes
failed ASSERTs during 'zpool import -F'. This patch corrects
this issue.

Reviewed-by: Jorgen Lundman <lundman@lundman.net>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes openzfs#6864
Closes openzfs#6916
tcaputi pushed a commit to datto/zfs that referenced this issue Feb 13, 2018
Currently, ZIL claiming dirties objsets which causes
dsl_pool_sync() to attempt to perform user accounting on
them. This causes problems for encrypted datasets that were
raw received before the system went offline since they
cannot perform user accounting until they have their keys
loaded. This triggers an ASSERT in zio_encrypt(). Since
encryption was added, the code now depends on the fact that
data should only be written when objsets are owned. This
patch adds a check in dmu_objset_do_userquota_updates()
to ensure that useraccounting is only done when the objsets
are actually owned.

Fixes: openzfs#6916

Signed-off-by: Tom Caputi <tcaputi@datto.com>
behlendorf pushed a commit that referenced this issue Feb 21, 2018
Currently, ZIL claiming dirties objsets which causes
dsl_pool_sync() to attempt to perform user accounting on
them. This causes problems for encrypted datasets that were
raw received before the system went offline since they
cannot perform user accounting until they have their keys
loaded. This triggers an ASSERT in zio_encrypt(). Since
encryption was added, the code now depends on the fact that
data should only be written when objsets are owned. This
patch adds a check in dmu_objset_do_userquota_updates()
to ensure that useraccounting is only done when the objsets
are actually owned for write. As part of this work, the
zfsvfs and zvol code was updated so that it no longer lies
about owning objsets readonly.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Tom Caputi <tcaputi@datto.com>
Closes #6916 
Closes #7163