With zinject faults enabled, "ASSERT(!(zio->io_flags & (ZIO_FLAG_IO_REPAIR | ZIO_FLAG_IO_RETRY)));" in zio_dva_throttle_done() fails. #6383

Closed
sanjeevbagewadi opened this issue Jul 21, 2017 · 0 comments

Comments

@sanjeevbagewadi

System information

Type Version/Name
Distribution Name CentOS
Distribution Version el6
Linux Kernel 4.4.14-1.el6
Architecture Intel
ZFS Version 0.7.0-rc3
SPL Version 0.7.0-rc3

Describe the problem you're observing

With zinject faults enabled for a device:

# zinject -a -d /dev/sdad -e io testpool -T all

we hit the following panic:
-- snip --
PID: 8949   TASK: ffff8802b25a1540  CPU: 0   COMMAND: "z_wr_int_7"
 #0 [ffff8802b1f737c0] machine_kexec at ffffffff8105d4f0
 #1 [ffff8802b1f73830] crash_kexec at ffffffff8110c288
 #2 [ffff8802b1f73900] oops_end at ffffffff8101a976
 #3 [ffff8802b1f73930] die at ffffffff8101ae2b
 #4 [ffff8802b1f73960] do_trap at ffffffff81017e5f
 #5 [ffff8802b1f739c0] do_error_trap at ffffffff8101810d
 #6 [ffff8802b1f73a80] do_invalid_op at ffffffff81018260
 #7 [ffff8802b1f73a90] invalid_op at ffffffff816d12de
    [exception RIP: spl_panic+194]
    RIP: ffffffffa07cfff2  RSP: ffff8802b1f73b48  RFLAGS: 00010282
    RAX: 00000000ffffffff  RBX: ffff8802b1f73b78  RCX: ffff8802b1f738f8
    RDX: 00000001000dd72c  RSI: 0000000000000001  RDI: 0000000000000282
    RBP: ffff8802b1f73cd8   R8: 66666666663c5b20   R9: 0000000000000787
    R10: 3f205d3e30373532  R11: 0000000000000787  R12: ffffffffa0a0cb80
    R13: 0000000000000edb  R14: ffffffffa0a6c003  R15: ffff8802b25a1540
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #8 [ffff8802b1f73ce0] zio_dva_throttle_done at ffffffffa09af382 [zfs]
 #9 [ffff8802b1f73d20] zio_done at ffffffffa09b6413 [zfs]
#10 [ffff8802b1f73db0] zio_execute at ffffffffa09b020d [zfs]
#11 [ffff8802b1f73e00] taskq_thread at ffffffffa07cd76c [spl]
#12 [ffff8802b1f73ec0] kthread at ffffffff810a263c
#13 [ffff8802b1f73f50] ret_from_fork at ffffffff816cfacf
-- snip --

The panic was raised at:
-- snip --
[ 1207.168219] VERIFY(!(zio->io_flags & (ZIO_FLAG_IO_REPAIR | ZIO_FLAG_IO_RETRY))) failed
[ 1207.168359] PANIC at zio.c:3803:zio_dva_throttle_done()
-- snip --

Looking at the flags and error on the failing zio:
-- snip --
crash> zio.io_flags,io_error ffff880257c19c80
  io_flags = (ZIO_FLAG_CANFAIL | ZIO_FLAG_DONT_CACHE | ZIO_FLAG_IO_ALLOCATING | ZIO_FLAG_IO_RETRY | ZIO_FLAG_DONT_QUEUE | ZIO_FLAG_DONT_PROPAGATE)
  io_error = 5
-- snip --

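ZIO_FLAG_IO_RETRY is set on this zio by zio_handle_device_injection() because of the zinject record added above. The zio also carries ZIO_FLAG_IO_ALLOCATING, so it reaches zio_dva_throttle_done(), where the assertion rejects any zio with ZIO_FLAG_IO_RETRY set.

Such zios should probably be special-cased in zio_dva_throttle_done(), with the ASSERT() applied conditionally.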
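To make the failure concrete, here is a minimal user-space sketch of the bitmask test behind the assertion, applied to the flag set seen in the crash dump. The SK_FLAG_* names and their bit positions are hypothetical stand-ins chosen for illustration; they are not the real values from the ZFS headers, and original_assert_holds() and observed_flags() are illustrative helpers, not ZFS functions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical bit positions, for illustration only. The real
 * ZIO_FLAG_* values are defined in the ZFS zio header.
 */
enum {
	SK_FLAG_IO_REPAIR      = 1 << 0,
	SK_FLAG_IO_RETRY       = 1 << 1,
	SK_FLAG_CANFAIL        = 1 << 2,
	SK_FLAG_DONT_CACHE     = 1 << 3,
	SK_FLAG_IO_ALLOCATING  = 1 << 4,
	SK_FLAG_DONT_QUEUE     = 1 << 5,
	SK_FLAG_DONT_PROPAGATE = 1 << 6
};

/* Mirrors the condition inside the failing ASSERT(). */
bool
original_assert_holds(uint32_t io_flags)
{
	return ((io_flags & (SK_FLAG_IO_REPAIR | SK_FLAG_IO_RETRY)) == 0);
}

/* The flag combination reported by crash for the failing zio. */
uint32_t
observed_flags(void)
{
	return (SK_FLAG_CANFAIL | SK_FLAG_DONT_CACHE |
	    SK_FLAG_IO_ALLOCATING | SK_FLAG_IO_RETRY |
	    SK_FLAG_DONT_QUEUE | SK_FLAG_DONT_PROPAGATE);
}
```

With these definitions, original_assert_holds(observed_flags()) evaluates to false, which corresponds to the VERIFY failure in the log; clearing the retry bit makes the same check pass.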

Describe how to reproduce the problem

Inject a fault using: # zinject -a -d /dev/sdad -e io testpool -T all
Then generate some I/O on testpool. This should trigger the panic on debug builds.

sanjeevbagewadi pushed a commit to sanjeevbagewadi/zfs that referenced this issue Jul 21, 2017
…penzfs#6383)

zinject enables ZIO_FLAG_IO_RETRY to ensure FMA events are generated. These ZIOs
could have ZIO_FLAG_IO_ALLOCATING set. Hence, conditionally allow such ZIOs in
zio_dva_throttle_done() and do not fail the ASSERT() for ZIO_FLAG_IO_RETRY.

Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
sanjeevbagewadi pushed a commit to sanjeevbagewadi/zfs that referenced this issue Aug 10, 2017
zinject enables ZIO_FLAG_IO_RETRY to ensure FMA events are generated.
Allow such ZIOs in zio_dva_throttle_done() and do not fail the ASSERT()
for ZIO_FLAG_IO_RETRY.

Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
tonyhutter pushed a commit that referenced this issue Aug 22, 2017
If fault injection is enabled, the ZIO_FLAG_IO_RETRY could be set by
zio_handle_device_injection() to generate the FMA events and update
stats. Hence, ignore the flag and process such zios.

A better fix would be to add another flag in the zio_t to indicate that
the zio is failed because of a zinject rule. However, considering the
fact that we do this in debug bits, we could do with the crude check
using the global flag zio_injection_enabled which is set to 1 when
zinject records are added.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Sanjeev Bagewadi <sanjeev.bagewadi@gmail.com>
Closes #6383 
Closes #6384
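The "crude check" described in the commit message above could look roughly like the following user-space sketch. It is not the committed patch: sketch_injection_enabled stands in for the global zio_injection_enabled, the SK2_FLAG_* bit positions are hypothetical, and relaxed_assert_holds() is an illustrative helper. The sketch also assumes that ZIO_FLAG_IO_REPAIR is still rejected unconditionally, which the commit message does not state.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit positions, for illustration only. */
#define	SK2_FLAG_IO_REPAIR	(1u << 0)
#define	SK2_FLAG_IO_RETRY	(1u << 1)

/* Stand-in for the global zio_injection_enabled (nonzero once records exist). */
int sketch_injection_enabled = 0;

/*
 * Relaxed condition: a retry flag is tolerated only while fault
 * injection is enabled. Assumption: repair zios are never allowed here.
 */
bool
relaxed_assert_holds(uint32_t io_flags)
{
	if (io_flags & SK2_FLAG_IO_REPAIR)
		return (false);
	if (io_flags & SK2_FLAG_IO_RETRY)
		return (sketch_injection_enabled != 0);
	return (true);
}
```

The trade-off is the one the commit message names: a global switch cannot tell an injected failure from a genuine retry on another zio, so while injection is active the check is weaker for every zio, which is tolerable because the assertion only exists in debug builds.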
Fabian-Gruenbichler pushed a commit to Fabian-Gruenbichler/zfs that referenced this issue Sep 29, 2017