Curious zvol behavior oddities in GH Actions since 20211004.1 #12644

Closed
rincebrain opened this issue Oct 14, 2021 · 4 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@rincebrain
Contributor

rincebrain commented Oct 14, 2021

System information

Type | Version/Name
--- | ---
Distribution Name | Ubuntu
Distribution Version | 18.04/20.04
Kernel Version | various
Architecture | amd64
OpenZFS Version | whatever git master you like

Describe the problem you're observing

Since GitHub rolled out the 20211004.1 images, a bunch of tests have started failing, many of them ~100% of the time:

    FAIL cli_root/zfs_destroy/zfs_destroy_003_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_004_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_008_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_009_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_010_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_011_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_012_pos (expected PASS)
    FAIL cli_root/zfs_destroy/zfs_destroy_013_neg (expected PASS)
    FAIL cli_root/zfs_load-key/zfs_load-key_all (expected PASS)
    FAIL cli_root/zfs_receive/zfs_receive_002_pos (expected PASS)
    FAIL cli_root/zfs_receive/zfs_receive_005_neg (expected PASS)
    FAIL cli_root/zfs_unmount/zfs_unmount_008_neg (expected PASS)
    FAIL cli_root/zfs_upgrade/zfs_upgrade_001_pos (expected PASS)
    FAIL cli_root/zpool_destroy/zpool_destroy_001_pos (expected PASS)
    FAIL cli_root/zpool_detach/setup (expected PASS)
    SKIP cli_root/zpool_detach/zpool_detach_001_neg (expected PASS)
    FAIL reservation/reservation_003_pos (expected PASS)
    FAIL reservation/reservation_014_pos (expected PASS)
    FAIL snapused/snapused_002_pos (expected PASS)
    FAIL snapused/snapused_005_pos (expected PASS)

Having looked into them, basically all of them are of the form:

  • something tries to manipulate a zvol (zfs set reservation/zfs destroy -r/zfs promote/...)
  • command returns, something tries to zpool export/zpool destroy/zfs destroy/etc (writing more space than the pool should allow is one additional example)
  • EBUSY/ENOSPC/etc. comes back, as if the previous change did not do what you expected or what the command's return code indicated

Curiously, the kernel/userland doesn't have any flamingly obvious differences, dmesg from working/nonworking don't have obviously consistent differences, config.log doesn't have any obvious differences, and it doesn't reproduce on the same kernel in a stock Ubuntu VM of 18.04/20.04, so...?

I'm iterating on a PR that applies udev_waits and zpool syncs like there's no tomorrow that cleans these up, but I got to the zfs_destroy failures, and that seems like it might be a bit stranger...since it seems to be incorrectly zfs destroying...
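To illustrate the kind of wait being sprinkled in, here is a minimal sketch: poll until a condition holds (here, until a zvol's device node has gone away) rather than trusting that the command's return means the change is visible. `wait_until`, `zvol_node_gone`, and the device path are hypothetical names for this sketch; they are not the helpers the test suite actually uses.

```python
import os
import time


def wait_until(predicate, timeout=10.0, interval=0.1):
    """Poll predicate() until it returns True or timeout seconds elapse.

    Returns True if the condition was met, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        if predicate():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def zvol_node_gone(path):
    """True once the block device node for a zvol no longer exists."""
    return not os.path.exists(path)


if __name__ == "__main__":
    # Hypothetical device path; substitute the zvol under test.
    dev = "/dev/zvol/testpool/testvol"
    ok = wait_until(lambda: zvol_node_gone(dev), timeout=5.0)
    print("device node gone" if ok else "still present after timeout")
```

A wait like this only papers over the symptom, which is why the zfs_destroy failures look like a separate problem.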

Describe how to reproduce the problem

Just run a GitHub Actions workflow with Ubuntu 18.04/20.04 now, since you can't run old versions. :D

Include any warning/errors/backtraces from the system logs

Curiously, this one fails somewhere that shouldn't have any outstanding references, and yet...

An example from reservation_014_pos (no code changes, but I modified reservation_014_pos to set zfs_flags to 512, echo the various env variables used before running zfs set quota, and try setting it twice):

ASSERTION: Verify cannot set reservation larger than quota
SUCCESS: zfs create -V 279068672 testpool/testvol21794
SUCCESS: zfs create -s -V 4465328128 testpool/testvol2-21794
279068672 4465328128 464756906 1394270720 5242880
NAME             PROPERTY  VALUE  SOURCE
testpool         used      1.85G  -
testpool/testfs  used      24K    -
cannot set property for 'testpool/testfs': size is less than current used or reserved space
NAME             PROPERTY  VALUE  SOURCE
testpool         used      1.85G  -
testpool/testfs  used      24K    -
cannot set property for 'testpool/testfs': size is less than current used or reserved space
ERROR: zfs set quota=464756906 testpool/testfs exited 1
NOTE: Performing test-fail callback (/usr/share/zfs/zfs-tests/callbacks/zfs_dbgmsg.ksh)
=================================================================
 Tailing last 500 lines of zfs_dbgmsg log
=================================================================
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217816   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   dsl_prop.c:102:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zvol.c:962:zvol_first_open(): error 0
1634217817   zvol_os.c:646:zvol_ioctl(): error 0
1634217817   zvol_os.c:646:zvol_ioctl(): error 18446744073709551591
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   dsl_prop.c:102:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   dsl_prop.c:102:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_prop.c:150:dsl_prop_get_dd(): error 2
1634217817   zvol.c:962:zvol_first_open(): error 0
1634217817   zvol_os.c:646:zvol_ioctl(): error 0
1634217817   zvol_os.c:646:zvol_ioctl(): error 18446744073709551591
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   dbuf.c:2957:dbuf_findbp(): error 2
1634217817   zap_leaf.c:489:zap_entry_read(): error 75
1634217817   zap_leaf.c:510:zap_entry_read_name(): error 75
1634217817   zap_leaf.c:470:zap_leaf_lookup_closest(): error 2
1634217817   zap_micro.c:1612:zap_cursor_retrieve(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_dataset.c:2666:dsl_get_prev_snap(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_dir.c:1074:dsl_dir_get_filesystem_count(): error 2
1634217817   dsl_dir.c:1086:dsl_dir_get_snapshot_count(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_dataset.c:798:dsl_dataset_hold_flags(): error 2
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_dataset.c:798:dsl_dataset_hold_flags(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   dsl_prop.c:55:dodefault(): error 2
1634217817   spa_misc.c:2575:spa_scan_get_stats(): error 2
1634217817   vdev_removal.c:2342:spa_removal_get_stats(): error 2
1634217817   spa_checkpoint.c:167:spa_checkpoint_get_stats(): error 1026
1634217817   zap_micro.c:984:zap_lookup_impl(): error 2
1634217817   dsl_dir.c:1684:dsl_dir_set_quota_check(): error 28
1634217817   dsl_dir.c:1684:dsl_dir_set_quota_check(): error 28
=================================================================
 End of zfs_dbgmsg log
=================================================================
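For reading the log above, a small sketch of how the "error N" values can be decoded. It assumes the values are errno codes as used on Linux and that the very large number is a negative errno printed as an unsigned 64-bit integer; the `decode_dbgmsg_error` helper and the hard-coded table are mine, not part of ZFS.

```python
# Linux errno values for codes that appear in the log above.
# Hard-coded so the sketch behaves the same on any platform.
LINUX_ERRNO = {
    2: "ENOENT",
    25: "ENOTTY",
    28: "ENOSPC",
    75: "EOVERFLOW",
}


def decode_dbgmsg_error(value):
    """Interpret a zfs_dbgmsg 'error N' value as a signed 64-bit errno.

    Returns (signed_value, errno_name); the name is "unknown" for codes
    that are not in the small table above.
    """
    if value >= 1 << 63:
        value -= 1 << 64  # undo unsigned wrap-around of a negative value
    return value, LINUX_ERRNO.get(abs(value), "unknown")


if __name__ == "__main__":
    for raw in (2, 28, 75, 18446744073709551591):
        signed, name = decode_dbgmsg_error(raw)
        print(raw, "->", signed, name)
```

Under those assumptions, 18446744073709551591 is -25 (ENOTTY) from zvol_ioctl(), and the final error 28 from dsl_dir_set_quota_check() is ENOSPC, which lines up with the "size is less than current used or reserved space" message.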
@rincebrain rincebrain added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 14, 2021
@rincebrain
Contributor Author

The culprit appears to be the upgrade of the cloud-init package on Ubuntu - it added a udev rule to trigger a hook for every disk add/remove event, and shortcutting the udev rule or removing it entirely reverts to the previous behavior.

You can see removing the udev rule run to completion here. If that seems like too heavy-handed a solution, you could just shove KERNEL=="zd*", GOTO="cloudinit_end" into /lib/udev/rules.d/10-cloud-init-hook-hotplug.rules if it exists.
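A rough sketch of the lighter-weight option, operating on the rules text rather than the file itself. It assumes the rules file already defines LABEL="cloudinit_end" (which the GOTO needs) and that putting the exclusion ahead of the first rule is enough; `add_zvol_exclusion` and the `SAMPLE` text are made up for illustration and are not the real file contents.

```python
EXCLUSION = 'KERNEL=="zd*", GOTO="cloudinit_end"'

# Stand-in rules text for the demo; NOT the real cloud-init rules file.
SAMPLE = (
    "# stand-in rules for illustration only\n"
    'SUBSYSTEM=="block", ACTION=="add|remove", RUN+="/bin/true"\n'
    'LABEL="cloudinit_end"\n'
)


def add_zvol_exclusion(rules_text):
    """Return rules_text with the zvol exclusion placed before the first rule.

    Leading comments and blank lines stay on top. The text is returned
    unchanged if the exclusion is already present or if there is no
    LABEL="cloudinit_end" for the GOTO to jump to.
    """
    if EXCLUSION in rules_text or 'LABEL="cloudinit_end"' not in rules_text:
        return rules_text
    lines = rules_text.splitlines(keepends=True)
    for i, line in enumerate(lines):
        stripped = line.strip()
        if stripped and not stripped.startswith("#"):
            lines.insert(i, EXCLUSION + "\n")
            break
    return "".join(lines)


if __name__ == "__main__":
    print(add_zvol_exclusion(SAMPLE), end="")
```

Writing the result back to /lib/udev/rules.d/10-cloud-init-hook-hotplug.rules needs root, and udev would likely need a `udevadm control --reload-rules` afterwards to pick up the change.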

I've reported this upstream here.

@behlendorf Not that #12663 is a bad idea, but something like this could be less invasive, perhaps.

@behlendorf
Contributor

That explains a lot. Thanks for getting to the real root cause of this change in behavior. Hopefully, upstream will not only fix this in 21.4 but also make the change in 18.04 and 20.04.

My feeling is we should do both. Could you open a PR for your less heavy handed KERNEL=="zd*", GOTO="cloudinit_end" workaround? That will at least get us back to where we were. I'd like to still proceed with #12663 since I think there is merit in standardizing the cleanup and adding the retry to the most racy tests. One nice thing which may come out of this is a small reduction in false positive test failures. While it's rare, I have seen some of these tests fail on other distributions just because they got unlucky.

@rincebrain
Contributor Author

Oh sure, I'm absolutely not suggesting we not go forward with that, and if I came across that way, I'm sorry; that's a failure on my part. I only meant that something like this might make sense if this turns out to involve whack-a-mole beyond the initial 100% failures.

They did say it'll be fixed in the next release and have a PR out to make it opt-in not always-on, though there's no promise that various cloud providers don't all opt-in by default...

Do you want the workaround just for the GH workflows, or in the wider package install script? Normally I'd be aggressively advocating for the latter "just in case", but since they said they're fixing it in the next release, and cloud-init is basically only installed on cloud providers, it feels like such a short-lived workaround might be fine to put in the former, in this one case.

@behlendorf
Contributor

Why don't we just make the change in the GH workflows for now, since it's something we expect will be fixed? It's also hopefully something very few people would ever encounter, since I'd like to think creating and immediately destroying a volume is a pretty uncommon activity.

rincebrain added a commit to rincebrain/zfs that referenced this issue Oct 22, 2021
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Closes: openzfs#12644

Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
@jwk404 jwk404 closed this as completed in 731fbb5 Oct 25, 2021
rincebrain added a commit to rincebrain/zfs that referenced this issue Oct 28, 2021
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12644
Closes openzfs#12669
tonyhutter pushed a commit that referenced this issue Nov 1, 2021
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
tonyhutter pushed a commit that referenced this issue Nov 2, 2021
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes #12644
Closes #12669
ghost pushed a commit to truenas/zfs that referenced this issue Nov 3, 2021
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12644
Closes openzfs#12669
tonyhutter pushed a commit to tonyhutter/zfs that referenced this issue Nov 13, 2021
cloud-init added a hook which triggers on every device add/rm
event, which results in holding open devices for a while after
they're created/destroyed.

So let's shove an exclusion rule for that into the GH workflows
until it gets fixed.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Reviewed-by: John Kennedy <john.kennedy@delphix.com>
Signed-off-by: Rich Ercolani <rincebrain@gmail.com>
Closes openzfs#12644
Closes openzfs#12669