Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Dangling reference from dmu_objset_upgrade #11368

Merged
merged 1 commit into from
Dec 21, 2020

Conversation

citrus-it
Copy link
Contributor

After porting the fix for #5295
over to illumos, we started hitting an assertion failure when running the
testsuite:

assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

dsl_dataset_long_hold+0x59
dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15
dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get this
to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in illumos),
slows down dmu_objset_disown() enough to avoid the condition.

Motivation and Context

Fix dangling dataset long hold encountered on illumos:

> ::stack
vpanic()
0xfffffffffbdcd395()
zfs_refcount_destroy_many+0x30(fffffe172cbad768, 0)
zfs_refcount_destroy+0x10(fffffe172cbad768)
dsl_dataset_evict_async+0x12a(fffffe172cbad440)
taskq_thread+0x315(fffffe170cee0eb0)
thread_start+0xb()
> fffffe172cbad768::zfs_refcount
zfs_refcount_t at fffffe172cbad768 has 1 current holds, 3 recently released holds
current holds:
reference with tag fffffffff79decca "upgrade_tag", held at:
fffffe1c6a649300 is allocated from reference_cache:
            ADDR          BUFADDR        TIMESTAMP           THREAD
                            CACHE          LASTLOG         CONTENTS
fffffe1c67d829d0 fffffe1c6a649300      4e89b292133 fffffe1c9e3387c0
                 fffffe170ce92888 fffffe16e20e9cc0 fffffe16f4b6d850
                 kmem_cache_alloc_debug+0x2fc
                 kmem_cache_alloc+0x135
                 zfs_refcount_add_many+0x82
                 zfs_refcount_add+0x13
                 dsl_dataset_long_hold+0x59
                 dmu_objset_upgrade+0x73
                 dmu_objset_id_quota_upgrade+0x15
                 dmu_objset_own+0x14f
                 zfsvfs_create+0x6c
                 zfs_domount+0x50
                 zfs_mount+0x2a7
                 fsop_mount+0x14
                 domount+0xa78
                 mount+0xfe
                 syscall_ap+0x98

> fffffe170cee0eb0::taskq
ADDR             NAME                             ACT/THDS Q'ED  MAXQ INST
fffffe170cee0eb0 dbu_evict                          1/   1    1   390    -

How Has This Been Tested?

The fix has been tested on illumos by running their ZFS testsuite in a loop on multiple systems booted in a DEBUG environment (which enables assertions like the one triggered here).

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Library ABI change (libzfs, libzfs_core, libnvpair, libuutil and libzfsbootenv)
  • Documentation (a change to man pages or other documentation)

Checklist:

@ghost
Copy link

ghost commented Dec 18, 2020

Thanks, if you rebase this on the current master we can get a clean run from the CI.

After porting the fix for openzfs#5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Signed-off-by: Andy Fiddaman <andy@omnios.org>
@citrus-it
Copy link
Contributor Author

Thanks, if you rebase this on the current master we can get a clean run from the CI.

Done - not sure how it wasn't already on the latest master, sorry about that.

@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Dec 18, 2020
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for fixing this!

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Dec 20, 2020
@behlendorf behlendorf merged commit 39372fa into openzfs:master Dec 21, 2020
ghost pushed a commit to zfsonfreebsd/ZoF that referenced this pull request Dec 23, 2020
After porting the fix for openzfs#5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes openzfs#11368
behlendorf pushed a commit that referenced this pull request Dec 23, 2020
After porting the fix for #5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes #11368
jsai20 pushed a commit to jsai20/zfs that referenced this pull request Mar 30, 2021
After porting the fix for openzfs#5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes openzfs#11368
sempervictus pushed a commit to sempervictus/zfs that referenced this pull request May 31, 2021
After porting the fix for openzfs#5295
over to illumos, we started hitting an assertion failure when running
the testsuite:

	assertion failed: rc->rc_count == number, file: .../refcount.c

and the unexpected hold has this stack:

	dsl_dataset_long_hold+0x59 dmu_objset_upgrade+0x73
dmu_objset_id_quota_upgrade+0x15 dmu_objset_own+0x14f

The simplest reproducer for this in illumos is

    zpool create -f -O version=1 testpool c3t0d0; zpool destroy testpool

which is run as part of the zpool_create_tempname test, but I can't get
this to trigger on FreeBSD. This appears to be because of the call to
txg_wait_synced() in dmu_objset_upgrade_stop() (which was missing in
illumos), slows down dmu_objset_disown() enough to avoid the condition.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Andy Fiddaman <andy@omnios.org>
Closes openzfs#11368
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants