Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix zil replay panic when TX_REMOVE followed by TX_CREATE #9145

Merged
merged 1 commit into from
Aug 28, 2019

Conversation

tuxoko
Copy link
Contributor

@tuxoko tuxoko commented Aug 9, 2019

If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Closes #7151
Closes #8910
Closes #9123
Signed-off-by: Chunwei Chen david.chen@nutanix.com

Motivation and Context

Description

How Has This Been Tested?

#!/bin/bash

POOLNAME=pp
TESTDIR=/pp

if [ "$(ls $TESTDIR | wc -l)" -lt "1000" ]; then
	for i in $(seq 1000); do touch $TESTDIR/file.$i; done
	sleep 2
	zpool sync $POOLNAME
fi

touch $TESTDIR && sync
zpool sync $POOLNAME
zpool freeze $POOLNAME

rm -f $TESTDIR/*
for i in $(seq 1000); do touch $TESTDIR/file.$i; done
sync

zpool export $POOLNAME
zpool import $POOLNAME

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

@tuxoko tuxoko force-pushed the ziltxg branch 3 times, most recently from a8fd598 to 698ba75 Compare August 10, 2019 00:22
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Aug 12, 2019
Copy link
Contributor

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! You've definitely identified the root cause, it looks like this was accidentally introduced with the large dnode change. A few comments inline.

module/zfs/zfs_replay.c Outdated Show resolved Hide resolved
module/zfs/zfs_replay.c Outdated Show resolved Hide resolved
module/zfs/dnode.c Show resolved Hide resolved
tests/zfs-tests/tests/functional/slog/slog_replay_fs.ksh Outdated Show resolved Hide resolved
tests/zfs-tests/tests/functional/slog/slog_replay_fs.ksh Outdated Show resolved Hide resolved
@tuxoko tuxoko force-pushed the ziltxg branch 2 times, most recently from 9d54188 to 15b839a Compare August 21, 2019 17:06
module/zfs/dnode.c Outdated Show resolved Hide resolved
module/zfs/dnode.c Show resolved Hide resolved
module/zfs/zfs_replay.c Outdated Show resolved Hide resolved
@codecov
Copy link

codecov bot commented Aug 22, 2019

Codecov Report

Merging #9145 into master will increase coverage by 0.06%.
The diff coverage is 71.42%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #9145      +/-   ##
==========================================
+ Coverage   79.19%   79.26%   +0.06%     
==========================================
  Files         400      400              
  Lines      122002   122009       +7     
==========================================
+ Hits        96625    96714      +89     
+ Misses      25377    25295      -82
Flag Coverage Δ
#kernel 79.79% <71.42%> (-0.01%) ⬇️
#user 67.14% <41.17%> (+0.2%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4302698...636a114. Read the comment docs.

module/zfs/dnode.c Outdated Show resolved Hide resolved
module/zfs/dnode.c Outdated Show resolved Hide resolved
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
@behlendorf
Copy link
Contributor

@prakashsurya @tcaputi if you get a chance it would be great to get a second reviewer for this fix.

Copy link
Contributor

@tcaputi tcaputi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Aug 27, 2019
@behlendorf behlendorf merged commit 035e961 into openzfs:master Aug 28, 2019
ghost pushed a commit to zfsonfreebsd/ZoF that referenced this pull request Aug 28, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 17, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 18, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 19, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request Sep 23, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes openzfs#7151
Closes openzfs#8910
Closes openzfs#9123
Closes openzfs#9145
tonyhutter pushed a commit that referenced this pull request Sep 26, 2019
If TX_REMOVE is followed by TX_CREATE on the same object id, we need to
make sure the object removal is completely finished before creation. The
current implementation relies on dnode_hold_impl with
DNODE_MUST_BE_ALLOCATED returning ENOENT. While this check seems to work
fine before, in current version it does not guarantee the object removal
is completed.

We fix this by checking if DNODE_MUST_BE_FREE returns successful
instead. Also add test and remove dead code in dnode_hold_impl.

Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed-by: Tom Caputi <tcaputi@datto.com>
Signed-off-by: Chunwei Chen <david.chen@nutanix.com>
Closes #7151
Closes #8910
Closes #9123
Closes #9145
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Unable to mount a dataset after power failure VERIFY3(0 == dmu_object_claim_dnsize()) failed (0 == 28)
4 participants