Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix ziltest deadlock #5758

Merged
merged 2 commits into from
Feb 9, 2017
Merged

Fix ziltest deadlock #5758

merged 2 commits into from
Feb 9, 2017

Conversation

behlendorf
Copy link
Contributor

Description

This patch stack addresses two issues.

The first is a deadlock exposed by ziltest.sh which was aggravated by the proposed increase in the indirect block size in #5679. The larger block size resulted is multiple TXs being required during replay due to quota limits triggering a deadlock in iput() which was used incorrectly while constructing a TX. The zfs_iput_async() function must be used in this context.

The second is some code cleanup which move the stand-alone ziltest.sh script in to the ZFS Test Suite. This test case was only ever separate because it predates the test suite.

How Has This Been Tested?

Locally with the original ziltest.sh and the updated test case. The changes from #5679 were apply to make it easy to hit the iput() issue.

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)

@mention-bot
Copy link

@behlendorf, thanks for your PR! By analyzing the history of the files in this pull request, we identified @slashdd, @ahrens and @tuxoko to be potential reviewers.

@behlendorf behlendorf requested a review from don-brady February 8, 2017 02:58
@behlendorf
Copy link
Contributor Author

For reference one example of the iput() deadlock triggered. Notice that iput() is called in the error path of zfs_remove() where dmu_tx_assign() fails. Since this is last reference on the iput() it re-enters the filesystem before dmu_tx_abort() can be called thereby holding open the txg indefinitely. Making the iput() asynchronous in this case resolves the deadlock.

[<ffffffffa06db60d>] cv_wait_common+0x14d/0x2a0 [spl]
[<ffffffffa06db775>] __cv_wait+0x15/0x20 [spl]
[<ffffffffa0bfeef3>] txg_wait_open+0xe3/0x1b0 [zfs]
[<ffffffffa0ba2ea5>] dmu_tx_wait+0x465/0x4d0 [zfs]
[<ffffffffa0ba2fd9>] dmu_tx_assign+0xc9/0x720 [zfs]
[<ffffffffa0b8b4b1>] dmu_free_long_range+0x341/0x560 [zfs]
[<ffffffffa0c30784>] zfs_rmnode+0x2f4/0x450 [zfs]
[<ffffffffa0c58e78>] zfs_zinactive+0xd8/0x140 [zfs]
[<ffffffffa0c4e93b>] zfs_inactive+0x8b/0x300 [zfs]
[<ffffffffa0c74c33>] zpl_evict_inode+0x43/0xa0 [zfs]
[<ffffffff8121a237>] evict+0xa7/0x170
[<ffffffff8121aad5>] iput+0xf5/0x180
[<ffffffffa0c51d5b>] zfs_remove+0x9cb/0xab0 [zfs]
[<ffffffffa0c40167>] zfs_replay_remove+0xc7/0x100 [zfs]
[<ffffffffa0c5b5b0>] zil_replay_log_record+0xd0/0x1a0 [zfs]
[<ffffffffa0c5cf34>] zil_parse+0x6b4/0xac0 [zfs]
[<ffffffffa0c5dcc2>] zil_replay+0xc2/0x160 [zfs]
[<ffffffffa0c464b8>] zfs_sb_setup+0x188/0x1a0 [zfs]
[<ffffffffa0c4788a>] zfs_domount+0x2ca/0x3a0 [zfs]
[<ffffffffa0c74b61>] zpl_fill_super+0x31/0xc0 [zfs]
[<ffffffff81201a6d>] mount_nodev+0x4d/0xb0
[<ffffffffa0c747e2>] zpl_mount+0x52/0x80 [zfs]
[<ffffffff81202419>] mount_fs+0x39/0x1b0
[<ffffffff8121e19f>] vfs_kern_mount+0x5f/0xf0
[<ffffffff812206fe>] do_mount+0x24e/0xaa0
[<ffffffff81220fe6>] SyS_mount+0x96/0xf0

Copy link
Member

@gmelikov gmelikov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thank you!

Copy link
Contributor

@don-brady don-brady left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for pulling ziltest into test suite!

Copy link
Contributor

@tuxoko tuxoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Notice that iput() is called in the error path of zfs_remove() where dmu_tx_assign() fails

Are you sure about this? From the stack trace it's already doing zfs_rmnode, so that means the link count should already be zero?

if (xzp)
iput(ZTOI(xzp));
zfs_iput_async(ZTOI(xzp));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we just put the move the iput after dmu_tx_abort?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work fine. We need to do it for both the ERESTART and !ERESTART cases though which felt a bit ugly. But I agree it would be nice to keep these as iput's if possible. I'll rework it and we'll see.

@@ -1848,7 +1848,7 @@ zfs_remove(struct inode *dip, char *name, cred_t *cr, int flags)
zfs_unlinked_add(zp, tx);
mutex_exit(&zp->z_lock);
zfs_inode_update(zp);
iput(ip);
zfs_iput_async(ip);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this makes delete_now not really delete now, maybe we can just remove this path.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to move the iput() after the commit.

Copy link
Contributor Author

@behlendorf behlendorf left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll rework this to preserve the iput()'s and place them after the commit/abort.

if (xzp)
iput(ZTOI(xzp));
zfs_iput_async(ZTOI(xzp));
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That should work fine. We need to do it for both the ERESTART and !ERESTART cases though which felt a bit ugly. But I agree it would be nice to keep these as iput's if possible. I'll rework it and we'll see.

@@ -1848,7 +1848,7 @@ zfs_remove(struct inode *dip, char *name, cred_t *cr, int flags)
zfs_unlinked_add(zp, tx);
mutex_exit(&zp->z_lock);
zfs_inode_update(zp);
iput(ip);
zfs_iput_async(ip);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should be able to move the iput() after the commit.

@behlendorf
Copy link
Contributor Author

behlendorf commented Feb 8, 2017

@tuxoko refreshed to minimize the use of zfs_iput_async(). Please take an extra close look at zfs_remove() to make sure I got it right. I stuck with iput() for all of these cases except for zfs_rename_unlock() which I felt that was to disruptive to the existing code.

Copy link
Contributor

@tuxoko tuxoko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, one thing I don't like about zfs_iput_async is the racy check in it.

@behlendorf
Copy link
Contributor Author

Right, that and needless contention of the iput taskq lock. I agree, better to do it synchronsouly when if we can and use zfs_iput_async() sparingly when there aren't better options. This wasn't as ugly as I initially feared.

As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
The ziltest.sh script is a test case designed to verify the correct
functioning of the ZIL.  For historical reasons it was never added
to the test suite and was always run independantly.

This change rectifies that.  The existing ziltest.sh has been
translated in to `slog_015_pos.ksh` and added to the existing
slog test cases.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
@behlendorf behlendorf merged commit b0eac56 into openzfs:master Feb 9, 2017
behlendorf added a commit that referenced this pull request Feb 9, 2017
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5758
gmelikov pushed a commit to gmelikov/zfs that referenced this pull request Feb 9, 2017
The ziltest.sh script is a test case designed to verify the correct
functioning of the ZIL.  For historical reasons it was never added
to the test suite and was always run independantly.

This change rectifies that.  The existing ziltest.sh has been
translated in to `slog_015_pos.ksh` and added to the existing
slog test cases.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
wli5 pushed a commit to wli5/zfs that referenced this pull request Feb 28, 2017
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
wli5 pushed a commit to wli5/zfs that referenced this pull request Feb 28, 2017
The ziltest.sh script is a test case designed to verify the correct
functioning of the ZIL.  For historical reasons it was never added
to the test suite and was always run independantly.

This change rectifies that.  The existing ziltest.sh has been
translated in to `slog_015_pos.ksh` and added to the existing
slog test cases.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
wli5 pushed a commit to wli5/zfs that referenced this pull request Feb 28, 2017
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
wli5 pushed a commit to wli5/zfs that referenced this pull request Feb 28, 2017
The ziltest.sh script is a test case designed to verify the correct
functioning of the ZIL.  For historical reasons it was never added
to the test suite and was always run independantly.

This change rectifies that.  The existing ziltest.sh has been
translated in to `slog_015_pos.ksh` and added to the existing
slog test cases.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
l1k pushed a commit to l1k/zfs that referenced this pull request Apr 17, 2017
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
l1k pushed a commit to l1k/zfs that referenced this pull request Apr 17, 2017
The ziltest.sh script is a test case designed to verify the correct
functioning of the ZIL.  For historical reasons it was never added
to the test suite and was always run independantly.

This change rectifies that.  The existing ziltest.sh has been
translated in to `slog_015_pos.ksh` and added to the existing
slog test cases.

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
@behlendorf behlendorf deleted the ziltest branch May 1, 2017 18:09
tonyhutter pushed a commit to tonyhutter/zfs that referenced this pull request May 18, 2017
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes openzfs#5758
tonyhutter pushed a commit that referenced this pull request Jun 9, 2017
As explicitly stated in section 2 of the 'Programming rules'
comments at the top of zfs_vnops.c.

  If you must call iput() within a tx then use zfs_iput_async().

Move iput() calls after dmu_tx_commit() / dmu_tx_abort when
possible.  When not possible convert the iput() calls to
zfs_iput_async().

Reviewed-by: Don Brady <don.brady@intel.com>
Reviewed-by: Chunwei Chen <david.chen@osnexus.com>
Reviewed-by: George Melikov <mail@gmelikov.ru>
Signed-off-by: Brian Behlendorf <behlendorf1@llnl.gov>
Closes #5758
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants