Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

zfs: Fix lockdep between ds_lock and dd_lock in dsl_dataset_namelen() #8413

Merged
merged 1 commit into from
Mar 11, 2019

Conversation

mzhivich
Copy link
Contributor

@mzhivich mzhivich commented Feb 15, 2019

Motivation and Context

Fix potential locking issue reported by debug kernel build.

Description

Booting debug kernel found an inconsistent lock dependency between
dataset's ds_lock and its directory's dd_lock.

[ 32.215336] ======================================================
[ 32.221859] WARNING: possible circular locking dependency detected
[ 32.221861] 4.14.90+ #8 Tainted: G O
[ 32.221862] ------------------------------------------------------
[ 32.221863] dynamic_kernel_/4667 is trying to acquire lock:
[ 32.221864] (&ds->ds_lock){+.+.}, at: []
dsl_dataset_check_quota+0x9e/0x8a0 [zfs]
[ 32.221941] but task is already holding lock:
[ 32.221941] (&dd->dd_lock){+.+.}, at: []
dsl_dir_tempreserve_space+0x3b9/0x1290 [zfs]
[ 32.221983] which lock already depends on the new lock.
[ 32.221983] the existing dependency chain (in reverse order) is:
[ 32.221984] -> #1 (&dd->dd_lock){+.+.}:
[ 32.221992] __mutex_lock+0xef/0x14c0
[ 32.222049] dsl_dir_namelen+0xd4/0x2d0 [zfs]
[ 32.222093] dsl_dataset_namelen+0x2f1/0x430 [zfs]
[ 32.222142] verify_dataset_name_len+0xd/0x40 [zfs]
[ 32.222184] dmu_objset_find_dp_impl+0x5f5/0xef0 [zfs]
[ 32.222226] dmu_objset_find_dp_cb+0x40/0x60 [zfs]
[ 32.222235] taskq_thread+0x969/0x1460 [spl]
[ 32.222238] kthread+0x2fb/0x400
[ 32.222241] ret_from_fork+0x3a/0x50

[ 32.222241] -> #0 (&ds->ds_lock){+.+.}:
[ 32.222246] lock_acquire+0x14f/0x390
[ 32.222248] __mutex_lock+0xef/0x14c0
[ 32.222291] dsl_dataset_check_quota+0x9e/0x8a0 [zfs]
[ 32.222355] dsl_dir_tempreserve_space+0x5d2/0x1290 [zfs]
[ 32.222392] dmu_tx_assign+0xa61/0xdb0 [zfs]
[ 32.222436] zfs_create+0x4e6/0x11d0 [zfs]
[ 32.222481] zpl_create+0x194/0x340 [zfs]
[ 32.222484] lookup_open+0xa86/0x16f0
[ 32.222486] path_openat+0xe56/0x2490
[ 32.222488] do_filp_open+0x17f/0x260
[ 32.222490] do_sys_open+0x195/0x310
[ 32.222491] SyS_open+0xbf/0xf0
[ 32.222494] do_syscall_64+0x191/0x4f0
[ 32.222496] entry_SYSCALL_64_after_hwframe+0x42/0xb7

[ 32.222497] other info that might help us debug this:

[ 32.222497] Possible unsafe locking scenario:
[ 32.222498] CPU0 CPU1
[ 32.222498] ---- ----
[ 32.222499] lock(&dd->dd_lock);
[ 32.222500] lock(&ds->ds_lock);
[ 32.222502] lock(&dd->dd_lock);
[ 32.222503] lock(&ds->ds_lock);
[ 32.222504] *** DEADLOCK ***
[ 32.222505] 3 locks held by dynamic_kernel_/4667:
[ 32.222506] #0: (sb_writers#9){.+.+}, at:
[] mnt_want_write+0x3c/0xa0
[ 32.222511] #1: (&type->i_mutex_dir_key#8){++++}, at:
[] path_openat+0xe2e/0x2490
[ 32.222515] #2: (&dd->dd_lock){+.+.}, at:
[]
dsl_dir_tempreserve_space+0x3b9/0x1290 [zfs]

The issue is caused by dsl_dataset_namelen() holding ds_lock,
followed by acquiring dd_lock on ds->ds_dir in dsl_dir_namelen().

However, ds->ds_dir should not be protected by ds_lock,
so releasing it before call to dsl_dir_namelen() prevents
the lockdep issue.

Signed-off-by: Michael Zhivich mzhivich@akamai.com

How Has This Been Tested?

Booting debug kernel with the patch does not elicit lockdep error.
Tested using Linux 4.14.90 on x86_64 with ZFS 0.7.12.

Types of changes

  • [ x ] Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Performance enhancement (non-breaking change which improves efficiency)
  • Code cleanup (non-breaking change which makes code smaller or more readable)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation (a change to man pages or other documentation)

Checklist:

Booting debug kernel found an inconsistent lock dependency between
dataset's ds_lock and its directory's dd_lock.

[ 32.215336] ======================================================
[ 32.221859] WARNING: possible circular locking dependency detected
[ 32.221861] 4.14.90+ openzfs#8 Tainted: G           O
[ 32.221862] ------------------------------------------------------
[ 32.221863] dynamic_kernel_/4667 is trying to acquire lock:
[ 32.221864]  (&ds->ds_lock){+.+.}, at: [<ffffffffc10a4bde>] dsl_dataset_check_quota+0x9e/0x8a0 [zfs]
[ 32.221941] but task is already holding lock:
[ 32.221941]  (&dd->dd_lock){+.+.}, at: [<ffffffffc10cd8e9>] dsl_dir_tempreserve_space+0x3b9/0x1290 [zfs]
[ 32.221983] which lock already depends on the new lock.
[ 32.221983] the existing dependency chain (in reverse order) is:
[ 32.221984] -> openzfs#1 (&dd->dd_lock){+.+.}:
[ 32.221992] 	__mutex_lock+0xef/0x14c0
[ 32.222049] 	dsl_dir_namelen+0xd4/0x2d0 [zfs]
[ 32.222093] 	dsl_dataset_namelen+0x2f1/0x430 [zfs]
[ 32.222142] 	verify_dataset_name_len+0xd/0x40 [zfs]
[ 32.222184] 	dmu_objset_find_dp_impl+0x5f5/0xef0 [zfs]
[ 32.222226] 	dmu_objset_find_dp_cb+0x40/0x60 [zfs]
[ 32.222235] 	taskq_thread+0x969/0x1460 [spl]
[ 32.222238] 	kthread+0x2fb/0x400
[ 32.222241] 	ret_from_fork+0x3a/0x50

[ 32.222241] -> #0 (&ds->ds_lock){+.+.}:
[ 32.222246] 	lock_acquire+0x14f/0x390
[ 32.222248] 	__mutex_lock+0xef/0x14c0
[ 32.222291] 	dsl_dataset_check_quota+0x9e/0x8a0 [zfs]
[ 32.222355] 	dsl_dir_tempreserve_space+0x5d2/0x1290 [zfs]
[ 32.222392] 	dmu_tx_assign+0xa61/0xdb0 [zfs]
[ 32.222436] 	zfs_create+0x4e6/0x11d0 [zfs]
[ 32.222481] 	zpl_create+0x194/0x340 [zfs]
[ 32.222484] 	lookup_open+0xa86/0x16f0
[ 32.222486] 	path_openat+0xe56/0x2490
[ 32.222488] 	do_filp_open+0x17f/0x260
[ 32.222490] 	do_sys_open+0x195/0x310
[ 32.222491] 	SyS_open+0xbf/0xf0
[ 32.222494] 	do_syscall_64+0x191/0x4f0
[ 32.222496] 	entry_SYSCALL_64_after_hwframe+0x42/0xb7

[ 32.222497] other info that might help us debug this:

[ 32.222497] Possible unsafe locking scenario:
[ 32.222498] CPU0 			CPU1
[ 32.222498] ---- 			----
[ 32.222499] lock(&dd->dd_lock);
[ 32.222500] 				lock(&ds->ds_lock);
[ 32.222502] 				lock(&dd->dd_lock);
[ 32.222503] lock(&ds->ds_lock);
[ 32.222504] *** DEADLOCK ***
[ 32.222505] 3 locks held by dynamic_kernel_/4667:
[ 32.222506] #0: (sb_writers#9){.+.+}, at: [<ffffffffaf68933c>] mnt_want_write+0x3c/0xa0
[ 32.222511] openzfs#1: (&type->i_mutex_dir_key#8){++++}, at: [<ffffffffaf652cde>] path_openat+0xe2e/0x2490
[ 32.222515] openzfs#2: (&dd->dd_lock){+.+.}, at: [<ffffffffc10cd8e9>] dsl_dir_tempreserve_space+0x3b9/0x1290 [zfs]

The issue is caused by dsl_dataset_namelen() holding ds_lock, followed by
acquiring dd_lock on ds->ds_dir in dsl_dir_namelen().

However, ds->ds_dir should not be protected by ds_lock, so releasing it before
call to dsl_dir_namelen() prevents the lockdep issue.

Signed-off-by:  Michael Zhivich <mzhivich@akamai.com>
@behlendorf behlendorf added the Status: Code Review Needed Ready for review and testing label Feb 15, 2019
Copy link
Contributor

@chrisrd chrisrd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Copy link
Contributor

@alek-p alek-p left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, and this seems to be more consistent with the comment in dmu_impl.h regarding the locking strategy.
I guess the reason this doesn't blow up more often is we don't end up in the dmu_objset_find_dp() -> ... -> dsl_dir_namelen() path all that much.

@brandonhaberfeld
Copy link

brandonhaberfeld commented Feb 25, 2019 via email

@behlendorf behlendorf added Status: Accepted Ready to integrate (reviewed, tested) and removed Status: Code Review Needed Ready for review and testing labels Mar 11, 2019
@behlendorf behlendorf merged commit 1118f99 into openzfs:master Mar 11, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Status: Accepted Ready to integrate (reviewed, tested)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants