Linux lockdep issue with dbuf_find() #15111

Closed
jasimmons1973 opened this issue Jul 26, 2023 · 1 comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@jasimmons1973

System information

| Type | Version/Name |
| --- | --- |
| Distribution Name | Enterprise RedHat |
| Distribution Version | RHEL8.5 |
| Kernel Version | Special 4.18.0rh8.5-debug kernel with various debug options enabled. |
| Architecture | x86_64 |
| OpenZFS Version | v2.1.0-1 |

Describe the problem you're observing

The Linux kernel with lockdep enabled reports a possible circular locking dependency between &h->hash_mutexes[i] and &db->db_mtx, both taken in dbuf_find().

Describe how to reproduce the problem

Importing ZFS targets triggers the lockdep trace. This was done using the Lustre maloo test suite:
bringing up a Lustre file system with a ZFS backend reports this lockdep issue.

Include any warning/errors/backtraces from the system logs

[ 278.949677] ZFS: Loaded module v2.1.0-1, ZFS pool version 5000, ZFS filesystem version 5
[ 294.483979] vdc: vdc1 vdc9
[ 294.516686] vdc: vdc1 vdc9
[ 294.982927]
[ 294.984075] ======================================================
[ 294.986439] WARNING: possible circular locking dependency detected
[ 294.988942] 4.18.0rh8.5-debug #2 Tainted: G O --------- - -
[ 294.992057] ------------------------------------------------------
[ 294.995272] txg_sync/3186 is trying to acquire lock:
[ 294.997737] ffffffffa1c7e0a8 (&h->hash_mutexes[i]){+.+.}-{3:3}, at: dbuf_find+0xb0/0x430 [zfs]
[ 295.002548]
[ 295.002548] but task is already holding lock:
[ 295.005195] ffff88800e175330 (&db->db_mtx){+.+.}-{3:3}, at: dbuf_find+0x17e/0x430 [zfs]
[ 295.009069]
[ 295.009069] which lock already depends on the new lock.
[ 295.009069]
[ 295.013166]
[ 295.013166] the existing dependency chain (in reverse order) is:
[ 295.016853]
[ 295.016853] -> #1 (&db->db_mtx){+.+.}-{3:3}:
[ 295.019341] __lock_acquire+0x67f/0xe40
[ 295.021573] lock_acquire+0x16a/0x6f0
[ 295.023646] __mutex_lock+0xc9/0x10c0
[ 295.025696] mutex_lock_nested+0x27/0x30
[ 295.028148] dbuf_create+0x500/0xf30 [zfs]
[ 295.030595] dbuf_hold_impl+0x394/0xbc0 [zfs]
[ 295.032758] dbuf_hold+0x34/0x70 [zfs]
[ 295.034596] dnode_hold_impl+0x158/0x1550 [zfs]
[ 295.036656] dmu_object_claim_dnsize+0xa7/0x180 [zfs]
[ 295.038899] zap_create_claim_norm_dnsize+0x45/0xf0 [zfs]
[ 295.041343] zap_create_claim+0x20/0x30 [zfs]
[ 295.043173] dsl_pool_create+0xc8/0x600 [zfs]
[ 295.045015] spa_create+0xbf7/0x1580 [zfs]
[ 295.046981] zfs_ioc_pool_create+0x370/0x480 [zfs]
[ 295.048908] zfsdev_ioctl_common+0x859/0xc20 [zfs]
[ 295.051010] zfsdev_ioctl+0x6b/0x130 [zfs]
[ 295.052818] do_vfs_ioctl+0xad/0xc40
[ 295.054168] ksys_ioctl+0x84/0xd0
[ 295.055576] __x64_sys_ioctl+0x1e/0x30
[ 295.057054] do_syscall_64+0xd4/0x5a0
[ 295.058171] entry_SYSCALL_64_after_hwframe+0x6a/0xdf
[ 295.060130]
[ 295.060130] -> #0 (&h->hash_mutexes[i]){+.+.}-{3:3}:
[ 295.062108] check_prev_add+0x69/0x820
[ 295.063499] validate_chain+0x9cd/0xf70
[ 295.064637] __lock_acquire+0x67f/0xe40
[ 295.065852] lock_acquire+0x16a/0x6f0
[ 295.067076] __mutex_lock+0xc9/0x10c0
[ 295.068348] mutex_lock_nested+0x27/0x30
[ 295.069658] dbuf_find+0xb0/0x430 [zfs]
[ 295.070991] dnode_increase_indirection.isra.7+0xf2/0x7d0 [zfs]
[ 295.073393] dnode_sync+0xe8c/0x12f0 [zfs]
[ 295.075094] dmu_objset_sync+0x1f4/0x6e0 [zfs]
[ 295.076957] dsl_dataset_sync+0x7d/0x310 [zfs]
[ 295.078887] dsl_pool_sync+0x130/0xb10 [zfs]
[ 295.080791] spa_sync_iterate_to_convergence+0xfc/0x6f0 [zfs]
[ 295.083072] spa_sync+0x8a1/0x12c0 [zfs]
[ 295.084631] txg_sync_thread+0x417/0xb50 [zfs]
[ 295.086423] thread_generic_wrapper+0xac/0x100 [spl]
[ 295.088566] kthread+0x1ae/0x1e0
[ 295.089900] ret_from_fork+0x24/0x50
[ 295.091545]
[ 295.091545] other info that might help us debug this:
[ 295.091545]
[ 295.094729] Possible unsafe locking scenario:
[ 295.094729]
[ 295.098105]        CPU0                    CPU1
[ 295.100362]        ----                    ----
[ 295.102877]   lock(&db->db_mtx);
[ 295.104403]                                lock(&h->hash_mutexes[i]);
[ 295.107365]                                lock(&db->db_mtx);
[ 295.109499]   lock(&h->hash_mutexes[i]);
[ 295.110778]
[ 295.110778] *** DEADLOCK ***
[ 295.110778]
[ 295.112614] 1 lock held by txg_sync/3186:
[ 295.113961] #0: ffff88800e175330 (&db->db_mtx){+.+.}-{3:3}, at: dbuf_find+0x17e/0x430 [zfs]
[ 295.116760]
[ 295.116760] stack backtrace:
[ 295.118237] CPU: 3 PID: 3186 Comm: txg_sync Kdump: loaded Tainted: G O --------- - - 4.18.0rh8.5-debug #2
[ 295.121898] Hardware name: Bochs Bochs, BIOS Bochs 01/01/2011
[ 295.123757] Call Trace:
[ 295.124502] ? dump_stack+0x119/0x18e
[ 295.125654] ? print_circular_bug.isra.28.cold.41+0x239/0x253
[ 295.127162] ? check_noncircular+0x1c4/0x200
[ 295.128429] ? __change_page_attr+0x117e/0x1350
[ 295.130032] ? mark_lock+0x5f/0x750
[ 295.131149] ? check_prev_add+0x69/0x820
[ 295.132396] ? mark_lock+0x5f/0x750
[ 295.133539] ? validate_chain+0x9cd/0xf70
[ 295.134877] ? __lock_acquire+0x67f/0xe40
[ 295.136222] ? lock_acquire+0x16a/0x6f0
[ 295.137704] ? dbuf_find+0xb0/0x430 [zfs]
[ 295.139234] ? dbuf_find+0xb0/0x430 [zfs]
[ 295.140672] ? dbuf_find+0xb0/0x430 [zfs]
[ 295.141954] ? __mutex_lock+0xc9/0x10c0
[ 295.143282] ? dbuf_find+0xb0/0x430 [zfs]
[ 295.144601] ? dbuf_find+0x3d4/0x430 [zfs]
[ 295.145943] ? kvm_sched_clock_read+0x2c/0x50
[ 295.147384] ? dbuf_find+0x414/0x430 [zfs]
[ 295.148558] ? mutex_lock_nested+0x27/0x30
[ 295.149836] ? _raw_spin_unlock+0x3f/0x60
[ 295.150985] ? mutex_lock_nested+0x27/0x30
[ 295.152474] ? dbuf_find+0xb0/0x430 [zfs]
[ 295.153890] ? dnode_increase_indirection.isra.7+0xf2/0x7d0 [zfs]
[ 295.155966] ? dnode_sync+0x3f3/0x12f0 [zfs]
[ 295.157450] ? dnode_sync+0xe8c/0x12f0 [zfs]
[ 295.158908] ? dmu_objset_sync+0x1c2/0x6e0 [zfs]
[ 295.160406] ? dmu_objset_sync+0x1f4/0x6e0 [zfs]
[ 295.161819] ? kvm_sched_clock_read+0x2c/0x50
[ 295.163324] ? dsl_dataset_sync+0x7d/0x310 [zfs]
[ 295.164954] ? dsl_pool_sync+0x130/0xb10 [zfs]
[ 295.166489] ? spa_sync_iterate_to_convergence+0xfc/0x6f0 [zfs]
[ 295.168603] ? spa_sync+0x8a1/0x12c0 [zfs]
[ 295.170081] ? spa_txg_history_init_io+0xfe/0x140 [zfs]
[ 295.171990] ? txg_sync_thread+0x417/0xb50 [zfs]
[ 295.173529] ? kvm_sched_clock_read+0x2c/0x50
[ 295.175177] ? txg_quiesce_thread+0xe80/0xe80 [zfs]
[ 295.176835] ? __thread_exit+0x30/0x30 [spl]
[ 295.178311] ? thread_generic_wrapper+0xac/0x100 [spl]
[ 295.180068] ? kthread+0x1ae/0x1e0
[ 295.181369] ? kthread_create_worker+0x90/0x90
[ 295.183016] ? ret_from_fork+0x24/0x50
[ 304.272096] vde: vde1 vde9
[ 314.421275] vdf: vdf1 vdf9
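
For anyone reading the trace above, here is a minimal sketch of the kind of check lockdep is reporting. It is a toy model in Python, not the kernel's lockdep implementation and not OpenZFS code: the class `LockOrderGraph` and its methods are hypothetical names made up for illustration. It records "B was acquired while A was held" edges per lock class and flags an acquisition that would close a cycle, which is what the `#1` and `#0` entries in the dependency chain describe.

```python
class LockOrderGraph:
    """Toy model of lock-class ordering checks (hypothetical, not lockdep itself)."""

    def __init__(self):
        # edges[a] is the set of lock classes acquired while a was held.
        self.edges = {}

    def _reachable(self, src, dst):
        """Return True if dst can be reached from src along recorded edges."""
        stack, seen = [src], set()
        while stack:
            node = stack.pop()
            if node == dst:
                return True
            if node in seen:
                continue
            seen.add(node)
            stack.extend(self.edges.get(node, ()))
        return False

    def acquire(self, held, new):
        """Record taking `new` while holding every class in `held`.

        Returns the (held_class, new_class) pairs that would close a cycle.
        """
        conflicts = []
        for h in held:
            # If `new` already leads back to `h`, the new edge h -> new
            # completes a circular dependency.
            if self._reachable(new, h):
                conflicts.append((h, new))
            self.edges.setdefault(h, set()).add(new)
        return conflicts


HASH = "&h->hash_mutexes[i]"
DB = "&db->db_mtx"

graph = LockOrderGraph()
# Existing dependency (#1 in the trace): db_mtx taken after the hash mutex.
earlier = graph.acquire(held=[HASH], new=DB)
# New acquisition (#0 in the trace): hash mutex requested while db_mtx is held.
reported = graph.acquire(held=[DB], new=HASH)
print(earlier, reported)
```

The model works on lock classes, not lock instances, so like lockdep it reports a possible inversion even when the two `db_mtx` locks involved belong to different dbufs.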

You can see the full test logs here:

http://testing.linuxhacker.ru/lustre-reports/32842/results-retry2.html

@jasimmons1973

git commit abec7dc resolves this issue.
