Call commit callbacks from the tail of the list #6986
Thanks for opening this PR and linking to the upstream Lustre issue. We should be able to make this change since this interface is currently only used by Lustre and we never made any promises about the calling order.
@lidongyang @bzzz77 am I correct in assuming the primary benefit from this change is due to reduced lock contention in Lustre thanks to the new fast path in tgt_cb_last_committed()? Can you share any performance numbers? Does this change improve the overall metadata rates?
I also have a few small review comments which need to be addressed.
- There's a nice comment in include/sys/dmu.h above the dmu_tx_callback_register prototype which describes the callback interface. Please add a few lines describing the order in which the callbacks are now run.
- Massage the commit message to pass the style checker. You can run make checkstyle locally until it passes.
- I happened to notice the dmu_tx_callback_register() prototype is declared redundantly in include/sys/dmu_tx.h. Please remove it from include/sys/dmu_tx.h. While you're there also move the dmu_tx_do_callbacks prototype from include/sys/dmu_tx.h to include/sys/dmu.h so these are all in one place and only one place.
Our zfs backed Lustre MDT had soft lockups while under heavy metadata workloads while handling transaction callbacks from osd_zfs.

The problem is zfs is not taking advantage of the fast path in Lustre's trans callback handling, where Lustre will skip the calls to ptlrpc_commit_replies() when it already saw a higher transaction number.

This patch corrects this, it also has a positive impact on metadata performance on Lustre with osd_zfs, plus some cleanup in the headers.

A similar issue for ext4/ldiskfs is described on:
https://jira.hpdd.intel.com/browse/LU-6527

Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
@behlendorf yep you are right about tgt_cb_last_committed(), and it did improve performance numbers, we got them using mdtest:
Note that mdt will get soft lockup and eventually crash. After the patch:
For operations that generate a transaction and a new last_committed, like creation and removal, we can see an improvement. For others, like stat and read, nothing really changed. That makes sense, since the patch only changes the trans callbacks. Thanks
Looks good, thanks for the quick turnaround and performance results.
Codecov Report
@@ Coverage Diff @@
## master #6986 +/- ##
==========================================
+ Coverage 75.33% 75.36% +0.02%
==========================================
Files 296 296
Lines 95454 95454
==========================================
+ Hits 71915 71935 +20
+ Misses 23539 23519 -20
LGTM
Our zfs backed Lustre MDT had soft lockups while under heavy metadata workloads while handling transaction callbacks from osd_zfs.

The problem is zfs is not taking advantage of the fast path in Lustre's trans callback handling, where Lustre will skip the calls to ptlrpc_commit_replies() when it already saw a higher transaction number.

This patch corrects this, it also has a positive impact on metadata performance on Lustre with osd_zfs, plus some cleanup in the headers.

A similar issue for ext4/ldiskfs is described on:
https://jira.hpdd.intel.com/browse/LU-6527

Reviewed-by: Olaf Faaland <faaland1@llnl.gov>
Reviewed-by: Brian Behlendorf <behlendorf1@llnl.gov>
Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>
Closes openzfs#6986
Our zfs backed Lustre MDT had soft lockups while under heavy metadata
workload:
[ 3597.867291] NMI watchdog: BUG: soft lockup - CPU#20 stuck for 22s! [tx_commit_cb:67888]
[ 3597.867329] Modules linked in: osp(OE) mdd(OE) lod(OE) mdt(OE) lfsck(OE) mgs(OE) mgc(OE) osd_zfs(OE) lquota(OE) fid(OE) fld(OE) ptlrpc(OE) obdclass(OE) ko2iblnd(OE) lnet(OE) libcfs(OE) bonding rdma_ucm(OE) ib_ucm(OE) rdma_cm(OE) iw_cm(OE) ib_ipoib(OE) ib_cm(OE) ib_uverbs(OE) ib_umad(OE) mlx4_en(OE) mlx4_ib(OE) mlx4_core(OE) dm_mirror dm_region_hash dm_log sb_edac zfs(POE) edac_core intel_powerclamp coretemp zunicode(POE) zavl(POE) intel_rapl icp(POE) iosf_mbi kvm_intel zcommon(POE) znvpair(POE) spl(OE) kvm iTCO_wdt iTCO_vendor_support irqbypass dm_round_robin crc32_pclmul ghash_clmulni_intel sg ipmi_si hpilo hpwdt aesni_intel ipmi_devintf lrw gf128mul glue_helper ipmi_msghandler ioatdma ablk_helper pcspkr i2c_i801 wmi cryptd nfsd dm_multipath lpc_ich shpchp dca acpi_cpufreq acpi_power_meter dm_mod
[ 3597.867344] auth_rpcgss nfs_acl lockd grace sunrpc knem(OE) ip_tables xfs libcrc32c mlx5_ib(OE) ib_core(OE) sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3 mlx5_core(OE) devlink mlx_compat(OE) crct10dif_pclmul ptp crct10dif_common serio_raw crc32c_intel i2c_core pps_core hpsa(OE) scsi_transport_sas
[ 3597.867346] CPU: 20 PID: 67888 Comm: tx_commit_cb Tainted: P OE ------------ 3.10.0-693.2.2.el7.x86_64 #1
[ 3597.867348] Hardware name: HP ProLiant DL360 Gen9/ProLiant DL360 Gen9, BIOS P89 04/25/2017
[ 3597.867349] task: ffff889c3d6f1fa0 ti: ffff8898b7dec000 task.ti: ffff8898b7dec000
[ 3597.867357] RIP: 0010:[] [] native_queued_spin_lock_slowpath+0x116/0x1e0
[ 3597.867358] RSP: 0018:ffff8898b7defcd0 EFLAGS: 00000246
[ 3597.867358] RAX: 0000000000000000 RBX: 000000010001a588 RCX: 0000000000a10000
[ 3597.867359] RDX: ffff88befed97880 RSI: 0000000000c10001 RDI: ffff885e58a78138
[ 3597.867359] RBP: ffff8898b7defcd0 R08: ffff88befec97880 R09: 0000000000000000
[ 3597.867360] R10: 00000000ea2b3a01 R11: ffffea017ba8acc0 R12: ffff885efbf32f40
[ 3597.867361] R13: ffff88bebc38a48d R14: 0000000000016d39 R15: ffffffff811de591
[ 3597.867361] FS: 0000000000000000(0000) GS:ffff88befec80000(0000) knlGS:0000000000000000
[ 3597.867362] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 3597.867362] CR2: 00007fda2a9a2090 CR3: 0000005eef36d000 CR4: 00000000003407e0
[ 3597.867363] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 3597.867363] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 3597.867364] Stack:
[ 3597.867365] ffff8898b7defce0 ffffffff8169e61f ffff8898b7defcf0 ffffffff816abb70
[ 3597.867366] ffff8898b7defd68 ffffffffc136314e ffff885e58a78138 ffff885c6376a650
[ 3597.867367] ffff8898b7defd10 ffff8898b7defd10 0000000000000000 0000000000000000
[ 3597.867367] Call Trace:
[ 3597.867375] [] queued_spin_lock_slowpath+0xb/0xf
[ 3597.867378] [] _raw_spin_lock+0x20/0x30
[ 3597.867445] [] ptlrpc_commit_replies+0x7e/0x380 [ptlrpc]
[ 3597.867481] [] tgt_cb_last_committed+0x2c2/0x3d0 [ptlrpc]
[ 3597.867489] [] osd_trans_commit_cb+0x14b/0x490 [osd_zfs]
[ 3597.867529] [] dmu_tx_do_callbacks+0x44/0x70 [zfs]
[ 3597.867554] [] txg_do_callbacks+0x14/0x30 [zfs]
[ 3597.867561] [] taskq_thread+0x246/0x470 [spl]
[ 3597.867564] [] ? wake_up_state+0x20/0x20
[ 3597.867568] [] ? taskq_thread_spawn+0x60/0x60 [spl]
[ 3597.867571] [] kthread+0xcf/0xe0
[ 3597.867573] [] ? insert_kthread_work+0x40/0x40
[ 3597.867576] [] ret_from_fork+0x58/0x90
[ 3597.867577] [] ? insert_kthread_work+0x40/0x40
[ 3597.867588] Code: 0d 48 98 83 e2 30 48 81 c2 80 78 01 00 48 03 14 c5 e0 fd b0 81 4c 89 02 41 8b 40 08 85 c0 75 0f 0f 1f 44 00 00 f3 90 41 8b 40 08 <85> c0 74 f6 4d 8b 08 4d 85 c9 74 04 41 0f 18 09 8b 17 0f b7 c2
This patch makes zfs commit callbacks work on highest transaction
number first, saving the subsequent calls to ptlrpc_commit_replies
in Lustre, which makes the problem go away.
A similar issue for ext4/ldiskfs is described on:
https://jira.hpdd.intel.com/browse/LU-6527
Signed-off-by: Li Dongyang <dongyang.li@anu.edu.au>