Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

spl_taskq_expand() scheduling while atomic #12696

Closed
behlendorf opened this issue Oct 26, 2021 · 14 comments
Closed

spl_taskq_expand() scheduling while atomic #12696

behlendorf opened this issue Oct 26, 2021 · 14 comments
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)

Comments

@behlendorf
Copy link
Contributor

System information

Type Version/Name
Distribution Name Ubuntu
Distribution Version
Kernel Version 5.8.0-59-lowlatency #66~20.04.1-Ubuntu
Architecture
OpenZFS Version 2.1

Describe the problem you're observing

Reported in this #12664 (comment). This issue was introduced by the taskq hotplug change, 60a4c7d, which calls kthread_create() from an atomic context. This debug warning is enabled when the kernel is built with CONFIG_DEBUG_ATOMIC_SLEEP.

Oct 25 12:37:48 foxbox kernel: [103815.516588] CPU: 1 PID: 15 Comm: cpuhp/1 Tainted: P           OE     5
.8.0-59-lowlatency #66~20.04.1-Ubuntu
Oct 25 12:37:48 foxbox kernel: [103815.516589] Hardware name: Gigabyte Technology Co., Ltd. Z68MA-D2H-B3/
Z68MA-D2H-B3, BIOS F10 02/23/2012
Oct 25 12:37:48 foxbox kernel: [103815.516590] Call Trace:
Oct 25 12:37:48 foxbox kernel: [103815.516599]  dump_stack+0x74/0x92
Oct 25 12:37:48 foxbox kernel: [103815.516602]  __schedule_bug.cold+0x4c/0x5d
Oct 25 12:37:48 foxbox kernel: [103815.516606]  __schedule+0x789/0xa70
Oct 25 12:37:48 foxbox kernel: [103815.516609]  schedule+0x6e/0xe0
Oct 25 12:37:48 foxbox kernel: [103815.516611]  schedule_timeout+0x132/0x180
Oct 25 12:37:48 foxbox kernel: [103815.516615]  ? ttwu_queue_wakelist+0x9b/0xd0
Oct 25 12:37:48 foxbox kernel: [103815.516618]  wait_for_completion_killable+0x8a/0x140
Oct 25 12:37:48 foxbox kernel: [103815.516622]  __kthread_create_on_node+0xd6/0x1c0
Oct 25 12:37:48 foxbox kernel: [103815.516630]  ? task_done+0xb0/0xb0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516633]  ? string_nocheck+0x4f/0x60
Oct 25 12:37:48 foxbox kernel: [103815.516639]  ? task_done+0xb0/0xb0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516645]  ? taskq_thread_create+0xf0/0xf0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516647]  kthread_create_on_node+0x49/0x60
Oct 25 12:37:48 foxbox kernel: [103815.516652]  ? task_done+0xb0/0xb0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516658]  spl_kthread_create+0x88/0xd0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516665]  taskq_thread_create+0x6a/0xf0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516671]  spl_taskq_expand+0xc0/0xd0 [spl]
Oct 25 12:37:48 foxbox kernel: [103815.516675]  cpuhp_invoke_callback+0x248/0x630
Oct 25 12:37:48 foxbox kernel: [103815.516678]  cpuhp_thread_fun+0xb8/0x120
Oct 25 12:37:48 foxbox kernel: [103815.516681]  smpboot_thread_fn+0xfc/0x1f0
Oct 25 12:37:48 foxbox kernel: [103815.516683]  ? sort_range+0x30/0x30
Oct 25 12:37:48 foxbox kernel: [103815.516685]  kthread+0x129/0x170
Oct 25 12:37:48 foxbox kernel: [103815.516686]  ? kthread_park+0x90/0x90
Oct 25 12:37:48 foxbox kernel: [103815.516690]  ret_from_fork+0x1f/0x30
Oct 25 12:37:48 foxbox kernel: [103815.517031] BUG: scheduling while atomic: cpuhp/1/15/0x00000000

Describe how to reproduce the problem

It should be possible to reproduce this warning on any kernel with CONFIG_DEBUG_ATOMIC_SLEEP enabled when handling a hotplug event. The issue is that spl_taskq_expand() takes the taskq's tq_lock spinlock then calls spl_kthread_create() which may sleep. The thread creation needs to be moved outside the atomic spinlock context to resolve this. Based on the documentation I was able to find sleeping in the hotplug callback should be fine.

@behlendorf behlendorf added the Type: Defect Incorrect behavior (e.g. crash, hang) label Oct 26, 2021
@pcd1193182
Copy link
Contributor

I believe this diff should solve the problem; I'll post a PR once I have time to run the manual tests:

diff --git a/module/os/linux/spl/spl-taskq.c b/module/os/linux/spl/spl-taskq.c
index 61631256c8..7d1a73c738 100644
--- a/module/os/linux/spl/spl-taskq.c
+++ b/module/os/linux/spl/spl-taskq.c
@@ -1298,8 +1298,10 @@ spl_taskq_expand(unsigned int cpu, struct hlist_node *node)
 	ASSERT(tq);
 	spin_lock_irqsave_nested(&tq->tq_lock, flags, tq->tq_lock_class);
 
-	if (!(tq->tq_flags & TASKQ_ACTIVE))
-		goto out;
+	if (!(tq->tq_flags & TASKQ_ACTIVE)) {
+		spin_unlock_irqrestore(&tq->tq_lock, flags);
+		return (err);
+	}
 
 	ASSERT(tq->tq_flags & TASKQ_THREADS_CPU_PCT);
 	int nthreads = MIN(tq->tq_cpu_pct, 100);
@@ -1308,13 +1310,11 @@ spl_taskq_expand(unsigned int cpu, struct hlist_node *node)
 
 	if (!((tq->tq_flags & TASKQ_DYNAMIC) && spl_taskq_thread_dynamic) &&
 	    tq->tq_maxthreads > tq->tq_nthreads) {
-		ASSERT3U(tq->tq_maxthreads, ==, tq->tq_nthreads + 1);
+		spin_unlock_irqrestore(&tq->tq_lock, flags);
 		taskq_thread_t *tqt = taskq_thread_create(tq);
 		if (tqt == NULL)
 			err = -1;
+		return (err);
 	}
-
-out:
 	spin_unlock_irqrestore(&tq->tq_lock, flags);
 	return (err);
 }

@adamdmoss
Copy link
Contributor

Wild, thanks for the quick attention! I'll give the patch a try.

@pcd1193182
Copy link
Contributor

pcd1193182 commented Oct 27, 2021

No problem. It may be a bit before I have time to set up and run the manual cpu hotplug tests, with the dev summit coming up, but it's on my todo list. If you have a setup where you can test that yourself, that would also work, but if not I'll get to it.

@AttilaFueloep
Copy link
Contributor

Is #12642 also an incarnation of this?

@adamdmoss
Copy link
Contributor

With a couple of quick suspend/resumes this appears to have fixed the problem - no 'BUG:' whinging in logs and resume is successful.

If you have a setup where you can test that yourself

Afraid not. I think I'm only seeing this at all because (presumably) Linux soft-unplugs and replugs the CPU(s) over suspend-resume.

@adamdmoss
Copy link
Contributor

Is #12642 also an incarnation of this?

Sure looks like it to me.

@behlendorf
Copy link
Contributor Author

It sure is, I've closed it as a duplicate of this issue. Thanks for noticing that.

@ff1bit
Copy link

ff1bit commented Oct 28, 2021

I compiled the latest zfs (head) with the patch in #12696 (comment) and the logs are now clean (no more scheduling while atomic errors) after resuming from suspend (kernel 5.14.11-arch1-1)
Thank you everyone for the quick fix

@sunilvaltix
Copy link

Is this marked to land in 2.1.x in near future?

@RJVB
Copy link

RJVB commented Jan 27, 2022

Other inquiring minds like to know this too. I'm seeing what I presume is the same bug after resuming from sleep: I get the backtrace 3 or 4 times in the syslog, curiously always for CPU (core) 3. I have yet to notice any other effects, it looks like the log entry is just a (debug) message about something that is being handled.

I do also see this before each backtrace:

Jan 27 11:48:07 Bola kernel: [259714.830485] Preemption disabled at:
Jan 27 11:48:07 Bola kernel: [259714.830491] [<          (null)>]           (null)
Jan 27 11:48:07 Bola kernel: [259714.830503] CPU: 3 PID: 26 Comm: cpuhp/3 Tainted: P        W  OE   4.14.23-ck1-mainline-core2-rjvb #1

I suppose I would have noticed if preemption remained disabled...?

@satmandu
Copy link
Contributor

Enabling zswap seems to make this appear much more quickly.

[ 3403.549232] BUG: scheduling while atomic: cpuhp/7/54/0x00000000
[ 3403.550228] Modules linked in: rfcomm snd_seq_dummy snd_hrtimer nls_utf8 cifs cifs_arc4 cifs_md4 fscache netfs qrtr cmac algif_hash algif_skcipher af_alg bnep nls_iso8859_1 intel_rapl_msr mei_pxp intel_rapl_common mei_hdcp joydev x86_pkg_temp_thermal intel_powerclamp snd_hda_codec_cirrus snd_hda_codec_generic ledtrig_audio snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg snd_intel_sdw_acpi snd_hda_codec applesmc snd_hda_core kvm_intel snd_hwdep snd_pcm kvm btusb wl(POE) snd_seq_midi btrtl snd_seq_midi_event snd_rawmidi btbcm btintel btmtk snd_seq bluetooth input_leds ecdh_generic rapl snd_seq_device ecc intel_cstate snd_timer bcm5974 apple_mfi_fastcharge cfg80211 snd mei_me efi_pstore mei soundcore sbs acpi_als sbshc industrialio_triggered_buffer apple_gmux kfifo_buf industrialio mac_hid apple_bl sch_fq_codel coretemp ipmi_devintf ipmi_msghandler msr parport_pc ppdev lp parport ip_tables x_tables autofs4 zfs(POE) zunicode(POE) zzstd(OE) zlua(OE) zcommon(POE) znvpair(POE) zavl(POE)
[ 3403.614873]  icp(POE) spl(OE) hid_apple hid_generic uas usbhid usb_storage hid nouveau mxm_wmi wmi i2c_algo_bit drm_ttm_helper ttm drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops cec rc_core crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel crypto_simd cryptd nvme i2c_i801 drm nvme_core thunderbolt xhci_pci lpc_ich i2c_smbus xhci_pci_renesas video z3fold lz4 lz4_compress
[ 3403.643031] CPU: 7 PID: 54 Comm: cpuhp/7 Tainted: P        W  OE     5.17.0-051700rc5-generic #202202202230
[ 3403.644774] Hardware name: Apple Inc. MacBookPro11,3/Mac-2BD1B31983FE1663, BIOS 432.60.3.0.0 10/27/2021
[ 3403.645714] Call Trace:
[ 3403.646263]  <TASK>
[ 3403.646889]  show_stack+0x52/0x58
[ 3403.649106]  dump_stack_lvl+0x4c/0x63
[ 3403.650946]  dump_stack+0x10/0x12
[ 3403.652576]  __schedule_bug.cold+0x50/0x5e
[ 3403.654744]  __schedule+0x463/0x5a0
[ 3403.657158]  schedule+0x55/0xd0
[ 3403.658982]  schedule_timeout+0x114/0x150
[ 3403.661395]  ? try_to_wake_up+0x210/0x600
[ 3403.663623]  wait_for_completion+0x8b/0xf0
[ 3403.666105]  devtmpfs_submit_req+0x73/0x90
[ 3403.668257]  devtmpfs_create_node+0x94/0xc0
[ 3403.671162]  device_add+0x72f/0x7b0
[ 3403.674197]  device_create_groups_vargs+0xd4/0xf0
[ 3403.676667]  ? msr_devnode+0x30/0x30 [msr]
[ 3403.679056]  device_create+0x49/0x60
[ 3403.681509]  ? _raw_spin_lock_irqsave+0xe/0x10
[ 3403.683717]  ? spl_taskq_expand+0x8c/0xd0 [spl]
[ 3403.692882]  msr_device_create+0x2b/0x40 [msr]
[ 3403.695259]  cpuhp_invoke_callback+0x108/0x4c0
[ 3403.697167]  ? taskq_thread_create+0xf0/0xf0 [spl]
[ 3403.706734]  cpuhp_thread_fun+0xc0/0x1b0
[ 3403.708975]  ? smpboot_register_percpu_thread+0x130/0x130
[ 3403.710969]  smpboot_thread_fn+0xe0/0x1d0
[ 3403.713042]  kthread+0xeb/0x120
[ 3403.714955]  ? kthread_complete_and_exit+0x20/0x20
[ 3403.717390]  ret_from_fork+0x1f/0x30
[ 3403.720924]  </TASK>

@RJVB
Copy link

RJVB commented Feb 25, 2022

I have zswap enabled and can confirm that #12696 is a sufficient fix even in that case.

@behlendorf
Copy link
Contributor Author

Thanks for noticing this wasn't in the 2.1.x patch stack, we'll add it to the list.

@behlendorf
Copy link
Contributor Author

Closing since this was fixed in master.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Type: Defect Incorrect behavior (e.g. crash, hang)
Projects
None yet
Development

No branches or pull requests

8 participants