-
Notifications
You must be signed in to change notification settings - Fork 47
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Problem 5.4.21-bpi-r2-main #69
Comments
you have already installed this kernel-version...you need to uninstall first to install again |
Problem with fixdep is maybe because it's not a armhf binary...remove it and maybe start compilation of kernel so that this binary is build directly on r2 have no idea why wireguard takes 4.19.x source instead of 5.4... |
ghost
closed this as completed
Apr 23, 2020
frank-w
pushed a commit
that referenced
this issue
Nov 2, 2020
[ Upstream commit 0ca9454 ] In list_add, the first variable is the new node and the second is the list head. The function is called with a wrong order causing NULL dereference: [ 15.527030] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 15.542317] Mem abort info: [ 15.545152] ESR = 0x96000044 [ 15.548248] EC = 0x25: DABT (current EL), IL = 32 bits [ 15.553624] SET = 0, FnV = 0 [ 15.556715] EA = 0, S1PTW = 0 [ 15.559892] Data abort info: [ 15.562799] ISV = 0, ISS = 0x00000044 [ 15.566678] CM = 0, WnR = 1 [ 15.569683] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001373f0000 [ 15.576196] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 [ 15.583101] Internal error: Oops: 96000044 [#1] PREEMPT SMP [ 15.588747] Modules linked in: mtk_mdp(+) cfg80211 v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common vide odev mt8173_rt5650 smsc95xx usbnet ecdh_generic ecc snd_soc_rt5645 mc mt8173_afe_pcm rfkill cros_ec_sensors snd_soc_mtk_common elan_i2c crct10dif_ce cros_ec_se nsors_core snd_soc_rl6231 elants_i2c industrialio_triggered_buffer kfifo_buf mtk_vpu cros_ec_chardev cros_usbpd_charger cros_usbpd_logger sbs_battery display_c onnector pwm_bl ip_tables x_tables ipv6 [ 15.634295] CPU: 0 PID: 188 Comm: systemd-udevd Not tainted 5.9.0-rc2+ #69 [ 15.641242] Hardware name: Google Elm (DT) [ 15.645381] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) [ 15.651022] pc : mtk_mdp_probe+0x134/0x3a8 [mtk_mdp] [ 15.656041] lr : mtk_mdp_probe+0x128/0x3a8 [mtk_mdp] [ 15.661055] sp : ffff80001255b910 [ 15.669548] x29: ffff80001255b910 x28: 0000000000000000 [ 15.679973] x27: ffff800009089bf8 x26: ffff0000fafde800 [ 15.690347] x25: ffff0000ff7d2768 x24: ffff800009089010 [ 15.700670] x23: ffff0000f01a7cd8 x22: ffff0000fafde810 [ 15.710940] x21: ffff0000f01a7c80 x20: ffff0000f0c3c180 [ 15.721148] x19: ffff0000ff7f1618 x18: 0000000000000010 [ 15.731289] x17: 0000000000000000 x16: 0000000000000000 [ 15.741375] x15: 0000000000aaaaaa x14: 0000000000000020 [ 15.751399] x13: 00000000ffffffff x12: 0000000000000020 [ 15.761363] x11: 0000000000000028 x10: 0101010101010101 [ 15.771279] x9 : 0000000000000004 x8 : 7f7f7f7f7f7f7f7f [ 15.781148] x7 : 646bff6171606b2b x6 : 0000000000806d65 [ 15.790981] x5 : ffff0000ff7f8360 x4 : 0000000000000000 [ 15.800767] x3 : 0000000000000004 x2 : 0000000000000001 [ 15.810501] x1 : 0000000000000005 x0 : 0000000000000000 [ 15.820171] Call trace: [ 15.826944] mtk_mdp_probe+0x134/0x3a8 [mtk_mdp] [ 15.835908] platform_drv_probe+0x54/0xa8 [ 15.844247] really_probe+0xe4/0x3b0 [ 15.852104] driver_probe_device+0x58/0xb8 [ 15.860457] device_driver_attach+0x74/0x80 [ 15.868854] __driver_attach+0x58/0xe0 [ 15.876770] bus_for_each_dev+0x70/0xc0 [ 15.884726] driver_attach+0x24/0x30 [ 15.892374] bus_add_driver+0x14c/0x1f0 [ 15.900295] driver_register+0x64/0x120 [ 15.908168] __platform_driver_register+0x48/0x58 [ 15.916864] mtk_mdp_driver_init+0x20/0x1000 [mtk_mdp] [ 15.925943] do_one_initcall+0x54/0x1b4 [ 15.933662] do_init_module+0x54/0x200 [ 15.941246] load_module+0x1cf8/0x22d0 [ 15.948798] __do_sys_finit_module+0xd8/0xf0 [ 15.956829] __arm64_sys_finit_module+0x20/0x30 [ 15.965082] el0_svc_common.constprop.0+0x6c/0x168 [ 15.973527] do_el0_svc+0x24/0x90 [ 15.980403] el0_sync_handler+0x90/0x198 [ 15.987867] el0_sync+0x158/0x180 [ 15.994653] Code: 9400014b 2a0003fc 35000920 f9400280 (f9000417) [ 16.004299] ---[ end trace 76fee0203f9898e5 ]--- Fixes: 86698b9 ("media: mtk-mdp: convert mtk_mdp_dev.comp array to list") Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 2, 2020
[ Upstream commit bad60b8 ] The idx in __ath10k_htt_rx_ring_fill_n function lives in consistent dma region writable by the device. Malfunctional or malicious device could manipulate such idx to have a OOB write. Either by htt->rx_ring.netbufs_ring[idx] = skb; or by ath10k_htt_set_paddrs_ring(htt, paddr, idx); The idx can also be negative as it's signed, giving a large memory space to write to. It's possibly exploitable by corruptting a legit pointer with a skb pointer. And then fill skb with payload as rougue object. Part of the log here. Sometimes it appears as UAF when writing to a freed memory by chance. [ 15.594376] BUG: unable to handle page fault for address: ffff887f5c1804f0 [ 15.595483] #PF: supervisor write access in kernel mode [ 15.596250] #PF: error_code(0x0002) - not-present page [ 15.597013] PGD 0 P4D 0 [ 15.597395] Oops: 0002 [#1] SMP KASAN PTI [ 15.597967] CPU: 0 PID: 82 Comm: kworker/u2:2 Not tainted 5.6.0 #69 [ 15.598843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 15.600438] Workqueue: ath10k_wq ath10k_core_register_work [ath10k_core] [ 15.601389] RIP: 0010:__ath10k_htt_rx_ring_fill_n (linux/drivers/net/wireless/ath/ath10k/htt_rx.c:173) ath10k_core Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200623221105.3486-1-bruceshenzk@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 3, 2020
[ Upstream commit 0ca9454 ] In list_add, the first variable is the new node and the second is the list head. The function is called with a wrong order causing NULL dereference: [ 15.527030] Unable to handle kernel NULL pointer dereference at virtual address 0000000000000008 [ 15.542317] Mem abort info: [ 15.545152] ESR = 0x96000044 [ 15.548248] EC = 0x25: DABT (current EL), IL = 32 bits [ 15.553624] SET = 0, FnV = 0 [ 15.556715] EA = 0, S1PTW = 0 [ 15.559892] Data abort info: [ 15.562799] ISV = 0, ISS = 0x00000044 [ 15.566678] CM = 0, WnR = 1 [ 15.569683] user pgtable: 4k pages, 48-bit VAs, pgdp=00000001373f0000 [ 15.576196] [0000000000000008] pgd=0000000000000000, p4d=0000000000000000 [ 15.583101] Internal error: Oops: 96000044 [#1] PREEMPT SMP [ 15.588747] Modules linked in: mtk_mdp(+) cfg80211 v4l2_mem2mem videobuf2_vmalloc videobuf2_dma_contig videobuf2_memops videobuf2_v4l2 videobuf2_common vide odev mt8173_rt5650 smsc95xx usbnet ecdh_generic ecc snd_soc_rt5645 mc mt8173_afe_pcm rfkill cros_ec_sensors snd_soc_mtk_common elan_i2c crct10dif_ce cros_ec_se nsors_core snd_soc_rl6231 elants_i2c industrialio_triggered_buffer kfifo_buf mtk_vpu cros_ec_chardev cros_usbpd_charger cros_usbpd_logger sbs_battery display_c onnector pwm_bl ip_tables x_tables ipv6 [ 15.634295] CPU: 0 PID: 188 Comm: systemd-udevd Not tainted 5.9.0-rc2+ #69 [ 15.641242] Hardware name: Google Elm (DT) [ 15.645381] pstate: 20000005 (nzCv daif -PAN -UAO BTYPE=--) [ 15.651022] pc : mtk_mdp_probe+0x134/0x3a8 [mtk_mdp] [ 15.656041] lr : mtk_mdp_probe+0x128/0x3a8 [mtk_mdp] [ 15.661055] sp : ffff80001255b910 [ 15.669548] x29: ffff80001255b910 x28: 0000000000000000 [ 15.679973] x27: ffff800009089bf8 x26: ffff0000fafde800 [ 15.690347] x25: ffff0000ff7d2768 x24: ffff800009089010 [ 15.700670] x23: ffff0000f01a7cd8 x22: ffff0000fafde810 [ 15.710940] x21: ffff0000f01a7c80 x20: ffff0000f0c3c180 [ 15.721148] x19: ffff0000ff7f1618 x18: 0000000000000010 [ 15.731289] x17: 0000000000000000 x16: 0000000000000000 [ 15.741375] x15: 0000000000aaaaaa x14: 0000000000000020 [ 15.751399] x13: 00000000ffffffff x12: 0000000000000020 [ 15.761363] x11: 0000000000000028 x10: 0101010101010101 [ 15.771279] x9 : 0000000000000004 x8 : 7f7f7f7f7f7f7f7f [ 15.781148] x7 : 646bff6171606b2b x6 : 0000000000806d65 [ 15.790981] x5 : ffff0000ff7f8360 x4 : 0000000000000000 [ 15.800767] x3 : 0000000000000004 x2 : 0000000000000001 [ 15.810501] x1 : 0000000000000005 x0 : 0000000000000000 [ 15.820171] Call trace: [ 15.826944] mtk_mdp_probe+0x134/0x3a8 [mtk_mdp] [ 15.835908] platform_drv_probe+0x54/0xa8 [ 15.844247] really_probe+0xe4/0x3b0 [ 15.852104] driver_probe_device+0x58/0xb8 [ 15.860457] device_driver_attach+0x74/0x80 [ 15.868854] __driver_attach+0x58/0xe0 [ 15.876770] bus_for_each_dev+0x70/0xc0 [ 15.884726] driver_attach+0x24/0x30 [ 15.892374] bus_add_driver+0x14c/0x1f0 [ 15.900295] driver_register+0x64/0x120 [ 15.908168] __platform_driver_register+0x48/0x58 [ 15.916864] mtk_mdp_driver_init+0x20/0x1000 [mtk_mdp] [ 15.925943] do_one_initcall+0x54/0x1b4 [ 15.933662] do_init_module+0x54/0x200 [ 15.941246] load_module+0x1cf8/0x22d0 [ 15.948798] __do_sys_finit_module+0xd8/0xf0 [ 15.956829] __arm64_sys_finit_module+0x20/0x30 [ 15.965082] el0_svc_common.constprop.0+0x6c/0x168 [ 15.973527] do_el0_svc+0x24/0x90 [ 15.980403] el0_sync_handler+0x90/0x198 [ 15.987867] el0_sync+0x158/0x180 [ 15.994653] Code: 9400014b 2a0003fc 35000920 f9400280 (f9000417) [ 16.004299] ---[ end trace 76fee0203f9898e5 ]--- Fixes: 86698b9 ("media: mtk-mdp: convert mtk_mdp_dev.comp array to list") Signed-off-by: Dafna Hirschfeld <dafna.hirschfeld@collabora.com> Reviewed-by: Matthias Brugger <matthias.bgg@gmail.com> Tested-by: Enric Balletbo i Serra <enric.balletbo@collabora.com> Signed-off-by: Hans Verkuil <hverkuil-cisco@xs4all.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab+huawei@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 3, 2020
[ Upstream commit bad60b8 ] The idx in __ath10k_htt_rx_ring_fill_n function lives in consistent dma region writable by the device. Malfunctional or malicious device could manipulate such idx to have a OOB write. Either by htt->rx_ring.netbufs_ring[idx] = skb; or by ath10k_htt_set_paddrs_ring(htt, paddr, idx); The idx can also be negative as it's signed, giving a large memory space to write to. It's possibly exploitable by corruptting a legit pointer with a skb pointer. And then fill skb with payload as rougue object. Part of the log here. Sometimes it appears as UAF when writing to a freed memory by chance. [ 15.594376] BUG: unable to handle page fault for address: ffff887f5c1804f0 [ 15.595483] #PF: supervisor write access in kernel mode [ 15.596250] #PF: error_code(0x0002) - not-present page [ 15.597013] PGD 0 P4D 0 [ 15.597395] Oops: 0002 [#1] SMP KASAN PTI [ 15.597967] CPU: 0 PID: 82 Comm: kworker/u2:2 Not tainted 5.6.0 #69 [ 15.598843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 15.600438] Workqueue: ath10k_wq ath10k_core_register_work [ath10k_core] [ 15.601389] RIP: 0010:__ath10k_htt_rx_ring_fill_n (linux/drivers/net/wireless/ath/ath10k/htt_rx.c:173) ath10k_core Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200623221105.3486-1-bruceshenzk@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 11, 2020
[ Upstream commit bad60b8 ] The idx in __ath10k_htt_rx_ring_fill_n function lives in consistent dma region writable by the device. Malfunctional or malicious device could manipulate such idx to have a OOB write. Either by htt->rx_ring.netbufs_ring[idx] = skb; or by ath10k_htt_set_paddrs_ring(htt, paddr, idx); The idx can also be negative as it's signed, giving a large memory space to write to. It's possibly exploitable by corruptting a legit pointer with a skb pointer. And then fill skb with payload as rougue object. Part of the log here. Sometimes it appears as UAF when writing to a freed memory by chance. [ 15.594376] BUG: unable to handle page fault for address: ffff887f5c1804f0 [ 15.595483] #PF: supervisor write access in kernel mode [ 15.596250] #PF: error_code(0x0002) - not-present page [ 15.597013] PGD 0 P4D 0 [ 15.597395] Oops: 0002 [#1] SMP KASAN PTI [ 15.597967] CPU: 0 PID: 82 Comm: kworker/u2:2 Not tainted 5.6.0 #69 [ 15.598843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 15.600438] Workqueue: ath10k_wq ath10k_core_register_work [ath10k_core] [ 15.601389] RIP: 0010:__ath10k_htt_rx_ring_fill_n (linux/drivers/net/wireless/ath/ath10k/htt_rx.c:173) ath10k_core Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200623221105.3486-1-bruceshenzk@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 13, 2020
[ Upstream commit bad60b8 ] The idx in __ath10k_htt_rx_ring_fill_n function lives in consistent dma region writable by the device. Malfunctional or malicious device could manipulate such idx to have a OOB write. Either by htt->rx_ring.netbufs_ring[idx] = skb; or by ath10k_htt_set_paddrs_ring(htt, paddr, idx); The idx can also be negative as it's signed, giving a large memory space to write to. It's possibly exploitable by corruptting a legit pointer with a skb pointer. And then fill skb with payload as rougue object. Part of the log here. Sometimes it appears as UAF when writing to a freed memory by chance. [ 15.594376] BUG: unable to handle page fault for address: ffff887f5c1804f0 [ 15.595483] #PF: supervisor write access in kernel mode [ 15.596250] #PF: error_code(0x0002) - not-present page [ 15.597013] PGD 0 P4D 0 [ 15.597395] Oops: 0002 [#1] SMP KASAN PTI [ 15.597967] CPU: 0 PID: 82 Comm: kworker/u2:2 Not tainted 5.6.0 #69 [ 15.598843] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.12.1-0-ga5cab58e9a3f-prebuilt.qemu.org 04/01/2014 [ 15.600438] Workqueue: ath10k_wq ath10k_core_register_work [ath10k_core] [ 15.601389] RIP: 0010:__ath10k_htt_rx_ring_fill_n (linux/drivers/net/wireless/ath/ath10k/htt_rx.c:173) ath10k_core Signed-off-by: Zekun Shen <bruceshenzk@gmail.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Link: https://lore.kernel.org/r/20200623221105.3486-1-bruceshenzk@gmail.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Aug 21, 2021
Avoids the following WARN: [ 3.009556] ------------[ cut here ]------------ [ 3.014306] WARNING: CPU: 7 PID: 109 at drivers/gpu/drm/drm_dp_helper.c:1796 drm_dp_aux_register+0xa4/0xac [ 3.024209] Modules linked in: [ 3.027351] CPU: 7 PID: 109 Comm: kworker/7:8 Not tainted 5.10.47 #69 [ 3.033958] Hardware name: Google Lazor (rev1 - 2) (DT) [ 3.039323] Workqueue: events deferred_probe_work_func [ 3.044596] pstate: 60c00009 (nZCv daif +PAN +UAO -TCO BTYPE=--) [ 3.050761] pc : drm_dp_aux_register+0xa4/0xac [ 3.055329] lr : dp_aux_register+0x40/0x88 [ 3.059538] sp : ffffffc010ad3920 [ 3.062948] x29: ffffffc010ad3920 x28: ffffffa64196ac70 [ 3.067239] mmc1: Command Queue Engine enabled [ 3.068406] x27: ffffffa64196ac68 x26: 0000000000000001 [ 3.068407] x25: 0000000000000002 x24: 0000000000000060 [ 3.068409] x23: ffffffa642ab3400 x22: ffffffe126c10e5b [ 3.068410] x21: ffffffa641dc3188 x20: ffffffa641963c10 [ 3.068412] x19: ffffffa642aba910 x18: 00000000ffff0a00 [ 3.068414] x17: 000000476f8e002a x16: 00000000000000b8 [ 3.073008] mmc1: new HS400 Enhanced strobe MMC card at address 0001 [ 3.078448] x15: ffffffffffffffff x14: ffffffffffffffff [ 3.078450] x13: 0000000000000030 x12: 0000000000000030 [ 3.078452] x11: 0101010101010101 x10: ffffffe12647a914 [ 3.078453] x9 : ffffffe12647a8cc x8 : 0000000000000000 [ 3.084452] mmcblk1: mmc1:0001 DA4032 29.1 GiB [ 3.089372] [ 3.089372] x7 : 6c6064717372fefe x6 : ffffffa642b11494 [ 3.089374] x5 : 0000000000000000 x4 : 6d006c657869ffff [ 3.089375] x3 : 000000006c657869 x2 : 000000000000000c [ 3.089376] x1 : ffffffe126c3ae3c x0 : ffffffa642aba910 [ 3.089381] Call trace: [ 3.094931] mmcblk1boot0: mmc1:0001 DA4032 partition 1 4.00 MiB [ 3.100291] drm_dp_aux_register+0xa4/0xac [ 3.100292] dp_aux_register+0x40/0x88 [ 3.100294] dp_display_bind+0x64/0xcc [ 3.100295] component_bind_all+0xdc/0x210 [ 3.100298] msm_drm_bind+0x1e8/0x5d4 [ 3.100301] try_to_bring_up_master+0x168/0x1b0 [ 3.105861] mmcblk1boot1: mmc1:0001 DA4032 partition 2 4.00 MiB [ 3.112282] __component_add+0xa0/0x158 [ 3.112283] component_add+0x1c/0x28 [ 3.112284] dp_display_probe+0x33c/0x380 [ 3.112286] platform_drv_probe+0x9c/0xbc [ 3.112287] really_probe+0x140/0x35c [ 3.112289] driver_probe_device+0x84/0xc0 [ 3.112292] __device_attach_driver+0x94/0xb0 [ 3.117967] mmcblk1rpmb: mmc1:0001 DA4032 partition 3 16.0 MiB, chardev (239:0) [ 3.123201] bus_for_each_drv+0x8c/0xd8 [ 3.123202] __device_attach+0xc4/0x150 [ 3.123204] device_initial_probe+0x1c/0x28 [ 3.123205] bus_probe_device+0x3c/0x9c [ 3.123206] deferred_probe_work_func+0x90/0xcc [ 3.123211] process_one_work+0x218/0x3ec [ 3.131976] mmcblk1: p1 p2 p3 p4 p5 p6 p7 p8 p9 p10 p11 p12 [ 3.134123] worker_thread+0x288/0x3e8 [ 3.134124] kthread+0x148/0x1b0 [ 3.134127] ret_from_fork+0x10/0x30 [ 3.134128] ---[ end trace cfb9fce3f70f824d ]--- Signed-off-by: Sean Paul <seanpaul@chromium.org> Reviewed-by: Abhinav Kumar <abhinavk@codeaurora.org> Link: https://lore.kernel.org/r/20210714152910.55093-1-sean@poorly.run Signed-off-by: Rob Clark <robdclark@chromium.org>
frank-w
pushed a commit
that referenced
this issue
Oct 4, 2021
[ Upstream commit fe4d557 ] lockdep complains that in omap-aes, the list_lock is taken both with softirqs enabled at probe time, and also in softirq context, which could lead to a deadlock: ================================ WARNING: inconsistent lock state 5.14.0-rc1-00035-gc836005b01c5-dirty #69 Not tainted -------------------------------- inconsistent {SOFTIRQ-ON-W} -> {IN-SOFTIRQ-W} usage. ksoftirqd/0/7 [HC0[0]:SC1[3]:HE1:SE0] takes: bf00e014 (list_lock){+.?.}-{2:2}, at: omap_aes_find_dev+0x18/0x54 [omap_aes_driver] {SOFTIRQ-ON-W} state was registered at: _raw_spin_lock+0x40/0x50 omap_aes_probe+0x1d4/0x664 [omap_aes_driver] platform_probe+0x58/0xb8 really_probe+0xbc/0x314 __driver_probe_device+0x80/0xe4 driver_probe_device+0x30/0xc8 __driver_attach+0x70/0xf4 bus_for_each_dev+0x70/0xb4 bus_add_driver+0xf0/0x1d4 driver_register+0x74/0x108 do_one_initcall+0x84/0x2e4 do_init_module+0x5c/0x240 load_module+0x221c/0x2584 sys_finit_module+0xb0/0xec ret_fast_syscall+0x0/0x2c 0xbed90b30 irq event stamp: 111800 hardirqs last enabled at (111800): [<c02a21e4>] __kmalloc+0x484/0x5ec hardirqs last disabled at (111799): [<c02a21f0>] __kmalloc+0x490/0x5ec softirqs last enabled at (111776): [<c01015f0>] __do_softirq+0x2b8/0x4d0 softirqs last disabled at (111781): [<c0135948>] run_ksoftirqd+0x34/0x50 other info that might help us debug this: Possible unsafe locking scenario: CPU0 ---- lock(list_lock); <Interrupt> lock(list_lock); *** DEADLOCK *** 2 locks held by ksoftirqd/0/7: #0: c0f5e8c8 (rcu_read_lock){....}-{1:2}, at: netif_receive_skb+0x6c/0x260 #1: c0f5e8c8 (rcu_read_lock){....}-{1:2}, at: ip_local_deliver_finish+0x2c/0xdc stack backtrace: CPU: 0 PID: 7 Comm: ksoftirqd/0 Not tainted 5.14.0-rc1-00035-gc836005b01c5-dirty #69 Hardware name: Generic AM43 (Flattened Device Tree) [<c010e6e0>] (unwind_backtrace) from [<c010b9d0>] (show_stack+0x10/0x14) [<c010b9d0>] (show_stack) from [<c017c640>] (mark_lock.part.17+0x5bc/0xd04) [<c017c640>] (mark_lock.part.17) from [<c017d9e4>] (__lock_acquire+0x960/0x2fa4) [<c017d9e4>] (__lock_acquire) from [<c0180980>] (lock_acquire+0x10c/0x358) [<c0180980>] (lock_acquire) from [<c093d324>] (_raw_spin_lock_bh+0x44/0x58) [<c093d324>] (_raw_spin_lock_bh) from [<bf00b258>] (omap_aes_find_dev+0x18/0x54 [omap_aes_driver]) [<bf00b258>] (omap_aes_find_dev [omap_aes_driver]) from [<bf00b328>] (omap_aes_crypt+0x94/0xd4 [omap_aes_driver]) [<bf00b328>] (omap_aes_crypt [omap_aes_driver]) from [<c08ac6d0>] (esp_input+0x1b0/0x2c8) [<c08ac6d0>] (esp_input) from [<c08c9e90>] (xfrm_input+0x410/0x1290) [<c08c9e90>] (xfrm_input) from [<c08b6374>] (xfrm4_esp_rcv+0x54/0x11c) [<c08b6374>] (xfrm4_esp_rcv) from [<c0838840>] (ip_protocol_deliver_rcu+0x48/0x3bc) [<c0838840>] (ip_protocol_deliver_rcu) from [<c0838c50>] (ip_local_deliver_finish+0x9c/0xdc) [<c0838c50>] (ip_local_deliver_finish) from [<c0838dd8>] (ip_local_deliver+0x148/0x1b0) [<c0838dd8>] (ip_local_deliver) from [<c0838f5c>] (ip_rcv+0x11c/0x180) [<c0838f5c>] (ip_rcv) from [<c077e3a4>] (__netif_receive_skb_one_core+0x54/0x74) [<c077e3a4>] (__netif_receive_skb_one_core) from [<c077e588>] (netif_receive_skb+0xa8/0x260) [<c077e588>] (netif_receive_skb) from [<c068d6d4>] (cpsw_rx_handler+0x224/0x2fc) [<c068d6d4>] (cpsw_rx_handler) from [<c0688ccc>] (__cpdma_chan_process+0xf4/0x188) [<c0688ccc>] (__cpdma_chan_process) from [<c068a0c0>] (cpdma_chan_process+0x3c/0x5c) [<c068a0c0>] (cpdma_chan_process) from [<c0690e14>] (cpsw_rx_mq_poll+0x44/0x98) [<c0690e14>] (cpsw_rx_mq_poll) from [<c0780810>] (__napi_poll+0x28/0x268) [<c0780810>] (__napi_poll) from [<c0780c64>] (net_rx_action+0xcc/0x204) [<c0780c64>] (net_rx_action) from [<c0101478>] (__do_softirq+0x140/0x4d0) [<c0101478>] (__do_softirq) from [<c0135948>] (run_ksoftirqd+0x34/0x50) [<c0135948>] (run_ksoftirqd) from [<c01583b8>] (smpboot_thread_fn+0xf4/0x1d8) [<c01583b8>] (smpboot_thread_fn) from [<c01546dc>] (kthread+0x14c/0x174) [<c01546dc>] (kthread) from [<c010013c>] (ret_from_fork+0x14/0x38) ... The omap-des and omap-sham drivers appear to have a similar issue. Fix this by using spin_{,un}lock_bh() around device list access in all the probe and remove functions. Signed-off-by: Ben Hutchings <ben.hutchings@mind.be> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Oct 4, 2021
[ Upstream commit 9de71ed ] xfstest generic/587 reports a deadlock issue as below: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc1 #69 Not tainted ------------------------------------------------------ repquota/8606 is trying to acquire lock: ffff888022ac9320 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}, at: f2fs_quota_sync+0x207/0x300 [f2fs] but task is already holding lock: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sbi->quota_sem){.+.+}-{3:3}: __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_read+0x3b/0x2a0 f2fs_quota_sync+0x59/0x300 [f2fs] f2fs_quota_on+0x48/0x100 [f2fs] do_quotactl+0x5e3/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (&sbi->cp_rwsem){++++}-{3:3}: __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_read+0x3b/0x2a0 f2fs_unlink+0x353/0x670 [f2fs] vfs_unlink+0x1c7/0x380 do_unlinkat+0x413/0x4b0 __x64_sys_unlinkat+0x50/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}: check_prev_add+0xdc/0xb30 validate_chain+0xa67/0xb20 __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_write+0x39/0xc0 f2fs_quota_sync+0x207/0x300 [f2fs] do_quotactl+0xaff/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: &sb->s_type->i_mutex_key#18 --> &sbi->cp_rwsem --> &sbi->quota_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sbi->quota_sem); lock(&sbi->cp_rwsem); lock(&sbi->quota_sem); lock(&sb->s_type->i_mutex_key#18); *** DEADLOCK *** 3 locks held by repquota/8606: #0: ffff88801efac0e0 (&type->s_umount_key#53){++++}-{3:3}, at: user_get_super+0xd9/0x190 #1: ffff8880084bc380 (&sbi->cp_rwsem){++++}-{3:3}, at: f2fs_quota_sync+0x3e/0x300 [f2fs] #2: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs] stack backtrace: CPU: 6 PID: 8606 Comm: repquota Not tainted 5.14.0-rc1 #69 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 Call Trace: dump_stack_lvl+0xce/0x134 dump_stack+0x17/0x20 print_circular_bug.isra.0.cold+0x239/0x253 check_noncircular+0x1be/0x1f0 check_prev_add+0xdc/0xb30 validate_chain+0xa67/0xb20 __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_write+0x39/0xc0 f2fs_quota_sync+0x207/0x300 [f2fs] do_quotactl+0xaff/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f883b0b4efe The root cause is ABBA deadlock of inode lock and cp_rwsem, reorder locks in f2fs_quota_sync() as below to fix this issue: - lock inode - lock cp_rwsem - lock quota_sem Fixes: db6ec53 ("f2fs: add a rw_sem to cover quota flag changes") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Oct 4, 2021
[ Upstream commit 9de71ed ] xfstest generic/587 reports a deadlock issue as below: ====================================================== WARNING: possible circular locking dependency detected 5.14.0-rc1 #69 Not tainted ------------------------------------------------------ repquota/8606 is trying to acquire lock: ffff888022ac9320 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}, at: f2fs_quota_sync+0x207/0x300 [f2fs] but task is already holding lock: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs] which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #2 (&sbi->quota_sem){.+.+}-{3:3}: __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_read+0x3b/0x2a0 f2fs_quota_sync+0x59/0x300 [f2fs] f2fs_quota_on+0x48/0x100 [f2fs] do_quotactl+0x5e3/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #1 (&sbi->cp_rwsem){++++}-{3:3}: __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_read+0x3b/0x2a0 f2fs_unlink+0x353/0x670 [f2fs] vfs_unlink+0x1c7/0x380 do_unlinkat+0x413/0x4b0 __x64_sys_unlinkat+0x50/0xb0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae -> #0 (&sb->s_type->i_mutex_key#18){+.+.}-{3:3}: check_prev_add+0xdc/0xb30 validate_chain+0xa67/0xb20 __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_write+0x39/0xc0 f2fs_quota_sync+0x207/0x300 [f2fs] do_quotactl+0xaff/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae other info that might help us debug this: Chain exists of: &sb->s_type->i_mutex_key#18 --> &sbi->cp_rwsem --> &sbi->quota_sem Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&sbi->quota_sem); lock(&sbi->cp_rwsem); lock(&sbi->quota_sem); lock(&sb->s_type->i_mutex_key#18); *** DEADLOCK *** 3 locks held by repquota/8606: #0: ffff88801efac0e0 (&type->s_umount_key#53){++++}-{3:3}, at: user_get_super+0xd9/0x190 #1: ffff8880084bc380 (&sbi->cp_rwsem){++++}-{3:3}, at: f2fs_quota_sync+0x3e/0x300 [f2fs] #2: ffff8880084bcde8 (&sbi->quota_sem){.+.+}-{3:3}, at: f2fs_quota_sync+0x59/0x300 [f2fs] stack backtrace: CPU: 6 PID: 8606 Comm: repquota Not tainted 5.14.0-rc1 #69 Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 Call Trace: dump_stack_lvl+0xce/0x134 dump_stack+0x17/0x20 print_circular_bug.isra.0.cold+0x239/0x253 check_noncircular+0x1be/0x1f0 check_prev_add+0xdc/0xb30 validate_chain+0xa67/0xb20 __lock_acquire+0x648/0x10b0 lock_acquire+0x128/0x470 down_write+0x39/0xc0 f2fs_quota_sync+0x207/0x300 [f2fs] do_quotactl+0xaff/0xb30 __x64_sys_quotactl+0x23a/0x4e0 do_syscall_64+0x3b/0x90 entry_SYSCALL_64_after_hwframe+0x44/0xae RIP: 0033:0x7f883b0b4efe The root cause is ABBA deadlock of inode lock and cp_rwsem, reorder locks in f2fs_quota_sync() as below to fix this issue: - lock inode - lock cp_rwsem - lock quota_sem Fixes: db6ec53 ("f2fs: add a rw_sem to cover quota flag changes") Signed-off-by: Chao Yu <chao@kernel.org> Signed-off-by: Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 28, 2022
…te() [ Upstream commit f020d95 ] When peer delete failed in a disconnect operation, use-after-free detected by KFENCE in below log. It is because for each vdev_id and address, it has only one struct ath10k_peer, it is allocated in ath10k_peer_map_event(). When connected to an AP, it has more than one HTT_T2H_MSG_TYPE_PEER_MAP reported from firmware, then the array peer_map of struct ath10k will be set muti-elements to the same ath10k_peer in ath10k_peer_map_event(). When peer delete failed in ath10k_sta_state(), the ath10k_peer will be free for the 1st peer id in array peer_map of struct ath10k, and then use-after-free happened for the 2nd peer id because they map to the same ath10k_peer. And clean up all peers in array peer_map for the ath10k_peer, then user-after-free disappeared peer map event log: [ 306.911021] wlan0: authenticate with b0:2a:43:e6:75:0e [ 306.957187] ath10k_pci 0000:01:00.0: mac vdev 0 peer create b0:2a:43:e6:75:0e (new sta) sta 1 / 32 peer 1 / 33 [ 306.957395] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 306.957404] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 306.986924] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 166 peer unmap event log: [ 435.715691] wlan0: deauthenticating from b0:2a:43:e6:75:0e by local choice (Reason: 3=DEAUTH_LEAVING) [ 435.716802] ath10k_pci 0000:01:00.0: mac vdev 0 peer delete b0:2a:43:e6:75:0e sta ffff990e0e9c2b50 (sta gone) [ 435.717177] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 435.717186] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 435.717193] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 166 use-after-free log: [21705.888627] wlan0: deauthenticating from d0:76:8f:82:be:75 by local choice (Reason: 3=DEAUTH_LEAVING) [21713.799910] ath10k_pci 0000:01:00.0: failed to delete peer d0:76:8f:82:be:75 for vdev 0: -110 [21713.799925] ath10k_pci 0000:01:00.0: found sta peer d0:76:8f:82:be:75 (ptr 0000000000000000 id 102) entry on vdev 0 after it was supposedly removed [21713.799968] ================================================================== [21713.799991] BUG: KFENCE: use-after-free read in ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.799991] [21713.799997] Use-after-free read at 0x00000000abe1c75e (in kfence-#69): [21713.800010] ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.800041] drv_sta_state+0x115/0x677 [mac80211] [21713.800059] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800076] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800093] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800110] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800137] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800153] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800161] genl_rcv_msg+0x38e/0x3be [21713.800166] netlink_rcv_skb+0x89/0xf7 [21713.800171] genl_rcv+0x28/0x36 [21713.800176] netlink_unicast+0x179/0x24b [21713.800181] netlink_sendmsg+0x3a0/0x40e [21713.800187] sock_sendmsg+0x72/0x76 [21713.800192] ____sys_sendmsg+0x16d/0x1e3 [21713.800196] ___sys_sendmsg+0x95/0xd1 [21713.800200] __sys_sendmsg+0x85/0xbf [21713.800205] do_syscall_64+0x43/0x55 [21713.800210] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [21713.800213] [21713.800219] kfence-#69: 0x000000009149b0d5-0x000000004c0697fb, size=1064, cache=kmalloc-2k [21713.800219] [21713.800224] allocated by task 13 on cpu 0 at 21705.501373s: [21713.800241] ath10k_peer_map_event+0x7e/0x154 [ath10k_core] [21713.800254] ath10k_htt_t2h_msg_handler+0x586/0x1039 [ath10k_core] [21713.800265] ath10k_htt_htc_t2h_msg_handler+0x12/0x28 [ath10k_core] [21713.800277] ath10k_htc_rx_completion_handler+0x14c/0x1b5 [ath10k_core] [21713.800283] ath10k_pci_process_rx_cb+0x195/0x1df [ath10k_pci] [21713.800294] ath10k_ce_per_engine_service+0x55/0x74 [ath10k_core] [21713.800305] ath10k_ce_per_engine_service_any+0x76/0x84 [ath10k_core] [21713.800310] ath10k_pci_napi_poll+0x49/0x144 [ath10k_pci] [21713.800316] net_rx_action+0xdc/0x361 [21713.800320] __do_softirq+0x163/0x29a [21713.800325] asm_call_irq_on_stack+0x12/0x20 [21713.800331] do_softirq_own_stack+0x3c/0x48 [21713.800337] __irq_exit_rcu+0x9b/0x9d [21713.800342] common_interrupt+0xc9/0x14d [21713.800346] asm_common_interrupt+0x1e/0x40 [21713.800351] ksoftirqd_should_run+0x5/0x16 [21713.800357] smpboot_thread_fn+0x148/0x211 [21713.800362] kthread+0x150/0x15f [21713.800367] ret_from_fork+0x22/0x30 [21713.800370] [21713.800374] freed by task 708 on cpu 1 at 21713.799953s: [21713.800498] ath10k_sta_state+0x2c6/0xb8a [ath10k_core] [21713.800515] drv_sta_state+0x115/0x677 [mac80211] [21713.800532] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800548] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800565] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800581] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800598] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800614] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800619] genl_rcv_msg+0x38e/0x3be [21713.800623] netlink_rcv_skb+0x89/0xf7 [21713.800628] genl_rcv+0x28/0x36 [21713.800632] netlink_unicast+0x179/0x24b [21713.800637] netlink_sendmsg+0x3a0/0x40e [21713.800642] sock_sendmsg+0x72/0x76 [21713.800646] ____sys_sendmsg+0x16d/0x1e3 [21713.800651] ___sys_sendmsg+0x95/0xd1 [21713.800655] __sys_sendmsg+0x85/0xbf [21713.800659] do_syscall_64+0x43/0x55 [21713.800663] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1 Fixes: d0eeafa ("ath10k: Clean up peer when sta goes away.") Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220801141930.16794-1-quic_wgong@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 28, 2022
…te() [ Upstream commit f020d95 ] When peer delete failed in a disconnect operation, use-after-free detected by KFENCE in below log. It is because for each vdev_id and address, it has only one struct ath10k_peer, it is allocated in ath10k_peer_map_event(). When connected to an AP, it has more than one HTT_T2H_MSG_TYPE_PEER_MAP reported from firmware, then the array peer_map of struct ath10k will be set muti-elements to the same ath10k_peer in ath10k_peer_map_event(). When peer delete failed in ath10k_sta_state(), the ath10k_peer will be free for the 1st peer id in array peer_map of struct ath10k, and then use-after-free happened for the 2nd peer id because they map to the same ath10k_peer. And clean up all peers in array peer_map for the ath10k_peer, then user-after-free disappeared peer map event log: [ 306.911021] wlan0: authenticate with b0:2a:43:e6:75:0e [ 306.957187] ath10k_pci 0000:01:00.0: mac vdev 0 peer create b0:2a:43:e6:75:0e (new sta) sta 1 / 32 peer 1 / 33 [ 306.957395] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 306.957404] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 306.986924] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 166 peer unmap event log: [ 435.715691] wlan0: deauthenticating from b0:2a:43:e6:75:0e by local choice (Reason: 3=DEAUTH_LEAVING) [ 435.716802] ath10k_pci 0000:01:00.0: mac vdev 0 peer delete b0:2a:43:e6:75:0e sta ffff990e0e9c2b50 (sta gone) [ 435.717177] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 435.717186] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 435.717193] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 166 use-after-free log: [21705.888627] wlan0: deauthenticating from d0:76:8f:82:be:75 by local choice (Reason: 3=DEAUTH_LEAVING) [21713.799910] ath10k_pci 0000:01:00.0: failed to delete peer d0:76:8f:82:be:75 for vdev 0: -110 [21713.799925] ath10k_pci 0000:01:00.0: found sta peer d0:76:8f:82:be:75 (ptr 0000000000000000 id 102) entry on vdev 0 after it was supposedly removed [21713.799968] ================================================================== [21713.799991] BUG: KFENCE: use-after-free read in ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.799991] [21713.799997] Use-after-free read at 0x00000000abe1c75e (in kfence-#69): [21713.800010] ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.800041] drv_sta_state+0x115/0x677 [mac80211] [21713.800059] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800076] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800093] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800110] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800137] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800153] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800161] genl_rcv_msg+0x38e/0x3be [21713.800166] netlink_rcv_skb+0x89/0xf7 [21713.800171] genl_rcv+0x28/0x36 [21713.800176] netlink_unicast+0x179/0x24b [21713.800181] netlink_sendmsg+0x3a0/0x40e [21713.800187] sock_sendmsg+0x72/0x76 [21713.800192] ____sys_sendmsg+0x16d/0x1e3 [21713.800196] ___sys_sendmsg+0x95/0xd1 [21713.800200] __sys_sendmsg+0x85/0xbf [21713.800205] do_syscall_64+0x43/0x55 [21713.800210] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [21713.800213] [21713.800219] kfence-#69: 0x000000009149b0d5-0x000000004c0697fb, size=1064, cache=kmalloc-2k [21713.800219] [21713.800224] allocated by task 13 on cpu 0 at 21705.501373s: [21713.800241] ath10k_peer_map_event+0x7e/0x154 [ath10k_core] [21713.800254] ath10k_htt_t2h_msg_handler+0x586/0x1039 [ath10k_core] [21713.800265] ath10k_htt_htc_t2h_msg_handler+0x12/0x28 [ath10k_core] [21713.800277] ath10k_htc_rx_completion_handler+0x14c/0x1b5 [ath10k_core] [21713.800283] ath10k_pci_process_rx_cb+0x195/0x1df [ath10k_pci] [21713.800294] ath10k_ce_per_engine_service+0x55/0x74 [ath10k_core] [21713.800305] ath10k_ce_per_engine_service_any+0x76/0x84 [ath10k_core] [21713.800310] ath10k_pci_napi_poll+0x49/0x144 [ath10k_pci] [21713.800316] net_rx_action+0xdc/0x361 [21713.800320] __do_softirq+0x163/0x29a [21713.800325] asm_call_irq_on_stack+0x12/0x20 [21713.800331] do_softirq_own_stack+0x3c/0x48 [21713.800337] __irq_exit_rcu+0x9b/0x9d [21713.800342] common_interrupt+0xc9/0x14d [21713.800346] asm_common_interrupt+0x1e/0x40 [21713.800351] ksoftirqd_should_run+0x5/0x16 [21713.800357] smpboot_thread_fn+0x148/0x211 [21713.800362] kthread+0x150/0x15f [21713.800367] ret_from_fork+0x22/0x30 [21713.800370] [21713.800374] freed by task 708 on cpu 1 at 21713.799953s: [21713.800498] ath10k_sta_state+0x2c6/0xb8a [ath10k_core] [21713.800515] drv_sta_state+0x115/0x677 [mac80211] [21713.800532] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800548] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800565] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800581] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800598] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800614] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800619] genl_rcv_msg+0x38e/0x3be [21713.800623] netlink_rcv_skb+0x89/0xf7 [21713.800628] genl_rcv+0x28/0x36 [21713.800632] netlink_unicast+0x179/0x24b [21713.800637] netlink_sendmsg+0x3a0/0x40e [21713.800642] sock_sendmsg+0x72/0x76 [21713.800646] ____sys_sendmsg+0x16d/0x1e3 [21713.800651] ___sys_sendmsg+0x95/0xd1 [21713.800655] __sys_sendmsg+0x85/0xbf [21713.800659] do_syscall_64+0x43/0x55 [21713.800663] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1 Fixes: d0eeafa ("ath10k: Clean up peer when sta goes away.") Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220801141930.16794-1-quic_wgong@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 28, 2022
…te() [ Upstream commit f020d95 ] When peer delete failed in a disconnect operation, use-after-free detected by KFENCE in below log. It is because for each vdev_id and address, it has only one struct ath10k_peer, it is allocated in ath10k_peer_map_event(). When connected to an AP, it has more than one HTT_T2H_MSG_TYPE_PEER_MAP reported from firmware, then the array peer_map of struct ath10k will be set muti-elements to the same ath10k_peer in ath10k_peer_map_event(). When peer delete failed in ath10k_sta_state(), the ath10k_peer will be free for the 1st peer id in array peer_map of struct ath10k, and then use-after-free happened for the 2nd peer id because they map to the same ath10k_peer. And clean up all peers in array peer_map for the ath10k_peer, then user-after-free disappeared peer map event log: [ 306.911021] wlan0: authenticate with b0:2a:43:e6:75:0e [ 306.957187] ath10k_pci 0000:01:00.0: mac vdev 0 peer create b0:2a:43:e6:75:0e (new sta) sta 1 / 32 peer 1 / 33 [ 306.957395] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 306.957404] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 306.986924] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 166 peer unmap event log: [ 435.715691] wlan0: deauthenticating from b0:2a:43:e6:75:0e by local choice (Reason: 3=DEAUTH_LEAVING) [ 435.716802] ath10k_pci 0000:01:00.0: mac vdev 0 peer delete b0:2a:43:e6:75:0e sta ffff990e0e9c2b50 (sta gone) [ 435.717177] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 435.717186] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 435.717193] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 166 use-after-free log: [21705.888627] wlan0: deauthenticating from d0:76:8f:82:be:75 by local choice (Reason: 3=DEAUTH_LEAVING) [21713.799910] ath10k_pci 0000:01:00.0: failed to delete peer d0:76:8f:82:be:75 for vdev 0: -110 [21713.799925] ath10k_pci 0000:01:00.0: found sta peer d0:76:8f:82:be:75 (ptr 0000000000000000 id 102) entry on vdev 0 after it was supposedly removed [21713.799968] ================================================================== [21713.799991] BUG: KFENCE: use-after-free read in ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.799991] [21713.799997] Use-after-free read at 0x00000000abe1c75e (in kfence-#69): [21713.800010] ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.800041] drv_sta_state+0x115/0x677 [mac80211] [21713.800059] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800076] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800093] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800110] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800137] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800153] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800161] genl_rcv_msg+0x38e/0x3be [21713.800166] netlink_rcv_skb+0x89/0xf7 [21713.800171] genl_rcv+0x28/0x36 [21713.800176] netlink_unicast+0x179/0x24b [21713.800181] netlink_sendmsg+0x3a0/0x40e [21713.800187] sock_sendmsg+0x72/0x76 [21713.800192] ____sys_sendmsg+0x16d/0x1e3 [21713.800196] ___sys_sendmsg+0x95/0xd1 [21713.800200] __sys_sendmsg+0x85/0xbf [21713.800205] do_syscall_64+0x43/0x55 [21713.800210] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [21713.800213] [21713.800219] kfence-#69: 0x000000009149b0d5-0x000000004c0697fb, size=1064, cache=kmalloc-2k [21713.800219] [21713.800224] allocated by task 13 on cpu 0 at 21705.501373s: [21713.800241] ath10k_peer_map_event+0x7e/0x154 [ath10k_core] [21713.800254] ath10k_htt_t2h_msg_handler+0x586/0x1039 [ath10k_core] [21713.800265] ath10k_htt_htc_t2h_msg_handler+0x12/0x28 [ath10k_core] [21713.800277] ath10k_htc_rx_completion_handler+0x14c/0x1b5 [ath10k_core] [21713.800283] ath10k_pci_process_rx_cb+0x195/0x1df [ath10k_pci] [21713.800294] ath10k_ce_per_engine_service+0x55/0x74 [ath10k_core] [21713.800305] ath10k_ce_per_engine_service_any+0x76/0x84 [ath10k_core] [21713.800310] ath10k_pci_napi_poll+0x49/0x144 [ath10k_pci] [21713.800316] net_rx_action+0xdc/0x361 [21713.800320] __do_softirq+0x163/0x29a [21713.800325] asm_call_irq_on_stack+0x12/0x20 [21713.800331] do_softirq_own_stack+0x3c/0x48 [21713.800337] __irq_exit_rcu+0x9b/0x9d [21713.800342] common_interrupt+0xc9/0x14d [21713.800346] asm_common_interrupt+0x1e/0x40 [21713.800351] ksoftirqd_should_run+0x5/0x16 [21713.800357] smpboot_thread_fn+0x148/0x211 [21713.800362] kthread+0x150/0x15f [21713.800367] ret_from_fork+0x22/0x30 [21713.800370] [21713.800374] freed by task 708 on cpu 1 at 21713.799953s: [21713.800498] ath10k_sta_state+0x2c6/0xb8a [ath10k_core] [21713.800515] drv_sta_state+0x115/0x677 [mac80211] [21713.800532] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800548] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800565] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800581] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800598] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800614] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800619] genl_rcv_msg+0x38e/0x3be [21713.800623] netlink_rcv_skb+0x89/0xf7 [21713.800628] genl_rcv+0x28/0x36 [21713.800632] netlink_unicast+0x179/0x24b [21713.800637] netlink_sendmsg+0x3a0/0x40e [21713.800642] sock_sendmsg+0x72/0x76 [21713.800646] ____sys_sendmsg+0x16d/0x1e3 [21713.800651] ___sys_sendmsg+0x95/0xd1 [21713.800655] __sys_sendmsg+0x85/0xbf [21713.800659] do_syscall_64+0x43/0x55 [21713.800663] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1 Fixes: d0eeafa ("ath10k: Clean up peer when sta goes away.") Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220801141930.16794-1-quic_wgong@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Nov 28, 2022
…te() [ Upstream commit f020d95 ] When peer delete failed in a disconnect operation, use-after-free detected by KFENCE in below log. It is because for each vdev_id and address, it has only one struct ath10k_peer, it is allocated in ath10k_peer_map_event(). When connected to an AP, it has more than one HTT_T2H_MSG_TYPE_PEER_MAP reported from firmware, then the array peer_map of struct ath10k will be set muti-elements to the same ath10k_peer in ath10k_peer_map_event(). When peer delete failed in ath10k_sta_state(), the ath10k_peer will be free for the 1st peer id in array peer_map of struct ath10k, and then use-after-free happened for the 2nd peer id because they map to the same ath10k_peer. And clean up all peers in array peer_map for the ath10k_peer, then user-after-free disappeared peer map event log: [ 306.911021] wlan0: authenticate with b0:2a:43:e6:75:0e [ 306.957187] ath10k_pci 0000:01:00.0: mac vdev 0 peer create b0:2a:43:e6:75:0e (new sta) sta 1 / 32 peer 1 / 33 [ 306.957395] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 306.957404] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 306.986924] ath10k_pci 0000:01:00.0: htt peer map vdev 0 peer b0:2a:43:e6:75:0e id 166 peer unmap event log: [ 435.715691] wlan0: deauthenticating from b0:2a:43:e6:75:0e by local choice (Reason: 3=DEAUTH_LEAVING) [ 435.716802] ath10k_pci 0000:01:00.0: mac vdev 0 peer delete b0:2a:43:e6:75:0e sta ffff990e0e9c2b50 (sta gone) [ 435.717177] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 246 [ 435.717186] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 198 [ 435.717193] ath10k_pci 0000:01:00.0: htt peer unmap vdev 0 peer b0:2a:43:e6:75:0e id 166 use-after-free log: [21705.888627] wlan0: deauthenticating from d0:76:8f:82:be:75 by local choice (Reason: 3=DEAUTH_LEAVING) [21713.799910] ath10k_pci 0000:01:00.0: failed to delete peer d0:76:8f:82:be:75 for vdev 0: -110 [21713.799925] ath10k_pci 0000:01:00.0: found sta peer d0:76:8f:82:be:75 (ptr 0000000000000000 id 102) entry on vdev 0 after it was supposedly removed [21713.799968] ================================================================== [21713.799991] BUG: KFENCE: use-after-free read in ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.799991] [21713.799997] Use-after-free read at 0x00000000abe1c75e (in kfence-#69): [21713.800010] ath10k_sta_state+0x265/0xb8a [ath10k_core] [21713.800041] drv_sta_state+0x115/0x677 [mac80211] [21713.800059] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800076] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800093] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800110] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800137] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800153] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800161] genl_rcv_msg+0x38e/0x3be [21713.800166] netlink_rcv_skb+0x89/0xf7 [21713.800171] genl_rcv+0x28/0x36 [21713.800176] netlink_unicast+0x179/0x24b [21713.800181] netlink_sendmsg+0x3a0/0x40e [21713.800187] sock_sendmsg+0x72/0x76 [21713.800192] ____sys_sendmsg+0x16d/0x1e3 [21713.800196] ___sys_sendmsg+0x95/0xd1 [21713.800200] __sys_sendmsg+0x85/0xbf [21713.800205] do_syscall_64+0x43/0x55 [21713.800210] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [21713.800213] [21713.800219] kfence-#69: 0x000000009149b0d5-0x000000004c0697fb, size=1064, cache=kmalloc-2k [21713.800219] [21713.800224] allocated by task 13 on cpu 0 at 21705.501373s: [21713.800241] ath10k_peer_map_event+0x7e/0x154 [ath10k_core] [21713.800254] ath10k_htt_t2h_msg_handler+0x586/0x1039 [ath10k_core] [21713.800265] ath10k_htt_htc_t2h_msg_handler+0x12/0x28 [ath10k_core] [21713.800277] ath10k_htc_rx_completion_handler+0x14c/0x1b5 [ath10k_core] [21713.800283] ath10k_pci_process_rx_cb+0x195/0x1df [ath10k_pci] [21713.800294] ath10k_ce_per_engine_service+0x55/0x74 [ath10k_core] [21713.800305] ath10k_ce_per_engine_service_any+0x76/0x84 [ath10k_core] [21713.800310] ath10k_pci_napi_poll+0x49/0x144 [ath10k_pci] [21713.800316] net_rx_action+0xdc/0x361 [21713.800320] __do_softirq+0x163/0x29a [21713.800325] asm_call_irq_on_stack+0x12/0x20 [21713.800331] do_softirq_own_stack+0x3c/0x48 [21713.800337] __irq_exit_rcu+0x9b/0x9d [21713.800342] common_interrupt+0xc9/0x14d [21713.800346] asm_common_interrupt+0x1e/0x40 [21713.800351] ksoftirqd_should_run+0x5/0x16 [21713.800357] smpboot_thread_fn+0x148/0x211 [21713.800362] kthread+0x150/0x15f [21713.800367] ret_from_fork+0x22/0x30 [21713.800370] [21713.800374] freed by task 708 on cpu 1 at 21713.799953s: [21713.800498] ath10k_sta_state+0x2c6/0xb8a [ath10k_core] [21713.800515] drv_sta_state+0x115/0x677 [mac80211] [21713.800532] __sta_info_destroy_part2+0xb1/0x133 [mac80211] [21713.800548] __sta_info_flush+0x11d/0x162 [mac80211] [21713.800565] ieee80211_set_disassoc+0x12d/0x2f4 [mac80211] [21713.800581] ieee80211_mgd_deauth+0x26c/0x29b [mac80211] [21713.800598] cfg80211_mlme_deauth+0x13f/0x1bb [cfg80211] [21713.800614] nl80211_deauthenticate+0xf8/0x121 [cfg80211] [21713.800619] genl_rcv_msg+0x38e/0x3be [21713.800623] netlink_rcv_skb+0x89/0xf7 [21713.800628] genl_rcv+0x28/0x36 [21713.800632] netlink_unicast+0x179/0x24b [21713.800637] netlink_sendmsg+0x3a0/0x40e [21713.800642] sock_sendmsg+0x72/0x76 [21713.800646] ____sys_sendmsg+0x16d/0x1e3 [21713.800651] ___sys_sendmsg+0x95/0xd1 [21713.800655] __sys_sendmsg+0x85/0xbf [21713.800659] do_syscall_64+0x43/0x55 [21713.800663] entry_SYSCALL_64_after_hwframe+0x44/0xa9 Tested-on: QCA6174 hw3.2 PCI WLAN.RM.4.4.1-00288-QCARMSWPZ-1 Fixes: d0eeafa ("ath10k: Clean up peer when sta goes away.") Signed-off-by: Wen Gong <quic_wgong@quicinc.com> Signed-off-by: Kalle Valo <quic_kvalo@quicinc.com> Link: https://lore.kernel.org/r/20220801141930.16794-1-quic_wgong@quicinc.com Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Jan 22, 2023
Commit fb541ca ("md: remove lock_bdev / unlock_bdev") removes wrappers for blkdev_get/blkdev_put. However, the uninitialized local static variable of pointer type 'claim_rdev' in md_import_device() is NULL, which leads to the following warning call trace: WARNING: CPU: 22 PID: 1037 at block/bdev.c:577 bd_prepare_to_claim+0x131/0x150 CPU: 22 PID: 1037 Comm: mdadm Not tainted 6.2.0-rc3+ #69 .. RIP: 0010:bd_prepare_to_claim+0x131/0x150 .. Call Trace: <TASK> ? _raw_spin_unlock+0x15/0x30 ? iput+0x6a/0x220 blkdev_get_by_dev.part.0+0x4b/0x300 md_import_device+0x126/0x1d0 new_dev_store+0x184/0x240 md_attr_store+0x80/0xf0 kernfs_fop_write_iter+0x128/0x1c0 vfs_write+0x2be/0x3c0 ksys_write+0x5f/0xe0 do_syscall_64+0x38/0x90 entry_SYSCALL_64_after_hwframe+0x72/0xdc It turns out the md device cannot be used: md: could not open device unknown-block(259,0). md: md127 stopped. Fix the issue by declaring the local static variable of struct type and passing the pointer of the variable to blkdev_get_by_dev(). Fixes: fb541ca ("md: remove lock_bdev / unlock_bdev") Cc: Christoph Hellwig <hch@lst.de> Signed-off-by: Adrian Huang <ahuang12@lenovo.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Song Liu <song@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Aug 20, 2024
[ Upstream commit 7015843 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Aug 20, 2024
[ Upstream commit 7015843 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Sep 14, 2024
[ Upstream commit 7015843 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Sep 14, 2024
[ Upstream commit 7015843 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
frank-w
pushed a commit
that referenced
this issue
Sep 14, 2024
[ Upstream commit 7015843 ] Alexei reported that send_signal test may fail with nested CONFIG_PARAVIRT configs. In this particular case, the base VM is AMD with 166 cpus, and I run selftests with regular qemu on top of that and indeed send_signal test failed. I also tried with an Intel box with 80 cpus and there is no issue. The main qemu command line includes: -enable-kvm -smp 16 -cpu host The failure log looks like: $ ./test_progs -t send_signal [ 48.501588] watchdog: BUG: soft lockup - CPU#9 stuck for 26s! [test_progs:2225] [ 48.503622] Modules linked in: bpf_testmod(O) [ 48.503622] CPU: 9 PID: 2225 Comm: test_progs Tainted: G O 6.9.0-08561-g2c1713a8f1c9-dirty #69 [ 48.507629] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.15.0-0-g2dd4b9b3f840-prebuilt.qemu.org 04/01/2014 [ 48.511635] RIP: 0010:handle_softirqs+0x71/0x290 [ 48.511635] Code: [...] 10 0a 00 00 00 31 c0 65 66 89 05 d5 f4 fa 7e fb bb ff ff ff ff <49> c7 c2 cb [ 48.518527] RSP: 0018:ffffc90000310fa0 EFLAGS: 00000246 [ 48.519579] RAX: 0000000000000000 RBX: 00000000ffffffff RCX: 00000000000006e0 [ 48.522526] RDX: 0000000000000006 RSI: ffff88810791ae80 RDI: 0000000000000000 [ 48.523587] RBP: ffffc90000fabc88 R08: 00000005a0af4f7f R09: 0000000000000000 [ 48.525525] R10: 0000000561d2f29c R11: 0000000000006534 R12: 0000000000000280 [ 48.528525] R13: 0000000000000000 R14: 0000000000000000 R15: 0000000000000000 [ 48.528525] FS: 00007f2f2885cd00(0000) GS:ffff888237c40000(0000) knlGS:0000000000000000 [ 48.531600] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 48.535520] CR2: 00007f2f287059f0 CR3: 0000000106a28002 CR4: 00000000003706f0 [ 48.537538] Call Trace: [ 48.537538] <IRQ> [ 48.537538] ? watchdog_timer_fn+0x1cd/0x250 [ 48.539590] ? lockup_detector_update_enable+0x50/0x50 [ 48.539590] ? __hrtimer_run_queues+0xff/0x280 [ 48.542520] ? hrtimer_interrupt+0x103/0x230 [ 48.544524] ? __sysvec_apic_timer_interrupt+0x4f/0x140 [ 48.545522] ? sysvec_apic_timer_interrupt+0x3a/0x90 [ 48.547612] ? asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.547612] ? handle_softirqs+0x71/0x290 [ 48.547612] irq_exit_rcu+0x63/0x80 [ 48.551585] sysvec_apic_timer_interrupt+0x75/0x90 [ 48.552521] </IRQ> [ 48.553529] <TASK> [ 48.553529] asm_sysvec_apic_timer_interrupt+0x1a/0x20 [ 48.555609] RIP: 0010:finish_task_switch.isra.0+0x90/0x260 [ 48.556526] Code: [...] 9f 58 0a 00 00 48 85 db 0f 85 89 01 00 00 4c 89 ff e8 53 d9 bd 00 fb 66 90 <4d> 85 ed 74 [ 48.562524] RSP: 0018:ffffc90000fabd38 EFLAGS: 00000282 [ 48.563589] RAX: 0000000000000000 RBX: 0000000000000000 RCX: ffffffff83385620 [ 48.563589] RDX: ffff888237c73ae4 RSI: 0000000000000000 RDI: ffff888237c6fd00 [ 48.568521] RBP: ffffc90000fabd68 R08: 0000000000000000 R09: 0000000000000000 [ 48.569528] R10: 0000000000000001 R11: 0000000000000000 R12: ffff8881009d0000 [ 48.573525] R13: ffff8881024e5400 R14: ffff88810791ae80 R15: ffff888237c6fd00 [ 48.575614] ? finish_task_switch.isra.0+0x8d/0x260 [ 48.576523] __schedule+0x364/0xac0 [ 48.577535] schedule+0x2e/0x110 [ 48.578555] pipe_read+0x301/0x400 [ 48.579589] ? destroy_sched_domains_rcu+0x30/0x30 [ 48.579589] vfs_read+0x2b3/0x2f0 [ 48.579589] ksys_read+0x8b/0xc0 [ 48.583590] do_syscall_64+0x3d/0xc0 [ 48.583590] entry_SYSCALL_64_after_hwframe+0x4b/0x53 [ 48.586525] RIP: 0033:0x7f2f28703fa1 [ 48.587592] Code: [...] 00 00 00 0f 1f 44 00 00 f3 0f 1e fa 80 3d c5 23 14 00 00 74 13 31 c0 0f 05 <48> 3d 00 f0 [ 48.593534] RSP: 002b:00007ffd90f8cf88 EFLAGS: 00000246 ORIG_RAX: 0000000000000000 [ 48.595589] RAX: ffffffffffffffda RBX: 00007ffd90f8d5e8 RCX: 00007f2f28703fa1 [ 48.595589] RDX: 0000000000000001 RSI: 00007ffd90f8cfb0 RDI: 0000000000000006 [ 48.599592] RBP: 00007ffd90f8d2f0 R08: 0000000000000064 R09: 0000000000000000 [ 48.602527] R10: 0000000000000000 R11: 0000000000000246 R12: 0000000000000000 [ 48.603589] R13: 00007ffd90f8d608 R14: 00007f2f288d8000 R15: 0000000000f6bdb0 [ 48.605527] </TASK> In the test, two processes are communicating through pipe. Further debugging with strace found that the above splat is triggered as read() syscall could not receive the data even if the corresponding write() syscall in another process successfully wrote data into the pipe. The failed subtest is "send_signal_perf". The corresponding perf event has sample_period 1 and config PERF_COUNT_SW_CPU_CLOCK. sample_period 1 means every overflow event will trigger a call to the BPF program. So I suspect this may overwhelm the system. So I increased the sample_period to 100,000 and the test passed. The sample_period 10,000 still has the test failed. In other parts of selftest, e.g., [1], sample_freq is used instead. So I decided to use sample_freq = 1,000 since the test can pass as well. [1] https://lore.kernel.org/bpf/20240604070700.3032142-1-song@kernel.org/ Reported-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: Yonghong Song <yonghong.song@linux.dev> Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Link: https://lore.kernel.org/bpf/20240605201203.2603846-1-yonghong.song@linux.dev Signed-off-by: Sasha Levin <sashal@kernel.org>
This issue was closed.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Here I am again bothering you, but I need to install the header.
After moving to the penultimate kernel build installed with sudo dpkg -i x.deb i have rebooted and the bananapi r2 changed ip, no problem.
I logged in and proceeded to install in the same way linux-libc-dev_5.4.21-bpi-r2-main-1_armhf.deb, no problem.
I had previously installed it, but I wanted to provide a screen that it works.
I reboot again and after the wget for obtein the image and the header i try to install it in the same way, but i obtein this errors..
How can I solve it? Many thanks in advance
The text was updated successfully, but these errors were encountered: