-
Notifications
You must be signed in to change notification settings - Fork 132
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
dump IPC message payload to dmesg #118
Comments
[ Upstream commit 56bc615 ] The change protects almost the whole body of u_audio_iso_complete() function by PCM stream lock, this is mainly sufficient to avoid a race between USB request completion and stream termination, the change prevents a possibility of invalid memory access in interrupt context by memcpy(): Unable to handle kernel paging request at virtual address 00004e80 pgd = c0004000 [00004e80] *pgd=00000000 Internal error: Oops: 817 [#1] PREEMPT SMP ARM CPU: 0 PID: 3 Comm: ksoftirqd/0 Tainted: G C 3.14.54+ thesofproject#117 task: da180b80 ti: da192000 task.ti: da192000 PC is at memcpy+0x50/0x330 LR is at 0xcdd92b0e pc : [<c029ef30>] lr : [<cdd92b0e>] psr: 20000193 sp : da193ce4 ip : dd86ae26 fp : 0000b180 r10: daf81680 r9 : 00000000 r8 : d58a01ea r7 : 2c0b43e4 r6 : acdfb08b r5 : 01a271cf r4 : 87389377 r3 : 69469782 r2 : 00000020 r1 : daf82fe0 r0 : 00004e80 Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment kernel Control: 10c5387d Table: 2b70804a DAC: 00000015 Process ksoftirqd/0 (pid: 3, stack limit = 0xda192238) Also added a check for potential !runtime condition, commonly it is done by PCM_RUNTIME_CHECK(substream) in the beginning, however this does not completely prevent from oopses in u_audio_iso_complete(), because the proper protection scheme must be implemented in PCM library functions. An example of *not fixed* oops due to substream->runtime->* dereference by snd_pcm_running(substream) from snd_pcm_period_elapsed(), where substream->runtime is gone while waiting the substream lock: Unable to handle kernel paging request at virtual address 6b6b6b6b pgd = db7e4000 [6b6b6b6b] *pgd=00000000 CPU: 0 PID: 193 Comm: klogd Tainted: G C 3.14.54+ thesofproject#118 task: db5ac500 ti: db60c000 task.ti: db60c000 PC is at snd_pcm_period_elapsed+0x48/0xd8 [snd_pcm] LR is at snd_pcm_period_elapsed+0x40/0xd8 [snd_pcm] pc : [<>] lr : [<>] psr: 60000193 Flags: nZCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user Control: 10c5387d Table: 2b7e404a DAC: 00000015 Process klogd (pid: 193, stack limit = 0xdb60c238) [<>] (snd_pcm_period_elapsed [snd_pcm]) from [<>] (udc_irq+0x500/0xbbc) [<>] (udc_irq) from [<>] (ci_irq+0x280/0x304) [<>] (ci_irq) from [<>] (handle_irq_event_percpu+0xa4/0x40c) [<>] (handle_irq_event_percpu) from [<>] (handle_irq_event+0x3c/0x5c) [<>] (handle_irq_event) from [<>] (handle_fasteoi_irq+0xc4/0x110) [<>] (handle_fasteoi_irq) from [<>] (generic_handle_irq+0x20/0x30) [<>] (generic_handle_irq) from [<>] (handle_IRQ+0x80/0xc0) [<>] (handle_IRQ) from [<>] (gic_handle_irq+0x3c/0x60) [<>] (gic_handle_irq) from [<>] (__irq_svc+0x44/0x78) Signed-off-by: Vladimir Zapolskiy <vladimir_zapolskiy@mentor.com> [erosca: W/o this patch, with minimal instrumentation [1], I can consistently reproduce BUG: KASAN: use-after-free [2]] [1] Instrumentation to reproduce issue [2]: # diff --git a/drivers/usb/gadget/function/u_audio.c b/drivers/usb/gadget/function/u_audio.c # index a72295c953bb..bd0b308024fe 100644 # --- a/drivers/usb/gadget/function/u_audio.c # +++ b/drivers/usb/gadget/function/u_audio.c # @@ -16,6 +16,7 @@ # #include <sound/core.h> # #include <sound/pcm.h> # #include <sound/pcm_params.h> # +#include <linux/delay.h> # # #include "u_audio.h" # # @@ -147,6 +148,8 @@ static void u_audio_iso_complete(struct usb_ep *ep, struct usb_request *req) # # spin_unlock_irqrestore(&prm->lock, flags); # # + udelay(500); //delay here to increase probability of parallel activities # + # /* Pack USB load in ALSA ring buffer */ # pending = prm->dma_bytes - hw_ptr; [2] After applying [1], below BUG occurs on Rcar-H3-Salvator-X board: ================================================================== BUG: KASAN: use-after-free in u_audio_iso_complete+0x24c/0x520 [u_audio] Read of size 8 at addr ffff8006cafcc248 by task swapper/0/0 CPU: 0 PID: 0 Comm: swapper/0 Tainted: G WC 4.14.47+ thesofproject#160 Hardware name: Renesas Salvator-X board based on r8a7795 ES2.0+ (DT) Call trace: [<ffff2000080925ac>] dump_backtrace+0x0/0x364 [<ffff200008092924>] show_stack+0x14/0x1c [<ffff200008f8dbcc>] dump_stack+0x108/0x174 [<ffff2000083c71b8>] print_address_description+0x7c/0x32c [<ffff2000083c78e8>] kasan_report+0x324/0x354 [<ffff2000083c6114>] __asan_load8+0x24/0x94 [<ffff2000021d1b34>] u_audio_iso_complete+0x24c/0x520 [u_audio] [<ffff20000152fe50>] usb_gadget_giveback_request+0x480/0x4d0 [udc_core] [<ffff200001860ab8>] usbhsg_queue_done+0x100/0x130 [renesas_usbhs] [<ffff20000185f814>] usbhsf_pkt_handler+0x1a4/0x298 [renesas_usbhs] [<ffff20000185fb38>] usbhsf_irq_ready+0x128/0x178 [renesas_usbhs] [<ffff200001859cc8>] usbhs_interrupt+0x440/0x490 [renesas_usbhs] [<ffff2000081a0288>] __handle_irq_event_percpu+0x594/0xa58 [<ffff2000081a07d0>] handle_irq_event_percpu+0x84/0x12c [<ffff2000081a0928>] handle_irq_event+0xb0/0x10c [<ffff2000081a8384>] handle_fasteoi_irq+0x1e0/0x2ec [<ffff20000819e5f8>] generic_handle_irq+0x2c/0x44 [<ffff20000819f0d0>] __handle_domain_irq+0x190/0x194 [<ffff20000808177c>] gic_handle_irq+0x80/0xac Exception stack(0xffff200009e97c80 to 0xffff200009e97dc0) 7c80: 0000000000000000 0000000000000000 0000000000000003 ffff200008179298 7ca0: ffff20000ae1c180 dfff200000000000 0000000000000000 ffff2000081f9a88 7cc0: ffff200009eb5960 ffff200009e97cf0 0000000000001600 ffff0400041b064b 7ce0: 0000000000000000 0000000000000002 0000000200000001 0000000000000001 7d00: ffff20000842197c 0000ffff958c4970 0000000000000000 ffff8006da0d5b80 7d20: ffff8006d4678498 0000000000000000 000000126bde0a8b ffff8006d4678480 7d40: 0000000000000000 000000126bdbea64 ffff200008fd0000 ffff8006fffff980 7d60: 00000000495f0018 ffff200009e97dc0 ffff200008b6c4ec ffff200009e97dc0 7d80: ffff200008b6c4f0 0000000020000145 ffff8006da0d5b80 ffff8006d4678498 7da0: ffffffffffffffff ffff8006d4678498 ffff200009e97dc0 ffff200008b6c4f0 [<ffff200008084034>] el1_irq+0xb4/0x12c [<ffff200008b6c4f0>] cpuidle_enter_state+0x818/0x844 [<ffff200008b6c59c>] cpuidle_enter+0x18/0x20 [<ffff20000815f2e4>] call_cpuidle+0x98/0x9c [<ffff20000815f674>] do_idle+0x214/0x264 [<ffff20000815facc>] cpu_startup_entry+0x20/0x24 [<ffff200008fb09d8>] rest_init+0x30c/0x320 [<ffff2000095f1338>] start_kernel+0x570/0x5b0 ---<-snip->--- Fixes: 132fcb4 ("usb: gadget: Add Audio Class 2.0 Driver") Signed-off-by: Eugeniu Rosca <erosca@de.adit-jv.com> Signed-off-by: Felipe Balbi <felipe.balbi@linux.intel.com> Signed-off-by: Sasha Levin <alexander.levin@microsoft.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Do we still need this feature? @xiulipan |
@wenqingfu yes we do need this feature. This will be very helpful in making sure the IPC payload isnt tainted before it is sent to the DSP |
I actually wanted to have a "dump IPC" feature but for a completely different reason - to use a live system to record topology IPC messages to then create an object from those messages to be integrated with the DSP firmware image for the host-less mode. But I would actually use something different than the kernel logging for that. Something like debugfs / /proc / netlink or even a new ioctl to an existing device node... But I'm not sure how IPCs can be tainted - don't we have a much larger problem if IPC messages can be tainted on a system? @ranj063 could you clarify? |
@lyakh I do not think they are. This is related to the issue thesofproject/sof#766. |
@ranj063 @xiulipan @wenqingfu what if we do this as a debugfs file with a blocking read? I'm not a huge fan of polluting the kernel log with potentially many (dozens? hundreds?) messages per second. |
Creating a macvtap on a DSA-backed interface results in the following splat when lockdep is enabled: [ 19.638080] IPv6: ADDRCONF(NETDEV_CHANGE): lan0: link becomes ready [ 23.041198] device lan0 entered promiscuous mode [ 23.043445] device eth0 entered promiscuous mode [ 23.049255] [ 23.049557] ============================================ [ 23.055021] WARNING: possible recursive locking detected [ 23.060490] 5.0.0-rc3-00013-g56c857a1b8d3 #118 Not tainted [ 23.066132] -------------------------------------------- [ 23.071598] ip/2861 is trying to acquire lock: [ 23.076171] 00000000f61990cb (_xmit_ETHER){+...}, at: dev_set_rx_mode+0x1c/0x38 [ 23.083693] [ 23.083693] but task is already holding lock: [ 23.089696] 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70 [ 23.096774] [ 23.096774] other info that might help us debug this: [ 23.103494] Possible unsafe locking scenario: [ 23.103494] [ 23.109584] CPU0 [ 23.112093] ---- [ 23.114601] lock(_xmit_ETHER); [ 23.117917] lock(_xmit_ETHER); [ 23.121233] [ 23.121233] *** DEADLOCK *** [ 23.121233] [ 23.127325] May be due to missing lock nesting notation [ 23.127325] [ 23.134315] 2 locks held by ip/2861: [ 23.137987] #0: 000000003b766c72 (rtnl_mutex){+.+.}, at: rtnetlink_rcv_msg+0x338/0x4e0 [ 23.146231] #1: 00000000ecf0c3b4 (_xmit_ETHER){+...}, at: dev_uc_add+0x24/0x70 [ 23.153757] [ 23.153757] stack backtrace: [ 23.158243] CPU: 0 PID: 2861 Comm: ip Not tainted 5.0.0-rc3-00013-g56c857a1b8d3 #118 [ 23.166212] Hardware name: Globalscale Marvell ESPRESSOBin Board (DT) [ 23.172843] Call trace: [ 23.175358] dump_backtrace+0x0/0x188 [ 23.179116] show_stack+0x14/0x20 [ 23.182524] dump_stack+0xb4/0xec [ 23.185928] __lock_acquire+0x123c/0x1860 [ 23.190048] lock_acquire+0xc8/0x248 [ 23.193724] _raw_spin_lock_bh+0x40/0x58 [ 23.197755] dev_set_rx_mode+0x1c/0x38 [ 23.201607] dev_set_promiscuity+0x3c/0x50 [ 23.205820] dsa_slave_change_rx_flags+0x5c/0x70 [ 23.210567] __dev_set_promiscuity+0x148/0x1e0 [ 23.215136] __dev_set_rx_mode+0x74/0x98 [ 23.219167] dev_uc_add+0x54/0x70 [ 23.222575] macvlan_open+0x170/0x1d0 [ 23.226336] __dev_open+0xe0/0x160 [ 23.229830] __dev_change_flags+0x16c/0x1b8 [ 23.234132] dev_change_flags+0x20/0x60 [ 23.238074] do_setlink+0x2d0/0xc50 [ 23.241658] __rtnl_newlink+0x5f8/0x6e8 [ 23.245601] rtnl_newlink+0x50/0x78 [ 23.249184] rtnetlink_rcv_msg+0x360/0x4e0 [ 23.253397] netlink_rcv_skb+0xe8/0x130 [ 23.257338] rtnetlink_rcv+0x14/0x20 [ 23.261012] netlink_unicast+0x190/0x210 [ 23.265043] netlink_sendmsg+0x288/0x350 [ 23.269075] sock_sendmsg+0x18/0x30 [ 23.272659] ___sys_sendmsg+0x29c/0x2c8 [ 23.276602] __sys_sendmsg+0x60/0xb8 [ 23.280276] __arm64_sys_sendmsg+0x1c/0x28 [ 23.284488] el0_svc_common+0xd8/0x138 [ 23.288340] el0_svc_handler+0x24/0x80 [ 23.292192] el0_svc+0x8/0xc This looks fairly harmless (no actual deadlock occurs), and is fixed in a similar way to c6894de ("bridge: fix lockdep addr_list_lock false positive splat") by putting the addr_list_lock in its own lockdep class. Signed-off-by: Marc Zyngier <marc.zyngier@arm.com> Reviewed-by: Florian Fainelli <f.fainelli@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
@lyakh |
@xiulipan it would be a different debugfs file, sure, similar to the logger. Yes, you would have to capture it explicitly, it wouldn't be saved automatically, and it would be more difficult to synchronise IPC messages with the rest of debugging output. So, we have to weigh the pros and the cons. |
@lyakh |
@lyakh can you confirm you are working on this? there's been no update since Feb 14. I believe this is needed to help our firmware colleagues reproduce issues with the scripts. |
@plbossart @lgirdwood I'd like to have some more opinions on my proposal to implement this as a debugfs file in addition to @xiulipan's comment |
@lyakh I don't see any problems with saving the IPC payload with an appropriate format and time stamping in a debugfs file. We already have too many logs in dmesg, and adding all the content in dmesg isn't scalable/usable. |
@plbossart Mark had preference for this going via trace as ASCII hex data. |
@lgirdwood i was indicating my preference for a solution that doesn't rely on dmesg - as described in the title. we could do the debugfs first and then add the integration in the Linux trace. The latter would also need to be done for the sof-logger DSP trace btw if we want to be consistent. |
@plbossart @lgirdwood @wenqingfu @lbetlej Could we prioritize this feature? By parsing dumped payload we could recreate many sequences using our slim driver, which would speed up the process of debugging issues. The similar mechanism worked great for Windows. Of course to recreate the correct sequence the log would need to also have some additional information, like host stream buffer info, stream trigger etc. |
@tlauda what do you mean with additional information? If we dump the IPC payload in debugfs you'll get the IPC payload. If you want more context for this to be useful, then we'd have to instrument all IPC usages and define a syntax for what you want. |
…_cached_inodes teaching: labs: filesystems part 2: minfs: return inode from cache
CI: simplify for readability
Dump IPC message payload to help debug and reproduce IPC message.
The text was updated successfully, but these errors were encountered: