-
Notifications
You must be signed in to change notification settings - Fork 131
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ASoC: SOF: pm: don't return error when CTX_SAVE ipc fails #460
ASoC: SOF: pm: don't return error when CTX_SAVE ipc fails #460
Conversation
f1a88e9
to
b333a9d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
that sounds either too good to be true or fishy?
sound/soc/sof/pm.c
Outdated
@@ -318,10 +318,13 @@ static int sof_suspend(struct device *dev, bool runtime_suspend) | |||
/* notify DSP of upcoming power down */ | |||
ret = sof_send_pm_ipc(sdev, SOF_IPC_PM_CTX_SAVE); | |||
if (ret < 0) { | |||
/* | |||
* DSP will be powered off right after. | |||
* So just report the error and continue. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That sounds like kicking the can down the read. If the IPC is not successful, doesn't it indicate trouble ahead on resume?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@plbossart this PR is by no means meant to address the underlying problem. I am only avoiding the problem that occurs when this IPC timesout for reasons not known so far. So it is only a workaround until the root cause is found. We're seeing repeated IPC timeouts on this particular iPC and if it times out, we reach a point of no recovery today.
On 12/17/18 6:19 PM, Ranjani Sridharan wrote:
***@***.**** commented on this pull request.
------------------------------------------------------------------------
In sound/soc/sof/pm.c
<#460 (comment)>:
> @@ -318,10 +318,13 @@ static int sof_suspend(struct device *dev, bool runtime_suspend)
/* notify DSP of upcoming power down */
ret = sof_send_pm_ipc(sdev, SOF_IPC_PM_CTX_SAVE);
if (ret < 0) {
+ /*
+ * DSP will be powered off right after.
+ * So just report the error and continue.
@plbossart <https://github.com/plbossart> this PR is by no means meant
to address the underlying problem. I am only avoiding the problem that
occurs when this IPC timesout for reasons not known so far. So it is
only a workaround until the root cause is found. We're seeing repeated
IPC timeouts on this particular iPC and if it times out, we reach a
point of no recovery today
I have no issues merging it then, but please update the PR to make it
clear in the commit message and comment that this is a FIXME and that
future improvements are needed to solve this for real.
… |
CTX_SAVE ipc is sent right before the DSP is powered off. Returning an error if this ipc fails causes the resume sequence to fail as well. Instead just report the error and continue to power off the DSP. This is only a temporary workaround until the real issue with CTX_SAVE ipc's timing out is root caused. Signed-off-by: Ranjani Sridharan <ranjani.sridharan@linux.intel.com>
b333a9d
to
9ae9ada
Compare
@plbossart fixed the comment and commit message now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is a good workaround, our suspending is a special case, power off can recover kinds of errors we met and make driver more robust.
Furthermore, we can even add more ADSP reset(suspend/resume) at critical issues happening, e.g. DSP panic, IPC timeout, ...
/* | ||
* DSP will be powered off right after. | ||
* So just report the error and continue. | ||
*/ | ||
dev_err(sdev->dev, | ||
"error: ctx_save ipc error during suspend %d\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"but we will continue power off Audio DSP..."
The port->hsr is used in the hsr_handle_frame(), which is a callback of rx_handler. hsr master and slaves are initialized in hsr_add_port(). This function initializes several pointers, which includes port->hsr after registering rx_handler. So, in the rx_handler routine, un-initialized pointer would be used. In order to fix this, pointers should be initialized before registering rx_handler. Test commands: ip netns del left ip netns del right modprobe -rv veth modprobe -rv hsr killall ping modprobe hsr ip netns add left ip netns add right ip link add veth0 type veth peer name veth1 ip link add veth2 type veth peer name veth3 ip link add veth4 type veth peer name veth5 ip link set veth1 netns left ip link set veth3 netns right ip link set veth4 netns left ip link set veth5 netns right ip link set veth0 up ip link set veth2 up ip link set veth0 address fc:00:00:00:00:01 ip link set veth2 address fc:00:00:00:00:02 ip netns exec left ip link set veth1 up ip netns exec left ip link set veth4 up ip netns exec right ip link set veth3 up ip netns exec right ip link set veth5 up ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip netns exec left ip link add hsr1 type hsr slave1 veth1 slave2 veth4 ip netns exec left ip a a 192.168.100.2/24 dev hsr1 ip netns exec left ip link set hsr1 up ip netns exec left ip n a 192.168.100.1 dev hsr1 lladdr \ fc:00:00:00:00:01 nud permanent ip netns exec left ip n r 192.168.100.1 dev hsr1 lladdr \ fc:00:00:00:00:01 nud permanent for i in {1..100} do ip netns exec left ping 192.168.100.1 & done ip netns exec left hping3 192.168.100.1 -2 --flood & ip netns exec right ip link add hsr2 type hsr slave1 veth3 slave2 veth5 ip netns exec right ip a a 192.168.100.3/24 dev hsr2 ip netns exec right ip link set hsr2 up ip netns exec right ip n a 192.168.100.1 dev hsr2 lladdr \ fc:00:00:00:00:02 nud permanent ip netns exec right ip n r 192.168.100.1 dev hsr2 lladdr \ fc:00:00:00:00:02 nud permanent for i in {1..100} do ip netns exec right ping 192.168.100.1 & done ip netns exec right hping3 192.168.100.1 -2 --flood & while : do ip link add hsr0 type hsr slave1 veth0 slave2 veth2 ip a a 192.168.100.1/24 dev hsr0 ip link set hsr0 up ip link del hsr0 done Splat looks like: [ 120.954938][ C0] general protection fault, probably for non-canonical address 0xdffffc0000000006: 0000 [thesofproject#1]I [ 120.957761][ C0] KASAN: null-ptr-deref in range [0x0000000000000030-0x0000000000000037] [ 120.959064][ C0] CPU: 0 PID: 1511 Comm: hping3 Not tainted 5.6.0-rc5+ thesofproject#460 [ 120.960054][ C0] Hardware name: innotek GmbH VirtualBox/VirtualBox, BIOS VirtualBox 12/01/2006 [ 120.962261][ C0] RIP: 0010:hsr_addr_is_self+0x65/0x2a0 [hsr] [ 120.963149][ C0] Code: 44 24 18 70 73 2f c0 48 c1 eb 03 48 8d 04 13 c7 00 f1 f1 f1 f1 c7 40 04 00 f2 f2 f2 4 [ 120.966277][ C0] RSP: 0018:ffff8880d9c09af0 EFLAGS: 00010206 [ 120.967293][ C0] RAX: 0000000000000006 RBX: 1ffff1101b38135f RCX: 0000000000000000 [ 120.968516][ C0] RDX: dffffc0000000000 RSI: ffff8880d17cb208 RDI: 0000000000000000 [ 120.969718][ C0] RBP: 0000000000000030 R08: ffffed101b3c0e3c R09: 0000000000000001 [ 120.972203][ C0] R10: 0000000000000001 R11: ffffed101b3c0e3b R12: 0000000000000000 [ 120.973379][ C0] R13: ffff8880aaf80100 R14: ffff8880aaf800f2 R15: ffff8880aaf80040 [ 120.974410][ C0] FS: 00007f58e693f740(0000) GS:ffff8880d9c00000(0000) knlGS:0000000000000000 [ 120.979794][ C0] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 120.980773][ C0] CR2: 00007ffcb8b38f29 CR3: 00000000afe8e001 CR4: 00000000000606f0 [ 120.981945][ C0] Call Trace: [ 120.982411][ C0] <IRQ> [ 120.982848][ C0] ? hsr_add_node+0x8c0/0x8c0 [hsr] [ 120.983522][ C0] ? rcu_read_lock_held+0x90/0xa0 [ 120.984159][ C0] ? rcu_read_lock_sched_held+0xc0/0xc0 [ 120.984944][ C0] hsr_handle_frame+0x1db/0x4e0 [hsr] [ 120.985597][ C0] ? hsr_nl_nodedown+0x2b0/0x2b0 [hsr] [ 120.986289][ C0] __netif_receive_skb_core+0x6bf/0x3170 [ 120.992513][ C0] ? check_chain_key+0x236/0x5d0 [ 120.993223][ C0] ? do_xdp_generic+0x1460/0x1460 [ 120.993875][ C0] ? register_lock_class+0x14d0/0x14d0 [ 120.994609][ C0] ? __netif_receive_skb_one_core+0x8d/0x160 [ 120.995377][ C0] __netif_receive_skb_one_core+0x8d/0x160 [ 120.996204][ C0] ? __netif_receive_skb_core+0x3170/0x3170 [ ... ] Reported-by: syzbot+fcf5dd39282ceb27108d@syzkaller.appspotmail.com Fixes: c5a7591 ("net/hsr: Use list_head (and rcu) instead of array for slave devices.") Signed-off-by: Taehee Yoo <ap420073@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
CTX_SAVE ipc is sent right before the DSP is powered off.
Returning an error if this ipc fails causes the resume
sequence to fail as well. Instead just report the error
and continue to power off the DSP. This is only a
temporary workaround until the real issue with
CTX_SAVE ipc's timing out is root caused.
Signed-off-by: Ranjani Sridharan ranjani.sridharan@linux.intel.com