could not access debugfs when DSP panic or IPC failed #233

Closed
xiulipan opened this issue Oct 31, 2018 · 11 comments · Fixed by #237
Labels
APL Applies to ApolloLake platform bug Something isn't working P1 Blocker bugs or important features

Comments

@xiulipan

When the DSP panics or an IPC fails, using sof-logger or rmbox produces this error in dmesg:

sof-audio sof-audio: error: debugFS failed to resume -13

If we try to access any debugfs entry other than trace, dmesg shows the same error and the terminal refuses to open the file:

sudo cat /sys/kernel/debug/sof/etrace
cat: /sys/kernel/debug/sof/etrace: Permission denied
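
For context, the -13 in the dmesg line matches errno EACCES, whose standard message is the "Permission denied" that cat prints. A small userspace check, assuming Linux errno numbering (describe_kernel_ret is a hypothetical helper name, not part of SOF):

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical helper: kernel functions return negative errno values,
 * so negate the value before asking libc for the message text.
 */
static const char *describe_kernel_ret(int kernel_ret)
{
	return strerror(-kernel_ret);
}
```

Calling describe_kernel_ret(-13) on Linux should return the same text that cat reports, which suggests the read error code is propagated to user space unchanged.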

Analysis:
trace and etrace use different read ops.
For trace we used sof_dfsentry_read

static const struct file_operations sof_dfs_fops = {
	.open = sof_dfsentry_open,
	.read = sof_dfsentry_read,
	.llseek = default_llseek,
};

The dmesg message above comes from

dev_err(sdev->dev, "error: debugFS failed to resume %d\n",

So my guess is that the open question is how pm_runtime_get_sync and pm_runtime_put behave when the DSP panics or an IPC fails.
We may need a fallback handler for this case.
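
A minimal userspace sketch of what such a fallback could look like. Everything here is an assumption for illustration: fake_dev, fake_pm_get, read_strict and read_fallback are hypothetical names, fake_pm_get only stands in for pm_runtime_get_sync, and this is not the change made in #237.

```c
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the device state; not the real SOF device struct. */
struct fake_dev {
	int dsp_alive;   /* 0 simulates a DSP panic or IPC failure */
	char cache[32];  /* last contents copied out of DSP memory */
};

/* Stand-in for pm_runtime_get_sync(): fails when the DSP cannot resume. */
static int fake_pm_get(const struct fake_dev *d)
{
	return d->dsp_alive ? 0 : -EACCES;
}

/* Copy the cached contents to the caller, truncated to len bytes. */
static long copy_cache(const struct fake_dev *d, char *buf, size_t len)
{
	size_t n = strlen(d->cache);

	if (n > len)
		n = len;
	memcpy(buf, d->cache, n);
	return (long)n;
}

/* Behaviour described in the report: a resume failure aborts the read. */
static long read_strict(const struct fake_dev *d, char *buf, size_t len)
{
	int ret = fake_pm_get(d);

	if (ret < 0)
		return ret;
	return copy_cache(d, buf, len);
}

/* Possible fallback: ignore the resume failure and serve possibly stale data. */
static long read_fallback(const struct fake_dev *d, char *buf, size_t len)
{
	(void)fake_pm_get(d); /* best effort; the result is deliberately ignored */
	return copy_cache(d, buf, len);
}
```

With dsp_alive set to 0, read_strict returns -EACCES while read_fallback still returns the cached bytes. The trade-off is that the reader can no longer tell fresh data from stale data.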

debugfs is critical for debugging when errors happen, but right now it does not work in exactly that situation.

@xiulipan
Author

@libinyang
Could you share the workaround to get etrace on UP2?

@xiulipan xiulipan added the bug Something isn't working label Oct 31, 2018
@mengdonglin mengdonglin added P1 Blocker bugs or important features APL Applies to ApolloLake platform labels Oct 31, 2018
@mengdonglin
Collaborator

@ranj063 This issue is observed when debugging thesofproject/sof#443 on UP2. But it should be a generic issue.

@keyonjie

When the DSP panics, reading the trace debugFS entries should still be possible (with stale values in the worst case). Let me try a fix for it.

@ranj063
Collaborator

ranj063 commented Oct 31, 2018

@Keyon agree with you. Let me know if you need help

@mengdonglin
Collaborator

mengdonglin commented Nov 1, 2018

@keyonjie please check if thesofproject/sof#518 could improve the logger health for your debugging.

@xiulipan
Author

xiulipan commented Nov 1, 2018

@keyonjie @ranj063
What about a tplg load failure?
That will also make resume fail, right?
So we may also want debugfs to stay readable, or to simply disable pm when an error happens.

@keyonjie

keyonjie commented Nov 1, 2018

@xiulipan why would a tplg load failure matter? We do not destroy the entry, so it should still be readable when resume fails.

@xiulipan
Author

xiulipan commented Nov 1, 2018

@keyonjie
The failed resume returns an error code to the kernel. Even though you have copied the data to the right place, the error code prevents the normal read or open.
#237 is the workaround we are using now.

I will close this issue when #237 is merged.

@markyang

Summary:
This issue can be reproduced on MinnowBoard when the DSP panics.

dmesg:

[   25.122512] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[   25.122512] igb 0000:03:00.0 enp3s0: igb: enp3s0 NIC Link is Up 100 Mbps Full Duplex, Flow Control: RX/TX
[   25.123050] IPv6: ADDRCONF(NETDEV_CHANGE): enp3s0: link becomes ready
[  189.178611] random: crng init done
[  189.178987] random: 7 urandom warning(s) missed due to ratelimiting
[  244.151059] systemd-journald[218]: File /var/log/journal/eadbe628db86481c8fbc460378113bb6/user-1000.journal corrupted or uncleanly shut down, renaming and replacing.
[12972.769703] sof-audio sof-audio: error: debugFS failed to resume -13
[13528.463143] perf: interrupt took too long (2549 > 2500), lowering kernel.perf_event_max_sample_rate to 78250
[13803.194880] sof-audio sof-audio: error: debugFS failed to resume -13

Test steps:
sudo sof-logger-2cd668c -l sof-apl.ldc-master-gcc-73475e3f
CORE LEVEL COMP_ID TIMESTAMP DELTA FILE_NAME CONTENT
sudo sof-logger-2cd668c -l sof-apl.ldc-master-gcc-73475e3f -t
CORE LEVEL COMP_ID TIMESTAMP DELTA FILE_NAME CONTENT

Test env:
sof master: 73475e3
sof tool: 2cd668c
kernel sof-dev: 165b34de
tplg: sof-byt-rt5651.tplg-2cd668c

Log:
dmesg-byt.log

@plbossart
Member

@ranj063 does pm_runtime work on MinnowBoard? If not, maybe we should remove this capability for now to unlock such blocking issues?

@plbossart
Member

plbossart commented Nov 14, 2018

Can we check if #237 fixes this issue?

Also, can I get clarity on MinnowBoard support for pm_runtime? The issue above mentions debugFS failing to resume, so things are not clear to me...

bardliao pushed a commit to bardliao/linux that referenced this issue Feb 19, 2025
When trying to mmap a trace instance buffer that is attached to
reserve_mem, it would crash:

 BUG: unable to handle page fault for address: ffffe97bd00025c8
 #PF: supervisor read access in kernel mode
 #PF: error_code(0x0000) - not-present page
 PGD 2862f3067 P4D 2862f3067 PUD 0
 Oops: Oops: 0000 [#1] PREEMPT_RT SMP PTI
 CPU: 4 UID: 0 PID: 981 Comm: mmap-rb Not tainted 6.14.0-rc2-test-00003-g7f1a5e3fbf9e-dirty thesofproject#233
 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2 04/01/2014
 RIP: 0010:validate_page_before_insert+0x5/0xb0
 Code: e2 01 89 d0 c3 cc cc cc cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 90 0f 1f 44 00 00 <48> 8b 46 08 a8 01 75 67 66 90 48 89 f0 8b 50 34 85 d2 74 76 48 89
 RSP: 0018:ffffb148c2f3f968 EFLAGS: 00010246
 RAX: ffff9fa5d3322000 RBX: ffff9fa5ccff9c08 RCX: 00000000b879ed29
 RDX: ffffe97bd00025c0 RSI: ffffe97bd00025c0 RDI: ffff9fa5ccff9c08
 RBP: ffffb148c2f3f9f0 R08: 0000000000000004 R09: 0000000000000004
 R10: 0000000000000000 R11: 0000000000000200 R12: 0000000000000000
 R13: 00007f16a18d5000 R14: ffff9fa5c48db6a8 R15: 0000000000000000
 FS:  00007f16a1b54740(0000) GS:ffff9fa73df00000(0000) knlGS:0000000000000000
 CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
 CR2: ffffe97bd00025c8 CR3: 00000001048c6006 CR4: 0000000000172ef0
 Call Trace:
  <TASK>
  ? __die_body.cold+0x19/0x1f
  ? __die+0x2e/0x40
  ? page_fault_oops+0x157/0x2b0
  ? search_module_extables+0x53/0x80
  ? validate_page_before_insert+0x5/0xb0
  ? kernelmode_fixup_or_oops.isra.0+0x5f/0x70
  ? __bad_area_nosemaphore+0x16e/0x1b0
  ? bad_area_nosemaphore+0x16/0x20
  ? do_kern_addr_fault+0x77/0x90
  ? exc_page_fault+0x22b/0x230
  ? asm_exc_page_fault+0x2b/0x30
  ? validate_page_before_insert+0x5/0xb0
  ? vm_insert_pages+0x151/0x400
  __rb_map_vma+0x21f/0x3f0
  ring_buffer_map+0x21b/0x2f0
  tracing_buffers_mmap+0x70/0xd0
  __mmap_region+0x6f0/0xbd0
  mmap_region+0x7f/0x130
  do_mmap+0x475/0x610
  vm_mmap_pgoff+0xf2/0x1d0
  ksys_mmap_pgoff+0x166/0x200
  __x64_sys_mmap+0x37/0x50
  x64_sys_call+0x1670/0x1d70
  do_syscall_64+0xbb/0x1d0
  entry_SYSCALL_64_after_hwframe+0x77/0x7f

The reason was that the code that maps the ring buffer pages to user space
has:

	page = virt_to_page((void *)cpu_buffer->subbuf_ids[s]);

And uses that in:

	vm_insert_pages(vma, vma->vm_start, pages, &nr_pages);

But virt_to_page() does not work with vmap()'d memory which is what the
persistent ring buffer has. It is rather trivial to allow this, but for
now just disable mmap() of instances that have their ring buffer from the
reserve_mem option.

If an mmap() is performed on a persistent buffer it will return -ENODEV
just like it would if the .mmap field wasn't defined in the
file_operations structure.

Cc: stable@vger.kernel.org
Cc: Masami Hiramatsu <mhiramat@kernel.org>
Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com>
Cc: Vincent Donnefort <vdonnefort@google.com>
Link: https://lore.kernel.org/20250214115547.0d7287d3@gandalf.local.home
Fixes: 9b7bdf6 ("tracing: Have trace_printk not use binary prints if boot buffer")
Signed-off-by: Steven Rostedt (Google) <rostedt@goodmis.org>