Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] error trace debugfs entry (etrace) is always empty #3265

Closed
keyonjie opened this issue Jul 31, 2020 · 17 comments · Fixed by #3278
Closed

[BUG] error trace debugfs entry (etrace) is always empty #3265

keyonjie opened this issue Jul 31, 2020 · 17 comments · Fixed by #3278
Assignees
Labels
bug Something isn't working as expected P1 Blocker bugs or important features

Comments

@keyonjie
Copy link
Contributor

The error trace debugfs entry (/sys/kernel/debug/sof/etrace) is always empty which lead to we can't get any error trace with sof-logger (sof-logger -l *.ldc), this is bad for debugging when DMA trace doesn't work because of FW hanged or crashed.

To Reproduce
Do anything can lead to FW reporting tr_err() or comp_err(), and check the "sof-logger -l *.ldc".

Reproduction Rate
100%

Expected behavior
We should see the error trace logs in sof-logger output.

Impact
This will make it difficult for FW debugging when FW is hang or crashed.

Environment
Any Intel platforms

@keyonjie keyonjie added bug Something isn't working as expected P1 Blocker bugs or important features labels Jul 31, 2020
@paulstelian97
Copy link
Collaborator

paulstelian97 commented Jul 31, 2020

Either this is a regression (I have used it on i.MX8 initially before adding the DMA trace support -- heck, before adding the host DMA in the first place) or something is different between Intel and NXP platforms.

Is there any known working commit a few months ago so you can start bisecting?

@plbossart
Copy link
Member

plbossart commented Jul 31, 2020

We should really have a first message to check that the trace works. An empty trace is not-testable, it can mean the trace is not working or there is no error message. An 'Hello SOF" seems required for CI/daily tests to check the trace functionality, no?

@mmaka1 @lgirdwood @xiulipan @marc-hb FYI

@marc-hb
Copy link
Collaborator

marc-hb commented Jul 31, 2020

@keyonjie does revert #3245 help?

We should really have a first message to check that the trace works. An empty trace is not-testable, it can mean the trace is not working or there is no error message. An 'Hello SOF" seems required for CI/daily tests to check the trace functionality, no?

thesofproject/sof-test#297 thesofproject/sof-test#298

@slawblauciak
Copy link
Collaborator

Issue not apparent when testing with the slim driver, maybe a kernel regression?

@dbaluta
Copy link
Collaborator

dbaluta commented Aug 4, 2020

Cc: @IulianOlaru249 @crazoes

I also noticed this last week but I thought it's just me being unable to use sof-logger properly.

@lyakh
Copy link
Collaborator

lyakh commented Aug 4, 2020

I hardly remember seeing error trace data in "etrace" ever. Looking through both kernel and firmware sources it seems like the kernel expects two memory window regions - SOF_IPC_REGION_DEBUG for debug and SOF_IPC_REGION_TRACE for errors, in the firmware e.g. on CAVS platforms they are initialised by EXT_MAN_PORT() in src/platform/intel/cavs/platform.c - so far so good. But I don't see the DEBUG region being used. Both _trace_error_with_ids() and _trace_event_with_ids() seem to translate to the same code in the end. I also checked one of daily tests on ICL from yesterday, where there is an "ERROR" entry in debug trace from hda_dma_link_check_xrun() where tr_err() is called, but "etrace" is empty. So, it seems like a global error to me.

@lgirdwood
Copy link
Member

We should really have a first message to check that the trace works. An empty trace is not-testable, it can mean the trace is not working or there is no error message. An 'Hello SOF" seems required for CI/daily tests to check the trace functionality, no?

@mmaka1 @lgirdwood @xiulipan @marc-hb FYI

Ack - "Hello SOF". @marc-hb do you fancy making this change, it should be a one liner for each platform in the FW ready function to add a trace_error call with the above string..

@xiulipan
Copy link
Contributor

xiulipan commented Aug 5, 2020

@lgirdwood I am OK with this!
@marc-hb And I think you can also make patches to sof-test scripts to make a valid test for error logger now.

@lyakh
Copy link
Collaborator

lyakh commented Aug 5, 2020

@xiulipan @akloniex so how is the error trace supposed to work and when did it work last time? I looked at v1.0 and v1.1 and even there I couldn't so far find that out.

@xiulipan
Copy link
Contributor

xiulipan commented Aug 6, 2020

@lyakh To understand how the trace work, we need to know we have 2 kinds of trace:

  1. error trace: a simple shared memory window between DSP and host (same as the IPC mailbox), which means this kind of trace can only be directly access when DSP in D0. To make it work with PM, we have some method to cache this filed into debugfs when enter suspend, but I think it is not prefect at now and will lose data in some case (tplg load fail message will lost becaise of unfinished multiple core support feature)
  2. DMA trace: something like a capture stream. We have set up buffer in debugfs to store data copied by DMA from DSP. As the buffer is limited size, so it can only hold limit size logs in the past. But if you open the sof-logger before the test, it can record all logs.

To understand more about error trace, we need first to know where is the buffer. Following code take APL as example:
We have about 8KB buffer for error log:

/* window 3 */
#define SRAM_TRACE_BASE SRAM_WND_BASE
#if CONFIG_TRACE
#define SRAM_TRACE_SIZE 0x2000
#else
#define SRAM_TRACE_SIZE 0
#endif

/* window 3 - trace */
#define MAILBOX_TRACE_SIZE SRAM_TRACE_SIZE
#define MAILBOX_TRACE_BASE SRAM_TRACE_BASE

Function for trace written to mailbox:

sof/src/trace/trace.c

Lines 68 to 97 in cf74566

#if CONFIG_TRACEM
static inline void mtrace_event(const char *data, uint32_t length)
{
struct trace *trace = trace_get();
volatile char *t;
uint32_t i, available;
available = MAILBOX_TRACE_SIZE - trace->pos;
t = (volatile char *)(MAILBOX_TRACE_BASE);
/* write until we run out of space */
for (i = 0; i < available && i < length; i++)
t[trace->pos + i] = data[i];
dcache_writeback_region((void *)&t[trace->pos], i);
trace->pos += length;
/* if there was more data than space available, wrap back */
if (length > available) {
for (i = 0; i < length - available; i++)
t[i] = data[available + i];
dcache_writeback_region((void *)t, i);
trace->pos = i;
}
platform_shared_commit(trace, sizeof(*trace));
}
#endif /* CONFIG_TRACEM */

I am also look into this issue, it seems we have some error in trace implement.

@lyakh
Copy link
Collaborator

lyakh commented Aug 6, 2020

@lyakh To understand how the trace work, we need to know we have 2 kinds of trace:
[snip]
I am also look into this issue, it seems we have some error in trace implement.

@xiulipan thanks for the explanation, that's also my impression, that the current implementation isn't right and I don't understand whether and when and how it ever worked in the past.

@xiulipan
Copy link
Contributor

xiulipan commented Aug 6, 2020

@lyakh I did some debug and test, root cause to be we never copy anything into mailbox even if the trace is a ERROR. But all the functionality for MBOX trace is still OK. Have a simple fix: #3265

@xiulipan
Copy link
Contributor

xiulipan commented Aug 6, 2020

@plbossart @lgirdwood Now all trace is refined to link with comp ctx and uuid, I found it is not that easy to just add one line Hello SOF directly into our code.
Any idea here? Which comp we can used here to let the hello message linked to?

@paulstelian97
Copy link
Collaborator

paulstelian97 commented Aug 6, 2020

@plbossart @lgirdwood Now all trace is refined to link with comp ctx and uuid, I found it is not that easy to just add one line Hello SOF directly into our code.
Any idea here? Which comp we can used here to let the hello message linked to?

I typically use IPC for such messages in my internal tests that don't have an obvious component. But any component that is obvious and already included/visible in the current context is good.

@dbaluta
Copy link
Collaborator

dbaluta commented Aug 6, 2020

@xiulipan thanks a lot for your information about traces. This is the kind of information we need to write it in sof-docs.

@lyakh
Copy link
Collaborator

lyakh commented Aug 6, 2020

@lyakh I did some debug and test, root cause to be we never copy anything into mailbox even if the trace is a ERROR. But all the functionality for MBOX trace is still OK. Have a simple fix: #3278

@xiulipan nice, just as I said - we never actually copy the data to the mailbox, no idea how it has been tested with the slim driver... Wondering whether it ever worked and since when it broke down.

@xiulipan
Copy link
Contributor

xiulipan commented Aug 7, 2020

@lyakh I tried to figure out why and when the trace is broken, but no luck because of the refinement
@dbaluta Yes, I think we may missed this in sof-docs. Would you like to review my comments above and add them into doc?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working as expected P1 Blocker bugs or important features
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants