Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Check for a valid trace for ALL tests #297

Closed
plbossart opened this issue Jul 28, 2020 · 17 comments · Fixed by #662
Closed

Check for a valid trace for ALL tests #297

plbossart opened this issue Jul 28, 2020 · 17 comments · Fixed by #662
Assignees
Labels
area:logs Log and results collection, storage, etc. False Pass / green failure P1 Blocker bugs or important features state:blocked type:enhancement New framework feature or request

Comments

@plbossart
Copy link
Member

plbossart commented Jul 28, 2020

We need to double-check that the trace is functional and that sof-logger reports at least the firmware details and a dai trigger.

If there is no trace then there's no point in checking results and debugging further.

cc:

  • many others, follow the links.
@marc-hb
Copy link
Collaborator

marc-hb commented Sep 11, 2020

First part submitted in #373, please help review.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 15, 2020

From @xiulipan in #373

it that behavior right? Do we need to fail the case if we could not get the logger? As we have standalone test case for logger.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 15, 2020

Do we need to fail the case if we could not get the logger?

I think what you're asking is: "is the logger a low priority feature"? I read the description of this request as a pretty clear "no" but I'd like to hear more from others.

As we have standalone test case for logger.

It's risky to rely on a single test, if the test is broken then the feature can be broken and no one will notice. Even if the test is not broken, the logger could be broken only in some scenarios not tested by the logger test. I think the links above are some real-world examples of either.

A test is never meant to test only one thing, the more extras each test can test (in no extra time) and the better. Unexpected failures mean extra bugs are found which is great.

These questions sounds strange to me at a time when the logger seems to be working fine, I mean all tests in #373 pass right now. I mean it sounds very strange to worry about failures we don't even have right now (this reminds me of recent #372) It almost sounds like like a desire to find FEWER bugs / and have more "green failures"?

@xiulipan
Copy link
Contributor

@marc-hb What I want to make clarify about is the criterion for our test case pass/fail.
Should we make aplay test fail if a debug feature is not working?

A test is never meant to test only one thing, the more extras each test can test (in no extra time) and the better.

I do not think we are adding test cases in this way. Each test case in this repo is focusing in a single feature. This will help us to narrow down the issue.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 15, 2020

I do not think we are adding test cases in this way. Each test case in this repo is focusing in a single feature. This will help us to narrow down the issue.

This is :

  • neither possible because sof-test are not unit tests. They all run from user space so each test involves many lines of code
  • nor desirable unless you want to find fewer bugs - as it happened in the links above and elsewhere.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 18, 2020

I mean all tests in #373 pass right now.

They didn't pass, they only appear to pass because I misused exit_failure=2 in #373 where 2 means "Not Applicable" and Not Applicable is green. I didn't know 2 was a special value.

I mean it sounds very strange to worry about [logger] failures we don't even have right now

Now we have one: pretty much all tests involving the logger fail to kill -INT it at the end, see new bug https://github.com/thesofproject/sof/issues/3433 [BUG] sof-logger resisted pkill -INT, using -KILL

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 18, 2020

multipipeline tests returning an empty trace: #380 (comment)

marc-hb added a commit to marc-hb/sof-test that referenced this issue Sep 18, 2020
Does not completely fixes thesofproject#297 but goes a long way.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof-test that referenced this issue Sep 18, 2020
Does not completely fixes thesofproject#297 but goes a long way.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator

marc-hb commented Sep 18, 2020

multiple-pipeline-capture.sh and multiple-pipeline-playback.sh have a consistently empty trace on about half the platforms: https://sof-ci.01.org/softestpr/PR383/build194/devicetest/

2020-09-18 05:13:41 UTC [REMOTE_INFO] nlines=1 /home/ubuntu/sof-test/logs/multiple-pipeline-playback/2020-09-18-05:13:29-16938/slogger.txt
2020-09-18 05:13:41 UTC [REMOTE_ERROR] Empty logger trace
2020-09-18 05:13:41 UTC [REMOTE_INFO] Test Result: FAIL!

@aiChaoSONG does this seem related to your #359 PR?

marc-hb added a commit to marc-hb/sof-test that referenced this issue Sep 18, 2020
Does not completely fixes thesofproject#297 but goes a long way.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@aiChaoSONG
Copy link

aiChaoSONG commented Sep 18, 2020

@marc-hb Yes, they are related. without #359, these two case are actually testing nothing, and always pass. It is the sof-logger issue which makes the multiple-pipeline-playback/capture failed after your #373 patch

marc-hb added a commit to marc-hb/sof-test that referenced this issue Sep 19, 2020
... because it makes no sense.

When a test using the logger fails it produces a confusing error
message like this:

2020-09-18 10:17:25 UTC [REMOTE_INFO] Starting /usr/bin/sof-logger
   -l /etc/sof/sof-byt.ldc -o
    /home/ubuntu/sof-test/logs/check-alsabat/<date>/etrace.txt
error: in logger_read(), fread(..., /sys/kernel/debug/sof/etrace)
    failed:  Inappropriate ioctl for device(25)

Reported by Pierre in #384

This mistake seems to have been there since the dawn of time however no
one noticed because most things logger-related (and others) have been
silenced so far - which is changing now with thesofproject#297.

The problem this fixes can be reproduced trivially with this one-line
patch:

--- a/case-lib/lib.sh
+++ b/case-lib/lib.sh
@@ -125,6 +125,7 @@ func_lib_start_log_collect()
     dlogi "Starting $loggerCmd"
     # Cleaned up by func_exit_handler() in hijack.sh
     sudo "$loggerCmd" &
+    exit 1
 }

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit that referenced this issue Apr 23, 2021
Does not completely fixes #297 but goes a long way.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to marc-hb/sof-test that referenced this issue Jun 9, 2021
Probably the main change is fixing the huge etrace test gaps thesofproject#321 and
thesofproject/sof#3281

Also fixes DMA trace gaps thesofproject#297 and thesofproject#298

I initial tried to preserve some of the existing code but it was just
too bad. PR thesofproject#161 / commit 7274f49 seemed especially bad:

- It tried to ignore a specific `ll drift` error but instead it filtered
out almost every log statement out of... stderr, that does not have show
log statements!! (Just for the record this `ll drift` error has been
downgraded to warning now, see
thesofproject/sof#2686 and
thesofproject/sof#3854)

- That same commit also added code that merely starts the DMA trace with
"there is an error below" (without failing the test) but that's eclipsed
by the entire log that follows. Later, the firmware started printing
ERROR every single time when the ERROR FW ABI prefix was introduced yet
no one ever noticed which proves how useless this prefix is was.

So remove this DMA trace prefix as the purpose of this test is - as
clearly stated in thesofproject#167 - not to find firmware errors but errors with the
sof-logger itself (even though we never had anything looking at firmware
errors so far)

Don't grep for "error" on stderr: anything on stderr is a logger
failure (not a firmware failure).

Don't require whitespace before the TIMESTAMP header.

Add set -e.

Use shell functions.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit that referenced this issue Jun 10, 2021
Probably the main change is fixing the huge etrace test gaps #321 and
thesofproject/sof#3281

Also fixes DMA trace gaps #297 and #298

I initial tried to preserve some of the existing code but it was just
too bad. PR #161 / commit 7274f49 seemed especially bad:

- It tried to ignore a specific `ll drift` error but instead it filtered
out almost every log statement out of... stderr, that does not have show
log statements!! (Just for the record this `ll drift` error has been
downgraded to warning now, see
thesofproject/sof#2686 and
thesofproject/sof#3854)

- That same commit also added code that merely starts the DMA trace with
"there is an error below" (without failing the test) but that's eclipsed
by the entire log that follows. Later, the firmware started printing
ERROR every single time when the ERROR FW ABI prefix was introduced yet
no one ever noticed which proves how useless this prefix is was.

So remove this DMA trace prefix as the purpose of this test is - as
clearly stated in #167 - not to find firmware errors but errors with the
sof-logger itself (even though we never had anything looking at firmware
errors so far)

Don't grep for "error" on stderr: anything on stderr is a logger
failure (not a firmware failure).

Don't require whitespace before the TIMESTAMP header.

Add set -e.

Use shell functions.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator

marc-hb commented Jun 12, 2021

After finally testing this a bit (#666) and performing some overdue "data-mining" in test results it's very clear the DMA trace has been very unreliable since forever: Empty DMA trace thesofproject/sof/issues/4333. On some platforms much more than others.

@marc-hb marc-hb added area:logs Log and results collection, storage, etc. False Pass / green failure labels Jul 3, 2021
kv2019i added a commit to kv2019i/sof-docs that referenced this issue Nov 10, 2021
If sof-logger is started (or restarted) while DSP is running,
the initial traces may come in incorrect order or are incomplete.
This is important to note when parsing the logger results.

BugLink: thesofproject/sof-test#297
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
kv2019i added a commit to kv2019i/sof-docs that referenced this issue Nov 12, 2021
If sof-logger is started (or restarted) while DSP is running,
the initial traces may come in incorrect order or are incomplete.
This is important to note when parsing the logger results.

BugLink: thesofproject/sof-test#297
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
kv2019i added a commit to kv2019i/sof-docs that referenced this issue Nov 15, 2021
If sof-logger is started (or restarted) while DSP is running, the
initial traces may be incomplete. Document the limitation and give a
brief explanation of the current ringbuffer design and how it affects
the start-up behaviour.

BugLink: thesofproject/sof-test#297
BugLink: thesofproject/linux#3275
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
lgirdwood pushed a commit to thesofproject/sof-docs that referenced this issue Nov 15, 2021
If sof-logger is started (or restarted) while DSP is running, the
initial traces may be incomplete. Document the limitation and give a
brief explanation of the current ringbuffer design and how it affects
the start-up behaviour.

BugLink: thesofproject/sof-test#297
BugLink: thesofproject/linux#3275
Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
@mengdonglin
Copy link
Contributor

mengdonglin commented Mar 20, 2022

Low priority to P2 as SOF is going to use native Zephyr log implementation. @marc-hb @plbossart

@mengdonglin mengdonglin added P2 Critical bugs or normal features and removed P1 Blocker bugs or important features labels Mar 20, 2022
@marc-hb
Copy link
Collaborator

marc-hb commented Mar 29, 2022

I don't understand, Zephyr bugs can be solved without any logs?

@marc-hb marc-hb added P1 Blocker bugs or important features and removed P2 Critical bugs or normal features labels Mar 31, 2022
@marc-hb
Copy link
Collaborator

marc-hb commented Jul 26, 2022

... should finish this but it's currently blocking, triggering some failures.

@mengdonglin
Copy link
Contributor

mengdonglin commented Sep 21, 2022

Blocked by mtrace enabling atm.

@marc-hb
Copy link
Collaborator

marc-hb commented Sep 28, 2023

This has been finally fixed by @kv2019i:

And it does catch mtrace issues like thesofproject/linux#4618 and others.

The test works, closing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Log and results collection, storage, etc. False Pass / green failure P1 Blocker bugs or important features state:blocked type:enhancement New framework feature or request
Projects
None yet
5 participants