-
Notifications
You must be signed in to change notification settings - Fork 49
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable cavstool #897
Enable cavstool #897
Conversation
test-case/check-sof-logger.sh
Outdated
# several other places. | ||
stdout_files=(data) | ||
stderr_files=(error) | ||
dlogi "Stop using sof-logger on Zephyr platforms." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cavstool.py
has already been working at the same time than sof-logger for months. Let's start by using both first.
Change line 96 and simply call cavstool.py
on the etrace.
case-lib/lib.sh
Outdated
[[ "${OPT_VAL['z']}" -eq 0 ]] && return 0 | ||
|
||
[ "x$CAVSTOOL" == 'xnone' ] && { | ||
dlogi 'CAVSTOOL logs collection globally disabled by CAVSTOOL=none' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Stick to SOF_LOGGING, no need for yet another parameter.
case-lib/lib.sh
Outdated
|
||
test -x "$CAVSTOOL" || { | ||
dlogw "$CAVSTOOL not found or not executable" | ||
return 2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
2 means SKIP. Did you test this?
case-lib/lib.sh
Outdated
ZPH_LOG_COLLECT=1 | ||
else | ||
>&2 dlogw "without sudo permission to run $CAVSTOOL command" | ||
return 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're returning 3 different error codes but the callers never look at them. I don't think you ever tested all this error handling code; what problem does it solve? sudo, the shell and other system utilities already provide error messages for free, what was wrong with them?
Every line of code you submit should have been run at least once. Write the code you actually need, not the code you think you need.
test-case/check-alsabat.sh
Outdated
@@ -42,6 +42,9 @@ OPT_HAS_ARG['n']=1 OPT_VAL['n']=240000 | |||
OPT_NAME['s']='sof-logger' OPT_DESC['s']="Open sof-logger trace the data will store at $LOG_ROOT" | |||
OPT_HAS_ARG['s']=0 OPT_VAL['s']=1 | |||
|
|||
OPT_NAME['z']='cavstool' OPT_DESC['z']="Zephyr native logging tool, the data will store at $LOG_ROOT" | |||
OPT_HAS_ARG['z']=0 OPT_VAL['z']=1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need to make this configurable, do we have multiple names to support? Don't submit code that we don't need yet.
test-case/check-alsabat.sh
Outdated
if is_zephyr; then | ||
cavstool_disabled || func_cavstool_log_collect | ||
else | ||
logger_disabled || func_lib_start_log_collect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_zephyr
must be kept _inside the logger_disabled
and func_lib_start_log_collect
abstractions to avoid so much copy/paste. Every time you do some repetitive copy/paste/diverge, an alarm should ring.
Don't pollute every single test with logging implementation details.
83d9a71
to
848b608
Compare
@marc-hb @aiChaoSONG |
This is becoming required because Zephyr logging is adding another layer of complexity, See for instance thesofproject#897 work in progress. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I forgot how difficult this code was to understand. I'm adding some comments in #906, please help review.
This looks a good first step, thanks!
Actually, I think this is a good SECOND step: could you try to add the same feature to check-sof-logger.sh first in a separate PR? So if we find issues with cavstool.py then it will impact only the check-sof-logger.sh test whereas this PR is affecting all tests. If the new check-sof-logger.sh works fine for a couple days then we can merge this.
case-lib/hijack.sh
Outdated
dloge "cavstool.py was already dead" | ||
exit_status=1 | ||
} | ||
fi |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This file does not use TABS, that's why the indentation looks wrong here on Github.
I'm really sorry that we use TABS in some projects and no TABS in other projects like sof-test. This means we must use editors that can quickly change from TABS mode to no TABS mode.
For Emacs I use this:
(defun tabs-toggle-myfunc()
(interactive)
(setq indent-tabs-mode (not indent-tabs-mode)))
case-lib/hijack.sh
Outdated
fi | ||
for logtool in "$cavstoolBin" "$loggerBin" | ||
do | ||
if pgrep -f -n "$logtool"; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why add -n
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For example, when running cavstool.py, it will trigger 2 processes in process list.
root 16635 0.0 0.0 13856 4748 pts/0 S+ 08:34 0:00 sudo cavstool.py -l
root 16636 0.4 0.2 28744 16708 pts/0 S+ 08:34 0:01 python3 /usr/local/bin/cavstool.py -l
With -f
option, it will list all 2 processes
pgrep -f cavstool.py
16635
16636
But it only list the child process with -n
pgrep -f -n cavstool.py
16636
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to be more selective then relying on the "newest" feels a bit brittle...
But why be more selective at all here? Both processes must be dead at this point in time. If any is left then something is wrong and the test should fail.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That makes sense, remove -n
.
case-lib/lib.sh
Outdated
@@ -175,8 +198,14 @@ func_lib_start_log_collect() | |||
if [ "X$is_etrace" == "X0" ];then | |||
logfile=$LOG_ROOT/slogger.txt | |||
else | |||
logfile=$LOG_ROOT/etrace.txt | |||
logopt="" | |||
if is_zephyr; then |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same TABS problem.
This is becoming required because Zephyr logging is adding another layer of complexity, See for instance #897 work in progress. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb , sorry for late reply. I tried to enable the cavstool.py in check-sof-logger.sh to dump the etrace on zephyr platforms, but I have 2 issues to solve.
If only redirect stderr, the output file is not empty:
@aiChaoSONG , can we change the error level for above logs?
|
These I think these logs are from zephyr system, not the ones we can control |
I just found by chance that |
848b608
to
db0ca17
Compare
@marc-hb , any comments on this? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just found by chance that cavstool.py disables power management (forcing /sys/devices/pci0000:00/0000:00:0e.0/power/control). So we can't use it unconditionally yet. I'm looking into that right now.
Have you looked at this? This is very important. I was busy with the Zephyr conference, will look at this next.
case-lib/lib.sh
Outdated
|
||
logfile=$LOG_ROOT/cavstool.txt | ||
|
||
local cavstoolCmd=("$CAVSTOOL" "$logopt" '&>' "$logfile") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For here avoid this array and quoting complexity thanks to sudo tee
:
sudo cavstool arguments |& sudo tee "$logfile" > /dev/null
I will see what can be done for the check-sof-logger
test, thanks for highlighting this issue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sort of "fixed" for check-sof-logger in #929
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Blocking this until cavstool is compatible with D3. Working on it.
@marc-hb please share more information about this, I tried cavstool on APL and TGL UP extreme board. Power control always on when running cavstool.py. |
Exactly. |
cavstool.py has unrelated stability issues slowing or blocking progress: zephyrproject-rtos/zephyr#46864 There are tentative fixes. |
cavstool support for D3 submitted in: |
The immediate next step will be to use it in every test and collect Zephyr logs, see PR thesofproject#897 Signed-off-by: Marc Herbert <marc.herbert@intel.com>
fd8d0b7
to
ac0421e
Compare
Except for the TIMEOUT of suspend/resume on WHL, all other failures are related to the SOF issue thesofproject/sof#6020. Also thanks to thesofproject/sof#6020, it helped to prove that this PR really works. |
The immediate next step will be to use it in every test and collect Zephyr logs, see PR #897 Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb , the PR testing results are not good on Zephyr platforms (EDIT: https://sof-ci.01.org/softestpr/PR897/build103/devicetest/). It seems that starting BOTH sof-logger DMA trace AND cavstool for the whole test duration will introduce new IPC errors/timeouts. I've confirmed that |
Wow, fascinating. So let's do this: please file a new bug, something "Long-running cavstool cause random failures". Then insert the bug number at the FIXME location below and let's merge this sof-test PR with the new feature commented out for now. This will help anyone trying to fix this new bug: now they will have a single line to change in sof-test. |
case-lib/hijack.sh
Outdated
@@ -54,9 +54,18 @@ function func_exit_handler() | |||
# shellcheck disable=SC2154 | |||
logfile="$logfile" | |||
|
|||
local loggerBin wcLog; loggerBin=$(basename "$SOFLOGGER") | |||
# We need this to avoid the confusion of a "Terminated" message | |||
local loggerBin wcLog; loggerBin=$(basename "$SOFLOGGER"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Spurious edit
local loggerBin wcLog; loggerBin=$(basename "$SOFLOGGER"); | |
local loggerBin wcLog; loggerBin=$(basename "$SOFLOGGER") |
case-lib/lib.sh
Outdated
@@ -293,6 +293,29 @@ find_ldc_file() | |||
printf '%s' "$ldcFile" | |||
} | |||
|
|||
func_cavstool_etrace_collect() | |||
{ | |||
local logopt="-l" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ping
case-lib/lib.sh
Outdated
# start cavstool in the background to collect the etrace. | ||
# Since cavstool can follow a ring buffer in etrace, so it should be | ||
# started at the start of the test and we would not miss any Zephyr logs. | ||
is_zephyr && func_cavstool_etrace_collect |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
is_zephyr && func_cavstool_etrace_collect | |
# FIXME (new) sof bug #1234 | |
# is_zephyr && func_cavstool_etrace_collect |
case-lib/lib.sh
Outdated
local cavstoolCmd=("$CAVSTOOL" "$logopt") | ||
dlogi "Starting ${cavstoolCmd[*]}" | ||
# Cleaned up by func_exit_handler() in hijack.sh | ||
sudo "${cavstoolCmd[@]}" > "$clogfile" & |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sudo "${cavstoolCmd[@]}" > "$clogfile" & | |
# shellcheck disable=SC2024 | |
sudo "${cavstoolCmd[@]}" >& "$clogfile" & |
sudo "${cavstoolCmd[@]}" > "$clogfile" &
^-- SC2024: sudo doesn't affect redirects. Use ..| sudo tee file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You must use >&
because we really want BOTH
case-lib/lib.sh
Outdated
func_cavstool_etrace_collect() | ||
{ | ||
local logopt="-l" | ||
local clogfile=$LOG_ROOT/cavstool.txt |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local clogfile=$LOG_ROOT/cavstool.txt | |
local clogfile=$LOG_ROOT/etrace.txt |
Please keep the same name for consistency. I'm working on another PR that scans for ERRORs.
https://sof-ci.01.org/softestpr/PR897/build107/devicetest had a lot of thesofproject/sof#6039 failures Looks like adding cavstool to the sof-logger test ONLY is already causing some trouble: thesofproject/sof#6042 |
cavstool.py is a Zephyr native logging tool, this tool will be used to dump the etrace from the shared memory. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
we will use the Zephyr logging tool to dump the etrace form the shared memory for all Zephyr platforms. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
we need to kill the cavstool.py process after the test to prevent the conflicts between each case. Signed-off-by: Keqiao Zhang <keqiao.zhang@intel.com>
17b9ec7
to
d11a6b5
Compare
I have filed a SOF issue:thesofproject/sof#6039 Since the latest cavstool.py supports verbose log, so I enabled it accordingly |
} | ||
fi | ||
|
||
local cavstoolCmd=("$CAVSTOOL" "$clogopt") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
local cavstoolCmd=("$CAVSTOOL" "$clogopt") | |
local cavstoolCmd=("$CAVSTOOL" --log-only --verbose) |
Otherwise this tries to load a firmware file named '--log-only --verbose'
I just wasted a couple of hours on this... already asked to inline this pointless variable before (when it was a single option and working)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK, I realize only I see this because I disable the sof-test sudo
function, I'm using the regular sudo
command.
Should still be fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please help review and merge this sudo()
fix first:
SOFCI TEST EDIT: newer https://sof-ci.01.org/softestpr/PR897/build108/devicetest/ seems to still fail the same |
Overriden commands should work exactly like real commands and support whitespace and special characters exactly like the real command. I usually disable the sudo() override locally when using `set -x` to reduce trace noise considerably. The previous difference between the sudo() override and the built-in sudo() cost me couple hours twice: - WIP thesofproject#897 (comment) - Merged commit 232d40a No more! Also fix the return value: 2 means SKIP. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Overriden commands should work exactly like real commands and support whitespace and special characters exactly like the real command. I usually disable the sudo() override locally when using `set -x` to reduce trace noise considerably. The previous difference between the sudo() override and the built-in sudo() cost me couple hours twice: - WIP thesofproject#897 (comment) - Merged commit 232d40a No more! Also fix the return value: 2 means SKIP. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
Overriden commands should work exactly like real commands and support whitespace and special characters exactly like the real command. I usually disable the sudo() override locally when using `set -x` to reduce trace noise considerably. The previous difference between the sudo() override and the built-in sudo() cost me couple hours twice: - WIP #897 (comment) - Merged commit 232d40a No more! Also fix the return value: 2 means SKIP. Signed-off-by: Marc Herbert <marc.herbert@intel.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I just realized the $logfile is a complete mess that must be fixed before this PR in a different and earlier PR
# it again. We only need to start sof-logger to collect the etrace logs | ||
# only for the non-zephyr platforms at the end of a test. | ||
is_zephyr && return 0 | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The $logfile
variable in this function is a global variable passed to lib.sh and I just realized it is a complete mess that must be fixed before this PR
I'm sorry for this wrong advice. @kv2019i explained me (and you?) that cavstool is "spamming" the SRAM which can have bad interactions with the firmware - as you found in test results above and filed in thesofproject/sof#6039. So we're back to having just a quick cavstool look at the end of the test, really sorry for the back and forth. |
Excuse me, @marc-hb , is this caused by the modified cavstool.py (client-server-base)? I am working on another PR to split the pure cavstool.py (original one) and the client-server logic. Does this help? Thanks. |
No, it's a warning that any castool.py is "spamming" the SRAM which is causing issues. |
It should only be polling 4 words every 100ms. Obviously if there's a ton of data to be copied out, that's going to have impact. But it's intended to be a reasonably good citizen. Are there existing interactions it's causing? We should look at those, I suspect. |
@andyross wrote:
Ack, thanks Andy. I actually assumed it was polling more frequently (and gave the above feedback). So yeah, this seems more like a bug (and probably not in cavstool.py). Given the problems only occur in large test campaigns, this is not so easy to debug. We do have other logging backends coming available very soon (zephyrproject-rtos/zephyr#47058 and thesofproject/sof#6072 ), I'd like to focus energy on getting these working first. Both mtrace (SRAM) and probes (DMA/DDR) backends provide capability to notify the host (with a IPC/DMA irq) when new data is available, eliminating the need for any polling of data by host. Given the possible people who could debug the rare failures with cavstool.py with SOF test campaigns are the same people working on the new backends, I'd prefer to complete the new backends first and prioritize enabling these in the SOF CI. |
So may be the bug WAS in cavstool.py after all? Just fixed by @ujfalusi ? |
Closed in the weekly CI meeting because IPC3+Zephyr will never be supported in production. Zephyr will be IPC4 only which means it will always have mtrace support. |
No description provided.