Failure to start logger does not fail immediately, only at the end of the test #506

ranj063 · 2020-11-09T21:25:45Z

If the logger fails to start for some reason, the test result is reported as a FAIL. In this case, the test should report the correct status and warn that logger output is unavailable. For example, I have an error with starting logger as follows:
"error: src hash value from version file (0x35cdfa4f) differ from src hash version saved in dictionary (0xfb141ec0)."

But the test, check-playback.sh, runs through all the loops and PCM devices and results in a FAIL at the end when the test itself was actually a PASS.

2020-11-09 21:21:23 UTC [INFO] ===== Testing: (Round: 1/1) (PCM: Speakers [hw:0,5]) (Loop: 3/3) =====
2020-11-09 21:21:23 UTC [COMMAND] aplay -Dhw:0,5 -r 48000 -c 2 -f S16_LE -d 10 /dev/zero -v -q
Hardware PCM card 0 'sof-cml_rt1011_rt5682' device 5 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 16384
period_size : 4096
period_time : 85333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 4096
period_event : 0
start_threshold : 16384
stop_threshold : 16384
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
2020-11-09 21:21:33 UTC [INFO] pkill -TERM sof-logger
2020-11-09 21:21:33 UTC [ERROR] sof-logger was already dead
2020-11-09 21:21:34 UTC [INFO] nlines=1 /root/sof-test/logs/check-playback/2020-11-09-13:19:01-2579/slogger.txt
2020-11-09 21:21:34 UTC [INFO] Test Result: FAIL!

marc-hb · 2020-11-10T01:26:44Z

@ranj063 yes a logger failure is a test failure and should be reported as a FAIL. See previous discussions in #297 and links from there. @plbossart confirm?

Many tests including check-playback.sh have an option not to start the logger and then the test will just PASS. See check-playback.sh -h

Neither our test suite, nor Jenkins, nor github have any concept of any intermediate "warning" level, it's either green or red. Nothing can do yellow.

The logger is started in the background that's why the logger failure is collected only at the end of the test. We could try to detect a logger failure earlier but I think we have much more confusing situations to enhance and higher priorities.

"error: src hash value from version file (0x35cdfa4f) differ from src hash version saved in dictionary (0xfb141ec0)."

@ktrzcinx doesn't think using a firmware without the corresponding .ldc file makes sense:
thesofproject/sof#3195 (comment)

Nevertheless, he generously added an option to ignore the checksum mismatch, see sof-logger -h

ranj063 · 2020-11-10T01:30:07Z

@marc-hb the problem is with the consistency and the check for logger return value in case of errors.

if not starting sof-logger is an error, why run the test and then report the failure? See my case above, the test loops 3 all the PCM devces and iterations before saying the test failed because the logger never started.

marc-hb · 2020-11-10T02:14:25Z

if not starting sof-logger is an error, why run the test and then report the failure?

Because the logger is started in the background, see my answer above. We could try to detect this earlier but.... see above.

ranj063 · 2020-11-10T05:07:27Z

if not starting sof-logger is an error, why run the test and then report the failure?

Because the logger is started in the background, see my answer above. We could try to detect this earlier but.... see above.

sure, there are other higher priority issues but if it is an error there is no need to run the test to report the error.

marc-hb · 2020-11-10T06:57:51Z

but if it is an error there is no need to run the test to report the error.

The main test process does not even know when a backgrounded process fails, not before it wait for its completion. Let's try this analogy in case it helps: you can not know the return value at pthread_create() time, you can tell only at pthread_join() / wait time.

The main process cannot wait for the completion of the logger before the end of the test because it does not want the logger to complete before the end of the test, that would defeat the purpose of the logger.

The main process could find creative ways to check from time to time how the logger is doing but that would be a fair amount of code for not much value.

ranj063 · 2020-11-10T17:39:56Z

The main process could find creative ways to check from time to time how the logger is doing but that would be a fair amount of code for not much value.

One simple creative way is to check for pre-requisites such as "is sof-logger functional?" before actually starting it in the background?

marc-hb · 2020-11-10T18:30:35Z

One simple creative way is to check for pre-requisites such as "is sof-logger functional?" before actually starting it in the background?

There are already a couple sanity checks in case-lib/lib.sh but nothing that compares the dictionary source hash yet. If you can find a short and simple way to perform this pre-check and add it there then why not.

I don't know your exact test configuration/duration but I feel like the time we spent discussing this source hash check is already bigger than the time it would save in the future. So you can guess my perspective on the time it would take to implement and test it :-) Granted: it could possibly catch other types of logger failures too...

PS: when I wrote "creative" what I had in mind was something else: "racy because a post-check after starting the logger".

aiChaoSONG · 2020-11-11T01:58:05Z

@ranj063 sof-logger was already dead this line means that when we are checking if sof-logger is alive, it is dead, or it is not started at all.

if it is already dead: means we do have issue with FW, because sof-logger is started in trace mode, it should be alive unless we kill it.
it is not started: you don't have sof-logger in your PATH, so it cannot started by sof-test, because test case cannot find your sof-logger.

I guess this issue should be in category 2, please run env-check.sh and see if there is any warnings

marc-hb · 2020-11-11T05:22:52Z

@aiChaoSONG if you read the entire description more slowly you will see that in this particular case we know exactly why the logger failed.

marc-hb · 2020-12-04T18:32:02Z

I changed the issue title in an attempt to reflect the evolution of this issue. Please correct me.

marc-hb · 2021-02-03T02:31:13Z

@ranj063 does this still matter to you? I'm afraid we have many more urgent things to tackle.

marc-hb changed the title ~~Failure to start logger is not a failed test result~~ Failure to start logger does not fail immediately, only at the end of the test Dec 4, 2020

xiulipan added type:discussion Open ended discussion topic type:enhancement New framework feature or request labels Jan 26, 2021

marc-hb closed this as completed Mar 5, 2021

marc-hb added False Pass / green failure area:logs Log and results collection, storage, etc. labels Jul 3, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Failure to start logger does not fail immediately, only at the end of the test #506

Failure to start logger does not fail immediately, only at the end of the test #506

ranj063 commented Nov 9, 2020

marc-hb commented Nov 10, 2020

ranj063 commented Nov 10, 2020

marc-hb commented Nov 10, 2020

ranj063 commented Nov 10, 2020

marc-hb commented Nov 10, 2020

ranj063 commented Nov 10, 2020

marc-hb commented Nov 10, 2020

aiChaoSONG commented Nov 11, 2020

marc-hb commented Nov 11, 2020

marc-hb commented Dec 4, 2020

marc-hb commented Feb 3, 2021

Failure to start logger does not fail immediately, only at the end of the test #506

Failure to start logger does not fail immediately, only at the end of the test #506

Comments

ranj063 commented Nov 9, 2020

marc-hb commented Nov 10, 2020

ranj063 commented Nov 10, 2020

marc-hb commented Nov 10, 2020

ranj063 commented Nov 10, 2020

marc-hb commented Nov 10, 2020

ranj063 commented Nov 10, 2020

marc-hb commented Nov 10, 2020

aiChaoSONG commented Nov 11, 2020

marc-hb commented Nov 11, 2020

marc-hb commented Dec 4, 2020

marc-hb commented Feb 3, 2021