Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Failure to start logger does not fail immediately, only at the end of the test #506

Closed
ranj063 opened this issue Nov 9, 2020 · 11 comments
Closed
Labels
area:logs Log and results collection, storage, etc. False Pass / green failure type:discussion Open ended discussion topic type:enhancement New framework feature or request

Comments

@ranj063
Copy link
Contributor

ranj063 commented Nov 9, 2020

If the logger fails to start for some reason, the test result is reported as a FAIL. In this case, the test should report the correct status and warn that logger output is unavailable. For example, I have an error with starting logger as follows:
"error: src hash value from version file (0x35cdfa4f) differ from src hash version saved in dictionary (0xfb141ec0)."

But the test, check-playback.sh, runs through all the loops and PCM devices and results in a FAIL at the end when the test itself was actually a PASS.

2020-11-09 21:21:23 UTC [INFO] ===== Testing: (Round: 1/1) (PCM: Speakers [hw:0,5]) (Loop: 3/3) =====
2020-11-09 21:21:23 UTC [COMMAND] aplay -Dhw:0,5 -r 48000 -c 2 -f S16_LE -d 10 /dev/zero -v -q
Hardware PCM card 0 'sof-cml_rt1011_rt5682' device 5 subdevice 0
Its setup is:
stream : PLAYBACK
access : RW_INTERLEAVED
format : S16_LE
subformat : STD
channels : 2
rate : 48000
exact rate : 48000 (48000/1)
msbits : 16
buffer_size : 16384
period_size : 4096
period_time : 85333
tstamp_mode : NONE
tstamp_type : MONOTONIC
period_step : 1
avail_min : 4096
period_event : 0
start_threshold : 16384
stop_threshold : 16384
silence_threshold: 0
silence_size : 0
boundary : 4611686018427387904
appl_ptr : 0
hw_ptr : 0
2020-11-09 21:21:33 UTC [INFO] pkill -TERM sof-logger
2020-11-09 21:21:33 UTC [ERROR] sof-logger was already dead
2020-11-09 21:21:34 UTC [INFO] nlines=1 /root/sof-test/logs/check-playback/2020-11-09-13:19:01-2579/slogger.txt
2020-11-09 21:21:34 UTC [INFO] Test Result: FAIL!

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 10, 2020

@ranj063 yes a logger failure is a test failure and should be reported as a FAIL. See previous discussions in #297 and links from there. @plbossart confirm?

Many tests including check-playback.sh have an option not to start the logger and then the test will just PASS. See check-playback.sh -h

Neither our test suite, nor Jenkins, nor github have any concept of any intermediate "warning" level, it's either green or red. Nothing can do yellow.

The logger is started in the background that's why the logger failure is collected only at the end of the test. We could try to detect a logger failure earlier but I think we have much more confusing situations to enhance and higher priorities.

"error: src hash value from version file (0x35cdfa4f) differ from src hash version saved in dictionary (0xfb141ec0)."

@ktrzcinx doesn't think using a firmware without the corresponding .ldc file makes sense:
thesofproject/sof#3195 (comment)

Nevertheless, he generously added an option to ignore the checksum mismatch, see sof-logger -h

@ranj063
Copy link
Contributor Author

ranj063 commented Nov 10, 2020

@marc-hb the problem is with the consistency and the check for logger return value in case of errors.

if not starting sof-logger is an error, why run the test and then report the failure? See my case above, the test loops 3 all the PCM devces and iterations before saying the test failed because the logger never started.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 10, 2020

if not starting sof-logger is an error, why run the test and then report the failure?

Because the logger is started in the background, see my answer above. We could try to detect this earlier but.... see above.

@ranj063
Copy link
Contributor Author

ranj063 commented Nov 10, 2020

if not starting sof-logger is an error, why run the test and then report the failure?

Because the logger is started in the background, see my answer above. We could try to detect this earlier but.... see above.

sure, there are other higher priority issues but if it is an error there is no need to run the test to report the error.

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 10, 2020

but if it is an error there is no need to run the test to report the error.

The main test process does not even know when a backgrounded process fails, not before it wait for its completion. Let's try this analogy in case it helps: you can not know the return value at pthread_create() time, you can tell only at pthread_join() / wait time.

The main process cannot wait for the completion of the logger before the end of the test because it does not want the logger to complete before the end of the test, that would defeat the purpose of the logger.

The main process could find creative ways to check from time to time how the logger is doing but that would be a fair amount of code for not much value.

@ranj063
Copy link
Contributor Author

ranj063 commented Nov 10, 2020

The main process could find creative ways to check from time to time how the logger is doing but that would be a fair amount of code for not much value.

One simple creative way is to check for pre-requisites such as "is sof-logger functional?" before actually starting it in the background?

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 10, 2020

One simple creative way is to check for pre-requisites such as "is sof-logger functional?" before actually starting it in the background?

There are already a couple sanity checks in case-lib/lib.sh but nothing that compares the dictionary source hash yet. If you can find a short and simple way to perform this pre-check and add it there then why not.

I don't know your exact test configuration/duration but I feel like the time we spent discussing this source hash check is already bigger than the time it would save in the future. So you can guess my perspective on the time it would take to implement and test it :-) Granted: it could possibly catch other types of logger failures too...

PS: when I wrote "creative" what I had in mind was something else: "racy because a post-check after starting the logger".

@aiChaoSONG
Copy link

@ranj063 sof-logger was already dead this line means that when we are checking if sof-logger is alive, it is dead, or it is not started at all.

  • if it is already dead: means we do have issue with FW, because sof-logger is started in trace mode, it should be alive unless we kill it.
  • it is not started: you don't have sof-logger in your PATH, so it cannot started by sof-test, because test case cannot find your sof-logger.

I guess this issue should be in category 2, please run env-check.sh and see if there is any warnings

@marc-hb
Copy link
Collaborator

marc-hb commented Nov 11, 2020

@aiChaoSONG if you read the entire description more slowly you will see that in this particular case we know exactly why the logger failed.

@marc-hb marc-hb changed the title Failure to start logger is not a failed test result Failure to start logger does not fail immediately, only at the end of the test Dec 4, 2020
@marc-hb
Copy link
Collaborator

marc-hb commented Dec 4, 2020

I changed the issue title in an attempt to reflect the evolution of this issue. Please correct me.

@xiulipan xiulipan added type:discussion Open ended discussion topic type:enhancement New framework feature or request labels Jan 26, 2021
@marc-hb
Copy link
Collaborator

marc-hb commented Feb 3, 2021

@ranj063 does this still matter to you? I'm afraid we have many more urgent things to tackle.

@marc-hb marc-hb closed this as completed Mar 5, 2021
@marc-hb marc-hb added False Pass / green failure area:logs Log and results collection, storage, etc. labels Jul 3, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:logs Log and results collection, storage, etc. False Pass / green failure type:discussion Open ended discussion topic type:enhancement New framework feature or request
Projects
None yet
Development

No branches or pull requests

4 participants