Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

@marc-hb sof-kernel-log-check: narrower ignore_str for ICL reboot issue #372

Merged
merged 2 commits into from
Sep 11, 2020

Conversation

marc-hb
Copy link
Collaborator

@marc-hb marc-hb commented Sep 10, 2020

2 commits. The main one:

Fixes ac415de ("tools: ignore a false error message with 'panic'")
which was ignoring way too many errors. See previous PR and long source
code comment.

Generally speaking, ignoring errors ("green failures") is extremely
dangerous and should be as narrow as possible.

More specifically here, not even knowing which platforms experience the
issue and what code they print is is really not going to help fix it.

So we can start ignoring known issues on a per platform basis.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 10, 2020

This PR doesn't not add any shellcheck warning in https://travis-ci.org/github/thesofproject/sof-test/builds/725765539, they were all already there.

I didn't to stress test ICL to test this change, #361 was enough. Please help review #361 too (merged, thx @xiulipan )

@marc-hb marc-hb marked this pull request as ready for review September 10, 2020 01:19
@marc-hb marc-hb requested a review from a team as a code owner September 10, 2020 01:19
@xiulipan
Copy link
Contributor

@marc-hb I will close this PR. The issue can be seen on any device, but the reproduce rate on ICL is very high. So we should ignore the info for all platforms.

@xiulipan xiulipan closed this Sep 10, 2020
@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 10, 2020

@marc-hb I will close this PR.

You merged #365 before all discussions were finished and now you're closing this one before any discussion has started.

but the reproduce rate on ICL is very high. So we should ignore the info for all platforms.

I can't see the how the second sentence relates to the first one.

("very high" doesn't mean much without some numbers thesofproject/sof#3395 has some)

@marc-hb marc-hb reopened this Sep 10, 2020
@xiulipan
Copy link
Contributor

#365 is only a workaround. I did not think more discussion is needed for that. I think we have plan to use log level to filter the warning and error message to avoid keyword detection.

As the panic log is not platform specific one, I think we should ignore this info message for all platforms.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 10, 2020

As the panic log is not platform specific one, I think we should ignore this info message for all platforms.

I've never seen it outside ICL across many PRs and the description of thesofproject/sof#3395 is very specifically about ICL.

If/when it happens elsewhere then we want to know exactly when and how and not ignore failures "just in case they might happen". This code makes it trivial to add other platforms later, it's designed for that.

@aiChaoSONG
Copy link

@marc-hb It also happened to CML_HEL_RT5682, check the failure on CML_HEL_RT5682 in https://sof-ci.sh.intel.com/#/result/planresultdetail/319

@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 10, 2020

@marc-hb It also happened to CML_HEL_RT5682, check the failure on CML_HEL_RT5682 in https://sof-ci.sh.intel.com/#/result/planresultdetail/319

Thanks @aiChaoSONG , I just added cml. It took 4 additional characters.

I also updated thesofproject/sof#3395

@xiulipan
Copy link
Contributor

@marc-hb As the panic info may show on any platforms, why we need to narrower the ignore list? I do not want to add a new platform name here once we see it on another platform.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 10, 2020

I do not want to add a new platform name here once we see it on another platform.

People interested in finding and fixing issues are interested in when, where, how and how often they happened. They don't mind adding 4 characters in one file for that sort of information, that's a small price to pay.

@marc-hb
Copy link
Collaborator Author

marc-hb commented Sep 10, 2020

People interested in finding and fixing issues are interested in when, where, how and how often they happened. They don't mind adding 4 characters in one file for that sort of information, that's a small price to pay.

Anyway this is not even the main point of this PR. The main purpose here is not to miss OTHER issues unrelated to thesofproject/sof#3395

ranj063
ranj063 previously approved these changes Sep 10, 2020
Copy link
Contributor

@ranj063 ranj063 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks good. the commit title only talks of ICL when the patch is for ICL and CML. only minor though

ignore_str="$ignore_str"'|sof-audio-pci 0000:[0-9a-f]{2}:[0-9a-f]{2}\.[0-9a-f]: status = 0x[0-9a-f]{8} panic = 0x[0-9a-f]{8}'
case "$platform" in
icl|cml)
ignore_str="$ignore_str"'|sof-audio-pci 0000:00:1f.3: status = 0x[0]{8} panic = 0x[0]{8}'
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't you need a \ before the .?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It won't make any difference in this very particular case because the kernel will always print the exact same character in this position. However it is good reminder for other cases when matching an actual dot matters so I'm adding it.

Fixes ac415de ("tools: ignore a false error message with 'panic'")
which was ignoring way too many errors. See previous discussion in
corresponding PR and long source code comment.

Generally speaking, ignoring errors ("green failures") is extremely
dangerous and should be as narrow as possible.

More specifically here, not even knowing which platforms experience the
issue and what code they print is is really not going to help fix it.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
@aiChaoSONG
Copy link

Merge this for sof-audio-pci 0000:00:1f.3: status = 0x00000000 panic = 0x00000000 is only seen on ICL and CML. If we see this on more platforms, we can add later.

@aiChaoSONG aiChaoSONG merged commit e8d7b3f into thesofproject:master Sep 11, 2020
@marc-hb marc-hb deleted the narrower-panic-ignore branch September 11, 2020 03:37
@marc-hb marc-hb added the area:logs Log and results collection, storage, etc. label Jul 3, 2021
@marc-hb marc-hb added False Pass / green failure and removed area:logs Log and results collection, storage, etc. labels Jul 3, 2021
@marc-hb marc-hb added the area:non-audio Failure False positives: failing when we don't want to label Oct 15, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:non-audio Failure False positives: failing when we don't want to False Pass / green failure
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants