agent: trace drift message as warning, not error #2686

Merged: 1 commit, Apr 2, 2020

@mmaka1 mmaka1 commented Apr 1, 2020

Non-critical drifts should be traced as warnings (WARN,
yellow color), not errors since the system is expected
to recover in this scenario.

Signed-off-by: Marcin Maka <marcin.maka@linux.intel.com>
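
The distinction drawn in this description can be illustrated with a minimal sketch. It is written in Python purely for illustration (the firmware itself is C). The names `WARN_DRIFT_US`, `PANIC_DRIFT_US`, `Severity` and `classify_drift`, and the threshold values, are hypothetical and not taken from the SOF sources. The only assumption is that a small drift is recoverable and deserves a warning, while a much larger one is treated as fatal.

```python
from enum import Enum

# Hypothetical thresholds in microseconds. The real firmware values are not
# stated in this PR; these numbers are assumptions made for illustration only.
WARN_DRIFT_US = 1_500
PANIC_DRIFT_US = 10_000


class Severity(Enum):
    OK = "ok"
    WARN = "warn"    # non-critical drift: trace as a warning and keep running
    PANIC = "panic"  # critical drift: the system is not expected to recover


def classify_drift(delta_us: int) -> Severity:
    """Map a measured drift (in microseconds) to a trace severity."""
    if delta_us > PANIC_DRIFT_US:
        return Severity.PANIC
    if delta_us > WARN_DRIFT_US:
        return Severity.WARN
    return Severity.OK
```

Under these assumed thresholds, a drift of 2000 us would be reported as a warning rather than an error, which is the behaviour this change aims for.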

@mmaka1 mmaka1 requested a review from libinyang as a code owner April 1, 2020 15:01
@mmaka1
Author

mmaka1 commented Apr 1, 2020

Answers some of the concerns from #2678.

@tlauda tlauda self-requested a review April 1, 2020 15:03
Collaborator

@paulstelian97 paulstelian97 left a comment

I didn't notice when trace_warn showed up, but it is in fact a good idea to use it here. Looks good.

@lgirdwood
Member

FWIW, here is a failure seen on CFL (from the PR CI) that is related to the agent but not to this PR:

jf-cfl-rvp-hda-1 kernel: [  680.915141] sof-audio-pci 0000:00:1f.3: In hda_link_pcm_trigger cmd=1
jf-cfl-rvp-hda-1 kernel: [  680.915146] sof-audio-pci 0000:00:1f.3: pcm: trigger stream 0 dir 0 cmd 1
jf-cfl-rvp-hda-1 kernel: [  680.915714] sof-audio-pci 0000:00:1f.3: FW Poll Status: reg=0x14001e successful
jf-cfl-rvp-hda-1 kernel: [  680.915716] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60040000: GLB_STREAM_MSG: TRIG_START
jf-cfl-rvp-hda-1 kernel: [  680.924025] sof-audio-pci 0000:00:1f.3: ipc tx succeeded: 0x60040000: GLB_STREAM_MSG: TRIG_START
jf-cfl-rvp-hda-1 kernel: [  681.412014] sof-audio-pci 0000:00:1f.3: ipc rx: 0x90020000: GLB_TRACE_MSG
jf-cfl-rvp-hda-1 kernel: [  681.412046] sof-audio-pci 0000:00:1f.3: ipc rx done: 0x90020000: GLB_TRACE_MSG
jf-cfl-rvp-hda-1 kernel: [  681.911952] sof-audio-pci 0000:00:1f.3: ipc rx: 0x90020000: GLB_TRACE_MSG
jf-cfl-rvp-hda-1 kernel: [  681.911985] sof-audio-pci 0000:00:1f.3: ipc rx done: 0x90020000: GLB_TRACE_MSG
jf-cfl-rvp-hda-1 kernel: [  690.776478] sof-audio-pci 0000:00:1f.3: pcm: trigger stream 6 dir 1 cmd 0
jf-cfl-rvp-hda-1 kernel: [  690.776482] sof-audio-pci 0000:00:1f.3: ipc tx: 0x60050000: GLB_STREAM_MSG: TRIG_STOP
jf-cfl-rvp-hda-1 kernel: [  690.776537] sof-audio-pci 0000:00:1f.3: pcm: trigger stream 0 dir 1 cmd 0
jf-cfl-rvp-hda-1 kernel: [  690.776550] sof-audio-pci 0000:00:1f.3: pcm: trigger stream 7 dir 1 cmd 0
jf-cfl-rvp-hda-1 kernel: [  690.780795] sof-audio-pci 0000:00:1f.3: error : DSP panic!
jf-cfl-rvp-hda-1 kernel: [  690.780797] sof-audio-pci 0000:00:1f.3: panic: dsp_oops_offset 788480 offset 788480
jf-cfl-rvp-hda-1 kernel: [  690.780801] sof-audio-pci 0000:00:1f.3: status: fw entered - code 00000005
jf-cfl-rvp-hda-1 kernel: [  690.780940] sof-audio-pci 0000:00:1f.3: error: can't enter idle
jf-cfl-rvp-hda-1 kernel: [  690.780941] sof-audio-pci 0000:00:1f.3: error: trace point 00004000
jf-cfl-rvp-hda-1 kernel: [  690.780942] sof-audio-pci 0000:00:1f.3: error: panic at src/lib/agent.c:62
jf-cfl-rvp-hda-1 kernel: [  690.780943] sof-audio-pci 0000:00:1f.3: error: DSP Firmware Oops
jf-cfl-rvp-hda-1 kernel: [  690.780944] sof-audio-pci 0000:00:1f.3: EXCCAUSE 0x0000003f EXCVADDR 0x00000000 PS       0x00060d25 SAR     0x00000000
jf-cfl-rvp-hda-1 kernel: [  690.780945] sof-audio-pci 0000:00:1f.3: EPC1     0x00000000 EPC2     0xbe029791 EPC3     0x00000000 EPC4    0x00000000
jf-cfl-rvp-hda-1 kernel: [  690.780946] sof-audio-pci 0000:00:1f.3: EPC5     0xbe029791 EPC6     0x00000000 EPC7     0x00000000 DEPC    0x00000000
jf-cfl-rvp-hda-1 kernel: [  690.780947] sof-audio-pci 0000:00:1f.3: EPS2     0x00060520 EPS3     0x00000000 EPS4     0x00000000 EPS5    0x00060520
jf-cfl-rvp-hda-1 kernel: [  690.780948] sof-audio-pci 0000:00:1f.3: EPS6     0x00000000 EPS7     0x00000000 INTENABL 0x00000000 INTERRU 0x00010222
jf-cfl-rvp-hda-1 kernel: [  690.780949] sof-audio-pci 0000:00:1f.3: stack dump from 0xbe05c190
jf-cfl-rvp-hda-1 kernel: [  690.780950] sof-audio-pci 0000:00:1f.3: 0xbe05c190: be013c68 be05c1c0 be062180 00000001
jf-cfl-rvp-hda-1 kernel: [  690.780951] sof-audio-pci 0000:00:1f.3: 0xbe05c194: 00000000 00000000 0000003e be062180
jf-cfl-rvp-hda-1 kernel: [  690.780952] sof-audio-pci 0000:00:1f.3: 0xbe05c198: d4f8bf00 dd691a70 5734d828 ffff9bba
jf-cfl-rvp-hda-1 kernel: [  690.780953] sof-audio-pci 0000:00:1f.3: 0xbe05c19c: 5734d828 ffff9bba 0000007f 00000000
jf-cfl-rvp-hda-1 kernel: [  690.780954] sof-audio-pci 0000:00:1f.3: 0xbe05c1a0: a1ac09dc ffffffff 5a32ec00 ffff9bba
jf-cfl-rvp-hda-1 kernel: [  690.780955] sof-audio-pci 0000:00:1f.3: 0xbe05c1a4: d4f8bf00 dd691a70 40a01240 ffff9bba
jf-cfl-rvp-hda-1 kernel: [  690.780955] sof-audio-pci 0000:00:1f.3: 0xbe05c1a8: 40a01278 ffff9bba a1a88680 ffffffff
jf-cfl-rvp-hda-1 kernel: [  690.780957] sof-audio-pci 0000:00:1f.3: 0xbe05c1ac: 00000000 00000000 00000000 00000000
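
When triaging a log like the one above, the source location of the firmware panic is usually the first thing to pull out. The following is a minimal sketch, assuming only the `error: panic at <file>:<line>` message format visible in this log; the helper name `find_panic_locations` is hypothetical and not part of any SOF tool.

```python
import re
from typing import List, Tuple

# Matches the "panic at <file>:<line>" kernel message shown in the log above.
PANIC_AT = re.compile(r"error: panic at (?P<file>\S+):(?P<line>\d+)")


def find_panic_locations(log_text: str) -> List[Tuple[str, int]]:
    """Return (source file, line number) pairs for each panic message."""
    return [(m.group("file"), int(m.group("line")))
            for m in PANIC_AT.finditer(log_text)]


sample = (
    "jf-cfl-rvp-hda-1 kernel: [  690.780942] sof-audio-pci 0000:00:1f.3: "
    "error: panic at src/lib/agent.c:62\n"
)
print(find_panic_locations(sample))  # [('src/lib/agent.c', 62)]
```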

@lgirdwood lgirdwood merged commit 8d9c762 into thesofproject:master Apr 2, 2020
@mmaka1 mmaka1 deleted the sa-drift-warn branch April 2, 2020 18:23
@marc-hb
Collaborator

marc-hb commented Mar 5, 2021

Non-critical drifts should be traced as warnings (WARN, yellow color), not errors since the system is expected to recover in this scenario.

@mmaka1 what is a "critical" drift and how is it traced?

See #3854

marc-hb added a commit to marc-hb/sof-test that referenced this pull request Apr 27, 2021
Probably the main change is fixing the huge etrace test gaps thesofproject#321 and
thesofproject/sof#3281

Also fixes DMA trace gaps thesofproject#297 and thesofproject#298

I initially tried to preserve some of the existing code but it was just
too bad. PR thesofproject#161 / commit 9136776 seemed especially bad:

- It tried to ignore a specific `ll drift` error but instead it filtered
out almost every log statement out of... stderr, which does not even show
log statements!! (Just for the record, this `ll drift` error has been
downgraded to a warning now, see
thesofproject/sof#2686 and
thesofproject/sof#3854)

- That same commit also added code that merely starts the DMA trace with
"there is an error below" (without failing the test) but that's eclipsed
by the entire log that follows. Later, the firmware started printing
ERROR every single time when the ERROR FW ABI prefix was introduced, yet
no one ever noticed, which proves how useless this prefix was.

So remove this DMA trace prefix as the purpose of this test is - as
clearly stated in thesofproject#167 - not to find firmware errors but errors with the
sof-logger itself (even though we never had anything looking at firmware
errors so far)

Don't grep for "error" on stderr: anything on stderr is a logger
failure (not a firmware failure).

Don't require whitespace before the TIMESTAMP header.

Add set -e.

Use shell functions.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
marc-hb added a commit to thesofproject/sof-test that referenced this pull request Jun 10, 2021
Probably the main change is fixing the huge etrace test gaps #321 and
thesofproject/sof#3281

Also fixes DMA trace gaps #297 and #298

I initially tried to preserve some of the existing code but it was just
too bad. PR #161 / commit 7274f49 seemed especially bad:

- It tried to ignore a specific `ll drift` error but instead it filtered
out almost every log statement out of... stderr, which does not even show
log statements!! (Just for the record, this `ll drift` error has been
downgraded to a warning now, see
thesofproject/sof#2686 and
thesofproject/sof#3854)

- That same commit also added code that merely starts the DMA trace with
"there is an error below" (without failing the test) but that's eclipsed
by the entire log that follows. Later, the firmware started printing
ERROR every single time when the ERROR FW ABI prefix was introduced, yet
no one ever noticed, which proves how useless this prefix was.

So remove this DMA trace prefix as the purpose of this test is - as
clearly stated in #167 - not to find firmware errors but errors with the
sof-logger itself (even though we never had anything looking at firmware
errors so far)

Don't grep for "error" on stderr: anything on stderr is a logger
failure (not a firmware failure).

Don't require whitespace before the TIMESTAMP header.

Add set -e.

Use shell functions.

Signed-off-by: Marc Herbert <marc.herbert@intel.com>
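
The checks described in this commit message can be sketched roughly as follows. This is not the sof-test script itself (which is a shell script). `check_logger_output` and its arguments are hypothetical, and treating the first non-empty stdout line as the header is an assumption. The two rules taken from the message are that anything on stderr counts as a logger failure and that the `TIMESTAMP` header may or may not be preceded by whitespace.

```python
import re
from typing import List

# Header line of the logger output: optional leading whitespace, then TIMESTAMP.
HEADER = re.compile(r"^\s*TIMESTAMP\b")


def check_logger_output(stdout: str, stderr: str) -> List[str]:
    """Return a list of problems found; an empty list means the check passed."""
    problems = []
    # Anything at all on stderr is a logger failure; do not grep for "error".
    if stderr.strip():
        problems.append("unexpected output on stderr")
    # Assumption: the first non-empty stdout line is the TIMESTAMP header.
    lines = [line for line in stdout.splitlines() if line.strip()]
    if not lines or not HEADER.match(lines[0]):
        problems.append("missing TIMESTAMP header")
    return problems
```

Returning a list of problems instead of raising on the first one lets a caller report every failed check in a single run.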