
west.yml: update Zephyr to e97d33d0c896 #9119

Merged 1 commit into thesofproject:main on May 14, 2024

Conversation

kv2019i (Collaborator) commented May 13, 2024

Fast-forward Zephyr by around 350 patches, including the following changes affecting SOF build targets:

soc/intel_adsp: ipc: initialize semaphore in driver init
drivers: dai: ssp: fix MN_MDIVCTRL_M_DIV_ENABLE for ACE+ platform
drivers: dai: intel: ssp: Only setup mclk/bclk when it is needed by role

kv2019i (Collaborator, Author) commented May 13, 2024

The internal Intel CI test hit a bug in the SrcLite test on MTL with the following log (via https://sof-ci.01.org/sof-pr-viewer/#/build/PR9119/build13962171):

[    0.000000] <inf> init: print_version_banner: FW ABI 0x301d000 DBG ABI 0x5003000 tags SOF:v2.5-stable-branch-2376-g174d0cff4c63 zephyr:v3.6.0-3805-gb501278237bf src hash 0xd6863264 (ref hash 0xd6863264)
[    0.000000] <inf> clock: clock_set_freq: clock 0 set freq 20000000Hz freq_idx 0
[    0.000000] <inf> clock: clock_set_freq: clock 1 set freq 20000000Hz freq_idx 0
[    0.000000] <inf> clock: clock_set_freq: clock 2 set freq 20000000Hz freq_idx 0
*** Booting Zephyr OS build v3.6.0-3805-gb501278237bf ***
[    0.000000] <inf> ipc: telemetry_init: Telemetry enabled. May affect performance
[    0.000000] <inf> main: sof_app_main: SOF on intel_adsp
[    0.000000] <inf> main: sof_app_main: SOF initialized
[    0.000008] <inf> ipc: ipc_cmd: rx	: 0x43000000|0x30701000
[    0.000010] <inf> ipc: ipc_cmd: rx	: 0x43000000|0x30801000
[    0.000010] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x3070000c
[    0.000010] <wrn> basefw_intel: basefw_set_fw_config: returning success for Set FW_CONFIG without handling it
[    0.000010] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x31400008
[    0.001903] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x30600064
[    0.007090] <inf> ipc: ipc_cmd: rx	: 0x44000000|0x3070000c
[    0.007188] <wrn> basefw_intel: basefw_set_fw_config: returning success for Set FW_CONFIG without handling it
[    0.032851] <inf> ipc: ipc_cmd: rx	: 0x11000009|0x0
[    0.032891] <inf> pipe: pipeline_new: pipeline new pipe_id 0 priority 0
[    0.035505] <inf> ipc: ipc_cmd: rx	: 0x40000004|0x15
[    0.035841] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
[    0.037945] <inf> ipc: ipc_cmd: rx	: 0x40000019|0x1000000b
[    0.040640] <inf> ipc: ipc_cmd: rx	: 0x40010004|0x15
[    0.040955] <inf> dma: dma_get: dma_get() ID 0 sref = 1 busy channels 0
[    0.043401] <inf> ipc: ipc_cmd: rx	: 0x45000004|0x19
[    0.043465] <inf> ipc: buffer_new: buffer new size 0xb40 id 0.0 flags 0x0
[    0.043601] <inf> dp_queue: dp_queue_create: DpQueue created, id: 0 shared: 0 min_available: 1440 min_free_space 1440, size 2880
[    0.043685] <inf> buffer: buffer_create_shadow_dp_queue: buf:0 0x0 dp_queue attached to buffer as a shadow, at_input: 0
[    0.043715] <inf> pipe: pipeline_connect: comp:0 0x4 connect buffer 0 as sink
[    0.043746] <inf> pipe: pipeline_connect: comp:0 0x19 connect buffer 0 as source
[    0.045236] <inf> ipc: ipc_cmd: rx	: 0x45000019|0x10004
[    0.045300] <inf> ipc: buffer_new: buffer new size 0xc00 id 0.0 flags 0x0
[    0.045435] <inf> dp_queue: dp_queue_create: DpQueue created, id: 0 shared: 0 min_available: 1536 min_free_space 2048, size 4096
[    0.045465] <inf> buffer: buffer_create_shadow_dp_queue: buf:0 0x0 dp_queue attached to buffer as a shadow, at_input: 1
[    0.045493] <inf> pipe: pipeline_connect: comp:0 0x19 connect buffer 0 as sink
[    0.045523] <inf> pipe: pipeline_connect: comp:0 0x10004 connect buffer 0 as source
[    0.161408] <inf> ipc: ipc_cmd: rx	: 0x13000003|0x0
[    0.161515] <wrn> ipc: ipc_pipeline_complete: ipc_pipeline_complete(): no scheduling component specified, use comp 0x10004
[    0.163000] <inf> ipc: ipc_cmd: rx	: 0x13000004|0x0
[    0.163436] <inf> copier: copier_prepare: comp:0 0x4 copier_prepare()
[    0.163603] <inf> src: src_prepare: comp:0 0x19 src_prepare()
[    0.163688] <inf> src: src_params_general: comp:0 0x19 src_params()
[    0.163726] <inf> src: src_set_params: comp:0 0x19 SRC DP period calculated as: 1333
[    0.163783] <inf> src: src_params_general: comp:0 0x19 src_params(), source_rate = 44100, sink_rate = 48000
[    0.164208] <inf> copier: copier_prepare: comp:0 0x10004 copier_prepare()
[    0.164361] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 7
[    0.164425] <inf> clock: clock_set_freq: clock 0 set freq 115018000Hz freq_idx 1
[    0.164448] <inf> clock: clock_set_freq: clock 1 set freq 115018000Hz freq_idx 1
[    0.164453] <inf> clock: clock_set_freq: clock 2 set freq 115018000Hz freq_idx 1
[    0.164458] <inf> pipe: pipeline_trigger: Sum of KCPS consumption: 115018, core: 0
[    0.164466] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0114cc0 0x20210U priority 0 flags 0x0
[    0.164483] <inf> ll_schedule: zephyr_domain_register: zephyr_domain_register domain->type 1 domain->clk 3 domain->ticks_per_ms 38400 period 1000
[    0.164633] <inf> copier: copier_comp_trigger: comp:0 0x4 No dai copier found, start/end offset is not calculated
[    0.164656] <inf> copier: copier_comp_trigger: comp:0 0x4 No dai copier found, start/end offset is not calculated
[    0.164663] <inf> ll_schedule: zephyr_ll_task_schedule_common: task add 0xa0109288 0x20150U priority 0 flags 0x0
[    0.164706] <inf> host_comp: host_get_copy_bytes_normal: comp:0 0x10004 no bytes to copy, available samples: 0, free_samples: 1536
[    1.187626] <inf> ll_schedule: zephyr_domain_thread_fn: ll core 0 timer avg 1970, max 3700, overruns 0
[    1.768773] <inf> ipc: ipc_cmd: rx	: 0x13000003|0x0
[    1.768785] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 2
[    1.768796] <inf> clock: clock_set_freq: clock 0 set freq 20000000Hz freq_idx 0
[    1.768801] <inf> clock: clock_set_freq: clock 1 set freq 20000000Hz freq_idx 0
[    1.768841] <inf> clock: clock_set_freq: clock 2 set freq 20000000Hz freq_idx 0
[    1.768923] <inf> pipe: pipeline_trigger: Sum of KCPS consumption: 20000, core: 0
[    1.769866] <inf> copier: copier_comp_trigger: comp:0 0x4 No dai copier found, start/end offset is not calculated
[    1.769956] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0114cc0 0x20210U
[    1.769983] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 2 total_num_tasks 2
[    1.770013] <inf> ll_schedule: zephyr_ll_task_done: task complete 0xa0109288 0x20150U
[    1.770040] <inf> ll_schedule: zephyr_ll_task_done: num_tasks 1 total_num_tasks 1
[    1.770083] <inf> ll_schedule: zephyr_domain_unregister: zephyr_domain_unregister domain->type 1 domain->clk 3
[    1.771623] <inf> ipc: ipc_cmd: rx	: 0x13000002|0x0
[    1.771683] <inf> pipe: pipeline_trigger: pipe:0 0x0 pipe trigger cmd 0
[    1.771816] <inf> copier: copier_comp_trigger: comp:0 0x4 No dai copier found, start/end offset is not calculated
[    1.772156] <inf> src: src_reset: comp:0 0x19 src_reset()
[    1.772951] <err> os: print_fatal_exception:  ** FATAL EXCEPTION
[    1.773031] <err> os: print_fatal_exception:  ** CPU 0 EXCCAUSE 13 (load/store PIF data error)
[    1.773058] <err> os: print_fatal_exception:  **  PC 0xa0072f6c VADDR 0x18
[    1.773085] <err> os: print_fatal_exception:  **  PS 0x60620
[    1.773111] <err> os: print_fatal_exception:  **    (INTLEVEL:0 EXCM: 0 UM:1 RING:0 WOE:1 OWB:6 CALLINC:2)
[    1.773138] <err> os: xtensa_dump_stack:  **  A0 0xa00736a1  SP 0xa0110830  A2 0xa010e700  A3 0xa0112008
[    1.773166] <err> os: xtensa_dump_stack:  **  A4 0xa011399c  A5 (nil)  A6 0x2e8  A7 0xa004bd50
[    1.773193] <err> os: xtensa_dump_stack:  **  A8 0xa01176e0  A9 (nil) A10 0xa0110844 A11 0x3
[    1.773220] <err> os: xtensa_dump_stack:  ** A12 0x4 A13 0xfffffffc A14 0x4e0 A15 0x3
[    1.773246] <err> os: xtensa_dump_stack:  ** LBEG 0xa004bf1d LEND 0xa004bf2c LCOUNT (nil)
[    1.773271] <err> os: xtensa_dump_stack:  ** SAR (nil)
[    1.773298] <err> os: xtensa_dump_stack:  **  THREADPTR 0xa011a360

Backtrace:0xa0072f69:0xa0110830  |<-CORRUPTED

[    1.773638] <err> os: z_fatal_error: >>> ZEPHYR FATAL ERROR 0: CPU exception on CPU 0

kv2019i (Collaborator, Author) commented May 13, 2024

@marcinszkudlinski @singalsu @wszypelt Have you seen this on other runs? It seems src_lite (using DP scheduling) is crashing in src_2s() at:

                n_written += s2.times * cd->src.stage2->blk_out;
a0072f6c:       6968                            l32i.n  a6, a9, 24
a0072f6e:       8276f0                          mull    a7, a6, a15

A9 is NULL, so this won't work. What confuses me is that there doesn't seem to be any link to the Zephyr update, so I suspect this is an existing bug we hit with (un)lucky timing. I can see src_reset() being called, which clears the stage2 pointers. What I don't get is why src_process() is still running at a point when the LL scheduler tasks have been terminated. Do we still have a runaway DP thread running, so that when we run src_reset(), this blows up?

I can't run this test locally, so I'll next try a bisect with fewer Zephyr commits, but any input is welcome.

Fast-forward Zephyr by around 300 patches, including the following
changes affecting SOF build targets:

soc/intel_adsp: ipc: initialize semaphore in driver init
drivers: dai: ssp: fix MN_MDIVCTRL_M_DIV_ENABLE for ACE+ platform
drivers: dai: intel: ssp: Only setup mclk/bclk when it is needed by role

Signed-off-by: Kai Vehmanen <kai.vehmanen@linux.intel.com>
kv2019i (Collaborator, Author) commented May 13, 2024

V2:

  • try a smaller bundle from Zephyr upstream

@kv2019i kv2019i changed the title west.yml: update Zephyr to b501278237bf west.yml: update Zephyr to e97d33d0c896 May 13, 2024
marcinszkudlinski (Contributor) commented May 14, 2024

@kv2019i it looks like you hit the same race I'm tracking now. It reproduces very rarely on hardware, more often on FPGA, so it looks very much like a timing issue.
edit: I have never seen it when SRC was processed on a secondary core.

kv2019i (Collaborator, Author) commented May 14, 2024

@marcinszkudlinski wrote:

@kv2019i it looks like you hit the same race I'm tracking now. It reproduces very rarely on hardware, more often on FPGA, so it looks very much like a timing issue. edit: I have never seen it when SRC was processed on a secondary core.

Ack. I filed #9124 to track incidences of this, as it seems this can now be hit on SOF main (so it can show up in PR CI). Let's gather stats in #9124.

On a new run the issue was not seen, so I'll proceed with merging this PR.

@kv2019i kv2019i merged commit fdb67ee into thesofproject:main May 14, 2024
44 of 45 checks passed