Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optee 3.11 cannot initialize imx-watchdog causing cpu stall and heavy network lag #4219

Closed
ghost opened this issue Nov 20, 2020 · 4 comments
Closed
Labels

Comments

@ghost
Copy link

ghost commented Nov 20, 2020

Hey guys,

I am using optee tag 3.11.0 with yocto zeus branch on a custom hardware powered by an i.MX6q SoM. The relvant software I'm using is

  • barebox 2019.11.0-i.mx6-master
  • linux kernel v4.14.78 or v4.19.100
  • Yocto zeus (build info)

I'm facing a problem with network timeouts especially visible with qualcom's QCA7005 and general system lags when I boot optee along with REE. I can boot two different kernels: 4.14.78 and 4.19.100. The later is more sensitive to the problems and lags more frequently as the 4.14.78.
The problems heavily increase when I run some CPU intensive operations in the TA. When optee starts, it tells me it can't perform an Initcall in the method wdog_init().

D/TC:0 0 call_initcalls:21 level 5 init_caam()
D/TC:0 0 call_initcalls:21 level 5 csu_init()
D/TC:0 0 call_initcalls:21 level 5 scu_init()
D/TC:0 0 call_initcalls:21 level 5 imx_wdog_init()
E/TC:0 0 call_initcalls:24 Initcall __text_start + 0x0008546c failed
D/TC:0 0 call_initcalls:21 level 6 mobj_init()
D/TC:0 0 call_initcalls:21 level 6 default_mobj_init()
D/TC:0 0 call_finalcalls:40 level 1 release_external_dt()
I/TC: Primary CPU switching to normal world boot
I/TC: Secondary CPU 1 initializing
I/TC: Secondary CPU 1 switching to normal world boot
I/TC: Secondary CPU 2 initializing
I/TC: Secondary CPU 2 switching to normal world boot
I/TC: Secondary CPU 3 initializing
I/TC: Secondary CPU 3 switching to normal world boot

And when I reboot the system optee panics:

E/TC:0 imx_wdog_restart:50 No wdog mapped
E/TC:0 Panic at core/drivers/imx_wdog.c:51 <imx_wdog_restart>
E/TC:0 TEE load address @ 0x8e000000
E/TC:0 Call stack:
E/TC:0 0x8e007bad
E/TC:0 0x8e0135d9
E/TC:0 0x8e0127a5
E/TC:0 0x8e03f107
E/TC:0 0x8e00f4a5
E/TC:0 0x8e03f363
E/TC:0 0x8e00f33f
E/TC:0 0x8e00f264

Additionally, when performing CPU intensive operations both kernel stall and or panic. Here are example traces for the 4.14.78 and 4.19.100 kernel:

4.19.100 - kernel
[ 1254.474232] CPU: 1 PID: 2181 Comm: testApp Not tainted 4.19.100-i.mx6-master #1
[ 1254.481542] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 1254.488120] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[ 1254.495883] [] (show_stack) from [] (dump_stack+0xc8/0xe4)
[ 1254.503122] [] (dump_stack) from [] (nmi_cpu_backtrace+0x8c/0xc4)
[ 1254.510966] [] (nmi_cpu_backtrace) from [] (nmi_trigger_cpumask_backtrace+0xe8/0x13c)
[ 1254.520542] [] (nmi_trigger_cpumask_backtrace) from [] (arch_trigger_cpumask_backtrace+0x1c/0x24)
[ 1254.531184] [] (arch_trigger_cpumask_backtrace) from [] (rcu_dump_cpu_stacks+0xa0/0xd4)
[ 1254.540940] [] (rcu_dump_cpu_stacks) from [] (rcu_check_callbacks+0x720/0x8e0)
[ 1254.549925] [] (rcu_check_callbacks) from [] (update_process_times+0x40/0x70)
[ 1254.558817] [] (update_process_times) from [] (tick_sched_handle+0x64/0x70)
[ 1254.567524] [] (tick_sched_handle) from [] (tick_sched_timer+0x58/0xb8)
[ 1254.575886] [] (tick_sched_timer) from [] (__hrtimer_run_queues+0x1a0/0x394)
[ 1254.584683] [] (__hrtimer_run_queues) from [] (hrtimer_interrupt+0x130/0x2dc)
[ 1254.593572] [] (hrtimer_interrupt) from [] (twd_handler+0x3c/0x48)
[ 1254.601512] [] (twd_handler) from [] (handle_percpu_devid_irq+0x94/0x248)
[ 1254.610051] [] (handle_percpu_devid_irq) from [] (generic_handle_irq+0x30/0x44)
[ 1254.619109] [] (generic_handle_irq) from [] (__handle_domain_irq+0x60/0xc4)
[ 1254.627821] [] (__handle_domain_irq) from [] (gic_handle_irq+0x58/0xb0)
[ 1254.636182] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 1254.643668] Exception stack(0xecf8fc88 to 0xecf8fcd0)
[ 1254.648727] fc80: ffff0004 00000000 00000000 00000000 00000000 00000000
[ 1254.656910] fca0: 00000000 00000000 ec22d0cc ffff0000 eddbafc0 ecf8fd14 ecf8fce8 ecf8fcd8
[ 1254.665089] fcc0: c0885314 c011968c 40030013 ffffffff
[ 1254.670158] [] (__irq_svc) from [] (__arm_smccc_smc+0x10/0x20)
[ 1254.677743] [] (__arm_smccc_smc) from [] (optee_smccc_smc+0x48/0x50)
[ 1254.685848] [] (optee_smccc_smc) from [] (optee_do_call_with_arg+0xbc/0x160)
[ 1254.694643] [] (optee_do_call_with_arg) from [] (optee_invoke_func+0x118/0x190)
[ 1254.703716] [] (optee_invoke_func) from [] (tee_ioctl+0x1008/0x11bc)
[ 1254.711837] [] (tee_ioctl) from [] (do_vfs_ioctl+0x400/0x97c)
[ 1254.719338] [] (do_vfs_ioctl) from [] (ksys_ioctl+0x44/0x70)
[ 1254.726748] [] (ksys_ioctl) from [] (sys_ioctl+0x18/0x1c)
[ 1254.733895] [] (sys_ioctl) from [] (ret_fast_syscall+0x0/0x54)
[ 1254.741467] Exception stack(0xecf8ffa8 to 0xecf8fff0)
[ 1254.746524] ffa0: bef73ac8 017f35d0 00000003 8010a403 bef73b00 00000003
[ 1254.754707] ffc0: bef73ac8 017f35d0 00000000 00000036 00000000 00000000 00476000 00000000
[ 1254.762887] ffe0: b6f39fac bef73a2c b6f28a21 b6ec8cf8

4.14.78 - kernel
[ 74.638111] INFO: rcu_sched self-detected stall on CPU
[ 74.643290] 3-...: (2099 ticks this GP) idle=eea/140000000000001/0 softirq=5992/5992 fqs=1041
[ 74.651990] (t=2101 jiffies g=1440 c=1439 q=167)
[ 74.656793] NMI backtrace for cpu 3
[ 74.660292] CPU: 3 PID: 2005 Comm: testApp Not tainted 4.14.78-i.mx6-master #1
[ 74.667519] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 74.674090] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[ 74.681850] [] (show_stack) from [] (dump_stack+0x94/0xb0)
[ 74.689096] [] (dump_stack) from [] (nmi_cpu_backtrace+0x8c/0xc4)
[ 74.696940] [] (nmi_cpu_backtrace) from [] (nmi_trigger_cpumask_backtrace+0xd4/0x128)
[ 74.706521] [] (nmi_trigger_cpumask_backtrace) from [] (arch_trigger_cpumask_backtrace+0x1c/0x24)
[ 74.717155] [] (arch_trigger_cpumask_backtrace) from [] (rcu_dump_cpu_stacks+0x9c/0xd0)
[ 74.726910] [] (rcu_dump_cpu_stacks) from [] (rcu_check_callbacks+0x6e8/0x8d8)
[ 74.735884] [] (rcu_check_callbacks) from [] (update_process_times+0x40/0x70)
[ 74.744776] [] (update_process_times) from [] (tick_sched_handle+0x64/0x70)
[ 74.753488] [] (tick_sched_handle) from [] (tick_sched_timer+0x4c/0xac)
[ 74.761848] [] (tick_sched_timer) from [] (__hrtimer_run_queues+0x17c/0x320)
[ 74.770642] [] (__hrtimer_run_queues) from [] (hrtimer_interrupt+0xe4/0x2d4)
[ 74.779437] [] (hrtimer_interrupt) from [] (twd_handler+0x3c/0x48)
[ 74.787367] [] (twd_handler) from [] (handle_percpu_devid_irq+0x94/0x248)
[ 74.795916] [] (handle_percpu_devid_irq) from [] (generic_handle_irq+0x30/0x44)
[ 74.804975] [] (generic_handle_irq) from [] (__handle_domain_irq+0x60/0xc4)
[ 74.813685] [] (__handle_domain_irq) from [] (gic_handle_irq+0x58/0xb0)
[ 74.822047] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 74.829531] Exception stack(0xed269d10 to 0xed269d58)
[ 74.834589] 9d00: ffff0004 00000000 00000000 00000000
[ 74.842778] 9d20: 00000000 00000000 00000000 00000000 ed403880 ec5da54c befcaa68 ed269d9c
[ 74.850960] 9d40: ed269d70 ed269d60 c08055fc c011852c 80030013 ffffffff
[ 74.857591] [] (__irq_svc) from [] (__arm_smccc_smc+0x10/0x20)
[ 74.865179] [] (__arm_smccc_smc) from [] (optee_smccc_smc+0x48/0x50)
[ 74.873283] [] (optee_smccc_smc) from [] (optee_do_call_with_arg+0xa4/0x118)
[ 74.882079] [] (optee_do_call_with_arg) from [] (optee_invoke_func+0xf4/0x164)
[ 74.891046] [] (optee_invoke_func) from [] (tee_ioctl+0x8a0/0x1104)
[ 74.899068] [] (tee_ioctl) from [] (do_vfs_ioctl+0x3e0/0x948)
[ 74.906564] [] (do_vfs_ioctl) from [] (SyS_ioctl+0x44/0x70)
[ 74.913892] [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x54)
[ 74.921486] INFO: rcu_sched detected stalls on CPUs/tasks:
[ 74.927034] 3-...: (2100 ticks this GP) idle=eea/140000000000000/0 softirq=5992/5992 fqs=1043
[ 74.935747] (detected by 0, t=2129 jiffies, g=1440, c=1439, q=167)
[ 74.942055] Sending NMI from CPU 0 to CPUs 3:
[ 74.946784] NMI backtrace for cpu 3
[ 74.946789] CPU: 3 PID: 2005 Comm: testApp Not tainted 4.14.78-i.mx6-master #1
[ 74.946791] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 74.946793] task: ed620000 task.stack: ed268000
[ 74.946796] PC is at __arm_smccc_smc+0x10/0x20
[ 74.946798] LR is at optee_smccc_smc+0x48/0x50
[ 74.946802] pc : [] lr : [] psr: 80030013
[ 74.946804] sp : ed269d60 ip : ed269d70 fp : ed269d9c
[ 74.946806] r10: befcaa68 r9 : ec5da54c r8 : ed403880
[ 74.946809] r7 : 00000000 r6 : 00000000 r5 : 00000000 r4 : 00000000
[ 74.946811] r3 : 00000000 r2 : 00000000 r1 : 00000000 r0 : ffff0004
[ 74.946814] Flags: Nzcv IRQs on FIQs on Mode SVC_32 ISA ARM Segment none
[ 74.946817] Control: 10c5787d Table: 3d0ec04a DAC: 00000051
[ 74.946820] CPU: 3 PID: 2005 Comm: testApp Not tainted 4.14.78-i.mx6-master #1
[ 74.946823] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 74.946826] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[ 74.946829] [] (show_stack) from [] (dump_stack+0x94/0xb0)
[ 74.946833] [] (dump_stack) from [] (show_regs+0x1c/0x20)
[ 74.946835] [] (show_regs) from [] (nmi_cpu_backtrace+0xb4/0xc4)
[ 74.946840] [] (nmi_cpu_backtrace) from [] (handle_IPI+0xb0/0x330)
[ 74.946842] [] (handle_IPI) from [] (gic_handle_irq+0xa0/0xb0)
[ 74.946846] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 74.946848] Exception stack(0xed269d10 to 0xed269d58)
[ 74.946852] 9d00: ffff0004 00000000 00000000 00000000
[ 74.946855] 9d20: 00000000 00000000 00000000 00000000 ed403880 ec5da54c befcaa68 ed269d9c
[ 74.946857] 9d40: ed269d70 ed269d60 c08055fc c011852c 80030013 ffffffff
[ 74.946861] [] (__irq_svc) from [] (__arm_smccc_smc+0x10/0x20)
[ 74.946864] [] (__arm_smccc_smc) from [] (optee_smccc_smc+0x48/0x50)
[ 74.946868] [] (optee_smccc_smc) from [] (optee_do_call_with_arg+0xa4/0x118)
[ 74.946871] [] (optee_do_call_with_arg) from [] (optee_invoke_func+0xf4/0x164)
[ 74.946874] [] (optee_invoke_func) from [] (tee_ioctl+0x8a0/0x1104)
[ 74.946876] [] (tee_ioctl) from [] (do_vfs_ioctl+0x3e0/0x948)
[ 74.946879] [] (do_vfs_ioctl) from [] (SyS_ioctl+0x44/0x70)
[ 74.946883] [] (SyS_ioctl) from [] (ret_fast_syscall+0x0/0x54)

Sometime I get an network transmit timeout:

Network stacktrace -
[ 77.048691] -----------[ cut here ]-----------
[ 77.053382] WARNING: CPU: 1 PID: 0 at net/sched/sch_generic.c:465 dev_watchdog+0x264/0x280
[ 77.061778] NETDEV WATCHDOG: eth1 (qcaspi): transmit queue 0 timed out
[ 77.068378] Modules linked in: qcaspi qca_7k_common [last unloaded: qca_7k_common]
[ 77.076141] CPU: 1 PID: 0 Comm: swapper/1 Not tainted 4.19.100-i.mx6-master #1
[ 77.083387] Hardware name: Freescale i.MX6 Quad/DualLite (Device Tree)
[ 77.089965] [] (unwind_backtrace) from [] (show_stack+0x20/0x24)
[ 77.097731] [] (show_stack) from [] (dump_stack+0xc8/0xe4)
[ 77.104981] [] (dump_stack) from [] (__warn.part.0+0xb8/0xe4)
[ 77.112479] [] (__warn.part.0) from [] (warn_slowpath_fmt+0x70/0x98)
[ 77.120584] [] (warn_slowpath_fmt) from [] (dev_watchdog+0x264/0x280)
[ 77.128795] [] (dev_watchdog) from [] (call_timer_fn+0x48/0x194)
[ 77.136555] [] (call_timer_fn) from [] (expire_timers+0xd4/0x148)
[ 77.144399] [] (expire_timers) from [] (run_timer_softirq+0xac/0x1e4)
[ 77.152590] [] (run_timer_softirq) from [] (__do_softirq+0xf8/0x384)
[ 77.160700] [] (__do_softirq) from [] (irq_exit+0xc8/0x118)
[ 77.168028] [] (irq_exit) from [] (__handle_domain_irq+0x68/0xc4)
[ 77.175870] [] (__handle_domain_irq) from [] (gic_handle_irq+0x58/0xb0)
[ 77.184230] [] (gic_handle_irq) from [] (__irq_svc+0x6c/0x90)
[ 77.191716] Exception stack(0xec187f38 to 0xec187f80)
[ 77.196771] 7f20: 00072e04 00000000
[ 77.204955] 7f40: eedf73cc c0120580 ec186000 00000001 c1004d30 c1004d78 c10b4329 c0d1c44c
[ 77.213138] 7f60: 00000000 ec187f94 ec187f98 ec187f88 c010a784 c010a788 600d0013 ffffffff
[ 77.221336] [] (__irq_svc) from [] (arch_cpu_idle+0x40/0x4c)
[ 77.228749] [] (arch_cpu_idle) from [] (default_idle_call+0x30/0x3c)
[ 77.236870] [] (default_idle_call) from [] (do_idle+0xd8/0x168)
[ 77.244541] [] (do_idle) from [] (cpu_startup_entry+0x28/0x2c)
[ 77.252128] [] (cpu_startup_entry) from [] (secondary_start_kernel+0x15c/0x1a4)
[ 77.261187] [] (secondary_start_kernel) from [<101026ec>] (0x101026ec)
[ 77.268836] --[ end trace f4e46a7109557bf7 ]--
[ 77.273495] qcaspi spi0.0 eth1: Transmit timeout at -22274, latency 156

I suspect since Optee can't handle the watchdog, the kernel detects the stall, but the eth scheduler doesn't and runs in a timeout. Both watchdogs from the imx_wdog.c:115

"/soc/aips-bus@2000000/wdog@20bc000",
"/soc/aips-bus@2000000/wdog@20c0000",

are configured in the unsecure devicetree optee is using. My problem seems to relate to this issue: #1353 which was resolved in 2017.

Can someone give me pointers where I should search for the solution since I have now idea how to processd from here on.

Thank you and have nice weekend
Stanislav

@Emantor
Copy link
Contributor

Emantor commented Nov 21, 2020

For the watchdog you'll want #4116 and for the scheduling stalls this patch if you are running with CONFIG_PREEMPT_NONE. The kernel patch is waiting for s/o to test it, the PR was updated by me but has not been tested on hardware yet.

@ghost
Copy link
Author

ghost commented Nov 24, 2020

@Emantor Thanks for the info. I'll try those patches and close this issue if they work for me.

EDIT:
I've tested you patch and investigated the issue in some more depth:

your patch unfortunately doesn't solve my issue.
Maybe I didn't quite get the optee-os fdt handling:
Without a secure devicetree, bareox passes the address of its devicetree to optee which is located in the nsec memory. So basically, optee-os and barebox use the very same fdt?

The fdt I pass to barebox does define both wd nodes at the location

"/soc/aips-bus@2000000/wdog@20bc000",
"/soc/aips-bus@2000000/wdog@20c0000",

with the compatible flags your patch uses to find those nodes:

wdog1: wdog@20bc000 {
compatible = "fsl,imx6q-wdt", "fsl,imx21-wdt";
...

Also, when I run devinfo in barebox I can see in the tree at the location platform->soc.of->2000000.aips-bus@2000000.of->20bc000.wdog@20bc000.of->wdog1 (and wdog2 respectively)

Whe optee searches for the nodes, in kernel/dt.c:157 _fdt_get_status the fdt_getprop() method is called. Seemingly it finds the correct(?) node and the property status. Despite all efforts, the status stays disabled no matther where I override the status property (status = "okay") in the barebox-fdt . The return values and parameters are of the fdt_getprop call are:
E/TC:0 0 _fdt_get_status:267 PROPERTY NSEC: disabled, offset: 16956, staus: 0, len: 9 (It's a debug message I put there)

Apparently optee-os itself is capable to find nodes in general since prior to the search for the watchdog, I can see these msgs:
`E/TC:0 0 _fdt_get_status:267 PROPERTY NSEC: okay, offset: 10680, staus: 1, len: 5
I/TC: Switching console to device: /soc/aips-bus@2000000/spba-bus@2000000/serial@2020000

I can imagine 3 possibilities for this behavior:

  1. I get the status for the wrong node
  2. The property readout is buggy
  3. optee uses a different fdt or overrides some properties

Do you have an idea how to verify the node location and property offset is correct?

@Emantor
Copy link
Contributor

Emantor commented Dec 10, 2020

I just updated the PR, on my nitrogen6x everything works now.
Devinfo from barebox:

         `-- 20bc000.watchdog@20bc000.of
            `-- wdog0

devinfo wdog0:

barebox@Boundary Devices i.MX6 Quad Nitrogen6x Board:/ devinfo wdog0
Parent: 20bc000.watchdog@20bc000.of
Parameters:
  priority: 100 (type: uint32)
  running: 0 (type: enum) (values: "unknown", "1", "0")
  timeout_max: 128 (type: uint32)
barebox@Boundary Devices i.MX6 Quad Nitrogen6x Board:/

Wdog init from OP-TEE:

D/TC:0 0 call_initcalls:21 level 5 imx_wdog_init()
D/TC:0 0 imx_wdog_base:123 Wdog found at 15464

@github-actions
Copy link

This issue has been marked as a stale issue because it has been open (more than) 30 days with no activity. Remove the stale label or add a comment, otherwise this issue will automatically be closed in 5 days. Note, that you can always re-open a closed issue at any time.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

1 participant