cpufreq: armada-37xx: forbid cpufreq for 1.2 GHz variant #20

Open
erdoukki opened this issue Sep 17, 2021 · 127 comments

Comments

@erdoukki

openwrt/openwrt@f407b2f

How can we contact Marvell to get the needed information?

@pali
Contributor

pali commented Sep 22, 2021

@kostapr: Hi! Could you look at this bug report?

@kostapr

kostapr commented Sep 22, 2021

I have no problem with this patch. BTW, Ken Ma, Igal Liberman and Victor Gu are not with Marvell anymore. For future Armada-related patches, please add Stefan (stefanc@marvell.com) and Nadav (nadavh@marvell.com) to the CC list.
Reviewed-by: Konstantin Porotchkin <kostap@marvell.com>

@robimarko

@kostapr But that patch is not a solution; it's just a hotfix to get the devices booting and not constantly crashing due to voltage issues.
The solution can't be just to disable cpufreq and force the lowest frequency.

@kostapr

kostapr commented Sep 22, 2021

@robimarko I cannot comment on the problem. I personally think that not all 37xx dies are capable of running stably at 1.2 GHz. However, this should be confirmed by the HW design or production team at Marvell. Hopefully @haklai can add more on this matter.

@robimarko

@kostapr Well, that is normal, but the thing is that the ones marked and sold as 1.2 GHz parts are having issues. Personally, I have a lot of those in the field, and they all currently crash if you allow them to scale.
I know that @pali has been trying to get this solved for a while now.

@pali
Contributor

pali commented Sep 22, 2021

There is a part order number, 88F3720-xx-BVB2C120-P123, for the A37xx SoC variant that is designed for 1.2 GHz. This SoC has the marking C120 (speed code) below its Marvell logo.

So @robimarko, could you confirm that you have the right 37xx die, the one designed for 1.2 GHz? If so, @kostapr or @haklai, could you ask the HW design / production team where the issue is?

@robimarko

robimarko commented Sep 22, 2021

@pali I opened one of the ESPRESSObin Ultras I have, and the SoC PN is: 88F3720-A0 C120
Even the stock ATF/WTMI and U-boot see it as a 1.2GHz model.
I am attaching the image as well.
https://imgur.com/A02jhaw

@pali
Contributor

pali commented Sep 22, 2021

@kostapr So @robimarko's SoC above is definitely designed for 1.2 GHz.

@stefanchulski

@pali If you disable the DFS feature and boot at the 1.2 GHz frequency only, do you see any crashes?

@pali
Contributor

pali commented Sep 22, 2021

@stefanchulski Currently I do not have a 1.2 GHz variant of the A3720 SoC.

@robimarko and @erdoukki, could you please run the required tests for @stefanchulski?

@erdoukki
Author

erdoukki commented Sep 22, 2021

Sure, with pleasure, as usual...
Just give me the needed patch file or binary, please.
I will also check the CPU reference of my Ultra.

@robimarko

robimarko commented Sep 22, 2021

@stefanchulski If I am seeing it correctly, it's using 1200MHz by default after booting as the kernel is not scaling it anymore.

[    2.305272] Unsupported CPU frequency 1200 MHz
root@OpenWrt:/sys/devices/system/cpu# cat /sys/kernel/debug/clk/cpu/clk_rate 
1200000000

I need to really stress test it before claiming that it's stable with the WTMI set VDD.
Which in my case is:
SVC REV: 5, CPU VDD voltage: 1.213V

But I have seen samples that use 1.26V as well, and I don't think that the CPUFreq has a way to know this and uses too low voltage for most boards.
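
Since the WTMI-reported VDD differs from board to board, it can help to pull it out of captured boot logs when comparing samples. The following is a minimal sketch, assuming the boot line always has the form quoted above; the function name `parse_svc_line` is hypothetical and not part of any Marvell or OpenWrt tool.

```python
import re

# Matches the WTMI boot line quoted in this thread, e.g.
# "SVC REV: 5, CPU VDD voltage: 1.213V"
# (assumption: other firmware versions print the same format)
SVC_LINE = re.compile(r"SVC REV:\s*(\d+),\s*CPU VDD voltage:\s*(\d+\.\d+)V")


def parse_svc_line(line):
    """Return (svc_rev, vdd_mv), or None if the line does not match."""
    match = SVC_LINE.search(line)
    if match is None:
        return None
    svc_rev = int(match.group(1))
    vdd_mv = round(float(match.group(2)) * 1000)  # volts -> millivolts
    return svc_rev, vdd_mv


print(parse_svc_line("SVC REV: 5, CPU VDD voltage: 1.213V"))
```

Feeding it each board's serial console capture gives a quick per-board list of SVC revision and VDD to set against the stress-test result.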

UPDATE:
Even a couple of seconds of stress testing will crash it, so it's not stable at all:

root@OpenWrt:/# stress --cpu 2 --io 2 --timeout 1h
stress: info: [2444] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd
[   55.174519] ------------[ cut here ]------------
[   55.179312] Kernel BUG at do_undefinstr+0x27c/0x290 [verbose debug info unavailable]
[   55.187300] Internal error: Oops - BUG: 0 [#1] SMP
[   55.192237] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD pppox ppp_generic nf_nat nf_flow_table nf_conntrack ipt_REJECT xt_time xt_tcpudp xt_multiport xt_mark xt_mac xt_limit xt_comment xt_TCPMSS xt_LOG slhc rtc_pcf8563 nf_reject_ipv4 nf_log_g
[   55.250897] CPU: 1 PID: 771 Comm: loop0 Not tainted 5.10.64 #0
[   55.256909] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)
[   55.264088] pstate: 00400005 (nzcv daif +PAN -UAO -TCO BTYPE=--)
[   55.270283] pc : do_undefinstr+0x27c/0x290
[   55.274503] lr : do_undefinstr+0x58/0x290
[   55.278633] sp : ffffffc0110839d0
[   55.282045] x29: ffffffc0110839d0 x28: ffffff8001777300 
[   55.287522] x27: 0000000000000000 x26: 00000000000000e0 
[   55.292999] x25: 0000000000000000 x24: ffffffc010b3f000 
[   55.298476] x23: 0000000060400005 x22: ffffffc0107b68e0 
[   55.303952] x21: 00000000009b4000 x20: 0000000000000000 
[   55.309429] x19: ffffffc011083a40 x18: 0000000000000000 
[   55.314905] x17: 0000000000000000 x16: 0000000000000000 
[   55.320382] x15: 0000000000000000 x14: 0000000000000000 
[   55.325858] x13: 00000000000001f9 x12: 0000000000000040 
[   55.331335] x11: ffffff8001000000 x10: 0000000000000005 
[   55.336812] x9 : 0000000000000001 x8 : 00000000830b65da 
[   55.342288] x7 : 0000000000000005 x6 : ffffffc011083a10 
[   55.347765] x5 : 0000000000000000 x4 : ffffffc010a3fd20 
[   55.353242] x3 : 00000000d5300000 x2 : 0000000000000000 
[   55.358719] x1 : ffffffc010b0b0e8 x0 : 0000000060400005 
[   55.364196] Call trace:
[   55.366716]  do_undefinstr+0x27c/0x290
[   55.370581]  el1_undef+0x2c/0x4c
[   55.373904]  el1_sync_handler+0x8c/0xd0
[   55.377855]  el1_sync+0x88/0x140
[   55.381183]  __hyp_text_end+0x5c/0x77c
[   55.385044]  __wait_for_common+0xe4/0x1e4
[   55.389175]  wait_for_completion_io+0x20/0x30
[   55.393667]  submit_bio_wait+0x4c/0x64
[   55.397532]  blkdev_issue_flush+0x74/0x94
[   55.401664]  blkdev_fsync+0x2c/0x4c
[   55.405258]  vfs_fsync+0x3c/0x7c
[   55.408587]  loop_queue_work+0x368/0x97c
[   55.412632]  kthread_worker_fn+0x100/0x1d0
[   55.416852]  loop_kthread_worker_fn+0x20/0x30
[   55.421341]  kthread+0x124/0x12c
[   55.424665]  ret_from_fork+0x10/0x3c
[   55.428352] Code: d5033fdf d51b4220 17ffffcf a9025bf5 (d4210000) 
[   55.434635] ---[ end trace 8fab771838008e64 ]---
[   55.439393] Kernel panic - not syncing: Oops - BUG: Fatal exception
[   55.445855] SMP: stopping secondary CPUs
[   55.449900] Kernel Offset: disabled
[   55.453494] CPU features: 0x0000002,00002008
[   55.457891] Memory Limit: none
[   55.461037] Rebooting in 3 seconds..

@pali
Contributor

pali commented Sep 22, 2021

My guess is that the wtmi firmware is missing some init sequence related to CPU voltage configuration. See the function init_avs(): https://github.com/MarvellEmbeddedProcessors/A3700-utils-marvell/blob/master/wtmi/sys_init/avs.c

The array otp_data[] is filled with OTP values from the SoC itself, but some bits are not used. They are non-zero, so they carry some value, but there is no documentation of what they mean, e.g. the low 8 bits in otp_data[OTP_DATA_SVC_REV_ID]. Relevant header file: https://github.com/MarvellEmbeddedProcessors/A3700-utils-marvell/blob/master/wtmi/sys_init/avs.h

@pali
Contributor

pali commented Sep 22, 2021

> But I have seen samples that use 1.26V as well, and I don't think that the CPUFreq has a way to know this and uses too low voltage for most boards.

The CPUFreq driver armada-37xx-cpufreq.c knows this: it gets the value from OTP (indirectly, by reading a register that is filled by the wtmi code, which in turn fills it from OTP). The driver uses the following Marvell algorithm:

  • CPU max_freq uses vdd max(OTP, 1000mV)
  • CPU max_freq/div1 uses vdd max(OTP-100mV, 1000mV)
  • CPU max_freq/div2 uses vdd max(OTP-150mV, 1000mV)
  • CPU max_freq/div3 uses vdd max(OTP-150mV, 1000mV)

For max_freq 1200 MHz the divisors are div1=2, div2=4, div3=6; for 1000 MHz they are div1=2, div2=4, div3=5; and for 800 MHz they are div1=2, div2=3, div3=4.
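
The rule described in this comment can be sketched as a small table calculation. This is only an illustration of the description given here, not code from armada-37xx-cpufreq.c: the names `DIVISORS`, `OFFSETS_MV` and `load_levels` are hypothetical, and the 1213 mV input assumes that the WTMI-reported 1.213 V is the value the driver ends up reading.

```python
# Sketch of the voltage rule described above; all names are illustrative.
MIN_VDD_MV = 1000  # floor applied to every load level

# Divisors for L1..L3 per maximum frequency (MHz), as listed in the comment.
DIVISORS = {
    1200: (2, 4, 6),
    1000: (2, 4, 5),
    800: (2, 3, 4),
}

# Offset subtracted from the OTP voltage for L0..L3, in mV.
OFFSETS_MV = (0, 100, 150, 150)


def load_levels(max_freq_mhz, otp_vdd_mv):
    """Return [(level, freq_mhz, vdd_mv)] for load levels L0..L3."""
    divisors = (1,) + DIVISORS[max_freq_mhz]  # L0 runs undivided
    return [
        (f"L{i}", max_freq_mhz / div, max(otp_vdd_mv - offset, MIN_VDD_MV))
        for i, (div, offset) in enumerate(zip(divisors, OFFSETS_MV))
    ]


for level, freq, vdd in load_levels(1200, 1213):
    print(f"{level}: {freq:.0f} MHz at {vdd} mV")
```

With a low OTP value such as 1097 mV on a 1000 MHz part, every level below L0 is clamped to the 1000 mV floor, so the subtracted constants only matter on samples with a higher OTP voltage.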

But I do not know the source of the above Marvell algorithm and these constants (especially the 100mV and 150mV subtracted for div1/2/3). I was not able to find this documented in either the Armada 3720 Functional or Hardware specification.

And I suspect that these 100mV and 150mV constants are incorrect too, as I had to make a small adjustment in the cpufreq driver for the CPU with max_freq=1GHz.

I was told that Marvell reproduced this issue on their 3720 development board last year and was preparing some fix for it, including a documentation/errata update. But I have not seen anything.

So it means that somebody at Marvell must have been aware of this issue and should know more details about it (or it was somebody who is not with Marvell anymore, as @kostapr wrote).

Also look at the Armada 3720 Errata document; it has long documented an issue related to the 1.2GHz mode.

@pali
Contributor

pali commented Sep 24, 2021

@stefanchulski: Do you need some more tests? Or is the crash confirmation above, with the log from @robimarko, enough?

@stefanchulski

@pali So is the issue related to cpufreq, as described in the patch, or do you have an issue with 1.2GHz itself?

@pali
Contributor

pali commented Sep 25, 2021

@stefanchulski It seems to be both. There is an issue related to cpufreq, as described on the mailing list, and @robimarko has problems with 1.2GHz, as described in post #20 (comment).

@stefanchulski

@pali Are all other frequencies stable? Is it specific to one board, or does it occur on many boards?

@pali
Contributor

pali commented Sep 25, 2021

It occurs on many boards. The problem occurs either when running at the L0 load level (i.e. without a divisor) or when switching from the L1 load level (which uses div1) to L0.

@pali
Contributor

pali commented Sep 25, 2021

After a lot of experiments we somehow worked around this crash on the 1GHz variant of the A3720 with this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=d118ac2062b5b8331c8768ac81e016617e0996ee

But this fix/workaround does not work for the 1.2GHz variant of the A3720, and as @robimarko wrote, it still crashes.

@stefanchulski

I am not familiar with all these AVS configurations on A37XX. But hardcoded values look strange; should you take into account chip skew and calculate AVS from SVC?

@robimarko

Yes, I only have 1.2GHz A3720 models and for me, all of the boards I tried are crashing.
Only if I crank the minimal voltage up a lot do they tend to be stable, but it's way too pre-silicon sample-specific.

@pali
Contributor

pali commented Sep 25, 2021

> But hardcoded values look strange

Yes, but we have absolutely no idea what is happening here. And if you look at the change dc33b62 referenced from the commit above, those hardcoded values were written by Marvell developers...

> should you take into account chip skew and calculate AVS from SVC?

Probably, but we have no idea how... Documentation on this topic is missing. I have not seen any SVC documentation, so this is probably something that exists only internally at Marvell.

@erdoukki
Author

erdoukki commented Sep 26, 2021

> @stefanchulski If I am seeing it correctly, it's using 1200MHz by default after booting as the kernel is not scaling it anymore.

Same for me...

SVC REV: 5, CPU VDD voltage: 1.237V
Model: gti cellular cpe board
       CPU     1200 [MHz]
       L2      1200 [MHz]
       NB AXI  300 [MHz]
       SB AXI  250 [MHz]
       DDR     750 [MHz]
[    2.155142] Unsupported CPU frequency 1200 MHz                                                                                                      
root@OpenWrt:/# uname -ar                                                                                                                                                                           
Linux OpenWrt 5.10.64 #0 SMP Sun Sep 26 07:10:17 2021 aarch64 GNU/Linux                                                                                                                             
root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate                                                                                                 
1200000000                                                                                                                                             

This one crashed quickly...
I will redo the stress test and post the results!

@erdoukki
Author

erdoukki commented Sep 26, 2021

another ULTRA

SVC REV: 5, CPU VDD voltage: 1.225V
[    2.228075] Unsupported CPU frequency 1200 MHz
root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate                                                                                                                                              
1200000000                                                                                                                                                                                          
root@OpenWrt:~# uname -ar
Linux OpenWrt 5.4.143 #0 SMP Tue Aug 31 22:20:08 2021 aarch64 GNU/Linux
OPENWRT_RELEASE="OpenWrt 21.02.0 r16279-5cc0535800"
root@OpenWrt:/# stress --cpu 2 --io 2 --timeout 1h                                                                                                                                                  
stress: info: [2800] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd         
stress: info: [2800] successful run completed in 3600s                                                                                                                                                                                                                                                                          

UPDATE : OK

@erdoukki
Author

erdoukki commented Sep 26, 2021

More from a third ULTRA board:

root@ultra:~# uname -ar
Linux ultra 5.4.124 #0 SMP Sun Jun 13 22:02:19 2021 aarch64 GNU/Linux
TIM-1.0                                                                                                                                                                                             
mv_ddr-devel-g80be893d2b-d DDR4 16b 1GB 1CS                                                                                                                                                         
WTMI-devel-18.12.1-2efdb10f                                                                                                                                                                         
WTMI: system early-init                                                                                                                                                                             
SVC REV: 5, CPU VDD voltage: 1.097V                                                                                                                                                                 
Setting clocks: CPU 1000 MHz, DDR 800 MHz                                                                                                                                                           
CZ.NIC's Armada 3720 Secure Firmware v2021.04.09 (Aug  8 2021 14:26:28)                                                                                                                             
Running on ESPRESSObin Ultra                                                                                                                                                                        
OPENWRT_RELEASE="OpenWrt 21.02.0-rc3 r16172-2aba3e9784"

from lscpu :

CPU max MHz:                     1000.0000
CPU min MHz:                     200.0000

pretty stable :

root@ultra:~# stress --cpu 2 --io 2 --timeout 1h
stress: info: [14832] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd

UPDATE : OK

@erdoukki
Author

erdoukki commented Sep 26, 2021

More, this time from my fourth ULTRA board:

root@ULTRA-5G:~# uname -ar
Linux ULTRA-5G 5.4.137 #0 SMP Sat Jul 31 17:21:01 2021 aarch64 GNU/Linux
OPENWRT_RELEASE="OpenWrt 21.02.0-rc4 r16256-2d5ee43dc6"
SVC REV: 5, CPU VDD voltage: 1.202V                                                                                                                                                                 

from lscpu

CPU max MHz:                     1200.0000
CPU min MHz:                     200.0000

pretty stable :

root@ULTRA-5G:~# stress --cpu 2 --io 2 --timeout 1h
stress: info: [22073] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd
stress: info: [22073] successful run completed in 3600s

UPDATE : OK

@pali
Contributor

pali commented Sep 26, 2021

Due to bugs in the a37xx CPU driver, the reported CPU frequency (e.g. by lscpu) could be incorrect. So the best check for the (maximal) CPU frequency is the mhz userspace tool from https://github.com/wtarreau/mhz, which reports the correct value even when the kernel reports it incorrectly.

@robimarko

@pali It's running at 1200MHz, as that is set by WTMI, and since CPUFreq is blacklisted for the 1200MHz model, the kernel won't touch it.

root@OpenWrt:/# mhz
count=516515 us50=21364 us250=106829 diff=85465 cpu_MHz=1208.717
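
The fields in this output appear to be related as diff = us250 - us50 and cpu_MHz = count * 200 / diff. That relation is inferred from the numbers printed here, not taken from the tool's documentation, so treat the sketch below as a consistency check only; the function name `mhz_from_line` is hypothetical.

```python
import re


def mhz_from_line(line):
    """Recompute cpu_MHz from the count/us50/us250 fields of one mhz output line.

    Assumption: the two timed loops differ by 200 cycles per iteration,
    which is what the figures quoted in this thread suggest.
    """
    fields = {key: float(value) for key, value in re.findall(r"(\w+)=([\d.]+)", line)}
    diff_us = fields["us250"] - fields["us50"]
    return fields["count"] * 200 / diff_us


line = "count=516515 us50=21364 us250=106829 diff=85465 cpu_MHz=1208.717"
print(f"{mhz_from_line(line):.3f}")
```

Because the result is derived from elapsed time rather than from what the clock framework reports, it does not depend on the cpufreq driver's own bookkeeping.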

@erdoukki That's the issue: depending on the exact board you test, some are stable with the WTMI-set voltages while others are not; for me, most of them will crash.
And if CPUFreq is enabled, they crash much faster, usually during the boot itself, so it's a bug for sure.

@erdoukki
Author

@pali @robimarko
Any advice to help with tests on the official SDK10 is welcome...

@erdoukki
Author

erdoukki commented Jan 9, 2022

Hello all,
Have a nice and happy new year...

I am seeing some new issues around the CPU bug on the 37xx.
One of my ESPRESSObin Ultras, which is mostly stable, reboots under heavy CPU and network load.

I may look at it more deeply if needed, because it is one of my working ULTRAs, and it reboots only under CPU load...
I use it as a 4G mobile router/gateway.

Addendum: I still have to start testing the official Marvell SDK, but that is postponed for now...
STAY TUNED

@erdoukki
Author

erdoukki commented Jan 12, 2022

FROM SDK10 (SDK-10.3.9.0) and OpenWrt 21.02.0, r16279-5cc0535800
compile the release version with:

compile.sh a37xx_espressobin_1000_800 -r SDK-10.3.9.0

Booting an ESPRESSObin-ULTRA (one of my most unstable ones; checked before the tests and confirmed to still be CRASHING only a few seconds after boot in OpenWrt with the default kernel)

[    0.000000] Linux version 5.4.143 (builder@buildhost) (gcc version 8.4.0 (OpenWrt GCC 8.4.0 r16279-5cc0535800)) #0 SMP Tue Aug 31 22:20:08 2021              
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Ultra Board              
root@OpenWrt:/# dmesg | grep CPU              
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]              
[    0.000000] Detected VIPT I-cache on CPU0              
[    0.000000] CPU features: detected: GIC system register CPU interface              
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1              
[    0.000000] rcu:     RCU restricting CPUs from NR_CPUS=4 to nr_cpu_ids=2.              
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000d1d40000              
[    0.051306] smp: Bringing up secondary CPUs ...              
[    0.056287] Detected VIPT I-cache on CPU1              
[    0.056315] GICv3: CPU1: found redistributor 1 region 0:0x00000000d1d60000              
[    0.056348] CPU1: Booted secondary processor 0x0000000001 [0x410fd034]              
[    0.056427] smp: Brought up 1 node, 2 CPUs              
[    0.083316] CPU features: detected: 32-bit EL0 Support              
[    0.088607] CPU features: detected: CRC32 instructions              
[    0.093923] CPU features: emulated: Privileged Access Never (PAN) using TTBR0_EL1 switching              
[    0.102496] CPU: All CPU(s) started at EL2              
[    0.889208] cacheinfo: Unable to detect cache hierarchy for CPU 0              
[    2.117397] Unsupported CPU frequency 1200 MHz              

Then after few seconds...

[  269.289957] ------------[ cut here ]------------              
[  269.294726] bdi-block not registered              
[  269.298425] WARNING: CPU: 1 PID: 937 at 0xffffffc010211bec              
[  269.304077] Modules linked in: pppoe ppp_async iptable_nat xt_state xt_nat xt_conntrack xt_REDIRECT xt_MASQUERADE xt_FLOWOFFLOAD xt_CT pppox ppp_generic nf_nat nf_flow_table_hw nf_flow_tableg
[  269.356396] CPU: 1 PID: 937 Comm: ash Not tainted 5.4.143 #0              
[  269.362226] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)              
[  269.369403] pstate: 80400005 (Nzcv daif +PAN -UAO)              
[  269.374338] pc : 0xffffffc010211bec              
[  269.377929] lr : 0xffffffc010211bec              
[  269.381520] sp : ffffffc010e73d60              
[  269.384931] x29: ffffffc010e73d60 x28: ffffff803e0521c0               
[  269.390405] x27: 0000000000000000 x26: 0000000040000001               
[  269.395879] x25: 0000000000000001 x24: 0000000000000001               
[  269.401352] x23: 0000000000000000 x22: 0000000000000000               
[  269.406826] x21: 0000000000000001 x20: ffffff803cdb6878               
[  269.412300] x19: ffffff803e721d58 x18: 0000000000000000               
[  269.417773] x17: 0000000000000000 x16: 0000000000000000               
[  269.423247] x15: 0000000000000000 x14: ffffffc0109a2a10               
[  269.428721] x13: 0000000000000000 x12: ffffffc0109a2000               
[  269.434195] x11: ffffffc010946000 x10: 0000000000000010               
[  269.439668] x9 : 0000000000000000 x8 : 6465726574736967               
[  269.445142] x7 : 657220746f6e206b x6 : 0000000000000001               
[  269.450615] x5 : 0000000000000000 x4 : 0000000000000001               
[  269.456089] x3 : 0000000000000007 x2 : 0000000000000006               
[  269.461562] x1 : 0000000000000007 x0 : 0000000000000018               
[  269.467037] Call trace:              
[  269.469554]  0xffffffc010211bec              
[  269.472785]  0xffffffc0101fdbac              
[  269.476017]  0xffffffc010200bcc              
[  269.479249]  0xffffffc0101f4318              
[  269.482481]  0xffffffc0101f5a04              
[  269.485712]  0xffffffc0101f5c2c              
[  269.488945]  0xffffffc010094fac              
[  269.492178]  0xffffffc010083748              
[  269.495411] ---[ end trace de9ce09de484892c ]---              

Now entering the SDK10 tests !

Just booting with SDK10 Image (and modules) in OpenWrt 21.02.0...

Marvell>> setenv bootargs $console root=/dev/mmcblk0p2 rw rootwait net.ifnames=0 biosdevname=0  $extra_params usb-storage.quirks=$usbstoragequirks                                                                                                             
Marvell>> load usb 0 $kernel_addr_r ULTRA-SDK10/Image                                                                                                                                                                                                          
Marvell>> booti $kernel_addr_r - $fdt_addr_r                                                                                                             
## Flattened Device Tree blob at 06f00000                                                                                                             
   Booting using the fdt blob at 0x6f00000                                                                                                             
   Using Device Tree in place at 0000000006f00000, end 0000000006f05fbf                                                                                                             
root@OpenWrt:/# uname -ar                                                                                                                                                           
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux                                                                                     
root@OpenWrt:/# stress-ng --matrix 0 -t 10m
root@OpenWrt:/# dmesg | grep CPU                                                                                                                                                    
[    0.000000] Booting Linux on physical CPU 0x0                                                                                                                                    
[    0.000000] Boot CPU: AArch64 Processor [410fd034]                                                                                                                               
[    0.000000] Detected VIPT I-cache on CPU0                                                                                                                                        
[    0.000000] CPU features: enabling workaround for ARM erratum 845719                                                                                                             
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration                                                                                           
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1                                                                                                           
[    0.000000]  RCU restricting CPUs from NR_CPUS=96 to nr_cpu_ids=2.                                                                                                               
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000d1d40000                                                                                                       
[    0.000000] NO_HZ: Full dynticks CPUs: 1.                                                                                                                                        
[    0.000000]  Note: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.                                                                                                      
[    0.000000]  Offload RCU callbacks from CPUs: 1.                                                                                                                                 
[    0.116792] smp: Bringing up secondary CPUs ...                                                                                                                                  
[    0.149654] Detected VIPT I-cache on CPU1                                                                                                                                        
[    0.149682] GICv3: CPU1: found redistributor 1 region 0:0x00000000d1d60000                                                                                                       
[    0.149714] CPU1: Booted secondary processor [410fd034]                                                                                                                          
[    0.149830] smp: Brought up 1 node, 2 CPUs                                                                                                                                       
[    0.175386] CPU features: detected: GIC system register CPU interface                                                                                                            
[    0.182026] CPU features: detected: 32-bit EL0 Support                                                                                                                           
[    0.187456] CPU: All CPU(s) started at EL2                                                                                                                                       
[    1.880786] kvm [1]: GIC system register CPU interface enabled                                                                                                                   
[    2.032688] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330                                                            
[    2.046744] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.207-10.3.9.0-2 #1                                                                                                     
[    2.245749] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330                                                            
[    2.259806] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.207-10.3.9.0-2 #1                                                                                       
[    2.460047] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330                                                            
[    2.474104] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.207-10.3.9.0-2 #1                                                                                       
[    3.110841] cacheinfo: Unable to detect cache hierarchy for CPU 0                                                                                                                

Stressed with the stress-ng command shown above (stress-ng --matrix 0 -t 10m).

Crash (but not reset):

[    2.027996] ------------[ cut here ]------------                                                                                                                                 
[    2.032688] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330                                                            
[    2.043604] Modules linked in:                                                                                                                                                   
[    2.046744] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 4.14.207-10.3.9.0-2 #1                                                                                                     
[    2.053997] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                                                      
[    2.061166] task: ffff80002e9b8000 task.stack: ffff80002e9b4000                                                                                                                  
[    2.067260] pc : mvebu_comphy_probe+0x2f8/0x330                                                                                                                                  
[    2.071919] lr : mvebu_comphy_probe+0x2f8/0x330                                                                                                                                  
[    2.076576] sp : ffff80002e9b7be0 pstate : 60000045                                                                                                                              
[    2.081594] x29: ffff80002e9b7be0 x28: ffff80002e005800                                                                                                                          
[    2.087059] x27: ffff80002e0b9b80 x26: ffff000008ecf8a8                                                                                                                          
[    2.092525] x25: 00000000014080c0 x24: ffff000008ecfbb0                                                                                                                          
[    2.097990] x23: ffff80002e0b9680 x22: ffff80002eb51810                                                                                                                          
[    2.103456] x21: ffff80002eb51800 x20: ffff0000091665a8                                                                                                                          
[    2.108922] x19: ffff80002ffeb300 x18: 0000000000000010                                                                                                                          
[    2.114387] x17: 0000000000000003 x16: 0000000000000000                                                                                                                          
[    2.119853] x15: ffffffffffffffff x14: 0000000000000000                                                                                                                          
[    2.125318] x13: 0000000000000000 x12: 0000000078696cc0                                                                                                                          
[    2.130784] x11: 0000000000000000 x10: 00000000000009f0                                                                                                                          
[    2.136250] x9 : ffff80002e9b7950 x8 : ffff80002e9b8a50                                                                                                                          
[    2.141715] x7 : 0000000000000400 x6 : 0000000000000108                                                                                                                          
[    2.147181] x5 : 0000000000000002 x4 : 0000000000000001                                                                                                                          
[    2.152646] x3 : fffffffffffffffe x2 : ffff80002e9b7910                                                                                                                          
[    2.158112] x1 : ffff000009468ae0 x0 : 0000000000000032                                                                                                                          
[    2.163578] Call trace:                                                                                                                                                          
[    2.166089]  mvebu_comphy_probe+0x2f8/0x330                                                                                                                                      
[    2.170393]  platform_drv_probe+0x58/0xc0                                                                                                                                        
[    2.174511]  driver_probe_device+0x248/0x2e0                                                                                                                                     
[    2.178901]  __driver_attach+0xbc/0xc0                                                                                                                                           
[    2.182757]  bus_for_each_dev+0x4c/0xa0                                                                                                                                          
[    2.186696]  driver_attach+0x20/0x30                                                                                                                                             
[    2.190369]  bus_add_driver+0x1b0/0x220                                                                                                                                          
[    2.194312]  driver_register+0x60/0x100                                                                                                                                          
[    2.198255]  __platform_driver_register+0x40/0x50                                                                                                                                
[    2.203098]  mvebu_comphy_driver_init+0x18/0x20                                                                                                                                  
[    2.207755]  do_one_initcall+0x38/0x130                                                                                                                                          
[    2.211697]  kernel_init_freeable+0x184/0x220                                                                                                                                    
[    2.216178]  kernel_init+0x10/0x110                                                                                                                                              
[    2.219759]  ret_from_fork+0x10/0x24                                                                                                                                             
[    2.223436] ---[ end trace 97c6934e1fd8503a ]---                                                                                                                                 
[    2.228520] mvebu-comphy d0018300.phy: RELYING ON BOTLOADER SETTINGS                                                                                                             
[    2.235043] mvebu-comphy d0018300.phy: firmware updated needed                                                                                                                   
[    2.241036] ------------[ cut here ]------------                                                                                                                                 
[    2.245749] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330                                                            
[    2.256665] Modules linked in:                                                                                                                                                   
[    2.259806] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.207-10.3.9.0-2 #1                                                                                       
[    2.268313] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                                                      
[    2.275482] task: ffff80002e9b8000 task.stack: ffff80002e9b4000                                                                                                                  
[    2.281576] pc : mvebu_comphy_probe+0x2f8/0x330                                                                                                                                  
[    2.286235] lr : mvebu_comphy_probe+0x2f8/0x330                                                                                                                                  
[    2.290892] sp : ffff80002e9b7be0 pstate : 40000045                                                                                                                              
[    2.295910] x29: ffff80002e9b7be0 x28: ffff80002e005400                                                                                                                          
[    2.301375] x27: ffff80002e0b9180 x26: ffff000008ecf8a8                                                                                                                          
[    2.306841] x25: 00000000014080c0 x24: ffff000008ecfbb0                                                                                                                          
[    2.312306] x23: ffff80002e0b9680 x22: ffff80002eb51810                                                                                                                          
[    2.317772] x21: ffff80002eb51800 x20: ffff0000091665a8                                                                                                                          
[    2.323238] x19: ffff80002ffeb580 x18: 0000000000000010                                                                                                                          
[    2.328703] x17: 0000000000000004 x16: 0000000000000000                                                                                                                          
[    2.334169] x15: ffffffffffffffff x14: ffff000089454587                                                                                                                          
[    2.339634] x13: ffff000009454595 x12: ffff000009389000                                                                                                                          
[    2.345100] x11: 0000000005f5e0ff x10: ffff80002e9b7910                                                                                                                          
[    2.350566] x9 : ffff000008726f40 x8 : 000000000000000d                                                                                                                          
[    2.356031] x7 : 776d726966203a79 x6 : 00000000000000e5                                                                                                                          
[    2.361497] x5 : 0000000000000000 x4 : 0000000000000000                                                                                                                          
[    2.366962] x3 : ffffffffffffffff x2 : ffff0000093897e0                                                                                                                          
[    2.372428] x1 : ffff80002e9b8000 x0 : 0000000000000032                                                                                                                          
[    2.377894] Call trace:                                                                                                                                                          
[    2.380405]  mvebu_comphy_probe+0x2f8/0x330                                                                                                                                      
[    2.384708]  platform_drv_probe+0x58/0xc0                                                                                                                                        
[    2.388827]  driver_probe_device+0x248/0x2e0                                                                                                                                     
[    2.393216]  __driver_attach+0xbc/0xc0                                                                                                                                           
[    2.397073]  bus_for_each_dev+0x4c/0xa0                                                                                                                                          
[    2.401012]  driver_attach+0x20/0x30                                                                                                                                             
[    2.404685]  bus_add_driver+0x1b0/0x220                                                                                                                                          
[    2.408628]  driver_register+0x60/0x100                                                                                                                                          
[    2.412570]  __platform_driver_register+0x40/0x50                                                                                                                                
[    2.417412]  mvebu_comphy_driver_init+0x18/0x20                                                                                                                                  
[    2.422070]  do_one_initcall+0x38/0x130                                                                                                                                          
[    2.426013]  kernel_init_freeable+0x184/0x220                                                                                                                                    
[    2.430492]  kernel_init+0x10/0x110                                                                                                                                              
[    2.434075]  ret_from_fork+0x10/0x24                                                                                                                                             
[    2.437747] ---[ end trace 97c6934e1fd8503b ]---                                                                                                                                 
[    2.442820] mvebu-comphy d0018300.phy: RELYING ON BOTLOADER SETTINGS                                                                                                             
[    2.449319] mvebu-comphy d0018300.phy: firmware updated needed                                                                                                                   
[    2.455364] ------------[ cut here ]------------                                                                                                                                 
[    2.460047] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330                                                            
[    2.470964] Modules linked in:                                                                                                                                                   
[    2.474104] CPU: 0 PID: 1 Comm: swapper/0 Tainted: G        W       4.14.207-10.3.9.0-2 #1                                                                                       
[    2.482612] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                                                      
[    2.489781] task: ffff80002e9b8000 task.stack: ffff80002e9b4000                                                                                                                  
[    2.495875] pc : mvebu_comphy_probe+0x2f8/0x330                                                                                                                                  
[    2.500534] lr : mvebu_comphy_probe+0x2f8/0x330                                                                                                                                  
[    2.505191] sp : ffff80002e9b7be0 pstate : 60000045                                                                                                                              
[    2.510208] x29: ffff80002e9b7be0 x28: ffff80002e005000                                                                                                                          
[    2.515674] x27: ffff80002dcb4380 x26: ffff000008ecf8a8                                                                                                                          
[    2.521139] x25: 00000000014080c0 x24: ffff000008ecfbb0                                                                                                                          
[    2.526605] x23: ffff80002e0b9680 x22: ffff80002eb51810                                                                                                                          
[    2.532071] x21: ffff80002eb51800 x20: ffff0000091665a8                                                                                                                          
[    2.537536] x19: ffff80002ffeb800 x18: 0000000000000010                                                                                                                          
[    2.543002] x17: 0000000000000003 x16: 0000000000000000                                                                                                                          
[    2.548467] x15: ffffffffffffffff x14: 0000000000000000                                                                                                                          
[    2.553933] x13: 0000000000000000 x12: 0000000091e01c30                                                                                                                          
[    2.559398] x11: 0000000000000000 x10: 00000000000009f0                                                                                                                          
[    2.564864] x9 : ffff80002e9b7950 x8 : ffff80002e9b8a50                                                                                                                          
[    2.570330] x7 : 0000000000000400 x6 : 00000000000002b0                                                                                                                          
[    2.575795] x5 : 0000000000000002 x4 : 0000000000000001                                                                                                                          
[    2.581261] x3 : fffffffffffffffe x2 : ffff80002e9b7910                                                                                                                          
[    2.586726] x1 : ffff000009468ae0 x0 : 0000000000000032                                                                                                                          
[    2.592193] Call trace:                                                                                                                                                          
[    2.594704]  mvebu_comphy_probe+0x2f8/0x330                                                                                                                                      
[    2.599007]  platform_drv_probe+0x58/0xc0                                                                                                                                        
[    2.603126]  driver_probe_device+0x248/0x2e0                                                                                                                                     
[    2.607515]  __driver_attach+0xbc/0xc0                                                                                                                                           
[    2.611372]  bus_for_each_dev+0x4c/0xa0                                                                                                                                          
[    2.615310]  driver_attach+0x20/0x30                                                                                                                                             
[    2.618984]  bus_add_driver+0x1b0/0x220                                                                                                                                          
[    2.622926]  driver_register+0x60/0x100                                                                                                                                          
[    2.626869]  __platform_driver_register+0x40/0x50                                                                                                                                
[    2.631711]  mvebu_comphy_driver_init+0x18/0x20                                                                                                                                  
[    2.636369]  do_one_initcall+0x38/0x130                                                                                                                                          
[    2.640311]  kernel_init_freeable+0x184/0x220                                                                                                                                    
[    2.644792]  kernel_init+0x10/0x110                                                                                                                                              
[    2.648374]  ret_from_fork+0x10/0x24                                                                                                                                             
[    2.652046] ---[ end trace 97c6934e1fd8503c ]---                                                                                                                                 

The PANICs look like they come from something else:

[ 2.460047] WARNING: CPU: 0 PID: 1 at drivers/phy/marvell/phy-mvebu-cp110-comphy.c:536 mvebu_comphy_probe+0x2f8/0x330
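One way to double-check that the log holds only WARNING traces and no real panic is to count both markers in a saved copy of the boot log. This is a minimal sketch: the function name `count_markers` and the log file name are made up for illustration, and it assumes a real panic would show the usual `Kernel panic` line.

```shell
#!/bin/sh
# Count WARNING traces and real panics in a saved boot log.
# count_markers is a hypothetical helper, not part of any SDK.
count_markers() {
    log="$1"
    # grep -c prints 0 and exits non-zero when nothing matches, hence "|| true".
    warnings=$(grep -c 'WARNING: CPU:' "$log" || true)
    panics=$(grep -c 'Kernel panic' "$log" || true)
    echo "warnings=$warnings panics=$panics"
}

# Example (hypothetical file name):
#   dmesg > /tmp/boot.log
#   count_markers /tmp/boot.log
```

If `panics` stays at 0, the traces above are only the comphy probe warnings repeating, not a crash.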

More information from the working SDK10 kernel:

root@OpenWrt:~# lscpu                                                                                                                                                               
Architecture:                    aarch64                                                                                                                                            
CPU op-mode(s):                  32-bit, 64-bit                                                                                                                                     
Byte Order:                      Little Endian                                                                                                                                      
CPU(s):                          2                                                                                                                                                  
On-line CPU(s) list:             0,1                                                                                                                                                
Thread(s) per core:              1                                                                                                                                                  
Core(s) per socket:              2                                                                                                                                                  
Socket(s):                       1                                                                                                                                                  
NUMA node(s):                    1                                                                                                                                                  
Vendor ID:                       ARM                                                                                                                                                
Model:                           4                                                                                                                                                  
Model name:                      Cortex-A53                                                                                                                                         
Stepping:                        r0p4                                                                                                                                               
CPU max MHz:                     1200.0000                                                                                                                                          
CPU min MHz:                     200.0000                                                                                                                                           
BogoMIPS:                        25.00                                                                                                                                              
NUMA node0 CPU(s):               0,1                                                                                                                                                
Vulnerability Itlb multihit:     Not affected                                                                                                                                       
Vulnerability L1tf:              Not affected                                                                                                                                       
Vulnerability Mds:               Not affected                                                                                                                                       
Vulnerability Meltdown:          Not affected                                                                                                                                       
Vulnerability Spec store bypass: Not affected                                                                                                                                       
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization                                                                                                            
Vulnerability Spectre v2:        Not affected                                                                                                                                       
Vulnerability Srbds:             Not affected                                                                                                                                       
Vulnerability Tsx async abort:   Not affected                                                                                                                                       
Flags:                           fp asimd aes pmull sha1 sha2 crc32 cpuid                                                                                                           

@erdoukki
Author

erdoukki commented Jan 12, 2022

@robimarko

@erdoukki Any news? I still don't have access to SDK10 at all.

Sorry for the delay...

@erdoukki
Author

erdoukki commented Jan 12, 2022

root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate                                                                                                                              
1200000000                                                                                                                                                                          
root@OpenWrt:/# stress-ng --matrix 0 -t 10m
stress-ng: info:  [3227] dispatching hogs: 2 matrix
stress-ng: info:  [3227] successful run completed in 600.00s (10 mins, 0.00 secs)
root@OpenWrt:/# 
root@OpenWrt:/# stress-ng --matrix 0 -t 60m
stress-ng: info:  [3238] dispatching hogs: 2 matrix
stress-ng: info:  [3238] successful run completed in 3600.00s (1 hour, 0.00 secs)
root@OpenWrt:/# 

Still no issue:

root@OpenWrt:/# stress --cpu 2 --io 2 --timeout 1h                                                                                                                                  
stress: info: [3297] dispatching hogs: 2 cpu, 2 io, 0 vm, 0 hdd                                                                                   
stress: info: [3297] successful run completed in 3600s                                                                                                                              
root@OpenWrt:/#                                                                                                                                                                     
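To confirm that the CPU really stays at 1.2 GHz for the whole stress run, and not only before it starts, the clock rate can be sampled in the background. This is a rough sketch: the helper names `hz_to_mhz` and `sample_clk` are made up for illustration, and it assumes debugfs is mounted and exposes the same `clk_rate` file as above.

```shell
#!/bin/sh
# Path taken from the session above; requires debugfs to be mounted.
CLK_RATE=/sys/kernel/debug/clk/cpu/clk_rate

# Convert a rate in Hz (as printed by clk_rate) to MHz.
hz_to_mhz() {
    echo $(( $1 / 1000000 ))
}

# Print the clock in MHz COUNT times, waiting INTERVAL seconds between reads.
sample_clk() {
    file="$1"; interval="$2"; count="$3"
    i=0
    while [ "$i" -lt "$count" ]; do
        echo "$(hz_to_mhz "$(cat "$file")") MHz"
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "$interval"
        fi
    done
}

# Example: one sample every 5 s for 10 minutes while stress-ng runs.
#   sample_clk "$CLK_RATE" 5 120 > /tmp/clk.log &
#   stress-ng --matrix 0 -t 10m
```

If every line in the resulting log reads `1200 MHz`, the run was not silently throttled to a lower frequency.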

@erdoukki
Author

erdoukki commented Jan 12, 2022

TIM-1.0                                                                                                                                                                  
WTMI-devel-18.12.1-67f01b7                                                                                                                                                          
WTMI: system early-init                                                                                                                                                             
SVC REV: 5, CPU VDD voltage: 1.225V                                                                                                                                                 
NOTICE:  Booting Trusted Firmware                                                                                                                                                   
NOTICE:  BL1: v1.5(release):711ecd32 (Marvell-armada-18.09.4)                                                                                                                       
NOTICE:  BL1: Built : 15:20:15, Sep 18 2019                                                                                                                                         
NOTICE:  BL1: Booting BL2                                                                                                                                                           
NOTICE:  BL2: v1.5(release):711ecd32 (Marvell-armada-18.09.4)                                                                                                                       
NOTICE:  BL2: Built : 15:20:18, Sep 18 2019                                                                                                                                         
NOTICE:  BL1: Booting BL31                                                                                                                                                          
NOTICE:  BL31: v1.5(release):711ecd32 (Marvell-armada-18.09.4)                                                                                                                      
NOTICE:  BL31: Built : 15                                                                                                                                                           
                                                                                                                                                                                    
U-Boot 2017.03-armada-18.09.1-g51aa6c4772 (Sep 18 2019 - 15:19:13 +0800)                                                                                                            
                                                                                                                                                                                    
Model: gti cellular cpe board                                                                                                                                                       
       CPU     1200 [MHz]                                                                                                                                                           
       L2      1200 [MHz]                                                                                                                                                           
       NB AXI  300 [MHz]                                                                                                                                                            
       SB AXI  250 [MHz]                                                                                                                                                            
       DDR     750 [MHz]                                                                                                                                                            
DRAM:  1 GiB                                                                                                                                                                        
U-Boot DT blob at : 000000003f716f38
Marvell>> md 0xd0011500                                                                                                                                                             
d0011500: 5a69ffff 02000257 00008000 800001e1    ..iZW...........                                                                                                                   
Marvell>> usb reset                                                                                                                                                                 
resetting USB...                                                                                                                                                                    
USB0:   Register 2000104 NbrPorts 2                                                                                                                                                 
Starting the controller                                                                                                                                                             
USB XHCI 1.00                                                                                                                                                                       
USB1:   USB EHCI 1.00                                                                                                                                                               
scanning bus 0 for devices... 1 USB Device(s) found                                                                                                                                 
scanning bus 1 for devices... 3 USB Device(s) found                                                                                                                                 
       scanning usb for storage devices... 1 Storage Device(s) found                                                                                                                
Marvell>> load usb 0 $kernel_addr_r ULTRA-SDK10/Image                                                                                                                               
20722176 bytes read in 649 ms (30.4 MiB/s)                                                                                                                                          
Marvell>> ext4load mmc 0:1 $fdt_addr_r $fdt_name                                                                                                                                    
12224 bytes read in 10 ms (1.2 MiB/s)                                                                                                                                               
Marvell>> setenv bootargs $console root=/dev/mmcblk0p2 rw rootwait net.ifnames=0 biosdevname=0  $extra_params usb-storage.quirks=$usbstoragequirks                                  
Marvell>> booti $kernel_addr_r - $fdt_addr_r                                                                                                                                        
## Flattened Device Tree blob at 06f00000                                                                                                                                           
   Booting using the fdt blob at 0x6f00000                                                                                                                                          
   Using Device Tree in place at 0000000006f00000, end 0000000006f05fbf                                                                                                             
                                                                                                                                                                                    
Starting kernel ...                                                                                                                                                                 
                                                                                                                                                                                    
[    0.000000] Booting Linux on physical CPU 0x0                                                                                                                                    
[    0.000000] Linux version 4.14.207-10.3.9.0-2 (gke@PRECISION) (gcc version 7.3.0 (Marvell Inc. Version: Marvell GCC7 build 265.0)) #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022   
[    0.000000] Boot CPU: AArch64 Processor [410fd034]                                                                                                                               
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Ultra Board                                                                                                           
root@OpenWrt:~/mhz# ./mhz                                                                                                                                                           
count=516515 us50=21525 us250=107625 diff=86100 cpu_MHz=1199.803                                                                                                                    
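
The `cpu_MHz` figure can be cross-checked from the other fields on that line. A minimal sketch, assuming (from the field names only, not from the tool's source) that `us50` and `us250` are the microseconds taken by 50 and 250 passes of a loop costing `count` cycles per pass; `estimate_mhz` is a hypothetical helper name:

```python
def estimate_mhz(count: int, us50: int, us250: int) -> float:
    """Estimate the CPU frequency in MHz from two timed runs.

    Assumption: the two runs differ by 200 passes of a loop that costs
    `count` cycles per pass, so cycles / microseconds gives MHz.
    """
    extra_passes = 250 - 50          # passes that account for the time difference
    diff_us = us250 - us50           # matches the `diff` field in the output
    return count * extra_passes / diff_us


# Values taken from the mhz output above.
print(f"{estimate_mhz(516515, 21525, 107625):.3f}")
```

With the numbers from this run the result agrees with the reported 1199.803, which suggests the CPU really is clocked at about 1.2 GHz here.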

@erdoukki
Author

QUICK-CRASH with KERNEL-SDK10 & 0x58e3ffff

SVC REV: 5, CPU VDD voltage: 1.225V                                                                                                                                                 
Marvell>> md 0xd0011500                                                                                                                                                             
d0011500: 5a69ffff 02000257 00008000 800001e1    ..iZW...........                                                                                                                   
Marvell>> mw 0xd0011500 0x58e3ffff                                                                                                                                                  
Marvell>> md 0xd0011500                                                                                                                                                             
d0011500: 58e3ffff 02000257 00008000 800001e1    ...XW...........                                                                                                                   
Marvell>> usb reset                                                                                                                                                                 
resetting USB...                                                                                                                                                                    
USB0:   Register 2000104 NbrPorts 2                                                                                                                                                 
Starting the controller                                                                                                                                                             
USB XHCI 1.00                                                                                                                                                                       
USB1:   USB EHCI 1.00                                                                                                                                                               
scanning bus 0 for devices... 1 USB Device(s) found                                                                                                                                 
scanning bus 1 for devices... 3 USB Device(s) found                                                                                                                                 
       scanning usb for storage devices... 1 Storage Device(s) found                                                                                                                
Marvell>> load usb 0 $kernel_addr_r ULTRA-SDK10/Image                                                                                                                               
20722176 bytes read in 648 ms (30.5 MiB/s)                                                                                                                                          
Marvell>> ext4load mmc 0:1 $fdt_addr_r $fdt_name                                                                                                                                    
12224 bytes read in 10 ms (1.2 MiB/s)                                                                                                                                               
Marvell>> setenv bootargs $console root=/dev/mmcblk0p2 rw rootwait net.ifnames=0 biosdevname=0  $extra_params usb-storage.quirks=$usbstoragequirks                                  
Marvell>> booti $kernel_addr_r - $fdt_addr_r                                                                                                                                        
## Flattened Device Tree blob at 06f00000                                                                                                                                           
   Booting using the fdt blob at 0x6f00000                                                                                                                                          
   Using Device Tree in place at 0000000006f00000, end 0000000006f05fbf                                                                                                             
                                                                                                                                                                                    
Starting kernel ...                                                                                                                                                                 
                                                                                                                                                                                    
[    0.000000] Booting Linux on physical CPU 0x0                                                                                                                                    
[    0.000000] Linux version 4.14.207-10.3.9.0-2 (gke@PRECISION) (gcc version 7.3.0 (Marvell Inc. Version: Marvell GCC7 build 265.0)) #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022   
[    0.000000] Boot CPU: AArch64 Processor [410fd034]                                                                                                                               
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Ultra Board                                                                                                           
[    0.000000] earlycon: ar3700_uart0 at MMIO 0x00000000d0012000 (options '')                                                                                                       
[    0.000000] bootconsole [ar3700_uart0] enabled                                                                                                                                   
[    0.000000] efi: Getting EFI parameters from FDT:                                                                                                                                
[    0.000000] efi: UEFI not found.                                                                                                                                                 
[    0.000000] cma: Reserved 256 MiB at 0x0000000030000000                                                                                                                          
[    0.000000] NUMA: No NUMA configuration found                                                                                                                                    
[    0.000000] NUMA: Faking a node at [mem 0x0000000000000000-0x000000003fffffff]                                                                                                   
[    0.000000] NUMA: NODE_DATA [mem 0x2ffe2000-0x2ffe3aff]                                                                                                                          
[    0.000000] Zone ranges:                                                                                                                                                         
[    0.000000]   DMA      [mem 0x0000000000000000-0x000000003fffffff]                                                                                                               
[    0.000000]   Normal   empty                                                                                                                                                     
[    0.000000] Movable zone start for each node                                                                                                                                     
[    0.000000] Early memory node ranges                                                                                                                                             
[    0.000000]   node   0: [mem 0x0000000000000000-0x0000000003ffffff]                                                                                                              
[    0.000000]   node   0: [mem 0x0000000004200000-0x000000003fffffff]                                                                                                              
[    0.000000] Initmem setup node 0 [mem 0x0000000000000000-0x000000003fffffff]                                                                                                     
[    0.000000] psci: probing for conduit method from DT.                                                                                                                            
[    0.000000] psci: PSCIv1.1 detected in firmware.                                                                                                                                 
[    0.000000] psci: Using standard PSCI v0.2 function IDs                                                                                                                          
[    0.000000] psci: MIGRATE_INFO_TYPE not supported.                                                                                                                               
[    0.000000] psci: SMC Calling Convention v1.1                                                                                                                                    
[    0.000000] percpu: Embedded 26 pages/cpu s67992 r8192 d30312 u106496                                                                                                            
[    0.000000] Detected VIPT I-cache on CPU0                                                                                                                                        
[    0.000000] CPU features: enabling workaround for ARM erratum 845719                                                                                                             
[    0.000000] Speculative Store Bypass Disable mitigation not required                                                                                                             
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration                                                                                           
[    0.000000] Built 1 zonelists, mobility grouping on.  Total 4kB pages: 257536                                                                                                    
[    0.000000] Policy zone: DMA                                                                                                                                                     
[    0.000000] Kernel command line: console=ttyMV0,115200 earlycon=ar3700_uart,0xd0012000 root=/dev/mmcblk0p2 rw rootwait net.ifnames=0 biosdevname=0 pci=pcie_bus_safe usb-storage.quirks=
[    0.000000] Bad mode in Error handler detected on CPU0, code 0xbf000001 -- SError                                                                                                
[    0.000000] Kernel panic - not syncing: bad mode                                                                                                                                 
[    0.000000] CPU: 0 PID: 0 Comm: swapper Not tainted 4.14.207-10.3.9.0-2 #1                                                                                                       
[    0.000000] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                                                      
[    0.000000] Call trace:                                                                                                                                                          
[    0.000000]  dump_backtrace+0x0/0x130                                                                                                                                            
[    0.000000]  show_stack+0x14/0x20                                                                                                                                                
[    0.000000]  dump_stack+0x9c/0xd8                                                                                                                                                
[    0.000000]  panic+0x120/0x298                                                                                                                                                   
[    0.000000]  bad_mode+0x68/0x70                                                                                                                                                  
[    0.000000]  el1_error_invalid+0x7c/0xa0                                                                                                                                         
[    0.000000]  memblock_find_in_range_node+0x68/0x280                                                                                                                              
[    0.000000]  memblock_virt_alloc_internal+0xe0/0x1a4                                                                                                                             
[    0.000000]  memblock_virt_alloc_try_nid_nopanic+0x78/0x8c                                                                                                                       
[    0.000000]  alloc_large_system_hash+0x174/0x27c                                                                                                                                 
[    0.000000]  pidhash_init+0x40/0x54                                                                                                                                              
[    0.000000]  start_kernel+0x1e0/0x3b0                                                                                                                                            
[    0.000000] Rebooting in 1 seconds..                                                                                                                                             
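
For reading the panic line, the `code 0xbf000001` value can be split into the standard ARMv8 syndrome fields. A minimal sketch, assuming the printed code is the raw ESR_EL1 value (which is how the arm64 `bad_mode` handler reports it); `decode_esr` is a hypothetical helper name:

```python
def decode_esr(esr: int) -> dict:
    """Split a 32-bit ESR_ELx value into its architectural fields."""
    return {
        "ec": (esr >> 26) & 0x3F,   # exception class, bits [31:26]
        "il": (esr >> 25) & 0x1,    # instruction length bit, bit [25]
        "iss": esr & 0x1FFFFFF,     # instruction specific syndrome, bits [24:0]
    }


fields = decode_esr(0xBF000001)
print({name: hex(value) for name, value in fields.items()})
```

The exception class comes out as 0x2f, which is the SError interrupt class and matches the `-- SError` suffix in the log. Bit 24 of the ISS is set, so the remaining syndrome bits are implementation defined and do not identify the faulting access by themselves.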

@erdoukki
Author

erdoukki commented Jan 12, 2022

NOCRASH with KERNEL-SDK10 & 0x78e3ffff

SVC REV: 5, CPU VDD voltage: 1.225V                                                                                                                                                 
Marvell>> md 0xd0011500                                                                                                                                                             
d0011500: 5a69ffff 02000257 00008000 800001e1    ..iZW...........                                                                                                                   
Marvell>> mw 0xd0011500 0x78e3ffff                                                                                                                                                  
Marvell>> md 0xd0011500                                                                                                                                                             
d0011500: 78e3ffff 02000257 00008000 800001e1    ...xW...........                                                                                                                   
root@OpenWrt:/# /root/mhz/mhz                                                                                                                                                       
count=516515 us50=21521 us250=107610 diff=86089 cpu_MHz=1199.956                                                                                                                    

stress test... (EDIT: OK, no more crashes on this one with the kernel from SDK10)
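
To make the three register values easier to compare, here is a small sketch that lists which bit positions differ between them. It makes no assumption about what the individual bits of `0xd0011500` mean; `differing_bits` is a hypothetical helper name:

```python
def differing_bits(a: int, b: int) -> list[int]:
    """Return the bit positions (0 = least significant) where a and b differ."""
    x = a ^ b
    return [bit for bit in range(32) if (x >> bit) & 1]


stock = 0x5A69FFFF   # value read before any write
crash = 0x58E3FFFF   # value used in the QUICK-CRASH test
ok = 0x78E3FFFF      # value used in the NOCRASH test

print("crash vs ok:   ", differing_bits(crash, ok))
print("stock vs crash:", differing_bits(stock, crash))
```

The crashing and non-crashing values differ only in bit 29, while the stock value differs from the crashing one in bits 17, 19, 23 and 25.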

@erdoukki
Author

ANOTHER ESPRESSObin-ULTRA (which crashes before the boot process completes!)

SIMPLY WORKS FINE (BOOT: OK - STRESSTEST: WIP / TBD) with the SDK10 kernel!

TIM-1.0                                                                                                                                               
mv_ddr-devel-gefcad0e2 DDR4 16b 1GB 1CS                                                                                                               
WTMI-devel-18.12.1-97f01f5f                                                                                                                           
WTMI: system early-init                                                                                                                               
SVC REV: 5, CPU VDD voltage: 1.237V                                                                                                                   
Setting clocks: CPU 1200 MHz, DDR 750 MHz                                                                                                             
CZ.NIC's Armada 3720 Secure Firmware v2021.09.07 (Oct 12 2021 13:42:17)                                                                               
Running on ESPRESSObin Ultra                                                                                                                          
NOTICE:  Booting Trusted Firmware                                                                                                                     
NOTICE:  BL1: v2.5(release):OpenWrt v2.5-12 (espressobin-ultra)                                                                                       
NOTICE:  BL1: Built : 13:42:17, Oct 12 2021                                                                                                           
NOTICE:  BL1: Booting BL2                                                                                                                             
NOTICE:  BL2: v2.5(release):OpenWrt v2.5-12 (espressobin-ultra)                                                                                       
NOTICE:  BL2: Built : 13:42:17, Oct 12 2021                                                                                                           
NOTICE:  BL1: Booting BL31                                                                                                                            
NOTICE:  BL31: v2.5(release):OpenWrt v2.5-12 (espressobin-ultra)                                                                                      
NOTICE:  BL31: Built : 13:42:17, Oct 12 2021                                                                                                          
                                                                                                                                                      
                                                                                                                                                      
U-Boot 2021.10 (Oct 12 2021 - 13:42:17 +0000)                                                                                                         
                                                                                                                                                      
DRAM:  1 GiB                                                                                                                                          
WDT:   Not starting                                                                                                                                   
Comphy chip #0:                                                                                                                                       
Comphy-0: USB3_HOST0    5 Gbps                                                                                                                        
Comphy-1: PEX0          2.5 Gbps                                                                                                                      
Comphy-2: SATA0         5 Gbps                                                                                                                        
Target spinup took 0 ms.                                                                                                                              
AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode                                                                                             
flags: ncq led only pmp fbss pio slum part sxs                                                                                                        
PCIE-0: Link up                                                                                                                                       
MMC:   sdhci@d8000: 0                                                                                                                                 
Loading Environment from SPIFlash... SF: Detected mx25u3235f with page size 256 Bytes, erase size 4 KiB, total 4 MiB                                  
OK                                                                                                                                                    
Successfully imported the Marvell hw_info parameters.                                                                                                 
Model: Globalscale Marvell ESPRESSOBin Ultra Board                                                                                                    
Net:   eth0: neta@30000 [PRIME]                                                                                                                       
Autoboot in 2 seconds, to stop use 's' key                                                                                                            
=>                                                                                                                                                    


=> version                                                                                                                                            
U-Boot 2021.10 (Oct 12 2021 - 13:42:17 +0000)                                                                                                         
                                                                                                                                                      
aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r17742+7-977bf5e980) 11.2.0                                                                        
GNU ld (GNU Binutils) 2.37                                                                                                                            


=> md 0xd0011500
d0011500: 5aaaffff 02000257 00008000 800001e1  ...ZW...........

=> boot                                                                                                                                               
switch to partitions #0, OK                                                                                                                           
mmc0(part 0) is current device                                                                                                                        
11911176 bytes read in 263 ms (43.2 MiB/s)                                                                                                            
12158 bytes read in 10 ms (1.2 MiB/s)                                                                                                                 
## Flattened Device Tree blob at 06f00000                                                                                                             
   Booting using the fdt blob at 0x6f00000                                                                                                            
   Using Device Tree in place at 0000000006f00000, end 0000000006f05f7d                                                                               
                                                                                                                                                      
Starting kernel ...                                                                                                                                   
                                                                                                                                                      
[    0.000000] Booting Linux on physical CPU 0x0000000000 [0x410fd034]                                                                                
[    0.000000] Linux version 5.10.72 (builder@buildhost) (aarch64-openwrt-linux-musl-gcc (OpenWrt GCC 11.2.0 r17742+7-977bf5e980) 11.2.0, GNU ld (GNU1
[    0.000000] Machine model: Globalscale Marvell ESPRESSOBin Ultra Board                                                                             

Tests with SDK10 kernel:

usb reset
load usb 0 $kernel_addr_r ULTRA-SDK10/Image
ext4load mmc 0:1 $fdt_addr_r $fdt_name
setenv bootargs $console root=/dev/mmcblk0p2 rw rootwait net.ifnames=0 biosdevname=0  $extra_params usb-storage.quirks=$usbstoragequirks
booti $kernel_addr_r - $fdt_addr_r
root@OpenWrt:/# cat /etc/os-release                                                                                                                   
NAME="OpenWrt"                                                                                                                                        
VERSION="SNAPSHOT"                                                                                                                                    
ID="openwrt"                                                                                                                                          
ID_LIKE="lede openwrt"                                                                                                                                
PRETTY_NAME="OpenWrt SNAPSHOT"                                                                                                                        
VERSION_ID="snapshot"                                                                                                                                 
HOME_URL="https://openwrt.org/"                                                                                                                       
BUG_URL="https://bugs.openwrt.org/"                                                                                                                   
SUPPORT_URL="https://forum.openwrt.org/"                                                                                                              
BUILD_ID="r17729+1-b5893a4128"                                                                                                                        
OPENWRT_BOARD="mvebu/cortexa53"                                                                                                                       
OPENWRT_ARCH="aarch64_cortex-a53"                                                                                                                     
OPENWRT_TAINTS="busybox"                                                                                                                              
OPENWRT_DEVICE_MANUFACTURER="OpenWrt"                                                                                                                 
OPENWRT_DEVICE_MANUFACTURER_URL="https://openwrt.org/"                                                                                                
OPENWRT_DEVICE_PRODUCT="Generic"                                                                                                                      
OPENWRT_DEVICE_REVISION="v0"                                                                                                                          
OPENWRT_RELEASE="OpenWrt SNAPSHOT r17729+1-b5893a4128"                                                                                                
root@OpenWrt:/# uname -ar                                                                                                                             
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux                                                       
root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate                                                                                                
1200000000                                                                                                                                            
root@OpenWrt:/# dmesg | grep CPU                                                                                                                      
[    0.000000] Booting Linux on physical CPU 0x0                                                                                                      
[    0.000000] Boot CPU: AArch64 Processor [410fd034]                                                                                                 
[    0.000000] Detected VIPT I-cache on CPU0                                                                                                          
[    0.000000] CPU features: enabling workaround for ARM erratum 845719                                                                               
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration                                                             
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1                                                                             
[    0.000000]  RCU restricting CPUs from NR_CPUS=96 to nr_cpu_ids=2.                                                                                 
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000d1d40000                                                                         
[    0.000000] NO_HZ: Full dynticks CPUs: 1.                                                                                                          
[    0.000000]  Note: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.                                                                        
[    0.000000]  Offload RCU callbacks from CPUs: 1.                                                                                                   
[    0.116784] smp: Bringing up secondary CPUs ...                                                                                                    
[    0.149643] Detected VIPT I-cache on CPU1                                                                                                          
[    0.149669] GICv3: CPU1: found redistributor 1 region 0:0x00000000d1d60000                                                                         
[    0.149702] CPU1: Booted secondary processor [410fd034]                                                                                            
[    0.149817] smp: Brought up 1 node, 2 CPUs                                                                                                         
[    0.175373] CPU features: detected: GIC system register CPU interface                                                                              
[    0.182013] CPU features: detected: 32-bit EL0 Support                                                                                             
[    0.187443] CPU: All CPU(s) started at EL2                                                                                                         
[    1.890362] kvm [1]: GIC system register CPU interface enabled                                                                                     
[    2.479445] cacheinfo: Unable to detect cache hierarchy for CPU 0                                                                                  

root@OpenWrt:/# lscpu                                                                                                                                 
Architecture:            aarch64                                                                                                                      
  CPU op-mode(s):        32-bit, 64-bit                                                                                                               
  Byte Order:            Little Endian                                                                                                                
CPU(s):                  2                                                                                                                            
  On-line CPU(s) list:   0,1                                                                                                                          
Vendor ID:               ARM                                                                                                                          
  Model name:            Cortex-A53                                                                                                                   
    Model:               4                                                                                                                            
    Thread(s) per core:  1                                                                                                                            
    Core(s) per cluster: 2                                                                                                                            
    Socket(s):           -                                                                                                                            
    Cluster(s):          1                                                                                                                            
    Stepping:            r0p4                                                                                                                         
    CPU max MHz:         1200.0000                                                                                                                    
    CPU min MHz:         200.0000                                                                                                                     
    BogoMIPS:            25.00                                                                                                                        
    Flags:               fp asimd aes pmull sha1 sha2 crc32 cpuid                                                                                     
NUMA:                                                                                                                                                 
  NUMA node(s):          1                                                                                                                            
  NUMA node0 CPU(s):     0,1                                                                                                                          
Vulnerabilities:                                                                                                                                      
  Itlb multihit:         Not affected                                                                                                                 
  L1tf:                  Not affected                                                                                                                 
  Mds:                   Not affected                                                                                                                 
  Meltdown:              Not affected                                                                                                                 
  Spec store bypass:     Not affected                                                                                                                 
  Spectre v1:            Mitigation; __user pointer sanitization                                                                                      
  Spectre v2:            Not affected                                                                                                                 
  Srbds:                 Not affected                                                                                                                 
  Tsx async abort:       Not affected                                                                                                                 

@erdoukki
Author

erdoukki commented Jan 13, 2022

I have reflashed the same ULTRA from snapshot to 21.02.1.
I also reflashed the default GST U-Boot:
cellular-cpe-bootloader-cpu-1200-ddr4-1cs-1g-atf-95ac2fcd-uboot-g057aa3fce1-utils-d5b360a-20200616-rel.bin

 TIM-1.0                                                                                                                                              
WTMI-devel-18.12.0-d5b360a                                                                                                                            
WTMI: system early-init                                                                                                                               
SVC REV: 5, CPU VDD voltage: 1.237V                                                                                                                   
NOTICE:  Booting Trusted Firmware                                                                                                                     
NOTICE:  BL1: v1.5(release):95ac2fcd (Marvell-devel-18.12.2)                                                                                          
NOTICE:  BL1: Built : 15:37:17, Jun 16 2020                                                                                                           
NOTICE:  BL1: Booting BL2                                                                                                                             
NOTICE:  BL2: v1.5(release):95ac2fcd (Marvell-devel-18.12.2)                                                                                          
NOTICE:  BL2: Built : 15:37:19, Jun 16 2020                                                                                                           
NOTICE:  BL1: Booting BL31                                                                                                                            
NOTICE:  BL31: v1.5(release):95ac2fcd (Marvell-devel-18.12.2)                                                                                         
NOTICE:  BL31: Built : 15                                                                                                                             
                                                                                                                                                      
U-Boot 2018.03-devel-18.12.3-g057aa3fce1 (Jun 16 2020 - 15:35:51 +0800)                                                                               
                                                                                                                                                      
Model: gti cellular cpe board                                                                                                                         
       CPU     1200 [MHz]                                                                                                                             
       L2      800 [MHz]                                                                                                                              
       TClock  200 [MHz]                                                                                                                              
       DDR     750 [MHz]                                                                                                                              
DRAM:  1 GiB                                                                                                                                          
SF: Detected mx25u3235f with page size 256 Bytes, erase size 64 KiB, total 4 MiB                                                                      
Comphy chip #0:                                                                                                                                       
Comphy-0: USB3_HOST0                                                                                                                                  
Comphy-1: PEX0          2.5 Gbps                                                                                                                      
Comphy-2: SATA0                                                                                                                                       
Target spinup took 0 ms.                                                                                                                              
AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode                                                                                             
flags: ncq led only pmp fbss pio slum part sxs                                                                                                        
PCIE-0: Link up                                                                                                                                       
MMC:   sdhci@d8000: 0                                                                                                                                 
Loading Environment from SPI Flash... OK                                                                                                              
Model: gti cellular cpe board                                                                                                                         
Marvell>> md 0xd0011500                                                                                                                               
d0011500: 5aaaffff 02000257 00008000 800001e1    ...ZW...........                                                                                     
root@OpenWrt:~/mhz# ./mhz                                                                                                                             
count=516515 us50=21567 us250=107773 diff=86206 cpu_MHz=1198.327                                                                                      
root@OpenWrt:~/mhz# uname -ar                                                                                                                         
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux                                                       

root@OpenWrt:~/mhz# dmesg | grep CPU                                                                                                                  
[    0.000000] Booting Linux on physical CPU 0x0                                                                                                      
[    0.000000] Boot CPU: AArch64 Processor [410fd034]                                                                                                 
[    0.000000] Detected VIPT I-cache on CPU0                                                                                                          
[    0.000000] CPU features: enabling workaround for ARM erratum 845719                                                                               
[    0.000000] CPU features: kernel page table isolation disabled by kernel configuration                                                             
[    0.000000] SLUB: HWalign=64, Order=0-3, MinObjects=0, CPUs=2, Nodes=1                                                                             
[    0.000000]  RCU restricting CPUs from NR_CPUS=96 to nr_cpu_ids=2.                                                                                 
[    0.000000] GICv3: CPU0: found redistributor 0 region 0:0x00000000d1d40000                                                                         
[    0.000000] NO_HZ: Full dynticks CPUs: 1.                                                                                                          
[    0.000000]  Note: kernel parameter 'rcu_nocbs=' contains nonexistent CPUs.                                                                        
[    0.000000]  Offload RCU callbacks from CPUs: 1.                                                                                                   
[    0.117027] smp: Bringing up secondary CPUs ...                                                                                                    
[    0.149918] Detected VIPT I-cache on CPU1                                                                                                          
[    0.149949] GICv3: CPU1: found redistributor 1 region 0:0x00000000d1d60000                                                                         
[    0.149986] CPU1: Booted secondary processor [410fd034]                                                                                            
[    0.150108] smp: Brought up 1 node, 2 CPUs                                                                                                         
[    0.175665] CPU features: detected: GIC system register CPU interface                                                                              
[    0.182305] CPU features: detected: 32-bit EL0 Support                                                                                             
[    0.187738] CPU: All CPU(s) started at EL2                                                                                                         
[    1.903391] kvm [1]: GIC system register CPU interface enabled                                                                                     
[    2.496191] cacheinfo: Unable to detect cache hierarchy for CPU 0                                                                                  

root@OpenWrt:~/mhz# lscpu                                                                                                                             
Architecture:                    aarch64                                                                                                              
CPU op-mode(s):                  32-bit, 64-bit                                                                                                       
Byte Order:                      Little Endian                                                                                                        
CPU(s):                          2                                                                                                                    
On-line CPU(s) list:             0,1                                                                                                                  
Thread(s) per core:              1                                                                                                                    
Core(s) per socket:              2                                                                                                                    
Socket(s):                       1                                                                                                                    
NUMA node(s):                    1                                                                                                                    
Vendor ID:                       ARM                                                                                                                  
Model:                           4                                                                                                                    
Model name:                      Cortex-A53                                                                                                           
Stepping:                        r0p4                                                                                                                 
CPU max MHz:                     1200.0000                                                                                                            
CPU min MHz:                     200.0000                                                                                                             
BogoMIPS:                        25.00                                                                                                                
NUMA node0 CPU(s):               0,1                                                                                                                  
Vulnerability Itlb multihit:     Not affected                                                                                                         
Vulnerability L1tf:              Not affected                                                                                                         
Vulnerability Mds:               Not affected                                                                                                         
Vulnerability Meltdown:          Not affected                                                                                                         
Vulnerability Spec store bypass: Not affected                                                                                                         
Vulnerability Spectre v1:        Mitigation; __user pointer sanitization                                                                              
Vulnerability Spectre v2:        Not affected                                                                                                         
Vulnerability Srbds:             Not affected                                                                                                         
Vulnerability Tsx async abort:   Not affected                                                                                                         
Flags:                           fp asimd aes pmull sha1 sha2 crc32 cpuid                                                                             
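
As a sanity check on the `./mhz` line in the log above, the reported `cpu_MHz` can be reproduced from the other fields. This is only a sketch: it assumes the two timed loops differ by 200 cycles per iteration (suggested by the `us50`/`us250` labels, not confirmed anywhere in this thread). The function name `cpu_mhz` and the `cycle_delta` parameter are made up for illustration.

```python
# Reproduce the cpu_MHz figure printed by ./mhz in the log above.
# Assumption (not confirmed in this thread): the two timed loops differ
# by 200 cycles per iteration, so the extra time spent in the longer
# loop corresponds to count * 200 CPU cycles.

def cpu_mhz(count: int, us50: int, us250: int, cycle_delta: int = 200) -> float:
    """Return the estimated clock in MHz (cycles per microsecond)."""
    diff_us = us250 - us50              # extra time of the longer loop, in us
    return count * cycle_delta / diff_us


if __name__ == "__main__":
    # Values copied from the log: count=516515 us50=21567 us250=107773
    mhz = cpu_mhz(count=516515, us50=21567, us250=107773)
    print(f"diff={107773 - 21567} cpu_MHz={mhz:.3f}")
```

Under that assumption the result agrees with the logged 1198.327, so the core really does seem to be running at roughly 1.2 GHz when the measurement is taken.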

stress-ng run (edited to add the result: the kernel crashed):

root@OpenWrt:~/mhz# stress-ng --matrix 0 -t 60m                                                                                                       
stress-ng: info:  [4673] dispatching hogs: 2 matrix                                                                                                   
[ 1744.541578] Unable to handle kernel paging request at virtual address 24c6324537b1cc                                                               
[ 1744.546778] Mem abort info:                                                                                                                        
[ 1744.549637]   Exception class = IABT (current EL), IL = 32 bits                                                                                    
[ 1744.561435]   SET = 0, FnV = 0                                                                                                                     
[ 1744.561722]   EA = 0, S1PTW = 0                                                                                                                    
[ 1744.564946] [0024c6324537b1cc] address between user and kernel address ranges                                                                      
[ 1744.577456] Internal error: Oops: 86000004 [#1] PREEMPT SMP                                                                                        
[ 1744.580347] Modules linked in:                                                                                                                     
[ 1744.583481] Process kworker/u4:2 (pid: 26, stack limit = 0xffff80002eb08000)                                                                       
[ 1744.590740] CPU: 0 PID: 26 Comm: kworker/u4:2 Not tainted 4.14.207-10.3.9.0-2 #1                                                                   
[ 1744.598352] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                        
[ 1744.605538] Workqueue: events_power_efficient phy_state_machine                                                                                    
[ 1744.611613] task: ffff80002ea6aa00 task.stack: ffff80002eb08000                                                                                    
[ 1744.617707] pc : 0x24c6324537b1cc                                                                                                                  
[ 1744.621110] lr : 0x4724c6324537b1cc                                                                                                                
[ 1744.624693] sp : ffff80002eb0baf0 pstate : 60000145                                                                                                
[ 1744.629711] x29: 472cc919449dcc93 x28: 0000000000000000                                                                                            
[ 1744.635176] x27: 0000000000000000 x26: ffff000009125b58                                                                                            
[ 1744.640642] x25: ffff0000080f8c50 x24: ffff80002d801890                                                                                            
[ 1744.646107] x23: ffff80002d801d40 x22: ffff80002d801800                                                                                            
[ 1744.651573] x21: 0000000000000003 x20: 46cefc1d466931c4                                                                                            
[ 1744.657038] x19: 46f715f84764252e x18: 0000000000000058                                                                                            
[ 1744.662504] x17: 0000000000000002 x16: 0000000000000000                                                                                            
[ 1744.667970] x15: 0000000000000080 x14: 0000000000000000                                                                                            
[ 1744.673435] x13: 0000000000000000 x12: 00000194b9b0d4d0                                                                                            
[ 1744.678901] x11: 0000000000000001 x10: 00000000000009f0                                                                                            
[ 1744.684367] x9 : ffff80002eb0b8f0 x8 : ffff80002ea6b450                                                                                            
[ 1744.689832] x7 : 00000000000005fe x6 : 0000000000000354                                                                                            
[ 1744.695298] x5 : 0000000000000002 x4 : 0000000000000001                                                                                            
[ 1744.700763] x3 : ffff80002ffb0900 x2 : ffff80002ffb0940                                                                                            
[ 1744.706229] x1 : 0000000000000000 x0 : 0000000000000000                                                                                            
[ 1744.711696] Call trace:                                                                                                                            
[ 1744.714206]  0x24c6324537b1cc                                                                                                                      
[ 1744.717258] Code: bad PC value                                                                                                                     
[ 1744.720392] ---[ end trace dbfd5f19739d29cd ]---                                                                                                   
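
To help read the oops above, here is a small sketch that decodes the syndrome value `86000004` from the `Internal error: Oops:` line and checks whether the faulting `pc` is a valid address at all. The field split (EC in bits 31:26, IL in bit 25, ISS in bits 24:0) is the ARMv8 ESR_ELx layout. The 48-bit virtual address size is an assumption about this kernel's configuration, and the helper names `decode_esr` and `is_canonical` are made up for illustration.

```python
# Decode the ESR value from the oops and test the faulting PC.
# Assumption: the kernel uses 48-bit virtual addresses (VA_BITS=48);
# this is not stated anywhere in the log.

def decode_esr(esr: int) -> dict:
    """Split an ARMv8 ESR_ELx value into its top-level fields."""
    return {
        "ec": (esr >> 26) & 0x3F,    # exception class, bits 31:26
        "il": (esr >> 25) & 0x1,     # instruction length, bit 25 (1 = 32-bit)
        "iss": esr & 0x1FFFFFF,      # instruction specific syndrome, bits 24:0
    }


def is_canonical(addr: int, va_bits: int = 48) -> bool:
    """True if the bits above va_bits are all zeros (user) or all ones (kernel)."""
    top = addr >> va_bits
    return top == 0 or top == (1 << (64 - va_bits)) - 1


if __name__ == "__main__":
    fields = decode_esr(0x86000004)
    # EC 0x21 is an instruction abort taken without a change in exception
    # level, which the kernel prints as "IABT (current EL)".
    print({name: hex(value) for name, value in fields.items()})
    print("pc valid:", is_canonical(0x0024C6324537B1CC))
    print("sp valid:", is_canonical(0xFFFF80002EB0BAF0))
```

The decoded fields match what the kernel printed (`IABT (current EL)`, `IL = 32 bits`). The `pc` falls in neither the user nor the kernel range, while `sp` is an ordinary kernel address. That is consistent with the CPU jumping through a corrupted pointer (note that `lr`, `x29`, `x19` and `x20` also hold garbage) rather than with a software bug at a known code location.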

@erdoukki
Author

Same ULTRA, with only the value 0x5a69 forced into the upper 16 bits of the register at 0xd0011500 before boot:

Marvell>> mw 0xd0011500 5a69ffff                                                                                                                      
Marvell>> md 0xd0011500                                                                                                                               
d0011500: 5a69ffff 02000257 00008000 800001e1    ..iZW...........                                                                                     
root@OpenWrt:/# uname -ar                                                                                                                             
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux                                                       
root@OpenWrt:/# stress-ng --matrix 0 -t 60m                                                                                                           
stress-ng: info:  [3232] dispatching hogs: 2 matrix                                                                                                   
[   38.408028] Internal error: undefined instruction: 0 [#1] PREEMPT SMP                                                                              
[   38.411819] Modules linked in:                                                                                                                     
[   38.414950] Process stress-ng (pid: 3233, stack limit = 0xffff80002c260000)                                                                        
[   38.422121] CPU: 0 PID: 3233 Comm: stress-ng Not tainted 4.14.207-10.3.9.0-2 #1                                                                    
[   38.429642] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                        
[   38.436810] task: ffff80002eb6c600 task.stack: ffff80002c260000                                                                                    
[   38.442915] pc : select_idle_sibling+0x80/0x270                                                                                                    
[   38.447563] lr : select_idle_sibling+0x24/0x270                                                                                                    
[   38.452222] sp : ffff80002ffb7620 pstate : 400001c5                                                                                                
[   38.457240] x29: ffff80002ffb7620 x28: 00000008ed050780                                                                                            
[   38.462704] x27: ffff80002e379c00 x26: ffff80002e379f30                                                                                            
[   38.468170] x25: ffff80002e9acb00 x24: ffff00000936abd8                                                                                            
[   38.473635] x23: ffff00000935a180 x22: ffff80002e379c00                                                                                            
[   38.479101] x21: 0000000000000000 x20: 0000000000000000                                                                                            
[   38.484566] x19: 0000000000000000 x18: 000000000000003a                                                                                            
[   38.490033] x17: 000000000000007f x16: 0000000000525090                                                                                            
[   38.495498] x15: 0000000000000080 x14: 0000000000000080                                                                                            
[   38.500964] x13: 0000ffffa1c3908c x12: 0000000000000023                                                                                            
[   38.506429] x11: 0000000000000020 x10: 0000000000000040                                                                                            
[   38.511895] x9 : 0000000000000400 x8 : ffff80002e400248                                                                                            
[   38.517360] x7 : ffff80002e400270 x6 : 0000000050000000                                                                                            
[   38.522826] x5 : 0000000000000000 x4 : 00000000001ed73b                                                                                            
[   38.528291] x3 : 0000800026c5e000 x2 : ffff80002e9acb00                                                                                            
[   38.533757] x1 : 00000000000001b4 x0 : 000000000005ceb4                                                                                            
[   38.539223] Call trace:                                                                                                                            
[   38.541735]  select_idle_sibling+0x80/0x270                                                                                                        
[   38.546034]  select_task_rq_fair+0x7d8/0x9c0                                                                                                       
[   38.550428]  try_to_wake_up+0xfc/0x380                                                                                                             
[   38.554278]  wake_up_process+0x14/0x20                                                                                                             
[   38.558133]  hrtimer_wakeup+0x1c/0x30                                                                                                              
[   38.561894]  __hrtimer_run_queues+0xe8/0x170                                                                                                       
[   38.566284]  hrtimer_interrupt+0xa8/0x240                                                                                                          
[   38.570410]  arch_timer_handler_phys+0x28/0x50                                                                                                     
[   38.574977]  handle_percpu_devid_irq+0x80/0x140                                                                                                    
[   38.579636]  generic_handle_irq+0x24/0x40                                                                                                          
[   38.583756]  __handle_domain_irq+0x60/0xc0                                                                                                         
[   38.587968]  gic_handle_irq+0x84/0x1c8                                                                                                             
[   38.591820]  el0_irq_naked+0x50/0x58                                                                                                               
[   38.595496] Code: f9403f21 f944b400 f275009f 91000421 (d349fc00)                                                                                   
[   38.601777] ---[ end trace 422a36949733eff9 ]---                                                                                                   
[   38.606516] Kernel panic - not syncing: Fatal exception in interrupt                                                                               
[   38.613056] SMP: stopping secondary CPUs                                                                                                           
[   38.617099] Kernel Offset: disabled                                                                                                                
[   38.620672] CPU features: 0x0,0000200c                                                                                                             
[   38.624523] Memory Limit: none                                                                                                                     
[   38.627661] Rebooting in 3 seconds..                                                                                                               

@erdoukki
Author

erdoukki commented Jan 13, 2022

Flashing my last (old but working) custom U-Boot on the same buggy Ultra:

TIM-1.0                                                                                                                                               
mv_ddr-devel-gefcad0e2 DDR4 16b 1GB 1CS                                                                                                               
WTMI-devel-18.12.1-97f01f5f                                                                                                                           
WTMI: system early-init                                                                                                                               
SVC REV: 5, CPU VDD voltage: 1.237V                                                                                                                   
Setting clocks: CPU 1200 MHz, DDR 750 MHz                                                                                                             
CZ.NIC's Armada 3720 Secure Firmware v2021.09.07 (Oct 12 2021 13:42:17)                                                                               
Running on ESPRESSObin Ultra                                                                                                                          
NOTICE:  Booting Trusted Firmware                                                                                                                     
NOTICE:  BL1: v2.5(release):OpenWrt v2.5-12 (espressobin-ultra)                                                                                       
NOTICE:  BL1: Built : 13:42:17, Oct 12 2021                                                                                                           
NOTICE:  BL1: Booting BL2                                                                                                                             
NOTICE:  BL2: v2.5(release):OpenWrt v2.5-12 (espressobin-ultra)                                                                                       
NOTICE:  BL2: Built : 13:42:17, Oct 12 2021                                                                                                           
NOTICE:  BL1: Booting BL31                                                                                                                            
NOTICE:  BL31: v2.5(release):OpenWrt v2.5-12 (espressobin-ultra)                                                                                      
NOTICE:  BL31: Built : 13:42:17, Oct 12 2021                                                                                                          
                                                                                                                                                      
                                                                                                                                                      
U-Boot 2021.10 (Oct 12 2021 - 13:42:17 +0000)                                                                                                         
                                                                                                                                                      
DRAM:  1 GiB                                                                                                                                          
WDT:   Not starting                                                                                                                                   
Comphy chip #0:                                                                                                                                       
Comphy-0: USB3_HOST0    5 Gbps                                                                                                                        
Comphy-1: PEX0          2.5 Gbps                                                                                                                      
Comphy-2: SATA0         5 Gbps                                                                                                                        
Target spinup took 0 ms.                                                                                                                              
AHCI 0001.0300 32 slots 1 ports 6 Gbps 0x1 impl SATA mode                                                                                             
flags: ncq led only pmp fbss pio slum part sxs                                                                                                        
PCIE-0: Link up                                                                                                                                       
MMC:   sdhci@d8000: 0                                                                                                                                 
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux                                                       
root@OpenWrt:/# stress-ng --matrix 0 -t 60m                                                                                                           
stress-ng: info:  [3230] dispatching hogs: 2 matrix                                                                                                   

EDIT: RESULTS:


root@OpenWrt:/# stress-ng --matrix 0 -t 60m                                                                                                           
stress-ng: info:  [3230] dispatching hogs: 2 matrix                                                                                                   

CRASH (FREEZE)

[  197.597419] Internal error: undefined instruction: 0 [#1] PREEMPT SMP                                                                              
[  197.601207] Modules linked in:                                                                                                                     
[  197.604340] Process kworker/u4:2 (pid: 26, stack limit = 0xffff80002eb08000)                                                                       
[  197.611599] CPU: 0 PID: 26 Comm: kworker/u4:2 Not tainted 4.14.207-10.3.9.0-2 #1                                                                   
[  197.619211] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                        
[  197.626394] Workqueue: events_freezable_power_ disk_events_workfn                                                                                  
[  197.632651] task: ffff80002ea6aa00 task.stack: ffff80002eb08000                                                                                    
[  197.638750] pc : sd_check_events+0xf4/0x140                                                                                                        
[  197.643047] lr : sd_check_events+0xf4/0x140                                                                                                        
[  197.647344] sp : ffff80002eb0bd20 pstate : 80000145                                                                                                
[  197.652362] x29: ffff80002eb0bd20 x28: 0000000000000000                                                                                            
[  197.657827] x27: 0000000000000000 x26: ffff000009125b58                                                                                            
[  197.663294] x25: ffff80002db6a118 x24: ffff80002db6a158                                                                                            
[  197.668758] x23: 0000000000000000 x22: ffff80002db6a148                                                                                            
[  197.674225] x21: ffff80002d638000 x20: ffff80002eb0bd58                                                                                            
[  197.679690] x19: ffff80002d68bc00 x18: 0000000000000080                                                                                            
[  197.685155] x17: 0000000000000000 x16: 0000000000000000                                                                                            
[  197.690621] x15: 0000000000525090 x14: 0000000000000000                                                                                            
[  197.696086] x13: 0000000000000000 x12: 0000002db131d330                                                                                            
[  197.701552] x11: 0000000000000000 x10: 00000000000009f0                                                                                            
[  197.707018] x9 : ffff80002eb0ba50 x8 : ffff80002ea6b450                                                                                            
[  197.712484] x7 : ffff80002ea6aa00 x6 : ffff80002dafe000                                                                                            
[  197.717949] x5 : 0000000000000000 x4 : 0000000000000001                                                                                            
[  197.723415] x3 : 0000000000000000 x2 : 0000000000000000                                                                                            
[  197.728881] x1 : 000000a605568401 x0 : 0000000000000000                                                                                            
[  197.734346] Call trace:                                                                                                                            
[  197.736858]  sd_check_events+0xf4/0x140                                                                                                            
[  197.740799]  disk_check_events+0x48/0x130                                                                                                          
[  197.744921]  disk_events_workfn+0x14/0x20                                                                                                          
[  197.749045]  process_one_work+0x1d0/0x330                                                                                                          
[  197.753164]  worker_thread+0x48/0x470                                                                                                              
[  197.756928]  kthread+0x12c/0x130                                                                                                                   
[  197.760244]  ret_from_fork+0x10/0x24                                                                                                               
[  197.763919] Code: aa1503e0 f81f8e9f aa1403e3 97fe8fa4 (f2701c1f)                                                                                   
[  197.770191] ---[ end trace 64bf001ba8f205cf ]---                                                                                                   
[  204.926132] Internal error: undefined instruction: 0 [#2] PREEMPT SMP                                                                              
[  204.929919] Modules linked in:                                                                                                                     
[  204.933053] Process kworker/u4:0 (pid: 5, stack limit = 0xffff80002e9fc000)                                                                        
[  204.940222] CPU: 0 PID: 5 Comm: kworker/u4:0 Tainted: G      D         4.14.207-10.3.9.0-2 #1                                                      
[  204.948999] Hardware name: Globalscale Marvell ESPRESSOBin Ultra Board (DT)                                                                        
[  204.956183] Workqueue: events_power_efficient phy_state_machine                                                                                    
[  204.962260] task: ffff80002e9bb800 task.stack: ffff80002e9fc000                                                                                    
[  204.968357] pc : deactivate_task+0x60/0xc0                                                                                                         
[  204.972564] lr : deactivate_task+0x60/0xc0                                                                                                         
[  204.976775] sp : ffff80002e9ff940 pstate : 600001c5                                                                                                
[  204.981792] x29: ffff80002e9ff940 x28: 0000000000000000                                                                                            
[  204.987257] x27: ffff80002e348890 x26: 0000000000000000                                                                                            
[  204.992722] x25: ffff80002e9bbdf0 x24: ffff000008e3fe48                                                                                            
[  204.998189] x23: ffff00000934b018 x22: ffff80002e9bb800                                                                                            
[  205.003654] x21: ffff00000936abd8 x20: ffff00000935a180                                                                                            
[  205.009119] x19: ffff80002ffb8180 x18: 000000000000001b                                                                                            
[  205.014585] x17: 0000000000000000 x16: 0000000000000000                                                                                            
[  205.020051] x15: 0000000000000080 x14: 0000000000000000                                                                                            
[  205.025516] x13: 0000000000000000 x12: 0000000000000001                                                                                            
[  205.030982] x11: 0000000000000000 x10: 00000000000003e6                                                                                            
[  205.036448] x9 : ffff80002ffb8248 x8 : 0000000000000000                                                                                            
[  205.041913] x7 : 00000000ffffffff x6 : ffff80002e9bb8a8                                                                                            
[  205.047378] x5 : 0000800026c5e000 x4 : 0000000000000000                                                                                            
[  205.052844] x3 : 0000000000000000 x2 : 000000549741d50b                                                                                            
[  205.058309] x1 : 00000000001ed73b x0 : ffff80002ffb8180                                                                                            
[  205.063777] Call trace:                                                                                                                            
[  205.066287]  deactivate_task+0x60/0xc0                                                                                                             
[  205.070143]  __schedule+0x2a0/0x5e0                                                                                                                
[  205.073723]  schedule+0x38/0xa0                                                                                                                    
[  205.076950]  schedule_hrtimeout_range_clock+0x84/0xf0                                                                                              
[  205.082145]  schedule_hrtimeout_range+0x10/0x20                                                                                                    
[  205.086803]  usleep_range+0x50/0x70                                                                                                                
[  205.090393]  orion_mdio_wait_ready.isra.1+0x140/0x180                                                                                              
[  205.095585]  orion_mdio_smi_read+0x78/0x100                                                                                                        
[  205.099886]  __mdiobus_read+0x1c/0x40                                                                                                              
[  205.103650]  mdiobus_read_nested+0x48/0x70                                                                                                         
[  205.107861]  mv88e6xxx_smi_multi_chip_read+0x7c/0xa0                                                                                               
[  205.112968]  mv88e6xxx_read+0x38/0x80                                                                                                              
[  205.116734]  mv88e6xxx_g2_smi_phy_read+0x120/0x140                                                                                                 
[  205.121662]  mv88e6xxx_mdio_read+0x68/0xe0                                                                                                         
[  205.125870]  __mdiobus_read+0x1c/0x40                                                                                                              
[  205.129633]  mdiobus_read+0x48/0x70                                                                                                                
[  205.133218]  genphy_update_link+0x20/0x70                                                                                                          
[  205.137338]  marvell_read_status_page+0x20/0x2b0                                                                                                   
[  205.142087]  marvell_read_status+0x8c/0xd0                                                                                                         
[  205.146299]  phy_state_machine+0x3fc/0x640                                                                                                         
[  205.150511]  process_one_work+0x1d0/0x330                                                                                                          
[  205.154631]  worker_thread+0x48/0x470                                                                                                              
[  205.158396]  kthread+0x12c/0x130                                                                                                                   
[  205.161712]  ret_from_fork+0x10/0x24                                                                                                               
[  205.165386] Code: f9403c23 aa1303e0 f9400863 d63f0060 (f9400bf3)                                                                                   
[  205.171660] ---[ end trace 64bf001ba8f205d0 ]---                                                                                                   
[  205.176405] note: kworker/u4:0[5] exited with preempt_count 2                                                                                      

Maybe it comes from something other than the CPU?
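To compare crashes without re-reading whole logs, the oops header lines can be reduced to a short summary. The following is only a minimal sketch, assuming the 4.14-style arm64 oops format shown above; the function name `summarize_oopses` is made up for illustration, and the sample lines are copied from the log in this comment.

```python
import re

# Patterns for the arm64 oops lines as they appear in the logs of this thread.
ERR_RE = re.compile(r"Internal error: (?P<kind>[^:]+): \d+ \[#(?P<n>\d+)\]")
PC_RE = re.compile(r"\bpc : (?P<sym>[\w.]+)\+0x")
COMM_RE = re.compile(r"Comm: (?P<comm>\S+)")


def summarize_oopses(log_text):
    """Return one dict per oops: sequence number, error kind, task, faulting symbol."""
    oopses = []
    current = None
    for line in log_text.splitlines():
        m = ERR_RE.search(line)
        if m:
            # A new "Internal error" line starts a new oops record.
            current = {"n": int(m["n"]), "kind": m["kind"], "comm": None, "pc": None}
            oopses.append(current)
            continue
        if current is None:
            continue
        m = COMM_RE.search(line)
        if m and current["comm"] is None:
            current["comm"] = m["comm"]
        m = PC_RE.search(line)
        if m and current["pc"] is None:
            current["pc"] = m["sym"]
    return oopses


SAMPLE = """\
[  197.597419] Internal error: undefined instruction: 0 [#1] PREEMPT SMP
[  197.611599] CPU: 0 PID: 26 Comm: kworker/u4:2 Not tainted 4.14.207-10.3.9.0-2 #1
[  197.638750] pc : sd_check_events+0xf4/0x140
[  204.926132] Internal error: undefined instruction: 0 [#2] PREEMPT SMP
[  204.940222] CPU: 0 PID: 5 Comm: kworker/u4:0 Tainted: G      D         4.14.207-10.3.9.0-2 #1
[  204.968357] pc : deactivate_task+0x60/0xc0
"""

for o in summarize_oopses(SAMPLE):
    print(f"#{o['n']}: {o['kind']} in {o['pc']} (task {o['comm']})")
```

Both oopses here report the same error kind in unrelated functions, which is the kind of pattern such a summary makes easy to spot.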

@erdoukki
Author

Maybe it comes from something other than the CPU?

Maybe I am mixing up some kernel modules for the stress tools?

Because this box, which hangs, freezes, or panics very quickly with mainline Linux, mostly works with the Marvell SDK10 kernel:

root@OpenWrt:/# uptime                                                          
 11:01:33 up  3:13,  load average: 0.00, 0.01, 0.00                             
root@OpenWrt:/# uname -ar                                                       
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux

Does anyone have a proposal for testing and studying this issue more deeply?

Advice welcome!

@pali
Contributor

pali commented Jan 14, 2022

Well, we have known from the beginning that the A3720 crashes when running at 1.2 GHz, and it needs to be fixed. AFAIK there is no patch that fixes this issue for 1.2 GHz mode in either the Marvell 4.14 kernel or the mainline kernel. So posting more and more crash logs does not bring anything new; I guess everybody knows this from the first few posts, and people would rather unsubscribe from a spamming thread. Has Marvell privately provided you with any fix for this issue?

@erdoukki
Author

Well, we have known from the beginning that the A3720 crashes when running at 1.2 GHz, and it needs to be fixed.

Sure, you're right, and I agree!

Some more technical details about the Ultra on which I run my tests (the buggiest one I have):

root@OpenWrt:/# uname -ar                                                                                                                                                                                   
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux                                                                                                             

This is the SDK10 kernel that I compiled to test the NDA-covered SDK10 from Marvell.

root@OpenWrt:/# /root/mhz/mhz                                                                                                                                                                               
count=516515 us50=21522 us250=107611 diff=86089 cpu_MHz=1199.956

The CPU is running at 1.2 GHz.

root@OpenWrt:/# cat /sys/kernel/debug/clk/cpu/clk_rate                                                                                                                                                      
1200000000                                                                                                                                                                                                  

CPU governors are not implemented, and the CPU always stays at 1.2 GHz.
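The `mhz` figure can be cross-checked by hand against the clock framework value. The numbers printed above are consistent with `cpu_MHz = 200 * count / (us250 - us50)`, which is what one would expect if the tool times loops of 50 and 250 instructions. That formula is an assumption inferred from the field names and the arithmetic, not taken from the tool's source. A small sketch, with helper names invented for illustration:

```python
import re


def parse_mhz_line(line):
    """Parse the 'key=value' fields of one line of mhz output into floats."""
    return {k: float(v) for k, v in re.findall(r"(\w+)=([\d.]+)", line)}


def recompute_mhz(fields):
    # Assumed formula: 200 extra instructions per loop iteration, divided by
    # the extra time in microseconds between the two timed runs.
    return 200.0 * fields["count"] / (fields["us250"] - fields["us50"])


# Line copied from the mhz run above.
line = "count=516515 us50=21522 us250=107611 diff=86089 cpu_MHz=1199.956"
fields = parse_mhz_line(line)
mhz = recompute_mhz(fields)

# Value read from /sys/kernel/debug/clk/cpu/clk_rate above, in Hz.
clk_rate_hz = 1200000000

print(f"recomputed: {mhz:.3f} MHz, clk framework: {clk_rate_hz / 1e6:.0f} MHz")
```

If the measured value and `clk_rate` agree to within a fraction of a percent, as they do here, the CPU is really being clocked at the rate the kernel reports.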

root@OpenWrt:/# uptime                                                                                                                                                                                      
 17:45:10 up  9:57,  load average: 0.08, 0.02, 0.00                                                                                                                                                         

The box does not freeze, crash, or oops at all.

AFAIK there is no patch that fixes this issue for 1.2 GHz mode in either the Marvell 4.14 kernel or the mainline kernel. So posting more and more crash logs does not bring anything new; I guess everybody knows this from the first few posts, and people would rather unsubscribe from a spamming thread.

Sorry about this; I apologize.
I have shared all of my test results transparently, in the hope of getting some clues from anyone listening, but as you said, it is more like shouting alone in the desert...

Has Marvell privately provided you with any fix for this issue?

They only gave me access to the SDK under an NDA, and not directly but through my own contacts.
I do not know whether it includes any undisclosed private patches, but Marvell officially told my contact that this SDK is the only supported one and that they have fixed the issue in it!
So I ran the promised tests, and I get very good results with nothing more than an SDK10-compiled kernel, used with OpenWrt 21.02 on the buggy Ultra running at 1.2 GHz...

The tests may also be flawed because of the mix of kernel, libraries, and my OpenWrt-based system...
But they look promising!?
No?

What I see is two Ultras that crash on mainline Linux but appear to work without problems on the SDK10 kernel.
So I am asking whether I should run more tests and which direction to take, BEFORE going through and comparing all the patches included in SDK10 to find the one that fixes the issue, or the ones that help to fix it!

These SDKs are based on linux-4.14.x.
I have only tested SDK-10.3.9.0-QA from 20211010, which integrates 2685 patches.
There is also SDK-10.3.10.0-PR1 from 20211009, which integrates 2714 patches.
There is also SDK-10.3.10.0-PR2 from 20211109, which integrates 2725 patches.
I use the supported build name a37xx_espressobin_1000_800 for the script and have only tested, for now, with the QA release,
and only with the Linux kernel and modules generated by default.

There is also an SDK-11 based on linux-5.4.x, from which all a37xx supported build names have been removed.

So, because I do not know these Marvell SDKs very closely, I do not know whether they include more than, fewer than, or exactly the patches from Marvell's public Git, but I think not, because of the NDA...
Some years ago, I already had a private patch that helped me to build, debug, share, and then give to the community the initial SDIO driver for Kirkwood and SheevaPlug, which I upstreamed to Linux; but yes, that is a story from long ago.
That story just left me with the idea that an obscure opportunity like this may help the community!

So, sorry again for this long message; it is not meant as spam, it is just my own personal way of sharing the work transparently.

@pali,
Thanks for giving me more help with this study.
So, again, what do you think I can test to confirm that SDK10 resolves this issue?
How can I help to fix this issue?
Should I build a complete OpenWrt with the Linux kernel and patches from the SDK10-QA?
Should I give the SDK10-PR a try?
Should I look deeper into the SDK11 to restore the a37xx support?
Should I compare all the patches from the Marvell Git?
Should I compare them all against the mainline linux-4.14.x branch they appear to be based on?
Should I rebuild the stress tools without the cross-kernel components, to confirm my doubts about the reason they still crash?
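Before comparing thousands of patches one by one, the candidate set could be narrowed to patches that touch clock or cpufreq files. The following is only a sketch under stated assumptions: the patches are available as individual `*.patch` files in git format, the keyword list is a guess at which paths matter, and the helper names are hypothetical.

```python
import re
from pathlib import Path

# Assumed keywords: substrings of file paths that plausibly relate to
# Armada 37xx clocking and frequency scaling. Adjust as needed.
KEYWORDS = ("armada-37xx", "armada_37xx", "cpufreq", "clk/mvebu")

# Matches the "diff --git a/<path> b/<path>" header of each file in a patch.
DIFF_HEADER = re.compile(r"^diff --git a/(\S+) b/(\S+)", re.MULTILINE)


def touched_paths(patch_text):
    """Return the set of file paths named in 'diff --git' headers."""
    return {m.group(2) for m in DIFF_HEADER.finditer(patch_text)}


def relevant_patches(patch_dir, keywords=KEYWORDS):
    """Yield (patch file name, matching paths) for patches touching keyword paths."""
    for patch in sorted(Path(patch_dir).glob("*.patch")):
        text = patch.read_text(errors="replace")
        hits = sorted(p for p in touched_paths(text) if any(k in p for k in keywords))
        if hits:
            yield patch.name, hits
```

Calling `relevant_patches()` on the directory holding an SDK's patches (wherever that is in a given setup) and on a mainline 4.14.x patch export, then comparing the two lists, would give a much shorter set of patches to read by hand.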

Am I completely wrong in my analysis of the first results?
Feel free to contact me directly if you prefer a private discussion about all this!

I do all of this in my free time, for free, with no sponsor and no goal other than helping to fix this issue...

@pali
Contributor

pali commented Jan 14, 2022

I think no more tests are needed unless people from Marvell (or somebody else who is going to fix it) explicitly say what they need. It is now up to Marvell to provide a fix. If you have an NDA contract with Marvell, you could report this issue to them and ask what they need in order to fix it.

@pali
Contributor

pali commented Jan 14, 2022

@stefanchulski or anybody from Marvell: Could you please reply with what is needed to fix this issue, and whether you need more tests from @erdoukki with the 1.2 GHz A3720?

@erdoukki
Author

I can confirm that the SDK10 is mostly stable at 1.2 GHz...
Here is one last test report:

root@OpenWrt:/# stress -c 4
stress: info: [3337] dispatching hogs: 4 cpu, 0 io, 0 vm, 0 hdd
^C
root@OpenWrt:/# uptime
 08:58:32 up  2:24,  load average: 3.67, 3.93, 3.97
root@OpenWrt:/# uname -ar
Linux OpenWrt 4.14.207-10.3.9.0-2 #1 SMP PREEMPT Wed Jan 12 15:53:40 CET 2022 aarch64 GNU/Linux
root@OpenWrt:/# /root/mhz/mhz
count=516515 us50=21559 us250=107772 diff=86213 cpu_MHz=1198.230

As you can see, I stressed only the CPU, which stayed at 1.2 GHz for more than 2 hours with the SDK10 kernel, with no freeze, BUG, or Oops!
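
For readers checking the figure, the reported cpu_MHz can be reproduced from the other fields of the mhz output. This is only a sketch: it assumes the two timed runs differ by 200 cycles per loop iteration, which is inferred from the numbers and is not stated anywhere in this thread.

```python
# Values copied from the mhz output quoted above.
count = 516515   # loop iterations per timed run
us50 = 21559     # microseconds for the shorter run
us250 = 107772   # microseconds for the longer run

# Assumption: the runs cost 50 and 250 cycles per iteration,
# so they differ by 200 cycles per iteration.
diff_us = us250 - us50
cpu_mhz = count * 200 / diff_us  # cycles per microsecond is MHz

print(f"diff={diff_us} cpu_MHz={cpu_mhz:.3f}")
```

Under that assumption the result matches the reported 1198.230, which is within 0.2% of the nominal 1200 MHz.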

This is, again, my most buggy EspressoBin-ULTRA, which simply hangs before the end of a normal boot with the default OpenWrt 21.02.x kernel...

@pali
Contributor

pali commented Feb 16, 2022

@kostapr Could you please advise whom to ask for help or feedback here?

@erdoukki
Author

erdoukki commented Mar 5, 2022

Marvell published a new SDK10 release in 2022.02.

@robimarko

@erdoukki How did you get it to run at 1200MHz?
I literally copied the cpufreq driver from SDK10, and to me it looks like it's just been reverted to the state before the upstream fixes.
It doesn't allow scaling at all; it just runs at 750MHz regardless of anything.

@erdoukki
Author

erdoukki commented Mar 17, 2022

@robimarko Can you share your current work (a private fork or anything else)?
I can test it myself.

I have not modified the SDK10 kernel sources.
I only made a tweak to build for 1200MHz.

I flashed the kernel and modules built from SDK10 into an OpenWrt snapshot image built with the OpenWrt toolchain.
My tests and tweaks to the CPU governor then gave the results I shared here, showing it working at 1200MHz.

I can share the simple patch I used on SDK10 if you want...
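
For anyone reproducing this, the active governor and current frequency can be read from the standard Linux cpufreq sysfs files (`scaling_governor` and `scaling_cur_freq`). A minimal sketch follows; the function name `read_cpufreq` is hypothetical, and it assumes the board's kernel exposes cpufreq under `/sys/devices/system/cpu`, which was not checked on an A3720 here.

```python
from pathlib import Path


def read_cpufreq(sysfs_root="/sys/devices/system/cpu"):
    """Return {cpu_name: (governor, current_kHz)} for CPUs exposing cpufreq."""
    state = {}
    # Match cpu0/cpufreq, cpu1/cpufreq, ... under the given root.
    for cpufreq_dir in sorted(Path(sysfs_root).glob("cpu[0-9]*/cpufreq")):
        governor = (cpufreq_dir / "scaling_governor").read_text().strip()
        # The kernel reports the frequency in kHz.
        cur_khz = int((cpufreq_dir / "scaling_cur_freq").read_text())
        state[cpufreq_dir.parent.name] = (governor, cur_khz)
    return state
```

Calling `read_cpufreq()` before and after a stress run would show whether the CPU really stayed at 1200000 kHz or was scaled down. An empty result means the kernel exposes no cpufreq policy at all.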

@robimarko

There is nothing to share really; I just replaced the cpufreq driver in the 5.4 kernel with the one from SDK10.
If the CLK drivers are replaced as well, it just crashes with random errors, which basically means that it is scaling but crashes just like with the upstream drivers.

@erdoukki
Author

Initially, Marvell confirmed that SDK10 was the only supported release, and also that the 1200 MHz freeze was solved in the latest SDK10 (end of 2020).
My reseller/contact passed this on as the official Marvell statement!

Why do you test only parts of SDK10 on top of the community kernel?
I do not get the point...
If in your tests the CPU also runs at only 750MHz and still freezes, then the problem is confirmed to come from outside SDK10!?
And also to occur at speeds other than 1200 MHz...

I want to repeat that I have proposed another possible way to study the problem.

My tests showed that SDK10 does not freeze at 1200 MHz.
That may confirm what Marvell said!
Maybe these tests could be improved, stated better, or simply reproduced?

For SDK10, to build the kernel and modules, I use this patch:
SDK10-QA-Add_a37xx_espressobin_1200_750-supported_builds.patch

--- scripts/ci/supported_builds.txt	2022-02-26 14:50:55.986705243 +0100
+++ scripts/ci/supported_builds.txt	2022-02-26 14:44:40.925372005 +0100
@@ -17,6 +17,7 @@
 a37xx_ddr4_v3_B_800_800
 a37xx_ddr4_v3_C_800_800
 a37xx_espressobin_1000_800
+a37xx_espressobin_1200_750
 a3900_A
 a3900_B
 a70x0

Hope this can help...

@robimarko

robimarko commented Mar 17, 2022

Why wouldn't I test it on mainline kernels?
I quite literally replaced the CPUFreq and the clock drivers, and guess what?
CPUFreq starts scaling to 1200MHz, but it won't even boot fully and crashes like in mainline.

It doesn't freeze at 750MHz, obviously. I have been forced to run it at 750MHz, aka the DDR clock, for a while now: if you disable CPUFreq, it is left at 1200MHz, because WTMI sets it there, and crashes under light load; with CPUFreq enabled, it won't even boot properly.
SDK10 has the DVFS fix in its changelog, but that is from a much older version, not a recent one.

I am interested in solving this for everybody, not just running SDK10 and pretending it's all fine now cause it isn't.

@erdoukki
Author

I am interested in solving this for everybody

I am sure you do, and so do I...

not just running SDK10

It is only to verify the truth of what Marvell officially claims...

pretending it's all fine now cause it isn't.

Who says that?

I am only replying to your last sentence; I have no problem at all with the rest of what you said, or with what I may have understood of it.

Let me state again, more precisely, the direction I am proposing toward a possible solution for this overall problem.

#1. Is Marvell's statement that SDK10 fixes the 1200 MHz crashes true?
I have tested SDK10 with its kernel and modules, without any modification other than adding the 1200 MHz build, to check the behaviour of my most buggy espressobin-ultra board. Result: no more freezes, no more hangs, no more crashes.
Someone could try to reproduce this, not to prove my own tests wrong, but to confirm or refute them, to better evaluate the test results, and to answer question #1 as TRUE or FALSE.

Then, if it really is OK, we can say that SDK10 holds a clue or a solution that could be found and offered to the community and the latest kernel...
It would be wonderful if Marvell gave it, pointed to it, offered it, or helped with it, but we are not in a dream world, are we?

#2. We can then try some directions to find the problem (the bug) or, better, the solution (the fix or fixes).
We can analyse the patches in depth and isolate the ones that were not upstreamed, and maybe others...
We can port, update, mix, or compare drivers or any other components relevant to the analysis of this issue.
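
The driver comparison proposed in #2 could start from a plain unified diff of the same file taken from both trees. The following is a rough sketch only: `driver_diff` is a hypothetical helper, the `mainline/` and `sdk10/` labels are placeholders, and `armada-37xx-cpufreq.c` is used as the default name because it is the mainline cpufreq driver for this SoC.

```python
import difflib


def driver_diff(mainline_text, sdk_text, name="armada-37xx-cpufreq.c"):
    """Return a unified diff from the mainline version to the SDK10 version."""
    return "".join(
        difflib.unified_diff(
            mainline_text.splitlines(keepends=True),
            sdk_text.splitlines(keepends=True),
            fromfile=f"mainline/{name}",  # placeholder label, not a real path
            tofile=f"sdk10/{name}",       # placeholder label, not a real path
        )
    )
```

Feeding it the contents of the two source files yields the hunks that exist only on one side, which is a starting list of candidates for patches that were never upstreamed. It says nothing about which hunk matters for the 1200 MHz crash.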

#3. Then we can write some code, compile, and debug, with deeper tests.

#4. Then we can pop the champagne?
And publicly present this experience to Marvell customers, electronics manufacturers, and Open Source supporters as a good and valuable one. But for now that is only a dream, and a hope that helps us keep going!

Sorry for not being technical, or only a little...
I am only making a proposal, and sharing my own experience and feelings about this issue, again and again!
Before being perfect, any code only needs to exist, at first...

For now:
I have had to tweak undocumented memory, with help from you and others, to get my own crazy box to boot again...
I have obtained some wonderful results that look promising and suggest this SoC works under heavy load...
I now prefer to be more and more patient and prudent, to reproduce the results and find a really good way, if one exists, to confirm this as resolved in the end!

Keep this work private if you prefer, or share your experience with me to help with this analysis: gandalf@gk2.net

@robimarko

The last part wasn't directed at you at all; I agree with most of what you said.

@erdoukki
Author

New SDK10-QA-10.22.03 !

@pali
Contributor

pali commented Jun 13, 2022

kostapr commented on 9 Oct 2021

@pali, as far as I know this thread was passed to our support team.

@kostapr @stefanchulski Could you check what the current state is? This issue is still present.

@erdoukki
Author

erdoukki commented Aug 1, 2022

@pali, as far as I know this thread was passed to our support team. I am part of the development group and simply do not have enough bandwidth for supporting SOCs that are not in active development stage.

Will someone, at some point, find a little time to report back and help?
Thanks in advance, @kostapr.
