Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Talos II - 0.5.0 - One CPU : Doesn't boot #80

Closed
tlaurion opened this issue Apr 26, 2022 · 57 comments
Closed

Talos II - 0.5.0 - One CPU : Doesn't boot #80

tlaurion opened this issue Apr 26, 2022 · 57 comments

Comments

@tlaurion
Copy link

tlaurion commented Apr 26, 2022

Dasharo version
0.5.0 release from https://docs.dasharo.com/variants/talos_2/releases/

Dasharo variant
Workstation

Affected component(s) or functionality
Memory initialization fails ( 1x M393A1K43BB0-CRC in B1 memory slot)

Brief summary
The coreboot output stops at

coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12421 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms


coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)

coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12421 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms


coreboot-4.14-517-gc92383f92c Tue Apr 12 12:28:24 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)

How reproducible
Always (Single 16 cores CPU, one RAM module: 8GB
(More info on changes needed to be documented on non-flashing instructions under #79)

How to reproduce

Steps to reproduce the behavior:
On laptop:

user@captive-portal:~$ sha256sum -c raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc.sha256
raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc: OK
user@captive-portal:~$ sha256sum -c raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc.sha256
raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc: OK
user@captive-portal:~$ scp *v0.5.0*.ecc root@talos:/tmp/
raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc 100%   28KB   1.3MB/s   00:00    
raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc 100% 2309KB   3.1MB/s   00:00    

On a seperated SSH connection to BMC:

pflash -r /tmp/talos.pnor
pflash -P HBB -p /tmp/raptor-cs_talos-2_bootblock_v0.5.0.signed.ecc -F /tmp/talos.pnor
pflash -P HBI -p /tmp/raptor-cs_talos-2_coreboot_v0.5.0.rom.signed.ecc -F /tmp/talos.pnor
systemctl stop mboxd
mboxd -f 64M -w 1M -b file:/tmp/talos.pnor -v

On another seperate SSH connection to BMC:

mboxctl --lpc-state
    “LPC Bus Maps: BMC Memory”
obmcutil poweron

Expected behavior
Ram init succeeds and net steps are engaged

Actual behavior
Stops at

SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
@SergiiDmytruk
Copy link
Member

The bug is here. if (chips & (1 << chip)) is missing at least here.

@macpijan
Copy link
Contributor

macpijan commented Apr 27, 2022

@tlaurion As said on the call, we do not have single CPU setup right now, and it would be troublesome to switch back and forth.

Thanks for the report, it is very valuable. Maybe we can try with one or two iterations of fixes which you can test on your side.
If that does not help, we might need to rearrange the lab or think of a more efficient way (maybe testing remotely on your hardware?)

@SergiiDmytruk
Copy link
Member

it would be troublesome to switch back and forth.

I made function that returns mask of chips to return 0x01 to find the errors. The other issue is with OCC startup. I think commit with fix for OCC just got lost.

Ignoring second CPU isn't the same as installing only one, but it's a good approximation. We might want to make a build option for that or do something else that allows limiting number of CPUs.

@macpijan
Copy link
Contributor

@SergiiDmytruk Thanks.

@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170

@tlaurion
Copy link
Author

For reference, my hostboot logs on same machine

--== Welcome to Hostboot hostboot-a2ddbf3/hbicore.bin ==--

  3.07002|secure|SecureROM valid - enabling functionality
  5.51144|Booting from SBE side 0 on master proc=00050000
  5.54800|ISTEP  6. 5 - host_init_fsi
  5.72083|ISTEP  6. 6 - host_set_ipl_parms
  6.02733|ISTEP  6. 7 - host_discover_targets
  8.98001|HWAS|PRESENT> DIMM[03]=2000000000000000
  8.98002|HWAS|PRESENT> Proc[05]=8000000000000000
  8.98003|HWAS|PRESENT> Core[07]=00003C0000000000
  9.06120|ISTEP  6. 8 - host_update_master_tpm
  9.55213|SECURE|Security Access Bit> 0x0000000000000000
  9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000
  9.55228|ISTEP  6. 9 - host_gard
  9.87348|HWAS|Deconfig HUID 0x0003000A, Physical:/Sys0/Node0/DIMM6
  9.87352|HWAS|Deconfig HUID 0x0003000B, Physical:/Sys0/Node0/DIMM7
  9.87467|HWAS|FUNCTIONAL> DIMM[03]=2000000000000000
  9.87468|HWAS|FUNCTIONAL> Proc[05]=8000000000000000
  9.87469|HWAS|FUNCTIONAL> Core[07]=00003C0000000000
  9.87780|ISTEP  6.11 - host_start_occ_xstop_handler
 10.84372|ISTEP  6.12 - host_voltage_config
 10.95588|ISTEP  7. 1 - mss_attr_cleanup
 11.60999|ISTEP  7. 2 - mss_volt
 11.82222|ISTEP  7. 3 - mss_freq
 12.14141|ISTEP  7. 4 - mss_eff_config
 12.82049|ISTEP  7. 5 - mss_attr_update
 12.83176|ISTEP  8. 1 - host_slave_sbe_config
 13.01644|ISTEP  8. 2 - host_setup_sbe
 13.02951|ISTEP  8. 3 - host_cbs_start
 13.03087|ISTEP  8. 4 - proc_check_slave_sbe_seeprom_complete
 13.05204|ISTEP  8. 5 - host_attnlisten_proc
 13.08736|ISTEP  8. 6 - host_p9_fbc_eff_config
 13.09937|ISTEP  8. 7 - host_p9_eff_config_links
 13.14822|ISTEP  8. 8 - proc_attr_update
 13.14947|ISTEP  8. 9 - proc_chiplet_fabric_scominit
 13.16541|ISTEP  8.10 - proc_xbus_scominit
 13.17879|ISTEP  8.11 - proc_xbus_enable_ridi
 13.19928|ISTEP  8.12 - host_set_voltages
 13.21404|ISTEP  9. 1 - fabric_erepair
 13.29682|ISTEP  9. 2 - fabric_io_dccal
 13.29882|ISTEP  9. 3 - fabric_pre_trainadv
 13.30459|ISTEP  9. 4 - fabric_io_run_training
 13.30670|ISTEP  9. 5 - fabric_post_trainadv
 13.30870|ISTEP  9. 6 - proc_smp_link_layer
 13.31264|ISTEP  9. 7 - proc_fab_iovalid
 13.46288|ISTEP  9. 8 - host_fbc_eff_config_aggregate
 13.47825|ISTEP 10. 1 - proc_build_smp
 13.58943|ISTEP 10. 2 - host_slave_sbe_update
 14.62127|ISTEP 10. 4 - proc_cen_ref_clk_enable
 14.64082|ISTEP 10. 5 - proc_enable_osclite
 14.64181|ISTEP 10. 6 - proc_chiplet_scominit
 14.68409|ISTEP 10. 7 - proc_abus_scominit
 14.68546|ISTEP 10. 8 - proc_obus_scominit
 14.68682|ISTEP 10. 9 - proc_npu_scominit
 14.71088|ISTEP 10.10 - proc_pcie_scominit
 14.75217|ISTEP 10.11 - proc_scomoverride_chiplets
 14.75489|ISTEP 10.12 - proc_chiplet_enable_ridi
 14.75749|ISTEP 10.13 - host_rng_bist
 14.76043|ISTEP 10.14 - host_update_redundant_tpm
 14.76317|ISTEP 11. 1 - host_prd_hwreconfig
 14.79285|ISTEP 11. 2 - cen_tp_chiplet_init1
 14.79725|ISTEP 11. 3 - cen_pll_initf
 14.80072|ISTEP 11. 4 - cen_pll_setup
 14.80374|ISTEP 11. 5 - cen_tp_chiplet_init2
 14.80772|ISTEP 11. 6 - cen_tp_arrayinit
 14.81076|ISTEP 11. 7 - cen_tp_chiplet_init3
 14.82940|ISTEP 11. 8 - cen_chiplet_init
 14.83243|ISTEP 11. 9 - cen_arrayinit
 14.83544|ISTEP 11.10 - cen_initf
 14.83924|ISTEP 11.11 - cen_do_manual_inits
 14.84190|ISTEP 11.12 - cen_startclocks
 14.84632|ISTEP 11.13 - cen_scominits
 14.84950|ISTEP 12. 1 - mss_getecid
 15.10883|ISTEP 12. 2 - dmi_attr_update
 15.13412|ISTEP 12. 3 - proc_dmi_scominit
 15.18279|ISTEP 12. 4 - cen_dmi_scominit
 15.18575|ISTEP 12. 5 - dmi_erepair
 15.24568|ISTEP 12. 6 - dmi_io_dccal
 15.24777|ISTEP 12. 7 - dmi_pre_trainadv
 15.25649|ISTEP 12. 8 - dmi_io_run_training
 15.26825|ISTEP 12. 9 - dmi_post_trainadv
 15.27092|ISTEP 12.10 - proc_cen_framelock
 15.27372|ISTEP 12.11 - host_startprd_dmi
 15.27641|ISTEP 12.12 - host_attnlisten_memb
 15.27940|ISTEP 12.13 - cen_set_inband_addr
 15.28241|ISTEP 13. 1 - host_disable_memvolt
 15.44098|ISTEP 13. 2 - mem_pll_reset
 15.48146|ISTEP 13. 3 - mem_pll_initf
 15.49836|ISTEP 13. 4 - mem_pll_setup
 15.52314|ISTEP 13. 6 - mem_startclocks
 15.53972|ISTEP 13. 7 - host_enable_memvolt
 15.55267|ISTEP 13. 8 - mss_scominit
 15.87821|ISTEP 13. 9 - mss_ddr_phy_reset
 15.92635|ISTEP 13.10 - mss_draminit
 16.01169|ISTEP 13.11 - mss_draminit_training
 16.38683|ISTEP 13.12 - mss_draminit_trainadv
 16.40070|ISTEP 13.13 - mss_draminit_mc
 16.43340|ISTEP 14. 1 - mss_memdiag
 18.44189|ISTEP 14. 2 - mss_thermal_init
 18.47948|ISTEP 14. 3 - proc_pcie_config
 18.52297|ISTEP 14. 4 - mss_power_cleanup
 18.52686|ISTEP 14. 5 - proc_setup_bars
 18.55430|ISTEP 14. 6 - proc_htm_setup
 18.56616|ISTEP 14. 7 - proc_exit_cache_contained
 18.64554|ISTEP 15. 1 - host_build_stop_image
 20.48206|================================================
 20.49654|Error reported by fapi2 (0x3300) EID 0x90000227
 20.49786|  No WOF table match found
 20.49787|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 20.49787|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 20.49788|  UserData1  Number of cores : 0x0004000200000064
 20.49789|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 20.49790|------------------------------------------------
 20.49790|  Callout type             : Procedure Callout
 20.49791|  Procedure                : EPUB_PRC_HB_CODE
 20.49792|  Priority                 : SRCI_PRIORITY_HIGH
 20.49793|------------------------------------------------
 20.49793|  Callout type             : Hardware Callout
 20.49795|  Target                   : Physical:/Sys0/Node0/Proc0
 20.49795|  Deconfig State           : NO_DECONFIG
 20.49796|  GARD Error Type          : GARD_NULL
 20.49797|  Priority                 : SRCI_PRIORITY_MED
 20.49797|------------------------------------------------
 20.49798|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
 20.49799|================================================
 20.71743|ISTEP 15. 2 - proc_set_pba_homer_bar
 20.79928|ISTEP 15. 3 - host_establish_ex_chiplet
 20.82571|ISTEP 15. 4 - host_start_stop_engine
 20.87112|ISTEP 16. 1 - host_activate_master
 22.05514|ISTEP 16. 2 - host_activate_slave_cores
 22.16728|ISTEP 16. 3 - host_secure_rng
 22.17455|ISTEP 16. 4 - mss_scrub
 22.20929|ISTEP 16. 5 - host_load_io_ppe
 22.23870|ISTEP 16. 6 - host_ipl_complete
 22.61928|ISTEP 18.11 - proc_tod_setup
 22.73876|ISTEP 18.12 - proc_tod_init
 22.74294|ISTEP 20. 1 - host_load_payload
 23.38540|ISTEP 20. 2 - host_load_hdat
 24.73240|ISTEP 21. 1 - host_runtime_setup
 26.63992|================================================
 26.63993|Error reported by fapi2 (0x3300) EID 0x90000256
 26.63994|  No WOF table match found
 26.63995|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 26.63995|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 26.63996|  UserData1  Number of cores : 0x0004000200000064
 26.63997|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 26.63998|------------------------------------------------
 26.63999|  Callout type             : Procedure Callout
 26.63999|  Procedure                : EPUB_PRC_HB_CODE
 26.64000|  Priority                 : SRCI_PRIORITY_HIGH
 26.64001|------------------------------------------------
 26.64002|  Callout type             : Hardware Callout
 26.64003|  Target                   : Physical:/Sys0/Node0/Proc0
 26.64004|  Deconfig State           : NO_DECONFIG
 26.64004|  GARD Error Type          : GARD_NULL
 26.64005|  Priority                 : SRCI_PRIORITY_MED
 26.64006|------------------------------------------------
 26.64006|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
 26.64007|================================================
 30.11356|htmgt|OCCs are now running in ACTIVE state
 34.34112|ISTEP 21. 2 - host_verify_hdat
 34.36810|ISTEP 21. 3 - host_start_payload
[   35.181479495,5] OPAL skiboot-9858186 starting...
[   35.181483122,7] initial console log level: memory 7, driver 5
[   35.181485237,6] CPU: P9 generation processor (max 4 threads/core)
[   35.181487084,7] CPU: Boot CPU PIR is 0x0054 PVR is 0x004e1201
[   35.181490035,7] OPAL table: 0x30103830 .. 0x30103e10, branch table: 0x30002000
[   35.181493147,7] Assigning physical memory map table for nimbus
[   35.181496208,7] Parsing HDAT...
[   35.181497493,7] SPIRA-S found.
[   35.181500165,6] BMC #0: HW version 3, SW version 2, chip DD1.0
[   35.181573498,6] SP Family is ibm,ast2500,openbmc
[   35.181579994,7] LPC: IOPATH chip id = 0
[   35.181581326,7] LPC: FW BAR       = f0000000
[   35.181582857,7] LPC: MEM BAR      = e0000000
[   35.181584303,7] LPC: IO BAR       = d0010000
[   35.181585742,7] LPC: Internal BAR = c0012000
[   35.181598542,7] LPC UART: base addr = 3f8 (3f8) size = 1 clk = 1843200, baud = 115200
[   35.181601241,7] LPC: BT [0, 0] sms_int: 0, bmc_int: 0
[   35.182842118,5] HDAT I2C: found e3p1 - unknown@1c dp:ff (ff:)
[   35.182938290,5] HDAT I2C: found e3p1 - unknown@1d dp:ff (ff:)
[   35.182981372,5] HDAT I2C: found e3p0 - unknown@19 dp:ff (ff:)
[   35.183024148,5] HDAT I2C: found e3p1 - unknown@1e dp:ff (ff:)
[   35.183065251,5] HDAT I2C: found e3p0 - unknown@1b dp:ff (ff:)
[   35.183106029,5] HDAT I2C: found e3p1 - unknown@1f dp:ff (ff:)
[   35.183162356,5] HDAT I2C: found e3p0 - unknown@1a dp:ff (ff:)
[   35.183211752,5] HDAT I2C: found e3p0 - unknown@18 dp:ff (ff:)
[   35.183934952,5] CHIP: Chip ID 0000 type: P9N DD2.10
[   35.184291461,5] PLAT: Detected Talos platform
[   35.184352236,5] PLAT: Detected BMC platform ast2500:openbmc
[   35.201105130,5] CPU: All 16 processors called in...
[   35.266716473,7] LPC: Routing irq 10, policy: 0 (r=1)
[   35.266717473,7] LPC: SerIRQ 10 using route 0 targetted at OPAL
[   36.374445970,5] HIOMAP: Negotiated hiomap protocol v2
[   36.374512392,5] HIOMAP: Block size is 64KiB
[   36.374559447,5] HIOMAP: BMC suggested flash timeout of 8s
[   36.374589305,5] HIOMAP: Flash size is 64MiB
[   36.374627095,5] HIOMAP: Erase granule size is 64KiB
[   38.486837922,5] FLASH: Found system flash: (unnamed) id:0
[   39.277537988,7] LPC: Routing irq 4, policy: 0 (r=1)
[   39.277539066,7] LPC: SerIRQ 4 using route 1 targetted at OPAL
[   39.277694733,5] OCC: All Chip Rdy after 0 ms
[   40.074902821,3] STB: VERSION verification FAILED. log=0xffffffffffff8160
[   41.192679074,3] STB: IMA_CATALOG verification FAILED. log=0xffffffffffff8160
[   41.384604683,3] CAPP: Error loading ucode lid. index=201d1
[   41.398817729,5] PCI: Resetting PHBs and training links...
[   42.419398089,5] PCI: Probing slots...
[   42.476358942,3] PCI: PHB  (80000) not found
[   42.476847088,3] PCI: PHB  (80001) not found
[   42.477738924,3] PCI: PHB  (80003) not found
[   42.478659084,5] PCI Summary:
[   42.479475473,5] PHB#0000:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot2 (16x) 
[   42.481349662,5] PHB#0001:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=CPU1 Slot1 (8x) 
[   42.483357056,5] PHB#0002:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..ff SLOT=Builtin SAS 
[   42.484697264,5] PHB#0003:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin USB 
[   42.486595728,5] PHB#0003:01:00.0 [EP  ] 104c 8241 R:02 C:0c0330 (      usb-xhci) LOC_CODE=Builtin USB
[   42.488494066,5] PHB#0004:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..01 SLOT=Builtin Ethernet 
[   42.490306090,5] PHB#0004:01:00.0 [EP  ] 14e4 1657 R:00 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   42.492133493,5] PHB#0004:01:00.1 [EP  ] 14e4 1657 R:00 C:020000 (      ethernet) LOC_CODE=Builtin Ethernet
[   42.494059776,5] PHB#0005:00:00.0 [ROOT] 1014 04c1 R:00 C:060400 B:01..02 SLOT=BMC 
[   42.495807605,5] PHB#0005:01:00.0 [ETOX] 1a03 1150 R:04 C:060400 B:02..02 LOC_CODE=BMC
[   42.498043037,5] PHB#0005:02:00.0 [PCID] 1a03 2000 R:41 C:030000 (           vga) LOC_CODE=BMC
[   42.504305115,5] IPMI: Resetting boot count on successful boot
[   42.506128825,5] INIT: Waiting for kernel...
[   54.323916712,3] STB: BOOTKERNEL verification FAILED. log=0xffffffffffff8160
[   54.326205744,5] INIT: 64-bit LE kernel discovered
[   54.483191117,5] INIT: Starting kernel at 0x20011000, fdt at 0x306f1ba8 204105 bytes
                             
zImage starting: loaded at 0x0000000020011000 (sp: 0x000000002145bed8)
Allocating 0x23f4098 bytes for kernel...
Decompressing (0x0000000000000000 <- 0x0000000020020000:0x0000000021459984)...
Done! Decompressed 0x220c5d8 bytes

@tlaurion
Copy link
Author

tlaurion commented Apr 27, 2022

@SergiiDmytruk Thanks.

@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170

@macpijan :


user@captive-portal:~$ scp /home/user/QubesIncoming/Insurgo/coreboot-binary/*.ecc root@talos:/tmp/
root@talos's password: 
bootblock.signed.ecc                                                                                                100%   28KB   1.3MB/s   00:00    
coreboot.rom.signed.ecc                                                                                             100% 2309KB   3.2MB/s   00:00    
user@captive-portal:~$ ssh -l root talos
root@talos's password: 
root@talos:~# obmcutil poweroff
root@talos:~# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
root@talos:~# systemctl stop mboxd
root@talos:~# systemctl start mboxd
root@talos:~# mboxctl --lpc-state
LPC Bus Maps: Flash Device
root@talos:~# pflash -r /tmp/talos.pnor
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s     
root@talos:~# pflash -P HBI -p /tmp/coreboot.rom.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/coreboot.rom.signed.ecc" at 0x00425000..0x00666200 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:~# pflash -P HBB -p /tmp/bootblock.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/bootblock.signed.ecc" at 0x00205000..0x0020c002 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:~# mboxctl --backend file:/tmp/talos.pnor
SetBackend: Success
root@talos:~# obmcutil poweron

root@talos:~# cat /var/log/obmc-console.log

SIGTERM received, booting...
Requesting system poweroff
[  799.110662] reboot: Power down
[  855.188539622,5] OPAL: Shutdown request type 0x0...
st type

coreboot-4.14-525-g9488c148cd Wed Apr 27 13:38:10 UTC 2022 bootblock starting (log level: 7)...
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
FMAP: Found "FLASH" version 1.1 at 0x20000.
FMAP: base = 0x0 size = 0x200000 #areas = 4
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
CBFS: mcache @0xf8231000 built for 10 files, used 0x1f0 of 0x2000 bytes
CBFS: Found 'fallback/romstage' @0x80 size 0x12297 in mcache @0xf823102c
BS: bootblock times (exec / console): total (unknown) / 2 ms


coreboot-4.14-525-g9488c148cd Wed Apr 27 13:38:10 UTC 2022 romstage starting (log level: 7)...
IPMI: romstage PNP BT 0xe4
Get BMC self test result...Function Not Implemented
Initializing IPMI BMC watchdog timer
IPMI BMC watchdog initialized and started.
Initializing FSI...
Initialized FSI (chips mask: 0x01)
Building MVPDs...
starting istep 8.1
ending istep 8.1
starting istep 8.2
ending istep 8.2
starting istep 8.3
ending istep 8.3
starting istep 8.4
ending istep 8.4
starting istep 8.9
ending istep 8.9
starting istep 8.10
ending istep 8.10
starting istep 8.11
ending istep 8.11
starting istep 9.2
ending istep 9.2
starting istep 9.4
ending istep 9.4
starting istep 9.6
ending istep 9.6
starting istep 9.7
ending istep 9.7
starting istep 10.1
ending istep 10.1
starting istep 10.6
Base epsilon values read from table:
 R_T[0] = 10
 R_T[1] = 10
 R_T[2] = 79
 W_T[0] = 0
 W_T[1] = 21
Scaled epsilon values based on +20 percent guardband:
 R_T[0] = 12
 R_T[1] = 12
 R_T[2] = 95
 W_T[0] = 0
 W_T[1] = 26
ending istep 10.6
starting istep 10.10
ending istep 10.10
starting istep 10.12
ending istep 10.12
starting istep 10.13
ending istep 10.13
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
MEMD partition has ECC
MEMD is in 0x03cef200 through 0x03cfb917
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 50
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 51
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address 53
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D4
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D5
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D6
I2C transfer failed to complete (0x04011f0104000000)
No memory DIMM at address D7
SPD @ 0x52
SPD: module type is DDR4
SPD: module part number is M393A1K43BB0-CRC    
SPD: banks 16, ranks 1, rows 16, columns 10, density 8192 Mb
SPD: device width 8 bits, bus width 64 bits
SPD: module size is 8192 MB (per channel)
starting istep 13.2
ending istep 13.2
starting istep 13.3
putRing took 3 ms
ending istep 13.3
starting istep 13.4
ending istep 13.4
starting istep 13.6
ending istep 13.6
starting istep 13.8
Please FIXME: ATTR_MSS_RUNTIME_MEM_THROTTLED_N_COMMANDS_PER_SLOT
ending istep 13.8
starting istep 13.9
ending istep 13.9
starting istep 13.10
CCS took 2 us (3 us timeout), 1 instruction(s)
CCS took 2 us (2 us timeout), 14 instruction(s)
RCD dump for I2C address 0x5a:
0xf820fa30: 80 b3 40 42 30 00 00 00 02 01 00 03 cb e3 c0 0d  ..@B0...........
0xf820fa40: 00 00 39 00 00 00 00 00 00 00 07 00 00 00 00 00  ..9.............
ending istep 13.10
starting istep 13.11
CCS took 2 us (7 us timeout), 2 instruction(s)
Write Leveling starting
CCS took 13 us (92 us timeout), 5 instruction(s)
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
Initial Pattern Write done
DQS alignment starting
CCS took 11 us (44 us timeout), 1 instruction(s)
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
Read Centering done
Write Centering starting
CCS took 8848 us (11314 us timeout), 7 instruction(s)
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x12380 size 0xcbe5 in mcache @0xf82

@tlaurion tlaurion changed the title Talos II - Once CPU : Doesn't boot Talos II - 0.5.0 - One CPU : Doesn't boot Apr 27, 2022
@SergiiDmytruk
Copy link
Member

We also have 8 GiB stick at SPD @ 0x52 on first CPU, but limiting memory to that doesn't trigger the issue.

There is also passing memory data from romstage to ramstage, but it looks quite reliable.

@krystian-hebel any ideas what can go wrong on moving to ramstage and how to debug it? Setting log level to "spew" doesn't seem to add anything to the output.

0xF000F = 221d104900008040

DD21. Hope we didn't mess up with attribute values for different DDs...

@tlaurion Does it work with two CPUs or you haven't tried it?

@tlaurion
Copy link
Author

tlaurion commented Apr 27, 2022

@SergiiDmytruk didn't try 2 cpus, no. For the moment, I am testing one CPU one ram module, basically Talos lite setup.

Shipped board was rejected for sale since some (unknown) ram slots were non-functional. I could buy more RAM to test second CPU later on if there is a way to test one CPU setup even if 2 CPUs are inserted.

I would not see myself adding and removing those often and would prefer testing one CPU setup here until you guys can software disable the second CPU and move from there.

@krystian-hebel
Copy link

Seems it hangs during loading of ramstage, because final romstage line (BS: romstage times ...) isn't printed. Maybe something wrong with memory address translation set in 14.5?

First step of debugging would be checking in GUI if it checkstoped or "just" watchdog timed out. Then, for trying to pinpoint what exactly checkstops one has to read appropriate SCOMs in search of failure. Maybe it is worth trying to prepare a script for dumping all of those, although this kind of depends on installed hardware (e.g. which cores on CPU are enabled) and I don't know if reading all registers is safe.

@krystian-hebel
Copy link

@tlaurion can you try installing DIMM in other slots? This looks like memory issue, but it's hard to tell whether it is hardware or configuration issue at this point. On the other hand memory training seems to be successful...

@tlaurion
Copy link
Author

@krystian-hebel Hostboot boots, as per #80 (comment) provided logs.

The test report that came with the board stated that hostboot test run failed on some other ram slots, but the one stated per user manual for cpu0 one ram module is working.

@tlaurion
Copy link
Author

@krystian-hebel I have not changed jumpers either, if that is a needed thing (secure boot is on by default).

I thought it was not, since booting from mounted ROM. But if anything else needs to be done, please advise.

@krystian-hebel
Copy link

9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000 suggests it is disabled.

@tlaurion could you log in the GUI after it hangs and check if https://<BMC IP>/#/server-health/event-log shows anything interesting? Especially if there is a fresh "checkstop" reported.

@tlaurion
Copy link
Author

tlaurion commented Apr 27, 2022

@tlaurion could you log in the GUI after it hangs and check if https://<BMC IP>/#/server-health/event-log shows anything interesting? Especially if there is a fresh "checkstop" reported.

@krystian-hebel:

Four errors are reported from hostboot boot, yes. Export in json from that page:
{"/xyz/openbmc_project/logging/entry/72":{"AdditionalData":["_PID=1224","ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 03 2c d5 64 ff 00 00 00 03 2c fc 0c bc 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 56 90 00 02 56 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4c 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 15 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 70 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 36 ae 38 00 00 00 00 40 36 b5 cc 00 00 00 00 40 36 c2 88 00 00 00 00 40 35 bb f0 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 1a 25 d7 c9 c6 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 25 d7 d8 47 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 1a 25 d7 e6 c2 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 1a 25 d7 f1 a5 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 1a 25 d7 fc 13 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 1a 25 d8 0a d1 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 1a 25 d8 15 6a 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 1a 25 d8 20 04 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 25 d8 2e ac 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 1a 26 21 19 37 00 00 02 55 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 1a 26 21 23 51 00 00 02 55 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 05 dc 03 01 00 00 00 00 00 02 ac ff 01 90 00 00 00 06 00 00 00 00 00 00 61 74 6d 65 6c 2c 32 38 63 31 32 38 2c 73 70 64 2c 64 69 6d 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 05 dc 03 01 00 00 00 00 00 02 ae ff 01 90 00 00 00 06 00 00 00 00 00 00 61 74 6d 65 6c 2c 32 38 63 31 32 38 2c 73 70 64 2c 64 69 6d 6d 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 01 00 03 36 18 03 00 00 00 00 00 00 ff","CALLOUT_INVENTORY_PATH=/xyz/openbmc_project/inventory/system"],"Id":72,"Message":"org.open_power.Host.Error.Event","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082386994,"Version":"2.7.0-dev-571-g67efd9872","associations":[["callout","fault","/xyz/openbmc_project/inventory/system"]]},"/xyz/openbmc_project/logging/entry/73":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 02 64 ff 97 3c 00 00 00 02 65 26 24 c3 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 61 90 00 02 61 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4b 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 0f 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 58 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 32 77 40 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 14 05 2a a7 ce 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2a b6 fb 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 14 05 2a c6 a1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 14 05 2a d2 8a 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 14 05 2a dd cf 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 14 05 2a ed b1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 14 05 2a f9 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 14 05 2b 05 13 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2b 14 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 14 05 74 02 51 00 00 01 de 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 14 05 74 0d 15 00 00 01 de 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 01 89 00 00 00 00 00 00 00 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 80 00 00 00 f8 51 aa b8 00 00 00 00 00 51 b8 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 57 e8 00 00 00 00 00 00 00 00 02 08 10 08 02 01 11 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 18 00 00 00 00 00 00 00 00 02 08 10 04 02 01 12 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 00 00 00 00 00 00 00 00 00 02 10 04 08 02 01 0f 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 59 56 00 00 00 00 00 00 00 06 00 00 00 00 00 66 8b 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20","PROCEDURE=85","_PID=1224"],"Id":73,"Message":"org.open_power.Host.Error.MaintenanceProcedure","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082387182,"Version":"2.7.0-dev-571-g67efd9872","associations":[]},"/xyz/openbmc_project/logging/entry/74":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 02 64 ff 97 3c 00 00 00 02 65 26 24 c3 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 61 90 00 02 61 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4b 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 0f 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 58 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 32 77 40 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 14 05 2a a7 ce 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2a b6 fb 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 14 05 2a c6 a1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 14 05 2a d2 8a 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 14 05 2a dd cf 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 14 05 2a ed b1 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 14 05 2a f9 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 14 05 2b 05 13 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 14 05 2b 14 c3 00 00 01 de 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 14 05 74 02 51 00 00 01 de 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 14 05 74 0d 15 00 00 01 de 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 01 89 00 00 00 00 00 00 00 00 00 00 00 00 00 00 45 00 00 00 00 00 00 00 80 00 00 00 f8 51 aa b8 00 00 00 00 00 51 b8 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 57 e8 00 00 00 00 00 00 00 00 02 08 10 08 02 01 11 00 00 00 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 18 00 00 00 00 00 00 00 00 02 08 10 04 02 01 12 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 58 00 00 00 00 00 00 00 00 00 02 10 04 08 02 01 0f 00 00 00 00 40 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 40 99 59 56 00 00 00 00 00 00 00 06 00 00 00 00 00 66 8b 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20 00 00 06 20","_PID=1224","CALLOUT_INVENTORY_PATH=/xyz/openbmc_project/inventory/system"],"Id":74,"Message":"org.open_power.Host.Error.Event","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082391898,"Version":"2.7.0-dev-571-g67efd9872","associations":[["callout","fault","/xyz/openbmc_project/inventory/system"]]},"/xyz/openbmc_project/logging/entry/75":{"AdditionalData":["ESEL=00 00 df 00 00 00 00 20 00 04 12 90 6f aa 00 00 50 48 00 30 01 00 33 00 00 00 00 03 32 72 b6 03 00 00 00 03 32 99 d4 34 42 00 00 0f 00 00 00 00 00 00 00 00 00 00 00 00 90 00 02 90 90 00 02 90 55 48 00 18 01 00 09 00 8a 03 40 00 00 00 00 00 ff ff 00 00 00 00 00 00 50 53 00 50 01 01 00 00 02 00 00 09 33 2d 00 48 00 00 00 e0 00 00 10 00 00 00 00 00 00 20 00 00 00 04 00 02 00 00 00 64 00 00 0b b8 00 00 00 12 42 43 38 41 33 33 32 44 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 20 55 44 00 1c 01 06 01 00 02 54 41 4b 00 00 00 06 00 00 00 55 00 01 f9 20 00 00 00 00 55 44 00 24 01 06 01 00 01 54 41 4b 00 00 00 05 00 00 00 00 00 00 00 00 00 00 00 4e 23 01 00 02 00 05 00 00 55 44 00 0c 01 0b 01 00 15 01 00 00 55 44 00 10 01 04 01 00 0f 9f de 6a 00 01 00 00 55 44 00 7c 00 0c 01 00 00 13 04 02 00 64 0b b8 10 00 00 be 0b 54 10 00 00 be 08 ca 08 00 00 a0 0d ac 08 00 00 a0 0b b8 04 00 00 5a 0c 80 04 00 00 5a 0a 8c 16 00 00 be 0a be 16 00 00 be 08 56 16 00 00 a0 08 ca 16 00 00 a0 08 98 14 00 00 a0 09 60 14 00 00 a0 08 ca 12 00 00 be 0a f0 12 00 00 be 08 ba 12 00 00 82 08 ca 12 00 00 82 08 98 10 00 00 82 08 fc 10 00 00 82 08 98 55 44 00 70 01 03 01 00 00 00 00 00 00 03 f9 5c 00 00 00 00 00 03 e0 f4 00 00 00 00 40 40 90 64 00 00 00 00 40 3f c4 74 00 00 00 00 40 3b 53 e4 00 00 00 00 40 3b e7 90 00 00 00 00 40 39 78 0c 00 00 00 00 40 39 91 90 00 00 00 00 40 36 ae 38 00 00 00 00 40 36 b5 cc 00 00 00 00 40 36 c2 88 00 00 00 00 40 35 bb f0 00 00 00 00 00 00 27 14 55 44 00 38 01 01 01 00 48 6f 73 74 62 6f 6f 74 20 42 75 69 6c 64 20 49 44 3a 20 68 6f 73 74 62 6f 6f 74 2d 61 32 64 64 62 66 33 2f 68 62 69 63 6f 72 65 2e 62 69 6e 00 55 44 00 70 01 04 01 00 0f 9f de 6a 00 05 00 00 07 5f 1d f4 30 32 41 41 38 38 33 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0b ac 54 02 59 41 31 39 33 34 31 32 30 31 32 34 00 00 00 00 00 00 05 22 a1 58 01 85 18 00 53 24 e7 0e 5b 2c 00 00 00 43 88 f7 00 00 55 44 00 14 01 08 01 00 00 00 00 01 00 00 00 08 00 00 00 05 55 44 03 fc 01 15 31 00 01 28 00 42 46 41 50 49 00 00 00 00 00 00 00 00 00 00 00 00 00 00 03 f4 00 00 00 00 00 00 03 f4 00 00 00 0b 00 00 00 00 00 00 00 1a 30 cf 0e 0a 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 09 00 00 00 00 00 11 bd 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 30 cf 1c 67 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0a 00 00 00 00 00 13 b5 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 08 56 00 00 00 64 00 00 00 1a 30 cf 2a 90 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0b 00 00 00 00 00 15 ae 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 16 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a be 00 00 00 64 00 00 00 1a 30 cf 34 fe 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0c 00 00 00 00 00 17 a6 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0a 8c 00 00 00 64 00 00 00 1a 30 cf 3f b1 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0d 00 00 00 00 00 19 9f 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 04 00 00 00 00 00 00 00 5a 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0c 80 00 00 00 64 00 00 00 1a 30 cf 4e 4d 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0e 00 00 00 00 00 1b 97 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0b b8 00 00 00 64 00 00 00 1a 30 cf 59 13 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 0f 00 00 00 00 00 1d 90 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 08 00 00 00 00 00 00 00 a0 00 00 00 00 00 00 07 4b 00 00 00 00 00 00 0d ac 00 00 00 64 00 00 00 1a 30 cf 64 1f 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 10 00 00 00 00 00 1f 88 a0 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 08 ca 00 00 00 64 00 00 00 1a 30 cf 72 e0 00 00 02 55 00 48 43 4f fb ed 70 b1 00 00 02 01 00 00 00 00 00 00 00 11 00 00 00 00 00 21 81 20 00 00 00 00 00 01 f8 80 00 00 00 00 00 00 00 01 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 10 00 00 00 00 00 00 00 be 00 00 00 00 00 00 07 4a 00 00 00 00 00 00 0b 54 00 00 00 64 00 00 00 1a 31 19 48 32 00 00 02 55 00 00 43 4f 52 d7 9c 36 00 00 04 73 00 00 00 1c 00 00 00 1a 31 19 52 bf 00 00 02 55 00 10 43 4f 3f f2 02 3d 00 00 05 67 00 00 00 00 02 00 00 01 00 00 00 00 00 00 00 40 00 00 00 2c 55 44 00 30 01 15 31 00 01 28 00 42 46 41 50 49 5f 44 42 47 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 55 44 00 30 01 15 31 00 01 28 00 42 53 43 41 4e 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 28 00 00 00 00 00 00 00 00 80 00 03 63 07 01 10 3f 80 00 03 64 07 01 10 3f 80 00 03 65 07 01 10 3f 80 00 03 66 07 01 10 3f 80 00 03 67 07 01 10 3f 80 00 03 68 07 01 10 3f 80 00 03 69 07 01 10 3f 80 00 03 6a 07 01 10 3f 80 00 03 6b 07 01 10 3f 80 00 07 60 07 01 10 3f 80 00 07 61 07 01 10 3f 80 00 07 62 07 01 10 3f 80 00 07 63 07 01 10 3f 80 00 07 64 07 01 10 3f 80 00 07 65 07 01 10 3f 80 00 07 66 07 01 10 3f 80 00 07 67 07 01 10 3f 80 00 07 68 07 01 10 3f 80 00 07 69 07 01 10 3f 80 00 07 6a 07 01 10 3f 80 00 07 6b 07 01 10 3f 80 00 0b 60 07 01 10 3f 80 00 0b 61 07 01 10 3f 80 00 0b 62 07 01 10 3f 80 00 0b 63 07 01 10 3f 80 00 0b 64 07 01 10 3f 80 00 0b 65 07 01 10 3f 80 00 0b 66 07 01 10 3f 80 00 0b 67 07 01 10 3f 80 00 0b 68 07 01 10 3f 80 00 0b 69 07 01 10 3f","PROCEDURE=85","_PID=1224"],"Id":75,"Message":"org.open_power.Host.Error.MaintenanceProcedure","Purpose":"xyz.openbmc_project.Software.Version.VersionPurpose.BMC","Resolved":false,"Severity":"xyz.openbmc_project.Logging.Entry.Level.Error","Timestamp":1651082392034,"Version":"2.7.0-dev-571-g67efd9872","associations":[]}}

Hostboot boot reported (snippets) from above logs:

20.48206|================================================
 20.49654|Error reported by fapi2 (0x3300) EID 0x90000227
 20.49786|  No WOF table match found
 20.49787|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 20.49787|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 20.49788|  UserData1  Number of cores : 0x0004000200000064
 20.49789|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 20.49790|------------------------------------------------
 20.49790|  Callout type             : Procedure Callout
 20.49791|  Procedure                : EPUB_PRC_HB_CODE
 20.49792|  Priority                 : SRCI_PRIORITY_HIGH
 20.49793|------------------------------------------------
 20.49793|  Callout type             : Hardware Callout
 20.49795|  Target                   : Physical:/Sys0/Node0/Proc0
 20.49795|  Deconfig State           : NO_DECONFIG
 20.49796|  GARD Error Type          : GARD_NULL
 20.49797|  Priority                 : SRCI_PRIORITY_MED

and

26.63992|================================================
26.63993|Error reported by fapi2 (0x3300) EID 0x90000256
26.63994|  No WOF table match found
26.63995|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
26.63995|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
26.63996|  UserData1  Number of cores : 0x0004000200000064
26.63997|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
26.63998|------------------------------------------------
26.63999|  Callout type             : Procedure Callout
26.63999|  Procedure                : EPUB_PRC_HB_CODE
26.64000|  Priority                 : SRCI_PRIORITY_HIGH
26.64001|------------------------------------------------
26.64002|  Callout type             : Hardware Callout
26.64003|  Target                   : Physical:/Sys0/Node0/Proc0
26.64004|  Deconfig State           : NO_DECONFIG
26.64004|  GARD Error Type          : GARD_NULL
26.64005|  Priority                 : SRCI_PRIORITY_MED
26.64006|------------------------------------------------
26.64006|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin

@tlaurion
Copy link
Author

tlaurion commented Apr 27, 2022

9.55214|SECURE|Secure Mode Disable (via Jumper)> 0x8000000000000000 suggests it is disabled.

Right. Secure Mode came disabled for both CPUs, jumper being set at 2nd and 3rd PINs for both CPUs.

@krystian-hebel
Copy link

@tlaurion those events (unfortunately?) aren't checkstops, which may make debugging harder. Most likely those are just logs from those errors reported by Hostboot. Feel free to clear that log so it won't keep showing them. Still, I'd like to ask you to dump some SCOM registers that would help if it were a checkstop, because some errors may still be masked at that time:

  1. Copy fir_scoms.txt to /tmp/fir_scoms.txt on BMC
  2. Start the platform and wait for it to get to the point where it hangs
  3. Run on BMC (this takes almost 2 minutes on our platform, may take more time if yours is stuck in a loop):
while read scom; do
        if [[ "$scom" == "0x"* ]]; then
                pdbg -P pib0 getscom $scom
        else
                echo "$scom"
        fi
done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
  1. Send /tmp/scom_dump.log here.

It is important to get the full dump from the first run - in case of problem in one place it may report further errors when trying to read additional registers, and that can obscure the real issue. Ideally those registers should be accessed in proper order, depending on results of earlier reads, but asking for them one at a time would take too much.

@tlaurion
Copy link
Author

@krystian-hebel

The problem here, it testing :

@SergiiDmytruk Thanks.

@tlaurion Binaries from this MR are here: https://github.com/Dasharo/coreboot/suites/6282695824/artifacts/224773170

Is that the machine bootloops, so that when I run the script:

root@talos:~# while read scom; do
>         if [[ "$scom" == "0x"* ]]; then
>                 pdbg -P pib0 getscom $scom
>         else
>                 echo "$scom"
>         fi
> done < /tmp/fir_scoms.txt > /tmp/scom_dump.log
pdbg: Failed to write to 0x0000101c (0000000000001007): Cannot send after transport endpoint shutdown

Here is still the output of it, prior of it giving pdbg: Failed to write to 0x0000101c (0000000000001007): Cannot send after transport endpoint shutdown:
scom_dump.log

@krystian-hebel
Copy link

IIRC it would stop bootlooping after 3 or 4 tries (there is a way to reduce that number, but Hostboot doesn't like it), but I think all important registers were dumped. I'll take a look an will let you know when I find anything.

@tlaurion
Copy link
Author

For some reason, next boot didn't hit a bootloop.

Running:

while read scom; do
        if [[ "$scom" == "0x"* ]]; then
                pdbg -P pib0 getscom $scom
        else
                echo "$scom"
        fi
done < /tmp/fir_scoms.txt > /tmp/scom_dump.log

obmc-console.log
scom_dump.log

Here is scp'ed logs

@miczyg1
Copy link
Contributor

miczyg1 commented Apr 28, 2022

Maybe we should consider some scripts or guides which information to dump and how t do it to help us debugging issues.

@tlaurion
Copy link
Author

@macpijan would it be useful to have scom dump from successful hostboot booting petitboot?

@krystian-hebel
Copy link

Those shouldn't matter, I kind of know what to expect there. What would help would be fuller dump when it hangs and a checkstop happens (i.e. when second register on the list, 0x500f001c, is non-zero). If you could remove from input file all cache chiplets except chiplet 4 (0x14xxxxxx left, other 0x1xxxxxxx removed), and all core chiplets except chiplet 18 (0x32xxxxxx left, other 0x2xxxxxxx and 0x3xxxxxxx removed) and try again, it should take less time and hopefully dump all/most of the registers before reboot happens.

I don't know how #80 (comment) didn't get to the checkstop, instead it just stopped in a random place.

@tlaurion
Copy link
Author

tlaurion commented May 2, 2022

@krystian-hebel: not sure what a checkstop is. (0x500f001c is zero here)

The logs collected here were collected from machine not stuck in a bootloop. The machine is still on now, not having rebooted automatically.

So I understand that to test and report the proper logs, it is required to boot successfully from hostboot first (to reset state and produce bootloops, for which a checkstop is what is triggered when a bootloop occurs and where system reboots automatically?)

fir_scoms_issue80.txt
obmc-console.log
scom_dump_issue80.log

@tlaurion
Copy link
Author

tlaurion commented May 2, 2022

Checkstop is whenever 0x500f001c is non-zero. Whether platform reboots or stops depends on the configuration, I thought that it would do 4 reboots and get stuck on 5th. It should be reported in those logs in GUI that I asked earlier, but from those logs it seems that something else happens instead.

Thanks for the clarification.

There are org.open_power.Host.Boot.Error.Checkstop when booting coreboot, but they only contain PIDs and seemed irrelevant. Content of such logs from last boot attempt were:
_PID=6707 and _PID=6404

@tlaurion
Copy link
Author

tlaurion commented May 5, 2022

@krystian-hebel anything else you would need ?

@krystian-hebel
Copy link

@tlaurion I updated binaries at https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT, please check with those. I expect it to die on https://github.com/Dasharo/coreboot/blob/raptor-cs_talos-2/ram_debug_fixes/src/soc/ibm/power9/istep_13_11.c#L896. If that is the case, I would like to see both console dump and SCOM dump, if it doesn't hang there just the console output (may be just from start of istep 13.11 onward) will be enough.

As the comment above line from link says, there is a lot of code in the workaround missing. I haven't implemented it because all of our DIMMs were nice enough to not require it, so I had no way of testing whether it works. Seems that this may have just changed 🙂

@tlaurion
Copy link
Author

tlaurion commented May 6, 2022

@krystian-hebel

root@talos:~# cd /tmp/v0.5.0_debug_ram/
root@talos:/tmp/v0.5.0_debug_ram# sha256sum *
3f82006b01cbbdb5ca4b1bb7c56088cad7e5fa7cdbcc9275e1e36905334c882e  bootblock.signed.ecc
7c8f7da210b5facb89724347f4096d7bff8024e2d9e046a132f52f0286f3f86c  coreboot.rom.signed.ecc
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: BMC Memory
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --backend vpnor
Failed to post message: Connection timed out
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: Flash Device
root@talos:/tmp/v0.5.0_debug_ram# pflash -r /tmp/talos.pnor
Reading to "/tmp/talos.pnor" from 0x00000000..0x04000000 !
[==================================================] 100% ETA:0s     
root@talos:/tmp/v0.5.0_debug_ram# pflash -P HBB -p /tmp/v0.5.0_debug_ram/bootblock.signed.ecc -F /tmp/talos.pnor                             
About to program "/tmp/v0.5.0_debug_ram/bootblock.signed.ecc" at 0x00205000..0x0020c002 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:/tmp/v0.5.0_debug_ram# pflash -P HBI -p /tmp/v0.5.0_debug_ram/coreboot.rom.signed.ecc -F /tmp/talos.pnor
About to program "/tmp/v0.5.0_debug_ram/coreboot.rom.signed.ecc" at 0x00425000..0x00666200 !
WARNING ! This will modify your HOST flash chip content !
Enter "yes" to confirm:yes
Programming & Verifying...
[==================================================] 100%
Updating actual size in partition header...
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --backend file:/tmp/talos.pnor
SetBackend: Success
root@talos:/tmp/v0.5.0_debug_ram# mboxctl --lpc-state
LPC Bus Maps: BMC Memory

Will be reusing:
Reusing https://github.com/Dasharo/dasharo-issues/files/8603193/fir_scoms_issue80.txt
and

while read scom; do
        if [[ "$scom" == "0x"* ]]; then
                pdbg -P pib0 getscom $scom
        else
                echo "$scom"
        fi
done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump.log

Ouput:

starting root@talos:/tmp/v0.5.0_debug_ram# cat /var/log/obmc-console.log 
0000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Leveling done
Initial Pattern Write starting
CCS took 5 us (38 us timeout), 5 instruction(s)
DP 0
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 1
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 2
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 3
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 4
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Initial Pattern Write done
DQS alignment starting
CCS took 10 us (44 us timeout), 1 instruction(s)
DP 0
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 1
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 2
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 3
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 4
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
DQS alignment done
Read Clock Alignment starting
CCS took 8 us (82 us timeout), 1 instruction(s)
DP 0
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 1
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 2
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 3
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 4
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Clock Alignment done
Read Centering starting
CCS took 36 us (120 us timeout), 1 instruction(s)
DP 0
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 1
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 2
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 3
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 4
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000000000 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Read Centering done
Write Centering starting
CCS took 8449 us (11314 us timeout), 7 instruction(s)
DP 0
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000003a0e - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 1
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000003e1a - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 2
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000001a1a - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 3
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000001a3a - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 4
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000003a00 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Write Centering done
Coarse write/read starting
CCS took 5 us (24 us timeout), 1 instruction(s)
DP 0
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000003a0e - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 1
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000003e1a - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 2
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000001a1a - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 3
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000001a3a - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
DP 4
	0x0000000000000000 - RD_VREF_CAL_ERROR
	0x0000000000000000 - DQ_BIT_DISABLE_RP0
	0x0000000000000000 - DQS_BIT_DISABLE_RP0
	0x0000000000000000 - WR_ERROR0
	0x0000000000000000 - RD_STATUS0
	0x0000000000000000 - RD_LVL_STATUS2
	0x0000000000000000 - RD_LVL_STATUS0
	0x0000000000003a00 - WR_VREF_ERROR0
	0x0000000000000000 - WR_VREF_ERROR1
0x0000000000000000 - APB_ERROR_STATUS0
0x0000000000000000 - RC_ERROR_STATUS0
0x0000000000000000 - SEQ_ERROR_STATUS0
0x0000000000000000 - WC_ERROR_STATUS0
0x0000000000000000 - PC_ERROR_STATUS0
0x0000000000000000 - PC_INIT_CAL_ERROR
0x0000000000008000 - PC_INIT_CAL_STATUS
0x0000000000000000 - IOM_PHY0_DDRPHY_FIR_REG
0x0000000000000000 - MBACALFIRQ
Coarse write/read done
ending istep 13.11
starting istep 13.13
ending istep 13.13
starting istep 14.1
ending istep 14.1
starting istep 14.2
ending istep 14.2
starting istep 14.3
Initializing PEC0...
Initializing PEC1...
Initializing PEC2...
Initializing PHB0...
Initializing PHB1...
Initializing PHB2...
Initializing PHB3...
Initializing PHB4...
Initializing PHB5...
ending istep 14.3
starting istep 14.5
ending istep 14.5
0xF000F = 221d104900008040
CBMEM:
IMD: root @ 0xffeff000 254 entries.
IMD: root @ 0xffefec00 62 entries.
FMAP: area COREBOOT found @ 20200 (1965568 bytes)
FFS header at 0x80060300ffff7000
PNOR base at 0x80060300fc000000
HBI partition has ECC
HBI is in 0x00426200 through 0x0175f037
CBFS: Found 'fallback/ramstage' @0x12840 size 0xcbe7 in mcache @root@talos:/tmp/v0.5.0_debug_ram# 

No checkstop?

root@talos:/tmp/v0.5.0_debug_ram# pdbg -P pib0 getscom 0x500f001c
p0: 0x00000000500f001c = 0x0000000000000000 (/kernelfsi@0/pib@1000)
root@talos:/tmp/v0.5.0_debug_ram# 
root@talos:/tmp/v0.5.0_debug_ram# while read scom; do
>         if [[ "$scom" == "0x"* ]]; then
>                 pdbg -P pib0 getscom $scom
>         else
>                 echo "$scom"
>         fi
> done < /tmp/fir_scoms_issue80.txt > /tmp/scom_dump.log
root@talos:/tmp/v0.5.0_debug_ram#

Attached are the scom logs:
scom_dump.log

@krystian-hebel
Copy link

@tlaurion to exclude possibility of bad support for CPU, could you test if it works (or at least behaves differently) when this DIMM is installed in other slots, even though instruction tells to use this one?

@tlaurion
Copy link
Author

tlaurion commented May 10, 2022

@macpijan : user's manual suggests memory bank population for one DIMM to be in B1, on one/two CPU configuration.

You suggest that I put my single M393A1K43BB0-CRC in what other slot?

EDIT: outside channel answer of @krystian-hebel "Any one should do."
Putting in D1.

@tlaurion
Copy link
Author

hmmm. Memory stick in D1:

16.90082|================================================
 16.91730|Error reported by prdf (0xE500) EID 0x90000366
 16.91731|  PRD Signature            : 0x240005 0xFFFF0044
 16.91874|  Signature Description    : pu.mca:k0:n0:s0:p00:c5 () RDR: Repairs needed but unavailable
 16.91875|  UserData1   : 0x0024000500000000
 16.91876|  UserData2   : 0xffff004400000000
 16.91876|------------------------------------------------
 16.91877|  Callout type             : Hardware Callout
 16.91878|  Target                   : Physical:/Sys0/Node0/DIMM6
 16.91879|  Deconfig State           : DELAYED_DECONFIG
 16.91880|  GARD Error Type          : GARD_Predictive
 16.91880|  Priority                 : SRCI_PRIORITY_HIGH
 16.91881|------------------------------------------------
 16.91882|  Hostboot Build ID: hostboot-a2ddbf3/hbicore.bin
 16.91882|================================================
 17.26972|System Shutting DownTo Perform Reconfiguration After Deconfig
 17.26973|IPMI: Initiate power cycle
 17.27087|Stopping istep dispatcher
 17.99366|IPMI: shutdown complete
15.12352|ISTEP 13. 7 - host_enable_memvolt
 15.13551|ISTEP 13. 8 - mss_scominit
 15.47751|ISTEP 13. 9 - mss_ddr_phy_reset
 15.53023|ISTEP 13.10 - mss_draminit
 15.61190|ISTEP 13.11 - mss_draminit_training
 16.03449|System Shutting DownTo Perform Reconfiguration After Recoverable Error
 16.03451|IPMI: Initiate power cycle
 16.05795|Stopping istep dispatcher
 16.41643|IPMI: shutdown complete

@tlaurion
Copy link
Author

tlaurion commented May 10, 2022

Memory stick now in A1.
Debugged with @krystian-hebel.

It seems that the issue above was linked to components being faulty and the Talos deactivating those components to continue on other working ones as described here.

To make A1 work, it was necessary to wipe from instructions:
pflash -P GUARD -c

Note: On my board, D1 slot is faulty (let's remember the donation came from a board that failed DUT_HW_GUARD_FAILURE), with the device Test report specifying that "2x ram slots fail".


What we know as of now if that A1 and B1 (blue slots, under CPU0) works, and hostboot is able to train (as coreboot) the M393A1K43BB0-CRC stick.

It seems that the CPU revision in my setup is not known from DEFCONFIG (from Hostboot):

34.38209|ISTEP 21. 1 - host_runtime_setup
 36.26054|================================================
 36.26055|Error reported by fapi2 (0x3300) EID 0x900003C7
 36.26056|  No WOF table match found
 36.26056|  ModuleId   0x10 fapi2::MOD_FAPI2_PLAT_PARSE_WOF_TABLES
 36.26057|  ReasonCode 0x332d fapi2::RC_WOF_TABLE_NOT_FOUND
 36.26058|  UserData1  Number of cores : 0x0004000200000064
 36.26059|  UserData2  WOF Power Mode (1=Nominal, 2=Turbo) : 0x00000bb800000012
 36.26060|------------------------------------------------
 36.26061|  Callout type             : Procedure Callout
 36.26062|  Procedure                : EPUB_PRC_HB_CODE
 36.26062|  Priority                 : SRCI_PRIORITY_HIGH
 36.26063|------------------------------------------------
 36.26064|  Callout type             : Hardware Callout
 36.26066|  Target                   : Physical:/Sys0/Node0/Proc0
 36.26066|  Deconfig State           : NO_DECONFIG
 36.26067|  GARD Error Type          : GARD_NULL
 36.26068|  Priority                 : SRCI_PRIORITY_MED
 36.26069|------------------------------------------------

@tlaurion
Copy link
Author

@krystian-hebel @macpijan some updates?

@krystian-hebel
Copy link

Interesting find. Soon after entering ramstage processor hits an unknown instruction and as a result exception happens. Normally, at that point there should be no interrupt handlers installed, but for some reason around that particular exception (@ 0x840) I see something that looks like proper PPC64 code:

root@talos:~# pdbg -p 0 -c 18 -t 0 getmem 0x0 0x1000 | hexdump -C
[==================================================] 100%
00000000  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000e20  38 61 01 70 4b ff fb 8d  2c 23 00 00 41 82 ff 6c  |8a.pK...,#..A..l|
00000e30  e8 bf 00 18 e9 3f 00 10  38 80 00 00 7c a3 28 50  |.....?..8...|.(P|
00000e40  7c 69 1a 14 48 00 25 cd  60 00 00 00 e8 9f 00 18  ||i..H.%.`.......|
00000e50  e8 7f 00 10 38 a0 00 01  48 00 26 c5 60 00 00 00  |....8...H.&.`...|
00000e60  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00001000

From there it jumps into unused memory (all 0s), hits another exception and goes back into this handler.

There are some open questions that I'll try to debug more:

  • Why coreboot hit bad instruction in the first place?
  • Why is there a code near interrupt vectors?
  • Is this a copy of part of coreboot code?
    • Seems so, 0x820-0x85f is a copy (or alias?) of 0xf8243aa0-0xf8243adf
    • Surrounding bytes don't match
  • Does this code change between boots?
    • In few attempts I got exactly the same code, so either it doesn't change or I am (un)lucky to always get the same results
  • Why exactly 64 bytes?
    • Cache line is 128 bytes
    • Beginning of copied code is aligned to neither of these values
  • How does this connect to errors in memory calibration?

On memory training front: there are no errors when only 1D write centering is performed, instead of 2D centering. This is one of the workarounds Hostboot does when full algorithm fails, but according to my notes all registers that Hostboot tests for failure are reporting a success, even for 2D training. I will have to dig into Hostboot's code again and compare it with my notes.

Nevertheless, even with forced 1D write centering, the problem with "random" data in interrupt handlers' area still happens. Other parts of RAM are properly zeroed so it is most likely working as it should. It may be a different issue altogether and this is what I'm assuming right now. At least at that point we have RAM and can install our own interrupt handlers to help with debugging, assuming they won't get overwritten.

@SergiiDmytruk constructed a list of differences between different DD revisions of CPU, but it seems that Hostboot doesn't always use everything it defines. There may be more code depending on DD versions that doesn't use attributes listed there. Some things are set before FAPI (used to access attributes, among other things) is enabled, like HRMOR or XIVE.

@krystian-hebel
Copy link

More findings:

  • 0x820-0x85f is a copy of what was at 0xf8243aa0-0xf8243adf, but after that hang 0xf8243ac0++ is zeroed. Copy in lower memory isn't truncated.
root@talos:~# pdbg -p 0 -c 18 -t 0 getmem 0xf8243a90 0x50 | hexdump -C
[==================================================] 100%
00000000  60 00 00 00 e8 bf 00 18  e8 9f 00 10 7f 86 e3 78  |`..............x|
00000010  38 61 01 70 4b ff fb 8d  2c 23 00 00 41 82 ff 6c  |8a.pK...,#..A..l|
00000020  e8 bf 00 18 e9 3f 00 10  38 80 00 00 7c a3 28 50  |.....?..8...|.(P|
00000030  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
*
00000050
  • Problem occurs immediately after returning from memset(ramstage.bss, 0, sizeof(ramstage.bss + ramstage.heap)) called from cbfs_prog_stage_load. Return should jump to 0xf8243ac8 which is zeroed at that point, hence the exception.
  • According to register values, memset finishes successfully after clearing full range. Its code definitely fits in L1I cache, probably even in IFB (instruction fetch buffer, can hold 24 instructions in SMT4 mode), so even if it were overwritten it should still be able to execute.
  • Call to memset is at 0xf8243ac4 so that cache line should have been accessed and LRU should not choose it to be victimized. Memory is zeroed there on 64B boundary (half of cache line, fetching it to L1D is supported and AFAICT enabled), contrary to what happened in 0x820-0x85f range.
  • Normally, before RAM is initialized, checkstop happens when uninitialized memory is accessed, unless it is explicitly cleared in L3 cache without write-back (dcbz instruction). This doesn't happen here, memset is able to clear additional ~2.6MB which together with what is initialized by bootblock is more than 10MB of available L3 cache (for the only core executing at that time). It seems as if L3 thought it could write back to RAM, but couldn't read it later.
  • pdbg asks main CPU to read memory, so it still goes through all cache levels. It can be told to use cache-inhibited read, but real access type depends on type of first access, it is stored in SLB or one of other lookaside buffers.

@pietrushnic
Copy link

I just hope we don't chase broken hardware here. Please note we may have some more remote setups of Talos II for coparison of behavior soon. I'm working on that.

@krystian-hebel
Copy link

@pietrushnic Hostboot works with this hardware, so even if it is not perfect, it should at least hit one of TODO workaround paths in coreboot. For some reason this does not happen, no error is detected until it breaks fatally.

@pietrushnic
Copy link

@krystian-hebel understood, that means we need improvements on our side and we don't what the heck is going. I hope you can invent some diagnostics steps. Also maybe we should redirect this to mailing list, maybe someone there would have ideas how to debug that further?

@krystian-hebel
Copy link

@tlaurion I have another request: I need logs from talos.pnor from https://cloud.3mdeb.com/index.php/s/A8qNefnkaBYDbsT. This is Hostboot with customized (lots of irrelevant stuff suppressed, 80% faster and produces smaller log) debug output. On our machine this takes ~45 minutes to boot and produces ~30MB of log, this may be slightly less with one CPU, one DIMM. It should fit in ramfs, even with PNOR image also stored there. Read it with obmc-console-client > /tmp/boot.log because /var/log/obmc-console.log has limited size.

@tlaurion
Copy link
Author

tlaurion commented May 24, 2022

Full logs provided off channel.

@tlaurion
Copy link
Author

@krystian-hebel any updates?

@krystian-hebel
Copy link

Sorry for the delay, I was busy with other projects and then reading provided log (almost 200k lines). Based on it, and additional SCOM dumps performed on platform, I have some rough idea of what happens. There were actually multiple errors (11 to be exact) reported in FIR, but some of them probably were caused by trying to load exception handler that does not exist.

We messed up a bit during simplification done to various isteps, like this one. As a result, a bit that tells that current CPU is the only one in the system is not set. Because of that, when code tries to write to RAM, a message is sent to other CPU that it should flush and invalidate its L3 cache. That other CPU does not exist, so main CPU doesn't receive an acknowledgement to that message. After quick and dirty fix in place (hardcoding it to 1 CPU version) "only" 4 errors were reported.

Further issues were caused by misunderstanding what ATTR_PROC_FABRIC_X_LINKS_CNFG is - we thought it is total number of X links, instead it is number of X links in use. For Nimbus this is 1 when there are 2 CPUs, 0 otherwise. With some additional hacks we got down to 2 reported errors, and a bit different manifestation of the problem - now platform doesn't reboot, instead it gets into infinite loop, but this may be caused by different layout of code with regard to cache lines.

I also noticed some differences that will become important when we get to OCC initialization, but we're not that far just yet. Also, if OCC was (partially) started earlier, it should be able to gather FIR SCOM, basically what I tried to get with scripts few comments above. This would however require starting OCC (at least) twice, and as I haven't thought it would be necessary, we decided to skip it in coreboot. AFAIK there is no ready-to-use tool for parsing those dumps.

Right now we're about to remove one CPU from our platform and see if we get exactly the same issue, as at least some of this problems should happen on every 1 CPU platform, if I understood it correctly.

@krystian-hebel
Copy link

Update: with additional changes coreboot is able to boot to ramstage, now it stops at No WOF table match found issue.

Final cause was wrong Power Bus frequency. Hostboot did read it from MVPD which we mimicked in coreboot, but what we haven't noticed is that the value is overwritten with a hardcoded one later. Hardcoded value is identical to value from MVPD for our CPU, which is why it worked in the first place.

WOF is another issue altogether, but I think it shouldn't be too hard to fix. We just didn't thought that there are processors which aren't included in WOF table, but this is what community testing is all about - to catch such corner cases.

@tlaurion
Copy link
Author

Latest tests with unreleased version supported my single CPU!

Current issue now is to have Heads payload output to vga console, which seems to miss either AST+DRM in kernel config and/or proper skiboot passed arguments.

Off-channel notes:

@tlaurion
Copy link
Author

@krystian-hebel I'm searching for the tool that was developped to sit on bmc and collect the logs without nohup. Can you tag me and point me to where it is? That should be added in a debugging page for Talos II board.

@macpijan
Copy link
Contributor

I think it was here: https://github.com/3mdeb/openpower-coreboot-docs/pull/74/files

Please let me know what you believe what be the best place to put it

@tlaurion
Copy link
Author

@macpijan : https://docs.dasharo.com/variants/talos_2/overview/ should have minimally a link to https://github.com/3mdeb/openpower-coreboot-docs

But a debugging page draft would be useful, pointing directly to https://github.com/3mdeb/openpower-coreboot-docs/blob/main/devnotes/scat/README.md to facilitate bug reporting?

@krystian-hebel
Copy link

Maybe we should move all user-level documentation to docs.dasharo.com and leave openpower-coreboot-docs just for developer stuff like logs, istep analysis, early design considerations etc? There is some overlap between those two repositories, e.g. release info or flashing instructions. What's worse, they are slightly different...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

6 participants