Freeze shortly after booting Linux on MSI Z690-A DDR4 WiFi #977

Open
esawady opened this issue Aug 2, 2024 · 12 comments

@esawady

esawady commented Aug 2, 2024

Dasharo version
1.1.3, built locally (with SeaBIOS as the payload rather than EDK II, since I was having trouble compiling EDK II)

Also tried officially-built 1.1.1, with identical results. Would be happy to try any other builds you provide me if that'd help with debugging; I have tools for external flashing, so I'm not worried about bricking anything

Edit: Also tried a standard 1.1.3 build, with identical results

Hardware
MSI PRO Z690-A DDR4 WiFi
i3-12300T
1x 16GiB G.Skill F4-2666C19-16GIS (in slot A2)
No external GPU, tried with an RX580 as well and it didn't change anything
Full DTS logs available if you'd like them, unsure if they have any sensitive information

Symptoms
SeaBIOS/EDK II and Grub work perfectly fine, and are (as far as I can tell) stable. Around 15s (can measure more precisely if that'd be helpful; it's relatively variable in terms of how far the boot gets) after booting to Linux, the display freezes. If I leave it there for long enough, it eventually reboots.

Debugging steps I've tried so far
I've tried reflashing the stock BIOS in order to check if coreboot somehow caused hardware damage. The stock BIOS still works fine.

I've tried a couple different Linux distributions and bootloaders. Alpine, antiX, and DTS all behaved identically. Syslinux under SeaBIOS froze immediately after loading the kernel, whereas Grub (under both SeaBIOS and EDK II) froze later in the boot process.

I've tried booting with init=/bin/sh and debug on the Linux command line, and running dmesg -w before it freezes, both of which should in theory print kernel logs to the screen. No logs appeared during the freeze; I'm pretty sure it's not a kernel panic. I tried to set up netconsole, but wasn't able to get it to work.
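On the netconsole attempt: the kernel sends its log as plain UDP datagrams, so the receiving machine only needs something bound to the target port. Here is a minimal sketch of such a receiver, assuming the usual default target port of 6666. The function names `open_listener` and `receive_kernel_log` are made up for illustration, and this is not the setup used in this thread.

```python
import socket


def open_listener(host="0.0.0.0", port=6666):
    """Bind a UDP socket for netconsole traffic (6666 is assumed as the target port)."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.bind((host, port))
    return sock


def receive_kernel_log(sock, max_datagrams, timeout=5.0):
    """Collect up to max_datagrams payloads from sock, decoded as text.

    Stops early if nothing arrives within `timeout` seconds, which is
    also what a frozen sender looks like from the receiving side.
    """
    sock.settimeout(timeout)
    lines = []
    for _ in range(max_datagrams):
        try:
            data, _sender = sock.recvfrom(65535)
        except socket.timeout:
            break
        lines.append(data.decode("utf-8", errors="replace").rstrip("\n"))
    return lines
```

On the machine that freezes, the kernel would then be booted with a parameter along the lines of `netconsole=6665@192.0.2.2/eth0,6666@192.0.2.1/00:00:5e:00:53:01`, where the addresses, interface name, and MAC are placeholders. Calling `receive_kernel_log(open_listener(), 1000, timeout=60.0)` on the receiver and printing the result shows whatever the kernel managed to send before the freeze.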

I've tried reading cbmem logs from Grub; I wasn't able to figure out a way to exfiltrate them onto a proper computer, but the brief skim I did didn't show anything that looked egregious. If you can point me towards ways to get those logs out, I'd be happy to share them.

Edit: memtest86, FreeBSD, OpenBSD, and 9front work; NetBSD doesn't. See below for details.

Current suspicions
I'm wondering if perhaps something's up with the DRAM? I could see that leading to pseudorandom crashes, and that's the hardware component I'm least convinced other people have tested before. I could also see this being some sort of watchdog that's somehow not getting kicked properly, but I'm not familiar enough with x86_64 internals to know how to chase down that thought. I could also try booting other OSes? Given that it's failing while Linux is running, maybe trying out a BSD would lead to something different?

@zirblazer

Have you tried building the standard Dasharo 1.1.3 binary instead of rolling something custom, as you are currently doing? (You can build the binary from source even if you're not a subscriber.) https://docs.dasharo.com/unified/msi/building-manual/
Try that first. Neither SeaBIOS nor upstream EDK II was ever tested.

@esawady
Author

esawady commented Aug 3, 2024

Just checked, a standard 1.1.3 build behaves identically to both the official 1.1.1 binary and my 1.1.3 CONFIG_PAYLOAD_SEABIOS=y build.

(And, since I now have an EFI build again with which to test it, the freeze also reproduces with DTS)

@esawady
Author

esawady commented Aug 5, 2024

memtest86 (the PassMark one, version 7.3) fully passes, FreeBSD, OpenBSD, and 9front (!) all work. NetBSD and bunnix cause the same freeze-then-reboot as Linux. I would check Windows, but I don't have access to a Windows boot USB.

The last log message the NetBSD kernel prints is "uhid6 at uhidev0 reportid 252: input=63, output=63, feature=0", which (per cross-referencing with some OpenBSD kernel logs I happened to see) is the last motherboard port it's probing for, and would be followed by the case's ports.

Bunnix prints some messages about loading the kernel and boot modules, clears the screen, and prints a single "[" before freezing.

I'll try to see if I can bisect one of these further.

esawady changed the title from "Freeze shortly after loading Linux on MSI Z690-A DDR4 WiFi" to "Freeze shortly after booting Linux or NetBSD on MSI Z690-A DDR4 WiFi" on Aug 5, 2024

@zirblazer

zirblazer commented Aug 5, 2024

How are you flashing it? Have you tried flashing the standard 1.1.3 Dasharo binary using MSI FlashBIOS? Did you change any options from their defaults in the setup menu (like disabling ME)?

You are literally the first person in two years to report freezes when booting Linux; I don't recall any other. And there is nothing wrong on the base hardware side. What devices do you have connected? I only recall some strange major slowdown or hang/freeze during POST with USB flash drives plugged into any of the 4 USB ports that are in the same column on the back of the motherboard, so avoid those. If you have a PS/2 keyboard, try disconnecting it too, since I recall some models being problematic, although PS/2 had gone through about two tweaking passes by the time of 1.1.3.

@esawady
Author

esawady commented Aug 5, 2024

Full order of flashes:

  • 1.1.1, using DTS. Disabled ME, unfortunately before checking if it froze, since I didn't expect to have issues past the bootloader
  • SeaBIOS 1.1.3, using an external programmer. No changes in the setup menu
  • Stock MSI BIOS, using FlashBIOS
  • Standard 1.1.3, using FlashBIOS. No changes in the setup menu. I did eventually modify the boot order, to put the USB drive I'm booting test OSes from first, but things were broken before I did that

All of the Dasharo builds had the same freeze, and the MSI BIOS worked both before I flashed anything and when I reflashed it using FlashBIOS

The only thing I have plugged in is the USB drive I'm booting from, and I've tried putting it in various different USB ports, both on the motherboard and on the case. On occasion I've also had my (non-PS/2) keyboard plugged in, but that's not affected the freeze

Honestly the thing that really confuses me is that FreeBSD works

@esawady
Author

esawady commented Aug 6, 2024

Aha! Found a suspicious entry in the FreeBSD dmesg (MADT: Ignoring local APIC ID 387323156 (too high)), and turns out adding nolapic to the Linux command line fixes the freeze.
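For checking what the firmware actually put in the MADT, the local APIC entries can be read straight out of the raw table. This is a rough sketch based on the ACPI table layout (type 0 entries carry an 8-bit APIC ID, type 9 entries a 32-bit x2APIC ID); the function name `local_apic_ids` is hypothetical. An ID as large as the one FreeBSD complained about cannot fit in a type 0 entry, so it would presumably show up as a type 9 entry here.

```python
import struct

# 36-byte ACPI table header + 4-byte local APIC address + 4-byte flags
MADT_HEADER_LEN = 44


def local_apic_ids(madt):
    """Return (kind, apic_id, enabled) for each local APIC / x2APIC entry in a raw MADT."""
    if madt[:4] != b"APIC":
        raise ValueError("not a MADT (signature is not 'APIC')")
    (table_len,) = struct.unpack_from("<I", madt, 4)
    end = min(table_len, len(madt))
    entries = []
    offset = MADT_HEADER_LEN
    while offset + 2 <= end:
        entry_type, entry_len = struct.unpack_from("<BB", madt, offset)
        if entry_len < 2 or offset + entry_len > end:
            break  # malformed entry: stop instead of looping forever
        if entry_type == 0 and entry_len >= 8:
            # Processor Local APIC: ACPI UID (1), APIC ID (1), flags (4)
            _uid, apic_id, flags = struct.unpack_from("<BBI", madt, offset + 2)
            entries.append(("lapic", apic_id, bool(flags & 1)))
        elif entry_type == 9 and entry_len >= 16:
            # Processor Local x2APIC: reserved (2), x2APIC ID (4), flags (4), ACPI UID (4)
            apic_id, flags, _uid = struct.unpack_from("<III", madt, offset + 4)
            entries.append(("x2apic", apic_id, bool(flags & 1)))
        offset += entry_len
    return entries
```

On Linux the raw table is normally readable (as root) from `/sys/firmware/acpi/tables/APIC`, so `local_apic_ids(open("/sys/firmware/acpi/tables/APIC", "rb").read())` would list the IDs the running kernel was handed.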

I didn't see the 12300T on the HCL, could there be some sort of CPU-specific issue with LAPIC initialization?

@zirblazer

The entire LGA 1700 lineup is composed of three different dies: Alder Lake C0 (8P + 8E), Alder Lake H0 (6P, no E), Raptor Lake B0 (8P + 16E). Different models with the same die aren't all that much different from each other, and there are examples of all of them working.
I suppose that you may want to upload some kind of log from a working Linux distribution (dmesg and cbmem, could be from DTS).

@esawady
Author

esawady commented Aug 6, 2024

cbmem: https://p.d2evs.net/EvYC~E~M8vDtE.txt
dmesg (Alpine 3.20 extended, with maxcpus=7): https://p.d2evs.net/HOoTgVTGn5UFM.txt

I investigated a bit more, and found that maxcpus=7 (without nolapic) also fixes the freeze, and is (I think) less invasive of a change. Note that the 12300T has 8 threads, and maxcpus=8 freezes. All 8 threads show up in /proc/cpuinfo under the stock BIOS, as well as in FreeBSD under Dasharo (which doesn't freeze).
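A quick way to confirm which thread `maxcpus=7` leaves out is to compare the kernel's "present" and "online" CPU lists in sysfs. The sketch below only parses the list format those files use (for example `0-6` or `0-3,5,7`); the helper names are made up, and it assumes the files are `/sys/devices/system/cpu/present` and `/sys/devices/system/cpu/online`.

```python
def parse_cpu_list(text):
    """Expand a kernel CPU list such as '0-3,5,7' into a sorted list of ints."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue  # empty file or stray comma
        first, dash, last = part.partition("-")
        cpus.update(range(int(first), int(last if dash else first) + 1))
    return sorted(cpus)


def offline_cpus(present_text, online_text):
    """Return the CPUs the kernel knows about but has not brought online."""
    present = set(parse_cpu_list(present_text))
    online = set(parse_cpu_list(online_text))
    return sorted(present - online)
```

If the eighth thread is present but held offline, feeding the contents of the two files to `offline_cpus` should return `[7]`.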

Are you sure those are the only three dies? The 12300T is 4P + 0E - I couldn't find anything below the 12400 on the HCL, with 6P + 0E.

@zirblazer

zirblazer commented Aug 6, 2024

I was told that there is a MADT related patch from a mere two weeks ago to fix an ESXi issue that is literally your case: Dasharo/coreboot#538
If you want to test it, you may have to compile on your own.

Don't ask me why you're the first one to report freezes booting Linux due to this in about two years, unless it is a recent regression and you were compiling latest dev branch or something.

@esawady
Author

esawady commented Aug 7, 2024

Built the dasharo-4.21 branch (which has that patch), partial success. Linux still freezes without maxcpus=7, though now substantially earlier in the boot - around 2s in, judging by the kernel logs that're on-screen when it freezes. Maybe it used to be waiting for some sort of timeout looking for cores before doing whatever's causing the freeze, and isn't anymore?

Update: it also doesn't reboot anymore, it just stays frozen there indefinitely. Not sure what that means.

On the bright side, NetBSD now boots, and the FreeBSD dmesg no longer has that MADT message.

Oh, and I have no clue either why I'm the first person with this issue. It happened on the official 1.1.1 binaries, and up until you suggested looking into that patch, I'd only been compiling the 1.1.3 release commit, so it wasn't any sort of recent regression on a dev branch. The only thing left that I can think of is some sort of CPU-specific bug, but... at this point I'm kinda stumped as to what that could be.

I guess maybe the next step could be to investigate why fixing the MADT caused Linux to freeze earlier?

esawady changed the title from "Freeze shortly after booting Linux or NetBSD on MSI Z690-A DDR4 WiFi" to "Freeze shortly after booting Linux on MSI Z690-A DDR4 WiFi" on Aug 7, 2024

@esawady
Author

esawady commented Aug 7, 2024

Telling Grub to use the stock BIOS's DSDT fixes the freeze as well. Since it might be relevant, the output of acpidump under dasharo-4.21 is https://p.d2evs.net/K0ds6Yi3QBd5b.txt. I can share the output of acpidump under the stock bios as well if that'd be helpful (and not muddy the waters licensing-wise).

@esawady
Author

esawady commented Aug 8, 2024

Freeze also happens on booting with maxcpus=7 then hotplugging the 8th core (cpu7) after boot, which makes setting up netconsole much easier. dmesg logs (with acpi.debug_level and acpi.debug_layer both set to 0xffffffff), starting from echo 1 >/sys/devices/system/cpu/cpu7/online and ending after the freeze: https://p.d2evs.net/3nGrHiUDL4CvB.txt

Note that this is, afaict, identical to the equivalent logs from enabling cpu6 (which doesn't cause the freeze): https://p.d2evs.net/J9vntXA0bokmn.txt

Also note that the freeze only happens on enabling cpu7, irrespective of which other cores are online.
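The hotplug test can be scripted so that the last line surviving on disk names the CPU that was being enabled when the machine froze. This is a sketch of that idea, not the exact procedure used above: it writes `1` to each `cpuN/online` file in turn and syncs a log line before every write. The function name, default log path, and `sysfs_root` parameter are assumptions for illustration.

```python
import os
from pathlib import Path


def online_one_at_a_time(cpus, sysfs_root="/sys/devices/system/cpu", log_path="online.log"):
    """Bring the given CPUs online one by one, syncing a log line to disk before each write.

    Returns the CPUs this call actually switched on. CPUs without an
    `online` control file, or that are already online, are skipped.
    """
    brought_up = []
    with open(log_path, "a") as log:
        for cpu in cpus:
            control = Path(sysfs_root) / f"cpu{cpu}" / "online"
            if not control.exists() or control.read_text().strip() == "1":
                continue
            log.write(f"about to online cpu{cpu}\n")
            log.flush()
            os.fsync(log.fileno())  # make sure the line survives a hard freeze
            control.write_text("1")
            log.write(f"cpu{cpu} is online\n")
            log.flush()
            os.fsync(log.fileno())
            brought_up.append(cpu)
    return brought_up
```

Run as root after booting with `maxcpus=1` (or any lower limit), `online_one_at_a_time(range(1, 8))` would walk through the remaining threads. If the log ends with `about to online cpu7` and no matching `is online` line, the freeze happened during that write.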

(Oh, and I tried some magic sysrq keys, none of them have any effect after the freeze)
