lemp10: system occasionally resets efi boot options with no user intervention #238

Closed
jthornhill opened this issue Sep 8, 2021 · 17 comments · Fixed by #260

@jthornhill

I have a lemp10 running the most recent firmware. I installed Arch Linux on it and configured a boot option using efibootmgr.

The configuration looks like this, normally:

BootCurrent: 0005
Timeout: 2 seconds
BootOrder: 0005
Boot0000  UiApp	MemoryMapped(11,0x840000,0xffffff)/FvFile(462caa21-7614-4503-836e-8ab6f4662331)
Boot0001* Samsung SSD 970 EVO Plus 1TB 	PciRoot(0x0)/Pci(0x6,0x0)/Pci(0x0,0x0)/NVMe(0x1,00-25-38-55-11-90-F4-F6)N.....YM....R,Y.
Boot0005* Arch Linux	HD(1,GPT,2610ead8-2a0e-460b-a68c-d60a23387745,0x800,0x100000)/File(\vmlinuz-linux)[elided]

Occasionally, in circumstances I cannot consistently reproduce, the EFI configuration resets, discarding all boot options except for UiApp and the generic internal NVMe option (i.e., 0000 and 0001 in the list above). This occurs even if I deactivate UiApp. The system displays the blue "Boot option restored" screen and my changes are lost.

I've taken to keeping a boot disk around, and I re-add my efi menu option using efibootmgr (I have the precise command written to a small shell script I leave in the ESP partition).
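A restore script of the kind described above might look like the following minimal sketch. The label, partition number, and loader path follow the Boot0005 entry in the listing; the device name `/dev/nvme0n1`, the `restore_entry` function name, and the `DRY_RUN` switch are assumptions for illustration. The kernel arguments are elided in the listing, so they are left out here rather than guessed.

```shell
#!/bin/sh
# Sketch: re-create a lost EFI boot entry with efibootmgr.
# Assumptions (not from the report): the ESP is partition 1 of /dev/nvme0n1,
# and DRY_RUN is a hypothetical switch that only prints the command.
DISK="${DISK:-/dev/nvme0n1}"
PART="${PART:-1}"
LABEL="${LABEL:-Arch Linux}"
LOADER="${LOADER:-/vmlinuz-linux}"

restore_entry() {
    # Kernel arguments (normally passed with --unicode) are elided in the
    # report, so they are intentionally not filled in here.
    set -- efibootmgr --create --disk "$DISK" --part "$PART" \
        --label "$LABEL" --loader "$LOADER"
    if [ "${DRY_RUN:-1}" = "1" ]; then
        # Default: show what would be run without touching NVRAM.
        echo "$@"
    else
        "$@"
    fi
}
```

Sourcing the file and calling `restore_entry` prints the command; `DRY_RUN=0 restore_entry` (as root) would actually write the entry.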

I have been completely unable to detect a pattern to this behavior, other than that it seems to occur at the most inopportune times (like, when I'm in a hurry to get something done...). It can happen when doing a full shutdown / power on cycle, it can happen during a reboot, but in both cases sometimes I can go several boots without seeing the issue again.

@crawfxrd
Member

crawfxrd commented Sep 8, 2021

Check if you are experiencing #218. If so, you can update coreboot and flash to resolve the issue until the fix is released.

"Boot Option Restored" is GRUB, tracked in #216.

@aesrentai

I have hit this many times and it's quite irritating having to pull out a recovery USB. I also cannot detect any noticeable pattern despite hitting this issue almost daily for over a week. The only solution is to use efibootmgr to restore the entries.

Keep in mind that I am currently using Qubes 4.1 beta, not PopOS, but I highly doubt this is an issue.

@crawfxrd
Member

Considering this a duplicate unless it can be shown otherwise.

@aesrentai

I built the latest coreboot + EDK2 (including the CMOS updates), flashed it, and can verify that this issue still appears. In addition, the only reliable test case I could find (booting my Tails USB) continues to reproduce the error. I can confirm that this is a separate error from #216. I'm not sure what logging output you would want to help verify that this issue has not been fixed, but I can get whatever you need.

@crawfxrd
Member

Check it was cleared: #218 (comment)

Does it happen every time after you boot Tails?

Dump nvram (nvramtool -x) before shutdown and again after powering on, then compare the two dumps.
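As a rough sketch of that comparison: the helper names `dump_nvram` and `compare_dumps` and the file names are hypothetical, and `nvramtool -x` (hex dump of all CMOS data) needs to run as root.

```shell
#!/bin/sh
# Sketch: capture CMOS hex dumps on two boots and compare them.
# Helper names and file names are hypothetical.

dump_nvram() {
    # $1: output file. nvramtool -x prints a hex dump of all CMOS data.
    nvramtool -x > "$1"
}

compare_dumps() {
    # $1, $2: dump files. Prints a unified diff of any differences;
    # returns 0 only if the two dumps are identical.
    if diff -u "$1" "$2"; then
        echo "dumps are identical"
    else
        echo "dumps differ" >&2
        return 1
    fi
}

# Usage:
#   dump_nvram before.txt      # before shutdown
#   dump_nvram after.txt       # after powering on again
#   compare_dumps before.txt after.txt
```

Both files need to live somewhere that survives the reboot, for example on the ESP or a USB stick.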

@crawfxrd crawfxrd reopened this Oct 13, 2021
@aesrentai

I can confirm it happens every time I boot Tails. Booting from the USB, then running efibootmgr --verbose does not list my Qubes entry. Rebooting without the Tails USB inserted leads to the "No Bootable Media Detected" black screen. Using efibootmgr to restore the entry allows me to boot again. This is 100% reproducible as far as I can tell.
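The check described above can be scripted so it is quick to repeat after each test boot. This is only a sketch: `has_boot_entry` is a hypothetical helper that reads `efibootmgr` output on stdin and looks for an entry whose label starts with the given text.

```shell
#!/bin/sh
# Sketch: test whether a boot entry with a given label is present.
# has_boot_entry is a hypothetical helper; it reads efibootmgr output on stdin.

has_boot_entry() {
    # $1: label to look for, matched literally at the start of the label.
    # Entry lines look like "Boot0005* Arch Linux<TAB>HD(...)"; the "*"
    # marks an active entry and is a space for inactive ones.
    awk -v label="$1" '
        /^Boot[0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f][0-9A-Fa-f]/ {
            rest = substr($0, 10)      # skip "BootXXXX" and the flag column
            sub(/^ +/, "", rest)       # drop padding before the label
            if (index(rest, label) == 1) found = 1
        }
        END { exit found ? 0 : 1 }'
}

# Usage:
#   efibootmgr --verbose | has_boot_entry "Qubes" || echo "entry missing"
```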

I'll dump the nvram and check SMMSTORE tomorrow, it's sort of a pain on both Qubes and Tails (Qubes because I'm not moving random files to my admin domain, and Tails doesn't work on my university wifi not to mention the fact that it's just hard to use in general).

One thing I can say is that since flashing the new firmware (with the updates), these random boot entry resets feel much less frequent, which is why I didn't reply until now (I didn't know if I was just getting really unlucky). I feel like fixing the CMOS helped, but there's probably some similar edge case here.

@aesrentai

aesrentai commented Oct 17, 2021

I had time to take a look at this this weekend. Attached are my two nvram and cbmem dumps. They're very underwhelming with no clear problems (nothing referencing CMOS, and the nvram appears to be identical).

Qubes-- WORKING
Tails-- NOT WORKING

I can definitely confirm that this is an issue (getting these dumps took ages) and that booting Tails deletes all EFI entries with 100% reliability. For completeness, I also rebuilt my ROM and reflashed it to be 100% sure that I included the correct patches and, yes, it still happens.

This also really doesn't make sense because it's clear there is some error here but it should show up somewhere in the nvram dump as that's where EFI boot entries are stored.

Was anyone at S76 able to reproduce this? For me it's as simple as 1. Boot Tails USB 2. Reboot to no EFI entries :C

@crawfxrd
Member

You have the output of cbmem but not cbmem -c, which dumps the coreboot boot log. That is where you will see if SMMSTORE is being cleared due to a CMOS read error. If so, it would require a coreboot patch to dump the CMOS table before it's reset to see exactly where it's been modified.
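A small sketch for narrowing the boot log down to the relevant lines. The helper name `filter_boot_log` is hypothetical, and the search terms are only a guess at useful keywords, not known coreboot log messages.

```shell
#!/bin/sh
# Sketch: pull CMOS/SMMSTORE-related lines out of the coreboot console log.
# The keywords below are assumed to be useful; they are not known messages.

filter_boot_log() {
    # Reads a console log on stdin; prints matching lines with line numbers.
    # Returns non-zero if nothing matched.
    grep -n -i -E 'cmos|smmstore'
}

# Usage (as root, on the boot where the entries were lost):
#   cbmem -c | filter_boot_log
```

Saving the full `cbmem -c` output to a file as well keeps the surrounding context available if a match needs a closer look.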

@crawfxrd crawfxrd self-assigned this Oct 20, 2021
@crawfxrd
Member

Build: 2021-10-23_dfc8d23 [1]
This build logs the CMOS values to cbmem console before coreboot rewrites them [2].
It does not disable clearing the SMMSTORE.

Instructions for using this are here: #145 (comment)

@aesrentai

Not working: https://pastebin.com/1TvZdCXz
Working: https://pastebin.com/zbP8feeV

This is also very underwhelming, as they appear essentially identical. I should note that the not-working dump is slightly unusual in that I only took it after I naturally got the "no boot entries" screen (i.e., the screen appeared randomly, as it sometimes does, and I then booted Tails after it had already appeared; normally my boot entries are still there and booting into Tails clears them).

I'll work with the EC tool sometime this weekend, I'm slammed for the rest of the week (yay midterms).

@crawfxrd
Member

> the no boot entries screen appeared randomly as it sometimes does, then I booted tails after it already appeared

If you rebooted after it happened you won't see the message. You need to be able to dump it on the boot that it happens to see the message.

@crawfxrd crawfxrd mentioned this issue Oct 28, 2021
28 tasks
@crawfxrd crawfxrd linked a pull request Nov 10, 2021 that will close this issue
28 tasks
@TheDarkTrumpet

I'm not entirely sure I'm running into the same situation or not, but I think I tracked down a similar issue, and what causes it.

I found that if I force a shutdown (holding the power button) while the machine is on, it tends to clear something, after which no OS is detected at all.

I ended up pulling from the Master branch on firmware-open, and manually updating to what's there. The issue still occurs. I verified the new version was correctly applied.

I haven't found a workaround yet. But since I've been trying to install various OSes, and trying to work through issues such as sleep and the like not working properly, I've had to shut down by force a number of times. And each time, it clears everything out and makes reinstallation necessary at this point.

If my issue is different from others', I can open a ticket. I'm still working on getting a working system, but once I do, I can run any scripts needed.

@crawfxrd
Member

To confirm, you are on a lemp10?

If you have a CH341A programmer and a spare machine (in case something goes wrong) you could try #260. It removes the CMOS option that erases the SMMSTORE and uses fault tolerant writes for the new format.

> trying to work through issues such as sleep and the like not working properly

If this is related to TBT/USB4 (#199), that may also be fixed.

@TheDarkTrumpet

Hi crawfxrd. Yeah, I have a lemp10 (Just arrived yesterday).

I can look into a CH341A, but I may need to contact S76 help desk first to verify anything I do won't void the warranty (paid 2 years of warranty).

In terms of stuff connected, I'm not using USB-C at all on the laptop. I'm using the barrel adapter and the USB-A port (for the Linux install). So far this behavior appears to be consistent across Qubes and Debian. I think I'll try installing PopOS to see if it happens there, and to debug a bit further what appears to be causing it. What I do know about the failures so far is:

  1. Qubes - Happened twice. I was testing the ability for the machine to sleep. So I'd try closing the lid, opening, and couldn't get a display. The second time I explicitly attempted to suspend the laptop and had the same behavior. In both cases, I held the power button down to force shut down the machine, and it resulted (both times) in everything being cleared out. Drive configuration was an LVM across both drives, pretty standard install.
  2. Debian - Happened once. Installed without LVM, but did use encryption on one drive. Installed to the drive that came with the machine (I added a second). At the LUKS unlock screen, I held down the power button before unlocking, thinking I had mistyped the password and couldn't unlock.

I'll work on pinpointing when it fails vs. doesn't, and with which OSes. My end goal is likely to go with Qubes, but since I am waiting for 4.1 to be fully released (maybe end of month), I have no real rush to get this working right now. So if there's stuff you want me to try messing with, I'm totally on board with helping where I can.

@TheDarkTrumpet

I did some more testing. I wonder if this may be an issue with the July release of the firmware, or whether it depends on the machine having been shut down properly before the power button is pressed and held.

I tested this idea with both Pop-OS and Qubes. In both cases, it didn't clear the efi boot partition. I find that interesting, and I'm not entirely sure if it's the firmware update or the initial shutdown. I'm thinking it may have been the firmware update, primarily because I recall that the very first time I ran into this, I installed Qubes and restarted twice before trying to sleep.

@TheDarkTrumpet

TheDarkTrumpet commented Nov 14, 2021

Did some more testing. Had a weird issue happen on this end. I was working in Qubes last night and it worked fine. I shut down the laptop, got up in the morning, and Qubes wouldn't load. The EFI boot entry went away.

I checked issue 218. And yeah, this all appears to be a duplicate of 218. The workaround in 218 got me back into Qubes. It looks like "the big one" is closer to being finished, so I may wait for that to complete and simply reapply the workaround, since I left the reinstallation of pop-os on that other drive anyway. So it's an annoyance, but not a big one (pun intended).

@aesrentai

I'm likely going to switch to my Heads firmware very soon and it looks like this issue is not widespread and/or is going to be affected by the rebase. If you see fit to close this issue I've no problem with that. I've continued to experience it over the past few weeks but life has been hectic and I haven't had time to debug this.


4 participants