Dual boot options keep appearing and disappearing in dual boot on one drive setup #218
Comments
This is very similar to a problem I'm running into. I have a lemp9 with two drives: one has Pop!_OS and the other has Windows. I was running firmware rev 2020-09-17-f10af76 from September to May. Then I updated to 2021-03-11_50eedc2 because I wanted the battery thresholds. Now it loses the Windows boot loader. I can repair it with a Windows USB key, and it will work again for a couple of reboots. Sometimes the record persists if I boot Pop!_OS, and sometimes it doesn't. It doesn't seem to matter if I boot from Windows into Windows; it still loses the record sometimes. The Windows drive has BitLocker turned on, but I was able to reproduce the problem after turning BitLocker off (decrypting the drive). Obviously, running without it presents security issues on a portable device. When it's working efibootmgr reports... BootCurrent: 0001 When it's not working I get |
@mbk5631 Does the systemd-boot menu show Windows in its menu like it did for @Raikiri? If it's showing up there, I would recommend just using that. If not, you may need to copy some Windows bootloader files into the Pop!_OS EFI partition so you can create a systemd-boot entry manually (or get the automatic one to show up.) There's a community post here that includes a copy command that might work: https://pop-planet.info/forums/threads/copy-the-microsoft-bootloader-into-pops-efi-beginners-guide.357/ |
@jacobgkau I can select Windows using the systemd boot but it's not really a viable solution because the next step is to re-enter the BitLocker recovery key each time. It doesn't read the key from the TPM chip. My work-around is each time I boot into Linux, which is most of the time, I run sudo efibootmgr -c -L "Windows Boot Manager" -l "\EFI\Microsoft\Boot\bootmgfw.efi" -d /dev/nvme0n1 -p 1 Then the next reboot will show the Windows boot manager in the firmware boot menu. |
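The workaround above re-creates the entry on every Linux boot, whether or not it is actually missing. A minimal sketch of a check that only produces the command when the entry has disappeared might look like the following. The label, loader path, disk, and partition are copied from the command quoted in the comment; the function names are hypothetical, and the parsing assumes the usual `BootXXXX* Label` line shape of efibootmgr's listing.

```python
import re

# Values copied from the workaround command quoted in the thread.
LABEL = "Windows Boot Manager"
LOADER = r"\EFI\Microsoft\Boot\bootmgfw.efi"
DISK = "/dev/nvme0n1"
PARTITION = "1"

# Matches entry lines such as "Boot0001* Windows Boot Manager".
# Header lines like "BootCurrent:" or "BootOrder:" do not match,
# because "Boot" is not followed by four hex digits there.
_ENTRY = re.compile(r"^Boot([0-9A-Fa-f]{4})\*?\s+(.*)$")


def boot_labels(efibootmgr_output):
    """Return the labels of all BootXXXX entries in efibootmgr's text output."""
    labels = []
    for line in efibootmgr_output.splitlines():
        match = _ENTRY.match(line.strip())
        if match:
            # Some efibootmgr versions append a tab and the device path.
            labels.append(match.group(2).split("\t")[0].strip())
    return labels


def readd_command(efibootmgr_output):
    """Return the argv that re-creates the entry, or None if it is present."""
    if LABEL in boot_labels(efibootmgr_output):
        return None
    return ["efibootmgr", "-c", "-L", LABEL, "-l", LOADER,
            "-d", DISK, "-p", PARTITION]
```

To use it, one could pass in the text printed by running `efibootmgr` with no arguments and, if a command comes back, run that command as root. This only automates the workaround; it does not address why the firmware loses the entry.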
BTW just to clarify this is a regression. It worked in firmware from last September. |
I'm observing similar symptoms on a darp7 with firmware 2021-04-07_236914e, which I believe is the latest release. In my case the boot menu "reverts" to a state with just a single entry named after the SSD drive. My workaround is also to re-add entries using efibootmgr; they stay there for a couple of reboot cycles and then disappear again. I was not able to pinpoint and verify what is causing this "reversion", but I have a vague suspicion it has something to do with the PopOS updater. |
When it happens, check if SMMSTORE was cleared by CMOS variable.
You can also try |
ok I built coreboot this morning. Here's the output of coreboot output of make -C coreboot/util/cbmem was SMMSTORE: CMOS reset, clearing store Full text for cbmem is below. I have the output for smmstore - since I can't read most of it I'll mail it to you rather than posting it. cbmem console
|
ok, I'm really not the slightest bit familiar with this code but I can only find (via grep) one place this message is generated, and in that same place is the only place I see the variable preserve_smmstore being set. clear_store_on_reset appears to be the only place preserve_smmstore is set. If preserve_smmstore does not return success and if the return message was not CMOS_CHECKSUM_INVALID then the preserve lval is set in the first block. That preserve value is never written. The only case in which preserve_smmstore is set is if the return value when fetched was CMOS_CHECKSUM_INVALID, at which point smmstore_clear_region is called and if that's successful, preserve_smmstore is set. So on our machines is get_option(preserve_smmstore) returning something besides CB_SUCCESS and besides CB_CMOS_CHECKSUM_INVALID? The only get_option I found in the tree is in payloads/libpayload/drivers/options.c. It only returns 0 and 1 and doesn't appear to use any of the enumerations. Is it the right one??? I could be totally wrong here... Anyway back to my day job |
CMOS values can be viewed with
|
ok thanks! so right now it returns pop-os:~/firmware-open$ sudo ./coreboot/util/nvramtool/nvramtool -a So the value does exist. There's a cmos_get_option in option.c. Is that the one that's supposed to be called? Ignore my earlier comments if that's true. Looks like there's an inline in option.h wrapping it with get_option. Missed that earlier. The only CMOS option the boot complains about not finding is poweron_after_fail, which isn't this one. BTW you check the checksum after finding the variable, and only then do you return CMOS_CHECKSUM_INVALID. So I have to assume it's finding the variable and then failing the checksum test; that's the only way the reset gets executed (there's no printk for a successful search). If CMOS_CHECKSUM_INVALID is returned to clear_store_on_reset then smmstore_clear_region is called. If that succeeds then preserve_smmstore is overwritten. If that call does not succeed then the previous value of preserve_smmstore is not changed and remains present. The failure of smmstore_clear_region is not logged, so based on the code either it was successful or CMOS_CHECKSUM_INVALID was in fact not returned. cmos_checksum_valid does not log, but the only two conditions I see for calling smmstore_clear_region are if the variable is not present or if CMOS_CHECKSUM_INVALID is returned. It's now present, and since it was not logged as a missing variable on reboot, the value returned should have been CMOS_CHECKSUM_INVALID. Do you agree? |
If I ask nvramtool for the checksum (nvramtool -c) it gives me 0x1 $ sudo ./nvramtool -c If I dump everything I get several checksums that are not 0x1... ? $ sudo ./nvramtool -Y enumerations checksums |
|
Can force the issue by syncing time on Windows. |
Windows is writing the CMOS RTC century byte (0x32) and invalidating the checksum. coreboot loads the default of 0 for the option and SMMSTORE is cleared. |
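The mechanism described above can be illustrated with a toy model. This is only a sketch under stated assumptions, not coreboot's code: it assumes a simple 16-bit sum-of-bytes checksum over a range that happens to include the century byte at 0x32 (the offset named in the comment). The range bounds, the checksum location, and the offset of the preserve_smmstore option are hypothetical values chosen for illustration.

```python
CENTURY_BYTE = 0x32                   # RTC century byte, offset named in the thread
RANGE_START, RANGE_END = 0x2E, 0x3F   # hypothetical checksummed range (inclusive)
CHECKSUM_AT = 0x40                    # hypothetical location of the stored 16-bit sum
OPTION_AT = 0x2E                      # hypothetical offset of preserve_smmstore


def compute_checksum(cmos):
    """Sum the bytes in the checksummed range, truncated to 16 bits."""
    return sum(cmos[RANGE_START:RANGE_END + 1]) & 0xFFFF


def store_checksum(cmos):
    """Write the current checksum into CMOS, high byte first."""
    value = compute_checksum(cmos)
    cmos[CHECKSUM_AT] = value >> 8
    cmos[CHECKSUM_AT + 1] = value & 0xFF


def checksum_valid(cmos):
    """Compare the stored checksum with a freshly computed one."""
    stored = (cmos[CHECKSUM_AT] << 8) | cmos[CHECKSUM_AT + 1]
    return stored == compute_checksum(cmos)


def read_option(cmos, default=0):
    """Return the stored option, or the default when the checksum is bad."""
    return cmos[OPTION_AT] if checksum_valid(cmos) else default


def simulate_time_sync():
    """Show the option value before and after an unchecksummed century write."""
    cmos = bytearray(128)
    cmos[OPTION_AT] = 1            # firmware stored the option as 1
    store_checksum(cmos)           # checksum is consistent at this point
    before = read_option(cmos)
    cmos[CENTURY_BYTE] = 0x20      # another OS writes the century (BCD 20)
    after = read_option(cmos)      # checksum no longer matches, default is used
    return before, after
```

In this model `simulate_time_sync()` returns `(1, 0)`: the option reads back as 1 while the checksum is consistent and falls back to the default of 0 once a byte inside the checksummed range changes without the stored sum being updated. That fallback to 0 is the step the comment connects to SMMSTORE being cleared.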
FTR I'm seeing this issue even on machine which has only PopOS on it, and the UEFI menu contains "extra" boot options for iPXE and UEFI shell. I.e. no Windows involved, and the only OS which gets booted up is PopOS. |
Can you reliably reproduce it by booting only Pop? You can try dumping the CMOS ( |
On the other hand, I'm almost 100% sure UEFI Boot Manager items sometimes disappear even if the only OS started on that machine is Pop. I've nuked Pop installation and replaced it with Arch yesterday, but I have already witnessed Boot Manager items disappearing as well, so I will monitor it and try the
Do I need to do some setup? (I'm executing it under root and |
@pspacek This issue was identified and fixed in system76/coreboot#72, which is why the issue is marked as closed. It won't be part of the next firmware update, but once an update dated later than August 20th is released, then the fix should be included. In the meantime, you could try building and flashing updated firmware locally on your system to confirm if that PR fixed the issue. First, install Rust nightly from rustup.rs, then run these commands:
The flashing script will power off the machine. After flashing, you can revert to normal firmware by using the normal firmware manager to "update." If you're still seeing the issue with the latest version of |
Hi @jacobgkau is building and flashing my own version of the open-firmware still the best solution, or is there a new official firmware I can install? My system (a galp5, support ticket: 63754) is still on:
|
Hi! When your instructions say: |
Oh man, we're still waiting on a new galp5 firmware release! @zancas yes, I believe you're looking for |
OK, I think it's completely nuts that this still isn't actually fixed. It's been a problem on lemp10 for about a year now. Notably there was a firmware release that fixed this bug, but introduced other show stopper bugs, and so it was pulled. This is the kind of thing that will dissuade me from buying System76 ever again. |
Yeah I got to that point already. I've been putting up with this for 15 months now. I'm not about to build my own firmware - concerned about bricking or winding up with worse - and I have work to do. At this point I'm trying to decide whether I'm ready to buy a new laptop about 3 years prematurely. It was OK but not ideal as long as I could boot into Windows fairly easily, depending on whether I needed to do a meeting or not, but, well, geez! Probably will go back to Mac. |
@mbk5631 out of curiosity, what do you boot into Windows for? Every work meeting requires Windows? That sounds like hell... |
@curiousercreative Anything involving Zoom or other meeting software or Bluetooth. Meeting software eventually bogs down. Somehow the wifi is sometimes flaky on some less common AP manufacturers (like what we have in this building), where it winds up renegotiating regularly. Neither pipewire nor pulseaudio works well for anything but playback. Anything requiring me to produce a Word, Excel, or PowerPoint file that someone not using LibreOffice is going to see - somehow I almost always find the features that aren't 99.9% compatible. This has gotten somewhat better with a Microsoft Office online subscription, but that's costing me more than just buying the base product for a Mac. Adobe product support (I don't need much). I have tried using VMs and wine, but the level of tinkering is high and even though I have a separate drive for Windows, it just doesn't work well. Linux is my preferred environment - it's the one in which I'm most comfortable, but I'm also quite comfortable on Macs (can always drop into bash). |
A note - I don't mind booting into Windows for some things, and the issue with the AP flakiness appears to have been resolved recently. But this defect where I have to keep rewriting the boot record and then escape past the encryption key that doesn't get found on first reboot is a nuisance on something they should have patched a year ago. The boot record rewrite is scripted, so I just have to boot linux/log in to get Windows to work again - but that's bad enough as it slows me down - but Windows also doesn't always find the encryption key so when that happens I have to escape out of the "enter your bitlocker key" page and then it will find the key on the second pass. |
@mbk5631 are you on the pop-os chat server? There's a System76 channel, but I'd be curious if others have a pleasant dual boot experience. I'd search and discuss on the chat server to see if solutions or workarounds exist. Generally, it sounds like you're having all kinds of pains. I started migrating from macOS about two years ago and I wasted a lot of effort trying to keep a macOS VM running and using it a couple times a week. Ultimately, my experience improved immensely when I dropped it altogether. Most Windows or macOS users will not migrate to Linux when they discover that a piece of software doesn't work well on their OS, they'll just ignore it and notify whoever cares that it doesn't work on their OS. Whether or not you're feeling that bold is for you to decide. More helpful perhaps, I wouldn't recommend running Linux Desktop with your software requirements. I understand the desire to use Linux, against all odds even, but I bet your life will improve dramatically once you amicably part ways with either Linux or the software that's not running well or flat out incompatible. Ultimately, we'd all like to see these bugs squashed but I understand S76 is quite limited on firmware resources in comparison to outstanding issues and I don't imagine this is as high priority as several other longstanding issues that are likely impacting more S76 users. |
@mbk5631 and not to discount your experience by any means, just more data points, I haven't experienced any WiFi flakiness over the past year and a half. We had a bluetooth kernel bug preventing suspend a couple months ago, but I've been back to solid bluetooth lately. I'm not in meetings all day and my work uses Google Meet (I have it installed as a Chromium PWA), but I've never had any trouble with Zoom when I do participate in them. The only problem I can relate to is while I was running Wayland (I ran for about a year until recently) screen sharing wasn't working in Zoom last time I tried. Running X11 for the past couple months on my galp5 and screen sharing works as expected as does general video call performance. In contrast to others in a conference room, I see consistent load and fan speeds throughout long video calls (in part because I don't have any effects on my own image). All of this is to say, there may be remedies. The MS Office and Adobe software on the other hand, that's more what my last comment is about with cutting your losses and recognizing that Linux Desktop might not be the best fit. |
I went into this knowing I needed to dual boot. I'm mostly ok with doing that when I need to - I spent 20 years as a linux-based developer, and I've done that before. Dual boot worked fine for the first few months I had my lemp9. Then it was broken. The fix sits in git, but doing the build of just that one fix and applying it to my laptop, on which I depend, is not something I'm about to do. At this point I do need MS and Adobe far more than I did when I bought the laptop, so the result is I'm spending more time in Windows than I used to. But frankly, I should be able to do that as needed and not have to rewrite the boot menu each time. Never had trouble with that in a laptop before. |
Thank you all for the feedback. I'll reopen this issue, but I must ask that we remain focused on tracking the bug on this bug tracking platform, not venting frustrations. |
@jthornhill my intention was not user blame. I think we're all on the same page here and validate your experience and others as something that needs fixing. Here's hoping we see a fix sooner than later, I'd look into it myself if it were impacting me. |
On my Oryx Pro I wanted to have Windows 10 and to keep the stock PopOS on the same physical drive. I know it's not advised and I know I was sort of going the hard mode when I chose to, but I still gave it a try.
So before installing Windows 10, the open firmware boot menu had these options:
PopOS
PopOS recovery
Samsung EVO 1TB (name of my ssd device)
Right after installing Windows 10 I checked the boot menu again and saw this:
PopOS
PopOS recovery
Windows Boot Loader
Samsung EVO 1TB (name of my ssd device)
Then I ran a Windows Update and only this option was left:
Samsung EVO 1TB
So the Windows 10 updater effectively erased both its own entry and the PopOS entry from the boot menu for some reason.
Choosing this single available option boots me into systemd loader where I can select either PopOS or Windows Boot Loader, but all other entries were gone from the bios boot options.
So I ran windows boot recovery thingie from its installation drive and Windows Boot Loader re-appeared in the firmware boot menu after that, but some time later, it disappeared again.
So right now the physical drive is still the only boot option listed in the BIOS. It boots me into the systemd loader, where I can choose either Windows or PopOS, but I expect the firmware boot menu to work as well, and its options are missing.
Am I doing something wrong? Is it working as intended? Is there a way to restore (and keep) boot options available from the firmware?