Node stuck in boot loop on Talos v1.8.0 upgrade #9369
Comments
Thanks for reporting this, but we'll need some kernel logs to understand what is wrong. You can also try adding |
How do I get any kernel logs if the machine never boots? I've added the |
You have a GRUB menu where you can select the previous version of Talos if you did an upgrade. There's no boot to maintenance mode option, I'm not sure what you're talking about. |
Yeah, unfortunately I booted to the previous version (v1.7.6) and tried again. This was a bad decision, as I now don't have that option anymore. It did boot into "the old" version just fine and came back to a healthy state. But in any case, I can't get any kernel logs for v1.8.0 from v1.7.6, right? The boot menu has a "reset and return to maintenance mode" option. |
IMG_2046.mov
Maybe this short video of the boot process can help shed some light on the issue? |
I'm not quite sure what this might be; I guess only a serial console can help here. It seems to panic around the device detection process, which might be a Linux bug that will be fixed in follow-up releases. |
@PGimenez people in the Home-operations discord report that they needed to add @smira do you think this could also be my issue? The MS-01 runs either 13th gen or 12th gen Intel CPUs. Mine is a 12th gen. |
I'm confused why it would still reboot; not having drivers for i915, and moreover for intel-mei, shouldn't lead to a reboot. |
I'm just relaying information. It may very well be that these two drivers aren't related to the issue at all. |
Had the same issue with an Intel N100 device. |
Had the same issue with the same hardware as @tpretz. Using the drivers mentioned above also allowed me to boot into the system. |
If anyone could submit the logs from the successful boot ( |
sure, let me know if you need anything else |
I also had the same issue and resolved it with the drivers specified in this thread |
I can confirm that building your own image on https://factory.talos.dev with the extensions
fixed the boot loop issue on my 12th gen Alder Lake N100. EDIT: |
Maybe you didn't pass your custom image to the machineconfig? If you don't specify an |
Ahh, I see. Well, yes that fixed it for me. Thanks! :) |
Same here, I also forgot to put in the custom image with extensions. Maybe talosctl upgrade could warn users not to use the vanilla image if the node already has extensions. |
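For anyone skimming the thread, here is a minimal sketch of what pointing the machine config at the custom image can look like. It assumes the installer image comes from https://factory.talos.dev. `<schematic-id>` is a placeholder for the ID the factory shows for your own schematic (not a real value), and v1.8.0 is simply the version discussed here.

```yaml
# Machine config fragment (sketch, not a complete config).
# <schematic-id> is a placeholder: substitute the ID that
# factory.talos.dev generates for the schematic with your extensions.
machine:
  install:
    image: factory.talos.dev/installer/<schematic-id>:v1.8.0
```

The same image reference would presumably also be the one to pass when upgrading, so the node does not fall back to the vanilla installer without the extensions.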
I finally got around to try I'll close this issue for now, as adding |
Talos bootloops on my control plane nodes without these extensions. Didn't test without the mei extension on workers, but they're similar enough. See: siderolabs/talos#9369
Thank you, it feels like i915 without firmware might lead to the reboot because |
Why was the fix not included in the new version 1.8.1? |
As far as I know, this is not considered a Talos "issue". The drivers were probably dropped from the new Linux kernel (just guessing here), which means you'll have to add them using the Talos Image Factory. Just add the |
Okay, thanks for the update. I'm just asking because I'd find it very handy to have it in the default image and not have to specify and build it ourselves every time. |
Right, agreed. This is caused by the decision made by the Talos guys to not build a lot of different images (as per the v1.8.0 release notes). |
You are right. It works only with Intel i915 |
This fixed my issue too, thanks @rothgar for mentioning this 💪 This is the second issue that I've hit when upgrading Talos from v1.7 to v1.8 |
If you boot from a USB, you must add the following extensions to the image for it to boot:
customization:
  systemExtensions:
    officialExtensions:
      - siderolabs/i915-ucode
      - siderolabs/mei
As of now, you need to go to https://factory.talos.dev/, build an image with the above extensions, and boot from a USB made from it. You should also use the image with the extensions pre-installed for the installation itself. This approach resolves the infinite boot loop issue. |
Bug Report
While upgrading my cluster to Talos v1.8.0, my first node (haven't tried upgrading the other two) is stuck in a boot loop. My machine gets past the initial Linux boot splash screen (showing the number of cores etc.), but then reboots just before reaching the dashboard.
Environment
Talos version: v1.8.0
Kubernetes version:
Client Version: v1.31.1
Kustomize Version: v5.4.2
Server Version: v1.30.3
Platform:
Intel NUC 12