talos 1.7.0: proxmox nocloud network configuration is not applied #8625

Closed
rgl opened this issue Apr 20, 2024 · 16 comments · Fixed by #8637

@rgl
Contributor

rgl commented Apr 20, 2024

Bug Report

Description

While trying to migrate from talos 1.6.7 to 1.7.0 at https://github.com/rgl/terraform-proxmox-talos/tree/upgrade-to-talos-1.7.0 I've noticed that talos is using dhcp instead of picking up the network configuration from the nocloud cloud-init data disk.

Please note that branch diff only changes the talos version from 1.6.7 to 1.7.0, everything else is the same.

Logs

The proxmox cloud-init configuration, where you can see the IP address:

image

The talos console, where you can see it picked another IP address (e.g. from DHCP), instead of using the cloud-init data disk:

image

Environment

  • Talos version: 1.7.0
  • Kubernetes version: 1.29.4
  • Platform: nocloud in proxmox 8.1
@Sad-Soul-Eater

I can also confirm this issue on a fresh v1.7.0 VM, but VMs upgraded with talosctl from v1.6.4 to v1.7.0 work fine.

@sanmai-NL
Contributor

Have you regenerated the ISO image?

@Sad-Soul-Eater

Have you regenerated the ISO image?

I have an issue with this image: https://factory.talos.dev/image/a7bcadbc1b6d03c0e687be3a5d9789ef7113362a6a1a038653dfd16283a92b6b/v1.7.0/nocloud-amd64.raw.xz

And an image from the latest release: https://github.com/siderolabs/talos/releases/download/v1.7.0/nocloud-amd64.raw.xz

Both my test VMs are cloned from the fresh template generated by Packer.

Or do you mean something else?

@smira
Member

smira commented Apr 22, 2024

Please, submit full kernel logs (console logs) so that we can start looking into this issue.

@Sad-Soul-Eater

Sad-Soul-Eater commented Apr 22, 2024

@smira

Both VMs are set up with identical settings: cloud-init works with 1.6.4, but not with 1.7.0.

1.7.0 dmesg

1.6.4 dmesg

@smira
Member

smira commented Apr 22, 2024

@Sad-Soul-Eater I don't see anything which immediately stands out as an issue

can you please do talosctl read /system/state/platform-network.yaml for both versions of Talos?

@Sad-Soul-Eater

@smira

Talosctl returns an error on 1.7.0

talosctl --talosconfig talosconfig-170 read /system/state/platform-network.yaml
error reading: rpc error: code = Unknown desc = stat /system/state/platform-network.yaml: no such file or directory

That's from 1.6.4:

1.6.4
addresses:
    - address: 192.168.10.22/16
      linkName: eth0
      family: inet4
      scope: global
      flags: permanent
      layer: platform
    - address: 10.1.1.22/8
      linkName: eth1
      family: inet4
      scope: global
      flags: permanent
      layer: platform
links:
    - name: eth0
      logical: false
      up: true
      mtu: 0
      kind: ""
      type: netrom
      layer: platform
    - name: eth1
      logical: false
      up: true
      mtu: 0
      kind: ""
      type: netrom
      layer: platform
routes:
    - family: inet4
      dst: ""
      src: ""
      gateway: 192.168.1.1
      outLinkName: eth0
      table: main
      priority: 1024
      scope: global
      type: unicast
      flags: ""
      protocol: static
      layer: platform
hostnames:
    - hostname: test-164
      domainname: ""
      layer: platform
resolvers:
    - dnsServers:
        - 192.168.1.1
      layer: platform
timeServers: []
operators: []
externalIPs: []
metadata:
    platform: nocloud
    hostname: test-164
    instanceId: 511e1af3c355bc945cb24afd01be7ea3725569ca
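
The comparison above could be scripted roughly as follows. This is a minimal sketch, not part of Talos: it wraps the same `talosctl --talosconfig ... read` invocation shown in this comment and treats the "no such file or directory" message as "platform network config was never written". The function names are hypothetical, and the assumption that `talosctl` exits non-zero on this error is not confirmed anywhere in this thread.

```python
import subprocess

# Path taken from the thread; Talos writes the platform network config here.
PLATFORM_NETWORK_PATH = "/system/state/platform-network.yaml"


def build_read_command(talosconfig, path=PLATFORM_NETWORK_PATH):
    """Return the argv for the `talosctl read` call used in the thread."""
    return ["talosctl", "--talosconfig", talosconfig, "read", path]


def platform_config_present(returncode, stderr):
    """Interpret a finished `talosctl read` call (hypothetical helper).

    True  -> the file exists, so platform config was applied.
    False -> the file is missing, as observed on 1.7.0.
    Any other failure is raised, because it says nothing about the file.
    """
    if returncode == 0:
        return True
    if "no such file or directory" in stderr:
        return False
    raise RuntimeError(f"talosctl failed for another reason: {stderr.strip()}")


def check(talosconfig, run=subprocess.run):
    """Run the read and classify the result; `run` is injectable for testing."""
    result = run(build_read_command(talosconfig), capture_output=True, text=True)
    return platform_config_present(result.returncode, result.stderr)
```

Calling `check("talosconfig-170")` against the affected node would be expected to return `False`, while the 1.6.4 node would return `True`.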

@smira
Member

smira commented Apr 22, 2024

OK, the missing platform config on 1.7.0 is a bug for sure, but now we need to figure out why :)

@smira
Member

smira commented Apr 22, 2024

I think I know where the problem is, but the fix will take some time to be developed.

A workaround might be to try using network-based nocloud configuration or sticking with Talos 1.6
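
For readers unfamiliar with the network-based variant: with the NoCloud datasource, the seed location can be passed through the SMBIOS serial number as a `ds=nocloud-net;s=<URL>` string instead of a data disk. The sketch below only builds that string and a base64 form of it (Proxmox's `smbios1` option accepts base64-encoded values when `base64=1` is set). The URL and hostname are made-up examples, the helper names are hypothetical, and none of this was tested in the thread as a workaround.

```python
import base64


def nocloud_serial(seed_url, hostname=None):
    """Build a NoCloud SMBIOS serial string such as 'ds=nocloud-net;s=http://host/path/'.

    The seed URL is normalised to end with '/', since the config file names
    are appended to it by the datasource.
    """
    if not seed_url.endswith("/"):
        seed_url += "/"
    serial = f"ds=nocloud-net;s={seed_url}"
    if hostname:
        serial += f";h={hostname}"
    return serial


def nocloud_serial_base64(seed_url, hostname=None):
    """Return the same string base64-encoded, for SMBIOS fields set with base64=1."""
    raw = nocloud_serial(seed_url, hostname).encode("ascii")
    return base64.b64encode(raw).decode("ascii")


if __name__ == "__main__":
    # Hypothetical seed server address; replace with a real one.
    print(nocloud_serial("http://10.10.0.1/configs", hostname="test-170"))
    print(nocloud_serial_base64("http://10.10.0.1/configs", hostname="test-170"))
```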

@smira smira self-assigned this Apr 23, 2024
@smira
Member

smira commented Apr 23, 2024

@Sad-Soul-Eater if I were to ask you to test an image of Talos Linux to confirm if the bug got fixed or not, what would you prefer? disk image? ISO?

@Sad-Soul-Eater

@smira Disk image

@smira
Member

smira commented Apr 23, 2024

smira added a commit to smira/talos that referenced this issue Apr 23, 2024
With Talos 1.7+, more storage drivers are split as modules, so the
devices might not be discovered by the time platform config is going to
be loaded. Explicitly wait for udevd to settle down before trying to
probe a CD.

Fixes siderolabs#8625

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
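
The race described in the commit message can be illustrated with a small sketch. This is not the Talos fix (which waits for udevd to settle); it is a simplified stand-in that polls for a device path with a timeout before probing, to show why probing immediately can miss a disk whose driver module has not loaded yet. The function name, defaults, and the `/dev/disk/by-label/cidata` path in the usage note are assumptions made for illustration.

```python
import os
import time


def wait_for_path(path, timeout=10.0, interval=0.1,
                  exists=os.path.exists, sleep=time.sleep, clock=time.monotonic):
    """Poll until `path` exists or `timeout` seconds have passed.

    Returns True if the path showed up, False on timeout. The path is always
    checked at least once, even with timeout=0. `exists`, `sleep` and `clock`
    are injectable so the loop can be tested without real devices.
    """
    deadline = clock() + timeout
    while True:
        if exists(path):
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

A caller would do something like `wait_for_path("/dev/disk/by-label/cidata", timeout=30)` before trying to read the config disk, and fall back to other configuration sources only if it returns `False`. Waiting for the device manager to report that it is idle, as the commit does, avoids guessing a timeout for each individual device.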
@Sad-Soul-Eater

@smira Cloud init works now

image

@smira
Member

smira commented Apr 23, 2024

ok, thanks for testing the fix.

Talos v1.7.1 will have it backported

@Jorgevillada

Hey @smira, thank you for the fix. Is there an ETA for 1.7.1?

@smira
Member

smira commented Apr 29, 2024

This week 🤞

smira added a commit to smira/talos that referenced this issue May 1, 2024
With Talos 1.7+, more storage drivers are split as modules, so the
devices might not be discovered by the time platform config is going to
be loaded. Explicitly wait for udevd to settle down before trying to
probe a CD.

Fixes siderolabs#8625

Signed-off-by: Andrey Smirnov <andrey.smirnov@siderolabs.com>
(cherry picked from commit c5b59df)
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 29, 2024