
🌱 Set provider ID through kubelet in ubuntu templates #1551

Merged

Conversation

lentzi90
Contributor

@lentzi90 lentzi90 commented May 15, 2023

What this PR does / why we need it:

To avoid issues with potential mismatches between node names and
openstack servers we can configure the kubelet to set the provider ID.
Otherwise, the cloud controller will try to match nodes and servers
based on just the name. The names can differ because of special
characters, like dots. When this happens, the cloud controller will be
unable to match them and thus believe that the node has no underlying
server.
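
Concretely, the kubelet is given the provider ID via kubeadm's `kubeletExtraArgs`, with cloud-init substituting the server UUID at boot. A trimmed sketch of the relevant fragment (field nesting abbreviated from the full KubeadmControlPlane/KubeadmConfigTemplate spec; the values match the generated templates in this PR):

```yaml
initConfiguration:
  nodeRegistration:
    kubeletExtraArgs:
      cloud-provider: external
      # cloud-init replaces {{ instance_id }} with the OpenStack server UUID,
      # so the cloud controller can look the server up directly instead of
      # matching by (possibly mangled) node name.
      provider-id: openstack:///'{{ instance_id }}'
```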

Also removes some unused (I think) cluster templates and kustomizations.

I will make a follow up PR for the flatcar templates.

Which issue(s) this PR fixes (optional, in fixes #<issue number>(, fixes #<issue_number>, ...) format, will close the issue(s) when PR gets merged):
Fixes #1547 or at least works around it.

Special notes for your reviewer:

  1. Please confirm that if this PR changes any image versions, then that's the sole change this PR makes.
  2. I tagged this as 🌱, but I'm not sure if it makes sense.

TODOs:

  • squashed commits
  • if necessary:
    • includes documentation
    • adds unit tests

/hold

Some of the kustomizations and cluster-templates seem to be no longer
used and can be removed. These were left over when the default template
was switched to use the external cloud provider.
@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels May 15, 2023
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: lentzi90

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@netlify

netlify bot commented May 15, 2023

Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!

🔨 Latest commit: 8c7b85e
🔍 Latest deploy log: https://app.netlify.com/sites/kubernetes-sigs-cluster-api-openstack/deploys/64706adae285560008bd93ba
😎 Deploy Preview: https://deploy-preview-1551--kubernetes-sigs-cluster-api-openstack.netlify.app

@k8s-ci-robot k8s-ci-robot added approved Indicates a PR has been approved by an approver from all required OWNERS files. size/L Denotes a PR that changes 100-499 lines, ignoring generated files. labels May 15, 2023
@lentzi90
Contributor Author

@tormath1 can you confirm if the flatcar template is used/needed or not? I removed it because there was no make target for it, so I assumed it was just forgotten.

Contributor

@mdbooth mdbooth left a comment


/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 15, 2023
@tormath1
Contributor

tormath1 commented May 15, 2023

@tormath1 can you confirm if the flatcar template is used/needed or not? I removed it because there was no make target for it, so I assumed it was just forgotten.

Thanks for the ping - I think it was missed in the previous PR that promoted the external-cloud-provider as the default one (https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1522/files). So we can remove it :)

@jichenjc
Contributor

/test pull-cluster-api-provider-openstack-e2e-test

@jichenjc
Contributor

I'm curious where {{ instance_id }} comes from. Is this a Cluster API definition, and does CAPO feed it somewhere?
I don't see it used in CAPO currently, so I'm curious how the template becomes a real provider ID.

@lentzi90
Contributor Author

I'm curious where {{ instance_id }} comes from. Is this a Cluster API definition, and does CAPO feed it somewhere? I don't see it used in CAPO currently, so I'm curious how the template becomes a real provider ID.

It comes with the instance data from cloud-init. Basically, all of the kubeadm config is injected through cloud-init (or Ignition) and can contain templated values that are filled in from the instance data.
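
To illustrate the mechanism, here is a minimal sketch (not cloud-init's actual renderer) of the jinja-style substitution: each `{{ key }}` placeholder in the config is replaced with the corresponding value from the instance data.

```python
import re

# Hypothetical instance data, mimicking what cloud-init exposes on OpenStack
# (instance_id is the Nova server UUID; value taken from this thread).
instance_data = {"instance_id": "36db0850-9770-41ee-8310-eca39bb62547"}

# The templated kubelet argument from the cluster template
template = "provider-id: openstack:///'{{ instance_id }}'"

# Substitute each {{ key }} placeholder with its value from the instance
# data, leaving unknown placeholders untouched.
rendered = re.sub(
    r"\{\{\s*(\w+)\s*\}\}",
    lambda m: str(instance_data.get(m.group(1), m.group(0))),
    template,
)
print(rendered)
```

The key point is that the substitution happens on the node at boot time, so the kubelet registers with the real server UUID without CAPO having to know it in advance.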

@lentzi90 lentzi90 force-pushed the lentzi90/remove-unused-templates branch from cf69675 to ac86215 Compare May 16, 2023 05:40
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 16, 2023
@lentzi90 lentzi90 force-pushed the lentzi90/remove-unused-templates branch from ac86215 to b0186f4 Compare May 16, 2023 05:52
@lentzi90
Contributor Author

Hmm, I'm trying to figure out where the issue is for flatcar, but I'm not finding any relevant logs. Any clue, @tormath1?

@lentzi90
Contributor Author

Documenting some debugging progress here.

The service that is responsible for setting the variables for flatcar is coreos-metadata:

$ systemctl cat coreos-metadata                                                                                                                              
# /usr/lib/systemd/system/coreos-metadata.service                                                                                                                                                          
[Unit]                                                                                                                                                                                                     
Description=Flatcar Metadata Agent                                                                                                                                                                         
                                                                                                                                                                                                           
[Service]                                                                                                                                                                                                  
Type=oneshot                                                                                                                                                                                               
RemainAfterExit=yes                                                                                                                                                                                        
Restart=on-failure                                                                                                                                                                                         
RestartSec=10                                                                                                                                                                                              
Environment=COREOS_METADATA_OPT_PROVIDER=--cmdline                                                                                                                                                         
ExecStart=/usr/bin/coreos-metadata ${COREOS_METADATA_OPT_PROVIDER} --attributes=/run/metadata/flatcar                                                                                                      
ExecStartPost=/usr/bin/sed --in-place "s/AFTERBURN/COREOS/g ; s/AWS/EC2/g ; s/GCP/GCE/g" /run/metadata/flatcar                                                                                             
ExecStartPost=/usr/bin/ln -fs /run/metadata/flatcar /run/metadata/coreos                                                                                                                                   
                                                                                                                                                                                                           
[Install]                                                                                                                                                                                                  
RequiredBy=metadata.target                                                                                                                                                                                 
Alias=afterburn.service

It ran successfully and filled in the /run/metadata/flatcar file:

$ systemctl status coreos-metadata                                                                                                                           
● coreos-metadata.service - Flatcar Metadata Agent                                                                                                                                                         
     Loaded: loaded (/usr/lib/systemd/system/coreos-metadata.service; disabled; preset: disabled)                                                                                                          
     Active: active (exited) since Thu 2023-05-25 08:17:17 UTC; 3min 1s ago                                                                                                                                
   Main PID: 1090 (code=exited, status=0/SUCCESS)                                                                                                                                                          
        CPU: 108ms                                                                                                                                                                                         
                                                                                                                                                                                                           
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.369 INFO Fetch successful                                                                          
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.369 INFO Fetching http://169.254.169.254/latest/meta-data/instance-id: Attempt #1                  
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.422 INFO Fetch successful                                                                          
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.423 INFO Fetching http://169.254.169.254/latest/meta-data/instance-type: Attempt #1                
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.475 INFO Fetch successful                                                                          
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.475 INFO Fetching http://169.254.169.254/latest/meta-data/local-ipv4: Attempt #1                   
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.528 INFO Fetch successful                                                                          
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.528 INFO Fetching http://169.254.169.254/latest/meta-data/public-ipv4: Attempt #1                  
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.578 INFO Fetch successful                                                                          
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal systemd[1]: Finished coreos-metadata.service.

However, the instance ID is incorrect, as @tormath1 had already found out:

$ cat /run/metadata/flatcar                                                                                                                                  
COREOS_OPENSTACK_IPV4_LOCAL=10.6.0.65                                                                                                                                                                      
COREOS_OPENSTACK_IPV4_PUBLIC=172.24.4.96                                                                                                                                                                   
COREOS_OPENSTACK_INSTANCE_TYPE=m1.medium                                                                                                                                                                   
COREOS_OPENSTACK_HOSTNAME=cluster-e2e-nfkf4l-control-plane-pvng4.novalocal                                                                                                                                 
COREOS_OPENSTACK_INSTANCE_ID=i-00000005

As you can see, the metadata service gets the data from http://169.254.169.254/latest/meta-data. But there is another layer of data under http://169.254.169.254/openstack/latest/:

$ curl http://169.254.169.254/openstack/latest/meta_data.json
{"uuid": "36db0850-9770-41ee-8310-eca39bb62547", "public_keys": {"cluster-api-provider-openstack-sigs-k8s-io": "ssh-rsa AAA..."}, "keys": [{"name": "cluster-api-provider-openstack-sigs-k8s-io", "type": "ssh", "data": "ssh-rsa AAA..."}], "hostname": "cluster-e2e-6tznc9-control-plane-zxn8l.novalocal", "name": "cluster-e2e-6tznc9-control-plane-zxn8l", "launch_index": 0, "availability_zone": "testaz1", "random_seed": "CM0...", "project_id": "d2f2d9be8c2544d3b417ecf581dee54b", "devices": [], "dedicated_cpus": []}

The uuid has the correct value that we are after. When I compare with the cloud-init instance data, I can also see the wrong instance ID there in one place, but it is correct in a few other places. I'm not sure where to go from here.

Could it be that this is an issue in the coreos-metadata service? I.e. it is fetching the wrong data?
Or could it be that the devstack is not configured correctly?
Maybe it is all correct and we just have to manually get the correct instance ID in a different way?
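
For comparison, pulling the correct ID out of the native endpoint's JSON is straightforward; a small sketch using a trimmed version of the payload above:

```python
import json

# Trimmed sample of the OpenStack-native metadata payload
# (from http://169.254.169.254/openstack/latest/meta_data.json).
payload = """
{
  "uuid": "36db0850-9770-41ee-8310-eca39bb62547",
  "hostname": "cluster-e2e-6tznc9-control-plane-zxn8l.novalocal",
  "name": "cluster-e2e-6tznc9-control-plane-zxn8l"
}
"""

meta = json.loads(payload)
# uuid is the real server ID (unlike the EC2 layer's i-00000005),
# so it is what the kubelet's provider-id should be built from.
provider_id = f"openstack:///{meta['uuid']}"
print(provider_id)
```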

@mdbooth
Contributor

mdbooth commented May 25, 2023

There are 2 problems here:

  1. coreos-metadata (aka afterburn?) is using OpenStack's EC2 metadata service rather than OpenStack's native metadata service.
  2. It seems there's a bug in the ec2 metadata service which means it's returning an i-style ID (i-00000005) rather than the actual uuid.

And a third problem:
3. ec2 metadata is deep frozen, hopelessly bitrotted, and chances are nobody's going to fix it.

We should probably try to fix it in afterburn, tbh. @tormath1 what's the relationship between afterburn and coreos-metadata?

@lentzi90
Contributor Author

I can open an issue in afterburn. (From what I can see, coreos-metadata is the same thing as afterburn.) For some reason it checks the EC2 service when fetching over the network. When using config-drive, it combines some fields from the EC2 service and some from OpenStack, which seems really weird to me.

https://github.com/coreos/afterburn/blob/f22131c775094c559baf5c5d712425f341f3556e/src/providers/openstack/configdrive.rs#L146-L151

@mdbooth
Contributor

mdbooth commented May 25, 2023

I can open an issue in afterburn. (From what I can see, coreos-metadata is the same thing as afterburn.) For some reason it checks the EC2 service when fetching over the network. When using config-drive, it combines some fields from the EC2 service and some from OpenStack, which seems really weird to me.

https://github.com/coreos/afterburn/blob/f22131c775094c559baf5c5d712425f341f3556e/src/providers/openstack/configdrive.rs#L146-L151

Beat you to it: coreos/afterburn#930

@mdbooth
Contributor

mdbooth commented May 25, 2023

Because they're separate templates, we have the option to fix this in Flatcar independently, right? I wonder if we should move on with the Ubuntu fix for now and raise a separate issue to fix it for Flatcar. @lentzi90 @tormath1 Thoughts?

@lentzi90
Contributor Author

Yes that works for me! I can split this PR into one for ubuntu and one for flatcar.

@lentzi90 lentzi90 force-pushed the lentzi90/remove-unused-templates branch from b0186f4 to 88d8bf5 Compare May 25, 2023 11:39
@lentzi90 lentzi90 changed the title 🌱 Set provider ID through kubelet in templates 🌱 Set provider ID through kubelet in ubuntu templates May 25, 2023
@lentzi90 lentzi90 force-pushed the lentzi90/remove-unused-templates branch from 88d8bf5 to 6693c30 Compare May 25, 2023 12:05
This commit sets the provider ID in the ubuntu templates.
To avoid issues with potential mismatches between node names and
openstack servers we can configure the kubelet to set the provider ID.
Otherwise, the cloud controller will try to match nodes and servers
based on just the name. The names can differ because of special
characters, like dots. When this happens, the cloud controller will be
unable to match them and thus believe that the node has no underlying
server.
@lentzi90 lentzi90 force-pushed the lentzi90/remove-unused-templates branch from 6693c30 to 8c7b85e Compare May 26, 2023 08:16
Contributor

@mdbooth mdbooth left a comment


/lgtm

Comment on lines +26 to +28
# Fixme(lentzi90): This is here just to override the value set in the default
# kustomization. It will be replaced with a value that works for flatcar in
# https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1564
Contributor


You added this comment in both places other than the one where I asked for it 🤣 I'll try not to take it personally.

It's not worth updating imho. The important thing is that somebody looking will see the comment, and they'll see one of these.

Contributor Author


OMG sorry! I'm not sure how I managed to miss the first occurrence 🤦 😂

@@ -20,6 +20,7 @@ spec:
nodeRegistration:
kubeletExtraArgs:
cloud-provider: external
provider-id: openstack:///'{{ instance_id }}'
Contributor


Any idea why we're not inheriting this from default? The answer to this question would be outside the scope of this PR, though.

Contributor Author


This is the generated template! So it is actually inherited from default and that is why we see it here. 🙂 The without-lb patch is over here and does not have any provider-id set.

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label May 26, 2023
@lentzi90
Contributor Author

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label May 26, 2023
@k8s-ci-robot k8s-ci-robot merged commit b21411b into kubernetes-sigs:main May 26, 2023
@lentzi90 lentzi90 deleted the lentzi90/remove-unused-templates branch May 26, 2023 11:32

Successfully merging this pull request may close these issues.

Node/server name mismatch due to dots in name
5 participants