🌱 Set provider ID through kubelet in ubuntu templates #1551
Conversation
Some of the kustomizations and cluster-templates seem to be no longer used and can be removed. These were left over when the default template was switched to use the external cloud provider.
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: lentzi90 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing `/approve` in a comment.
✅ Deploy Preview for kubernetes-sigs-cluster-api-openstack ready!
@tormath1 can you confirm if the flatcar template is used/needed or not? I removed it because there was no make target for it, so I assumed it was just forgotten.
/lgtm
Thanks for the ping - I think it has been missed from the previous PR to promote the external-cloud-provider as the default one (https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1522/files). So we can remove it :)
/test pull-cluster-api-provider-openstack-e2e-test
I am curious where `{{ instance_id }}` comes from.
It comes with the instance data from cloud-init. Basically all of the kubeadm config will be injected through cloud-init (or ignition) and can have templated values that is filled in from the instance data. |
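For context, cloud-init renders user data as a Jinja template against the instance data it collects from the metadata service. A hypothetical fragment (not from this PR, and the `instance_id` key is cloud-init's top-level instance-data alias) could look like:

```yaml
## template: jinja
#cloud-config
write_files:
  - path: /etc/kubernetes/provider-id
    # cloud-init substitutes {{ instance_id }} from the collected
    # instance data before the file is written to disk.
    content: "openstack:///{{ instance_id }}"
```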
Hmm, I'm trying to figure out where the issue is for flatcar but not finding any relevant logs. Any clue @tormath1?
Documenting some debugging progress here. The service that is responsible for setting the variables for flatcar is `coreos-metadata`:

```
$ systemctl cat coreos-metadata
# /usr/lib/systemd/system/coreos-metadata.service
[Unit]
Description=Flatcar Metadata Agent

[Service]
Type=oneshot
RemainAfterExit=yes
Restart=on-failure
RestartSec=10
Environment=COREOS_METADATA_OPT_PROVIDER=--cmdline
ExecStart=/usr/bin/coreos-metadata ${COREOS_METADATA_OPT_PROVIDER} --attributes=/run/metadata/flatcar
ExecStartPost=/usr/bin/sed --in-place "s/AFTERBURN/COREOS/g ; s/AWS/EC2/g ; s/GCP/GCE/g" /run/metadata/flatcar
ExecStartPost=/usr/bin/ln -fs /run/metadata/flatcar /run/metadata/coreos

[Install]
RequiredBy=metadata.target
Alias=afterburn.service
```

It ran successfully and filled in the attributes file:

```
$ systemctl status coreos-metadata
● coreos-metadata.service - Flatcar Metadata Agent
     Loaded: loaded (/usr/lib/systemd/system/coreos-metadata.service; disabled; preset: disabled)
     Active: active (exited) since Thu 2023-05-25 08:17:17 UTC; 3min 1s ago
   Main PID: 1090 (code=exited, status=0/SUCCESS)
        CPU: 108ms

May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.369 INFO Fetch successful
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.369 INFO Fetching http://169.254.169.254/latest/meta-data/instance-id: Attempt #1
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.422 INFO Fetch successful
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.423 INFO Fetching http://169.254.169.254/latest/meta-data/instance-type: Attempt #1
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.475 INFO Fetch successful
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.475 INFO Fetching http://169.254.169.254/latest/meta-data/local-ipv4: Attempt #1
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.528 INFO Fetch successful
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.528 INFO Fetching http://169.254.169.254/latest/meta-data/public-ipv4: Attempt #1
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal coreos-metadata[1090]: May 25 08:17:17.578 INFO Fetch successful
May 25 08:17:17 cluster-e2e-6tznc9-control-plane-zxn8l.novalocal systemd[1]: Finished coreos-metadata.service.
```

However, the instance ID is incorrect, as @tormath1 had already found out:

```
$ cat /run/metadata/flatcar
COREOS_OPENSTACK_IPV4_LOCAL=10.6.0.65
COREOS_OPENSTACK_IPV4_PUBLIC=172.24.4.96
COREOS_OPENSTACK_INSTANCE_TYPE=m1.medium
COREOS_OPENSTACK_HOSTNAME=cluster-e2e-nfkf4l-control-plane-pvng4.novalocal
COREOS_OPENSTACK_INSTANCE_ID=i-00000005
```

As you can see, the metadata service gets the data from the EC2-compatible endpoint (`/latest/meta-data/`). The OpenStack endpoint, on the other hand, returns the correct server UUID:

```
$ curl http://169.254.169.254/openstack/latest/meta_data.json
{"uuid": "36db0850-9770-41ee-8310-eca39bb62547", "public_keys": {"cluster-api-provider-openstack-sigs-k8s-io": "ssh-rsa AAA..."}, "keys": [{"name": "cluster-api-provider-openstack-sigs-k8s-io", "type": "ssh", "data": "ssh-rsa AAA..."}], "hostname": "cluster-e2e-6tznc9-control-plane-zxn8l.novalocal", "name": "cluster-e2e-6tznc9-control-plane-zxn8l", "launch_index": 0, "availability_zone": "testaz1", "random_seed": "CM0...", "project_id": "d2f2d9be8c2544d3b417ecf581dee54b", "devices": [], "dedicated_cpus": []}
```

The `uuid` field here is what we would need. Could it be that this is an issue in the coreos-metadata service? I.e. is it fetching the wrong data?
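For reference, a minimal sketch (not part of this PR) of how the correct provider ID could be derived from the JSON that the OpenStack metadata endpoint serves, using the `uuid` field shown above:

```python
import json


def provider_id_from_metadata(meta_data_json: str) -> str:
    """Build an OpenStack provider ID (openstack:///<uuid>) from the
    JSON served at /openstack/latest/meta_data.json."""
    uuid = json.loads(meta_data_json)["uuid"]
    return f"openstack:///{uuid}"


# Trimmed-down version of the metadata shown above.
sample = '{"uuid": "36db0850-9770-41ee-8310-eca39bb62547", "launch_index": 0}'
print(provider_id_from_metadata(sample))
# → openstack:///36db0850-9770-41ee-8310-eca39bb62547
```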
There are 2 problems here:
And a third problem: We should probably try to fix it in afterburn, tbh. @tormath1 what's the relationship between afterburn and coreos-metadata?
I can open an issue in afterburn. (From what I can see, coreos-metadata is the same as afterburn.) For some reason they are checking the EC2 service when fetching from the network. When using config-drive it combines some fields from the EC2 service and some from OpenStack, which seems really weird to me.
Beat you to it: coreos/afterburn#930
Yes, that works for me! I can split this PR into one for ubuntu and one for flatcar.
This commit sets the provider ID in the ubuntu templates. To avoid issues with potential mismatches between node names and openstack servers we can configure the kubelet to set the provider ID. Otherwise, the cloud controller will try to match nodes and servers based on just the name. The names can differ because of special characters, like dots. When this happens, the cloud controller will be unable to match them and thus believe that the node has no underlying server.
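To illustrate the failure mode described above, here is a small hypothetical sketch of the two matching strategies (the names, IDs, and helper functions are made up for illustration):

```python
def match_by_name(node_name, servers):
    """Name-based matching: fails when the node name and the server
    name diverge, e.g. when a '.novalocal' suffix is dropped."""
    return next((s for s in servers if s["name"] == node_name), None)


def match_by_provider_id(provider_id, servers):
    """Provider-ID matching: compares the server UUID embedded in the
    node's providerID, which is immune to name differences."""
    uuid = provider_id.removeprefix("openstack:///")
    return next((s for s in servers if s["id"] == uuid), None)


servers = [{"name": "control-plane-zxn8l.novalocal", "id": "36db0850"}]

# The kubelet registered the node without the domain suffix, so the
# name-based lookup finds nothing, while the provider ID still matches.
print(match_by_name("control-plane-zxn8l", servers))  # → None
print(match_by_provider_id("openstack:///36db0850", servers))
```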
/lgtm
```yaml
# Fixme(lentzi90): This is here just to override the value set in the default
# kustomization. It will be replaced with a value that works for flatcar in
# https://github.com/kubernetes-sigs/cluster-api-provider-openstack/pull/1564
```
You added this comment in both places other than the one where I asked for it 🤣 I'll try not to take it personally.
It's not worth updating imho. The important thing is that somebody looking at it will see the comment, and they'll see one of these.
OMG sorry! I'm not sure how I managed to miss the first occurrence 🤦 😂
```diff
@@ -20,6 +20,7 @@ spec:
       nodeRegistration:
         kubeletExtraArgs:
           cloud-provider: external
+          provider-id: openstack:///'{{ instance_id }}'
```
Any idea why we're not inheriting this from default? The answer to this question would be outside the scope of this PR, though.
This is the generated template! So it is actually inherited from default and that is why we see it here. 🙂 The without-lb patch is over here and does not have any provider-id set.
/hold cancel
What this PR does / why we need it:
To avoid issues with potential mismatches between node names and
openstack servers we can configure the kubelet to set the provider ID.
Otherwise, the cloud controller will try to match nodes and servers
based on just the name. The names can differ because of special
characters, like dots. When this happens, the cloud controller will be
unable to match them and thus believe that the node has no underlying
server.
Also removes some unused (I think) cluster templates and kustomizations.
I will make a follow up PR for the flatcar templates.
Which issue(s) this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close the issue(s) when PR gets merged):
Fixes #1547 or at least works around it.
Special notes for your reviewer:
TODOs:
/hold