
fix(webhook): more robust cidr check for ippool #29

Merged
merged 2 commits into harvester:main on Mar 12, 2024

Conversation

@starbops (Member) commented Mar 8, 2024

IMPORTANT: Please do not create a Pull Request without creating an issue first.

Problem:

We relied on a node annotation called rke2.io/node-args to extract the cluster-wide service CIDR from the --service-cidr argument. It's an RKE2-specific annotation, and there is currently no other good way to obtain this information via Kubernetes API calls. In addition, the implementation has a flaw when iterating through all the nodes: worker nodes do not carry that flag, so the validation procedure fails if the cluster contains any pure worker nodes.

We also need a way for administrators to specify the cluster-wide service CIDR when the cluster is not RKE2-based.

Solution:

Load the cluster-wide service CIDR from the following sources:

  • rke2.io/node-args annotation in management Node objects
  • Webhook command argument --service-cidr
  • Default value 10.53.0.0/16

Compare the CIDR of the user-input IPPool object with the cluster-wide one. If they overlap, reject the create/update request.
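
For context, the overlap test itself reduces to parsing both CIDR strings and checking containment. A minimal, self-contained sketch in Go (the cidrsOverlap helper is illustrative, not the webhook's actual code):

    package main

    import (
        "fmt"
        "net"
    )

    // cidrsOverlap reports whether two CIDR blocks share any addresses.
    // Aligned CIDR blocks overlap exactly when one contains the other's
    // network address.
    func cidrsOverlap(a, b string) (bool, error) {
        _, netA, err := net.ParseCIDR(a)
        if err != nil {
            return false, fmt.Errorf("invalid cidr %s: %w", a, err)
        }
        _, netB, err := net.ParseCIDR(b)
        if err != nil {
            return false, fmt.Errorf("invalid cidr %s: %w", b, err)
        }
        return netA.Contains(netB.IP) || netB.Contains(netA.IP), nil
    }

    func main() {
        overlap, _ := cidrsOverlap("10.53.0.0/16", "10.53.0.0/16")
        fmt.Println(overlap) // true: identical ranges obviously overlap
    }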

Related Issue:

harvester/harvester#5153

Test plan:

  1. Install and enable the harvester-vm-dhcp-controller add-on
    apiVersion: harvesterhci.io/v1beta1
    kind: Addon
    metadata:
      labels:
        addon.harvesterhci.io/experimental: "true"
      namespace: harvester-system
      name: harvester-vm-dhcp-controller
    spec:
      chart: harvester-vm-dhcp-controller
      enabled: true
      repo: https://charts.harvesterhci.io
      valuesContent: |
        image:
          repository: starbops/harvester-vm-dhcp-controller
          tag: fix-5153-head
        agent:
          image:
            repository: starbops/harvester-vm-dhcp-agent
            tag: fix-5153-head
        webhook:
          image:
            repository: starbops/harvester-vm-dhcp-webhook
            tag: fix-5153-head
      version: 0.3.0
  2. Prepare a VM Network (NAD) named test-net
  3. Create, using kubectl, an IPPool object associated with the VM Network whose CIDR overlaps the cluster service CIDR
    apiVersion: network.harvesterhci.io/v1alpha1
    kind: IPPool
    metadata:
      namespace: default
      name: test-net
    spec:
      ipv4Config:
        serverIP: 10.53.0.2
        cidr: 10.53.0.0/16
        pool:
          start: 10.53.0.100
          end: 10.53.0.200
      networkName: default/test-net
  4. The creation request should be rejected by the validating admission webhook:
    Error from server (InternalError): error when creating "STDIN": admission webhook "validator.harvester-system.harvester-vm-dhcp-controller-webhook" denied the request: Internal error occurred: could not create IPPool default/test-net because cidr 10.53.0.0/16 overlaps cluster service cidr 10.53.0.0/16
    

@w13915984028 (Member) left a comment

LGTM, thanks.

var serviceCIDR string
serviceCIDR, err = util.GetServiceCIDRFromNode(node)
if err != nil {
    logrus.Warningf("could not find service cidr from node annotation")

Please wrap the err into the log so that it carries the node name and more detailed information.
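
For illustration only, attaching both the error and the node name with logrus could look like this (the exact fields are an assumption, not necessarily the change made in this PR):

    logrus.WithFields(logrus.Fields{
        "node": node.Name,
    }).WithError(err).Warn("could not find service cidr from node annotation")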

sets := labels.Set{
    util.ManagementNodeLabelKey: "true",
}
mgmtNodes, err := v.nodeCache.List(sets.AsSelector())

When the webhook has just come up, the nodeCache can be empty for a while, so it seems essential to list from the remote API server.

@starbops (Member, Author) Mar 8, 2024

I'm curious: if nodeCache can be empty initially, will other resource types, e.g., NAD, VM, etc., have the same issue? Changing all the caches to clients for the webhooks seems too heavy. Or is there a good way to force the cache to be filled when the webhook comes up?

@w13915984028 (Member) Mar 8, 2024

Others will fail, and the reconciler will resolve it; but here you have a fallback path ...

Alternatively, when the list returns zero items, retry by listing from the remote API server.
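
A rough sketch of that fallback, assuming a hypothetical v.nodeClient that lists Node objects directly from the API server (the client name and its List signature are illustrative, not the actual code):

    mgmtNodes, err := v.nodeCache.List(sets.AsSelector())
    if err != nil {
        return err
    }
    if len(mgmtNodes) == 0 {
        // The informer cache may still be empty right after the webhook starts,
        // so fall back to listing from the API server directly.
        nodeList, err := v.nodeClient.List(metav1.ListOptions{
            LabelSelector: sets.AsSelector().String(),
        })
        if err != nil {
            return err
        }
        for i := range nodeList.Items {
            mgmtNodes = append(mgmtNodes, &nodeList.Items[i])
        }
    }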

@starbops (Member, Author)

Hmmm, that makes sense. Invalid ippool objects could slip in at that specific moment. Thanks!

        },
    },
    expected: output{
        err: fmt.Errorf("could not create IPPool %s/%s because cidr %s overlaps cluster service cidr %s", testIPPoolNamespace, testIPPoolName, testCIDROverlap, testServiceCIDR),

"cannot" or "could not"? (same for a few following occurrences)

@starbops (Member, Author)

After discussion, I'll drop the runtime determination of the service CIDR from the nodes' annotations because the footprint is too heavy: it queries every Node object whenever an IPPool is created or updated. Instead, I will keep 10.53.0.0/16 as the default service CIDR and allow users to configure it via the webhook binary's argument (also configurable through a chart value).

cc @w13915984028
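
A minimal sketch of exposing such an argument with Go's standard flag package, assuming only the flag name and default value described in this PR (the real webhook wires this through its own CLI setup in cmd/webhook/root.go):

    package main

    import (
        "flag"
        "fmt"
    )

    func main() {
        // --service-cidr defaults to 10.53.0.0/16; in the chart, the value is
        // rendered into the webhook's command-line arguments.
        serviceCIDR := flag.String("service-cidr", "10.53.0.0/16",
            "cluster-wide service CIDR used when validating IPPool objects")
        flag.Parse()
        fmt.Println("validating IPPools against service CIDR:", *serviceCIDR)
    }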

@bk201 bk201 self-requested a review March 11, 2024 04:39
Load the cluster-wide service CIDR from the following sources:

- "rke2.io/node-args" annotation in management Node objects
- Webhook command argument "--service-cidr"
- Default value "10.53.0.0/16"

Compare the CIDR of the user-input IPPool object with the cluster-wide
one. If they overlap, reject the create/update requests.

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
@w13915984028 (Member) left a comment

LGTM, thanks.

@bk201 (Member) left a comment

lgtm!

@mingshuoqiu left a comment

LGTM

It's overkill to retrieve the cluster's service CIDR at runtime since it
rarely changes and is almost the same in every Harvester deployment.
Revert the relevant code and let users provide the service CIDR string
via the webhook's command-line argument to retain flexibility. The
default value is still `10.53.0.0/16`.

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
@starbops starbops merged commit 5d790ef into harvester:main Mar 12, 2024
5 checks passed