fix(webhook): more robust cidr check for ippool #29
Conversation
LGTM, thanks.
pkg/webhook/ippool/validator.go
Outdated
var serviceCIDR string
serviceCIDR, err = util.GetServiceCIDRFromNode(node)
if err != nil {
    logrus.Warningf("could not find service cidr from node annotation")
Please wrap the err into the log message so it carries the node name and more detailed information.
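For illustration, a minimal sketch of what the wrapped log line could look like; the exact wording is up to the author:

    logrus.Warningf("could not find service cidr from annotation of node %s: %v", node.Name, err)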
pkg/webhook/ippool/validator.go
Outdated
sets := labels.Set{
    util.ManagementNodeLabelKey: "true",
}
mgmtNodes, err := v.nodeCache.List(sets.AsSelector())
When the webhook has just started, the nodeCache can be empty for a while; it seems essential to list from the remote API server instead.
I'm curious: if nodeCache can be empty initially, will other types of resources, e.g., NAD, VM, etc., have the same issue? Changing all the caches to clients for webhooks seems too heavy. Or is there a good way to force the cache to be filled when the webhook comes up?
Others will fail, and the reconciler will resolve them; but here you have a fallback path ...
Or, when the returned list length is 0, retry by listing from the remote API server.
Hmmm, that makes sense. Invalid ippool objects could slip in at that specific moment. Thanks!
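For illustration, a minimal sketch of the fallback discussed above, not the merged implementation: prefer the informer cache, and only when the cached list comes back empty, list directly from the API server. The helper name, the kubeClient parameter, and the import aliases are assumptions.

import (
    "context"
    "fmt"

    // Wrangler-generated core controllers; the exact path may differ by version.
    ctlcorev1 "github.com/rancher/wrangler/pkg/generated/controllers/core/v1"
    corev1 "k8s.io/api/core/v1"
    metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
    "k8s.io/apimachinery/pkg/labels"
    "k8s.io/client-go/kubernetes"
)

// listMgmtNodes prefers the informer cache, but if the cache is still empty
// (e.g. right after the webhook starts), it falls back to a live List call
// against the API server.
func listMgmtNodes(ctx context.Context, nodeCache ctlcorev1.NodeCache, kubeClient kubernetes.Interface, selector labels.Selector) ([]*corev1.Node, error) {
    nodes, err := nodeCache.List(selector)
    if err != nil {
        return nil, fmt.Errorf("list management nodes from cache: %w", err)
    }
    if len(nodes) > 0 {
        return nodes, nil
    }

    // Cache is empty; retry directly against the API server.
    nodeList, err := kubeClient.CoreV1().Nodes().List(ctx, metav1.ListOptions{LabelSelector: selector.String()})
    if err != nil {
        return nil, fmt.Errorf("list management nodes from API server: %w", err)
    }
    for i := range nodeList.Items {
        nodes = append(nodes, &nodeList.Items[i])
    }
    return nodes, nil
}

A caller would pass labels.Set{util.ManagementNodeLabelKey: "true"}.AsSelector() as the selector, as in the diff above.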
pkg/webhook/ippool/validator_test.go
Outdated
    },
},
expected: output{
    err: fmt.Errorf("could not create IPPool %s/%s because cidr %s overlaps cluster service cidr %s", testIPPoolNamespace, testIPPoolName, testCIDROverlap, testServiceCIDR),
cannot or could not? And the same question applies to a few of the following ones.
After discussion, I'll drop the runtime discovery of the service CIDR from the node's annotation because the footprint is too heavy: it queries every Node object whenever a new IPPool is created or updated. Instead, I will leave ...
Load the cluster-wide service CIDR from the following sources:
- "rke2.io/node-args" annotation in management Node objects
- Webhook command argument "--service-cidr"
- Default value "10.53.0.0/16"

Compare the CIDR of the user-input IPPool object with the cluster-wide one. If they overlap, reject the create/update requests.

Signed-off-by: Zespre Chang <zespre.chang@suse.com>
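For illustration, a minimal sketch of one way to implement the overlap comparison with the standard library; the helper name cidrOverlaps is hypothetical and the merged code may differ:

import (
    "fmt"
    "net"
)

// cidrOverlaps reports whether two CIDRs share any addresses. Because CIDR
// blocks are aligned, two networks overlap exactly when one of them contains
// the other's base address.
func cidrOverlaps(cidrA, cidrB string) (bool, error) {
    _, netA, err := net.ParseCIDR(cidrA)
    if err != nil {
        return false, fmt.Errorf("parse cidr %q: %w", cidrA, err)
    }
    _, netB, err := net.ParseCIDR(cidrB)
    if err != nil {
        return false, fmt.Errorf("parse cidr %q: %w", cidrB, err)
    }
    return netA.Contains(netB.IP) || netB.Contains(netA.IP), nil
}

For example, cidrOverlaps("10.53.1.0/24", "10.53.0.0/16") returns true, so an IPPool with that CIDR would be rejected.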
LGTM, thanks.
lgtm!
LGTM
It's overkill to retrieve the cluster's service CIDR at runtime since it rarely changes and is almost the same in every Harvester deployment. Revert the relevant code and let users input the service CIDR string via the webhook's command-line argument to retain flexibility. The default value is still `10.53.0.0/16`. Signed-off-by: Zespre Chang <zespre.chang@suse.com>
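For illustration, a minimal sketch of wiring up such an argument with the standard flag package; the actual webhook may use a different CLI library, so this flag registration is an assumption:

package main

import (
    "flag"
    "log"
)

func main() {
    // --service-cidr defaults to 10.53.0.0/16, matching most Harvester deployments.
    serviceCIDR := flag.String("service-cidr", "10.53.0.0/16", "cluster-wide service CIDR used to validate IPPool objects")
    flag.Parse()

    log.Printf("validating IPPool CIDRs against service CIDR %s", *serviceCIDR)
    // ... pass *serviceCIDR into the IPPool validator here ...
}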
IMPORTANT: Please do not create a Pull Request without creating an issue first.
Problem:
We relied on a node annotation called rke2.io/node-args to extract the cluster-wide service CIDR (the value of the --service-cidr argument). It's an RKE2-specific annotation, and currently there's no other good way to get such information via Kubernetes API calls. Plus, the implementation has a flaw: it iterates through all the nodes, but worker nodes do not carry such a flag, so the validation procedure will fail if the cluster has any pure worker nodes. On the other hand, we need a way for admins to specify the cluster-wide service CIDR if the cluster is not RKE2-based.
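For illustration, a sketch of how such an annotation lookup could work, assuming rke2.io/node-args holds a JSON-encoded array of RKE2 arguments; the real util.GetServiceCIDRFromNode may differ (assumed imports: encoding/json, fmt, and corev1 "k8s.io/api/core/v1"):

// getServiceCIDRFromNode decodes the rke2.io/node-args annotation and returns
// the value that follows "--service-cidr", if present.
func getServiceCIDRFromNode(node *corev1.Node) (string, error) {
    raw, ok := node.Annotations["rke2.io/node-args"]
    if !ok {
        return "", fmt.Errorf("node %s has no rke2.io/node-args annotation", node.Name)
    }
    var args []string
    if err := json.Unmarshal([]byte(raw), &args); err != nil {
        return "", fmt.Errorf("decode node-args of node %s: %w", node.Name, err)
    }
    for i, arg := range args {
        if arg == "--service-cidr" && i+1 < len(args) {
            return args[i+1], nil
        }
    }
    return "", fmt.Errorf("no --service-cidr argument found on node %s", node.Name)
}

As the problem statement notes, only RKE2 server nodes carry this flag; pure worker nodes do not.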
Solution:
Load the cluster-wide service CIDR from the following sources:
- rke2.io/node-args annotation in management Node objects
- Webhook command-line argument --service-cidr
- Default value 10.53.0.0/16

Compare the CIDR of the user-input IPPool object with the cluster-wide one. If they overlap, reject the create/update requests.
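Putting it together, a sketch of how the create-path validation could tie the overlap check to the rejection message seen in validator_test.go; the receiver, method, and field names (including Spec.IPv4Config.CIDR) are assumptions, not the merged code:

// validateCIDR rejects an IPPool whose CIDR overlaps the cluster service CIDR,
// reusing the cidrOverlaps helper sketched earlier. Field names are illustrative.
func (v *ipPoolValidator) validateCIDR(ipPool *networkv1.IPPool) error {
    cidr := ipPool.Spec.IPv4Config.CIDR
    overlap, err := cidrOverlaps(cidr, v.serviceCIDR)
    if err != nil {
        return err
    }
    if overlap {
        return fmt.Errorf("could not create IPPool %s/%s because cidr %s overlaps cluster service cidr %s",
            ipPool.Namespace, ipPool.Name, cidr, v.serviceCIDR)
    }
    return nil
}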
Related Issue:
harvester/harvester#5153
Test plan:
- harvester-vm-dhcp-controller add-on
- test-net
- kubectl