Update calico to use the correct CIDR for pods #2768
Conversation
Hi @ottoyiu. Thanks for your PR. I'm waiting for a kubernetes member to verify that this patch is reasonable to test. If it is, they should reply with @k8s-bot ok to test. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
Force-pushed from e145d8c to 37e3cb5
@@ -353,7 +353,8 @@ func (b *BootstrapChannelBuilder) buildManifest() (*channelsapi.Addons, map[stri
 	if b.cluster.Spec.Networking.Calico != nil {
 		key := "networking.projectcalico.org"
-		version := "2.1.1"
+		// 2.1.1-kops.1 = 2.1.1 with CIDR change
+		version := "2.1.1-kops.1"
I think you will need 2.1.2-kops.1 not 2.1.1-kops.1
@k8s-bot ok to test
Do we know what happens when we change this value?
@justinsb: you mean when a user modifies the
Currently, we are using .NonMasqueradeCIDR in the wrong fashion. We should be using .KubeControllerManager.ClusterCIDR to prevent IP collision with Service IPs.
@justinb @chrislovecnm I have tested these changes. The upgrade/migration path for an existing cluster to the new non-overlapping CIDR requires a manual step (as shown in the main PR notes), which some may see as sub-optimal. However, it could also be seen as ideal, since this change has zero effect on existing running clusters unless the manual step is run. I would like to hear what you think is the proper approach in this case; I'm leaning towards just documenting the migration step for existing clusters, to be applied if the cluster operator(s) deem the migration necessary.
@ottoyiu I think documenting and printing a warning on screen may be helpful for the user. Any other ideas?
👍
@chrislovecnm just came back from vacation. What's the proper way to contribute a release note? Should I submit a PR against justinb's release notes branch?
Also document the migration procedure necessary for existing calico clusters
Release notes for Calico Pod CIDR changes made in #2768
@blakebarnett will fix #3018 ASAP. Oversight on my part; my apologies. #3019 seems to be a much trickier problem to solve, since it's tied to the calico addon version and kops already thinks calico has been upgraded. Will document it in the ticket instead.
This PR relates to the conversation in #1171.
For reference, here are the default values for the individual components.
Currently, we are using .NonMasqueradeCIDR in the wrong fashion. We should be using .KubeControllerManager.ClusterCIDR instead to prevent IP collision with Service IPs.
Note: The .NonMasqueradeCIDR is the cluster's base subnet, and should not be directly used by the components because of overlap.
Scenarios Tested
New Cluster Creation for k8s 1.6 - PASS
Deployed a new Kubernetes 1.6.4 cluster using this branch with the changes. Pods are now being allocated an IP within the 100.96.0.0/11 range (as defined by .KubeControllerManager.ClusterCIDR) vs .NonMasqueradeCIDR.
Tested:
Observation:
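For reference, a quick way to spot-check the Pod IP allocation described in this scenario might look like the following (a sketch, assuming kubectl access to the cluster; the jsonpath expression is illustrative, not taken from this PR):
# List Pods with their IPs to confirm allocations fall within 100.96.0.0/11
kubectl get pods --all-namespaces -o wide
# Or print just the Pod IPs
kubectl get pods --all-namespaces -o jsonpath='{range .items[*]}{.status.podIP}{"\n"}{end}'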
Cluster Upgrade for k8s 1.6 (cluster first created using kops-1.6.2)
Deployed a new Kubernetes 1.6.4 cluster using kops-1.6.2, then ran kops built from this branch:
kops update cluster --yes
to change the CIDR.
Observation: All pods are still functional and running with their existing Pod IPs. Pod to Service IP, Pod to Pod, and Pod to External (google.com) tested.
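Connectivity spot-checks of this kind might look like the following (a sketch; the busybox image and the specific commands are assumptions for illustration, not taken from this PR):
# Pod to External: fetch an external page from inside a throwaway Pod
kubectl run nettest --rm -it --restart=Never --image=busybox -- wget -qO- http://google.com
# Pod to Service: resolve the kubernetes Service through kube-dns
kubectl run nettest --rm -it --restart=Never --image=busybox -- nslookup kubernetes.default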
Upgrade strategy attempts
kops rolling-update --force --yes
to do a rolling restart - FAIL
Observation: Existing Pods continue to function with their old Pod IPs (which may or may not lie within the newly defined IP range) while nodes are being cycled. There is a period of several minutes where DNS is not responsive in Pods before it recovers; not sure if that has anything to do with the CIDR change - it seems more related to the "rolling" update itself. New Nodes are still assigned the old CIDR range. The Service IP range remains unchanged.
Deleting running calico/node pods from calico daemonset - FAIL
kubectl get pods --namespace kube-system | grep calico-node | awk '{print $1}' | xargs kubectl delete pod --namespace kube-system
Observation: Existing Pods continue to function with their old Pod IPs. New Pods being rescheduled on existing Nodes still get a Pod IP in the old range. New Nodes are still assigned the old CIDR range.
Failure Investigation
Calico only uses the CALICO_IPV4POOL_CIDR to create a default IPv4 pool if a pool doesn't exist already: https://github.com/projectcalico/calicoctl/blob/v1.3.0/calico_node/startup/startup.go#L463
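(As an aside, the pools that already exist on a running cluster can be inspected with something like the following, assuming a calicoctl v1.x binary configured against the cluster's etcd - a sketch, not part of this PR:)
# Show the currently configured Calico IPv4 pools
calicoctl get ipPool -o wide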
Because of this, we need to run two jobs that execute calicoctl manually to migrate to the new CIDR - one to create the new IPv4 pool that we want, and one to delete the existing IP pool that we no longer want. These are to be run after executing one of the listed upgrade strategies:
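For illustration, a sketch of what those two steps could look like when run by hand with calicoctl v1.x syntax - the CIDRs assume the kops defaults of 100.96.0.0/11 for the ClusterCIDR and 100.64.0.0/10 for the NonMasqueradeCIDR, and this is not the exact job manifest from this PR:
# Step 1: add an IPv4 pool covering .KubeControllerManager.ClusterCIDR
# (the ipip/nat-outgoing settings should mirror whatever the existing pool uses)
cat <<EOF | calicoctl apply -f -
apiVersion: v1
kind: ipPool
metadata:
  cidr: 100.96.0.0/11
spec:
  ipip:
    enabled: true
  nat-outgoing: true
EOF
# Step 2: remove the old pool that was created from .NonMasqueradeCIDR
calicoctl delete ipPool 100.64.0.0/10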
By doing that, new Pods will get IPs in the right range, and existing Pods with their existing IPs will continue to function.
Operations on k8s 1.5 using the same tests as above