Document required manual calico 2.6.3 -> calico 3.1.1 upgrade when upgrading from < 0.17.0-provisioned clusters #3208
Conversation
…er upgrading a cluster created with acs-engine prior to 0.17.0
[APPROVALNOTIFIER] This PR is NOT APPROVED. This pull-request has been approved by: oivindoh. Assign the PR to them by writing. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
Codecov Report
```
@@           Coverage Diff           @@
##           master    #3208   +/-   ##
=======================================
  Coverage   52.31%   52.31%
=======================================
  Files         103      103
  Lines       15458    15458
=======================================
  Hits         8087     8087
  Misses       6643     6643
  Partials      728      728
```
examples/networkpolicy/README.md
Outdated
acs-engine releases starting with 0.17.0 now produce an addon manifest for calico in `/etc/kubernetes/addons/calico-daemonset.yaml` containing calico 3.1.x, and an `updateStrategy` of `RollingUpdate`.

To get up and running with the new version of calico after upgrading a cluster with acs-engine `0.17.0` and up, follow these steps:
Thank you @oivindoh! Looks great. Could you please add something like "As per the Calico v3.0 release notes, Calico v3.x includes breaking changes. Some highlights include:

- You must upgrade to Calico v2.6.5 before you can upgrade to v3.0.1 (see https://docs.projectcalico.org/v3.0/getting-started/kubernetes/upgrade/)
- Calico deployments that access the etcd datastore directly must complete a one-time migration.
- You must convert any customized Calico manifests via `calicoctl convert` before you can use them with v3.0.1.

Here are some instructions to get up and running with the new version of Calico […]" here?
So people can have context on the breaking changes (and know it's a calico breaking change, not acs-engine)
Good point! I added some text to be more explicit about this being due to calico breaking changes and not acs-engine 👍
…ico, point to documentation for calico k8s upgrade and calico convert.
@dtzar could you sanity-check this?
@oivindoh Thanks for the docs. Although I'm sure the steps you list work, this does not follow the upgrade guidance provided by Calico. I know there are significant changes to the RBAC config as well as the manifest - see the changes in my PR which did the upgrade to get an idea. So I wouldn't approve this guidance as-is; it should map directly to what is on Calico's webpage. You could truncate what you have and just punt to the Calico upgrade webpage, noting the fact that we use the Kubernetes datastore, policy-only configuration - or be more specific, following the flow/guidance they have, which will be specific to acs-engine deployments.
@dtzar I guess I'm not entirely clear on where I diverge from the linked upgrade guidance - applying the 3.x manifest with node/cni changed to 2.6.10 and 2.0.6 effectively performs step 1, and handles the upgrade to 2.6.5+ in the process. Applying that manifest again with node/cni 3.1.1 performs the rest of the steps required, keeping cluster-cidr as appropriate. What I wasn't clear on after reading the Calico docs (and what I wanted to document here) was the how of upgrading to 2.6.5+, given that I now had a cluster with a 3.1.1 manifest to be managed as an addon and calico 2.6.3 actually running in the cluster.
Let me know if this is more clear now or if you still have questions
examples/networkpolicy/README.md
Outdated
To get up and running with the new version of calico after upgrading a cluster with acs-engine `0.17.0` and up, follow these steps:
1. To update to `2.6.5+` in preparation for an upgrade to 3.1.x as specified, edit `/etc/kubernetes/addons/calico-daemonset.yaml` on a master node, replacing `calico/node:v3.1.1` with `calico/node:v2.6.10` and `calico/cni:v3.1.1` with `calico/cni:v2.0.6`. Run `kubectl apply -f /etc/kubernetes/addons/calico-daemonset.yaml`.
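As an illustrative sketch (not part of the PR itself), step 1's tag swap can be scripted. The heredoc below is a hypothetical two-image fragment standing in for the real `/etc/kubernetes/addons/calico-daemonset.yaml`; on a master node you would edit that file in place and then run the `kubectl apply` the step describes.

```shell
# Illustrative stand-in for the addon manifest on a master node.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
        - name: calico-node
          image: calico/node:v3.1.1
        - name: install-cni
          image: calico/cni:v3.1.1
EOF

# Step down to the 2.6.x bridge versions first: Calico cannot jump
# straight from 2.6.3 to 3.1.x (GNU sed in-place edit assumed).
sed -i \
  -e 's|calico/node:v3.1.1|calico/node:v2.6.10|' \
  -e 's|calico/cni:v3.1.1|calico/cni:v2.0.6|' \
  "$manifest"

grep 'image:' "$manifest"
# On the master, you would then run:
#   kubectl apply -f /etc/kubernetes/addons/calico-daemonset.yaml
```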
The place where you diverge from the upgrade guidance is here, because you are only replacing the image versions, not the entire template/manifest. The calico-daemonset.yaml is a merge of the Calico RBAC + manifest files and needs more changes than just bumping the image versions.
examples/networkpolicy/README.md
Outdated
`YYYY-MM-DD HH:MM:SS.FFF [INFO][n] health.go 150: Overall health summary=&health.HealthReport{Live:true, Ready:true}`

2. To complete the upgrade to 3.1.x, edit `/etc/kubernetes/addons/calico-daemonset.yaml` on the master node again, replacing `calico/node:v2.6.10` with `calico/node:v3.1.1` and `calico/cni:v2.0.6` with `calico/cni:v3.1.1`. Run `kubectl apply -f /etc/kubernetes/addons/calico-daemonset.yaml`.
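Step 2's tag flip can be sketched the same way. As before, the heredoc fragment is a hypothetical stand-in for the real `/etc/kubernetes/addons/calico-daemonset.yaml`; on a master node you would edit that file and re-run `kubectl apply` as the step describes.

```shell
# Illustrative two-line stand-in for the bridge-version manifest.
manifest=$(mktemp)
cat > "$manifest" <<'EOF'
          image: calico/node:v2.6.10
          image: calico/cni:v2.0.6
EOF

# Flip the bridge versions back up to 3.1.x (GNU sed assumed).
sed -i \
  -e 's|calico/node:v2.6.10|calico/node:v3.1.1|' \
  -e 's|calico/cni:v2.0.6|calico/cni:v3.1.1|' \
  "$manifest"

grep -c ':v3.1.1' "$manifest"   # prints 2: both images now on 3.1.1
# On the master, you would then run:
#   kubectl apply -f /etc/kubernetes/addons/calico-daemonset.yaml
```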
Same thing here.
@oivindoh I read through the document again end-to-end and I think I realize what you're suggesting now. The missing explanation is that I think you're saying to push through with the actual acs-engine upgrade (which will break Calico) and then go into each master node and do the following steps, correct? If so, then yes, the only thing I'm unsure of is whether there are any potential problems with the temporary upgrade step you have of 2.6.10 using a 3.x RBAC/Policy template. It feels like a safer route would be to upgrade to 2.6.10 on the existing cluster (so the manifest is much closer), then upgrade acs-engine, and then the new 3.x template might just work as-is.
@dtzar Exactly - however upgrading the cluster shouldn't break calico, because it will happily continue to run using its old 2.6.3 definitions until the moment you apply the addon manifest manually, since it's not set to anything but `EnsureExists`, effectively keeping the addon manager off the table. I should update the document and remove the "propagate to each master" comment, since the masters will already have the updated manifest anyway (we have opted for 3.1.3 instead of the supplied 3.1.1 for richer network policy support, so needed to propagate). I didn't do any form of extended validation after applying 2.6.10 other than quick manual tests to see if services were still available and no alerts were triggering before updating to 3.x, so I can't really vouch for zero negative effects during that step, but I could not detect anything FWIW. 3.1.1 and 3.1.3 have definitely been chugging along happily after the upgrade process.
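As a hypothetical sanity check (not something either commenter proposes), one can confirm which `calico/node` image a cluster is actually running before deciding which hop is needed. The `kubectl` one-liner in the comment assumes the usual `calico-node` DaemonSet layout; the `check` helper below just encodes the 2.6.5+ prerequisite from the Calico upgrade docs and is exercised against sample values.

```shell
# On a live cluster, the running image could be read with something like:
#   kubectl -n kube-system get ds calico-node \
#     -o jsonpath='{.spec.template.spec.containers[0].image}'
# (DaemonSet and container ordering are assumptions about the layout.)

# Decide the next hop based on the running calico/node tag.
check() {
  case "$1" in
    calico/node:v3.*)                                 echo "already on 3.x" ;;
    calico/node:v2.6.1[0-9]*|calico/node:v2.6.[5-9]*) echo "safe to move to 3.x" ;;
    *)                                                echo "must step through 2.6.10 first" ;;
  esac
}

check 'calico/node:v2.6.3'    # -> must step through 2.6.10 first
check 'calico/node:v2.6.10'   # -> safe to move to 3.x
check 'calico/node:v3.1.1'    # -> already on 3.x
```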
@oivindoh - I wouldn't recommend using the new yaml manifest with 2.6.10 images since it does have significant manifest/rbac changes. Would you be willing to test out this flow and update the document?
Also - I just issued #3257 which upgrades to the latest 3.1.3, so you can update your document accordingly and not have to do another step.
I think given that acs-engine itself does not provide cluster lifecycle configuration management (such that this would be taken care of more elegantly), and that the versions of Calico will continue to grow, let's just accept this doc as-is. It might not be 100% precise, but it has a good chance of helping someone out in the future who is running an acs-engine-built Calico 2 cluster and who is unable to tear down and recreate a new cluster.
* 'master' of https://github.com/Azure/acs-engine: (59 commits)
  - Docs: Update user guide list to include Windows, update description of clusters (Azure#3473)
  - update to Azure CNI v1.0.10 (Azure#3551)
  - Adding 'make dev' equivalent for Windows (Azure#3471)
  - print out ubuntu ver in e2e (Azure#3555)
  - fix an issue where networkPlugin was not defined correctly when using calico or cilium (Azure#3271)
  - Bump ginkgo to a tagged release (Azure#3554)
  - Reenable AzureFile tests for Windows on K8s 1.11.1, resolves Azure#3439 (Azure#3496)
  - removing rbac error checking from merge fn (Azure#3530)
  - Change dns healthcheck to look at external domain (Azure#3282)
  - DOCUMENTATION: Fix Documented Default Value for clusterSubnet (Azure#3474)
  - Document required manual calico 2.6.3 -> calico 3.1.1 upgrade when upgrading from < 0.17.0-provisioned clusters (Azure#3208)
  - revert --image-pull-policy=IfNotPresent for win (Azure#3553)
  - --image-pull-policy=IfNotPresent for kubectl run commands (Azure#3552)
  - Kubernetes: --max-pods=30 should be Azure CNI-only (Azure#3543)
  - disable Azure CNI network monitor addon default (Azure#3550)
  - only do az vm list for k8s (Azure#3540)
  - Retire Swarm E2E for PR test coverage (Azure#3539)
  - retire Azure CDN for container image repository proxying (Azure#3535)
  - removed datadisk to allow scale after upgrade (Azure#3482)
  - Pump k8s-azure-kms version (Azure#3531)
  - ...
…grading from < 0.17.0-provisioned clusters (#3208)
What this PR does / why we need it:
Clusters deployed by acs-engine < 0.17.0 with calico enabled ran calico 2.6.3. When upgrading such a cluster with 0.17.0 and later, the calico addon manifest is 3.1.x, and migration is not supported from releases prior to 2.6.5, so we need to perform some manual steps to get up and running on 3.1.x. See issue #3191.
Which issue this PR fixes (optional, in `fixes #<issue number>(, fixes #<issue_number>, ...)` format, will close that issue when PR gets merged): fixes #3191
Special notes for your reviewer:
I'm not entirely certain where this belongs (whether in examples/networkpolicy or examples/k8s-upgrade).
If applicable:
Release note: