Cross-Cluster Canary Release #371

Closed
mumoshu opened this issue Nov 18, 2019 · 0 comments · Fixed by #372

mumoshu commented Nov 18, 2019

I prefer blue-green cluster deployments over in-place or rolling updates of the K8s control plane and worker nodes. The benefit is that when the new cluster isn't functioning, I can easily roll back without spending a long time recreating the cluster from scratch.

The process is currently manual and error-prone. I'd like to automate this.

The process to be automated generally looks like a standard canary release, except that the canary target is the cluster itself, or rather a service endpoint (like the hostname of the LB serving your web or API traffic).

After experimenting with Flagger, I'm now wondering if we can generalize it to support cross-cluster canary releases as well.

Have you ever considered this?
Does this sound like a good idea?

I'd appreciate any feedback.

Anyway, thanks a lot for maintaining this awesome project ☺️

mumoshu added a commit to mumoshu/flagger that referenced this issue Nov 25, 2019
Resolves fluxcd#371

---

This adds support for `corev1.Service` as the `targetRef.kind`, so that Flagger can be used just for canary analysis and traffic shifting on existing, pre-created services. Flagger doesn't touch deployments or HPAs in this mode.

This is useful for keeping full control over the resources backing the service to be canary-released, including pods (behind a ClusterIP service) and external services (behind an ExternalName service).

The major use-cases I have in mind are:

- Canary-releasing a K8s cluster. You create two workload clusters and a master cluster. In the master cluster, you create two `ExternalName` services, each pointing to the hostname of the load balancer of the targeted app instance in one of the clusters (see the sketch after this list). Flagger runs on the master cluster and helps safely roll out a new K8s cluster by doing a canary release on the `ExternalName` service.
- Adding annotations and labels to the service for integrating with things like external LBs, without extending Flagger to support customizing every aspect of the K8s service it manages.
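
For the cluster use-case, the pair of `ExternalName` services in the master cluster could look roughly like this. This is a minimal sketch; the service names and LB hostnames are hypothetical placeholders:

```
# Sketch only: two ExternalName services in the master cluster, each resolving
# to the load balancer of the same app running in one of the workload clusters.
# The hostnames below are hypothetical placeholders.
apiVersion: v1
kind: Service
metadata:
  name: podinfo-blue
spec:
  type: ExternalName
  externalName: podinfo.blue-cluster.example.com
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo-green
spec:
  type: ExternalName
  externalName: podinfo.green-cluster.example.com
```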

**Design**:

A canary release on a K8s service is almost the same as one on a K8s deployment. The only fundamental difference is that it operates only on a set of K8s services.

For example, one may start by creating two Helm releases for `podinfo-blue` and `podinfo-green`, and a K8s service `podinfo`. The `podinfo` service should initially have the same `Spec` as that of `podinfo-blue`.

On a new release, you update `podinfo-green`, then trigger Flagger by updating the K8s service `podinfo` so that it points to the pods or `externalName` declared in `podinfo-green`. Flagger does the rest. The end result is that traffic to `podinfo` is gradually and safely shifted from `podinfo-blue` to `podinfo-green`.
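
As a minimal sketch of the trigger, assuming the `ExternalName` variant from the cluster use-case above (the hostnames are hypothetical), a release boils down to editing the user-managed `podinfo` service:

```
# Before the release: podinfo mirrors podinfo-blue.
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  type: ExternalName
  externalName: podinfo.blue-cluster.example.com   # hypothetical hostname
---
# To trigger the canary, point podinfo at what podinfo-green declares.
# Flagger notices the change and starts the analysis and traffic shift.
apiVersion: v1
kind: Service
metadata:
  name: podinfo
spec:
  type: ExternalName
  externalName: podinfo.green-cluster.example.com  # hypothetical hostname
```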

**How it works**:

Under the hood, Flagger maintains two K8s services, `podinfo-primary` and `podinfo-canary`. In contrast to canaries on K8s deployments, it doesn't create the service named `podinfo`, as that one is provided by you.

Once Flagger detects a change in the `podinfo` service, it updates the `podinfo-canary` service and the routes, then analyzes the canary. On successful analysis, it promotes the canary service to the `podinfo-primary` service. You expose the `podinfo` service via any L7 ingress solution or a service mesh so that traffic is managed by Flagger for safe deployments.
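
As a rough sketch (not lifted from the implementation), the resulting set of services would look like the following, where only `podinfo` is owned by you and the selector values are hypothetical:

```
# podinfo         - created and updated by you; watched by Flagger
# podinfo-canary  - maintained by Flagger; mirrors the latest podinfo spec
# podinfo-primary - maintained by Flagger; mirrors the last successfully analyzed canary
apiVersion: v1
kind: Service
metadata:
  name: podinfo-canary
spec:
  selector:
    app: podinfo-green   # hypothetical: copied from the latest podinfo
  ports:
  - port: 9898
---
apiVersion: v1
kind: Service
metadata:
  name: podinfo-primary
spec:
  selector:
    app: podinfo-blue    # hypothetical: from the last promoted release
  ports:
  - port: 9898
```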

**Giving it a try**:

To give it a try, create a `Canary` as usual, but with its `targetRef` pointing to a K8s service:

```
apiVersion: flagger.app/v1alpha3
kind: Canary
metadata:
  name: podinfo
spec:
  provider: kubernetes
  targetRef:
    apiVersion: core/v1
    kind: Service
    name: podinfo
  service:
    port: 9898
  canaryAnalysis:
    # schedule interval (default 60s)
    interval: 10s
    # max number of failed checks before rollback
    threshold: 2
    # number of checks to run before promotion
    iterations: 2
    # Prometheus checks based on
    # http_request_duration_seconds histogram
    metrics: []
```

Create a K8s service named `podinfo`, then update it. Now watch the services `podinfo`, `podinfo-primary`, and `podinfo-canary`.

Flagger tracks the `podinfo` service for changes. Upon any change, it reconciles the `podinfo-primary` and `podinfo-canary` services. `podinfo-canary` always replicates the latest `podinfo`. In contrast, `podinfo-primary` replicates the latest successful `podinfo-canary`.

**Notes**:

- For the canary cluster use-case, we would also need to write a K8s operator that syncs `ExternalName` services to the mesh, e.g. to App Mesh `VirtualNode`s. But that's another story!
mumoshu added a commit to mumoshu/flagger that referenced this issue Nov 27, 2019