-
Notifications
You must be signed in to change notification settings - Fork 742
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
feat: Add support for targeting Kubernetes Service Kind #372
Conversation
@mumoshu this looks really good. I'm at KubeCon so I'll do a proper review after I'm getting back from US. As for e2e testing, this should be implemented similar to the blue/green one see https://github.com/weaveworks/flagger/blob/master/test/e2e-kubernetes.sh and https://github.com/weaveworks/flagger/blob/master/test/e2e-kubernetes-tests.sh |
@stefanprodan Thanks! JFYI I've added an E2E test suite for this feature in e39f45b. After you're back from KubeCon, I'd appreciate any suggestion or request on where to add docs and unit/integration tests, too. Re tests, does an another version of |
Uses fluxcd/flagger#372 to let Flagger manages traffic between two Kubernetes services each is managed by a Helm release.
@mumoshu I have a couple of questions/suggestions:
|
That sounds good! I'll address those in the coming days.
Thanks for all the explanations! Honestly speaking I only had a vague idea on how analysis on service canary work in practice, but reading your explanation made it clearer... So, I'd say the only way would be to use custom queries. You use an external operator or some umbrella helm chart or helmfile to compose custom queries (or their templates). This service canary feature is pretty low-level compared to that for K8s deployments. That means various assumptions we've made so far on Flagger - like mapping between targetRef.name to query labels - doesn't make sense on service canaries. Would it justify the requirement for users to write their own custom queries?
Honestly, it won't work as you might have expected. I did confirm that a rollback happens (while testing Flagger with my own crossover via the As you've pointed out earlier, it won't work with the out-of-box request success rate and duration queries, as there's no way to match custom queries to the metrics for the canary service.
Correct.
Absolutely. I was thinking that I would change the custom queries part of a canary object before changing the targeted service, so that Flagger knows the exact custom queries on metrics from the next/canary service. If it doesn't work, we need an additional source of metrics labels to query per canary. It's kind of like annotating And use those from within a query template that would look like:
Exactly. To avoid that, I believe we may need another k8s operator, or a complex CI pipeline that generates manifests accordingly. |
@mumoshu I've extracted the Kubernetes operations to an interface and created a factory for it. Once merged you have to pull the changes into this PR and add a PS: this is merged in master, please see the controller interface. |
@stefanprodan Thanks! That was close, I've already done similar things on my end. I'll try to adapt to yours soon. A few points that I wanted to confirm:
|
@mumoshu can you please join the Flagger slack so we can chat there? Thanks |
@mumoshu sorry I had to fix a bug and it conflicts with your branch. Can you please fix the conflict, the change are very lite https://github.com/weaveworks/flagger/pull/380/files |
Resolves fluxcd#371 --- This adds the support for `corev1.Service` as the `targetRef.kind`, so that we can use Flagger just for canary analysis and traffic-shifting on existing and pre-created services. Flagger doesn't touch deployments and HPAs in this mode. This is useful for keeping your full-control on the resources backing the service to be canary-released, including pods(behind a ClusterIP service) and external services(behind an ExternalName service). Major use-case in my mind are: - Canary-release a K8s cluster. You create two clusters and a master cluster. In the master cluster, you create two `ExternalName` services pointing to (the hostname of the loadbalancer of the targeted app instance in) each cluster. Flagger runs on the master cluster and helps safely rolling-out a new K8s cluster by doing a canary release on the `ExternalName` service. - You want annotations and labels added to the service for integrating with things like external lbs(without extending Flagger to support customizing any aspect of the K8s service it manages **Design**: A canary release on a K8s service is almost the same as one on a K8s deployment. The only fundamental difference is that it operates only on a set of K8s services. For example, one may start by creating two Helm releases for `podinfo-blue` and `podinfo-green`, and a K8s service `podinfo`. The `podinfo` service should initially have the same `Spec` as that of `podinfo-blue`. On a new release, you update `podinfo-green`, then trigger Flagger by updating the K8s service `podinfo` so that it points to pods or `externalName` as declared in `podinfo-green`. Flagger does the rest. The end result is the traffic to `podinfo` is gradually and safely shifted from `podinfo-blue` to `podinfo-green`. **How it works**: Under the hood, Flagger maintains two K8s services, `podinfo-primary` and `podinfo-canary`. Compared to canaries on K8s deployments, it doesn't create the service named `podinfo`, as it is already provided by YOU. Once Flagger detects the change in the `podinfo` service, it updates the `podinfo-canary` service and the routes, then analyzes the canary. On successful analysis, it promotes the canary service to the `podinfo-primary` service. You expose the `podinfo` service via any L7 ingress solution or a service mesh so that the traffic is managed by Flagger for safe deployments. **Giving it a try**: To give it a try, create a `Canary` as usual, but its `targetRef` pointed to a K8s service: ``` apiVersion: flagger.app/v1alpha3 kind: Canary metadata: name: podinfo spec: provider: kubernetes targetRef: apiVersion: core/v1 kind: Service name: podinfo service: port: 9898 canaryAnalysis: # schedule interval (default 60s) interval: 10s # max number of failed checks before rollback threshold: 2 # number of checks to run before rollback iterations: 2 # Prometheus checks based on # http_request_duration_seconds histogram metrics: [] ``` Create a K8s service named `podinfo`, and update it. Now watch for the services `podinfo`, `podinfo-primary`, `podinfo-canary`. Flagger tracks `podinfo` service for changes. Upon any change, it reconciles `podinfo-primary` and `podinfo-canary` services. `podinfo-canary` always replicate the latest `podinfo`. In contract, `podinfo-primary` replicates the latest successful `podinfo-canary`. **Notes**: - For the canary cluster use-case, we would need to write a K8s operator to, e.g. for App Mesh, sync `ExternalName` services to AppMesh `VirtualNode`s. But that's another story!
@stefanprodan Done 👍 |
Update: I had some conversation with @stefanprodan today and we'll be moving everything from the newly added |
The current design is that everything related to managing the targeted resource should go into the respective implementation of `canary.Controller`. In the service-canary use-case our target is Service so rather than splitting and scattering the logics over Controller and Router, everything should naturally go to `ServiceController`. Maybe at the time of writing the first implementation, I was confusing the target service vs the router.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Thanks @mumoshu I think we should create an issue for adding this feature to docs including a tutorial with crossover as ingress controller
Resolves #371
This adds the support for
corev1.Service
as thetargetRef.kind
, so that we can use Flagger just for canary analysis and traffic-shifting on existing and pre-created services. Flagger doesn't touch deployments and HPAs in this mode.This is useful for keeping your full-control on the resources backing the service to be canary-released, including pods(behind a ClusterIP service) and external services(behind an ExternalName service).
Major use-case in my mind are:
ExternalName
services pointing to (the hostname of the loadbalancer of the targeted app instance in) each cluster. Flagger runs on the master cluster and helps safely rolling-out a new K8s cluster by doing a canary release on theExternalName
service.Design:
A canary release on a K8s service is almost the same as one on a K8s deployment. The only fundamental difference is that it operates only on a set of K8s services.
For example, one may start by creating two Helm releases for
podinfo-blue
andpodinfo-green
, and a K8s servicepodinfo
. Thepodinfo
service should initially have the sameSpec
as that ofpodinfo-blue
.On a new release, you update
podinfo-green
, then trigger Flagger by updating the K8s servicepodinfo
so that it points to pods orexternalName
as declared inpodinfo-green
. Flagger does the rest. The end result is the traffic topodinfo
is gradually and safely shifted frompodinfo-blue
topodinfo-green
.How it works:
Under the hood, Flagger maintains two K8s services,
podinfo-primary
andpodinfo-canary
. Compared to canaries on K8s deployments, it doesn't create the service namedpodinfo
, as it is already provided by YOU.Once Flagger detects the change in the
podinfo
service, it updates thepodinfo-canary
service and the routes, then analyzes the canary. On successful analysis, it promotes the canary service to thepodinfo-primary
service. You expose thepodinfo
service via any L7 ingress solution or a service mesh so that the traffic is managed by Flagger for safe deployments.Giving it a try:
To give it a try, create a
Canary
as usual, but itstargetRef
pointed to a K8s service:Create a K8s service named
podinfo
, and update it. Now watch for the servicespodinfo
,podinfo-primary
,podinfo-canary
.Flagger tracks
podinfo
service for changes. Upon any change, it reconcilespodinfo-primary
andpodinfo-canary
services.podinfo-canary
always replicate the latestpodinfo
. In contract,podinfo-primary
replicates the latest successfulpodinfo-canary
.Notes:
ExternalName
services to AppMeshVirtualNode
s. But that's another story!TODOs: