Canary promotion improvements #310

Merged 6 commits on Sep 24, 2019
1 change: 1 addition & 0 deletions artifacts/flagger/crd.yaml
@@ -254,6 +254,7 @@ spec:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
1 change: 1 addition & 0 deletions charts/flagger/templates/crd.yaml
@@ -255,6 +255,7 @@ spec:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
36 changes: 30 additions & 6 deletions docs/gitbook/how-it-works.md
@@ -143,7 +143,7 @@ status:
```

The `Promoted` status condition can have one of the following reasons:
Initialized, Waiting, Progressing, Finalising, Succeeded or Failed.
Initialized, Waiting, Progressing, Promoting, Finalising, Succeeded or Failed.
A failed canary will have the promoted status set to `false`,
the reason to `failed` and the last applied spec will be different to the last promoted one.

@@ -153,6 +153,26 @@ Wait for a successful rollout:
kubectl wait canary/podinfo --for=condition=promoted
```

CI example:

```bash
# update the container image
kubectl set image deployment/podinfo podinfod=stefanprodan/podinfo:3.0.1

# wait for Flagger to detect the change
ok=false
until ${ok}; do
kubectl get canary/podinfo | grep 'Progressing' && ok=true || ok=false
sleep 5
done

# wait for the canary analysis to finish
kubectl wait canary/podinfo --for=condition=promoted --timeout=5m

# check if the deployment was successful
kubectl get canary/podinfo | grep Succeeded
```
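
The detection loop in the CI example polls without an upper bound, so a change that Flagger never picks up would block the pipeline indefinitely. A minimal Go sketch of the same polling idea with a bounded number of attempts is shown here. `waitForPhase` and the `get` callback are hypothetical names introduced for illustration; in a real pipeline `get` would have to read the canary phase from the cluster, which is not modelled.

```go
package main

import (
	"fmt"
	"time"
)

// waitForPhase is a hypothetical helper, not part of Flagger or kubectl.
// It calls get up to attempts times, sleeping interval between calls,
// and returns nil as soon as get reports the wanted phase.
func waitForPhase(get func() (string, error), want string, attempts int, interval time.Duration) error {
	for i := 0; i < attempts; i++ {
		phase, err := get()
		if err != nil {
			return err
		}
		if phase == want {
			return nil
		}
		time.Sleep(interval)
	}
	return fmt.Errorf("phase %q not reached after %d attempts", want, attempts)
}

func main() {
	// Simulated phase source: reports Progressing on the third poll.
	polls := 0
	get := func() (string, error) {
		polls++
		if polls < 3 {
			return "Initialized", nil
		}
		return "Progressing", nil
	}

	if err := waitForPhase(get, "Progressing", 10, 10*time.Millisecond); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("change detected after", polls, "polls")
}
```

Passing the phase source as a function keeps the retry logic testable without a cluster.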

### Istio routing

Flagger creates an Istio Virtual Service and Destination Rules based on the Canary service spec.
@@ -344,12 +364,13 @@ A canary deployment is triggered by changes in any of the following objects:
Gated canary promotion stages:

* scan for canary deployments
* check Istio virtual service routes are mapped to primary and canary ClusterIP services
* check primary and canary deployments status
* check primary and canary deployment status
* halt advancement if a rolling update is underway
* halt advancement if pods are unhealthy
* call pre-rollout webhooks are check results
* halt advancement if any hook returned a non HTTP 2xx result
* call confirm-rollout webhooks and check results
* halt advancement if any hook returns a non HTTP 2xx result
* call pre-rollout webhooks and check results
* halt advancement if any hook returns a non HTTP 2xx result
* increment the failed checks counter
* increase canary traffic weight percentage from 0% to 5% (step weight)
* call rollout webhooks and check results
@@ -366,8 +387,11 @@ Gated canary promotion stages:
* halt advancement if any webhook call fails
* halt advancement while canary request success rate is under the threshold
* halt advancement while canary request duration P99 is over the threshold
* halt advancement while any custom metric check fails
* halt advancement if the primary or canary deployment becomes unhealthy
* halt advancement while canary deployment is being scaled up/down by HPA
* call confirm-promotion webhooks and check results
* halt advancement if any hook returns a non HTTP 2xx result
* promote canary to primary
* copy ConfigMaps and Secrets from canary to primary
* copy canary deployment spec template over primary
@@ -377,7 +401,7 @@ Gated canary promotion stages:
* scale to zero the canary deployment
* mark rollout as finished
* call post-rollout webhooks
* post the analysis result to Slack
* post the analysis result to Slack or MS Teams
* wait for the canary deployment to be updated and start over
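
The tail of the stage list maps onto the status phases touched by this change. The sketch below is one way to picture the happy path, assuming a rollout that passes every gate. `Phase` and `nextPhase` are hypothetical names, not Flagger APIs, and the step from Finalising to Succeeded is inferred from the "mark rollout as finished" stage rather than shown in this diff.

```go
package main

import "fmt"

// Phase uses the same string values as the CRD enum for the canary status phase.
type Phase string

const (
	Progressing Phase = "Progressing"
	Promoting   Phase = "Promoting"
	Finalising  Phase = "Finalising"
	Succeeded   Phase = "Succeeded"
)

// nextPhase is a hypothetical helper: it returns the phase that follows p
// when every check passes. Unknown or terminal phases are returned unchanged.
func nextPhase(p Phase) Phase {
	switch p {
	case Progressing:
		return Promoting // analysis finished, primary spec updated
	case Promoting:
		return Finalising // all traffic routed back to primary
	case Finalising:
		return Succeeded // canary scaled to zero (assumed final step)
	default:
		return p
	}
}

func main() {
	p := Progressing
	for p != Succeeded {
		n := nextPhase(p)
		fmt.Printf("%s -> %s\n", p, n)
		p = n
	}
}
```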

### Canary Analysis
1 change: 1 addition & 0 deletions kustomize/base/flagger/crd.yaml
@@ -254,6 +254,7 @@ spec:
- Initialized
- Waiting
- Progressing
- Promoting
- Finalising
- Succeeded
- Failed
4 changes: 3 additions & 1 deletion pkg/apis/flagger/v1alpha3/status.go
@@ -47,7 +47,9 @@ const (
CanaryPhaseWaiting CanaryPhase = "Waiting"
// CanaryPhaseProgressing means the canary analysis is underway
CanaryPhaseProgressing CanaryPhase = "Progressing"
// CanaryPhaseProgressing means the canary analysis is finished and traffic has been routed back to primary
// CanaryPhasePromoting means the canary analysis is finished and the primary spec has been updated
CanaryPhasePromoting CanaryPhase = "Promoting"
// CanaryPhaseFinalising means the canary promotion is finished and traffic has been routed back to primary
CanaryPhaseFinalising CanaryPhase = "Finalising"
// CanaryPhaseSucceeded means the canary analysis has been successful
// and the canary deployment has been promoted
3 changes: 3 additions & 0 deletions pkg/canary/status.go
@@ -211,6 +211,9 @@ func (c *Deployer) MakeStatusConditions(canaryStatus flaggerv1.CanaryStatus,
case flaggerv1.CanaryPhaseProgressing:
status = corev1.ConditionUnknown
message = "New revision detected, starting canary analysis."
case flaggerv1.CanaryPhasePromoting:
status = corev1.ConditionUnknown
message = "Canary analysis completed, starting primary rolling update."
case flaggerv1.CanaryPhaseFinalising:
status = corev1.ConditionUnknown
message = "Canary analysis completed, routing all traffic to primary."
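
All three in-flight phases report the condition status as unknown, which is presumably why `kubectl wait --for=condition=promoted` keeps blocking until the rollout reaches a final state. The following is a dependency-free sketch of the cases in this hunk. `promotedCondition` is a hypothetical stand-in that uses plain strings, with "Unknown" being the string value of `corev1.ConditionUnknown`; the other phases are deliberately not modelled.

```go
package main

import "fmt"

// promotedCondition is a hypothetical stand-in for the three cases shown in
// this hunk of MakeStatusConditions. It returns the condition status and
// message as plain strings instead of corev1 types.
func promotedCondition(phase string) (status, message string) {
	switch phase {
	case "Progressing":
		return "Unknown", "New revision detected, starting canary analysis."
	case "Promoting":
		return "Unknown", "Canary analysis completed, starting primary rolling update."
	case "Finalising":
		return "Unknown", "Canary analysis completed, routing all traffic to primary."
	default:
		// Phases outside this hunk are not modelled in the sketch.
		return "", ""
	}
}

func main() {
	for _, phase := range []string{"Progressing", "Promoting", "Finalising"} {
		status, message := promotedCondition(phase)
		fmt.Printf("%-11s %s: %s\n", phase, status, message)
	}
}
```
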
145 changes: 61 additions & 84 deletions pkg/controller/scheduler.go
@@ -214,7 +214,27 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
return
}

// scale canary to zero if analysis has succeeded
// route all traffic to primary if analysis has succeeded
if cd.Status.Phase == flaggerv1.CanaryPhasePromoting {
if provider != "kubernetes" {
c.recordEventInfof(cd, "Routing all traffic to primary")
if err := meshRouter.SetRoutes(cd, 100, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, 100, 0)
}

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

return
}

// scale canary to zero if promotion has finished
if cd.Status.Phase == flaggerv1.CanaryPhaseFinalising {
if err := c.deployer.Scale(cd, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
@@ -304,7 +324,7 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
}
}

// canary fix routing: A/B testing
// strategy: A/B testing
if len(cd.Spec.CanaryAnalysis.Match) > 0 && cd.Spec.CanaryAnalysis.Iterations > 0 {
// route traffic to canary and increment iterations
if cd.Spec.CanaryAnalysis.Iterations > cd.Status.Iterations {
@@ -336,38 +356,19 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
c.recordEventWarningf(cd, "%v", err)
return
}
// increment iterations
if err := c.deployer.SetStatusIterations(cd, cd.Status.Iterations+1); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
return
}

// route all traffic to primary
if cd.Spec.CanaryAnalysis.Iterations < cd.Status.Iterations {
primaryWeight = 100
canaryWeight = 0
if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, primaryWeight, canaryWeight)

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhasePromoting); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

c.recordEventInfof(cd, "Routing all traffic to primary")
return
}

return
}

// canary fix routing: B/G
// strategy: Blue/Green
if cd.Spec.CanaryAnalysis.Iterations > 0 {
// increment iterations
if cd.Spec.CanaryAnalysis.Iterations > cd.Status.Iterations {
@@ -405,105 +406,79 @@ func (c *Controller) advanceCanary(name string, namespace string, skipLivenessCh
}

// promote canary - max iterations reached
if cd.Spec.CanaryAnalysis.Iterations+1 == cd.Status.Iterations {
if cd.Spec.CanaryAnalysis.Iterations < cd.Status.Iterations {
c.recordEventInfof(cd, "Copying %s.%s template spec to %s.%s",
cd.Spec.TargetRef.Name, cd.Namespace, primaryName, cd.Namespace)
if err := c.deployer.Promote(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

// increment iterations
if err := c.deployer.SetStatusIterations(cd, cd.Status.Iterations+1); err != nil {
// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhasePromoting); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
return
}

// route all traffic to primary
if cd.Spec.CanaryAnalysis.Iterations < cd.Status.Iterations {
if provider != "kubernetes" {
c.recordEventInfof(cd, "Routing all traffic to primary")
if err := meshRouter.SetRoutes(cd, 100, 0); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, 100, 0)
return
}

// strategy: Canary progressive traffic increase
if cd.Spec.CanaryAnalysis.StepWeight > 0 {
// increase traffic weight
if canaryWeight < maxWeight {
primaryWeight -= cd.Spec.CanaryAnalysis.StepWeight
if primaryWeight < 0 {
primaryWeight = 0
}
canaryWeight += cd.Spec.CanaryAnalysis.StepWeight
if canaryWeight > 100 {
canaryWeight = 100
}

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

return
}

return
}
if err := c.deployer.SetStatusWeight(cd, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

// canary incremental traffic weight
if canaryWeight < maxWeight {
primaryWeight -= cd.Spec.CanaryAnalysis.StepWeight
if primaryWeight < 0 {
primaryWeight = 0
}
canaryWeight += cd.Spec.CanaryAnalysis.StepWeight
if primaryWeight > 100 {
primaryWeight = 100
c.recorder.SetWeight(cd, primaryWeight, canaryWeight)
c.recordEventInfof(cd, "Advance %s.%s canary weight %v", cd.Name, cd.Namespace, canaryWeight)
return
}

// check promotion gate
// promote canary - max weight reached
if canaryWeight >= maxWeight {
// check promotion gate
if promote := c.runConfirmPromotionHooks(cd); !promote {
return
}
}

if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

// update weight status
if err := c.deployer.SetStatusWeight(cd, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

c.recorder.SetWeight(cd, primaryWeight, canaryWeight)
c.recordEventInfof(cd, "Advance %s.%s canary weight %v", cd.Name, cd.Namespace, canaryWeight)

// promote canary
if canaryWeight >= maxWeight {
// update primary spec
c.recordEventInfof(cd, "Copying %s.%s template spec to %s.%s",
cd.Spec.TargetRef.Name, cd.Namespace, primaryName, cd.Namespace)
if err := c.deployer.Promote(cd); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
}
} else {
// route all traffic to primary
primaryWeight = 100
canaryWeight = 0
if err := meshRouter.SetRoutes(cd, primaryWeight, canaryWeight); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}
c.recorder.SetWeight(cd, primaryWeight, canaryWeight)

// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhaseFinalising); err != nil {
c.recordEventWarningf(cd, "%v", err)
// update status phase
if err := c.deployer.SetStatusPhase(cd, flaggerv1.CanaryPhasePromoting); err != nil {
c.recordEventWarningf(cd, "%v", err)
return
}

return
}

c.recordEventInfof(cd, "Routing all traffic to primary")
return
}

}

func (c *Controller) shouldSkipAnalysis(cd *flaggerv1.Canary, meshRouter router.Interface, primaryWeight int, canaryWeight int) bool {
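
The progressive strategy in this hunk moves `StepWeight` percent of traffic from primary to canary on each pass and clamps both values. The replaced block appears to have capped `primaryWeight` at 100 where `canaryWeight` was meant, so the upper clamp never applied to the canary side. Below is a small sketch of the clamped step, assuming integer percentages. `stepWeights` is a hypothetical helper for illustration, and the step of 40 is chosen only so that the last iteration would overshoot without the clamp.

```go
package main

import "fmt"

// stepWeights is a hypothetical helper: it shifts step percent of traffic
// from primary to canary and clamps the results to the 0..100 range.
func stepWeights(primary, canary, step int) (int, int) {
	primary -= step
	if primary < 0 {
		primary = 0
	}
	canary += step
	if canary > 100 {
		canary = 100
	}
	return primary, canary
}

func main() {
	const step, maxWeight = 40, 100

	primary, canary := 100, 0
	// Advance until the canary reaches the maximum weight.
	for canary < maxWeight {
		primary, canary = stepWeights(primary, canary, step)
		fmt.Printf("primary=%d canary=%d\n", primary, canary)
	}
}
```
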
@@ -555,6 +530,7 @@ func (c *Controller) shouldAdvance(cd *flaggerv1.Canary) (bool, error) {
cd.Status.Phase == flaggerv1.CanaryPhaseInitializing ||
cd.Status.Phase == flaggerv1.CanaryPhaseProgressing ||
cd.Status.Phase == flaggerv1.CanaryPhaseWaiting ||
cd.Status.Phase == flaggerv1.CanaryPhasePromoting ||
cd.Status.Phase == flaggerv1.CanaryPhaseFinalising {
return true, nil
}
@@ -579,6 +555,7 @@ func (c *Controller) shouldAdvance(cd *flaggerv1.Canary) (bool, error) {
func (c *Controller) checkCanaryStatus(cd *flaggerv1.Canary, shouldAdvance bool) bool {
c.recorder.SetStatus(cd, cd.Status.Phase)
if cd.Status.Phase == flaggerv1.CanaryPhaseProgressing ||
cd.Status.Phase == flaggerv1.CanaryPhasePromoting ||
cd.Status.Phase == flaggerv1.CanaryPhaseFinalising {
return true
}