From bb0dc4a97ffc9477f82b6fa3c42711be01ca2e65 Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Mon, 26 Sep 2022 14:06:38 +0100
Subject: [PATCH 01/51] adding adr for change detection

---
 docs/adrs/0013-pipelines-change-detection.md | 28 +++
 docs/rfcs/0003-change-detection/README.md | 212 +++++++++++++++++++
 2 files changed, 240 insertions(+)
 create mode 100644 docs/adrs/0013-pipelines-change-detection.md
 create mode 100644 docs/rfcs/0003-change-detection/README.md

diff --git a/docs/adrs/0013-pipelines-change-detection.md b/docs/adrs/0013-pipelines-change-detection.md
new file mode 100644
index 0000000..4753379
--- /dev/null
+++ b/docs/adrs/0013-pipelines-change-detection.md
@@ -0,0 +1,28 @@
+# 12. Pipeline Promotions
+
+## Status
+
+Discovery in progress. This ADR currently is only to raise a draft PR to gather feedback.
+
+## Context
+
+As part of weave gitops, Sunglow team is working on deliverying Continuous Delivery Pipelines based in the following
+[initiative](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a)
+
+First iteration (v0.1) has covered the ability to view an application being deployed across different environments.
+
+Current iteration (v0.2) has a outcome to enable applicatio promotions by using weave gitops enteprrise
+
+Two main approaches are under discussion
+
+- [Promotions via Resource Watching](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
+- [Promotions via Webhooks](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
+
+
+## Decision
+
+TBA
+
+## Consequences
+
+TBA
\ No newline at end of file
diff --git a/docs/rfcs/0003-change-detection/README.md b/docs/rfcs/0003-change-detection/README.md
new file mode 100644
index 0000000..5d19a6f
--- /dev/null
+++ b/docs/rfcs/0003-change-detection/README.md
@@ -0,0 +1,212 @@
+# RFC-0003 Comparison of approaches to detect changes that trigger promotions
+
+<!--
+The title must be short and descriptive.
+-->
+
+**Status:** provisional
+
+<!--
+Status represents the current state of the RFC.
+Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejected`, `withdrawn`, or `replaced`.
+-->
+
+**Creation date:** 2022-09-21
+
+**Last update:** 2022-09-21
+
+## Summary
+
+<!--
+One paragraph explanation of the proposed feature or enhancement.
+-->
+
+## Motivation
+
+<!--
+This section is for explicitly listing the motivation, goals, and non-goals of
+this RFC. Describe why the change is important and the benefits to users.
+-->
+
+### Terminology
+
+- **CD Pipeline**: A CD Pipeline declares a series of environments through which a given application is expected to be deployed.
+- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
+- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}.
+- **Application**: A Helm Release or an Image.
+- **Progressive Delivery / Rollout**: Updates to an application flow from one environment to the next. This has no relationship to Flagger
+
+### Goals
+
+<!--
+List the specific goals of this RFC. What is it trying to achieve? How will we
+know that this has succeeded?
+-->
+
+### Non-Goals
+
+<!--
+What is out of scope for this RFC? Listing non-goals helps to focus discussion
+and make progress.
+-->
+
+
+## Comparison
+
+<!--
+This is where we get down to the specifics of what the proposal actually is.
+This should have enough detail that reviewers can understand exactly what
+you're proposing, but should not include things like API designs or
+implementation.
+
+If the RFC goal is to document best practices,
+then this section can be replaced with the the actual documentation.
+-->
+
+
+### Watcher approach
+
+This approach suggests the creation of a watcher per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes and take an action to start the next promotion based on the Pipeline definition.
+
+[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
+
+#### Sequence diagram
+
+```mermaid
+  sequenceDiagram
+  actor U as operator
+  U->>+API Server: creates Pipeline
+  participant PC as Pipeline Controller
+  participant PS as Promotion Strategy
+  API Server->>+PC: notifies
+  participant dt1 as dev/target 1
+
+  rect rgb(67, 207, 250)
+    note right of PC: setup phase
+    note right of PC: pipelines.wego.weave.works/name<br/>pipelines.wego.weave.works/env<br/>pipelines.wego.weave.works/target
+    PC->>+dt1: label AppRef with metadata
+    participant dt2 as dev/target 2
+    PC->>+dt2: label AppRef with metadata
+    participant pt1 as prod/target 1
+    PC->>+pt1: label AppRef with metadata
+  end
+
+  rect rgb(50, 227, 221)
+    note right of PC: promotion phase
+    PC-->>+dt1: watches HelmRelease and Kustomizations changes
+    PC-->>+dt2: watches HelmRelease and Kustomizations changes
+    PC-->>+pt1: watches HelmRelease and Kustomizations changes
+  end
+
+
+  dt1->>+PC: update events from AppRef
+  PC ->>PC: filter upgrade events
+  PC ->>PC: extract metadata
+  PC->>+PS: kicks off
+  ```
+
+#### Advantages
+
+1. Plug n play: no further configurations or setup is needed to get updates.
+1. Simple authentication: No need to worry about who triggered the event, since we are talking directly with the target.
+
+
+#### Disadvantages and Mitigations
+
+1. Requires Flux on all leaf clusters.
+1. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues.
+1. There is no way to kick off promotions externally
+
+### Alert approach
+
+This approach suggests the use of Flux notification controller running on the remote cluster. An alert/provider CR can be setup to call a webhook running on the management cluster to notify the management cluster of a Helm release change in a remote cluster.
+
+[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
+
+
+#### Sequence diagram
+
+```mermaid
+  sequenceDiagram
+  actor U as operator
+  U->>+API Server: creates Pipeline ns1/p1
+  participant PC as Pipeline Controller
+  participant PS as Promotion Strategy
+  API Server->>+PC: notifies
+  participant dt1 as dev/target 1
+  rect rgb(67, 207, 250)
+    note right of PC: alerting setup phase
+    PC->>+dt1: creates Provider /ns1/p1/dev
+    PC->>+dt1: creates Alert
+    participant dt2 as dev/target 2
+    PC->>+dt2: creates Provider /ns1/p1/dev
+    PC->>+dt2: creates Alert
+    participant pt1 as prod/target 1
+    PC->>+pt1: creates Provider
+    PC->>+pt1: creates Alert
+  end
+  rect rgb(50, 227, 221)
+    note right of PC: promotion phase
+    dt1->>+PC: sends Event to /ns1/p1/dev
+    PC->>+PS: kicks off
+    PS->>+pt1: promotes app
+  end
+  ```
+
+#### Example Event
+
+```json
+{
+  "involvedObject": {
+    "kind": "HelmRelease",
+    "namespace": "flux-system",
+    "name": "metallb",
+    "uid": "57c3579b-42da-4f27-afc5-8bd7778286e1",
+    "apiVersion": "helm.toolkit.fluxcd.io/v2beta1",
+    "resourceVersion": "155540"
+  },
+  "severity": "info",
+  "timestamp": "2022-09-13T16:01:01Z",
+  "message": "Helm upgrade succeeded",
+  "reason": "info",
+  "metadata": {
+    "revision": "0.13.4",
+    "summary": "foobar"
+  },
+  "reportingController": "helm-controller",
+  "reportingInstance": "helm-controller-7cdc7874f8-9qpft"
+}
+```
+
+#### Advantages
+
+1. Simplicity: Uses Flux functionality as much as possible
+1. Flexiblility: Promotion can be kicked off from external systems by calling the webhook
+1. Flexibility: Promotion can be exercised by an external system
+
+#### Disadvantages and Mitigations
+
+1. Requires Flux on all leaf clusters. _Mitigations: ?_
+1. Authenticity of events needs to be taken care of. _Mitigations: add authentication to webhook; verify event by reaching out to leaf cluster_
+1. Network connectivity from all leaf clusters to management cluster necessary. _Mitigations: promotion can be kicked from any external system so if using notification-controller would not work, an external CI system could trigger promotion instead._
+
+#### Known Unknowns
+
+1. How does p-c set the correct Provider address?
+    2. Configuration (user burden)
+    3. Automatic determination (might get complicated quick to account for the different environments (with/without Ingress, external LB, ...)
+1. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). Having them be committed to Git can be very complicated as the controller would have to know (1) wich Git repository to commit them to, (2) in wich location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) and the management cluster's Git repository.
+
+#### Further Considerations
+
+##### delivery semantics/failure scenarios recovery for notifications
+
+The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook.
+
+notification-controller has [at-most once delivery semantics](https://github.com/fluxcd/notification-controller/tree/main/docs/spec#events-dispatching-1):
+
+> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry sending an alert for four times with an exponential backoff of maximum 30 seconds.
+
+#### enrichment of events for custom metadata
+
+The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary".
\ No newline at end of file

From fb5756937471ee2e3a1e0e650c9e557646fca480 Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Mon, 26 Sep 2022 14:44:57 +0100
Subject: [PATCH 02/51] adding adr

---
 docs/adrs/0013-pipelines-change-detection.md | 40 ++++++++++++++------
 1 file changed, 28 insertions(+), 12 deletions(-)

diff --git a/docs/adrs/0013-pipelines-change-detection.md b/docs/adrs/0013-pipelines-change-detection.md
index 4753379..fa2cbf3 100644
--- a/docs/adrs/0013-pipelines-change-detection.md
+++ b/docs/adrs/0013-pipelines-change-detection.md
@@ -1,27 +1,43 @@
-# 12. Pipeline Promotions
+# 13. Pipelines - How to detect deployment changes
 
 ## Status
-
-Discovery in progress. This ADR currently is only to raise a draft PR to gather feedback.
+Proposed
 
 ## Context
+As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a)
 
-As part of weave gitops, Sunglow team is working on deliverying Continuous Delivery Pipelines based in the following
-[initiative](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a)
-
-First iteration (v0.1) has covered the ability to view an application being deployed across different environments.
+It's [first version (v0.1)](//TODO add link) has been delivered covering the ability to view an application deployed across different environments.
 
-Current iteration (v0.2) has a outcome to enable applicatio promotions by using weave gitops enteprrise
+The [second iteration](//TODO) aims to enable integration with internal and external promotions.
 
-Two main approaches are under discussion
+As part of the promotions capabilities, there is the need to detect when a deployment has occurred with not only
+an approach to do it. During the discovery of the second iteration, two models has been spiked:
 
-- [Promotions via Resource Watching](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
-- [Promotions via Webhooks](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
+Detect deployment changes via
+- [Watching](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
+- [Webhooks](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
 
 
 ## Decision
 
-TBA
+As [discussed in RFC](../rfcs/0003-change-detection/README.md) each of approaches has associated unknowns. The major ones were
+
+- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the impediments
+  that it would suppose for customers while adopting the solution.
+- Watching: how reliable the solution could be as not having existing examples of products using it for watching remote clusters.
+
+We envision that weave gitops needs to be a flexible solution that eventually would need to support both approaches
+to accommodate the range of potential enterprises using weave gitops.
+
+In order to start with one of the approaches, we have decided to start by using `webhooks` due to
+
+- allow us to provide promotions for wge customers based on our own promotions capability
+
+but also because
+
+- flux provides the base building blocks to integrate with existing customer's promotion systems. It opens the door
+for a gradual adoption of the wge pipeline solution for customers that already have custom delivery logic.
+
 
 ## Consequences
 

From 126df4d952eaca10ef88d4bfc7399e4ded62ffa2 Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Mon, 26 Sep 2022 14:59:08 +0100
Subject: [PATCH 03/51] consequences added

---
 docs/adrs/0013-pipelines-change-detection.md | 12 ++++++++++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/docs/adrs/0013-pipelines-change-detection.md b/docs/adrs/0013-pipelines-change-detection.md
index fa2cbf3..4b6831c 100644
--- a/docs/adrs/0013-pipelines-change-detection.md
+++ b/docs/adrs/0013-pipelines-change-detection.md
@@ -38,7 +38,15 @@ but also because
 - flux provides the base building blocks to integrate with existing customer's promotion systems. It opens the door
 for a gradual adoption of the wge pipeline solution for customers that already have custom delivery logic.
 
-
 ## Consequences
 
-TBA
\ No newline at end of file
+As mentioned in the decision, the following consequences of the decision
+
+- A path forward for pipelines to deliver promotions capability. Sunglow could deliver promotions based on this approach.
+- A risk to manage in the context of customer adoption: the network path opened. Sunglow would need to establish the customer feedback
+loop with SAs/CXs to manage and mitigate the risk once it happens.
+- A scenario further to develop: existing CI scenarios based on the approach. Sunglow would need to use customer feedback to
+determine which existing systems are of relevance to provide the integration experience.
+
+
+

From 5ec346c9955235cecf0f60481aca779073ec7ecf Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Wed, 5 Oct 2022 09:08:53 +0100
Subject: [PATCH 04/51] slicing promotion designs

---
 docs/adrs/0013-pipelines-change-detection.md | 2 +-
 docs/rfcs/0003-pipelines-promotion/README.md | 0
 .../change-detection.md} | 0
 docs/rfcs/0003-pipelines-promotion/promotions-solution.md | 7 +++++++
 4 files changed, 8 insertions(+), 1 deletion(-)
 create mode 100644 docs/rfcs/0003-pipelines-promotion/README.md
 rename docs/rfcs/{0003-change-detection/README.md => 0003-pipelines-promotion/change-detection.md} (100%)
 create mode 100644 docs/rfcs/0003-pipelines-promotion/promotions-solution.md

diff --git a/docs/adrs/0013-pipelines-change-detection.md b/docs/adrs/0013-pipelines-change-detection.md
index 4b6831c..9492063 100644
--- a/docs/adrs/0013-pipelines-change-detection.md
+++ b/docs/adrs/0013-pipelines-change-detection.md
@@ -20,7 +20,7 @@ Detect deployment changes via
 
 ## Decision
 
-As [discussed in RFC](../rfcs/0003-change-detection/README.md) each of approaches has associated unknowns. The major ones were
+As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) each of approaches has associated unknowns. The major ones were
 
 - Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the impediments
   that it would suppose for customers while adopting the solution.
diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
new file mode 100644
index 0000000..e69de29
diff --git a/docs/rfcs/0003-change-detection/README.md b/docs/rfcs/0003-pipelines-promotion/change-detection.md
similarity index 100%
rename from docs/rfcs/0003-change-detection/README.md
rename to docs/rfcs/0003-pipelines-promotion/change-detection.md
diff --git a/docs/rfcs/0003-pipelines-promotion/promotions-solution.md b/docs/rfcs/0003-pipelines-promotion/promotions-solution.md
new file mode 100644
index 0000000..d5166fc
--- /dev/null
+++ b/docs/rfcs/0003-pipelines-promotion/promotions-solution.md
@@ -0,0 +1,7 @@
+# Pipeline Promotions RFCs
+
+The following topics have been designed to solve promotions for pipelines:
+
+- [How pipelines detect that a deployment has happened](change-detection.md)
+- [How the promotions solution looks like](promotions-api.md)
+- What are the security considerations
\ No newline at end of file

From f28fc7b4309668afafe366deadf11fe0e8aa6cea Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Wed, 5 Oct 2022 09:22:46 +0100
Subject: [PATCH 05/51] updated ADR

---
 ...d => 0013-pipelines-detect-deployments.md} | 33 +++++++++----------
 1 file changed, 16 insertions(+), 17 deletions(-)
 rename docs/adrs/{0013-pipelines-change-detection.md => 0013-pipelines-detect-deployments.md} (61%)

diff --git a/docs/adrs/0013-pipelines-change-detection.md b/docs/adrs/0013-pipelines-detect-deployments.md
similarity index 61%
rename from docs/adrs/0013-pipelines-change-detection.md
rename to docs/adrs/0013-pipelines-detect-deployments.md
index 9492063..87c3a15 100644
--- a/docs/adrs/0013-pipelines-change-detection.md
+++ b/docs/adrs/0013-pipelines-detect-deployments.md
@@ -1,14 +1,15 @@
-# 13. Pipelines - How to detect deployment changes
+# 13. Pipelines - How to detect deployments
 
 ## Status
 Proposed
 
 ## Context
-As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a)
+As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a) where
+[first iteration has been delivered](https://docs.gitops.weave.works/docs/next/enterprise/pipelines/intro/index.html)
+covering the ability to view an application deployed across different environments.
 
-It's [first version (v0.1)](//TODO add link) has been delivered covering the ability to view an application deployed across different environments.
-
-The [second iteration](//TODO) aims to enable integration with internal and external promotions.
+The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims
+to enable promotions between environments.
 
 As part of the promotions capabilities, there is the need to detect when a deployment has occurred with not only
 an approach to do it. During the discovery of the second iteration, two models has been spiked:
@@ -21,23 +21,21 @@ Detect deployment changes via
 
 ## Decision
 
-As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) each of approaches has associated unknowns. The major ones were
+As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) each of approaches has associated unknowns.
+The major ones are:
 
-- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the impediments
-  that it would suppose for customers while adopting the solution.
+- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments
+  that it would suppose for customers while adopting the solution, as well its security management.
 - Watching: how reliable the solution could be as not having existing examples of products using it for watching remote clusters.
 
-We envision that weave gitops needs to be a flexible solution that eventually would need to support both approaches
+We envision weave gitops as needs to be a flexible solution that eventually would need to support both approaches
 to accommodate the range of potential enterprises using weave gitops.
 
-In order to start with one of the approaches, we have decided to start by using `webhooks` due to
-
-- allow us to provide promotions for wge customers based on our own promotions capability
-
-but also because
+In order to start with one of the approaches, we have decided to start by `webhooks` solution due to:
 
-- flux provides the base building blocks to integrate with existing customer's promotion systems. It opens the door
-for a gradual adoption of the wge pipeline solution for customers that already have custom delivery logic.
+- Allow us to provide promotions for wge customers based on our own promotions capability with better scalability approach.
+- Reinforces the vision of weave gitops being a continuum of Flux by using Flux core components, in this context, [notification
+  controller](https://fluxcd.io/flux/components/notification/), to provide the basic building blocks around deployment notification.
 
 ## Consequences
 
@@ -44,7 +43,7 @@ As mentioned in the decision, the following consequences of the decision
 
 - A path forward for pipelines to deliver promotions capability. Sunglow could deliver promotions based on this approach.
 - A risk to manage in the context of customer adoption: the network path opened. Sunglow would need to establish the customer feedback
-loop with SAs/CXs to manage and mitigate the risk once it happens.
+loop with SAs/CXs to manage and mitigate the risk once it happens. Same for security
 - A scenario further to develop: existing CI scenarios based on the approach. Sunglow would need to use customer feedback to
 determine which existing systems are of relevance to provide the integration experience.
 

From 9e41851c7b7d3b3cde61831385d851e95a8d58b6 Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Wed, 5 Oct 2022 10:15:38 +0100
Subject: [PATCH 06/51] updated notification change rfc

---
 .../adrs/0013-pipelines-detect-deployments.md | 4 +-
 ...ection.md => detect-deployment-changes.md} | 51 +++++++++++++------
 2 files changed, 37 insertions(+), 18 deletions(-)
 rename docs/rfcs/0003-pipelines-promotion/{change-detection.md => detect-deployment-changes.md} (74%)

diff --git a/docs/adrs/0013-pipelines-detect-deployments.md b/docs/adrs/0013-pipelines-detect-deployments.md
index 87c3a15..39f0e45 100644
--- a/docs/adrs/0013-pipelines-detect-deployments.md
+++ b/docs/adrs/0013-pipelines-detect-deployments.md
@@ -17,11 +17,11 @@ an approach to do it. During the discovery of the second iteration, two models h
 Detect deployment changes via
 
 - [Watching](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
-- [Webhooks](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
+- [Alert](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
 
 ## Decision
 
-As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) each of approaches has associated unknowns.
+As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md) each of approaches has associated unknowns.
 The major ones are:
 
 - Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments
diff --git a/docs/rfcs/0003-pipelines-promotion/change-detection.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
similarity index 74%
rename from docs/rfcs/0003-pipelines-promotion/change-detection.md
rename to docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
index 5d19a6f..1b24979 100644
--- a/docs/rfcs/0003-pipelines-promotion/change-detection.md
+++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
@@ -1,4 +1,4 @@
-# RFC-0003 Comparison of approaches to detect changes that trigger promotions
+# RFC-0003 How to detect deployment changes and to notify for pipeline promotions
 
 <!--
 The title must be short and descriptive.
@@ -11,9 +11,9 @@ Status represents the current state of the RFC.
 Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejected`, `withdrawn`, or `replaced`.
 -->
 
-**Creation date:** 2022-09-21
+**Creation date:** 2022-10-05
 
-**Last update:** 2022-09-21
+**Last update:** 2022-10-05
 
 ## Summary
 
@@ -21,6 +21,21 @@ Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejec
 One paragraph explanation of the proposed feature or enhancement.
 -->
 
+Given a continious delivery pipeline is comprised of diffferent environments the application goes trough in
+its way to production, there is need for an action to move the application among environments. That concept is known as
+promotion and it is a one of the core concepts of a pipelines domain.
+
+This RFC looks at different designs for notifying that a deployment has happened in order to trigger a promotion (if needed).
+
+## Terminology
+
+- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed.
+- **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline.
+For example promote stating to production would attempt to deploy an application existing in staging environment to production environment.
+- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
+- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}.
+- **Application**: A Helm Release.
+
 ## Motivation
 
 <!--
@@ -28,13 +43,9 @@ This section is for explicitly listing the motivation, goals, and non-goals of
 this RFC. Describe why the change is important and the benefits to users.
 -->
 
-### Terminology
 
-- **CD Pipeline**: A CD Pipeline declares a series of environments through which a given application is expected to be deployed.
-- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
-- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}.
-- **Application**: A Helm Release or an Image.
-- **Progressive Delivery / Rollout**: Updates to an application flow from one environment to the next. This has no relationship to Flagger
+This RFC looks at different designs for notifying that a deployment has happened in order to trigger a promotion (if needed).
+
 
 ### Goals
 
@@ -43,13 +54,17 @@ List the specific goals of this RFC. What is it trying to achieve? How will we
 know that this has succeeded?
 -->
 
+- Discover different solutions within weave gitops that would allow to solve the problem of how to detect that
+a deployment pipeline has changed.
+- Recommend the one that seems better suited for the role.
+
 ### Non-Goals
 
 <!--
 What is out of scope for this RFC? Listing non-goals helps to focus discussion
 and make progress.
 -->
-
+- Anything related to processing the deployment notification.
 
 ## Comparison
 
@@ -60,13 +75,15 @@ you're proposing, but should not include things like API designs or
 implementation.
 
 If the RFC goal is to document best practices,
-then this section can be replaced with the the actual documentation.
+then this section can be replaced with the actual documentation.
 -->
 
 
-### Watcher approach
+### Watchers approach
 
-This approach suggests the creation of a watcher per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes and take an action to start the next promotion based on the Pipeline definition.
+This approach suggests the creation of [kubernetes watchers](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes)
+per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes
+and take an action to start the next promotion based on the Pipeline definition.
 
 [Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
 
@@ -119,7 +136,9 @@ This approach suggests the creation of a watcher per remote cluster. Each watche
 
 ### Alert approach
 
-This approach suggests the use of Flux notification controller running on the remote cluster. An alert/provider CR can be setup to call a webhook running on the management cluster to notify the management cluster of a Helm release change in a remote cluster.
+This approach suggests the use of Flux [notification controller](https://fluxcd.io/flux/components/notification/) running on the remote cluster.
+An [alert](https://fluxcd.io/flux/components/notification/alert/) / [provider](https://fluxcd.io/flux/components/notification/provider/)
+would be setup to call a webhook running on the management cluster to notify a Helm release change in a remote cluster.
 
 [Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
 
@@ -181,8 +200,8 @@ This approach suggests the use of Flux notification controller running on the re
 #### Advantages
 
 1. Simplicity: Uses Flux functionality as much as possible
-1. Flexiblility: Promotion can be kicked off from external systems by calling the webhook
-1. Flexibility: Promotion can be exercised by an external system
+2. Flexibility: Promotion can be kicked off from external systems by calling the webhook
+3. Flexibility: Promotion can be exercised by an external system
 
 #### Disadvantages and Mitigations
 

From 49860d3ff82168cd5278d1c217a60e93474667bc Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Wed, 5 Oct 2022 13:54:22 +0100
Subject: [PATCH 07/51] detect deployment changes

---
 .../detect-deployment-changes.md | 24 ++++++++++++++-----
 1 file changed, 18 insertions(+), 6 deletions(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
index 1b24979..6be060c 100644
--- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
+++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
@@ -206,26 +206,38 @@ would be setup to call a webhook running on the management cluster to notify a H
 #### Disadvantages and Mitigations
 
 1. Requires Flux on all leaf clusters. _Mitigations: ?_
-1. Authenticity of events needs to be taken care of. _Mitigations: add authentication to webhook; verify event by reaching out to leaf cluster_
-1. Network connectivity from all leaf clusters to management cluster necessary. _Mitigations: promotion can be kicked from any external system so if using notification-controller would not work, an external CI system could trigger promotion instead._
+2. Authenticity of events needs to be taken care of.
+_Mitigations: add authentication and authorization to the webhook; verify event by reaching out to leaf cluster_
+4. Network connectivity from all leaf clusters to management cluster necessary.
+_Mitigations: promotion can be kicked from any external system so if using notification-controller would not work,
+an external CI system could trigger promotion instead._
 
 #### Known Unknowns
 
 1. How does p-c set the correct Provider address?
     2. Configuration (user burden)
     3. Automatic determination (might get complicated quick to account for the different environments (with/without Ingress, external LB, ...)
-1. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). Having them be committed to Git can be very complicated as the controller would have to know (1) wich Git repository to commit them to, (2) in wich location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) and the management cluster's Git repository.
+2. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters'
+API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster).
+Having them be committed to Git can be very complicated as the controller would have to know (1) wich Git repository to commit them to,
+(2) in wich location to put them, (3) if there's a `kustomization.yaml` that would have to be patched.
+An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api)
+and the management cluster's Git repository.
#### Further Considerations ##### delivery semantics/failure scenarios recovery for notifications -The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook. +The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's +only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook. notification-controller has [at-most once delivery semantics](https://github.com/fluxcd/notification-controller/tree/main/docs/spec#events-dispatching-1): -> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry sending an alert for four times with an exponential backoff of maximum 30 seconds. +> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for +> connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry +> sending an alert for four times with an exponential backoff of maximum 30 seconds. #### enrichment of events for custom metadata -The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". \ No newline at end of file +The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events +by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". 
\ No newline at end of file From 0a6a5e7468e5fbdd2f199f36809cbf195fcd30fb Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Wed, 5 Oct 2022 17:23:54 +0100 Subject: [PATCH 08/51] added ADR for promotions solution --- .../0014-pipelines-promotions-solution.md | 39 ++++ docs/rfcs/0003-pipelines-promotion/README.md | 7 + .../detect-deployment-changes.md | 2 +- .../promotions-solution.md | 185 +++++++++++++++++- 4 files changed, 227 insertions(+), 6 deletions(-) create mode 100644 docs/adrs/0014-pipelines-promotions-solution.md diff --git a/docs/adrs/0014-pipelines-promotions-solution.md b/docs/adrs/0014-pipelines-promotions-solution.md new file mode 100644 index 0000000..9663acd --- /dev/null +++ b/docs/adrs/0014-pipelines-promotions-solution.md @@ -0,0 +1,39 @@ +# 13. Pipelines Promotions + +## Status +Proposed + +## Context +As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a) where +[first iteration has been delivered](https://docs.gitops.weave.works/docs/next/enterprise/pipelines/intro/index.html) +covering the ability to view an application deployed across different environments. + +The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims +to enable promotions between environments. + +Once defined how to [detect a deployment change](0013-pipelines-detect-deployments.md), this ADR defines +how the solution e2e looks in terms of architecture. 
+ +## Decision + +As [discussed in RFC](../rfcs/0003-pipelines-promotion/promotions-solution.md) four alternatives were discussed: + +- Alternative A: weave gitops backend +- Alternative B: pipelines controller +- Alternative C: new service called promotions service +- Alternative D: cluster services + pipeline controller + promotion executor + +From the alternatives, promotions solution would be implemented using +alternative B, pipelines controller, as + +//TODO + + +## Consequences + +As mentioned in the decision, the following consequences of the decision + +- A path forward for pipelines promotions e2e. + + + diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index e69de29..443248d 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -0,0 +1,7 @@ +# Pipeline Promotions RFCs + +The following topics have been designed to solve promotions for pipelines: + +- [How pipelines detect that a deployment has happened](change-detection.md) +- [How the promotions solution looks like](promotions-solution.md) +- What are the security considerations \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index 6be060c..837446e 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -64,7 +64,7 @@ a deployment pipeline has changed. What is out of scope for this RFC? Listing non-goals helps to focus discussion and make progress. --> -- Anything related to processing the deployment notification. +- Anything related to processing the deployment notification. 
## Comparison diff --git a/docs/rfcs/0003-pipelines-promotion/promotions-solution.md b/docs/rfcs/0003-pipelines-promotion/promotions-solution.md index d5166fc..db8bb8c 100644 --- a/docs/rfcs/0003-pipelines-promotion/promotions-solution.md +++ b/docs/rfcs/0003-pipelines-promotion/promotions-solution.md @@ -1,7 +1,182 @@ -# Pipeline Promotions RFCs +# RFC-0003 How promotions solution looks like -The following topics have been designed to solve promotions for pipelines: +<!-- +The title must be short and descriptive. +--> -- [How pipelines detect that a deployment has happened](change-detection.md) -- [How the promotions solution looks like](promotions-api.md) -- What are the security considerations \ No newline at end of file +**Status:** provisional + +<!-- +Status represents the current state of the RFC. +Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejected`, `withdrawn`, or `replaced`. +--> + +**Creation date:** 2022-10-05 + +**Last update:** 2022-10-05 + +## Summary + +<!-- +One paragraph explanation of the proposed feature or enhancement. +--> + +Given a continuous delivery pipeline is comprised of diffferent environments the application goes trough in +its way to production, there is need for an action to move the application among environments. That concept is known as +promotion and it is a one of the core concepts of a pipelines domain. + +This RFC looks at different e2e solutions for promotions. + +## Terminology + +- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed. +- **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline. + For example promote stating to production would attempt to deploy an application existing in staging environment to production environment. +- **Environment**: An environment consists of one or more deployment targets. 
An example environment could be “Staging”. +- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}. +- **Application**: A Helm Release. + +## Motivation + +<!-- +This section is for explicitly listing the motivation, goals, and non-goals of +this RFC. Describe why the change is important and the benefits to users. +--> + + +This RFC looks at different e2e solutions for promotions. + + +### Goals + +<!-- +List the specific goals of this RFC. What is it trying to achieve? How will we +know that this has succeeded? +--> + +- Discover different e2e solutions for promotions. +- Recommend the one that seems better suited for the role. + +### Non-Goals + +<!-- +What is out of scope for this RFC? Listing non-goals helps to focus discussion +and make progress. +--> +- Anything no promotions related + +## Solution Alternatives + +The following solutions has been identified +- Alternative A: weave gitops backend +- Alternative B: pipeline controller +- Alternative C: new service called promotions service +- Alternative D: cluster services + pipeline controller + promotion executor + +### Alternative A: weave gitops backend + +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>WGE: process deployment notification + participant WGE as Weave Gitops Backend + participant k8s as Kubernetes Api + WGE->>k8s: get pipeline + WGE->>WGE: promotion business loic + participant k8s as Kubernetes Api + WGE->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- already setup and *should* be more easily exposed +- no need to generate TS client + +#### Cons +- service account needs extra permissions +- need to work around entitlements/user auth + +### Alternative B: 
pipeline controller + +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>pc: process deployment notification + participant pc as Pipeline Controller + participant k8s as Kubernetes Api + pc->>k8s: get pipeline + pc->>pc: promotion business loic + participant k8s as Kubernetes Api + pc->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- separate service account and permissions +- easier to dev against + +#### Cons + +- need service/ingress resource to expose it +- feels weird for a controller to not run a reconcile loop and instead host a webhook server + +### Alternative C: new service called promotions service +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>PS: process deployment notification + participant PS as Promotions Svc + participant k8s as Kubernetes Api + PS->>k8s: get pipeline + PS->>PS: promotion business loic + participant k8s as Kubernetes Api + PS->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- easiest to dev against + +#### Cons +- 1 more component for the team to maintain +- new repo/CI (?) 
+ +### Alternative D: cluster services + pipeline controller + promotion executor +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>WGE: process deployment notification + participant WGE as Weave Gitops Backend + participant k8s as Kubernetes Api + WGE->>k8s: write deployment event + participant pc as pipeline controller + k8s->>pc: watch deployment event & pipelines + pc->>pj: create promotion job + participant pj as promotion job + pj->>pj: promotion business logic + pj->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- separation of concerns with scalability and fault-tolerance by design + +#### Cons +- most complex solution (might be over complex?) +- kubernetes jobs not a popular choice \ No newline at end of file From 7c70e6b733bee6abbed899e0faba99529fa7094d Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 6 Oct 2022 08:52:47 +0100 Subject: [PATCH 09/51] unified promotions rfcs into one --- docs/rfcs/0003-pipelines-promotion/README.md | 230 +++++++++++++++++- .../promotions-solution.md | 182 -------------- docs/rfcs/README.md | 8 + 3 files changed, 233 insertions(+), 187 deletions(-) delete mode 100644 docs/rfcs/0003-pipelines-promotion/promotions-solution.md create mode 100644 docs/rfcs/README.md diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 443248d..f2e05bc 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -1,7 +1,227 @@ -# Pipeline Promotions RFCs +# RFC-0003 Promotion capability for pipelines -The following topics have been designed to solve promotions for pipelines: +**Status:** provisional -- [How pipelines detect that a deployment has happened](change-detection.md) -- [How the 
promotions solution looks like](promotions-solution.md) -- What are the security considerations \ No newline at end of file +**Creation date:** 2022-10-05 + +**Last update:** 2022-10-05 + +## Summary + +Given a continuous delivery pipeline is comprised of diffferent environments the application goes trough in +its way to production, there is need for an action to move the application among environments. That concept is known as +promotion and it is a one of the core concepts of a pipelines domain. Current pipelines in weave gitops +does not support promotion. + +This RFC addresses it as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) + +## Terminology + +- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed. +- **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline. + For example promote stating to production would attempt to deploy an application existing in staging environment to production environment. +- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”. +- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}. +- **Application**: A Helm Release. + +## Motivation + +Given a continuous delivery pipeline is comprised of diffferent environments the application goes trough in +its way to production, there is need for an action to move the application among environments. That concept is known as +promotion and it is a one of the core concepts of a pipelines domain. Current pipelines in weave gitops +does not support promotion. 
+ +This RFC addresses it as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) + +### Goals + +- Design the e2e solution for promotions on weave gitops pipelines. +- Should support the [scenarios identified](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#5b514ad575544595b1028d73e5b6dd23) + +### Non-Goals + +- Anything beyond the scope of promotions. +- Scenarios other than the identified in the product initiative. + +## Proposal + +The proposed solution architecture is shown below. + +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>pc: process deployment notification + participant pc as Pipeline Controller + participant k8s as Kubernetes Api + pc->>k8s: get pipeline + pc->>pc: promotion business loic + participant k8s as Kubernetes Api + pc->>configRepo: raise PR + participant configRepo as Configuration Repo +``` + +With three main responsibilities + +1. Detect and communicate deployment changes +2. Process deployment change and determine promotions needs +3. Execute the promotion + +### Detect and communicate deployment changes + +The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility. +An evaluation of different alternatives solutions to this concern could be found [here](detect-deployment-changes.md). + +### Process deployment change and determine promotions needs + +This responsibility is assumed by pipeline controller living in the management cluster that +- would expose a webhook to ingest deployment change events. +- process concurrently these requests +- determine whether at the back of the event and a pipeline definition, a promotion is required. 
+ +#### Promotions within pipeline spec + +In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: podinfo + namespace: default +spec: + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + promotion: + - name: promote-via-pr + type: pull-request + url: git@github.com:organization/repo + branch: main + secretRef: my-other-deployed-secret + environments: + - name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev +``` +The promotion field used to capture the promotion tasks for the next environment in the pipeline after a successful deployment has taken place. +Each task will include the following fields: + +- `name`: the task name +- `type`: the task type, either webhook or pull-request +- `url` : the git repository url or the webhook url +- `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) +- `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. + +### To execute the promotion + +Once the previous evaluation considers that a promotion is required, pipeline controller would be in charge +of orchestrating and executing the task according to its configuration. + +The current solution has been chosen over its alternatives (see alternatives section) due to + +- it enables promotions. +- it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. +- it is easier to develop over other alternatives. + +On the flip side, the solution has the following constraints: + +- there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. 
+- non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. + + +### Alternatives + +The following solutions has been identified +- Alternative A: weave gitops backend +- Alternative B: pipeline controller +- Alternative C: new service called promotions service +- Alternative D: cluster services + pipeline controller + promotion executor + +### Alternative A: weave gitops backend + +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>WGE: process deployment notification + participant WGE as Weave Gitops Backend + participant k8s as Kubernetes Api + WGE->>k8s: get pipeline + WGE->>WGE: promotion business loic + participant k8s as Kubernetes Api + WGE->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- already setup and *should* be more easily exposed +- no need to generate TS client + +#### Cons +- service account needs extra permissions +- need to work around entitlements/user auth + + +### Alternative C: new service called promotions service +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>PS: process deployment notification + participant PS as Promotions Svc + participant k8s as Kubernetes Api + PS->>k8s: get pipeline + PS->>PS: promotion business loic + participant k8s as Kubernetes Api + PS->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- easiest to dev against + +#### Cons +- 1 more component for the team to maintain +- new repo/CI (?) 
+ +### Alternative D: cluster services + pipeline controller + promotion executor +#### Diagram +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>WGE: process deployment notification + participant WGE as Weave Gitops Backend + participant k8s as Kubernetes Api + WGE->>k8s: write deployment event + participant pc as pipeline controller + k8s->>pc: watch deployment event & pipelines + pc->>pj: create promotion job + participant pj as promotion job + pj->>pj: promotion business logic + pj->>configRepo: raise PR + participant configRepo as Configuration Repo +``` +#### Pro +- separation of concerns with scalability and fault-tolerance by design + +#### Cons +- most complex solution (might be over complex?) +- kubernetes jobs not a popular choice \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/promotions-solution.md b/docs/rfcs/0003-pipelines-promotion/promotions-solution.md deleted file mode 100644 index db8bb8c..0000000 --- a/docs/rfcs/0003-pipelines-promotion/promotions-solution.md +++ /dev/null @@ -1,182 +0,0 @@ -# RFC-0003 How promotions solution looks like - -<!-- -The title must be short and descriptive. ---> - -**Status:** provisional - -<!-- -Status represents the current state of the RFC. -Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejected`, `withdrawn`, or `replaced`. ---> - -**Creation date:** 2022-10-05 - -**Last update:** 2022-10-05 - -## Summary - -<!-- -One paragraph explanation of the proposed feature or enhancement. ---> - -Given a continuous delivery pipeline is comprised of diffferent environments the application goes trough in -its way to production, there is need for an action to move the application among environments. That concept is known as -promotion and it is a one of the core concepts of a pipelines domain. 
- -This RFC looks at different e2e solutions for promotions. - -## Terminology - -- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed. -- **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline. - For example promote stating to production would attempt to deploy an application existing in staging environment to production environment. -- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”. -- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}. -- **Application**: A Helm Release. - -## Motivation - -<!-- -This section is for explicitly listing the motivation, goals, and non-goals of -this RFC. Describe why the change is important and the benefits to users. ---> - - -This RFC looks at different e2e solutions for promotions. - - -### Goals - -<!-- -List the specific goals of this RFC. What is it trying to achieve? How will we -know that this has succeeded? ---> - -- Discover different e2e solutions for promotions. -- Recommend the one that seems better suited for the role. - -### Non-Goals - -<!-- -What is out of scope for this RFC? Listing non-goals helps to focus discussion -and make progress. 
---> -- Anything no promotions related - -## Solution Alternatives - -The following solutions has been identified -- Alternative A: weave gitops backend -- Alternative B: pipeline controller -- Alternative C: new service called promotions service -- Alternative D: cluster services + pipeline controller + promotion executor - -### Alternative A: weave gitops backend - -#### Diagram -```mermaid - sequenceDiagram - participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster - F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>WGE: process deployment notification - participant WGE as Weave Gitops Backend - participant k8s as Kubernetes Api - WGE->>k8s: get pipeline - WGE->>WGE: promotion business loic - participant k8s as Kubernetes Api - WGE->>configRepo: raise PR - participant configRepo as Configuration Repo -``` -#### Pro -- already setup and *should* be more easily exposed -- no need to generate TS client - -#### Cons -- service account needs extra permissions -- need to work around entitlements/user auth - -### Alternative B: pipeline controller - -#### Diagram -```mermaid - sequenceDiagram - participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster - F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>pc: process deployment notification - participant pc as Pipeline Controller - participant k8s as Kubernetes Api - pc->>k8s: get pipeline - pc->>pc: promotion business loic - participant k8s as Kubernetes Api - pc->>configRepo: raise PR - participant configRepo as Configuration Repo -``` -#### Pro -- separate service account and permissions -- easier to dev against - -#### Cons - -- need service/ingress resource to expose it -- feels weird for a controller to not run a reconcile loop and instead host a webhook server - -### Alternative C: new service called promotions service -#### Diagram -```mermaid - 
sequenceDiagram - participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster - F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>PS: process deployment notification - participant PS as Promotions Svc - participant k8s as Kubernetes Api - PS->>k8s: get pipeline - PS->>PS: promotion business loic - participant k8s as Kubernetes Api - PS->>configRepo: raise PR - participant configRepo as Configuration Repo -``` -#### Pro -- easiest to dev against - -#### Cons -- 1 more component for the team to maintain -- new repo/CI (?) - -### Alternative D: cluster services + pipeline controller + promotion executor -#### Diagram -```mermaid - sequenceDiagram - participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster - F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>WGE: process deployment notification - participant WGE as Weave Gitops Backend - participant k8s as Kubernetes Api - WGE->>k8s: write deployment event - participant pc as pipeline controller - k8s->>pc: watch deployment event & pipelines - pc->>pj: create promotion job - participant pj as promotion job - pj->>pj: promotion business logic - pj->>configRepo: raise PR - participant configRepo as Configuration Repo -``` -#### Pro -- separation of concerns with scalability and fault-tolerance by design - -#### Cons -- most complex solution (might be over complex?) -- kubernetes jobs not a popular choice \ No newline at end of file diff --git a/docs/rfcs/README.md b/docs/rfcs/README.md new file mode 100644 index 0000000..7192c5a --- /dev/null +++ b/docs/rfcs/README.md @@ -0,0 +1,8 @@ +# RFCs + +We love flux therefore we use learn and adopt as much as we can from the project. + +RFCs are not unique to flux but it being used by [Flux](https://github.com/fluxcd/flux2/tree/main/rfcs). 
+ +In order to create an RFC, use [this template](https://github.com/fluxcd/flux2/blob/main/rfcs/RFC-0000/README.md) + From e2b44a534252732f597a1b76ee5ed933129cb0c5 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 6 Oct 2022 09:16:23 +0100 Subject: [PATCH 10/51] rfc promotions completed --- docs/rfcs/0003-pipelines-promotion/README.md | 193 +++++++++++-------- 1 file changed, 109 insertions(+), 84 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index f2e05bc..2937466 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -66,60 +66,22 @@ The proposed solution architecture is shown below. With three main responsibilities -1. Detect and communicate deployment changes -2. Process deployment change and determine promotions needs +1. Notify deployment changes +2. Determine whether a promotion is needed 3. Execute the promotion -### Detect and communicate deployment changes +### Notify deployment changes The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility. An evaluation of different alternatives solutions to this concern could be found [here](detect-deployment-changes.md). -### Process deployment change and determine promotions needs +### Determine whether a promotion is needed This responsibility is assumed by pipeline controller living in the management cluster that - would expose a webhook to ingest deployment change events. - process concurrently these requests - determine whether at the back of the event and a pipeline definition, a promotion is required. 
-#### Promotions within pipeline spec - -In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below - -```yaml -apiVersion: pipelines.weave.works/v1alpha1 -kind: Pipeline -metadata: - name: podinfo - namespace: default -spec: - appRef: - apiVersion: helm.toolkit.fluxcd.io/v2beta1 - kind: HelmRelease - name: podinfo - promotion: - - name: promote-via-pr - type: pull-request - url: git@github.com:organization/repo - branch: main - secretRef: my-other-deployed-secret - environments: - - name: dev - targets: - - namespace: podinfo - clusterRef: - kind: GitopsCluster - name: dev -``` -The promotion field used to capture the promotion tasks for the next environment in the pipeline after a successful deployment has taken place. -Each task will include the following fields: - -- `name`: the task name -- `type`: the task type, either webhook or pull-request -- `url` : the git repository url or the webhook url -- `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) -- `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. - ### To execute the promotion Once the previous evaluation considers that a promotion is required, pipeline controller would be in charge @@ -137,17 +99,16 @@ On the flip side, the solution has the following constraints: - non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. 
-### Alternatives +## Alternatives + +Other alternatives solutions have been discovered and discussed -The following solutions has been identified -- Alternative A: weave gitops backend -- Alternative B: pipeline controller -- Alternative C: new service called promotions service -- Alternative D: cluster services + pipeline controller + promotion executor +- Alternative A: to use weave gitops api +- Alternative B: create a new service - promotions service +- Alternative C: weave gitops api + pipeline controller + promotion executor -### Alternative A: weave gitops backend +### Alternative A: weave gitops api -#### Diagram ```mermaid sequenceDiagram participant F as Flux @@ -164,17 +125,64 @@ The following solutions has been identified WGE->>configRepo: raise PR participant configRepo as Configuration Repo ``` -#### Pro -- already setup and *should* be more easily exposed -- no need to generate TS client -#### Cons -- service account needs extra permissions -- need to work around entitlements/user auth +This solution is different from `pipeline controller` in that the three responsibilities + +1. Notify deployment changes +2. Determine whether a promotion is needed +3. Execute the promotion + +are fulfilled within weave gitops backend app. +**Pro** +- Already setup and *should* be more easily exposed. +- No need to manage other exposed surface, therefore less to secure. +- No need to generate TS client + +**Cons** +- Notifier service account needs permissions for promotion resources. 
+ +### Alternative B: weave gitops api + pipeline controller + promotion executor + +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Leaf Cluster + participant MC as Management Cluster + F->>LC: deploy helm release + LC->>MC: notify deployment via notification controller + MC->>WGE: process deployment notification + participant WGE as Weave Gitops Backend + participant k8s as Kubernetes Api + WGE->>k8s: write deployment event + participant pc as pipeline controller + k8s->>pc: watch deployment event & pipelines + pc->>pj: create promotion job + participant pj as promotion job + pj->>pj: promotion business logic + pj->>configRepo: raise PR + participant configRepo as Configuration Repo +``` + +This solution is different from `pipeline controller` in that the three responsibilities are split + +1. Notify deployment changes: ingestion is done via weave gitops api. +2. Determine whether a promotion is needed: pipeline controller watches for changes in pipeline. +3. Execute the promotion: extracted to a kubernetes job layer. + +**Pro** +- Already setup and *should* be more easily exposed. +- No need to manage other exposed surface, therefore less to secure. +- No need to generate TS client +- Separation of concerns with scalability and fault-tolerance by design + +**Cons** +- Needs to write in pipeline resource +- Most complex solution +- Kubernetes jobs not a popular choice ### Alternative C: new service called promotions service -#### Diagram + ```mermaid sequenceDiagram participant F as Flux @@ -191,37 +199,54 @@ The following solutions has been identified PS->>configRepo: raise PR participant configRepo as Configuration Repo ``` -#### Pro +**Pro** - easiest to dev against -#### Cons +**Cons** - 1 more component for the team to maintain - new repo/CI (?) 
-### Alternative D: cluster services + pipeline controller + promotion executor -#### Diagram -```mermaid - sequenceDiagram - participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster - F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>WGE: process deployment notification - participant WGE as Weave Gitops Backend - participant k8s as Kubernetes Api - WGE->>k8s: write deployment event - participant pc as pipeline controller - k8s->>pc: watch deployment event & pipelines - pc->>pj: create promotion job - participant pj as promotion job - pj->>pj: promotion business logic - pj->>configRepo: raise PR - participant configRepo as Configuration Repo +## Design Details + +### Pipeline spec changes for promotions + +In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: podinfo + namespace: default +spec: + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + promotion: + - name: promote-via-pr + type: pull-request + url: git@github.com:organization/repo + branch: main + secretRef: my-other-deployed-secret + environments: + - name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev ``` -#### Pro -- separation of concerns with scalability and fault-tolerance by design +The promotion field used to capture the promotion tasks for the next environment in the pipeline after a successful deployment has taken place. 
+Each task will include the following fields: + +- `name`: the task name +- `type`: the task type, either webhook or pull-request +- `url` : the git repository url or the webhook url +- `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) +- `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. + + +## Implementation History -#### Cons -- most complex solution (might be over complex?) -- kubernetes jobs not a popular choice \ No newline at end of file +- [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file From ad93f0ac44926d6a82500ff4fbef83cab2c13da4 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 6 Oct 2022 09:41:21 +0100 Subject: [PATCH 11/51] section on nfrs --- docs/rfcs/0003-pipelines-promotion/README.md | 35 ++++++++++++++++++++ 1 file changed, 35 insertions(+) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 2937466..67bd71e 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -98,6 +98,37 @@ On the flip side, the solution has the following constraints: - there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. - non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. +### Non-functional requirements + +Here we try to provide to anticipate some of the non functional requirements + +#### Security + +Promotions have a couple of activities that requires to drill down in terms of security: + +1. communication of deployment changes via webhook so over the network. +2. to create pull requests, so write access to gitops configuration repo. 
+ +**Security for deployment changes via webhook** + +//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594 + +**Security for pull request creation** + +//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594 + +#### Scalability + +The initial strategy to scale the solution by number of request, would be vertically by using goroutines. + +#### Reliability + +It will be implemented as part of the business logic of pipeline controller. + +#### Monitoring + +To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need +to enhance default controller metrics with business metrics like `latency of a promtion by application`. ## Alternatives @@ -208,6 +239,10 @@ This solution is different from `pipeline controller` in that the three responsi ## Design Details +### Promotions Webhook + +//TBA added at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594 + ### Pipeline spec changes for promotions In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below From d4165b10d7414fc7510bd0579bf24e29ae209313 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 6 Oct 2022 15:27:17 +0100 Subject: [PATCH 12/51] RFC and ADR ready for draft --- ...yments.md => 0013-pipelines-promotions.md} | 53 ++++++++++++------- .../0014-pipelines-promotions-solution.md | 39 -------------- docs/rfcs/0003-pipelines-promotion/README.md | 29 +++++----- 3 files changed, 47 insertions(+), 74 deletions(-) rename docs/adrs/{0013-pipelines-detect-deployments.md => 0013-pipelines-promotions.md} (56%) delete mode 100644 docs/adrs/0014-pipelines-promotions-solution.md diff --git a/docs/adrs/0013-pipelines-detect-deployments.md b/docs/adrs/0013-pipelines-promotions.md similarity index 56% rename from docs/adrs/0013-pipelines-detect-deployments.md rename to 
docs/adrs/0013-pipelines-promotions.md index 39f0e45..7ede7ee 100644 --- a/docs/adrs/0013-pipelines-detect-deployments.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -1,4 +1,4 @@ -# 13. Pipelines - How to detect deployments +# 13. Pipelines Promotions ## Status Proposed @@ -11,17 +11,34 @@ covering the ability to view an application deployed across different environmen The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims to enable promotions between environments. -As part of the promotions capabilities, there is the need to detect when a deployment has occurred with not only -an approach to do it. During the discovery of the second iteration, two models has been spiked: +This ADR records the major decisions taken during its design. -Detect deployment changes via +## Decision -- [Watching](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481) -- [Alert](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487) +### Promotions solution -## Decision +As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) four alternatives were discussed: + +- weave gitops +- pipelines controller +- weave gitops + pipeline controller + promotion executor +- new service + +The `pipeline controller` solution has been chosen over its alternatives (see alternatives section) due to + +- it enables promotions. +- it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. +- it is easier to develop over other alternatives. + +On the flip side, the solution has the following constraints: -As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md) each of approaches has associated unknowns. +- there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. 
+- non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. + + +### Deployment Change + +As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md) each of approaches has associated unknowns. The major ones are: - Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments @@ -29,23 +46,21 @@ The major ones are: - Watching: how reliable the solution could be as not having existing examples of products using it for watching remote clusters. We envision weave gitops as needs to be a flexible solution that eventually would need to support both approaches -to accommodate the range of potential enterprises using weave gitops. +to accommodate the range of potential enterprises using weave gitops. -In order to start with one of the approaches, we have decided to start by `webhooks` solution due to: +In order to start with one of the approaches, we have decided to start by `webhooks` solution due to: -- Allow us to provide promotions for wge customers based on our own promotions capability with better scalability approach. -- Reinforces the vision of weave gitops being a continuum of Flux by using Flux core components, in this context, [notification +- Allow us to provide promotions for wge customers based on our own promotions capability with better scalability approach. +- Reinforces the vision of weave gitops being a continuum of Flux by using Flux core components, in this context, [notification controller](https://fluxcd.io/flux/components/notification/), to provide the basic building blocks around deployment notification. ## Consequences -As mentioned in the decision, the following consequences of the decision - - A path forward for pipelines to deliver promotions capability. Sunglow could deliver promotions based on this approach. 
-- A risk to manage in the context of customer adoption: the network path opened. Sunglow would need to establish the customer feedback -loop with SAs/CXs to manage and mitigate the risk once it happens. Same for security -- A scenario further to develop: existing CI scenarios based on the approach. Sunglow would need to use customer feedback to -determine which existing systems are of relevance to provide the integration experience. - +- A risk to manage in the context of customer adoption: the network path opened. + - Sunglow would need to establish the customer feedback loop with SAs/CXs to manage and mitigate the risk once it happens. + - Same for security. +- A scenario further to develop: existing CI scenarios based on the approach. Sunglow would need to use customer feedback to + determine which existing systems are of relevance to provide the integration experience. diff --git a/docs/adrs/0014-pipelines-promotions-solution.md b/docs/adrs/0014-pipelines-promotions-solution.md deleted file mode 100644 index 9663acd..0000000 --- a/docs/adrs/0014-pipelines-promotions-solution.md +++ /dev/null @@ -1,39 +0,0 @@ -# 13. Pipelines Promotions - -## Status -Proposed - -## Context -As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a) where -[first iteration has been delivered](https://docs.gitops.weave.works/docs/next/enterprise/pipelines/intro/index.html) -covering the ability to view an application deployed across different environments. - -The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims -to enable promotions between environments. - -Once defined how to [detect a deployment change](0013-pipelines-detect-deployments.md), this ADR defines -how the solution e2e looks in terms of architecture. 
- -## Decision - -As [discussed in RFC](../rfcs/0003-pipelines-promotion/promotions-solution.md) four alternatives were discussed: - -- Alternative A: weave gitops backend -- Alternative B: pipelines controller -- Alternative C: new service called promotions service -- Alternative D: cluster services + pipeline controller + promotion executor - -From the alternatives, promotions solution would be implemented using -alternative B, pipelines controller, as - -//TODO - - -## Consequences - -As mentioned in the decision, the following consequences of the decision - -- A path forward for pipelines promotions e2e. - - - diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 67bd71e..dbf8e66 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -1,19 +1,17 @@ -# RFC-0003 Promotion capability for pipelines +# RFC-0003 Pipeline promotions **Status:** provisional -**Creation date:** 2022-10-05 +**Creation date:** 2022-10 **Last update:** 2022-10-05 ## Summary -Given a continuous delivery pipeline is comprised of diffferent environments the application goes trough in -its way to production, there is need for an action to move the application among environments. That concept is known as -promotion and it is a one of the core concepts of a pipelines domain. Current pipelines in weave gitops -does not support promotion. - -This RFC addresses it as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) +Given a continuous delivery pipeline, the application goes via different environments, in its way to production. We +need an action to efectively move applications beteween environments. That concept is generally known as a +promotion. Current pipelines in weave gitops does not support promotion. 
This RFC addresses this gap +as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) ## Terminology @@ -26,12 +24,10 @@ This RFC addresses it as specified in the [product initiative](https://www.notio ## Motivation -Given a continuous delivery pipeline is comprised of diffferent environments the application goes trough in -its way to production, there is need for an action to move the application among environments. That concept is known as -promotion and it is a one of the core concepts of a pipelines domain. Current pipelines in weave gitops -does not support promotion. - -This RFC addresses it as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) +Given a continuous delivery pipeline, the application goes via different environments, in its way to production. We +need an action to efectively move applications beteween environments. That concept is generally known as a +promotion. Current pipelines in weave gitops does not support promotion. This RFC addresses this gap +as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) ### Goals @@ -47,6 +43,7 @@ This RFC addresses it as specified in the [product initiative](https://www.notio The proposed solution architecture is shown below. +//TODO: review diagram ```mermaid sequenceDiagram participant F as Flux @@ -100,7 +97,8 @@ On the flip side, the solution has the following constraints: ### Non-functional requirements -Here we try to provide to anticipate some of the non functional requirements +As enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure +that no major impediments are found in the future. 
#### Security @@ -281,7 +279,6 @@ Each task will include the following fields: - `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) - `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. - ## Implementation History - [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file From ee90b898f6c0e8a783bd36c3c831d981e00a863a Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Fri, 7 Oct 2022 07:56:59 +0100 Subject: [PATCH 13/51] adr updated --- docs/adrs/0013-pipelines-promotions.md | 20 +++++++++++--------- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md index 7ede7ee..967dbf5 100644 --- a/docs/adrs/0013-pipelines-promotions.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -11,11 +11,14 @@ covering the ability to view an application deployed across different environmen The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims to enable promotions between environments. -This ADR records the major decisions taken during its design. +This ADR records a couple of decision we think are important: + +- how the promotion solutions looks like end to end. +- how deployment changes are detected. ## Decision -### Promotions solution +### How promotions solution looks like end to end As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) four alternatives were discussed: @@ -35,10 +38,10 @@ On the flip side, the solution has the following constraints: - there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. - non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. 
- -### Deployment Change +### How deployment changes are detected As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md) each of approaches has associated unknowns. + The major ones are: - Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments @@ -57,10 +60,9 @@ In order to start with one of the approaches, we have decided to start by `webho ## Consequences - A path forward for pipelines to deliver promotions capability. Sunglow could deliver promotions based on this approach. -- A risk to manage in the context of customer adoption: the network path opened. - - Sunglow would need to establish the customer feedback loop with SAs/CXs to manage and mitigate the risk once it happens. - - Same for security. -- A scenario further to develop: existing CI scenarios based on the approach. Sunglow would need to use customer feedback to - determine which existing systems are of relevance to provide the integration experience. +- A set of further actions needs to be risks that needs management: + - To manage the risk associated with the network flow between leaf to management cluster for deployment notifications. + - To determine concrete CI scenarios that we need to integrate with. + - To discover the reliability aspects of the watchers approach to understand its feasibility. 
From 21a4bab73e8f7ed0dd8e61e069210145e8f2223c Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Fri, 7 Oct 2022 08:43:14 +0100 Subject: [PATCH 14/51] promotions rfc reviewed --- docs/rfcs/0003-pipelines-promotion/README.md | 128 ++++++++++--------- 1 file changed, 71 insertions(+), 57 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index dbf8e66..3938e2a 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -8,8 +8,8 @@ ## Summary -Given a continuous delivery pipeline, the application goes via different environments, in its way to production. We -need an action to efectively move applications beteween environments. That concept is generally known as a +Given a continuous delivery pipeline, the application goes via different environments in its way to production. We +need an action to sign the intent of deploying an application between environments. That concept is generally known as a promotion. Current pipelines in weave gitops does not support promotion. This RFC addresses this gap as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) @@ -24,8 +24,8 @@ as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeli ## Motivation -Given a continuous delivery pipeline, the application goes via different environments, in its way to production. We -need an action to efectively move applications beteween environments. That concept is generally known as a +Given a continuous delivery pipeline, the application goes via different environments in its way to production. We +need an action to sign the intent of deploying an application between environments. That concept is generally known as a promotion. Current pipelines in weave gitops does not support promotion. 
This RFC addresses this gap as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) @@ -40,19 +40,16 @@ as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeli - Scenarios other than the identified in the product initiative. ## Proposal +We propose to use a solution as specified in the following diagram. -The proposed solution architecture is shown below. - -//TODO: review diagram ```mermaid sequenceDiagram participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster + participant LC as Notification Controller (Leaf) F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>pc: process deployment notification - participant pc as Pipeline Controller + LC->>pc: send deployment change event + participant pc as Pipeline Controller (Managment) + pc->>pc: authz and validate event participant k8s as Kubernetes Api pc->>k8s: get pipeline pc->>pc: promotion business loic @@ -61,7 +58,7 @@ The proposed solution architecture is shown below. participant configRepo as Configuration Repo ``` -With three main responsibilities +With three main activities 1. Notify deployment changes 2. Determine whether a promotion is needed @@ -74,9 +71,9 @@ An evaluation of different alternatives solutions to this concern could be found ### Determine whether a promotion is needed -This responsibility is assumed by pipeline controller living in the management cluster that +This responsibility is assumed by `pipeline controller` living in the management cluster that - would expose a webhook to ingest deployment change events. -- process concurrently these requests +- process concurrently the deployment events - determine whether at the back of the event and a pipeline definition, a promotion is required. 
### To execute the promotion @@ -84,17 +81,6 @@ This responsibility is assumed by pipeline controller living in the management c Once the previous evaluation considers that a promotion is required, pipeline controller would be in charge of orchestrating and executing the task according to its configuration. -The current solution has been chosen over its alternatives (see alternatives section) due to - -- it enables promotions. -- it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. -- it is easier to develop over other alternatives. - -On the flip side, the solution has the following constraints: - -- there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. -- non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. - ### Non-functional requirements As enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure @@ -128,30 +114,43 @@ It will be implemented as part of the business logic of pipeline controller. To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need to enhance default controller metrics with business metrics like `latency of a promtion by application`. +### Why this solution + +The current solution has been chosen over its alternatives (see alternatives section) due to + +- it enables promotions. +- it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. +- it is easier to develop over other alternatives. + +On the flip side, the solution has the following constraints: + +- there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. 
+- non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. + ## Alternatives -Other alternatives solutions have been discovered and discussed +Other alternatives solutions have been discovered and discussed. They difference among them is around +the component serving the promotion logic, therefore the alternatives names are based on it. -- Alternative A: to use weave gitops api -- Alternative B: create a new service - promotions service -- Alternative C: weave gitops api + pipeline controller + promotion executor +- Alternative A: weave gitops backend +- Alternative B: weave gitops api + pipeline controller + promotion executor +- Alternative C: promotions service (new service) -### Alternative A: weave gitops api +### Alternative A: weave gitops backend ```mermaid sequenceDiagram participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster + participant LC as Notification Controller (Leaf) F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>WGE: process deployment notification - participant WGE as Weave Gitops Backend + LC->>wge: send deployment change event + participant wge as Weave Gitops Backend (Managment) + wge->>wge: authz and validate event participant k8s as Kubernetes Api - WGE->>k8s: get pipeline - WGE->>WGE: promotion business loic + wge->>k8s: get pipeline + wge->>wge: promotion business loic participant k8s as Kubernetes Api - WGE->>configRepo: raise PR + wge->>configRepo: raise PR participant configRepo as Configuration Repo ``` @@ -177,11 +176,9 @@ are fulfilled within weave gitops backend app. 
sequenceDiagram participant F as Flux participant LC as Leaf Cluster - participant MC as Management Cluster F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>WGE: process deployment notification - participant WGE as Weave Gitops Backend + LC->>WGE: notify deployment via notification controller + participant WGE as Weave Gitops API participant k8s as Kubernetes Api WGE->>k8s: write deployment event participant pc as pipeline controller @@ -195,32 +192,33 @@ are fulfilled within weave gitops backend app. This solution is different from `pipeline controller` in that the three responsibilities are split -1. Notify deployment changes: ingestion is done via weave gitops api. +1. Notify deployment changes: ingestion is done via weave gitops api. the event is written in pipeline resource. 2. Determine whether a promotion is needed: pipeline controller watches for changes in pipeline. 3. Execute the promotion: extracted to a kubernetes job layer. **Pro** -- Already setup and *should* be more easily exposed. -- No need to manage other exposed surface, therefore less to secure. +- Using ingestion layer so not increased operational costs. - No need to generate TS client -- Separation of concerns with scalability and fault-tolerance by design +- Pipeline controller with reconcile loop so canonical usage. +- Scalability and fault-tolerance by design. **Cons** - Needs to write in pipeline resource - Most complex solution - Kubernetes jobs not a popular choice -### Alternative C: new service called promotions service +### Alternative C: promotions service + +This solution is a simplified approach to pipeline controller with only the promotion responsibility. 
```mermaid sequenceDiagram participant F as Flux - participant LC as Leaf Cluster - participant MC as Management Cluster + participant LC as Notification Controller (Leaf) F->>LC: deploy helm release - LC->>MC: notify deployment via notification controller - MC->>PS: process deployment notification - participant PS as Promotions Svc + LC->>PS: notify deployment via notification controller + participant PS as Promotions Svc (Management) + PS->>PS: authz and validate event participant k8s as Kubernetes Api PS->>k8s: get pipeline PS->>PS: promotion business loic @@ -230,6 +228,7 @@ This solution is different from `pipeline controller` in that the three responsi ``` **Pro** - easiest to dev against +- no controller so no reconcile loop executed **Cons** - 1 more component for the team to maintain @@ -237,10 +236,6 @@ This solution is different from `pipeline controller` in that the three responsi ## Design Details -### Promotions Webhook - -//TBA added at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594 - ### Pipeline spec changes for promotions In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below @@ -279,6 +274,25 @@ Each task will include the following fields: - `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) - `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. +### Promotions Webhook + +The endpoint should receive webhook requests to indicate a promotion of an environment. 
+ +Each environment of each pipeline has its own webhook URL for triggering a promotion: + +``` +/pipelines/promotions/{namespace}/{name}/{environment} +``` + +When a request is received, the handler will look up the environment in the pipeline to: + +- `authz` the request via hmac +- `validate` the promotion +- `lookup and execute` the promotion actions + +The handler needs to run with it own set of permissions (not user permissions) to be able +to read app versions across environments in a pipeline. + ## Implementation History - [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file From 0ff3a151dff34c51fbff8005ff19f77a8cc3b8e1 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Fri, 7 Oct 2022 08:49:36 +0100 Subject: [PATCH 15/51] change detection rfc updated --- .../detect-deployment-changes.md | 195 ++++++++---------- 1 file changed, 89 insertions(+), 106 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index 837446e..f2fd440 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -17,15 +17,10 @@ Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejec ## Summary -<!-- -One paragraph explanation of the proposed feature or enhancement. ---> - -Given a continious delivery pipeline is comprised of diffferent environments the application goes trough in -its way to production, there is need for an action to move the application among environments. That concept is known as -promotion and it is a one of the core concepts of a pipelines domain. - -This RFC looks at different designs for notifying that a deployment has happened in order to trigger a promotion (if needed). 
+Given a continuous delivery pipeline, the application goes via different environments on its way to production. We +need an action to signal the intent of deploying an application between environments. That concept is generally known as a +promotion. Current pipelines in weave gitops do not support promotion. This RFC looks at different designs for notifying +that a deployment has happened in order to trigger a promotion (if needed). ## Terminology @@ -38,14 +33,10 @@ For example promote stating to production would attempt to deploy an application ## Motivation -<!-- -This section is for explicitly listing the motivation, goals, and non-goals of -this RFC. Describe why the change is important and the benefits to users. ---> - - -This RFC looks at different designs for notifying that a deployment has happened in order to trigger a promotion (if needed). - +Given a continuous delivery pipeline, the application goes via different environments on its way to production. We +need an action to signal the intent of deploying an application between environments. That concept is generally known as a +promotion. Current pipelines in weave gitops do not support promotion. This RFC looks at different designs for notifying +that a deployment has happened in order to trigger a promotion (if needed). ### Goals @@ -66,84 +57,15 @@ and make progress. --> - Anything related to processing the deployment notification. -## Comparison - -<!-- -This is where we get down to the specifics of what the proposal actually is. -This should have enough detail that reviewers can understand exactly what -you're proposing, but should not include things like API designs or -implementation. - -If the RFC goal is to document best practices, -then this section can be replaced with the actual documentation.
---> - - -### Watchers approach - -This approach suggests the creation of [kubernetes watchers](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes) -per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes -and take an action to start the next promotion based on the Pipeline definition. - -[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481) - -#### Sequence diagram - -```mermaid - sequenceDiagram - actor U as operator - U->>+API Server: creates Pipeline - participant PC as Pipeline Controller - participant PS as Promotion Strategy - API Server->>+PC: notifies - participant dt1 as dev/target 1 - - rect rgb(67, 207, 250) - note right of PC: setup phase - note right of PC: pipelines.wego.weave.works/name<br/>pipelines.wego.weave.works/env<br/>pipelines.wego.weave.works/target - PC->>+dt1: label AppRef with metadata - participant dt2 as dev/target 2 - PC->>+dt2: label AppRef with metadata - participant pt1 as prod/target 1 - PC->>+pt1: label AppRef with metadata - end - - rect rgb(50, 227, 221) - note right of PC: promotion phase - PC-->>+dt1: watches HelmRelease and Kustomizations changes - PC-->>+dt2: watches HelmRelease and Kustomizations changes - PC-->>+pt1: watches HelmRelease and Kustomizations changes - end - - - dt1->>+PC: update events from AppRef - PC ->>PC: filter upgrade events - PC ->>PC: extract metadata - PC->>+PS: kicks off - ``` - -#### Advantages - -1. Plug n play: no further configurations or setup is needed to get updates. -1. Simple authentication: No need to worry about who triggered the event, since we are talking directly with the target. - +## Proposal -#### Disadvantages and Mitigations - -1. Requires Flux on all leaf clusters. -1. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues. -1. 
There is no way to kick off promotions externally - -### Alert approach - -This approach suggests the use of Flux [notification controller](https://fluxcd.io/flux/components/notification/) running on the remote cluster. -An [alert](https://fluxcd.io/flux/components/notification/alert/) / [provider](https://fluxcd.io/flux/components/notification/provider/) +This approach suggests the use of Flux [notification controller](https://fluxcd.io/flux/components/notification/) running on the remote cluster. +An [alert](https://fluxcd.io/flux/components/notification/alert/) / [provider](https://fluxcd.io/flux/components/notification/provider/) would be setup to call a webhook running on the management cluster to notify a Helm release change in a remote cluster. [Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487) - -#### Sequence diagram +### Sequence diagram ```mermaid sequenceDiagram @@ -172,7 +94,7 @@ would be setup to call a webhook running on the management cluster to notify a H end ``` -#### Example Event +### Example Event ```json { @@ -196,6 +118,7 @@ would be setup to call a webhook running on the management cluster to notify a H "reportingInstance": "helm-controller-7cdc7874f8-9qpft" } ``` +### Evaluation #### Advantages @@ -206,38 +129,98 @@ would be setup to call a webhook running on the management cluster to notify a H #### Disadvantages and Mitigations 1. Requires Flux on all leaf clusters. _Mitigations: ?_ -2. Authenticity of events needs to be taken care of. -_Mitigations: add authentication and authorization to the webhook; verify event by reaching out to leaf cluster_ -4. Network connectivity from all leaf clusters to management cluster necessary. -_Mitigations: promotion can be kicked from any external system so if using notification-controller would not work, -an external CI system could trigger promotion instead._ +2. Authenticity of events needs to be taken care of. 
+ _Mitigations: add authentication and authorization to the webhook; verify event by reaching out to leaf cluster_ +4. Network connectivity from all leaf clusters to management cluster necessary. + _Mitigations: promotion can be kicked off from any external system, so if using notification-controller does not work, + an external CI system could trigger the promotion instead._ #### Known Unknowns 1. How does p-c set the correct Provider address? 2. Configuration (user burden) 3. Automatic determination (might get complicated quick to account for the different environments (with/without Ingress, external LB, ...) -2. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' -API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). -Having them be committed to Git can be very complicated as the controller would have to know (1) wich Git repository to commit them to, -(2) in wich location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. -An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) -and the management cluster's Git repository. +2. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' + API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). + Having them be committed to Git can be very complicated as the controller would have to know (1) which Git repository to commit them to, + (2) in which location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. + An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) + and the management cluster's Git repository.
#### Further Considerations ##### delivery semantics/failure scenarios recovery for notifications -The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's +The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook. notification-controller has [at-most once delivery semantics](https://github.com/fluxcd/notification-controller/tree/main/docs/spec#events-dispatching-1): -> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for -> connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry +> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for +> connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry > sending an alert for four times with an exponential backoff of maximum 30 seconds. #### enrichment of events for custom metadata The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events -by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". \ No newline at end of file +by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". + + +## Alternatives + +### Watchers + +This approach suggests the creation of [kubernetes watchers](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes) +per remote cluster. 
Each watcher would get notified whenever a Helm release in the remote cluster changes +and take an action to start the next promotion based on the Pipeline definition. + +[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481) + +#### Sequence diagram + +```mermaid + sequenceDiagram + actor U as operator + U->>+API Server: creates Pipeline + participant PC as Pipeline Controller + participant PS as Promotion Strategy + API Server->>+PC: notifies + participant dt1 as dev/target 1 + + rect rgb(67, 207, 250) + note right of PC: setup phase + note right of PC: pipelines.wego.weave.works/name<br/>pipelines.wego.weave.works/env<br/>pipelines.wego.weave.works/target + PC->>+dt1: label AppRef with metadata + participant dt2 as dev/target 2 + PC->>+dt2: label AppRef with metadata + participant pt1 as prod/target 1 + PC->>+pt1: label AppRef with metadata + end + + rect rgb(50, 227, 221) + note right of PC: promotion phase + PC-->>+dt1: watches HelmRelease and Kustomizations changes + PC-->>+dt2: watches HelmRelease and Kustomizations changes + PC-->>+pt1: watches HelmRelease and Kustomizations changes + end + + + dt1->>+PC: update events from AppRef + PC ->>PC: filter upgrade events + PC ->>PC: extract metadata + PC->>+PS: kicks off + ``` + +#### Evaluation + +**Advantages** + +1. Plug n play: no further configurations or setup is needed to get updates. +1. Simple authentication: No need to worry about who triggered the event, since we are talking directly with the target. + + +**Disadvantages and Mitigations** + +1. Requires Flux on all leaf clusters. +1. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues. +1. 
There is no way to kick off promotions externally \ No newline at end of file From 261354f9c0ff67ec3bf845eb536400dff8ea7a48 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Mon, 10 Oct 2022 10:51:09 +0100 Subject: [PATCH 16/51] completing arguments for pipeline controller --- docs/rfcs/0003-pipelines-promotion/README.md | 13 +++++++++---- 1 file changed, 9 insertions(+), 4 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 3938e2a..9403db4 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -2,9 +2,9 @@ **Status:** provisional -**Creation date:** 2022-10 +**Creation date:** 2022-10-xx -**Last update:** 2022-10-05 +**Last update:** 2022-10-xx ## Summary @@ -121,11 +121,14 @@ The current solution has been chosen over its alternatives (see alternatives sec - it enables promotions. - it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. - it is easier to develop over other alternatives. +- it follows [notification controller pattern](https://fluxcd.io/flux/guides/webhook-receivers/#expose-the-webhook-receiver) On the flip side, the solution has the following constraints: - there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. -- non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. +- Non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. +We accept this tradeoff as pipeline controller provides us a balanced approach to start delivering the feature sooner over other +alternatives of creating a dedicated component. ## Alternatives @@ -168,7 +171,9 @@ are fulfilled within weave gitops backend app. 
- No need to generate TS client **Cons** -- Notifier service account needs permissions for promotion resources. +- Notifier service account needs permissions for promotion resources. +- Current api layer is designed (authz, entitlments, etc ) as an experience layer for weave gitops enterprise users while the promotion webhook +is intended to be used by a machine audience. ### Alternative B: weave gitops api + pipeline controller + promotion executor From 104f79fbc99255b97e557db4d4d763e2e5677056 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Mon, 10 Oct 2022 11:26:24 +0100 Subject: [PATCH 17/51] wording reviewed --- docs/adrs/0013-pipelines-promotions.md | 4 +- docs/rfcs/0003-pipelines-promotion/README.md | 41 ++++++++++++-------- 2 files changed, 27 insertions(+), 18 deletions(-) diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md index 967dbf5..c4a790b 100644 --- a/docs/adrs/0013-pipelines-promotions.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -22,10 +22,10 @@ This ADR records a couple of decision we think are important: As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) four alternatives were discussed: -- weave gitops +- weave gitops backend - pipelines controller - weave gitops + pipeline controller + promotion executor -- new service +- promotions service The `pipeline controller` solution has been chosen over its alternatives (see alternatives section) due to diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 9403db4..c2079f1 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -95,11 +95,20 @@ Promotions have a couple of activities that requires to drill down in terms of s **Security for deployment changes via webhook** -//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594 +Communications between leaf cluster 
and management cluster will be protected using HMAC. The HMAC shared key +will be used for both authentication and authorization. Application teams will be able to specify the key to use within +the pipeline spec as a global value. Key management will be done by the application team. + +The user experience for key management, along with other security configuration, will be simplified and evolved over time. **Security for pull request creation** -//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594 +In order to create a pull request in the configuration repo, secrets will be required to both +- clone the git repo via http or ssh +- create the pull request via http api + +The secrets will be referenced as part of a pull request promotion task configuration. The lifecycle of the secrets +will be managed outside of pipelines by the application team. #### Scalability @@ -118,17 +127,18 @@ to enhance default controller metrics with business metrics like `latency of a p The current solution has been chosen over its alternatives (see alternatives section) due to -- it enables promotions. -- it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. -- it is easier to develop over other alternatives. -- it follows [notification controller pattern](https://fluxcd.io/flux/guides/webhook-receivers/#expose-the-webhook-receiver) +- It enables promotions. +- It allows separation of roles, and therefore of permissions, between the components notifying the change and executing the promotion. +- It follows [notification controller pattern](https://fluxcd.io/flux/guides/webhook-receivers/#expose-the-webhook-receiver). +- It is easier to develop over other alternatives. +- It keeps the user-experience and machine-experience apis separate. On the flip side, the solution has the following constraints: -- there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api.
+- Need to manage another api surface. - Non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. -We accept this tradeoff as pipeline controller provides us a balanced approach to start delivering the feature sooner over other -alternatives of creating a dedicated component. + - We accept this tradeoff as pipeline controller provides us with a balanced approach between tech-debt and easy to start deliverying + over other alternatives (like creating another component). ## Alternatives @@ -172,8 +182,7 @@ are fulfilled within weave gitops backend app. **Cons** - Notifier service account needs permissions for promotion resources. -- Current api layer is designed (authz, entitlments, etc ) as an experience layer for weave gitops enterprise users while the promotion webhook -is intended to be used by a machine audience. +- Current api layer is designed as an experience layer for users (humans) while the promotion webhook is intended for machines. ### Alternative B: weave gitops api + pipeline controller + promotion executor @@ -214,7 +223,7 @@ This solution is different from `pipeline controller` in that the three responsi ### Alternative C: promotions service -This solution is a simplified approach to pipeline controller with only the promotion responsibility. +This solution would be to create a new component with the promotions responsibility. ```mermaid sequenceDiagram @@ -232,12 +241,12 @@ This solution is a simplified approach to pipeline controller with only the prom participant configRepo as Configuration Repo ``` **Pro** -- easiest to dev against -- no controller so no reconcile loop executed +- Easiest to dev against (vs api solution). +- No controller so no reconcile loop executed (vs pipeline controller solution). **Cons** -- 1 more component for the team to maintain -- new repo/CI (?) +- We would need to create it from scratch. +- One more component to manage.
## Design Details From 4d43115128c4d5d4c4e93a73af3404535b183738 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Mon, 10 Oct 2022 11:37:20 +0100 Subject: [PATCH 18/51] updated ADR justification --- docs/adrs/0013-pipelines-promotions.md | 14 +++++++++----- docs/rfcs/0003-pipelines-promotion/README.md | 4 +++- 2 files changed, 12 insertions(+), 6 deletions(-) diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md index c4a790b..0a30e5e 100644 --- a/docs/adrs/0013-pipelines-promotions.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -29,14 +29,18 @@ As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) four alternati The `pipeline controller` solution has been chosen over its alternatives (see alternatives section) due to -- it enables promotions. -- it allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. -- it is easier to develop over other alternatives. +- It enables promotions. +- It allows separation of roles, and therefore of permissions, between the components notifying the change and executing the promotion. +- It follows [notification controller pattern](https://fluxcd.io/flux/guides/webhook-receivers/#expose-the-webhook-receiver). +- It is easier to develop over other alternatives. +- It keeps the user-experience and machine-experience apis separate. On the flip side, the solution has the following constraints: -- there is a need to manage and expose the endpoint for deployment changes separately to weave gitops api. -- non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. +- Need to manage another api surface. +- Non-canonical usage of controllers as its behaviour is driven by ingested events rather than by a change in the declared state of a resource.
+ - We accept this tradeoff as pipeline controller provides us with a balanced approach between tech-debt and easy to start delivering + over other alternatives (like creating another component). ### How deployment changes are detected diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index c2079f1..a046fb9 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -137,7 +137,7 @@ On the flip side, the solution has the following constraints: - Need to manage another api surface. - Non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. - - We accept this tradeoff as pipeline controller provides us with a balanced approach between tech-debt and easy to start deliverying + - We accept this tradeoff as pipeline controller provides us with a balanced approach between tech-debt and easy to start delivering over other alternatives (like creating another component). 
## Alternatives @@ -265,6 +265,8 @@ spec: apiVersion: helm.toolkit.fluxcd.io/v2beta1 kind: HelmRelease name: podinfo + #used for hmac authz - this could change at implementation + secretRef: my-hmac-shared-secret promotion: - name: promote-via-pr type: pull-request From d532d5785202c2ee3df0a39907b9e214e7009ca2 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Mon, 10 Oct 2022 11:41:55 +0100 Subject: [PATCH 19/51] wording from PR review --- docs/rfcs/0003-pipelines-promotion/README.md | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index a046fb9..6dc3cd7 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -135,7 +135,7 @@ The current solution has been chosen over its alternatives (see alternatives sec On the flip side, the solution has the following constraints: -- Need to manage another api surface. +- Platform operators needs to manage another api surface, in this case, the `promotion webhook` endpoint. - Non-canonical usage of controllers as its behaviour is driven by ingested event than change in the declared state of a resource. - We accept this tradeoff as pipeline controller provides us with a balanced approach between tech-debt and easy to start delivering over other alternatives (like creating another component). 
From f59cc5824f050027feb5ea7f3d155a2129cefc35 Mon Sep 17 00:00:00 2001 From: Yiannis <yiannis@weave.works> Date: Mon, 10 Oct 2022 13:46:40 +0100 Subject: [PATCH 20/51] Fix typos --- docs/adrs/0013-pipelines-promotions.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md index 0a30e5e..c608adb 100644 --- a/docs/adrs/0013-pipelines-promotions.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -4,21 +4,21 @@ Proposed ## Context -As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a) where +As part of Weave GitOps Enterprise, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a) where [first iteration has been delivered](https://docs.gitops.weave.works/docs/next/enterprise/pipelines/intro/index.html) covering the ability to view an application deployed across different environments. The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims to enable promotions between environments. -This ADR records a couple of decision we think are important: +This ADR records a couple of decisions we think are important: -- how the promotion solutions looks like end to end. +- how the promotion solution looks like end to end. - how deployment changes are detected. 
## Decision -### How promotions solution looks like end to end +### How the promotion solution looks like end to end As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) four alternatives were discussed: From faa409df40f189ab03aa69020c213134b76fd119 Mon Sep 17 00:00:00 2001 From: Yiannis <yiannis@weave.works> Date: Mon, 10 Oct 2022 13:48:47 +0100 Subject: [PATCH 21/51] Fix typos --- docs/rfcs/0003-pipelines-promotion/README.md | 8 ++++---- 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 6dc3cd7..274da0e 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -9,7 +9,7 @@ ## Summary Given a continuous delivery pipeline, the application goes via different environments in its way to production. We -need an action to sign the intent of deploying an application between environments. That concept is generally known as a +need an action to signal the intent of deploying an application between environments. That concept is generally known as a promotion. Current pipelines in weave gitops does not support promotion. This RFC addresses this gap as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) @@ -17,7 +17,7 @@ as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeli - **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed. - **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline. - For example promote stating to production would attempt to deploy an application existing in staging environment to production environment. 
+ For example promote staging to production would attempt to deploy an application existing in staging environment to production environment. - **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”. - **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}. - **Application**: A Helm Release. @@ -71,7 +71,7 @@ An evaluation of different alternatives solutions to this concern could be found ### Determine whether a promotion is needed -This responsibility is assumed by `pipeline controller` living in the management cluster that +This responsibility is assumed by the `pipeline controller` running in the management cluster that - would expose a webhook to ingest deployment change events. - process concurrently the deployment events - determine whether at the back of the event and a pipeline definition, a promotion is required. @@ -121,7 +121,7 @@ It will be implemented as part of the business logic of pipeline controller. #### Monitoring To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need -to enhance default controller metrics with business metrics like `latency of a promtion by application`. +to enhance default controller metrics with business metrics like `latency of a promotion by application`. 
### Why this solution From 50353c846861393837bb5c65a45847b4fb46f873 Mon Sep 17 00:00:00 2001 From: David Harris <david.harris@weave.works> Date: Mon, 10 Oct 2022 14:27:50 +0100 Subject: [PATCH 22/51] update ADR - typos + more context --- docs/adrs/0013-pipelines-promotions.md | 15 ++++++++------- 1 file changed, 8 insertions(+), 7 deletions(-) diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md index c608adb..e0c267e 100644 --- a/docs/adrs/0013-pipelines-promotions.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -44,22 +44,23 @@ On the flip side, the solution has the following constraints: ### How deployment changes are detected -As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md) each of approaches has associated unknowns. +As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md) each approach has associated unknowns. The major ones are: -- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments - that it would suppose for customers while adopting the solution, as well its security management. +- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments that it would suppose for customers while adopting the solution, as well its security management. - Watching: how reliable the solution could be as not having existing examples of products using it for watching remote clusters. -We envision weave gitops as needs to be a flexible solution that eventually would need to support both approaches -to accommodate the range of potential enterprises using weave gitops. +We envision Weave GitOps will need to offer a flexible solution, and would eventually support both approaches +to accommodate the range of potential enterprise users. 
-In order to start with one of the approaches, we have decided to start by `webhooks` solution due to: +In order to optimise velocity, we are starting with one approach - the `webhooks` solution due to: -- Allow us to provide promotions for wge customers based on our own promotions capability with better scalability approach. +- It allows us to provide promotions for WGE customers with suspected better scalability. - Reinforces the vision of weave gitops being a continuum of Flux by using Flux core components, in this context, [notification controller](https://fluxcd.io/flux/components/notification/), to provide the basic building blocks around deployment notification. +- Leverages existing, tried-and-tested functionality from Flux to reduce amount of new functionality we need to write. +- Team is taking on responsibilities for Flux primitives, which includes Notification Controller related objects, and therefore presents a good opportunity to improve the UX for working with this capability. ## Consequences From d7750fb6243d95902eda18a25b17cd68d2d2e4ba Mon Sep 17 00:00:00 2001 From: David Harris <david.harris@weave.works> Date: Mon, 10 Oct 2022 14:40:23 +0100 Subject: [PATCH 23/51] typos --- docs/rfcs/0003-pipelines-promotion/README.md | 24 ++++++++++---------- 1 file changed, 12 insertions(+), 12 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 274da0e..d65833a 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -8,7 +8,7 @@ ## Summary -Given a continuous delivery pipeline, the application goes via different environments in its way to production. We +Given a continuous delivery pipeline, the application goes via different environments on its way to production. We need an action to signal the intent of deploying an application between environments. That concept is generally known as a promotion. 
Current pipelines in weave gitops does not support promotion. This RFC addresses this gap as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) @@ -52,7 +52,7 @@ We propose to use a solution as specified in the following diagram. pc->>pc: authz and validate event participant k8s as Kubernetes Api pc->>k8s: get pipeline - pc->>pc: promotion business loic + pc->>pc: promotion business logic participant k8s as Kubernetes Api pc->>configRepo: raise PR participant configRepo as Configuration Repo @@ -71,9 +71,9 @@ An evaluation of different alternatives solutions to this concern could be found ### Determine whether a promotion is needed -This responsibility is assumed by the `pipeline controller` running in the management cluster that +This responsibility is assumed by the `pipeline controller` running in the management cluster that: - would expose a webhook to ingest deployment change events. -- process concurrently the deployment events +- process concurrently the deployment events. - determine whether at the back of the event and a pipeline definition, a promotion is required. ### To execute the promotion @@ -83,7 +83,7 @@ of orchestrating and executing the task according to its configuration. ### Non-functional requirements -As enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure +As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure that no major impediments are found in the future. 
#### Security @@ -161,13 +161,13 @@ the component serving the promotion logic, therefore the alternatives names are wge->>wge: authz and validate event participant k8s as Kubernetes Api wge->>k8s: get pipeline - wge->>wge: promotion business loic + wge->>wge: promotion business logic participant k8s as Kubernetes Api wge->>configRepo: raise PR participant configRepo as Configuration Repo ``` -This solution is different from `pipeline controller` in that the three responsibilities +This solution is different from `pipeline controller` in that the three responsibilities: 1. Notify deployment changes 2. Determine whether a promotion is needed @@ -206,7 +206,7 @@ are fulfilled within weave gitops backend app. This solution is different from `pipeline controller` in that the three responsibilities are split -1. Notify deployment changes: ingestion is done via weave gitops api. the event is written in pipeline resource. +1. Notify deployment changes: ingestion is done via weave gitops api. The event is written in pipeline resource. 2. Determine whether a promotion is needed: pipeline controller watches for changes in pipeline. 3. Execute the promotion: extracted to a kubernetes job layer. @@ -235,7 +235,7 @@ This solution would be to create a new component with the promotions responsibil PS->>PS: authz and validate event participant k8s as Kubernetes Api PS->>k8s: get pipeline - PS->>PS: promotion business loic + PS->>PS: promotion business logic participant k8s as Kubernetes Api PS->>configRepo: raise PR participant configRepo as Configuration Repo @@ -302,9 +302,9 @@ Each environment of each pipeline has its own webhook URL for triggering a promo When a request is received, the handler will look up the environment in the pipeline to: -- `authz` the request via hmac -- `validate` the promotion -- `lookup and execute` the promotion actions +- `authz` the request via hmac. +- `validate` the promotion. +- `lookup and execute` the promotion actions. 
The handler needs to run with it own set of permissions (not user permissions) to be able to read app versions across environments in a pipeline. From bb8d0a0ce1c89a117346b3f86c1a7284536de77b Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Tue, 11 Oct 2022 09:00:00 +0100 Subject: [PATCH 24/51] removed the part of the motivation that doesnt talk on problem statement --- docs/rfcs/0003-pipelines-promotion/README.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 6dc3cd7..d6c5c1a 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -26,8 +26,7 @@ as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeli Given a continuous delivery pipeline, the application goes via different environments in its way to production. We need an action to sign the intent of deploying an application between environments. That concept is generally known as a -promotion. Current pipelines in weave gitops does not support promotion. This RFC addresses this gap -as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) +promotion. Current pipelines in weave gitops does not support promotion. 
### Goals From d5940d91054d4a0f431d661c14236d6007c5dc1d Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Tue, 11 Oct 2022 09:18:55 +0100 Subject: [PATCH 25/51] updated the token section to reflect the supported scenario --- docs/rfcs/0003-pipelines-promotion/README.md | 40 ++++++++++++++++---- 1 file changed, 32 insertions(+), 8 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index d6c5c1a..e4cebd7 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -100,14 +100,38 @@ the pipeline spec as a global value. Key management will be done by the applicat Both to simplify user experience for key management and other security configuration will be evolved over time. -**Security for pull request creation** +An example to visualise this configuration is shown below. -In order to create a pull request in the configuration repo, secrets will be required to both -- clone the git repo via http or ssh -- create the pull request via http api +```yaml + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + #used for hmac authz - this could change at implementation + secretRef: my-hmac-shared-secret +``` + +**Security for pull requests** + +In order to create a pull request in a configuration repo to action would be mainly required: -The secrets will be referenced as part of a pull request promotion task configuration. The lifecycle of the secrets -will be managed out of pipelines by the application team. +1. To clone the configuration git repo via http or ssh. +2. To create a pull request with promoted changes. + +Both actions would require a secret to use that ends in a combination of possible scenarios to eventually support. +This document assumes the simplest scenario possible which is having a single token for both +cloning via http and to create a pull request. 
The token will be present as kubernetes secrets and accessible by pipeline controller. + +An example to visualise this configuration is shown below. + +```yaml + promotion: + - name: promote-via-pr + type: pull-request + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR +``` #### Scalability @@ -269,9 +293,9 @@ spec: promotion: - name: promote-via-pr type: pull-request - url: git@github.com:organization/repo + url: https://github.com/organisation/gitops-configuration-monorepo.git branch: main - secretRef: my-other-deployed-secret + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR environments: - name: dev targets: From d53e62f435af8b15830623d9f6b97c987093593c Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Tue, 11 Oct 2022 09:27:46 +0100 Subject: [PATCH 26/51] added cons for api layer --- docs/rfcs/0003-pipelines-promotion/README.md | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index e4cebd7..8ce2116 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -201,11 +201,12 @@ are fulfilled within weave gitops backend app. **Pro** - Already setup and *should* be more easily exposed. - No need to manage other exposed surface, therefore less to secure. -- No need to generate TS client **Cons** - Notifier service account needs permissions for promotion resources. -- Current api layer is designed as an experience layer for users (humans) while the promotion webhook is intended for machines. +- Current api layer is designed as an experience layer for users (humans) while the promotion webhook is intended for machines. 
+- Extends the api layer with rest api so it would require to manage both grpc and rest apis that would increase maintainability costs. + ### Alternative B: weave gitops api + pipeline controller + promotion executor From dbbcd59f306b8b5bc362b54867016a912b83548d Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Tue, 11 Oct 2022 09:39:08 +0100 Subject: [PATCH 27/51] a bit more wordings around last alertnative --- docs/rfcs/0003-pipelines-promotion/README.md | 7 ++++--- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 5a837b0..abe7ebd 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -241,9 +241,10 @@ This solution is different from `pipeline controller` in that the three responsi - Scalability and fault-tolerance by design. **Cons** -- Needs to write in pipeline resource -- Most complex solution -- Kubernetes jobs not a popular choice +- Needs for writing the pipeline resource. +- The most complex alternative. +- To extract the promotion execution logic into an external component, would require to also create a management layer +between pipeline controller to the execution layer. 
### Alternative C: promotions service From c3c09634b014dcdd032a1abaa60b898f6a235d10 Mon Sep 17 00:00:00 2001 From: Yiannis <yiannis@weave.works> Date: Tue, 11 Oct 2022 10:42:07 +0100 Subject: [PATCH 28/51] Small tweaks --- docs/adrs/0013-pipelines-promotions.md | 6 +++--- docs/rfcs/0003-pipelines-promotion/README.md | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-) diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md index e0c267e..8a033eb 100644 --- a/docs/adrs/0013-pipelines-promotions.md +++ b/docs/adrs/0013-pipelines-promotions.md @@ -23,11 +23,11 @@ This ADR records a couple of decisions we think are important: As [discussed in RFC](../rfcs/0003-pipelines-promotion/README.md) four alternatives were discussed: - weave gitops backend -- pipelines controller +- pipeline controller - weave gitops + pipeline controller + promotion executor - promotions service -The `pipeline controller` solution has been chosen over its alternatives (see alternatives section) due to +The `pipeline controller` solution has been chosen over other alternatives (see alternatives section) due to - It enables promotions. - It allows to separations roles, therefore permissions between the components notifying the change and executing the promotion. @@ -48,7 +48,7 @@ As [discussed in RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes The major ones are: -- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments that it would suppose for customers while adopting the solution, as well its security management. +- Webhooks: the need for a new network flow in the product, from leaf cluster to management, and the potential impediments that it would impose for customers while adopting the solution, as well its security management. - Watching: how reliable the solution could be as not having existing examples of products using it for watching remote clusters. 
We envision Weave GitOps will need to offer a flexible solution, and would eventually support both approaches diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index abe7ebd..19fa1cb 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -36,7 +36,7 @@ promotion. Current pipelines in weave gitops does not support promotion. ### Non-Goals - Anything beyond the scope of promotions. -- Scenarios other than the identified in the product initiative. +- Scenarios other than the ones identified in the product initiative. ## Proposal We propose to use a solution as specified in the following diagram. @@ -75,7 +75,7 @@ This responsibility is assumed by the `pipeline controller` running in the manag - process concurrently the deployment events. - determine whether at the back of the event and a pipeline definition, a promotion is required. -### To execute the promotion +### Execute the promotion Once the previous evaluation considers that a promotion is required, pipeline controller would be in charge of orchestrating and executing the task according to its configuration. 
@@ -322,7 +322,7 @@ The endpoint should receive webhook requests to indicate a promotion of an envir Each environment of each pipeline has its own webhook URL for triggering a promotion: ``` -/pipelines/promotions/{namespace}/{name}/{environment} +/promotion/{namespace}/{name}/{environment} ``` When a request is received, the handler will look up the environment in the pipeline to: From 10fe75bb00bfe295aa0877b410ecef830422cef5 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Wed, 12 Oct 2022 15:15:03 +0100 Subject: [PATCH 29/51] rfc split in: overview doc + deeper detail doc by stage in the path --- docs/rfcs/0003-pipelines-promotion/README.md | 93 ++----------------- .../detect-deployment-changes.md | 26 +++++- .../determine-promotion-needs.md | 65 +++++++++++++ .../execute-promotion.md | 65 +++++++++++++ 4 files changed, 161 insertions(+), 88 deletions(-) create mode 100644 docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md create mode 100644 docs/rfcs/0003-pipelines-promotion/execute-promotion.md diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index abe7ebd..abf2f64 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -59,14 +59,15 @@ We propose to use a solution as specified in the following diagram. With three main activities -1. Notify deployment changes +1. Detect deployment changes 2. Determine whether a promotion is needed 3. Execute the promotion -### Notify deployment changes +### Detect deployment changes The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility. -An evaluation of different alternatives solutions to this concern could be found [here](detect-deployment-changes.md). + +A deeper look into this part of the solution could be found [here](detect-deployment-changes.md). 
### Determine whether a promotion is needed @@ -75,76 +76,15 @@ This responsibility is assumed by the `pipeline controller` running in the manag - process concurrently the deployment events. - determine whether at the back of the event and a pipeline definition, a promotion is required. +A deeper look into this part of the solution could be found [here](determine-promotion-needs.md). + ### To execute the promotion Once the previous evaluation considers that a promotion is required, pipeline controller would be in charge of orchestrating and executing the task according to its configuration. -### Non-functional requirements - -As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure -that no major impediments are found in the future. - -#### Security - -Promotions have a couple of activities that requires to drill down in terms of security: - -1. communication of deployment changes via webhook so over the network. -2. to create pull requests, so write access to gitops configuration repo. - -**Security for deployment changes via webhook** - -Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key -will be used for both authentication and authorization. Application teams will be able to specify the key to use within -the pipeline spec as a global value. Key management will be done by the application team. - -Both to simplify user experience for key management and other security configuration will be evolved over time. - -An example to visualise this configuration is shown below. - -```yaml - appRef: - apiVersion: helm.toolkit.fluxcd.io/v2beta1 - kind: HelmRelease - name: podinfo - #used for hmac authz - this could change at implementation - secretRef: my-hmac-shared-secret -``` - -**Security for pull requests** - -In order to create a pull request in a configuration repo to action would be mainly required: - -1. 
To clone the configuration git repo via http or ssh. -2. To create a pull request with promoted changes. - -Both actions would require a secret to use that ends in a combination of possible scenarios to eventually support. -This document assumes the simplest scenario possible which is having a single token for both -cloning via http and to create a pull request. The token will be present as kubernetes secrets and accessible by pipeline controller. +A deeper look into this part of the solution could be found [here](execute-promotion.md). -An example to visualise this configuration is shown below. - -```yaml - promotion: - - name: promote-via-pr - type: pull-request - url: https://github.com/organisation/gitops-configuration-monorepo.git - branch: main - secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR -``` - -#### Scalability - -The initial strategy to scale the solution by number of request, would be vertically by using goroutines. - -#### Reliability - -It will be implemented as part of the business logic of pipeline controller. - -#### Monitoring - -To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need -to enhance default controller metrics with business metrics like `latency of a promotion by application`. ### Why this solution @@ -315,25 +255,6 @@ Each task will include the following fields: - `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) - `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. -### Promotions Webhook - -The endpoint should receive webhook requests to indicate a promotion of an environment. 
- -Each environment of each pipeline has its own webhook URL for triggering a promotion: - -``` -/pipelines/promotions/{namespace}/{name}/{environment} -``` - -When a request is received, the handler will look up the environment in the pipeline to: - -- `authz` the request via hmac. -- `validate` the promotion. -- `lookup and execute` the promotion actions. - -The handler needs to run with it own set of permissions (not user permissions) to be able -to read app versions across environments in a pipeline. - ## Implementation History - [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index f2fd440..b26b9d0 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -222,5 +222,27 @@ and take an action to start the next promotion based on the Pipeline definition. **Disadvantages and Mitigations** 1. Requires Flux on all leaf clusters. -1. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues. -1. There is no way to kick off promotions externally \ No newline at end of file +2. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues. +3. There is no way to kick off promotions externally + +## Design Details + + +### Promotions Webhook + +The endpoint should receive webhook requests to indicate a promotion of an environment. + +Each environment of each pipeline has its own webhook URL for triggering a promotion: + +``` +/pipelines/promotions/{namespace}/{name}/{environment} +``` + +When a request is received, the handler will look up the environment in the pipeline to: + +- `authz` the request via hmac. +- `validate` the promotion. 
+- `lookup and execute` the promotion actions. + +The handler needs to run with it own set of permissions (not user permissions) to be able +to read app versions across environments in a pipeline. \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md new file mode 100644 index 0000000..add7e1d --- /dev/null +++ b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md @@ -0,0 +1,65 @@ +### Non-functional requirements + +As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure +that no major impediments are found in the future. + +#### Security + +Promotions have a couple of activities that requires to drill down in terms of security: + +1. communication of deployment changes via webhook so over the network. +2. to create pull requests, so write access to gitops configuration repo. + +**Security for deployment changes via webhook** + +Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key +will be used for both authentication and authorization. Application teams will be able to specify the key to use within +the pipeline spec as a global value. Key management will be done by the application team. + +Both to simplify user experience for key management and other security configuration will be evolved over time. + +An example to visualise this configuration is shown below. + +```yaml + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + #used for hmac authz - this could change at implementation + secretRef: my-hmac-shared-secret +``` + +**Security for pull requests** + +In order to create a pull request in a configuration repo to action would be mainly required: + +1. To clone the configuration git repo via http or ssh. +2. To create a pull request with promoted changes. 
+ +Both actions would require a secret to use that ends in a combination of possible scenarios to eventually support. +This document assumes the simplest scenario possible which is having a single token for both +cloning via http and to create a pull request. The token will be present as kubernetes secrets and accessible by pipeline controller. + +An example to visualise this configuration is shown below. + +```yaml + promotion: + - name: promote-via-pr + type: pull-request + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR +``` + +#### Scalability + +The initial strategy to scale the solution by number of request, would be vertically by using goroutines. + +#### Reliability + +It will be implemented as part of the business logic of pipeline controller. + +#### Monitoring + +To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need +to enhance default controller metrics with business metrics like `latency of a promotion by application`. \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md new file mode 100644 index 0000000..add7e1d --- /dev/null +++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md @@ -0,0 +1,65 @@ +### Non-functional requirements + +As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure +that no major impediments are found in the future. + +#### Security + +Promotions have a couple of activities that requires to drill down in terms of security: + +1. communication of deployment changes via webhook so over the network. +2. to create pull requests, so write access to gitops configuration repo. 
+ +**Security for deployment changes via webhook** + +Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key +will be used for both authentication and authorization. Application teams will be able to specify the key to use within +the pipeline spec as a global value. Key management will be done by the application team. + +Both to simplify user experience for key management and other security configuration will be evolved over time. + +An example to visualise this configuration is shown below. + +```yaml + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + #used for hmac authz - this could change at implementation + secretRef: my-hmac-shared-secret +``` + +**Security for pull requests** + +In order to create a pull request in a configuration repo to action would be mainly required: + +1. To clone the configuration git repo via http or ssh. +2. To create a pull request with promoted changes. + +Both actions would require a secret to use that ends in a combination of possible scenarios to eventually support. +This document assumes the simplest scenario possible which is having a single token for both +cloning via http and to create a pull request. The token will be present as kubernetes secrets and accessible by pipeline controller. + +An example to visualise this configuration is shown below. + +```yaml + promotion: + - name: promote-via-pr + type: pull-request + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR +``` + +#### Scalability + +The initial strategy to scale the solution by number of request, would be vertically by using goroutines. + +#### Reliability + +It will be implemented as part of the business logic of pipeline controller. 
+ +#### Monitoring + +To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need +to enhance default controller metrics with business metrics like `latency of a promotion by application`. \ No newline at end of file From 84c130fb38f960680de51bf059e5d5d5f9b4c198 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Wed, 12 Oct 2022 16:37:37 +0100 Subject: [PATCH 30/51] execute promotion document reviewed --- .../detect-deployment-changes.md | 20 ++++ .../determine-promotion-needs.md | 4 + .../execute-promotion.md | 98 ++++++++++++++----- 3 files changed, 99 insertions(+), 23 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index b26b9d0..a2d86cf 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -166,6 +166,26 @@ The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows f by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". +#### Security + +Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key +will be used for both authentication and authorization. Application teams will be able to specify the key to use within +the pipeline spec as a global value. Key management will be done by the application team. + +Both to simplify user experience for key management and other security configuration will be evolved over time. + +An example to visualise this configuration is shown below. 
+ +```yaml + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + #used for hmac authz - this could change at implementation + secretRef: my-hmac-shared-secret +``` + + ## Alternatives ### Watchers diff --git a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md index add7e1d..978259b 100644 --- a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md +++ b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md @@ -1,3 +1,7 @@ +### Determine whether a promotion is needed + + + ### Non-functional requirements As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md index add7e1d..d34d02f 100644 --- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md +++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md @@ -1,35 +1,46 @@ -### Non-functional requirements - -As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure -that no major impediments are found in the future. +# To execute the promotion -#### Security +It is the part of the solution whose is end goal is to execute the promotion logic like raise a PR, +call an external webhook, etc ... -Promotions have a couple of activities that requires to drill down in terms of security: +This document aims to look in deeper detail to this section by promotion task. -1. communication of deployment changes via webhook so over the network. -2. to create pull requests, so write access to gitops configuration repo. +Currently, designed supported promotion tasks are -**Security for deployment changes via webhook** +- Create a PR: creates a PR indicating the promotion of an application in a git configuration repo. 
+- Call a webhook: calls a webhook to delegate the promotion action to an external system. -Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key -will be used for both authentication and authorization. Application teams will be able to specify the key to use within -the pipeline spec as a global value. Key management will be done by the application team. +## Create a PR -Both to simplify user experience for key management and other security configuration will be evolved over time. +It creates a PR indicating the promotion of an application in a git configuration repo. -An example to visualise this configuration is shown below. +An example of this promotion task looks like ```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: podinfo + namespace: default +spec: appRef: apiVersion: helm.toolkit.fluxcd.io/v2beta1 kind: HelmRelease name: podinfo - #used for hmac authz - this could change at implementation - secretRef: my-hmac-shared-secret + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR + environments: + - name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev ``` - -**Security for pull requests** +### Security In order to create a pull request in a configuration repo to action would be mainly required: @@ -44,13 +55,54 @@ An example to visualise this configuration is shown below. 
```yaml promotion: - - name: promote-via-pr - type: pull-request - url: https://github.com/organisation/gitops-configuration-monorepo.git - branch: main - secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR +``` +## Call a webhook + +It calls a webhook to delegate the promotion action to an external system. An example of this promotion task looks like + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: podinfo + namespace: default +spec: + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + promotion: + webhook: + url: https://my-jenkins.prod/webhooks/XoLZfgK + secretRef: my-jenkins-promotion-secret #contains the provider details used to call the webhook + environments: + - name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev ``` +### Security + +For the `webhook` promotion step we follow the same configuration as flux [notification controller provider](https://fluxcd.io/flux/components/notification/provider/#generic-webhook) +where the secret contains + +``` + // Secret reference containing the provider details, valid key names are: address, proxy, token, headers (YAML encoded) + // +optional +``` + +### Non-functional requirements + +Each promotion task has its security considerations defined. Other non-functional requirements are covered in this +section. + #### Scalability The initial strategy to scale the solution by number of request, would be vertically by using goroutines. 
From 3c8807d7be0cc11af9b42f9924dd219879194415 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 08:54:44 +0100 Subject: [PATCH 31/51] detect deployment changes refactored --- docs/rfcs/0003-pipelines-promotion/README.md | 85 ++++- .../detect-deployment-changes.md | 298 ++++++------------ .../determine-promotion-needs.md | 68 +--- .../execute-promotion.md | 6 +- 4 files changed, 169 insertions(+), 288 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index abf2f64..701144c 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -65,15 +65,15 @@ With three main activities ### Detect deployment changes -The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility. +The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility. +Notification controllers in leaf clusters would notify of deployment events to the management cluster via +a deployment webhook. The management cluster will ingest and validate these deployment events. A deeper look into this part of the solution could be found [here](detect-deployment-changes.md). ### Determine whether a promotion is needed This responsibility is assumed by the `pipeline controller` running in the management cluster that: -- would expose a webhook to ingest deployment change events. -- process concurrently the deployment events. - determine whether at the back of the event and a pipeline definition, a promotion is required. A deeper look into this part of the solution could be found [here](determine-promotion-needs.md). @@ -105,14 +105,76 @@ On the flip side, the solution has the following constraints: ## Alternatives -Other alternatives solutions have been discovered and discussed. 
They difference among them is around +This solution is the result of two different alternative evaluations: +1. Alternatives to detect deployment changes. +2. Alternatives to process and execute promotions. + +### Alternatives to detect deployment changes. + +This approach suggests the creation of [kubernetes watchers](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes) +per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes +and take an action to start the next promotion based on the Pipeline definition. + +[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481) + +#### Sequence diagram + +```mermaid + sequenceDiagram + actor U as operator + U->>+API Server: creates Pipeline + participant PC as Pipeline Controller + participant PS as Promotion Strategy + API Server->>+PC: notifies + participant dt1 as dev/target 1 + + rect rgb(67, 207, 250) + note right of PC: setup phase + note right of PC: pipelines.wego.weave.works/name<br/>pipelines.wego.weave.works/env<br/>pipelines.wego.weave.works/target + PC->>+dt1: label AppRef with metadata + participant dt2 as dev/target 2 + PC->>+dt2: label AppRef with metadata + participant pt1 as prod/target 1 + PC->>+pt1: label AppRef with metadata + end + + rect rgb(50, 227, 221) + note right of PC: promotion phase + PC-->>+dt1: watches HelmRelease and Kustomizations changes + PC-->>+dt2: watches HelmRelease and Kustomizations changes + PC-->>+pt1: watches HelmRelease and Kustomizations changes + end + + + dt1->>+PC: update events from AppRef + PC ->>PC: filter upgrade events + PC ->>PC: extract metadata + PC->>+PS: kicks off + ``` + +#### Evaluation + +**Advantages** + +1. Plug n play: no further configurations or setup is needed to get updates. +1. Simple authentication: No need to worry about who triggered the event, since we are talking directly with the target. + +**Disadvantages and Mitigations** + +1. 
Requires Flux on all leaf clusters. +2. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues. +3. There is no way to kick off promotions externally + +### Alternatives to process and execute promotions. + +They difference among them is around the component serving the promotion logic, therefore the alternatives names are based on it. - Alternative A: weave gitops backend - Alternative B: weave gitops api + pipeline controller + promotion executor - Alternative C: promotions service (new service) -### Alternative A: weave gitops backend +#### Alternative A: weave gitops backend ```mermaid sequenceDiagram @@ -148,7 +210,7 @@ are fulfilled within weave gitops backend app. - Extends the api layer with rest api so it would require to manage both grpc and rest apis that would increase maintainability costs. -### Alternative B: weave gitops api + pipeline controller + promotion executor +#### Alternative B: weave gitops api + pipeline controller + promotion executor ```mermaid sequenceDiagram @@ -186,7 +248,7 @@ This solution is different from `pipeline controller` in that the three responsi - To extract the promotion execution logic into an external component, would require to also create a management layer between pipeline controller to the execution layer. -### Alternative C: promotions service +#### Alternative C: promotions service This solution would be to create a new component with the promotions responsibility. 
@@ -233,11 +295,10 @@ spec: #used for hmac authz - this could change at implementation secretRef: my-hmac-shared-secret promotion: - - name: promote-via-pr - type: pull-request - url: https://github.com/organisation/gitops-configuration-monorepo.git - branch: main - secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR environments: - name: dev targets: diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index a2d86cf..c947109 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -1,107 +1,62 @@ -# RFC-0003 How to detect deployment changes and to notify for pipeline promotions +# Detect deployment changes +This document looks in a bit more detail to the part of the solution around detecting or notifying deployment changes. +It is the part of the promotions solution described in the diagram. -<!-- -The title must be short and descriptive. ---> - -**Status:** provisional - -<!-- -Status represents the current state of the RFC. -Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejected`, `withdrawn`, or `replaced`. ---> - -**Creation date:** 2022-10-05 - -**Last update:** 2022-10-05 - -## Summary - -Given a continuous delivery pipeline, the application goes via different environments in its way to production. We -need an action to sign the intent of deploying an application between environments. That concept is generally known as a -promotion. Current pipelines in weave gitops does not support promotion. 
This RFC looks at different designs for notifying -that a deployment has happened in order to trigger a promotion (if needed). - -## Terminology - -- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed. -- **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline. -For example promote stating to production would attempt to deploy an application existing in staging environment to production environment. -- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”. -- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}. -- **Application**: A Helm Release. - -## Motivation - -Given a continuous delivery pipeline, the application goes via different environments in its way to production. We -need an action to sign the intent of deploying an application between environments. That concept is generally known as a -promotion. Current pipelines in weave gitops does not support promotion. This RFC looks at different designs for notifying -that a deployment has happened in order to trigger a promotion (if needed). - -### Goals - -<!-- -List the specific goals of this RFC. What is it trying to achieve? How will we -know that this has succeeded? ---> - -- Discover different solutions within weave gitops that would allow to solve the problem of how to detect that -a deployment pipeline has changed. -- Recommend the one that seems better suited for the role. 
- -### Non-Goals +```mermaid + sequenceDiagram + participant F as Flux + participant LC as Notification Controller (Leaf) + F->>LC: deploy helm release + LC->>pc: send deployment change event + participant pc as Pipeline Controller (Management) + pc->>pc: authz and validate event +``` -<!-- -What is out of scope for this RFC? Listing non-goals helps to focus discussion -and make progress. ---> -- Anything related to processing the deployment notification. +In order to notify deployment changes, we leverage [flux native notification capabilities](https://fluxcd.io/flux/components/notification/). +Notification controllers in leaf clusters notify of deployment events to the management cluster via +a deployment webhook. The management cluster will receive, authorise and validate these events. -## Proposal +## Sending deployment events -This approach suggests the use of Flux [notification controller](https://fluxcd.io/flux/components/notification/) running on the remote cluster. An [alert](https://fluxcd.io/flux/components/notification/alert/) / [provider](https://fluxcd.io/flux/components/notification/provider/) -would be setup to call a webhook running on the management cluster to notify a Helm release change in a remote cluster. 
- -[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487) - -### Sequence diagram - -```mermaid - sequenceDiagram - actor U as operator - U->>+API Server: creates Pipeline ns1/p1 - participant PC as Pipeline Controller - participant PS as Promotion Strategy - API Server->>+PC: notifies - participant dt1 as dev/target 1 - rect rgb(67, 207, 250) - note right of PC: alerting setup phase - PC->>+dt1: creates Provider /ns1/p1/dev - PC->>+dt1: creates Alert - participant dt2 as dev/target 2 - PC->>+dt2: creates Provider /ns1/p1/dev - PC->>+dt2: creates Alert - participant pt1 as prod/target 1 - PC->>+pt1: creates Provider - PC->>+pt1: creates Alert - end - rect rgb(50, 227, 221) - note right of PC: promotion phase - dt1->>+PC: sends Event to /ns1/p1/dev - PC->>+PS: kicks off - PS->>+pt1: promotes app - end - ``` - -### Example Event +would be setup to call a webhook running on the management cluster to notify a Helm release change in a leaf cluster. + +An example of how the resources could look like is found below. 
+ +```yaml +apiVersion: notification.toolkit.fluxcd.io/v1beta1 +kind: Alert +metadata: + name: search + namespace: shopping +spec: + summary: "foobar" + providerRef: + name: weave-gitops + eventSeverity: info + eventSources: + - kind: HelmRelease + name: search +--- +apiVersion: notification.toolkit.fluxcd.io/v1beta1 +kind: Provider +metadata: + name: weave-gitops + namespace: shopping +spec: + type: generic-hmac + address: https://weave-gitops/pipelines/promotions/{namespace}/{name}/{environment} + secretRef: + name: weave-gitops-secret-secret +``` +An example event of how this deployment change event looks like could be found below ```json { "involvedObject": { "kind": "HelmRelease", - "namespace": "flux-system", - "name": "metallb", + "namespace": "shopping", + "name": "search", "uid": "57c3579b-42da-4f27-afc5-8bd7778286e1", "apiVersion": "helm.toolkit.fluxcd.io/v2beta1", "resourceVersion": "155540" @@ -118,58 +73,31 @@ would be setup to call a webhook running on the management cluster to notify a H "reportingInstance": "helm-controller-7cdc7874f8-9qpft" } ``` -### Evaluation - -#### Advantages -1. Simplicity: Uses Flux functionality as much as possible -2. Flexibility: Promotion can be kicked off from external systems by calling the webhook -3. Flexibility: Promotion can be exercised by an external system +## Promotions Webhook -#### Disadvantages and Mitigations - -1. Requires Flux on all leaf clusters. _Mitigations: ?_ -2. Authenticity of events needs to be taken care of. - _Mitigations: add authentication and authorization to the webhook; verify event by reaching out to leaf cluster_ -4. Network connectivity from all leaf clusters to management cluster necessary. - _Mitigations: promotion can be kicked from any external system so if using notification-controller would not work, - an external CI system could trigger promotion instead._ - -#### Known Unknowns - -1. How does p-c set the correct Provider address? - 2. Configuration (user burden) - 3. 
Automatic determination (might get complicated quick to account for the different environments (with/without Ingress, external LB, ...) -2. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' - API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). - Having them be committed to Git can be very complicated as the controller would have to know (1) wich Git repository to commit them to, - (2) in wich location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. - An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) - and the management cluster's Git repository. - -#### Further Considerations - -##### delivery semantics/failure scenarios recovery for notifications +The endpoint should receive webhook requests to indicate a promotion of an environment. -The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's -only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook. +Each environment of each pipeline has its own webhook URL for triggering a promotion. The path for the URL +looks like -notification-controller has [at-most once delivery semantics](https://github.com/fluxcd/notification-controller/tree/main/docs/spec#events-dispatching-1): - -> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for -> connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry -> sending an alert for four times with an exponential backoff of maximum 30 seconds. 
+``` +/pipelines/promotions/{namespace}/{name}/{environment} +``` -#### enrichment of events for custom metadata +When a request is received, the handler will look up the environment in the pipeline to: -The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events -by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". +- `authz` the request via HMAC. +- `validate` the promotion event. +- `lookup and execute` the promotion actions. +The handler needs to run with its own set of permissions (not user permissions) to be able +to read app versions across environments in a pipeline. -#### Security +## Security -Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key -will be used for both authentication and authorization. Application teams will be able to specify the key to use within +Communications between leaf cluster and management cluster will be protected using [HMAC](https://en.wikipedia.org/wiki/HMAC). +HMAC shared key will be used for both authentication and authorization. Application teams will be able to specify the key to use within the pipeline spec as a global value. Key management will be done by the application team. Both to simplify user experience for key management and other security configuration will be evolved over time. @@ -182,87 +110,41 @@ An example to visualise this configuration is shown below. 
kind: HelmRelease name: podinfo #used for hmac authz - this could change at implementation - secretRef: my-hmac-shared-secret + secretRef: my-hmac-shared-secret ``` +Further considerations will be added at the back of [this story](https://github.com/weaveworks/pipeline-controller/issues/31) -## Alternatives - -### Watchers - -This approach suggests the creation of [kubernetes watchers](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes) -per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes -and take an action to start the next promotion based on the Pipeline definition. - -[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481) - -#### Sequence diagram +## Delivery semantics/failure scenarios recovery for notifications -```mermaid - sequenceDiagram - actor U as operator - U->>+API Server: creates Pipeline - participant PC as Pipeline Controller - participant PS as Promotion Strategy - API Server->>+PC: notifies - participant dt1 as dev/target 1 - - rect rgb(67, 207, 250) - note right of PC: setup phase - note right of PC: pipelines.wego.weave.works/name<br/>pipelines.wego.weave.works/env<br/>pipelines.wego.weave.works/target - PC->>+dt1: label AppRef with metadata - participant dt2 as dev/target 2 - PC->>+dt2: label AppRef with metadata - participant pt1 as prod/target 1 - PC->>+pt1: label AppRef with metadata - end - - rect rgb(50, 227, 221) - note right of PC: promotion phase - PC-->>+dt1: watches HelmRelease and Kustomizations changes - PC-->>+dt2: watches HelmRelease and Kustomizations changes - PC-->>+pt1: watches HelmRelease and Kustomizations changes - end - - - dt1->>+PC: update events from AppRef - PC ->>PC: filter upgrade events - PC ->>PC: extract metadata - PC->>+PS: kicks off - ``` - -#### Evaluation - -**Advantages** - -1. Plug n play: no further configurations or setup is needed to get updates. -1. 
Simple authentication: No need to worry about who triggered the event, since we are talking directly with the target. - - -**Disadvantages and Mitigations** - -1. Requires Flux on all leaf clusters. -2. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues. -3. There is no way to kick off promotions externally +The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's +only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook. -## Design Details +notification-controller has [at-most once delivery semantics](https://github.com/fluxcd/notification-controller/tree/main/docs/spec#events-dispatching-1): +> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for +> connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry +> sending an alert for four times with an exponential backoff of maximum 30 seconds. -### Promotions Webhook +## Enrichment of events with custom metadata -The endpoint should receive webhook requests to indicate a promotion of an environment. +The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events +by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary". -Each environment of each pipeline has its own webhook URL for triggering a promotion: -``` -/pipelines/promotions/{namespace}/{name}/{environment} -``` +## Known-Unknowns -When a request is received, the handler will look up the environment in the pipeline to: +1. How does p-c set the correct Provider address? + 2. Configuration (user burden) + 3. 
Automatic determination (might get complicated quickly when accounting for the different environments: with/without Ingress, external LB, ...) +2. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' + API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). + Having them be committed to Git can be very complicated as the controller would have to know (1) which Git repository to commit them to, + (2) in which location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. + An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) + and the management cluster's Git repository. -- `authz` the request via hmac. -- `validate` the promotion. -- `lookup and execute` the promotion actions. +## References -The handler needs to run with it own set of permissions (not user permissions) to be able -to read app versions across environments in a pipeline. \ No newline at end of file +- [Spike Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487) +- [Pipelines v2 epic](https://github.com/weaveworks/weave-gitops-enterprise/issues/1657) diff --git a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md index 978259b..84eac71 100644 --- a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md +++ b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md @@ -1,69 +1,9 @@ ### Determine whether a promotion is needed +It is the part of the solution whose end goal is to execute the promotion logic like raise a PR, +call an external webhook, etc ... +This document aims to look in deeper detail to this section by promotion task. 
-### Non-functional requirements +Currently, designed supported promotion tasks are -As an enterprise feature, we try also to understand the considerations in terms of non-functional requirements to ensure -that no major impediments are found in the future. - -#### Security - -Promotions have a couple of activities that requires to drill down in terms of security: - -1. communication of deployment changes via webhook so over the network. -2. to create pull requests, so write access to gitops configuration repo. - -**Security for deployment changes via webhook** - -Communications between leaf cluster and management cluster will be protected using HMAC. HMAC shared key -will be used for both authentication and authorization. Application teams will be able to specify the key to use within -the pipeline spec as a global value. Key management will be done by the application team. - -Both to simplify user experience for key management and other security configuration will be evolved over time. - -An example to visualise this configuration is shown below. - -```yaml - appRef: - apiVersion: helm.toolkit.fluxcd.io/v2beta1 - kind: HelmRelease - name: podinfo - #used for hmac authz - this could change at implementation - secretRef: my-hmac-shared-secret -``` - -**Security for pull requests** - -In order to create a pull request in a configuration repo to action would be mainly required: - -1. To clone the configuration git repo via http or ssh. -2. To create a pull request with promoted changes. - -Both actions would require a secret to use that ends in a combination of possible scenarios to eventually support. -This document assumes the simplest scenario possible which is having a single token for both -cloning via http and to create a pull request. The token will be present as kubernetes secrets and accessible by pipeline controller. - -An example to visualise this configuration is shown below. 
- -```yaml - promotion: - - name: promote-via-pr - type: pull-request - url: https://github.com/organisation/gitops-configuration-monorepo.git - branch: main - secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR -``` - -#### Scalability - -The initial strategy to scale the solution by number of request, would be vertically by using goroutines. - -#### Reliability - -It will be implemented as part of the business logic of pipeline controller. - -#### Monitoring - -To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need -to enhance default controller metrics with business metrics like `latency of a promotion by application`. \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md index d34d02f..b8d3a7b 100644 --- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md +++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md @@ -1,9 +1,7 @@ # To execute the promotion -It is the part of the solution whose is end goal is to execute the promotion logic like raise a PR, -call an external webhook, etc ... - -This document aims to look in deeper detail to this section by promotion task. +It is the part of the solution whose end goal is to execute the promotion logic like raise a PR, +call an external webhook, etc. This document aims to look in deeper detail to this section by promotion task. 
Currently, designed supported promotion tasks are From a43bc5fcf0d021159b4878f9960c73dd5a0c2581 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 09:14:31 +0100 Subject: [PATCH 32/51] reviewed promotion needs section / document --- docs/rfcs/0003-pipelines-promotion/README.md | 5 +- .../detect-deployment-changes.md | 1 - .../determine-promotion-needs.md | 88 +++++++++++++++++-- 3 files changed, 86 insertions(+), 8 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 701144c..b64d392 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -73,8 +73,9 @@ A deeper look into this part of the solution could be found [here](detect-deploy ### Determine whether a promotion is needed -This responsibility is assumed by the `pipeline controller` running in the management cluster that: -- determine whether at the back of the event and a pipeline definition, a promotion is required. +This responsibility is assumed by the `pipeline controller` running in the management cluster that +determines whether, at the back of the deployment event and a pipeline definition, a promotion is required and +initialise the promotion. A deeper look into this part of the solution could be found [here](determine-promotion-needs.md). diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index c947109..2642438 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -131,7 +131,6 @@ notification-controller has [at-most once delivery semantics](https://github.com The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events by means of the `.spec.summary` field. 
The content of this field will be added to the event's `.metadata` map with the key "summary". - ## Known-Unknowns 1. How does p-c set the correct Provider address? diff --git a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md index 84eac71..12e65fc 100644 --- a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md +++ b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md @@ -1,9 +1,87 @@ -### Determine whether a promotion is needed +# Determine whether a promotion is needed -It is the part of the solution whose end goal is to execute the promotion logic like raise a PR, -call an external webhook, etc ... +This document looks in a bit more detail at the part of the solution around determining whether a promotion is needed. +It is the part of the promotions solution described in the diagram. -This document aims to look in deeper detail to this section by promotion task. +```mermaid + sequenceDiagram + participant pc as Pipeline Controller (Management) + pc->>pc: authz and validate event + participant k8s as Kubernetes API + pc->>k8s: get pipeline + pc->>pc: promotion business logic +``` -Currently, designed supported promotion tasks are +The responsibility of the pipeline controller is to determine whether, at the back of the deployment event and a pipeline definition, a promotion is required and +initialise the promotion. The following elements are required: +1. Context around the last deployment which comes from the [deployment event](detect-deployment-changes.md) +2. The set of pipeline environments which is defined in the pipeline spec. +3. The promotion tasks to apply which are defined in the pipeline spec. 
+ + +At the back of the following pipeline + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: podinfo + namespace: default +spec: + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + environments: + - name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev + - name: prod + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: prod +``` + +Promotion needs by environment will be determined by the environments field + +```yaml +environments: +- name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev +- name: prod + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: prod +``` + +where a deployment event coming from `dev` would end up in the need for a promotion while a deployment event coming +from `prod` will not. + +Once the need is determined, the promotion strategy to use is specified in the `promotion` field. + +```yaml + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main +``` + +In this case, the strategy is to create a pull request in a git configuration repo. 
+ +More information about promotion strategies could be found in [here](./execute-promotion.md) \ No newline at end of file From 2227e0faa0bf8c9a91b8a5402d4dd2a120b7c49b Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 09:59:25 +0100 Subject: [PATCH 33/51] execute promotions --- docs/rfcs/0003-pipelines-promotion/README.md | 29 ++++++++++++++++++- .../execute-promotion.md | 24 ++------------- 2 files changed, 31 insertions(+), 22 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index b64d392..2eb5088 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -82,10 +82,37 @@ A deeper look into this part of the solution could be found [here](determine-pro ### To execute the promotion Once the previous evaluation considers that a promotion is required, pipeline controller would be in charge -of orchestrating and executing the task according to its configuration. +of orchestrating and executing the promotion. The promotion configuration will be added as part of the pipeline spec. A deeper look into this part of the solution could be found [here](execute-promotion.md). +### Non-functional requirements + +This section takes a quick look at the non-functional requirements for the solution. + +#### Security + +The solution is secure by design as + +1. Communication between leaf and management clusters is over an HTTPS channel, with endpoint authorization via HMAC. +2. Deployment events are validated to reduce the risk of impersonation. +3. Each promotion strategy will have its own security configuration. + +#### Scalability + +The solution is scalable by design as + +- It could horizontally scale by the number of replicas of pipeline controller. +- It could vertically scale by using `goroutines` to concurrently handle promotion requests.
+ +#### Reliability + +It will be implemented as part of the business logic of pipeline controller. + +#### Monitoring + +To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need +to enhance default controller metrics with business metrics like `latency of a promotion by application`. ### Why this solution diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md index b8d3a7b..eb591c4 100644 --- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md +++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md @@ -92,24 +92,6 @@ For the `webhook` promotion step we follow the same configuration as flux [notif where the secret contains ``` - // Secret reference containing the provider details, valid key names are: address, proxy, token, headers (YAML encoded) - // +optional -``` - -### Non-functional requirements - -Each promotion task has its security considerations defined. Other non-functional requirements will be understood in this -section. - -#### Scalability - -The initial strategy to scale the solution by number of request, would be vertically by using goroutines. - -#### Reliability - -It will be implemented as part of the business logic of pipeline controller. - -#### Monitoring - -To leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). There will be the need -to enhance default controller metrics with business metrics like `latency of a promotion by application`. 
\ No newline at end of file + // Secret reference containing the provider details, valid key names are: address, proxy, + // token, headers (YAML encoded) +``` \ No newline at end of file From 8431f05498d4f75d96f22119e60108ed18e03ae8 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 10:17:15 +0100 Subject: [PATCH 34/51] pr suggestions applied (mostly) --- docs/rfcs/0003-pipelines-promotion/README.md | 41 ------------------- .../determine-promotion-needs.md | 2 +- 2 files changed, 1 insertion(+), 42 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 2eb5088..29cfe9f 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -303,47 +303,6 @@ This solution would be to create a new component with the promotions responsibil - Ee would need to create it from scratch. - One more component to manage. -## Design Details - -### Pipeline spec changes for promotions - -In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below - -```yaml -apiVersion: pipelines.weave.works/v1alpha1 -kind: Pipeline -metadata: - name: podinfo - namespace: default -spec: - appRef: - apiVersion: helm.toolkit.fluxcd.io/v2beta1 - kind: HelmRelease - name: podinfo - #used for hmac authz - this could change at implementation - secretRef: my-hmac-shared-secret - promotion: - pullRequest: - url: https://github.com/organisation/gitops-configuration-monorepo.git - branch: main - secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR - environments: - - name: dev - targets: - - namespace: podinfo - clusterRef: - kind: GitopsCluster - name: dev -``` -The promotion field used to capture the promotion tasks for the next environment in the pipeline after a successful deployment has taken place. 
-Each task will include the following fields: - -- `name`: the task name -- `type`: the task type, either webhook or pull-request -- `url` : the git repository url or the webhook url -- `branch`: the branch to use for the update, defaults to main (only applicable when kind is pull-request) -- `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook. - ## Implementation History - [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md index 12e65fc..823a73f 100644 --- a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md +++ b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md @@ -70,7 +70,7 @@ environments: name: prod ``` -reviewhere a deployment event coming from `dev` would end up in the need for a promotion while a deployment event coming +where a deployment event coming from `dev` would end up in the need for a promotion while a deployment event coming from `prod` will not. Once determined the need, the promotion strategy to use is specified in the `promotion` field. 
From a6897c765251fd35fc6fe476fffdf3b3f33a27bf Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 16:50:47 +0100 Subject: [PATCH 35/51] started with scenarios --- docs/rfcs/0003-pipelines-promotion/README.md | 137 ++++++++++++++++++ .../determine-promotion-needs.md | 73 +++++----- .../execute-promotion.md | 43 +++++- 3 files changed, 210 insertions(+), 43 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 29cfe9f..3414a58 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -303,6 +303,143 @@ This solution would be to create a new component with the promotions responsibil - Ee would need to create it from scratch. - One more component to manage. +## User Stories + +This section shows how the current proposal addresses the different scenarios specified in the product +initiative. It serves as an acceptance of the current design. + +### Promotion for a pipeline with a single deployment target per environment + +The original scenario is specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#0c0d8c38b42b4b1eb8c5fa7ff3a2ac31). + +An example of a pipeline for this scenario is shown below. + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: search + namespace: search +spec: + appRef: + kind: HelmRelease + name: search-helmrelease + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + environments: + - name: dev + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: dev + namespace: flux-system + - name: prod + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: prod + namespace: flux-system +``` +It is the canonical scenario that the current solutions supports. 
No particular requirement is found. + + +### Promotion for a pipeline with multiple deployment target per environment + +Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#2ffaf6d0bdc144269e39f5a44acb0dc3) + +An example of a pipeline for this scenario is shown below. + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: search-multiple-targets + namespace: search +spec: + appRef: + kind: HelmRelease + name: search-helmrelease + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + environments: + - name: dev + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: dev + namespace: flux-system + - name: test + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: qa + namespace: flux-system + - namespace: search + clusterRef: + kind: GitopsCluster + name: perf + namespace: flux-system +``` + +### Promotion for a pipeline with multiple deployment target per environment + +Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#3ea85277de5543d69a9e19407e69c84b) + +Strategy of promotion on first successful reconciliation + +It is covered by [Promotion between environment will happen when at least one of lower-environment deployment targets has been successfully deployed](determine-promotion-needs.md#promotion-decisions-business-logic) + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: search-multiple-targets + namespace: search +spec: + appRef: + kind: HelmRelease + name: search-helmrelease + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + environments: + - name: test + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: qa + namespace: flux-system + git: 
https://github.com/my-org/gitops-repo/test-clusters/qa/search #dummy example + - namespace: search + clusterRef: + kind: GitopsCluster + name: perf + namespace: flux-system + git: https://github.com/my-org/gitops-repo/test-clusters/perf/search #dummy example + - name: prod + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: prod + namespace: flux-system + git: https://github.com/my-org/gitops-repo/prod-clusters/prod/search #dummy example +``` + +### Promotion via external process + +Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#bd4524a6838742cfa254642c1b42443f) +Which is covered by [Call Webhook promotion strategy](execute-promotion.md#call-a-webhook) + ## Implementation History - [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md index 823a73f..ae93cf4 100644 --- a/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md +++ b/docs/rfcs/0003-pipelines-promotion/determine-promotion-needs.md @@ -12,15 +12,17 @@ It is the part of the promotions solution described in the diagram. pc->>pc: promotion business logic ``` -The responsibility of the pipeline controller is to determine whether, at the back of the deployment event and a pipeline definition, a promotion is required and -initialise the promotion. The following elements are required: +The responsibility of the pipeline controller is to determine whether, at the back of the deployment event and a pipeline definition, +a promotion is required and initialise the promotion. -1. Context around the last deployment which comes from the [deployment event](detect-deployment-changes.md) -2. The set of pipeline environments which is defined in the pipeline spec. -3. 
The promotion tasks to apply which is defined in the pipeline spec. +## Input for promotions decisions +The following input elements are required: -At the back of the following pipeline +1. Context around the last deployment which comes from the [deployment event](detect-deployment-changes.md). +2. The set of pipeline `spec.environments` which is defined in the pipeline spec. + +At the back of the following pipeline ```yaml apiVersion: pipelines.weave.works/v1alpha1 @@ -44,44 +46,43 @@ spec: clusterRef: kind: GitopsCluster name: dev - - name: prod + - name: qa targets: - - namespace: podinfo + - namespace: podinfo clusterRef: kind: GitopsCluster - name: prod + name: qa + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: perf + - name: prod + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: prod ``` -Promotion needs by environment will be determined by the environments field +## Promotion decisions business logic -```yaml -environments: -- name: dev - targets: - - namespace: podinfo - clusterRef: - kind: GitopsCluster - name: dev -- name: prod - targets: - - namespace: podinfo - clusterRef: - kind: GitopsCluster - name: prod -``` +Given the previous input, the following requirements will be met for promotions. -where a deployment event coming from `dev` would end up in the need for a promotion while a deployment event coming -from `prod` will not. +1. Promotion tasks are applied to deployment targets. -Once determined the need, the promotion strategy to use is specified in the `promotion` field. +For example, given the previous example, when a deployment to `dev` deployment cluster has been received, +a promotion for environment `qa` will start by executing the promotion strategy `pullRequest` to the two +deployment targets `qa` and `perf` -```yaml - promotion: - pullRequest: - url: https://github.com/organisation/gitops-configuration-monorepo.git - branch: main -``` +2. 
Promotion between environment will happen when at least one of lower-environment deployment targets has been successfully deployed. + +In the previous example, it means that we only need to wait until either the `qa` or the `perf` deployment target +has been successfully deployed (its deployment event has been received) in order to start the promotion from environment `qa` to `prod`. + +3. Promotions will happen for all the environments but the last one. -In this case, to create a pull request into a git configuration repo. +Once a deployment event for `prod` has been received, no further promotions will be executed. + +Once the need has been determined, the promotion strategy to use is specified in the `promotion` field. -More information about promotion strategies could be found in [here](./execute-promotion.md) \ No newline at end of file +Once a promotion is required, the next step is to execute it. More information about promotion strategies can be found [here](./execute-promotion.md) \ No newline at end of file diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md index eb591c4..d258e81 100644 --- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md +++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md @@ -3,12 +3,41 @@ It is the part of the solution whose end goal is to execute the promotion logic like raise a PR, call an external webhook, etc. This document aims to look in deeper detail to this section by promotion task. -Currently, designed supported promotion tasks are +## Define a promotion strategy -- Create a PR: creates a PR indicating the promotion of an application in a git configuration repo. -- Call a webhook: calls a webhook to delegate the promotion action to an external system. +A promotion strategy is defined as part of the pipeline spec in the field `spec.promotion` as you could see below.
+```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: podinfo + namespace: default +spec: + appRef: + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + kind: HelmRelease + name: podinfo + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main + secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR + environments: + - name: dev + targets: + - namespace: podinfo + clusterRef: + kind: GitopsCluster + name: dev +``` + +spec.promotion: is an optional field that app teams could use to enhance their pipeline with promotion capabilities provided. +Under promotion, a single promotion strategy could be defined to use for promotions. The available promotions strategies are: + +- Create a PR: define `spec.promotion.pullRequest` in order to create a PR indicating the promotion of an application in a git configuration repo. +- Call a webhook: define `spec.promotion.webhook` in order to call a webhook to delegate the promotion action to an external system. -## Create a PR +### Create a PR It creates a PR indicating the promotion of an application in a git configuration repo. @@ -38,7 +67,7 @@ spec: kind: GitopsCluster name: dev ``` -### Security +#### Security In order to create a pull request in a configuration repo to action would be mainly required: @@ -58,7 +87,7 @@ An example to visualise this configuration is shown below. branch: main secretRef: my-gitops-configuration-monorepo-secret #contains the github token to clone and create PR ``` -## Call a webhook +### Call a webhook It calls a webhook to delegate the promotion action to an external system. 
An example of this promotion task looks like @@ -86,7 +115,7 @@ spec: name: dev ``` -### Security +#### Security For the `webhook` promotion step we follow the same configuration as flux [notification controller provider](https://fluxcd.io/flux/components/notification/provider/#generic-webhook) where the secret contains From 34e6762cd1de6828baa71bb0e47cf94df12d9b22 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 16:55:39 +0100 Subject: [PATCH 36/51] scenarios covered --- docs/rfcs/0003-pipelines-promotion/README.md | 54 ++++++++++++++++---- 1 file changed, 45 insertions(+), 9 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 3414a58..a6e1508 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -345,7 +345,8 @@ spec: name: prod namespace: flux-system ``` -It is the canonical scenario that the current solutions supports. No particular requirement is found. 
+ +It is covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) ### Promotion for a pipeline with multiple deployment target per environment @@ -391,14 +392,13 @@ spec: namespace: flux-system ``` +It is covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) as +a PR will be created for each test deployment target `qa` and `perf + ### Promotion for a pipeline with multiple deployment target per environment Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#3ea85277de5543d69a9e19407e69c84b) -Strategy of promotion on first successful reconciliation - -It is covered by [Promotion between environment will happen when at least one of lower-environment deployment targets has been successfully deployed](determine-promotion-needs.md#promotion-decisions-business-logic) - ```yaml apiVersion: pipelines.weave.works/v1alpha1 kind: Pipeline @@ -410,21 +410,30 @@ spec: kind: HelmRelease name: search-helmrelease apiVersion: helm.toolkit.fluxcd.io/v2beta1 + promotion: + pullRequest: + url: https://github.com/organisation/gitops-configuration-monorepo.git + branch: main environments: + - name: dev + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: dev + namespace: flux-system - name: test targets: - - namespace: search + - namespace: search clusterRef: kind: GitopsCluster name: qa namespace: flux-system - git: https://github.com/my-org/gitops-repo/test-clusters/qa/search #dummy example - namespace: search clusterRef: kind: GitopsCluster name: perf namespace: flux-system - git: https://github.com/my-org/gitops-repo/test-clusters/perf/search #dummy example - name: prod targets: - namespace: search @@ -432,14 +441,41 @@ spec: kind: GitopsCluster name: prod namespace: flux-system - git: https://github.com/my-org/gitops-repo/prod-clusters/prod/search #dummy example ``` +It is covered by [promotion 
business rules](determine-promotion-needs.md#promotion-decisions-business-logic) as +it will promote to prod as soon as a succeful deployment to either `qa` or `perf` has happened. ### Promotion via external process Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#bd4524a6838742cfa254642c1b42443f) + +```yaml +apiVersion: pipelines.weave.works/v1alpha1 +kind: Pipeline +metadata: + name: search-multiple-targets + namespace: search +spec: + appRef: + kind: HelmRelease + name: search-helmrelease + apiVersion: helm.toolkit.fluxcd.io/v2beta1 + promotion: + webhook: + url: https://my-jenkins.prod/webhooks/XoLZfgK + secretRef: my-jenkins-promotion-secret + environments: + - name: dev + targets: + - namespace: search + clusterRef: + kind: GitopsCluster + name: dev + namespace: flux-system +``` Which is covered by [Call Webhook promotion strategy](execute-promotion.md#call-a-webhook) + ## Implementation History - [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589) \ No newline at end of file From d24e0018fbe67f0ffd4b65c8ecfb66231734a2f3 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 17:30:37 +0100 Subject: [PATCH 37/51] scenarios reviewed --- docs/rfcs/0003-pipelines-promotion/README.md | 29 ++++++++++++++------ 1 file changed, 20 insertions(+), 9 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index a6e1508..24039cd 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -305,8 +305,9 @@ This solution would be to create a new component with the promotions responsibil ## User Stories -This section shows how the current proposal addresses the different scenarios specified in the product -initiative. It serves as an acceptance of the current design. 
+This section shows how the current proposal addresses the different scenarios specified in the [product +initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#5b514ad575544595b1028d73e5b6dd23). +It serves as part of the acceptance of the current design. ### Promotion for a pipeline with a single deployment target per environment @@ -346,7 +347,8 @@ spec: namespace: flux-system ``` -It is covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) +It is covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) where +we want to execute promotions by environment and by deployment target. This is the base scenario. ### Promotion for a pipeline with multiple deployment target per environment @@ -391,14 +393,17 @@ spec: name: perf namespace: flux-system ``` +The particularity of this scenario is that we want to raise a PR for each of the deployment targets that we have. This is +covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) rule #1 -It is covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) as -a PR will be created for each test deployment target `qa` and `perf +>1. Promotion tasks are applied to deployment targets. 
### Promotion for a pipeline with multiple deployment target per environment Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#3ea85277de5543d69a9e19407e69c84b) +An example of a pipeline representing this scenario is shown below. + ```yaml apiVersion: pipelines.weave.works/v1alpha1 kind: Pipeline @@ -442,13 +447,21 @@ spec: name: prod namespace: flux-system ``` -It is covered by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) as -it will promote to prod as soon as a succeful deployment to either `qa` or `perf` has happened. +What is particular about this scenario is that we want to promote to production as soon as a deployment to test +has succeeded. This scenario is covered +by [promotion business rules](determine-promotion-needs.md#promotion-decisions-business-logic) rule #2 + +>2. Promotion between environment will happen when at least one of lower-environment deployment targets has been successfully deployed. + +That is, it will promote to prod as soon as a successful deployment to either `qa` or `perf` has happened. ### Promotion via external process Original scenario specified [here](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#bd4524a6838742cfa254642c1b42443f) +This scenario is currently supported by the [Call Webhook promotion strategy](execute-promotion.md#call-a-webhook). +An example of a pipeline for this story is shown below.
+ ```yaml apiVersion: pipelines.weave.works/v1alpha1 kind: Pipeline @@ -473,8 +486,6 @@ spec: name: dev namespace: flux-system ``` -Which is covered by [Call Webhook promotion strategy](execute-promotion.md#call-a-webhook) - ## Implementation History From 234a61d19ab19cfcade8ca83adf93dfd2fc6bb4a Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Thu, 13 Oct 2022 17:43:50 +0100 Subject: [PATCH 38/51] some more definitions --- docs/rfcs/0003-pipelines-promotion/README.md | 29 +++++++++++-------- .../execute-promotion.md | 13 ++++++++- 2 files changed, 29 insertions(+), 13 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md index 24039cd..b6c0e6e 100644 --- a/docs/rfcs/0003-pipelines-promotion/README.md +++ b/docs/rfcs/0003-pipelines-promotion/README.md @@ -18,6 +18,11 @@ as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeli - **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed. - **Promotion**: action of moving an application from a lower environment to a higher environment within a pipeline. For example promote staging to production would attempt to deploy an application existing in staging environment to production environment. +- **Promotion Strategy**: a concrete way of performing a promotion. For example, **create a pull request** could be a promotion strategy, +as could promoting by calling an external system. +- **Promotion Target**: the entity receiving the action of the promotion. For example, with the `create pull request` strategy, +the promotion target will be the configuration git repo. When the promotion is delegated to an external system, a Jenkins server +could be the promotion target. - **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}. - **Application**: A Helm Release. @@ -51,10 +56,10 @@ We propose to use a solution as specified in the following diagram. pc->>pc: authz and validate event participant k8s as Kubernetes Api pc->>k8s: get pipeline - pc->>pc: promotion business logic + pc->>pc: is promotion required participant k8s as Kubernetes Api - pc->>configRepo: raise PR - participant configRepo as Configuration Repo + pc->>promotionTarget: execute promotion strategy + participant promotionTarget as Promotion Target ``` With three main activities @@ -214,10 +219,10 @@ the component serving the promotion logic, therefore the alternatives names are wge->>wge: authz and validate event participant k8s as Kubernetes Api wge->>k8s: get pipeline - wge->>wge: promotion business logic + wge->>wge: is promotion required participant k8s as Kubernetes Api - wge->>configRepo: raise PR - participant configRepo as Configuration Repo + wge->>promotionTarget: execute promotion strategy + participant promotionTarget as Promotion Target ``` This solution is different from `pipeline controller` in that the three responsibilities: @@ -254,8 +259,8 @@ are fulfilled within weave gitops backend app. 
pc->>pj: create promotion job participant pj as promotion job pj->>pj: promotion business logic - pj->>configRepo: raise PR - participant configRepo as Configuration Repo + pj->>promotionTarget: execute promotion strategy + participant promotionTarget as Promotion Target ``` This solution is different from `pipeline controller` in that the three responsibilities are split @@ -289,11 +294,11 @@ This solution would be to create a new component with the promotions responsibil participant PS as Promotions Svc (Management) PS->>PS: authz and validate event participant k8s as Kubernetes Api - PS->>k8s: get pipeline - PS->>PS: promotion business logic + PS->>k8s: get pipeline + PS->>PS: is promotion required participant k8s as Kubernetes Api - PS->>configRepo: raise PR - participant configRepo as Configuration Repo + PS->>promotionTarget: execute promotion strategy + participant promotionTarget as Promotion Target ``` **Pro** - Easiest to dev against (vs api solution). diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md index d258e81..aa3c249 100644 --- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md +++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md @@ -1,8 +1,19 @@ # To execute the promotion -It is the part of the solution whose end goal is to execute the promotion logic like raise a PR, +It is the part of the solution whose end goal is to execute a promotion strategy against a promotion target. + +```mermaid + sequenceDiagram + participant pc as Pipeline Controller (Managment) + pc->>promotionTarget: execute promotion strategy + participant promotionTarget as Promotion Target +``` + +Example of promotion strateglike raise a PR, call an external webhook, etc. This document aims to look in deeper detail to this section by promotion task. 
+ + ## Define a promotion strategy A promotion strategy is defined as part of the pipeline spec in the field `spec.promotion` as you could see below. From a87a2c61c1a08adb0937edf63ea900cd3af17a44 Mon Sep 17 00:00:00 2001 From: Eneko Fernandez <eneko@weave.works> Date: Fri, 14 Oct 2022 08:38:20 +0100 Subject: [PATCH 39/51] hmac security updated --- .../detect-deployment-changes.md | 20 +++++++------------ 1 file changed, 7 insertions(+), 13 deletions(-) diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md index 2642438..d4adc73 100644 --- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md +++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md @@ -97,23 +97,17 @@ to read app versions across environments in a pipeline. ## Security Communications between leaf cluster and management cluster will be protected using [HMAC](https://en.wikipedia.org/wiki/HMAC). -HMAC shared key will be used for both authentication and authorization. Application teams will be able to specify the key to use within -the pipeline spec as a global value. Key management will be done by the application team. +HMAC shared key will be used for both authentication and authorization. -Both to simplify user experience for key management and other security configuration will be evolved over time. +At the back [this story](https://github.com/weaveworks/pipeline-controller/issues/31) this section would need to be updated +but current design guidelines state that: -An example to visualise this configuration is shown below. +- Application teams will be able to specify the key to use within the pipeline spec as a global value via a secretRef. +- Key management will be done manually by the application team. 
-```yaml
- appRef:
- apiVersion: helm.toolkit.fluxcd.io/v2beta1
- kind: HelmRelease
- name: podinfo
- #used for hmac authz - this could change at implementation
- secretRef: my-hmac-shared-secret
-```
+This approach puts a known operational overhead for the application team at this stage. The experience will be
+simplified over time by automation to reduce maintenance costs.

-Further considerations will be added at the back of [this story](https://github.com/weaveworks/pipeline-controller/issues/31)

 ## Delivery semantics/failure scenarios recovery for notifications

From 712b4ae079c734ee61f609a71a2e1b23aa09907f Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Fri, 14 Oct 2022 08:55:42 +0100
Subject: [PATCH 40/51] updated reliability section

---
 docs/rfcs/0003-pipelines-promotion/README.md | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
index b6c0e6e..c315d10 100644
--- a/docs/rfcs/0003-pipelines-promotion/README.md
+++ b/docs/rfcs/0003-pipelines-promotion/README.md
@@ -112,7 +112,9 @@ The solution is scalable by design as

 #### Reliability

-It will be implemented as part of the business logic of pipeline controller.
+Pipeline controller will need to implement the fault tolerance and reliability features within its business logic per
+promotion strategy. For example, in the context of opening a pr against github, it will require to manage retries to
+recover from api rate limiting.
 #### Monitoring

From 4f651ba0f855bf4172ab684d84a0b9ee34196b1a Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Fri, 14 Oct 2022 09:17:45 +0100
Subject: [PATCH 41/51] strategies reviewed

---
 .../execute-promotion.md | 26 ++++++++++++-------
 1 file changed, 17 insertions(+), 9 deletions(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
index aa3c249..3402214 100644
--- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
+++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
@@ -8,15 +8,12 @@ It is the part of the solution whose end goal is to execute a promotion strategy
 pc->>promotionTarget: execute promotion strategy
 participant promotionTarget as Promotion Target
 ```
-
-Example of promotion strateglike raise a PR,
-call an external webhook, etc. This document aims to look in deeper detail to this section by promotion task.
-
-
+An example of promotion strategy could be opening a pull request against a configuration repo. This document looks
+deeper details to this part of the solution.

 ## Define a promotion strategy

-A promotion strategy is defined as part of the pipeline spec in the field `spec.promotion` as you could see below.
+A promotion strategy is defined as part of the pipeline spec in the field `spec.promotion`.
 ```yaml
 apiVersion: pipelines.weave.works/v1alpha1
 kind: Pipeline
@@ -42,11 +39,12 @@ spec:
 name: dev
 ```

-spec.promotion: is an optional field that app teams could use to enhance their pipeline with promotion capabilities provided.
+- `spec.promotion`: is an optional field that app teams could use to enhance their pipeline with promotion capabilities provided.
+
 Under promotion, a single promotion strategy could be defined to use for promotions. The available promotions strategies are:
-- Create a PR: define `spec.promotion.pullRequest` in order to create a PR indicating the promotion of an application in a git configuration repo.
-- Call a webhook: define `spec.promotion.webhook` in order to call a webhook to delegate the promotion action to an external system.
+- `spec.promotion.pullRequest`: to promote by creating a pull request in a configuration repo.
+- `spec.promotion.webhook`: to promote by calling an external system that will be in charge of the promotion logic.

 ### Create a PR

@@ -78,6 +76,12 @@ spec:
 kind: GitopsCluster
 name: dev
 ```
+where pullRequest configuration has
+
+- `url`: https URL for the git repo to clone.
+- `branch`: git branch to do the promotion.
+- `secretRef`: secret containing the tokens or keys required to clone and create the PR against the provider. See security below for more details.
+
 #### Security

 In order to create a pull request in a configuration repo to action would be mainly required:
@@ -125,6 +129,10 @@ spec:
 kind: GitopsCluster
 name: dev
 ```
+where webhook configuration has
+
+- `url`: URL to post the promotion event.
+- `secretRef`: a kubernetes secret with different configuration settings to use for sending the event. See security below for more details.

 #### Security

From c2756e4d2ccdb8126762d5c1f469d8a127b3893f Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Mon, 24 Oct 2022 16:27:11 +0100
Subject: [PATCH 42/51] added modularity reason

---
 docs/adrs/0013-pipelines-promotions.md | 1 +
 1 file changed, 1 insertion(+)

diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md
index 8a033eb..12ff166 100644
--- a/docs/adrs/0013-pipelines-promotions.md
+++ b/docs/adrs/0013-pipelines-promotions.md
@@ -34,6 +34,7 @@ The `pipeline controller` solution has been chosen over other alternatives (see
 - It follows [notification controller pattern](https://fluxcd.io/flux/guides/webhook-receivers/#expose-the-webhook-receiver).
 - It is easier to develop over other alternatives.
 - It keeps split user-experience and machine-experience apis.
+- It provides reasonable modularity for the feature.
 On the flip side, the solution has the following constraints:

From 79f4bfd14c45e1dea2a77d72ff225feab730f72d Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eneko=20Fern=C3=A1ndez?= <12957664+enekofb@users.noreply.github.com>
Date: Tue, 25 Oct 2022 12:05:01 +0100
Subject: [PATCH 43/51] Update docs/rfcs/0003-pipelines-promotion/execute-promotion.md

Co-authored-by: Yiannis <8741709+yiannistri@users.noreply.github.com>
---
 docs/rfcs/0003-pipelines-promotion/execute-promotion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
index 3402214..4b9ad48 100644
--- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
+++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
@@ -4,7 +4,7 @@ It is the part of the solution whose end goal is to execute a promotion strategy

 ```mermaid
 sequenceDiagram
- participant pc as Pipeline Controller (Managment)
+ participant pc as Pipeline Controller (Management)
 pc->>promotionTarget: execute promotion strategy
 participant promotionTarget as Promotion Target
 ```

From d5bc89439883615387081fb2a837198c3f87f2e0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eneko=20Fern=C3=A1ndez?= <12957664+enekofb@users.noreply.github.com>
Date: Tue, 25 Oct 2022 12:05:13 +0100
Subject: [PATCH 44/51] Update docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md

Co-authored-by: Yiannis <8741709+yiannistri@users.noreply.github.com>
---
 docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
index d4adc73..6dff90b 100644
--- a/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
+++ b/docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
@@ -8,7 +8,7 @@ It is the part of the promotions solution described in the diagram.
 participant LC as Notification Controller (Leaf)
 F->>LC: deploy helm release
 LC->>pc: send deployment change event
- participant pc as Pipeline Controller (Managment)
+ participant pc as Pipeline Controller (Management)
 pc->>pc: authz and validate event
 ```

From 2623f380a51236999ddae17be92dd97deb3130eb Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Tue, 25 Oct 2022 12:10:16 +0100
Subject: [PATCH 45/51] changed comment

---
 docs/rfcs/0003-pipelines-promotion/execute-promotion.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
index 4b9ad48..4d894a0 100644
--- a/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
+++ b/docs/rfcs/0003-pipelines-promotion/execute-promotion.md
@@ -120,7 +120,7 @@ spec:
 promotion:
 webhook:
 url: https://my-jenkins.prod/webhooks/XoLZfgK
- secretRef: my-jenkins-promotion-secret #secretontains the github token to clone and create PR
+ secretRef: my-jenkins-promotion-secret #contains the secrets to call jenkins webhook
 environments:
 - name: dev
 targets:

From cdc7f878213bc63280354ca8e88b3e87ef396672 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eneko=20Fern=C3=A1ndez?= <12957664+enekofb@users.noreply.github.com>
Date: Wed, 26 Oct 2022 16:53:53 +0100
Subject: [PATCH 46/51] Update docs/rfcs/0003-pipelines-promotion/README.md

Co-authored-by: Max Jonas Werner <makkes@users.noreply.github.com>
---
 docs/rfcs/0003-pipelines-promotion/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
index c315d10..763c8d4 100644
--- a/docs/rfcs/0003-pipelines-promotion/README.md
+++ b/docs/rfcs/0003-pipelines-promotion/README.md
@@ -11,7 +11,7 @@
 Given a continuous delivery pipeline, the application goes via different environments on its way to production. We
 need an action to signal the intent of deploying an application between environments. That concept is generally known as a
 promotion. Current pipelines in weave gitops does not support promotion. This RFC addresses this gap
-as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258)
+as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258).

 ## Terminology

From 5d56f99d2aeb1f7cd2b295d0354f98274d0f28ac Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eneko=20Fern=C3=A1ndez?= <12957664+enekofb@users.noreply.github.com>
Date: Wed, 26 Oct 2022 16:54:11 +0100
Subject: [PATCH 47/51] Update docs/rfcs/0003-pipelines-promotion/README.md

Co-authored-by: Max Jonas Werner <makkes@users.noreply.github.com>
---
 docs/rfcs/0003-pipelines-promotion/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
index 763c8d4..97242ec 100644
--- a/docs/rfcs/0003-pipelines-promotion/README.md
+++ b/docs/rfcs/0003-pipelines-promotion/README.md
@@ -30,7 +30,7 @@ could be the promotion target.
 ## Motivation

 Given a continuous delivery pipeline, the application goes via different environments in its way to production. We
-need an action to sign the intent of deploying an application between environments. That concept is generally known as a
+need an action to signal the intent of deploying an application between environments. That concept is generally known as a
 promotion. Current pipelines in weave gitops does not support promotion.
 ### Goals

From cf766abb161ac7ce3c64c8e5463b6e1475020bb5 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eneko=20Fern=C3=A1ndez?= <12957664+enekofb@users.noreply.github.com>
Date: Wed, 26 Oct 2022 16:54:21 +0100
Subject: [PATCH 48/51] Update docs/rfcs/0003-pipelines-promotion/README.md

Co-authored-by: Max Jonas Werner <makkes@users.noreply.github.com>
---
 docs/rfcs/0003-pipelines-promotion/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
index 97242ec..7a0f035 100644
--- a/docs/rfcs/0003-pipelines-promotion/README.md
+++ b/docs/rfcs/0003-pipelines-promotion/README.md
@@ -200,7 +200,7 @@ and take an action to start the next promotion based on the Pipeline definition.
 2. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues.
 3. There is no way to kick off promotions externally

-### Alternatives to process and execute promotions.
+### Alternatives to process and execute promotions

 They difference among them is around the component serving the promotion logic, therefore the alternatives names are based on it.

From 42aba6067ed4195d53598b00216d072c05a49998 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Eneko=20Fern=C3=A1ndez?= <12957664+enekofb@users.noreply.github.com>
Date: Wed, 26 Oct 2022 16:54:33 +0100
Subject: [PATCH 49/51] Update docs/rfcs/0003-pipelines-promotion/README.md

Co-authored-by: Max Jonas Werner <makkes@users.noreply.github.com>
---
 docs/rfcs/0003-pipelines-promotion/README.md | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
index 7a0f035..aa4de3e 100644
--- a/docs/rfcs/0003-pipelines-promotion/README.md
+++ b/docs/rfcs/0003-pipelines-promotion/README.md
@@ -144,7 +144,7 @@ This solution is the result of two different alternative evaluations:
 1. Alternatives to detect deployment changes.
 2. Alternatives to process and execute promotions.

-### Alternatives to detect deployment changes.
+### Alternatives to detect deployment changes

 This approach suggests the creation of [kubernetes watchers](https://kubernetes.io/docs/reference/using-api/api-concepts/#efficient-detection-of-changes) per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes

From 8a2b403c7ed4ea2e72ac1d0374a790537057055b Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Wed, 26 Oct 2022 17:08:10 +0100
Subject: [PATCH 50/51] pr comments

---
 docs/rfcs/0003-pipelines-promotion/README.md | 8 ++++----
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/docs/rfcs/0003-pipelines-promotion/README.md b/docs/rfcs/0003-pipelines-promotion/README.md
index aa4de3e..e83fef0 100644
--- a/docs/rfcs/0003-pipelines-promotion/README.md
+++ b/docs/rfcs/0003-pipelines-promotion/README.md
@@ -1,17 +1,17 @@
 # RFC-0003 Pipeline promotions

-**Status:** provisional
+**Status:** implementable

-**Creation date:** 2022-10-xx
+**Creation date:** 2022-10-26

-**Last update:** 2022-10-xx
+**Last update:** 2022-10-26

 ## Summary

 Given a continuous delivery pipeline, the application goes via different environments on its way to production. We
 need an action to signal the intent of deploying an application between environments. That concept is generally known as a
 promotion. Current pipelines in weave gitops does not support promotion. This RFC addresses this gap
-as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258).
+as specified in the [product initiative](#user-stories).
 ## Terminology

From f46679a6bce4ce5ea4d5c06f3deeef272c81cfd8 Mon Sep 17 00:00:00 2001
From: Eneko Fernandez <eneko@weave.works>
Date: Fri, 28 Oct 2022 13:29:16 +0100
Subject: [PATCH 51/51] changed status to merge

---
 docs/adrs/0013-pipelines-promotions.md | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/docs/adrs/0013-pipelines-promotions.md b/docs/adrs/0013-pipelines-promotions.md
index 12ff166..e2e715b 100644
--- a/docs/adrs/0013-pipelines-promotions.md
+++ b/docs/adrs/0013-pipelines-promotions.md
@@ -1,7 +1,8 @@
 # 13. Pipelines Promotions

 ## Status
-Proposed
+
+Accepted

 ## Context
 As part of Weave GitOps Enterprise, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a) where