adding adr for change detection

weaveworks · Sep 26, 2022 · bb0dc4a · bb0dc4a
1 parent 1e220d8
commit bb0dc4a
Show file tree

Hide file tree

Showing 2 changed files with 240 additions and 0 deletions.
diff --git a/docs/adrs/0013-pipelines-change-detection.md b/docs/adrs/0013-pipelines-change-detection.md
@@ -0,0 +1,28 @@
+# 12. Pipeline Promotions
+
+## Status
+
+Discovery in progress. This ADR currently is only to raise a draft PR to gather feedback.
+
+## Context
+
+As part of weave gitops, Sunglow team is working on deliverying Continuous Delivery Pipelines based in the following
+[initiative](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a)
+
+First iteration (v0.1) has covered the ability to view an application being deployed across different environments.
+
+Current iteration (v0.2) has a outcome to enable applicatio promotions by using weave gitops enteprrise
+
+Two main approaches are under discussion
+
+- [Promotions via Resource Watching](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
+- [Promotions via Webhooks](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
+
+
+## Decision
+
+TBA
+
+## Consequences
+
+TBA
diff --git a/docs/rfcs/0003-change-detection/README.md b/docs/rfcs/0003-change-detection/README.md
@@ -0,0 +1,212 @@
+# RFC-0003 Comparison of approaches to detect changes that trigger promotions
+
+<!--
+The title must be short and descriptive.
+-->
+
+**Status:** provisional
+
+<!--
+Status represents the current state of the RFC.
+Must be one of `provisional`, `implementable`, `implemented`, `deferred`, `rejected`, `withdrawn`, or `replaced`.
+-->
+
+**Creation date:** 2022-09-21
+
+**Last update:** 2022-09-21
+
+## Summary
+
+<!--
+One paragraph explanation of the proposed feature or enhancement.
+-->
+
+## Motivation
+
+<!--
+This section is for explicitly listing the motivation, goals, and non-goals of
+this RFC. Describe why the change is important and the benefits to users.
+-->
+
+### Terminology
+
+- **CD Pipeline**: A CD Pipeline declares a series of environments through which a given application is expected to be deployed.
+- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
+- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}.
+- **Application**: A Helm Release or an Image. 
+- **Progressive Delivery / Rollout**: Updates to an application flow from one environment to the next. This has no relationship to Flagger
+
+### Goals
+
+<!--
+List the specific goals of this RFC. What is it trying to achieve? How will we
+know that this has succeeded?
+-->
+
+### Non-Goals
+
+<!--
+What is out of scope for this RFC? Listing non-goals helps to focus discussion
+and make progress.
+-->
+
+
+## Comparison
+
+<!--
+This is where we get down to the specifics of what the proposal actually is.
+This should have enough detail that reviewers can understand exactly what
+you're proposing, but should not include things like API designs or
+implementation.
+
+If the RFC goal is to document best practices,
+then this section can be replaced with the the actual documentation.
+-->
+
+
+### Watcher approach
+
+This approach suggests the creation of a watcher per remote cluster. Each watcher would get notified whenever a Helm release in the remote cluster changes and take an action to start the next promotion based on the Pipeline definition.
+
+[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1481)
+
+#### Sequence diagram
+
+```mermaid
+   sequenceDiagram
+    actor U as operator
+    U->>+API Server: creates Pipeline
+    participant PC as Pipeline Controller
+    participant PS as Promotion Strategy
+    API Server->>+PC: notifies
+    participant dt1 as dev/target 1
+    
+    rect rgb(67, 207, 250)
+    note right of PC: setup phase
+    note right of PC: pipelines.wego.weave.works/name<br/>pipelines.wego.weave.works/env<br/>pipelines.wego.weave.works/target
+    PC->>+dt1: label AppRef with metadata
+    participant dt2 as dev/target 2
+    PC->>+dt2: label AppRef with metadata
+    participant pt1 as prod/target 1
+    PC->>+pt1: label AppRef with metadata
+    end
+   
+    rect rgb(50, 227, 221)
+    note right of PC: promotion phase
+    PC-->>+dt1: watches HelmRelease and Kustomizations changes
+    PC-->>+dt2: watches HelmRelease and Kustomizations changes
+    PC-->>+pt1: watches HelmRelease and Kustomizations changes
+    end
+
+
+    dt1->>+PC: update events from AppRef
+    PC ->>PC: filter upgrade events 
+    PC ->>PC: extract metadata 
+    PC->>+PS: kicks off
+  ```
+
+#### Advantages
+
+1. Plug n play: no further configurations or setup is needed to get updates.
+1. Simple authentication: No need to worry about who triggered the event, since we are talking directly with the target.
+
+
+#### Disadvantages and Mitigations
+
+1. Requires Flux on all leaf clusters.
+1. Scalability is unclear, we don't know the threshold at which the controller will be able to handle without issues.
+1. There is no way to kick off promotions externally 
+
+### Alert approach
+
+This approach suggests the use of Flux notification controller running on the remote cluster. An alert/provider CR can be setup to call a webhook running on the management cluster to notify the management cluster of a Helm release change in a remote cluster.
+
+[Tracking issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1487)
+
+
+#### Sequence diagram
+
+```mermaid
+  sequenceDiagram
+    actor U as operator
+    U->>+API Server: creates Pipeline ns1/p1
+    participant PC as Pipeline Controller
+    participant PS as Promotion Strategy
+    API Server->>+PC: notifies
+    participant dt1 as dev/target 1
+    rect rgb(67, 207, 250)
+    note right of PC: alerting setup phase
+    PC->>+dt1: creates Provider /ns1/p1/dev
+    PC->>+dt1: creates Alert
+    participant dt2 as dev/target 2
+    PC->>+dt2: creates Provider /ns1/p1/dev
+    PC->>+dt2: creates Alert
+    participant pt1 as prod/target 1
+    PC->>+pt1: creates Provider
+    PC->>+pt1: creates Alert
+    end
+    rect rgb(50, 227, 221)
+    note right of PC: promotion phase
+    dt1->>+PC: sends Event to /ns1/p1/dev
+    PC->>+PS: kicks off
+    PS->>+pt1: promotes app
+    end
+  ```
+
+#### Example Event
+
+```json
+{
+  "involvedObject": {
+    "kind": "HelmRelease",
+    "namespace": "flux-system",
+    "name": "metallb",
+    "uid": "57c3579b-42da-4f27-afc5-8bd7778286e1",
+    "apiVersion": "helm.toolkit.fluxcd.io/v2beta1",
+    "resourceVersion": "155540"
+  },
+  "severity": "info",
+  "timestamp": "2022-09-13T16:01:01Z",
+  "message": "Helm upgrade succeeded",
+  "reason": "info",
+  "metadata": {
+    "revision": "0.13.4",
+    "summary": "foobar"
+  },
+  "reportingController": "helm-controller",
+  "reportingInstance": "helm-controller-7cdc7874f8-9qpft"
+}
+```
+
+#### Advantages
+
+1. Simplicity: Uses Flux functionality as much as possible
+1. Flexiblility: Promotion can be kicked off from external systems by calling the webhook
+1. Flexibility: Promotion can be exercised by an external system
+
+#### Disadvantages and Mitigations
+
+1. Requires Flux on all leaf clusters. _Mitigations: ?_
+1. Authenticity of events needs to be taken care of. _Mitigations: add authentication to webhook; verify event by reaching out to leaf cluster_
+1. Network connectivity from all leaf clusters to management cluster necessary. _Mitigations: promotion can be kicked from any external system so if using notification-controller would not work, an external CI system could trigger promotion instead._
+
+#### Known Unknowns
+
+1. How does p-c set the correct Provider address?
+   2. Configuration (user burden)
+   3. Automatic determination (might get complicated quick to account for the different environments (with/without Ingress, external LB, ...)
+1. How does p-c create the Provider/Alert resources? If it creates them directly by going through the target clusters' API server then it doesn't have a way of making sure they don't get modified/deleted (owner references don't work cross-cluster). Having them be committed to Git can be very complicated as the controller would have to know (1) wich Git repository to commit them to, (2) in wich location to put them, (3) if there's a `kustomization.yaml` that would have to be patched. An alternative could be to use a [remote Kustomization](https://fluxcd.io/flux/components/kustomize/kustomization/#remote-clusters--cluster-api) and the management cluster's Git repository.
+
+#### Further Considerations
+
+##### delivery semantics/failure scenarios recovery for notifications
+
+The notification-controller is using [rate limiting](https://fluxcd.io/flux/components/notification/options/) that's only configurable globally with a default of 5m. This might lead to events not being emitted to the webhook.
+
+notification-controller has [at-most once delivery semantics](https://github.com/fluxcd/notification-controller/tree/main/docs/spec#events-dispatching-1):
+
+> The alert delivery method is at-most once with a timeout of 15 seconds. The controller performs automatic retries for connection errors and 500-range response code. If the webhook receiver returns an error, the controller will retry sending an alert for four times with an exponential backoff of maximum 30 seconds.
+
+#### enrichment of events for custom metadata
+
+The [Alert spec](https://fluxcd.io/flux/components/notification/alert/) allows for custom metadata to be added to events by means of the `.spec.summary` field. The content of this field will be added to the event's `.metadata` map with the key "summary".