Pipeline promotions designs #76

Merged on Oct 28, 2022 (53 commits)
bb0dc4a
adding adr for change detection
enekofb Sep 26, 2022
fb57569
adding adr
enekofb Sep 26, 2022
126df4d
consequences added
enekofb Sep 26, 2022
5ec346c
slicing promotion designs
enekofb Oct 5, 2022
f28fc7b
updated ADR
enekofb Oct 5, 2022
9e41851
updated notification change rfc
enekofb Oct 5, 2022
49860d3
detect deployment changes
enekofb Oct 5, 2022
0a6a5e7
added ADR for promotions solution
enekofb Oct 5, 2022
7c70e6b
unified promotions rfcs into one
enekofb Oct 6, 2022
e2b44a5
rfc promotions completed
enekofb Oct 6, 2022
ad93f0a
section on nfrs
enekofb Oct 6, 2022
d4165b1
RFC and ADR ready for draft
enekofb Oct 6, 2022
ee90b89
adr updated
enekofb Oct 7, 2022
21a4bab
promotions rfc reviewed
enekofb Oct 7, 2022
0ff3a15
change detection rfc updated
enekofb Oct 7, 2022
261354f
completing arguments for pipeline controller
enekofb Oct 10, 2022
104f79f
wording reviewed
enekofb Oct 10, 2022
4d43115
updated ADR justification
enekofb Oct 10, 2022
d532d57
wording from PR review
enekofb Oct 10, 2022
f59cc58
Fix typos
yiannistri Oct 10, 2022
faa409d
Fix typos
yiannistri Oct 10, 2022
50353c8
update ADR - typos + more context
Oct 10, 2022
d7750fb
typos
Oct 10, 2022
bb8d0a0
removed the part of the motivation that doesnt talk on problem state…
enekofb Oct 11, 2022
d5940d9
updated the token section to reflect the supported scenario
enekofb Oct 11, 2022
d53e62f
added cons for api layer
enekofb Oct 11, 2022
472ea7a
Merge remote-tracking branch 'origin/promotions-comparison' into prom…
enekofb Oct 11, 2022
dbbcd59
a bit more wordings around last alertnative
enekofb Oct 11, 2022
c3c0963
Small tweaks
yiannistri Oct 11, 2022
10fe75b
rfc split in: overview doc + deeper detail doc by stage in the path
enekofb Oct 12, 2022
84c130f
execute promotion document reviewed
enekofb Oct 12, 2022
3c8807d
detect deployment changes refactored
enekofb Oct 13, 2022
a43bc5f
reviewed promotion needs section / document
enekofb Oct 13, 2022
2227e0f
execute promotions
enekofb Oct 13, 2022
8431f05
pr suggestions applied (mostly)
enekofb Oct 13, 2022
a5ba247
Merge remote-tracking branch 'origin/promotions-comparison' into prom…
enekofb Oct 13, 2022
a6897c7
started with scenarios
enekofb Oct 13, 2022
34e6762
scenarios covered
enekofb Oct 13, 2022
d24e001
scenarios reviewed
enekofb Oct 13, 2022
234a61d
some more definitions
enekofb Oct 13, 2022
a87a2c6
hmac security updated
enekofb Oct 14, 2022
712b4ae
updated reliability section
enekofb Oct 14, 2022
4f651ba
strategies reviewed
enekofb Oct 14, 2022
c2756e4
added modularity reason
enekofb Oct 24, 2022
79f4bfd
Update docs/rfcs/0003-pipelines-promotion/execute-promotion.md
enekofb Oct 25, 2022
d5bc894
Update docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
enekofb Oct 25, 2022
2623f38
changed comment
enekofb Oct 25, 2022
cdc7f87
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
5d56f99
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
cf766ab
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
42aba60
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
8a2b403
pr comments
enekofb Oct 26, 2022
f46679a
changed status to merge
enekofb Oct 28, 2022
66 changes: 66 additions & 0 deletions docs/adrs/0013-pipelines-promotions.md
@@ -0,0 +1,66 @@
# 13. Pipelines Promotions

## Status
Proposed

## Context
As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a). The
[first iteration has been delivered](https://docs.gitops.weave.works/docs/next/enterprise/pipelines/intro/index.html),
covering the ability to view an application deployed across different environments.

The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims
to enable promotions between environments.

This ADR records the major decisions taken during its design.

## Decision

### Promotions solution

As [discussed in the RFC](../rfcs/0003-pipelines-promotion/README.md), four alternatives were considered:

- weave gitops
- pipeline controller
- weave gitops + pipeline controller + promotion executor
- new service

The `pipeline controller` solution has been chosen over its alternatives (see the alternatives section) because:

- it enables promotions.
- it allows separation of roles, and therefore of permissions, between the component notifying the change and the component executing the promotion.
- it is easier to develop than the other alternatives.

On the flip side, the solution has the following constraints:

- the endpoint for deployment changes needs to be managed and exposed separately from the weave gitops api.
- it is a non-canonical usage of a controller, as its behaviour is driven by ingested events rather than by changes in the declared state of a resource.


### Deployment Change

As [discussed in the RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md), each of the approaches has associated unknowns.
The major ones are:

- Webhooks: the need for a new network flow in the product, from leaf cluster to management cluster, the potential impediments
that it would pose for customers adopting the solution, and its security management.
- Watching: how reliable the solution would be, given that there are no existing examples of products using it to watch remote clusters.

We envision weave gitops as a flexible solution that would eventually need to support both approaches
to accommodate the range of potential enterprises using it.

To begin with one of the approaches, we have decided to start with the `webhooks` solution because it:

- Allows us to provide promotions for wge customers based on our own promotions capability, with a better scalability approach.
- Reinforces the vision of weave gitops being a continuum of Flux by using Flux core components, in this context the [notification
controller](https://fluxcd.io/flux/components/notification/), to provide the basic building blocks around deployment notification.

## Consequences

- A path forward for pipelines to deliver promotions capability. Sunglow could deliver promotions based on this approach.
- A risk to manage in the context of customer adoption: the network path opened.
- Sunglow would need to establish the customer feedback loop with SAs/CXs to manage and mitigate the risk once it happens.
- Same for security.
- A scenario to develop further: integration with existing CI systems based on this approach. Sunglow would need to use customer feedback to
determine which existing systems are relevant to the integration experience.


284 changes: 284 additions & 0 deletions docs/rfcs/0003-pipelines-promotion/README.md
@@ -0,0 +1,284 @@
# RFC-0003 Pipeline promotions

**Status:** provisional

**Creation date:** 2022-10

**Last update:** 2022-10-05

## Summary

In a continuous delivery pipeline, an application passes through different environments on its way to production. We
need an action to effectively move applications between environments. That concept is generally known as a
promotion. Current pipelines in weave gitops do not support promotion. This RFC addresses this gap
as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258).

## Terminology

- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed.
- **Promotion**: the action of moving an application from a lower environment to a higher environment within a pipeline.
For example, promoting staging to production would attempt to deploy an application existing in the staging environment to the production environment.
- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment, could contain {[QA-1, test], [QA-2, test]}.
- **Application**: A Helm Release.
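
The terminology above can be read as a small data model. The following is only an illustrative sketch in Python, not the actual Pipeline API: the class and field names are assumptions made for this example, and the `Staging` values are the ones used in the definitions above.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class DeploymentTarget:
    """A cluster and namespace combination."""
    cluster: str
    namespace: str


@dataclass(frozen=True)
class Environment:
    """One or more deployment targets grouped under a name."""
    name: str
    targets: Tuple[DeploymentTarget, ...]


@dataclass(frozen=True)
class Pipeline:
    """An application and the ordered environments it is deployed through."""
    application: str                       # the Helm release name
    environments: Tuple[Environment, ...]  # ordered from lowest to highest


# The "Staging" example from the terminology list.
staging = Environment(
    "Staging",
    (DeploymentTarget("QA-1", "test"), DeploymentTarget("QA-2", "test")),
)
pipeline = Pipeline("podinfo", (staging,))
```

Keeping the environments in an ordered sequence is what gives "lower" and "higher" environment a meaning in the definition of a promotion.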

## Motivation

In a continuous delivery pipeline, an application passes through different environments on its way to production. We
need an action to effectively move applications between environments. That concept is generally known as a
promotion. Current pipelines in weave gitops do not support promotion. This RFC addresses this gap
as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258).

### Goals

- Design the e2e solution for promotions on weave gitops pipelines.
- Should support the [scenarios identified](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#5b514ad575544595b1028d73e5b6dd23)

### Non-Goals

- Anything beyond the scope of promotions.

Review comment (Member): This shouldn't be worth mentioning as the title, summary and motivation already make this clear.

Reply (Contributor): please expand here on the request?

- Scenarios other than those identified in the product initiative.

## Proposal

The proposed solution architecture is shown below.

//TODO: review diagram
```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Leaf Cluster
    participant MC as Management Cluster
    participant pc as Pipeline Controller
    participant k8s as Kubernetes Api
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>MC: notify deployment via notification controller
    MC->>pc: process deployment notification
    pc->>k8s: get pipeline
    pc->>pc: promotion business logic
    pc->>configRepo: raise PR
```

The solution has three main responsibilities:

1. Notify deployment changes
2. Determine whether a promotion is needed
3. Execute the promotion

### Notify deployment changes

The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility.
An evaluation of alternative solutions to this concern can be found [here](detect-deployment-changes.md).

### Determine whether a promotion is needed

This responsibility is assumed by the pipeline controller living in the management cluster, which would:
- expose a webhook to ingest deployment change events.
- process these requests concurrently.
- determine, based on the event and the pipeline definition, whether a promotion is required.
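
The decision step above could look roughly like the following. This is a minimal sketch in Python, not the controller's implementation: the `DeploymentEvent` shape, the helper names and the rule "promote to the environment that follows the one reporting a successful deployment" are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DeploymentEvent:
    """Hypothetical payload of a deployment change notification."""
    app: str          # name of the Helm release that was deployed
    environment: str  # environment where the deployment happened
    version: str      # version that was deployed
    succeeded: bool   # whether the deployment succeeded


def next_environment(environments: List[str], current: str) -> Optional[str]:
    """Return the environment that follows `current`, or None if there is none."""
    if current not in environments:
        return None
    idx = environments.index(current)
    return environments[idx + 1] if idx + 1 < len(environments) else None


def promotion_target(pipeline_app: str, environments: List[str],
                     event: DeploymentEvent) -> Optional[str]:
    """Return the environment to promote to, or None when no promotion is needed."""
    # Ignore events for other applications and failed deployments.
    if event.app != pipeline_app or not event.succeeded:
        return None
    return next_environment(environments, event.environment)


if __name__ == "__main__":
    envs = ["dev", "staging", "production"]
    event = DeploymentEvent("podinfo", "staging", "6.2.0", True)
    print(promotion_target("podinfo", envs, event))  # production
```

A deployment in the last environment, a failed deployment, or an event for an unrelated application all yield `None`, meaning no promotion is triggered.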

### Execute the promotion

Once the previous evaluation determines that a promotion is required, the pipeline controller would be in charge
of orchestrating and executing the task according to its configuration.

Review comment (Member): What does "according to its configuration" mean here? Where is that configuration defined?

Reply (Contributor): detailed a bit more, let me know in case is not clear enough, a suggestion to add


The current solution has been chosen over its alternatives (see the alternatives section) because:

- it enables promotions.
- it allows separation of roles, and therefore of permissions, between the component notifying the change and the component executing the promotion.
- it is easier to develop than the other alternatives.

On the flip side, the solution has the following constraints:

- the endpoint for deployment changes needs to be managed and exposed separately from the weave gitops api.
- it is a non-canonical usage of a controller, as its behaviour is driven by ingested events rather than by changes in the declared state of a resource.

### Non-functional requirements

As this is an enterprise feature, we also consider non-functional requirements to ensure
that no major impediments are found in the future.

#### Security

Promotions involve two activities that need a closer look in terms of security:

1. communicating deployment changes via webhook, and therefore over the network.
2. creating pull requests, which requires write access to the gitops configuration repo.

Review comment (Member): As I said above, the promotion flow itself might not be part of this RFC (which I would in fact support) so security concerns around it shouldn't be in this RFC as well.

Reply (Contributor): refactored this part of the document to make it more general and moved the pr creation to a promotions strategy document.

**Security for deployment changes via webhook**

Review comment (Member): As I said above, the promotion flow itself might not be part of this RFC (which I would in fact support) so security concerns around it shouldn't be in this RFC as well.

Reply (Contributor): not sure if we should split the security concerns of the solution in a different RFC. As scoped, this RFC aims to design the e2e solution for promotion which sounds sensible to include security concerns.

//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594

**Security for pull request creation**

//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594

#### Scalability

The initial strategy to scale the solution with the number of requests would be vertical, using goroutines.

Review comment (Member): Nothing prevents us from scaling horizontally. It's just a webhook. We can scale it up to a million replicas if the need arises in a certain environment. Why is that not mentioned here?

Reply (Contributor): yup, i think it is fair adding it.

#### Reliability

Reliability will be implemented as part of the business logic of the pipeline controller.

Review comment (Member): What does this mean?

Reply (Contributor): tried to expand here with more details

#### Monitoring

The solution would leverage the existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). The default
controller metrics would need to be enhanced with business metrics such as `latency of a promotion by application`.

## Alternatives

Other alternative solutions have been identified and discussed:

- Alternative A: weave gitops api
- Alternative B: weave gitops api + pipeline controller + promotion executor
- Alternative C: new service called promotions service

### Alternative A: weave gitops api

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Leaf Cluster
    participant MC as Management Cluster
    participant WGE as Weave Gitops Backend
    participant k8s as Kubernetes Api
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>MC: notify deployment via notification controller
    MC->>WGE: process deployment notification
    WGE->>k8s: get pipeline
    WGE->>WGE: promotion business logic
    WGE->>configRepo: raise PR
```

This solution is different from `pipeline controller` in that the three responsibilities

1. Notify deployment changes
2. Determine whether a promotion is needed
3. Execute the promotion

are fulfilled within weave gitops backend app.

**Pro**
- Already set up and *should* be more easily exposed.
- No additional exposed surface to manage, therefore less to secure.
- No need to generate a TS client.

**Cons**
- Notifier service account needs permissions for promotion resources.

### Alternative B: weave gitops api + pipeline controller + promotion executor

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Leaf Cluster
    participant MC as Management Cluster
    participant WGE as Weave Gitops Backend
    participant k8s as Kubernetes Api
    participant pc as pipeline controller
    participant pj as promotion job
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>MC: notify deployment via notification controller
    MC->>WGE: process deployment notification
    WGE->>k8s: write deployment event
    k8s->>pc: watch deployment event & pipelines
    pc->>pj: create promotion job
    pj->>pj: promotion business logic
    pj->>configRepo: raise PR
```

This solution is different from `pipeline controller` in that the three responsibilities are split

1. Notify deployment changes: ingestion is done via weave gitops api.
2. Determine whether a promotion is needed: pipeline controller watches for changes in pipeline.
3. Execute the promotion: extracted to a kubernetes job layer.

**Pro**
- Already set up and *should* be more easily exposed.
- No additional exposed surface to manage, therefore less to secure.
- No need to generate a TS client.
- Separation of concerns, with scalability and fault-tolerance by design.

**Cons**
- Needs to write to the pipeline resource.
- Most complex solution.
- Kubernetes jobs are not a popular choice.

### Alternative C: new service called promotions service

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Leaf Cluster
    participant MC as Management Cluster
    participant PS as Promotions Svc
    participant k8s as Kubernetes Api
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>MC: notify deployment via notification controller
    MC->>PS: process deployment notification
    PS->>k8s: get pipeline
    PS->>PS: promotion business logic
    PS->>configRepo: raise PR
```
**Pro**
- Easiest to develop against.

**Cons**
- One more component for the team to maintain.
- New repo/CI (?)

## Design Details

### Promotions Webhook

//TBA added at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594

### Pipeline spec changes for promotions

In order to accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below

```yaml
apiVersion: pipelines.weave.works/v1alpha1
kind: Pipeline
metadata:
  name: podinfo
  namespace: default
spec:
  appRef:
    apiVersion: helm.toolkit.fluxcd.io/v2beta1
    kind: HelmRelease
    name: podinfo
  promotion:
    - name: promote-via-pr
      type: pull-request
      url: git@github.com:organization/repo
      branch: main
      secretRef: my-other-deployed-secret
  environments:
    - name: dev
      targets:
        - namespace: podinfo
          clusterRef:
            kind: GitopsCluster
            name: dev
```
The `promotion` field is used to capture the promotion tasks for the next environment in the pipeline after a successful deployment has taken place.
Each task will include the following fields:

- `name`: the task name
- `type`: the task type, either `webhook` or `pull-request`
- `url`: the git repository url or the webhook url
- `branch`: the branch to use for the update, defaults to `main` (only applicable when `type` is `pull-request`)
- `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook.
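
To make the field rules above concrete, the following is a hedged sketch in Python of how one entry of the `promotion` list might be validated and defaulted. It is not part of the proposed controller: `PromotionTask` and `parse_task` are hypothetical names, and treating every field except `branch` as required is an assumption that the text above does not state.

```python
from dataclasses import dataclass
from typing import Any, Dict, Optional

# The two task types named in the field list.
VALID_TYPES = {"pull-request", "webhook"}


@dataclass(frozen=True)
class PromotionTask:
    name: str
    type: str
    url: str
    secret_ref: str
    branch: Optional[str] = None  # only meaningful for pull-request tasks


def parse_task(raw: Dict[str, Any]) -> PromotionTask:
    """Validate one entry of the `promotion` list and apply defaults."""
    # Assumption: all of these fields are mandatory.
    for key in ("name", "type", "url", "secretRef"):
        if not raw.get(key):
            raise ValueError(f"promotion task is missing required field {key!r}")
    if raw["type"] not in VALID_TYPES:
        raise ValueError(f"unsupported promotion type {raw['type']!r}")
    branch = None
    if raw["type"] == "pull-request":
        # `branch` defaults to main and is ignored for webhook tasks.
        branch = raw.get("branch") or "main"
    return PromotionTask(raw["name"], raw["type"], raw["url"],
                         raw["secretRef"], branch)


task = parse_task({
    "name": "promote-via-pr",
    "type": "pull-request",
    "url": "git@github.com:organization/repo",
    "secretRef": "my-other-deployed-secret",
})
```

With the input above, `task.branch` resolves to `main` because no branch was given; a task of type `webhook` would carry no branch at all.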

## Implementation History

- [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589)