Pipeline promotions designs #76

Merged
merged 53 commits into from
Oct 28, 2022
bb0dc4a
adding adr for change detection
enekofb Sep 26, 2022
fb57569
adding adr
enekofb Sep 26, 2022
126df4d
consequences added
enekofb Sep 26, 2022
5ec346c
slicing promotion designs
enekofb Oct 5, 2022
f28fc7b
updated ADR
enekofb Oct 5, 2022
9e41851
updated notification change rfc
enekofb Oct 5, 2022
49860d3
detect deployment changes
enekofb Oct 5, 2022
0a6a5e7
added ADR for promotions solution
enekofb Oct 5, 2022
7c70e6b
unified promotions rfcs into one
enekofb Oct 6, 2022
e2b44a5
rfc promotions completed
enekofb Oct 6, 2022
ad93f0a
section on nfrs
enekofb Oct 6, 2022
d4165b1
RFC and ADR ready for draft
enekofb Oct 6, 2022
ee90b89
adr updated
enekofb Oct 7, 2022
21a4bab
promotions rfc reviewed
enekofb Oct 7, 2022
0ff3a15
change detection rfc updated
enekofb Oct 7, 2022
261354f
completing arguments for pipeline controller
enekofb Oct 10, 2022
104f79f
wording reviewed
enekofb Oct 10, 2022
4d43115
updated ADR justification
enekofb Oct 10, 2022
d532d57
wording from PR review
enekofb Oct 10, 2022
f59cc58
Fix typos
yiannistri Oct 10, 2022
faa409d
Fix typos
yiannistri Oct 10, 2022
50353c8
update ADR - typos + more context
Oct 10, 2022
d7750fb
typos
Oct 10, 2022
bb8d0a0
removed the part of the motivation that doesnt talk on problem state…
enekofb Oct 11, 2022
d5940d9
updated the token section to reflect the supported scenario
enekofb Oct 11, 2022
d53e62f
added cons for api layer
enekofb Oct 11, 2022
472ea7a
Merge remote-tracking branch 'origin/promotions-comparison' into prom…
enekofb Oct 11, 2022
dbbcd59
a bit more wordings around last alertnative
enekofb Oct 11, 2022
c3c0963
Small tweaks
yiannistri Oct 11, 2022
10fe75b
rfc split in: overview doc + deeper detail doc by stage in the path
enekofb Oct 12, 2022
84c130f
execute promotion document reviewed
enekofb Oct 12, 2022
3c8807d
detect deployment changes refactored
enekofb Oct 13, 2022
a43bc5f
reviewed promotion needs section / document
enekofb Oct 13, 2022
2227e0f
execute promotions
enekofb Oct 13, 2022
8431f05
pr suggestions applied (mostly)
enekofb Oct 13, 2022
a5ba247
Merge remote-tracking branch 'origin/promotions-comparison' into prom…
enekofb Oct 13, 2022
a6897c7
started with scenarios
enekofb Oct 13, 2022
34e6762
scenarios covered
enekofb Oct 13, 2022
d24e001
scenarios reviewed
enekofb Oct 13, 2022
234a61d
some more definitions
enekofb Oct 13, 2022
a87a2c6
hmac security updated
enekofb Oct 14, 2022
712b4ae
updated reliability section
enekofb Oct 14, 2022
4f651ba
strategies reviewed
enekofb Oct 14, 2022
c2756e4
added modularity reason
enekofb Oct 24, 2022
79f4bfd
Update docs/rfcs/0003-pipelines-promotion/execute-promotion.md
enekofb Oct 25, 2022
d5bc894
Update docs/rfcs/0003-pipelines-promotion/detect-deployment-changes.md
enekofb Oct 25, 2022
2623f38
changed comment
enekofb Oct 25, 2022
cdc7f87
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
5d56f99
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
cf766ab
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
42aba60
Update docs/rfcs/0003-pipelines-promotion/README.md
enekofb Oct 26, 2022
8a2b403
pr comments
enekofb Oct 26, 2022
f46679a
changed status to merge
enekofb Oct 28, 2022
68 changes: 68 additions & 0 deletions docs/adrs/0013-pipelines-promotions.md
@@ -0,0 +1,68 @@
# 13. Pipelines Promotions

## Status
Proposed

## Context
As part of weave gitops, Sunglow is working on delivering [Continuous Delivery Pipelines](https://www.notion.so/weaveworks/CD-Pipeline-39a6df44798c4b9fbd140f9d0df1212a).
The [first iteration has been delivered](https://docs.gitops.weave.works/docs/next/enterprise/pipelines/intro/index.html),
covering the ability to view an application deployed across different environments.

The [second iteration](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258) aims
to enable promotions between environments.

This ADR records two decisions we think are important:

- what the promotions solution looks like end to end.
- how deployment changes are detected.

## Decision

### What the promotions solution looks like end to end

As [discussed in the RFC](../rfcs/0003-pipelines-promotion/README.md), four alternatives were considered:

- weave gitops
- pipeline controller
- weave gitops + pipeline controller + promotion executor
- new service

The `pipeline controller` solution has been chosen over its alternatives (see the alternatives section of the RFC) because:

- it enables promotions.
- it allows a separation of roles, and therefore of permissions, between the component notifying the change and the component executing the promotion.
- it is easier to develop than the other alternatives.

On the flip side, the solution has the following constraints:

- the endpoint for deployment changes must be managed and exposed separately from the weave gitops api.
- it is a non-canonical use of a controller, as its behaviour is driven by ingested events rather than by changes in the declared state of a resource.

### How deployment changes are detected

Review comment (Contributor, @enekofb, Oct 13, 2022): I would consider removing this section to simplify the ADR to just have a decision at the level for the solution.

As [discussed in the RFC](../rfcs/0003-pipelines-promotion/detect-deployment-changes.md), each of the approaches has associated unknowns.

The major ones are:

- Webhooks: the need for a new network flow in the product, from leaf cluster to management cluster, the potential impediments
that this would pose for customers adopting the solution, and its security management.
- Watching: how reliable the solution could be, given that there are no existing examples of products using it to watch remote clusters.

We envision weave gitops as a flexible solution that will eventually need to support both approaches
to accommodate the range of potential enterprises using weave gitops.

To begin with one of the approaches, we have decided to start with the `webhooks` solution because:

- It allows us to provide promotions for wge customers based on our own promotions capability, with a more scalable approach.
- It reinforces the vision of weave gitops being a continuum of Flux by using Flux core components, in this context the
[notification controller](https://fluxcd.io/flux/components/notification/), to provide the basic building blocks around deployment notification.

## Consequences

- A path forward for pipelines to deliver the promotions capability. Sunglow could deliver promotions based on this approach.
- A set of risks that need management through further actions:
  - To manage the risk associated with the network flow from leaf to management cluster for deployment notifications.
  - To determine the concrete CI scenarios that we need to integrate with.
  - To discover the reliability aspects of the watchers approach to understand its feasibility.


298 changes: 298 additions & 0 deletions docs/rfcs/0003-pipelines-promotion/README.md
@@ -0,0 +1,298 @@
# RFC-0003 Pipeline promotions

**Status:** provisional

**Creation date:** 2022-10

**Last update:** 2022-10-05

## Summary

In a continuous delivery pipeline, an application passes through different environments on its way to production. We
need an action that signals the intent to deploy an application from one environment to the next. That concept is generally known as a
promotion. Pipelines in weave gitops currently do not support promotion. This RFC addresses this gap
as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258).

## Terminology

- **Pipeline**: a continuous delivery Pipeline declares a series of environments through which a given application is expected to be deployed.
- **Promotion**: the action of moving an application from a lower environment to a higher environment within a pipeline.
For example, promoting staging to production would attempt to deploy an application existing in the staging environment to the production environment.
- **Environment**: An environment consists of one or more deployment targets. An example environment could be “Staging”.
- **Deployment target**: A deployment target is a Cluster and Namespace combination. For example, the above “Staging” environment could contain {[QA-1, test], [QA-2, test]}.
- **Application**: A Helm Release.

## Motivation

In a continuous delivery pipeline, an application passes through different environments on its way to production. We
need an action that signals the intent to deploy an application from one environment to the next. That concept is generally known as a
promotion. Pipelines in weave gitops currently do not support promotion. This RFC addresses this gap
as specified in the [product initiative](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258).

### Goals

- Design the e2e solution for promotions on weave gitops pipelines.
- Support the [scenarios identified](https://www.notion.so/weaveworks/Pipeline-promotion-061bb790e2e345cbab09370076ff3258#5b514ad575544595b1028d73e5b6dd23) in the product initiative.

### Non-Goals

- Anything beyond the scope of promotions.

Review comment (Member): This shouldn't be worth mentioning as the title, summary and motivation already make this clear.

Review comment (Contributor): please expand here on the request?

- Scenarios other than those identified in the product initiative.

## Proposal
We propose to use a solution as specified in the following diagram.

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Notification Controller (Leaf)
    participant pc as Pipeline Controller (Management)
    participant k8s as Kubernetes Api
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>pc: send deployment change event
    pc->>pc: authz and validate event
    pc->>k8s: get pipeline
    pc->>pc: promotion business logic
    pc->>configRepo: raise PR
```

The solution comprises three main activities:

1. Notify deployment changes
2. Determine whether a promotion is needed
3. Execute the promotion

### Notify deployment changes

The solution leverages [flux native notification capabilities](https://fluxcd.io/flux/components/notification/) for this responsibility.
An evaluation of alternative solutions to this concern can be found [here](detect-deployment-changes.md).

### Determine whether a promotion is needed

This responsibility is assumed by the `pipeline controller`, living in the management cluster, which would:
- expose a webhook to ingest deployment change events.
- process deployment events concurrently.
- determine, based on the event and the pipeline definition, whether a promotion is required.
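
As an illustration of the last step, the following is a minimal sketch of how the decision could be made. It assumes that the environments of a pipeline are ordered from lowest to highest and that a promotion is needed when the next environment does not yet run the version reported by the event. The `Environment` and `DeploymentEvent` types and the `nextPromotion` function are hypothetical and not part of the pipeline controller API.

```go
package main

import "fmt"

// Environment is a hypothetical, simplified view of a pipeline environment.
type Environment struct {
	Name    string
	Version string // application version currently deployed (assumption)
}

// DeploymentEvent is a hypothetical payload for a deployment change event.
type DeploymentEvent struct {
	Environment string
	Version     string
}

// nextPromotion returns the name of the environment to promote to, and true,
// when the version in the event should move to the next environment.
// envs is assumed to be ordered from the lowest to the highest environment.
func nextPromotion(envs []Environment, ev DeploymentEvent) (string, bool) {
	for i, env := range envs {
		if env.Name != ev.Environment {
			continue
		}
		if i == len(envs)-1 {
			return "", false // last environment: nothing to promote to
		}
		next := envs[i+1]
		if next.Version == ev.Version {
			return "", false // next environment already runs this version
		}
		return next.Name, true
	}
	return "", false // the event refers to an environment outside the pipeline
}

func main() {
	envs := []Environment{{"dev", "1.1.0"}, {"staging", "1.0.0"}, {"prod", "1.0.0"}}
	target, ok := nextPromotion(envs, DeploymentEvent{Environment: "dev", Version: "1.1.0"})
	fmt.Println(target, ok) // prints "staging true"
}
```

In a real controller the deployed versions would have to be read from the clusters or from the pipeline status; here they are passed in directly to keep the sketch self-contained.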

### Execute the promotion

Once the previous evaluation determines that a promotion is required, the pipeline controller would be in charge
of orchestrating and executing the task according to its configuration.

Review comment (Member): What does "according to its configuration" mean here? Where is that configuration defined?

Review comment (Contributor): detailed a bit more, let me know in case is not clear enough, a suggestion to add


### Non-functional requirements

As this is an enterprise feature, we also try to understand the non-functional requirements to ensure
that no major impediments are found in the future.

#### Security

Promotions involve two activities that require a closer look in terms of security:

1. communicating deployment changes via webhook, and therefore over the network.
2. creating pull requests, which requires write access to the gitops configuration repo.

Review comment (Member): As I said above, the promotion flow itself might not be part of this RFC (which I would in fact support) so security concerns around it shouldn't be in this RFC as well.

Review comment (Contributor): refactored this part of the document to make it more general and moved the pr creation to a promotions strategy document.


**Security for deployment changes via webhook**

Review comment (Member): As I said above, the promotion flow itself might not be part of this RFC (which I would in fact support) so security concerns around it shouldn't be in this RFC as well.

Review comment (Contributor): not sure if we should split the security concerns of the solution in a different RFC. As scoped, this RFC aims to design the e2e solution for promotion which sounds sensible to include security concerns.


//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594

**Security for pull request creation**

//TODO complete at the back of https://github.com/weaveworks/weave-gitops-enterprise/issues/1594

#### Scalability

The initial strategy for scaling the solution with the number of requests would be vertical, using goroutines.
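
A minimal sketch of what vertical scaling with goroutines could look like is shown below: a fixed number of worker goroutines drain a queue of deployment events. The `Event` type, the `processAll` function and the worker count are assumptions made for illustration only.

```go
package main

import (
	"fmt"
	"sync"
)

// Event is a hypothetical deployment change event.
type Event struct {
	App     string
	Version string
}

// processAll handles the events concurrently using at most `workers`
// goroutines and returns the number of events that were handled.
func processAll(events []Event, workers int, handle func(Event)) int {
	if workers < 1 {
		workers = 1 // always keep one worker so the queue is drained
	}
	queue := make(chan Event)
	var wg sync.WaitGroup
	var mu sync.Mutex
	handled := 0

	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range queue {
				handle(ev)
				mu.Lock()
				handled++
				mu.Unlock()
			}
		}()
	}
	for _, ev := range events {
		queue <- ev
	}
	close(queue) // lets the workers exit once the queue is empty
	wg.Wait()
	return handled
}

func main() {
	events := []Event{{"podinfo", "6.2.0"}, {"podinfo", "6.2.1"}, {"search", "1.0.0"}}
	n := processAll(events, 2, func(ev Event) {})
	fmt.Println("handled", n) // prints "handled 3"
}
```

Bounding the number of workers keeps memory and CPU usage predictable within a single replica.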

Review comment (Member): Nothing prevents us from scaling horizontally. It's just a webhook. We can scale it up to a million replicas if the need arises in a certain environment. Why is that not mentioned here?

Review comment (Contributor): yup, i think it is fair adding it.


#### Reliability

Reliability will be implemented as part of the business logic of the pipeline controller.

Review comment (Member): What does this mean?

Review comment (Contributor): tried to expand here with more details


#### Monitoring

We will leverage existing [kubebuilder metrics](https://book.kubebuilder.io/reference/metrics.html). We will also need
to enhance the default controller metrics with business metrics such as `latency of a promotion by application`.

### Why this solution

The current solution has been chosen over its alternatives (see the alternatives section) because:

- it enables promotions.
- it allows a separation of roles, and therefore of permissions, between the component notifying the change and the component executing the promotion.
- it is easier to develop than the other alternatives.

On the flip side, the solution has the following constraints:

- the endpoint for deployment changes must be managed and exposed separately from the weave gitops api.
- it is a non-canonical use of a controller, as its behaviour is driven by ingested events rather than by changes in the declared state of a resource.

## Alternatives

Other alternative solutions have been explored and discussed. The difference among them lies in
the component serving the promotion logic, so the alternatives are named after it.

- Alternative A: weave gitops backend
- Alternative B: weave gitops api + pipeline controller + promotion executor
- Alternative C: promotions service (new service)

### Alternative A: weave gitops backend

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Notification Controller (Leaf)
    participant wge as Weave Gitops Backend (Management)
    participant k8s as Kubernetes Api
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>wge: send deployment change event
    wge->>wge: authz and validate event
    wge->>k8s: get pipeline
    wge->>wge: promotion business logic
    wge->>configRepo: raise PR
```

This solution is different from `pipeline controller` in that the three responsibilities

1. Notify deployment changes
2. Determine whether a promotion is needed
3. Execute the promotion

are all fulfilled within the weave gitops backend app.

**Pros**
- Already set up and *should* be more easily exposed.
- No need to manage another exposed surface, therefore less to secure.
- No need to generate a TS client.

**Cons**
- Notifier service account needs permissions for promotion resources.

### Alternative B: weave gitops api + pipeline controller + promotion executor

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Leaf Cluster
    participant WGE as Weave Gitops API
    participant k8s as Kubernetes Api
    participant pc as pipeline controller
    participant pj as promotion job
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>WGE: notify deployment via notification controller
    WGE->>k8s: write deployment event
    k8s->>pc: watch deployment event & pipelines
    pc->>pj: create promotion job
    pj->>pj: promotion business logic
    pj->>configRepo: raise PR
```

This solution is different from `pipeline controller` in that the three responsibilities are split:

1. Notify deployment changes: ingestion is done via the weave gitops api, and the event is written to the pipeline resource.
2. Determine whether a promotion is needed: the pipeline controller watches for changes in the pipeline.
3. Execute the promotion: extracted to a kubernetes job layer.

**Pros**
- Uses the existing ingestion layer, so operational costs do not increase.
- No need to generate a TS client.
- The pipeline controller uses a reconcile loop, so this is canonical controller usage.
- Scalability and fault-tolerance by design.

**Cons**
- Needs to write to the pipeline resource.
- Most complex solution.
- Kubernetes jobs are not a popular choice.

### Alternative C: promotions service

This solution is a simplified approach to pipeline controller with only the promotion responsibility.

```mermaid
sequenceDiagram
    participant F as Flux
    participant LC as Notification Controller (Leaf)
    participant PS as Promotions Svc (Management)
    participant k8s as Kubernetes Api
    participant configRepo as Configuration Repo
    F->>LC: deploy helm release
    LC->>PS: notify deployment via notification controller
    PS->>PS: authz and validate event
    PS->>k8s: get pipeline
    PS->>PS: promotion business logic
    PS->>configRepo: raise PR
```

**Pros**
- easiest to develop against
- no controller, so no reconcile loop is executed

**Cons**
- one more component for the team to maintain
- new repo/CI (?)

## Design Details

### Pipeline spec changes for promotions

To accommodate promotion logic, the pipeline spec would be extended with a `promotion` field as shown below:

```yaml
apiVersion: pipelines.weave.works/v1alpha1
kind: Pipeline
metadata:
  name: podinfo
  namespace: default
spec:
  appRef:
    apiVersion: helm.toolkit.fluxcd.io/v2beta1
    kind: HelmRelease
    name: podinfo
  promotion:
    - name: promote-via-pr
      type: pull-request
      url: git@github.com:organization/repo
      branch: main
      secretRef: my-other-deployed-secret
  environments:
    - name: dev
      targets:
        - namespace: podinfo
          clusterRef:
            kind: GitopsCluster
            name: dev
```
The `promotion` field is used to capture the promotion tasks to run for the next environment in the pipeline after a successful deployment has taken place.
Each task will include the following fields:

- `name`: the task name
- `type`: the task type, either webhook or pull-request
- `url`: the git repository url or the webhook url
- `branch`: the branch to use for the update, defaults to main (only applicable when type is pull-request)
- `secretRef`: a reference to a secret in the same namespace as the pipeline that holds the authentication credentials for the repository or the webhook.
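
The field semantics above could be captured in Go roughly as follows. This is only a sketch: the `PromotionTask` type and the `normalize` function are hypothetical names, and treating `url` as mandatory is an assumption not stated in this RFC.

```go
package main

import (
	"errors"
	"fmt"
)

// PromotionTask is a hypothetical Go representation of one entry
// of the `promotion` field.
type PromotionTask struct {
	Name      string
	Type      string // "webhook" or "pull-request"
	URL       string
	Branch    string // only meaningful for pull-request tasks
	SecretRef string
}

// normalize validates the task type and applies the default branch
// for pull-request tasks.
func normalize(t PromotionTask) (PromotionTask, error) {
	switch t.Type {
	case "pull-request":
		if t.Branch == "" {
			t.Branch = "main" // default branch as described above
		}
	case "webhook":
		// branch does not apply to webhook tasks
	default:
		return t, fmt.Errorf("unknown promotion type %q", t.Type)
	}
	if t.URL == "" {
		// assumption: a task without a url cannot be executed
		return t, errors.New("url is required")
	}
	return t, nil
}

func main() {
	task, err := normalize(PromotionTask{
		Name:      "promote-via-pr",
		Type:      "pull-request",
		URL:       "git@github.com:organization/repo",
		SecretRef: "my-other-deployed-secret",
	})
	fmt.Println(task.Branch, err) // prints "main <nil>"
}
```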

### Promotions Webhook

The endpoint should receive webhook requests to indicate a promotion of an environment.

Each environment of each pipeline has its own webhook URL for triggering a promotion:

```
/pipelines/promotions/{namespace}/{name}/{environment}
```
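
As a sketch of how a handler could extract the pipeline and environment from this URL, the following uses only the Go standard library. The `PromotionTarget` type and the `parsePromotionPath` function are hypothetical names introduced for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

const promotionsPrefix = "/pipelines/promotions/"

// PromotionTarget identifies the pipeline and environment addressed by a request.
type PromotionTarget struct {
	Namespace   string
	Name        string
	Environment string
}

// parsePromotionPath extracts namespace, name and environment from a path
// of the form /pipelines/promotions/{namespace}/{name}/{environment}.
func parsePromotionPath(path string) (PromotionTarget, error) {
	if !strings.HasPrefix(path, promotionsPrefix) {
		return PromotionTarget{}, fmt.Errorf("unexpected path %q", path)
	}
	parts := strings.Split(strings.TrimPrefix(path, promotionsPrefix), "/")
	if len(parts) != 3 {
		return PromotionTarget{}, fmt.Errorf("expected 3 path segments, got %d", len(parts))
	}
	for _, p := range parts {
		if p == "" {
			return PromotionTarget{}, fmt.Errorf("empty path segment in %q", path)
		}
	}
	return PromotionTarget{Namespace: parts[0], Name: parts[1], Environment: parts[2]}, nil
}

func main() {
	target, err := parsePromotionPath("/pipelines/promotions/default/podinfo/dev")
	fmt.Println(target, err) // prints "{default podinfo dev} <nil>"
}
```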

When a request is received, the handler will look up the environment in the pipeline to:

- `authz` the request via hmac
- `validate` the promotion
- `lookup and execute` the promotion actions
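
The hmac check could be sketched as below. The RFC does not specify the hash function, the encoding, or how the signature is transported, so HMAC-SHA256 with a hex-encoded signature is an assumption here, and `sign` and `verify` are hypothetical helper names. The shared key would come from the secret associated with the pipeline environment.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sign returns the hex-encoded HMAC-SHA256 of body using key.
func sign(key, body []byte) string {
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	return hex.EncodeToString(mac.Sum(nil))
}

// verify reports whether signature is a valid hex-encoded HMAC-SHA256
// of body under key. hmac.Equal compares in constant time.
func verify(key, body []byte, signature string) bool {
	got, err := hex.DecodeString(signature)
	if err != nil {
		return false // not valid hex, reject the request
	}
	mac := hmac.New(sha256.New, key)
	mac.Write(body)
	return hmac.Equal(got, mac.Sum(nil))
}

func main() {
	key := []byte("shared-secret")
	body := []byte(`{"app":"podinfo","version":"6.2.0"}`)
	sig := sign(key, body)
	fmt.Println(verify(key, body, sig))                // prints "true"
	fmt.Println(verify([]byte("wrong-key"), body, sig)) // prints "false"
}
```

Verifying the signature before looking up the promotion actions keeps unauthenticated requests from triggering any work beyond the hash computation.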

The handler needs to run with its own set of permissions (not user permissions) to be able
to read app versions across environments in a pipeline.

## Implementation History

- [Promotions Issue](https://github.com/weaveworks/weave-gitops-enterprise/issues/1589)