diff --git a/teps/0004-tekton-integrations-annotations-statuses.md b/teps/0004-tekton-integrations-annotations-statuses.md new file mode 100644 index 000000000..24f035405 --- /dev/null +++ b/teps/0004-tekton-integrations-annotations-statuses.md @@ -0,0 +1,611 @@ +--- +title: tekton-integrations-annotations-status +authors: + - "@wlynch" +creation-date: 2020-07-13 +last-updated: 2020-07-13 +status: proposed +--- + + + +# TEP-0004: Tekton Integrations: Annotations and Statuses + + + + + + + +- [Summary](#summary) +- [Motivation](#motivation) + - [Goals](#goals) + - [Non-Goals](#non-goals) +- [Requirements](#requirements) +- [Proposal](#proposal) + - [Integration Annotations](#integration-annotations) + - [Integration Status](#integration-status) + - [User Stories (optional)](#user-stories-optional) + - [Error updates on Pipeline failures](#error-updates-on-pipeline-failures) + - [Why didn't this Pipeline update my PR?](#why-didnt-this-pipeline-update-my-pr) + - [Multitentant / root secrets](#multitentant--root-secrets) + - [Notes/Constraints/Caveats (optional)](#notesconstraintscaveats-optional) + - [Integration Event Handling](#integration-event-handling) + - [Triggers](#triggers) + - [Risks and Mitigations](#risks-and-mitigations) + - [External Integration Quotas](#external-integration-quotas) + - [User Experience (optional)](#user-experience-optional) + - [Storing data in Pipelines](#storing-data-in-pipelines) + - [Performance (optional)](#performance-optional) +- [Design Details](#design-details) + - [Integration Status](#integration-status-1) +- [Test Plan](#test-plan) +- [Drawbacks](#drawbacks) + - [Tekton Chains](#tekton-chains) +- [Alternatives](#alternatives) + - [Automatically injecting Tasks](#automatically-injecting-tasks) + - [Storing integration statuses outside of Run status](#storing-integration-statuses-outside-of-run-status) +- [Infrastructure Needed (optional)](#infrastructure-needed-optional) +- [Upgrade & Migration Strategy (optional)](#upgrade--migration-strategy-optional) + + +## Summary + + + +As part of CI/CD, Tekton needs to be able to interface with many different +systems. Common integrations may include: + +- Code Review tools (GitHub, Gitlab, Gerrit, etc.) +- Chat tools (Slack, Hangouts, etc.) +- Email +- Bug trackers (GitHub, Jira, etc.) +- Webhook / Cloud Event + +Currently most integration solutions for Tekton Pipelines require users to embed +update Tasks into their Pipelines. In practice, this ends up not being a good +mechanism for ensuring that external integration notifications always occur, for +example: + +- If the pipeline fails to start because of a configuration error, the external + integration is never notified. +- If the pipeline fails a step and never reaches the notification step, the + external integration is never notified. +- If a transient error (e.g. GitHub is down) prevents the task from completing + successfully, this may result in the pipeline failing. + +This proposal details improvements to Task/PipelineRuns to annotate and store +integration status information. + +## Motivation + + + +### Goals + +- Define a mechanism for Tekton and other providers to hook in external + integrations for pipeline and task runs. +- Allow integrations to report integration status for a pipeline for ease of + debugging. +- Minimize impact on pipeline/task execution spec. + + + +### Non-Goals + + + +- While GitHub integrations will often be used as an example, I want to avoid + going into too much deep details into how a specific GitHub integration will + be configured / what data will be rendered / what APIs will be used / etc, + unless it has direct implications on integrations as a whole. + +## Requirements + + + +1. Integrations should be optional from core Tekton Pipelines functionality. +2. Users should be able to pick and choose which integrations they have + installed in their cluster. +3. Integration data should be stored as close to the PipelineRun as possible + (ideally alongside). + +## Proposal + + + +### Integration Annotations + +[Annotations](https://kubernetes.io/docs/concepts/overview/working-with-objects/annotations/) +of the format `.integrations.tekton.dev/` to will be used to +annotate PipelineRun and TaskRun objects with integration metadata. This will +not be strictly enforced, in order to allow interoperability with other tools / +annotation namespaces, but we should recommend this as a default by convention. + +This fits into the annotations model - from the Kubernetes docs: + +> Fields managed by a declarative configuration layer. Attaching these fields as +> annotations distinguishes them from default values set by clients or servers, +> and from auto-generated fields and fields set by auto-sizing or auto-scaling +> systems. +> +> Build, release, or image information like timestamps, release IDs, git branch, +> PR numbers, image hashes, and registry address. +> +> User or tool/system provenance information, such as URLs of related objects +> from other ecosystem components. + +This allows for integrations to annotate arbitrary data without needing to +modify the Pipeline execution spec, with the expectation that another reconciler +/ tool will notice and act on these annotations. + +**Examples:** + +- GitHub + ``` + github.integrations.tekton.dev/owner: wlynch + github.integrations.tekton.dev/repo: test + github.integrations.tekton.dev/sha: 4d2f673c600a0f08c3f6737bc3f5c6a6a15d62e6 + ``` +- Webhook + ``` + webhook.integrations.tekton.dev/url: https://example.com:8080 + ``` +- Slack + ``` + slack.integrations.tekton.dev/url: https://tektoncd.slack.com + slack.integrations.tekton.dev/channel: build-cop + ``` + +### Integration Status + +Key questions we will need to answer is - did an integration complete +successfully for my pipeline? if not, why? + +In Tekton's +[dogfooding cluster](https://github.com/tektoncd/plumbing/blob/master/docs/dogfooding.md) +today, this answer is somewhat obfuscated. When a pipeline is complete, the +cluster kicks off a cloud event containing information about the PR to update. +This is picked up by another Trigger, which then updates the PR. It is difficult +to tell from the initial PipelineRun whether a PR was updated, and if there was +a problem, why. + +To solve this, we should add additional status fields in +TaskRunStatus/PipelineRunStatus, specifically for integration data. This should +include information such as: + +- What integration the status is for (e.g. `github.integrations.tekton.dev`). +- What object generation the status is for. +- What was the outcome of the integration for this object (e.g. success, + failure). +- (optional) More details of status outcome. + +Examples of what might be stored: + +- Integration Success/Failure +- GitHub CheckRun IDs +- Tekton Result IDs +- Integration error responses (unavailable, out of quota, etc.) + +### User Stories (optional) + + + +#### Error updates on Pipeline failures + +By making integration code independent of Pipeline execution, this enables us to +respond to external integrations even in cases where the Pipeline/Task fails to +start, as long as the Pipeline/TaskRun has been accepted by Tekton. + +This is particularly useful for validation / dereferencing errors that cannot be +caught at the time of initial validation. + +#### Why didn't this Run update my PR? + +With integration statuses, we have a place to annotate data alongside the +pipeline run itself. This gives us an easy mechanism for users to do the +following: + +1. Push a commit, notice pull request wasn't updated for a run. +2. Look for all Runs with Integration annotation for the repo / commit (if + empty, then you know there was no build kicked off). +3. Find latest Run, `kubectl get` it, inspect integrations status field(s) for + why it the status didn't appear. + +#### Multitentant / root secrets + +Some integrations like require shared secrets that aren't appropriate to be +shared with users in a multitenant environment since they may grant broad +access. For example, GitHub Apps authenticate by generating a token from a +private key. If you have access to this private key, you can access any +installation of the App across any user/org with the permissions of the App. + +Separating pipeline execution from integration handling allows for integrators +to have more control over how and where these types of secrets are used, +independent of general user workloads. As long as you have the ability to read +integration annotations and update the integration status, you have the ability +to run the actual integration handling any where (e.g. different cluster, or +even outside of kubernetes altogether) and have control over the authorization +with external systems. + +### Notes/Constraints/Caveats (optional) + + + +#### Integration Event Handling + +A key aspect of the proposal is that user Pipelines are not directly responsible +for handling integration events, and mainly focuses on how integrations can +configure itself and provide information back to users within the Run. It +intentionally does not define how integrations should handle events. + +It is up to individual integrations to decide how and when to handle integration +logic. Handlers should be installed independently of the core Pipeline +installation to allow users to customize their installation. + +As a recommendation, integrations should rely on some form of reconciler in +order to reliably process events, but it's also fine to rely on a push +mechanism. + +#### Annotation-less integrations + +Defining annotations is an option, not a requirement. If you want an integration +to apply to all Runs with no configuration needed (or provided by another source +like a ConfigMap), that is okay. Integrations of this type will still able to +use status fields. + +This is useful for use cases like webhook/pubsub notifications for all Run +events. + +#### Triggers + +While this proposal mainly focuses on integrations with Pipeline and Task Run +resources, Triggers are another important aspect to handle incoming integration +events. + +To start, we can treat triggers the same as normal user requests - users will +need to set relevant annotations in the created Task/PipelineRun. This is +similar to the existing user experience today of needing to configure +integration Tasks in their pipeline. + +There are Trigger related features being discussed that would lead to +incremental improvements: + +- ObjectMeta Templating (e.g. be able to template annotations on created objects + per trigger) ([#618](https://github.com/tektoncd/triggers/issues/618)) +- Integration Specific Interceptors / Bindings / Templates + ([#504](https://github.com/tektoncd/triggers/issues/504), + [#584](https://github.com/tektoncd/triggers/issues/584)) + +While these are out of scope for this specific proposal, this proposal remains +compatible with future changes to make configuring Triggers easier, since it is +agnostic to the source of configuration. + +### Risks and Mitigations + + + +#### External Integration Quotas + +If relying on reconcilers / events that trigger on every update, integration +providers will need to be careful to stay within quotas external providers may +impose. + +While Tekton Pipelines will not impose specific throttling for integration +providers, it will be a general best practice to use Status/Condition data (e.g. +`ObservedGeneration`, `Conditions`, `LastTransitionTime`) as well as reconciler +push back with `RequeueAfter` to stay within external quotas. + +### User Experience (optional) + + + +#### Storing data in Pipelines + +Similar to `TaskRunResults` and `PipelineResourceResults`, integration statuses +are another form of external artifact generated as a result of the Pipeline. +Because of this, it makes sense to store these statuses as a part of the +Pipeline/TaskRun status, so that it is easy to find and see this data when +something goes wrong. + +### Performance (optional) + + + +N/A + +For pipelines, this should have no impact on runtime performance. Integration +providers will need to be mindful of the information they store within the +Pipeline/TaskRun objects to stay within k8s limits, but we are not exposing +additional risk that did not already exist. + +## Design Details + +### Integration Status + + + +Most of the fields we want are already included in the Knative's +[Status](https://pkg.go.dev/knative.dev/pkg/apis/duck/v1beta1?tab=doc#Status) +object, so we can likely take a similar approach here. e.g.: + +```go +type IntegrationStatus struct { + // Name of the integration namespace. Generally of the form + // .integrations.tekton.dev + Name string + + // ObservedGeneration is the 'Generation' of the Service that + // was last processed by the controller. + // +optional + ObservedGeneration int64 `json:"observedGeneration,omitempty"` + + // Conditions the latest available observations of a resource's current state. + // +optional + // +patchMergeKey=type + // +patchStrategy=merge + Conditions api.Conditions `json:"conditions,omitempty" patchStrategy:"merge" patchMergeKey:"type"` + + // Annotations is additional Status fields for the Resource to save some + // additional State as well as convey more information to the user. This is + // roughly akin to Annotations on any k8s resource, just the reconciler conveying + // richer information outwards. + Annotations map[string]string `json:"annotations,omitempty"` +} +``` + +## Test Plan + + + +Integrations will be responsible for handling testing of their own components. +For pipelines, this is mainly the storage of additional metadata. As long as we +verify that it is preserved through a pipeline that is sufficient. + +## Drawbacks + + + +### Tekton Chains + +``` +<<[UNRESOLVED dlorenc ]>> +This section needs more clarification. Depending on how Tekton Chains operates, this might be okay. +<<[/UNRESOLVED]>> +``` + +Tekton Chains currently calls out storing TaskRun definition data for verifiable +builds. It is unclear if this includes the entire TaskRun definition, or just +the `TaskRunSpec`. + +If it includes `TaskRunStatus` data this proposal will be at odds with Tekton +Chains, since fundamentally this processes and annotates integration status +post-pipeline execution and is not guaranteed to complete before chains +snapshots and signs TaskRun data. + +If `TaskRunStatus` is omitted, then there are no issues. + +### Reconcile loops + +If integrators are not careful, they may accidentally trigger reconcile loops if +they are constantly updating state based on resource updates (e.g. notice +update, record new Generation version, update metadata with the version causing +a new Generation). This is a risk for any pattern that is self-updating a +resource, so no particular recommendations here beyond "don't do this", but it +is something I want to call out in case there are known best practices here. + +## Alternatives + + + +### Automatically injecting Tasks + +This alternative keeps the idea of running user tasks inside of the Pipeline, +but we make it easier for users by automatically appending integration tasks to +the end of their pipeline / adding them as finally tasks. + +This has a few drawbacks: + +- We can't call tasks if the pipeline never runs. +- The user pipeline needs to have access to all secrets, which limits what we + can do with shared multitenant secrets. + +### Storing integration statuses outside of Run status + +As an alternative to storing integration status information within a +Task/PipelineRun status, we could store this information in another CRD type +that references back to the run. The advantage of this is that it avoids making +an API change to the Task/PipelineRun object directly and the consequences that +may arise. + +I don't think this is compelling enough of a reason to separate this into its +own type, since it makes the information harder to find. I suspect most users +will look to the Task/PipelineRun for more details if their pull request isn't +updated as expected. Since the data is tightly coupled to the underlying Run and +is not shared among rungs, it makes the most sense for the data to exist with +the Run itself. + +Since this is a completely additive change, this will not require any sort of +API deprecation. + +The main counterarguemnt for this is to avoid risks of reconcile loops, but I am +not convinced the trade-off of adding a level of indirection to get integration +data is worth it. + +### Embedding all integration status data inside Conditions + +Another way to skirt API changes would be to encode all integration status +information into the existing conditions type, encoding data into existing +fields. + +I'd like to avoid this since this isn't really what these fields were meant for, +and we would be overloading the meaning of the fields in many cases. + +## Infrastructure Needed (optional) + + + +I suspect that this may be another component type to include in the catalog long +term, but for now this can be built in experimental. (In fact there is already a +project there that helped inspired this proposal - +https://github.com/tektoncd/experimental/tree/master/commit-status-tracker). + +## Upgrade & Migration Strategy (optional) + + + +Integration providers will be responsible for handling upgrades and migrations +for their own components. A `.integrations.tekton.dev/version` annotation +may be useful for identifying integration versions.