Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Chains needs to be able to "Identify entry point" - SLSA L3 requirement. #521

Closed
chuangw6 opened this issue Aug 16, 2022 · 15 comments · Fixed by #554
Closed

Chains needs to be able to "Identify entry point" - SLSA L3 requirement. #521

chuangw6 opened this issue Aug 16, 2022 · 15 comments · Fixed by #554
Labels
kind/bug Categorizes issue or PR as related to a bug.

Comments

@chuangw6
Copy link
Member

chuangw6 commented Aug 16, 2022

Expected Behavior

The intoto attestation with SLSA predicate generated by Chains should include configSource section in predicate.invocation that is supposed to have 3 fields uri, digest and entryPoint per SLSA spec.

Actual Behavior

Chains currently doesn't set this field at all for taskrun level provenance.

Motivation

Identifies entry point is one of SLSA L3/L4 requirements on the provenance content. That requires Chains to be able to capture the source information of a remote build definition in the provenance. Remote build definition means the build script that are stored in version control (see Build as code for more details). And Tekton supports referencing a remote pipeline/task definition in a pipelinerun/taskrun via remote resolution.

Solution

Essentially, the source of truth is the Source field in a ResolutionRequest Status. Remote resolution reconciler is responsible for capturing the correct source value and update the status field. Next, Tekton reconcilers will fetch the value from ResolutionRequest.Status and store it in a subfield of the Provenance field in the PipelineRun.Status (or TaskRun.Status) named ConfigSource so that Chains can grab the information from *run status and finally record in the provenance. Note: Source in ResolutionRequest Status and Provenancce in *Run Status are the new fields we introduced.

The following diagram shows the data flow.

Blank diagram (7)

Related PRs

Q&A

How do we make sure the source of truth is not maliciously tampered?

We will leverage SPIRE integration to sign the content of ResolutionRequest Status and *Run Status to guarantee the integrity. See tektoncd/pipeline#5474.

@chuangw6 chuangw6 added the kind/bug Categorizes issue or PR as related to a bug. label Aug 16, 2022
@chuangw6
Copy link
Member Author

RE: Comment "we currently don't set ConfigSource because we don't know which material the Task definition came from"

We might need to preserve an entry in the materials that is specifically for invocation.configSource.uri and invocation.configSource.digest.

If the entry in materials created with the values of CHAINS-GIT_COMMIT and CHAINS-GIT_URL params is what we want for configSource, we also need a preserved param name i.e. CHAINS-ENTRYPOINT for invocation.configSource.entryPoint.

Thoughts?

@chuangw6
Copy link
Member Author

@wlynch
Copy link
Member

wlynch commented Aug 16, 2022

IIUC, this would be source information for the Run object? We don't really have a concept of that today, since all Run objects come to us directly through the k8s API.

CHAINS_GIT* is really source provenance information, which describes what is happening inside the build not the how the build was created.

Top level Run refs (i.e. pipelineRef, taskRef) are probably going to be the closest approximation, but really we might need something like Workflows in place first.

@chuangw6
Copy link
Member Author

IIUC, this would be source information for the Run object? We don't really have a concept of that today, since all Run objects come to us directly through the k8s API.

From slsa spec,

invocation.configSource object, optional
Describes where the config file that kicked off the build came from. This is effectively a pointer to the source where buildConfig came from.

For the url/digest, I am thinking the repo url/digest that triggers the pipelinerun i.e. the gitrevision and gitrepositoryurl in this example.

If so, we can teach trigger to record the config source information in the annotation of the run object so that Chains can pick up that information rather than using params. wdyt?

@wlynch
Copy link
Member

wlynch commented Aug 16, 2022

Triggers are almost too generic here since they don't have a strong affinity for particular source or could have multiple sources, etc. I think workflows are meant to make this a bit more opinionated.

I'm hesitant to give special meaning to a user supplied value - any ideas on how can we verify this is actually correct?
@Yongxuanzhang this sounds similar to trusted resources, but instead of author identity, we want to verify the repo source👀 (idk if this is a good idea but maybe this is something that could be a part of the signature?)

Basing it off of top level refs might work, since you're only ever going to reference one source. Main downside is it wouldn't capture the Run config itself, but rather its referenced object.
Though maybe this is okay? If you think of the Run object as more as a request rather than a config, it might be fine. (I have to think through this some more)

@dibyom (not sure if you're the best person for this - feel free to redirect) has there been any discussion around remote resolution of Run objects that might be relevant here?

@lcarva
Copy link
Contributor

lcarva commented Aug 17, 2022

Could this be a good place to store information about the Task (for TaskRun attestations) and the Pipeline (for PipelineRun attestations) definitions? Currently, this information is not stored in the attestation at all.

If a Tekton Bundle is used, this would be really useful:

  • uri -> OCI image reference
  • digest -> digest of the image? of the Task/Pipeilne definition?
  • entrypoint -> name of the task within the bundle

I guess it would be a bit of a stretch to make those attributes meaningful for in-cluster Task/Pipeline definitions but it seems plausible.

@Yongxuanzhang
Copy link
Member

I'm hesitant to give special meaning to a user supplied value - any ideas on how can we verify this is actually correct? @Yongxuanzhang this sounds similar to trusted resources, but instead of author identity, we want to verify the repo source👀 (idk if this is a good idea but maybe this is something that could be a part of the signature?)

Could you elaborate more on "that could be a part of the signature", you mean also signing it or it can be another signature that we want to verify?

@wlynch
Copy link
Member

wlynch commented Aug 18, 2022

@lcarva

Could this be a good place to store information about the Task (for TaskRun attestations) and the Pipeline (for PipelineRun attestations) definitions? Currently, this information is not stored in the attestation at all.

👍 This is inline with what I was thinking with top level refs.

I guess it would be a bit of a stretch to make those attributes meaningful for in-cluster Task/Pipeline definitions but it seems plausible.

I actually don't think it's that far off! entrypoint is optional so we could omit it for API objects that only have 1 item, uri could be the API path (i.e. /apis/tekton.dev/v1beta1/namespaces/default/tasks/foo, maybe throw in an @UUID to uniquely ID it), digest could be a SHA256 digest of the json.


@Yongxuanzhang

Could you elaborate more on "that could be a part of the signature", you mean also signing it or it can be another signature that we want to verify?

Either or. Just wanted to call it out since it feels like a similar problem.

@chuangw6
Copy link
Member Author

Could this be a good place to store information about the Task (for TaskRun attestations) and the Pipeline (for PipelineRun attestations) definitions? Currently, this information is not stored in the attestation at all.

The explanation for the invocation field in slsa spec is "Identifies the event that kicked off the build.". In that sense, I feel like refs (i.e. oci bundles url, digest, task/pipeline name) are something part of the build instead of an event that kicked off the build itself. Also, how are we going to store refs in a pipelinerun attestation's invocation field in case there are multiple oci bundle refs used in the pipeline?

@wlynch
Copy link
Member

wlynch commented Aug 18, 2022

The explanation for the invocation field in slsa spec is "Identifies the event that kicked off the build.". In that sense, I feel like refs (i.e. oci bundles url, digest, task/pipeline name) are something part of the build instead of an event that kicked off the build itself.

Bit of a philosophical question which is why I'm on the fence here - is there meaningful difference between kubectl apply -f taskrun.yaml and tkn task start task.yaml if they both generate the equivalent API request? The latter feels like it fits the definition of the configSource field (especially if you compare this to similar flows like GCB yamls or GitHub workflow_runs).

Also, how are we going to store refs in a pipelinerun attestation's invocation field in case there are multiple oci bundle refs used in the pipeline?

There's only 1 top level ref for a pipeline (the pipeline itself might have several taskrefs, but those would apply to the tasks, not the pipeline).

@bendory
Copy link
Contributor

bendory commented Aug 23, 2022

Observation / question: for human invocations of kubectl or tkn task start, is the intention that invocation will be blank (which I suppose means "unknown"?) or will we explicitly populate "manual" or some such?

@dibyom
Copy link
Member

dibyom commented Aug 23, 2022

@dibyom (not sure if you're the best person for this - feel free to redirect) has there been any discussion around remote resolution of Run objects that might be relevant here?

I'm not sure what this means :) We can resolve pipelines and tasks from a variety of places but what does it mean to resolve a run remotely?

I do have a question though - The SLSA spec says the invocation field "Identifies the event that kicked off the build." while the invocation.configSource field "Describes where the config file that kicked off the build came from".

For Tekton, this config file (i.e. the pipeline definition) could come from git, oci registry, the cluster or could even be embedded in the run. To me, it seems like the pipelineRef or the pipelineSpec fields are better indicators of where the config file came from rather than the incoming event?

Re: for passing provenance info from the eventing/triggering side, I generally agree with @wlynch 's assesment that Workflows will make it better (probably with the eventSource field. That being said, I think we can still make progress/prototype with current Triggers. Some ideas based on chatting with @chuangw6 and @bendory

  1. TriggerBindings are user specified and are probably not a good place to capture the triggering provenance. But we could have the interceptors provide this info. For instance, the GitHub interceptor could attach the needed provenance fields as part of extensions with a well-known name.

  2. We could configure the EventListeners to look for these well known extension fields and add them to the pipeline runs being created either as annotations or well known extra parameter passed on to the pipelineRun (or as part of additional input contexts in the future). Chains can observe these known annotations/params for provenance generation.

  3. This assumes we are trusting the Triggers EventListener and the interceptor. We could also decide to have the EventListener sign the information that is passed down and use a similar mechanism as Trusted Tasks to verify.

@chuangw6
Copy link
Member Author

chuangw6 commented Aug 31, 2022

I do have a question though - The SLSA spec says the invocation field "Identifies the event that kicked off the build." while the invocation.configSource field "Describes where the config file that kicked off the build came from".
For Tekton, this config file (i.e. the pipeline definition) could come from git, oci registry, the cluster or could even be embedded in the run. To me, it seems like the pipelineRef or the pipelineSpec fields are better indicators of where the config file came from rather than the incoming event?

Agreed! This echoes what @wlynch and @lcarva said about top-level refs (i.e. source of pipeline/task definition). We can focus on this instead of trigger information.

git source

gitresolver expects from the author the exact three fields that we need for the configSource url, revision and pathInRepo. However, there are some challenges:

  • revision field can be commit SHA, branch or tag, whereas SLSA's invocation.configSource.digest requires digest only. So we need to map the branch/tag to actual commit SHA.
  • We need to pass down this information for Chains to pick up without modifying user-defined spec.
  • Also, we need to make sure we can standardize the 3 fields that have corresponding values in other sources i.e. oci bundle mentioned by @lcarva.

I created this PoC to try to solve the 3 challenges. tektoncd/pipeline#5397 Please take a look.

oci bundle

Agreed with @lcarva 's suggestion for oci bundle. The bundle param expected by bundle resolver can be url@digest or url@tag. So we also need to fetch the actual digest if tag is provided.

in-cluster definition

Agreed with @wlynch said

uri could be the API path (i.e. /apis/tekton.dev/v1beta1/namespaces/default/tasks/foo, maybe throw in an @uuid to uniquely ID it), digest could be a SHA256 digest of the json.

But if the actual yaml is not in source control, do we want to consider this case and record the information in SLSA provenance?

inline/embedded definition

If pipelinerun.yaml is in source control, do we want to record its source information?

catalog

TODO: will follow up on this later.

chuangw6 added a commit to chuangw6/chains that referenced this issue Sep 9, 2022
Related to tektoncd#521 and tektoncd/pipeline#5397

Prior to this PR, `invocation.configSource` section in slsa provenance
was missing.

In this change, we want to record the configSource information for the
remote resources i.e. git, bundle, catalog.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 18, 2022
Related to tektoncd#521 and tektoncd/pipeline#5397

Prior to this PR, `invocation.configSource` section in slsa provenance
was missing.

In this change, we want to record the configSource information for the
remote resources i.e. git, bundle, catalog.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 18, 2022
Related to tektoncd#521 and tektoncd/pipeline#5397

Prior to this PR, `invocation.configSource` section in slsa provenance
was missing.

In this change, we want to record the configSource information for the
remote resources i.e. git, bundle, catalog.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 18, 2022
Related to tektoncd#521 and tektoncd/pipeline#5397

Prior to this PR, `invocation.configSource` section in slsa provenance
was missing.

In this change, we want to record the configSource information for the
remote resources i.e. git, bundle, catalog.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 28, 2022
Related to tektoncd#521.

Prior to this PR, `invocation.configSource` section in slsa provenance
was missing.

In this change, we want to record the source information in the provenance
to track where the remote pipeline/task definition came from i.e. git repo,
tekton bundles etc. The source information is available in
PipelineRun/TaskRun `Status.Provenance` field.

If a remote pipeline definition references remote task definitions from other remote places,
`predicate.invocation.configSource` will record the source information of
the remote pipeline definition only, and `predicate.buildConfig.tasks[x].invocation.configSource`
will record the source information for individual remote task definition.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 28, 2022
Related to tektoncd#521.

Prior to this PR, the SLSA provenance that Chains produces did not record
the source information of remote resources.

In this change, we want to record the source information in the provenance
to track where the remote pipeline/task definition came from i.e. git repo,
tekton bundles etc. The source information is available in
PipelineRun/TaskRun `Status.Provenance` field.

If a remote pipeline definition references remote task definitions from other remote places,
`predicate.invocation.configSource` will record the source information of
the remote pipeline definition only, and `predicate.buildConfig.tasks[x].invocation.configSource`
will record the source information for individual remote task definition.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 28, 2022
Related to tektoncd#521.

Prior to this PR, the SLSA provenance that Chains produces did not record
the source information of remote resources.

In this change, we want to record the source information in the provenance
to track where the remote pipeline/task definition came from i.e. git repo,
tekton bundles etc. The source information is available in
PipelineRun/TaskRun `Status.Provenance` field.

If a remote pipeline definition references remote task definitions from other remote places,
`predicate.invocation.configSource` will record the source information of
the remote pipeline definition only, and `predicate.buildConfig.tasks[x].invocation.configSource`
will record the source information for the corresponding remote task definition.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 28, 2022
Related to tektoncd#521.

Prior to this PR, the SLSA provenance that Chains produces did not record
the source information of remote resources.

In this change, we want to record the source information in the provenance
to track where the remote pipeline/task definition came from i.e. git repo,
tekton bundles etc. The source information is available in
PipelineRun/TaskRun `Status.Provenance` field.

If a remote pipeline definition references remote task definitions from other remote places,
`predicate.invocation.configSource` will record the source information of
the remote pipeline definition only, and `predicate.buildConfig.tasks[x].invocation.configSource`
will record the source information for the corresponding remote task definition.

Signed-off-by: Chuang Wang <chuangw@google.com>
chuangw6 added a commit to chuangw6/chains that referenced this issue Nov 28, 2022
Related to tektoncd#521.

Prior to this PR, the SLSA provenance that Chains produces did not record
the source information of remote resources.

In this change, we want to record the source information in the provenance
to track where the remote pipeline/task definition came from i.e. git repo,
tekton bundles etc. The source information is available in
PipelineRun/TaskRun `Status.Provenance` field.

If a remote pipeline definition references remote task definitions from other remote places,
`predicate.invocation.configSource` will record the source information of
the remote pipeline definition only, and `predicate.buildConfig.tasks[x].invocation.configSource`
will record the source information for the corresponding remote task definition.

Signed-off-by: Chuang Wang <chuangw@google.com>
tekton-robot pushed a commit that referenced this issue Nov 29, 2022
Related to #521.

Prior to this PR, the SLSA provenance that Chains produces did not record
the source information of remote resources.

In this change, we want to record the source information in the provenance
to track where the remote pipeline/task definition came from i.e. git repo,
tekton bundles etc. The source information is available in
PipelineRun/TaskRun `Status.Provenance` field.

If a remote pipeline definition references remote task definitions from other remote places,
`predicate.invocation.configSource` will record the source information of
the remote pipeline definition only, and `predicate.buildConfig.tasks[x].invocation.configSource`
will record the source information for the corresponding remote task definition.

Signed-off-by: Chuang Wang <chuangw@google.com>

Signed-off-by: Chuang Wang <chuangw@google.com>
@chuangw6
Copy link
Member Author

chuangw6 commented Dec 2, 2022

/close

as completed by #554

@tekton-robot
Copy link

@chuangw6: Closing this issue.

In response to this:

/close

as completed by #554

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

Repository owner moved this from Todo to Done in Tekton Community Roadmap Dec 2, 2022
@chuangw6 chuangw6 changed the title Chains missing configSource in predicate.invocation Chains needs to be able to "Identify entry point" - SLSA L3 requirement. Dec 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
kind/bug Categorizes issue or PR as related to a bug.
Projects
Status: Done
Development

Successfully merging a pull request may close this issue.

7 participants