Restrict step volumes as read-only #4227

wlynch · 2021-09-14T22:07:09Z

Expected Behavior

Internal /tekton folders should only be manipulated by trusted Tekton components (i.e. controller / entrypoint).

Actual Behavior

Tekton mounts the following volumes for each step:

# Read-write
- mountPath: /tekton/tools
  name: tekton-internal-tools
- mountPath: /tekton/downward
  name: tekton-internal-downward
- mountPath: /tekton/creds
  name: tekton-creds-init-home-0
- mountPath: /workspace
  name: tekton-internal-workspace
- mountPath: /tekton/home
  name: tekton-internal-home
- mountPath: /tekton/results
  name: tekton-internal-results
- mountPath: /tekton/steps
  name: tekton-internal-steps

# Read-only
- mountPath: /tekton/scripts
  name: tekton-internal-scripts
  readOnly: true
- mountPath: /var/run/secrets/kubernetes.io/serviceaccount
  name: kube-api-access-9fk95
  readOnly: true

Since many of these directories are mounted read-write and are shared by all steps in the task, these can be manipulated to cause unexpected behavior for steps.

Directory	Description	Consequence	Recommendation
/tekton/tools	Contains entrypoint, post_files for coordinating step starts	Bad step could manipulate it's own entrypoint (requires container restart), or write post_files to prematurely cause steps to start.	Move entrypoint into its own directory Only let step write its own post_files
/tekton/steps	Step exit codes	Bad step can mutate the perceived exit code for other steps. This does not affect container exit / Task status.	Only let step write its own exit codes.
/tekton/results	Results output	None - Results are defined at the Task level.	Nothing - expected behavior
/tekton/downward	Contains files provided by the k8s Downward API to control Task start once sidecars are up	None - despite the `readOnly` option not being set, the volume appears to be read-only if you attempt to mutate it.	Make the mount readonly, even though it's redundant?
/tekton/creds	Used by creds-init to inject credentials for steps	None - we don't guarantee credential isolation in steps, and recommend manual volume mounts if this is required	Nothing - expected behavior (and required for the entrypoint to support creds-init)
/workspace	Used for workspaces	None - workspaces are currently mounted Task-wide, and [step-isolated workspaces]	Nothing - expected behavior step isolated workspaces would be used if users want to restrict this.
/tekton/home	Deprecated Tekton home directory	None	Nothing - expected behavior

Steps to Reproduce the Problem

/tekton/tools

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: step-script-
spec:
  taskSpec:
    steps:
    - name: bad
      image: ubuntu
      script: |
        # Create post_files - this can be done for any step 0-n 
        # to cause the next container to start.
        touch /tekton/tools/0
        touch /tekton/tools/0.err
        sleep 10
        echo "done!"
    - image: ubuntu
      script: |
        echo $(context.task.name)

/tekton/steps

apiVersion: tekton.dev/v1beta1
kind: TaskRun
metadata:
  generateName: step-script-
spec:
  taskSpec:
    steps:
    - name: pass
      image: ubuntu
      script: |
        exit 0
    - name: bad
      image: ubuntu
      script: |
        # Re-write 
        for f in `ls /tekton/steps`; do
          f="/tekton/steps/${f}/exitCode"
          echo "1" > ${f}
        done
    - name: echo
      image: ubuntu
      script: |
        # We now think step-pass failed, even though the
        # container status is 0.
        cat $(steps.step-pass.exitCode.path)

Additional Info

Kubernetes version:

Output of kubectl version:

Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"21", GitVersion:"v1.21.1", GitCommit:"5e58841cce77d4bc13713ad2b91fa0d961e69192", GitTreeState:"clean", BuildDate:"2021-05-21T23:01:33Z", GoVersion:"go1.16.4", Compiler:"gc", Platform:"linux/amd64"}

Tekton Pipeline version:

Client version: 0.20.0
Pipeline version: devel (78d6d75473ea00118fa1e07cbf6b8e97439689ad

This issue is similar in nature to #4160. Credit for discovering this + the ideas for fixes goes to @06kellyjac and the folks at https://github.com/controlplaneio/ !

Proposal

Mount /tekton/downward as read-only for consistency (even though it's already readonly)
Move step post_files to a separate directory (name TBD), make /tekton/tools readonly. (/tekton/tools is not bound by the Tekton deprecation policy.
(up for discussion - see below) Introduce sidecar to manage step lifecycle outside the control of step containers?

Open Questions

An interesting question that popped up from this is how much trust should steps within a Task should have with each other, particularly if the Task is made up of different container images being pulled in from different sources.

On one hand, we typically refer to a Task as the single unit of work for Tekton, similar to a Pod. If users require greater isolation guarantees between steps, they could convert each step into a Task within a Pipeline (and with things like #3476 / @bobcatfish's Pipeline in a Pod proposal - the differences between these might be minimal).

On the other hand, because Tasks can be composed of multiple images, it can make it harder to guarantee that each container will not disrupt each other in unexpected ways. Introducing a sidecar that would manage step lifecycles could help this out. High level proposal - The sidecar could rely on step container status to determine whether a step is done, then handle the post_file write (and be the only container with write access) to kick off the next step. The post_files could then be mounted as read-only with steps. This obviously comes with overhead of maintaining the sidecar, which increases the complexity (and possibly latency) of Tasks.

I'm on the fence here - would love to hear other's thoughts/opinions! If there's interest we can look into drafting a more detailed design, or at the very least we can document expectations/best practices for users who need more isolation.

/cc @bobcatfish - particularly interested in your thoughts here, since I think there's a lot of overlap here with Pipeline-in-a-Pod.

The text was updated successfully, but these errors were encountered:

vdemeester · 2021-09-15T07:40:13Z

An interesting question that popped up from this is how much trust should steps within a Task should have with each other, particularly if the Task is made up of different container images being pulled in from different sources.

If all "internals" of tekton are delicately crafted, aka everything is read-only except few allowed things (entrypoint hack, workspace), I think it's fine. And I feel the sidecar strategy would bring way too much complexity for it.

This means we should make sure each step has a definite space to write stuff necessary for the "entrypoint hack" (waitFile, errFile, …) and this would be the only part of our "internal" plumbing that would be writable by a given Step.

06kellyjac · 2021-09-15T08:10:59Z

This means we should make sure each step has a definite space to write stuff necessary for the "entrypoint hack" (waitFile, errFile, …) and this would be the only part of our "internal" plumbing that would be writable by a given Step.

Yeah waitfile/postfile should use separate volume mounts that are only writable for their relevant step, or at the very least be moved away from the tools dir.

and this would be the only part of our "internal" plumbing that would be writable by a given Step.

I'm not to familiar with the new tekton-internal-debug-scripts but do they currently need to be writable by a given step? Again I haven't looked into them much but when I set it to readwrite for init and readonly for the steps I had some errors...

Currently deleting the entrypoint binary within tools and replacing it is possible but since the process has already started in all steps/containers it's not affected. If you were able to restart a step/container within the task/pod then it'd execute the new entrypoint though.
Also it becomes a concern if more tools are shipped in the tool directory as they would probably be executed adhoc rather than in advance and for the whole run of the step/container

Also pretty unrelated to this issue but we could also get into some interesting discussions around workspaces being modified across multiple tasks in one of the existing hermetic build issues/discussions

vdemeester · 2021-09-15T08:36:22Z

I'm not to familiar with the new tekton-internal-debug-scripts but do they currently need to be writable by a given step? Again I haven't looked into them much but when I set it to readwrite for init and readonly for the steps I had some errors...

No they should be read-only.

Currently deleting the entrypoint binary within tools and replacing it is possible but since the process has already started in all steps/containers it's not affected. If you were able to restart a step/container within the task/pod then it'd execute the new entrypoint though.
Also it becomes a concern if more tools are shipped in the tool directory as they would probably be executed adhoc rather than in advance and for the whole run of the step/container

Indeed, /tekton/tools shouldn't be writable by any user step either.

Also pretty unrelated to this issue but we could also get into some interesting discussions around workspaces being modified across multiple tasks in one of the existing hermetic build issues/discussions

Indeed, cc @dlorenc @mattmoor 🙃

mattmoor · 2021-09-15T17:22:56Z

discussions around workspaces being modified across multiple tasks

Yeah @imjasonh flagged the possibility that the underlying persistent storage for workspace could be modified between the execution of two tasks in a pipeline, which is an interesting attack vector, and which makes me not trust workspace without some sort of verifiability.

To 🤓 out a little (digression warning!)

One idea I've been toying with (instead of persistent volumes) for workspace would be to use a technique like mattmoor/kontext to snapshot the working directory as an OCI image (which could be signed with cosign). The resulting image's digest could be a result, which is plumbed through to the next step as it's starting context (which can be verified).

While this is heavy from the standpoint of saving/restoring state between steps, you end up with an extremely strong trust story because not only would the context be passed using content-addressable hashes, but they could be signed with a key that verifies the provenance of the workload that produced it.

</digression>

imjasonh · 2021-09-15T17:26:48Z

We could also just have Tekton hash the contents of a workspace and report that in the TaskRun's status, and check that it matches before running the next TaskRun that uses it. That saves having to store the data in two places, and has roughly the same result. If you're using Tekton Chains (and you should 😎) then those hashes would be part of the payload that gets signed and added to Rekor.

mattmoor · 2021-09-15T17:29:58Z

@imjasonh Yep, absolutely. I'm told this would be a good thing to look at. There are some other fun benefits of the other way, but we can discuss those separately if you are so inclined because it's a huge digression here 🤣

wlynch · 2021-09-24T20:43:48Z

/assign me

(i've been working on this, just forgot to assign myself)

tekton-robot · 2021-09-24T20:43:49Z

@wlynch: GitHub didn't allow me to assign the following users: me.

Note that only tektoncd members, repo collaborators and people who have commented on this issue/PR can be assigned. Additionally, issues/PRs can only have 10 assignees at the same time.
For more information please see the contributor guide

In response to this:

/assign me

(i've been working on this, just forgot to assign myself)

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

wlynch · 2021-09-24T20:43:55Z

/assign wlynch

squee1945 · 2021-09-24T20:47:28Z

Even if I can only write to my own step output, could I fake out a termination message that would cause the next step to start and use my faked out results as parameters?

wlynch · 2021-09-24T21:28:35Z

Even if I can only write to my own step output, could I fake out a termination message that would cause the next step to start

Yes. The next step here to lock this down would be to use the sidecar approach or some sort of chroot jail to isolate the user container from the entrypoint control files. This change is a bit more involved, so holding off for a more detailed design while we tackle the low hanging fruit first.

and use my faked out results as parameters?

So this is technically something you could do even without the post_file issue - a bad step could just write/overwrite results produced by other steps (results are task-wide, and not something that are produced by particular steps). Between steps, even though there is no param passing, you could have something similar if step 1 produces a file that step 2 depends on. This is not something we particularly guard against, since we don't make isolation guarantees between steps. The workaround here would be to split the target steps into separate Tasks if more isolation is needed.

ghost · 2021-11-01T19:07:37Z

@wlynch am wondering if this issue can be considered done now or whether we should hold it open for the sidecar-related work. What do you think?

06kellyjac · 2021-11-01T19:17:16Z

I think this should stay open to track the progress of the improvements & eventual sidecar-related work

tekton-robot · 2022-02-27T18:09:28Z

Issues go stale after 90d of inactivity.
Mark the issue as fresh with /remove-lifecycle stale with a justification.
Stale issues rot after an additional 30d of inactivity and eventually close.
If this issue is safe to close now please do so with /close with a justification.
If this issue should be exempted, mark the issue as frozen with /lifecycle frozen with a justification.

/lifecycle stale

Send feedback to tektoncd/plumbing.

vdemeester · 2022-02-28T08:57:13Z

/remove-lifecycle stale

wlynch · 2022-03-10T19:46:44Z

Haven't updated this in awhile, but want to give an update.

As of #4352 we're now isolating step directories so that they are only writable by their own step, and read-only to all other steps. This addresses most problems with steps interfering with each other. We are considering results out of scope because these are a shared Task resource, not something owned by a particular step.

What this doesn't prevent is a misbehaving step from manipulating its own step files and doing things like tricking the Tekton controller into thinking it is complete early. While we could try and lock this down further, it's unclear how much additional benefit we would get from this given that if a step is already misbehaving, it could simply run and return false results. Instead, our plan is to shore up this area with things like Trusted Tasks so that we can enforce policies on Tasks similar to admission controllers for container images.

Because of this I'm going to close out this issue, but feel free to reach out if you have an comments / questions / concerns!

wlynch added the kind/bug Categorizes issue or PR as related to a bug. label Sep 14, 2021

This was referenced Sep 16, 2021

Mark Downward API VolumeMount as readonly. #4236

Merged

Mount entrypoint volume as read-only. #4242

Merged

tekton-robot assigned wlynch Sep 24, 2021

wlynch mentioned this issue Oct 5, 2021

Split /tekton/run directories into separate volumes. #4278

Merged

5 tasks

bobcatfish added the priority/important-soon Must be staffed and worked on either currently, or very soon, ideally in time for the next release. label Oct 18, 2021

wlynch mentioned this issue Nov 9, 2021

Lockdown /tekton/step folders to their own steps. #4352

Merged

5 tasks

tekton-robot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 27, 2022

tekton-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Feb 28, 2022

wlynch closed this as completed Mar 10, 2022

bobcatfish mentioned this issue Nov 30, 2022

TEP-0127: Larger Results via Sidecar Logs tektoncd/community#887

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Restrict step volumes as read-only #4227

Restrict step volumes as read-only #4227

wlynch commented Sep 14, 2021 •

edited

Loading

vdemeester commented Sep 15, 2021

06kellyjac commented Sep 15, 2021

vdemeester commented Sep 15, 2021

mattmoor commented Sep 15, 2021

imjasonh commented Sep 15, 2021

mattmoor commented Sep 15, 2021

wlynch commented Sep 24, 2021

tekton-robot commented Sep 24, 2021

wlynch commented Sep 24, 2021

squee1945 commented Sep 24, 2021

wlynch commented Sep 24, 2021 •

edited

Loading

ghost commented Nov 1, 2021

06kellyjac commented Nov 1, 2021

tekton-robot commented Feb 27, 2022

vdemeester commented Feb 28, 2022

wlynch commented Mar 10, 2022 •

edited

Loading

Restrict step volumes as read-only #4227

Restrict step volumes as read-only #4227

Comments

wlynch commented Sep 14, 2021 • edited Loading

Expected Behavior

Actual Behavior

Steps to Reproduce the Problem

/tekton/tools

/tekton/steps

Additional Info

Proposal

Open Questions

vdemeester commented Sep 15, 2021

06kellyjac commented Sep 15, 2021

vdemeester commented Sep 15, 2021

mattmoor commented Sep 15, 2021

imjasonh commented Sep 15, 2021

mattmoor commented Sep 15, 2021

wlynch commented Sep 24, 2021

tekton-robot commented Sep 24, 2021

wlynch commented Sep 24, 2021

squee1945 commented Sep 24, 2021

wlynch commented Sep 24, 2021 • edited Loading

ghost commented Nov 1, 2021

06kellyjac commented Nov 1, 2021

tekton-robot commented Feb 27, 2022

vdemeester commented Feb 28, 2022

wlynch commented Mar 10, 2022 • edited Loading

wlynch commented Sep 14, 2021 •

edited

Loading

wlynch commented Sep 24, 2021 •

edited

Loading

wlynch commented Mar 10, 2022 •

edited

Loading