
Prevent workflows code from exploiting pod patch permission to change non-workflow pods #3961

Closed
Tracked by #7964
Asaf-m opened this issue Sep 8, 2020 · 14 comments · Fixed by #8000
Labels: area/executor, type/feature (Feature request), type/security (Security related)

Comments

@Asaf-m

Asaf-m commented Sep 8, 2020

Summary

What change needs making?

Find a way to prevent malicious code from exploiting the pods patch permission that is part of the minimum RBAC privileges.

Details:

The minimum RBAC privileges for a workflow include the pods patch permission, which seems to be a potential security issue.
The patch permission allows actions like `kubectl patch pod valid-pod --type='json' -p='[{"op": "replace", "path": "/spec/containers/0/image", "value":"new image"}]'`, meaning it allows changing every image in the namespace, in other words, bringing the namespace down.

The problem is even greater because the role is granted to the pod and not to a single container, so it is not only Argo's wait container that gets this role, but also the user's main container. This means any malicious code that creeps into the pod can exploit it.
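
For reference, the minimum executor Role documented at the time looked roughly like the sketch below; this is an illustration of the shape of the rules, not the exact manifest shipped with any particular release. Because RBAC rules are namespace-scoped, the patch verb on pods applies to every pod in the namespace, not just the workflow's own pod.

```yaml
# Sketch of the minimal workflow (executor) Role this issue is about.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: workflow-role
rules:
  # get/watch let the wait container monitor its own pod;
  # patch is what the executor uses to report outputs via annotations.
  - apiGroups: [""]
    resources: ["pods"]
    verbs: ["get", "watch", "patch"]
  # used to capture the main container's log as an artifact
  - apiGroups: [""]
    resources: ["pods/log"]
    verbs: ["get", "watch"]
```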

I'm not sure how this can be done. I can say I tried the solution suggested here and it worked, but it's a big mess to make it work with Kustomize, so I'd like a more elegant solution.

Use Cases

Always.


Message from the maintainers:

Impacted by this bug? Give it a 👍. We prioritise the issues with the most 👍.

@Asaf-m Asaf-m added the type/feature Feature request label Sep 8, 2020
@simster7
Member

simster7 commented Sep 8, 2020

I'll bring this up with the team

@Asaf-m
Author

Asaf-m commented Sep 22, 2020

@simster7 Any update?

@appellod

I also have some concerns with my Workflow requiring Pod patch permissions. I am developing a system that allows external users to run arbitrary workflows within my system. I have each user segregated into their own Namespace, so it wouldn't be absolutely devastating if they achieved Pod patch access, but it is still something that could potentially wreak havoc on my system.

Currently, I am setting automountServiceAccountToken: false with the executor.serviceAccountName field set. Is this a workaround similar to the solution @Asaf-m mentioned in his original post? Is there a way the main container could extract that token from the Argo sidecar and gain unauthorized access?
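
For anyone trying the same mitigation, the configuration described above looks roughly like this. It is a sketch only: the service account name executor-sa is hypothetical, and exactly how the executor's token is made available to the wait container depends on your Argo Workflows version.

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: no-main-token-
spec:
  entrypoint: main
  # don't mount the workflow service account token into every container
  automountServiceAccountToken: false
  executor:
    # this account's token is supplied to the wait (executor) container,
    # so the main container is not handed the pod-patching credentials
    serviceAccountName: executor-sa
  templates:
    - name: main
      container:
        image: alpine:3.15
        command: [echo, hello]
```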

@alexec
Contributor

alexec commented Jan 18, 2021

I think we could address this another way: if the pod can patch the workflow, then we could directly update the status.

To avoid conflicts, and to work with node offloading, we would need a new field to store the data in.

I'm not sure how this scales with many patches, so we could introduce another CRD as discussed here:

https://docs.google.com/document/d/18hg6PTejp1knp5QTaCwP4j4gUTsRu4KDeKHs-4l9shs/edit

This is not a popular issue.

@alexec alexec linked a pull request Feb 25, 2021 that will close this issue
@alexec
Contributor

alexec commented Feb 25, 2021

@jessesuen

https://kubernetes.io/docs/tasks/debug-application-cluster/determine-reason-pod-failure/#customizing-the-termination-message

The termination message is intended to be brief final status, such as an assertion failure message. The kubelet truncates messages that are longer than 4096 bytes. The total message length across all containers will be limited to 12KiB. The default termination message path is /dev/termination-log. You cannot set the termination message path after a Pod is launched

@alexec
Contributor

alexec commented Feb 25, 2021

Moreover, users can set the terminationMessagePolicy field of a Container for further customization. This field defaults to "File" which means the termination messages are retrieved only from the termination message file. By setting the terminationMessagePolicy to "FallbackToLogsOnError", you can tell Kubernetes to use the last chunk of container log output if the termination message file is empty and the container exited with an error. The log output is limited to 2048 bytes or 80 lines, whichever is smaller.

Ohhhh.... interesting!
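
To make the quoted Kubernetes behaviour concrete, here is a minimal, generic pod (not an Argo-generated one) that reports a small result through the termination message file; anyone with pods get can then read it back from the pod status, no patch required:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: termination-demo
spec:
  restartPolicy: Never
  containers:
    - name: main
      image: alpine:3.15
      # write a small result to the default termination message file
      command: ["sh", "-c", "echo 'result: ok' > /dev/termination-log"]
      terminationMessagePath: /dev/termination-log
      # if the file is empty and the container exited with an error,
      # fall back to the last 2048 bytes / 80 lines of the container log
      terminationMessagePolicy: FallbackToLogsOnError
```

After the container exits, the message appears under status.containerStatuses[].state.terminated.message in the pod object.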

@alexec
Contributor

alexec commented Feb 25, 2021

In v3.1 you will be able to run workflows without pod patch, so long as they do not have outputs.

@phelinor

In v3.1 you will be able to run workflows without pod patch, so long as they do not have outputs.

@alexec I can see that v3.1.2 is already available, but this issue is still open and I can still see the patch verb in the installation file https://github.com/argoproj/argo-workflows/blob/master/manifests/install.yaml

Could you please confirm that we are able to run workflows without pod patch?

@alexec
Contributor

alexec commented Jul 26, 2021

With the introduction of TaskSet we now have a way to replace pod patch with taskset patch.

@alexec
Contributor

alexec commented Feb 22, 2022

We should test some attacks to verify this is true. Much of the pod spec is immutable; is it really true that you can change the image or args?

@alexec alexec changed the title Prevent workflow RBAC patch permission exploitation Prevent workflows code from exploiting pod patch permission to change non-workflow pods Feb 22, 2022
@alexec
Contributor

alexec commented Feb 25, 2022

Notes from PoC:

  • We can use a separate CRD to report results back from the wait container to the controller (see the RBAC sketch after this list).
  • We can get a race condition between the pod informer and the task-result informer.
  • We can't use the TaskSet; it is not big enough for outputs.
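
A rough sketch of what that direction means for RBAC follows. The resource name reflects the WorkflowTaskResult CRD approach referenced by the linked PR; treat the exact group, resource, and verbs as assumptions that may differ from what any particular release ships.

```yaml
# Hypothetical executor Role once results flow through a dedicated CRD:
# the pods patch verb can be dropped because the wait container no longer
# annotates its own pod, it creates/patches a result object instead.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: executor
rules:
  - apiGroups: ["argoproj.io"]
    resources: ["workflowtaskresults"]
    verbs: ["create", "patch"]
```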

@SebastianGoeb

In v3.1 you will be able to run workflows without pod patch, so long as they do not have outputs.

Does this refer to global workflow outputs or step outputs? We use step outputs to communicate things between steps, and I don't see how pod/patch is required for that. What's stopping the controller from just creating the workflow pod with the required volumes for passing outputs from main to wait in the first place, and then leaving it like that? Is this really something the sidecar must do?

@alexec
Contributor

alexec commented Feb 25, 2022

Correct. Outputs are patched onto the pod using annotations; no outputs, no need for annotations. You could pass outputs using a volume mounted to all pods in the workflow, but this volume would need to be readable from the controller, and I'm not sure if that's possible.

@jessesuen I think we only need patch for the result and logs; the exit code is found by the controller. We know that we need these in the controller. Maybe we should look at pods/log in more detail.

@alexec
Contributor

alexec commented Feb 25, 2022

Reviewing pods/log option:

The current solution, in which each executor captures its own logs, scales: consider a 1000-node workflow, where each pod captures its own logs. If this were moved to the controller, it would have to do much more work than it currently does. The controller is the wrong place to do heavy lifting, as it creates a single point of failure.

On top of this, we don't know where to save the main.log (or any artifact) in the controller, because it does not have the get secrets permission (which it would need).

We could write the outputs to the logs, rather than as an annotation, or to the container termination-log, but these all have different problems.

Who else could do this? The agent, but it's just moving the problem.

alexec added a commit to alexec/argo-workflows that referenced this issue Mar 1, 2022
…oproj#3961

Signed-off-by: Alex Collins <alex_collins@intuit.com>
alexec added a commit that referenced this issue Mar 2, 2022
… (#8000)

Signed-off-by: Alex Collins <alex_collins@intuit.com>