Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

TEP-0035: add tep for further documenting how-to's around policy, authentication, authorization, RBAC, and Tekton #152

Merged
merged 1 commit into from
Dec 17, 2020

Conversation

gabemontero
Copy link
Contributor

@tekton-robot tekton-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jul 15, 2020
"Security first" is a mantra often heard amongst Kubernetes cluster administrators. The fact that some form of
security enforcement is not on by default very possibly will come under scrutiny. This proposal for now does not go that
far, in the interest of minimizing disruption of the current adoption of Tekton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Kubernetes in general has suffered from a lot of security issues caused by trying to add features after the GA release. When these security improvements would cause breakages to existing use-cases, they have remained off by default.

I understand this approach, but given we're still pre-GA it might be worth being a bit more aggressive in rolling out some of these features. WDYT?

We would still need to evaluate each change on a case-by-case basis, but I don't think we need to necessarily always prioritize minimizing disruption.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, pre-GA is definitely the time to make disruptive changes!

In reality I assumed resistance to that so proposed something less potentially controversial. I do think allowing
to turn off on a per namespace basis could assist in migrating to the new norms.

When I push back against proposals, I am motivated by a desire to make sure we fully explore the problem(s) we're trying to solve and be very purposeful about what we implement.

I don't mind disrupting the norms we've established so far, but I want to make sure that we want to before we do it.

@gabemontero
Copy link
Contributor Author

gabemontero commented Jul 17, 2020 via email

Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for putting this together @gabemontero !

I am very hesitant about going ahead with this as described and have 2 alternatives to suggest:

  • Having fine grained control over what can be run and what can't feels to me like potentially it's own project, and instead of adding the specific restrictions you suggest here, I'd rather see a separate project tackle this than build it into the tekton controller
  • Storing and referencing Tasks and ClusterTasks in a k8s cluster has turned out to be a less than desirable solution: even the existence of ClusterTasks points toward that. We're (slowly) moving toward storing Tasks in OCI image registries (and probably git repos as well). In that world, users would need to provide credentials to access the registries/repos. Would we still need this feature then? I do think this is the direction we're going, so I'm not sure it's worth focusing our efforts here - if someone really wanted to control access to Tasks, they could then do so by requiring Task references always be via OCI registry for example (which would be another thing to enforce!)

"Security first" is a mantra often heard amongst Kubernetes cluster administrators. The fact that some form of
security enforcement is not on by default very possibly will come under scrutiny. This proposal for now does not go that
far, in the interest of minimizing disruption of the current adoption of Tekton.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, pre-GA is definitely the time to make disruptive changes!

In reality I assumed resistance to that so proposed something less potentially controversial. I do think allowing
to turn off on a per namespace basis could assist in migrating to the new norms.

When I push back against proposals, I am motivated by a desire to make sure we fully explore the problem(s) we're trying to solve and be very purposeful about what we implement.

I don't mind disrupting the norms we've established so far, but I want to make sure that we want to before we do it.


Among these various scenarios, there has also been discussion around introducing OPA (either via the admission webhook
for Kubernetes provided by the OPA org, or some other form of integration) into Tekton. This proposal is not taking
on how OPA would be integrated / leveraged by Tekton.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you elaborate on why using a tool like OPA isn't under consideration for evaluating this problem? I think this could be a really promising solution.

If what we're trying to do is restrict who has access to what Tasks / ClusterTasks, it feels like a small step from "service account B shouldn't be able to access ClusterTasks" to "service account B should only be able to access ClusterTasks A, B and C".

If we direct users to a solution like OPA for this use case (maybe even building a Tekton admission controller to help with this) we can offer a lot more flexibility.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I would contend OPA was considered @bobcatfish since I listed it here as an alternative.

I'll respond by adjusting your question slightly. Why was it not listed as the preferred solution? My thoughts there were:

a) simplicity and solving this particular scenario
b) importance of solving this scenario sooner rather than later for users I see consuming tekton
c) uncertainly if we would ship OPA integration with only predefined rego policies for our scenarios of concern, vs. the potentially slippery slop of opening up any OPA integration to the user defining their own rego policies ... they are very flexible and powerful, but there will be an educational costs learning how to properly set up rego policies ... so an incremental step here

Certainly OPA has come up in several scenario discussions besides this one. Have there been plans already that I am unaware at the board level around introducing OPA to tekton in general?

Something of that nature feels like a separate project in and of itself as well.

And do we have folks in the tekton community already comfortable enough with OPA, gatekeeper, etc. to take on that effort?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly OPA has come up in several scenario discussions besides this one. Have there been plans already that I am unaware at the board level around introducing OPA to tekton in general?

I believe that some folks at RedHat are looking into integrations between Tekton + OPA - @vdemeester might know more, not sure who else to point you toward ( @dlorenc maybe? )

b) importance of solving this scenario sooner rather than later for users I see consuming tekton

I don't want to rush something in - I might need to understand better why we need something quickly here. My perspective is that via webhooks, users have the power to add whatever additional requirements they want, i.e. tekton doesn't need to add support for this in order for users to have it (which is one of the biggest wins of building on kubernetes in my opinion!), i.e. i dont see the hurry to get this functionality in.

I basically don't want to compromise the quality of our solution b/c we want something quickly:
a) if we think this is important enough that tekton needs to support it out of the box, id like to treat this with the importance it needs and approach it carefully, and potentially even encourage people to stop working on features to work on this instead
b) if this needs to be done quickly and we are planning to devote the time to it that it needs, whatever we do now quickly should be designed to be removed

I don't see the need for (a) yet, but I might need to better understand the use cases that are motivating this.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Certainly OPA has come up in several scenario discussions besides this one. Have there been plans already that I am unaware at the board level around introducing OPA to tekton in general?

I believe that some folks at RedHat are looking into integrations between Tekton + OPA - @vdemeester might know

yep I'm familiar with that piece (and am involved with it as well behind the scenes) ... but I was curious what existed outside of Red Hat, given the amount of OPA references I see here and elsewhere under the broader tektoncd umbrella

thanks

more, not sure who else to point you toward ( @dlorenc maybe? )

yeah let's see if @dlorenc chimes in

b) importance of solving this scenario sooner rather than later for users I see consuming tekton

I don't want to rush something in - I might need to understand better why we need something quickly here. My perspective is that via webhooks, users have the power to add whatever additional requirements they want, i.e. tekton doesn't need to add support for this in order for users to have it (which is one of the biggest wins of building on kubernetes in my opinion!), i.e. i dont see the hurry to get this functionality in.

I basically don't want to compromise the quality of our solution b/c we want something quickly:
a) if we think this is important enough that tekton needs to support it out of the box, id like to treat this with the importance it needs and approach it carefully, and potentially even encourage people to stop working on features to work on this instead
b) if this needs to be done quickly and we are planning to devote the time to it that it needs, whatever we do now quickly should be designed to be removed

I don't see the need for (a) yet, but I might need to better understand the use cases that are motivating this.


A Kubernetes RBAC Subject Access Review (SAR) check will be executed to see if the ServiceAccount in question
is part of a Role/RoleBinding or ClusterRole/ClusterRoleBinding that allows access to the referenced
ClusterTask, Task, or PipelineTask. If access is denied, the admission webhook will prevent creation/update
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

s/PiplineTask/Pipeline? (a PipelineTask isnt a stand alone thing that can be referenced)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep that is what I meant ... will update - thanks!

- "all" is specified, preceded/followed by a space, factoring in beginning and end of the value of the environment
variable
- or a space delimited list of names corresponding to namespaces is provide (where "all" could be one of those
entries as well, hence turning the feature on for every namespace)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What would you think about creating a separate, initially experimental, project around a Tekton admission webhook for apply security controls? We could allow users to integrate that webhook with systems like OPA.

This way the existing Tekton controller and webhook can be concerned with just pipeline execution, and if users want to enforce their custom security requirements, they can install an optional controller to do so.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK that is the clarification I asked for in #152 (comment) :-) ... thanks

Certainly that is possible. The pros/cons are pretty typical .. at least some of them

  • pro: separate of concerns
  • pro: no on/off switch for tektoncd/pipelines
  • con: more work :-)
  • con: some more pod sprawl
  • it being separate implies off by default, where in these discussion we talked about on by default ... but when also considering the moving parts around OCI and whether Task/ClusterTask/Pipeline get replaced also muddies those waters.

But I could see how one might prefer those pros in relation to the cons.

That said, enablement of enforcement on a per namespace basis I think would still be useful config. It corresponds to what sig-auth folks have noted on the problem with PSP being an all or nothing thing. Existing tekton users could benefit from enabling on a per namespace and migrating in a more granular fashion.

of required permissions to get/read namespace scoped resources like Task from a TaskRun or Task/Pipeline from a
PipelineRun is possible, where all objects are in the same namespace, using the ServiceAccount (specified or
default) as the actor in the permission check, and where enablement can be done on a global basis or per
namespace basis, and where enablement can vary for cluster scoped resources vs. namespace scoped resources.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you update these user stories to explain more about why we have a situation where a user is able to create TaskRuns and PipelineRuns, but we want to make it so that they cannot access the referenced Task / Pipeline / ClusterTask unless the service account is modified?

I understand in principle what this proposal is suggesting but I don't understand why: if a user can create a PipelineRun / TaskRun, they can embed the Task / Pipeline spec if needed, i.e. they can ultimately run anything they want.

if this is something we want to prevent also, I think this is more motivation to tackle these more fine grained permissions as a whole, and leverage solutions like OPA that are being created for this purpose

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sure I can add an example. Will be part of next push.

@dlorenc
Copy link
Contributor

dlorenc commented Jul 18, 2020

Agree with all of that! This commentary was more about the general principle of not breaking things, even if it would result in a security improvement.

Here's the main example that came to mind (I just realized is still an open issue, 4 years later!): kubernetes/enhancements#135

TLDR: k8s went GA without a default seccomp policy. It's been basically impossible to add without breaking lots of very minor things, even though the security trade-off is almost definitely better in the short and long term.

PodSecurityPolicy admission plugin, from timestampls 19:11 to 21:14, you'll hear

- "Granting permissions to the requesting user is intuitive, but it breaks controllers"
- "Dual mode weakens security ... you cannot have a privileged controller create pods on behalf of a user"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for the pointer to the video here! I agree that using the user permissions is the intuitive choice, so I was kind of shocked to hear that it isn't recommended.

I just watched that section a few times, and I'm not convinced it actually applies in the case of Tekton. This might be worth a deeper discussion with those sig-auth members to clarify.

A couple big difference:

  • one problem mentioned is that controllers break this model. The typical entity creating a pod is not a user, but another k8s controller (they used the deployment example). This means the creating user is almost always cluster-admin. In the case of Tekton, we don't create intermediate objects, we create pods directly.
  • the other example is around container-breakouts using privileged containers and psp. I don't think that's really relevant here, or at least I can't see a connection.

In short, I'm (maybe naively) still hopeful that using the credentials of the requesting user, rather than the controller, would make this simpler and work for our case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, I'm not sure the intermediate steps needed or not needed to ultimately result in Pod creation matters. Ultimately, we are creating Pods.

On Steps/Containers escalating in any fashion (given we exposed corev1.PodTemplate, corev1.Container) based on what either the requesting user (if we went down that path) or SA is allowed to do vs. what the Tekton controller(s) can do is pretty much exactly the point in my opinion.

All that said, I do agree that perhaps more direct conversations with folks in sig-auth, vs. our interpretations of their kube con discussions, would be ideal. Common conversations with as many of the stakeholders and SMEs is always a better thing. The person from my extended team who is on sig-auth is away from the office today, but I'll figure out when I can get some more cycles from him and report back. Presumably we would want to attend an upcoming sig-auth meeting and present. Certainly if you have thoughts / contacts / experience engaging with sig-auth @dlorenc let me know.

Certainly switching to include a requesting user is not a big change in the prototype I've posted (in fact I originally had a version that added requesting users to TaskRun and PipelineRun until my aforementioned team-mate on sig-auth talked me away from that).

Lastly, @skaegi and I have had some conversations on this (and he commented some on this topic in my precursor design document). I believe my discussions with him (following my initial conversion to "use SAs, not requesting users") along with his personal experiences in his service has lead to a "use SAs, not requesting users" appraoch.

@skaegi - any chance you have cycles to chime in here?

thanks

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that regardless of what sig-auth says, I think it's important that the choices we make be grounded in specific use cases and understanding what scenarios we are envisioning as far as what happens without the functionality being proposed here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

for sure ... I certainly was not trying to imply otherwise

@gabemontero
Copy link
Contributor Author

OK I'm back from bereavement/PTO and have caught up enough that I can start focusing on comments here. Thanks.

@gabemontero
Copy link
Contributor Author

Agree with all of that! This commentary was more about the general principle of not breaking things, even if it would result in a security improvement.

Here's the main example that came to mind (I just realized is still an open issue, 4 years later!): kubernetes/enhancements#135

TLDR: k8s went GA without a default seccomp policy. It's been basically impossible to add without breaking lots of very minor things, even though the security trade-off is almost definitely better in the short and long term.

OK then. I'll add changing the default to on/potentially breaking. We can the continue discussion the need to allow disablement around global/per namespace scopes once I have that change up.

thanks

@gabemontero
Copy link
Contributor Author

Thanks for putting this together @gabemontero !

I am very hesitant about going ahead with this as described and have 2 alternatives to suggest:

* Having fine grained control over what can be run and what can't feels to me like potentially it's own project, and instead of adding the specific restrictions you suggest here, I'd rather see a separate project tackle this than build it into the tekton controller

So to make sure I understand @bobcatfish , you mean a new https://github.com/tektoncd/authorization or some such that would provide a way to install the additional admission webhooks, et. al. vs. installing the webhook at part of https://github.com/tektoncd/pipelines

* Storing and referencing Tasks and ClusterTasks in a k8s cluster has turned out to be a less than desirable solution: even the existence of ClusterTasks points toward that. We're (slowly) moving toward storing Tasks in OCI image registries (and probably git repos as well). In that world, users would need to provide credentials to access the registries/repos. Would we still need this feature then? I do think this is the direction we're going, so I'm not sure it's worth focusing our efforts here - if someone really wanted to control access to Tasks, they could then do so by requiring Task references always be via OCI registry for example (which would be another thing to enforce!)

Interesting, I was aware of the OCI stuff, but had missed that it would serve as a replacement for Task/ClusterTask/etc. CRDs. Only that it would be another means of injecting such shared content.

I'm not seeing that call out of replacement in https://github.com/tektoncd/community/pull/137/files either. Am I simply missing it, or is it
a) either something that should be called out there
b) or was it called out somewhere else
c) or is it commonly known it will be called out in a subsequent TEP

That aside, yeah that path can certainly change things. But for me at least, roadmaps for all this are my next questions.

Do we plan to GA pipelines in a form were Tasks/ClusterTasks etc. still exist as k8s objects/CRDs, or will the evolution you note be a gating factor for GA, and those CRDs will no longer be in play when the project reaches GA status?

I think the need for some form of the authorization discussed here is needed if Tasks/ClusterTasks, etc. make it to GA.

If they will not, then I agree, these authorization notions around object cross references seem mitigated.

@bobcatfish
Copy link
Contributor

I'm not seeing that call out of replacement in https://github.com/tektoncd/community/pull/137/files either

No one has officially proposed completely replacing in cluster Pipelines and Tasks with this. I think we'd want to get some experience with the feature before going that far, however, if using this approach and the authorization requirements that this implies (i.e. access to pull from the image registry) works for your use cases, I wonder if this could solve your problem? (if your users were limited to pulling from the image registry and couldnt refer to Tasks and Pipelines in cluster)

Do we plan to GA pipelines in a form were Tasks/ClusterTasks etc. still exist as k8s objects/CRDs, or will the evolution you note be a gating factor for GA, and those CRDs will no longer be in play when the project reaches GA status?

this doc has been collecting input on what we need for GA

I think the need for some form of the authorization discussed here is needed if Tasks/ClusterTasks, etc. make it to GA.

I am not yet convinced: I really need to understand the use cases that we're trying to avoid. As far as I'm concerned, if a user can create a TaskRun, they can create a pod. Being able to embed a Task spec in a TaskRun means they can make that TaskRun (and pod) do basically anything they want. I don't understand what locking down access to ClusterTasks provides.

I can see wanting to express fine grained control over what is and isn't allowed in a Tekton cluster, but I think that makes sense as a separate project vs. one off features added to the Tekton controller. Like you said this is a lot more work, but if it's important enough, I'd rather we put in the work. If it's not important enough, I don't want to put in one off work arounds if we can avoid it.

@gabemontero
Copy link
Contributor Author

I'm not seeing that call out of replacement in https://github.com/tektoncd/community/pull/137/files either

No one has officially proposed completely replacing in cluster Pipelines and Tasks with this. I think we'd want to get some experience with the feature before going that far, however, if using this approach and the authorization requirements that this implies (i.e. access to pull from the image registry) works for your use cases, I wonder if this could solve your problem? (if your users were limited to pulling from the image registry and couldnt refer to Tasks and Pipelines in cluster)

Do we plan to GA pipelines in a form were Tasks/ClusterTasks etc. still exist as k8s objects/CRDs, or will the evolution you note be a gating factor for GA, and those CRDs will no longer be in play when the project reaches GA status?

this doc has been collecting input on what we need for GA

I think the need for some form of the authorization discussed here is needed if Tasks/ClusterTasks, etc. make it to GA.

I am not yet convinced: I really need to understand the use cases that we're trying to avoid. As far as I'm concerned, if a user can create a TaskRun, they can create a pod. Being able to embed a Task spec in a TaskRun means they can make that TaskRun (and pod) do basically anything they want. I don't understand what locking down access to ClusterTasks provides.

Understood. As I noted elsewhere in the responses, I'll look to providing a more detailed scenario given the lack of consensus here. But I'll be balancing that with spending cycles on other elements we've discussed here (possibly more sig-auth engagement, and moving up OPA based efforts).

It is certainly conceivable that different customer sets have different expectations.

I can see wanting to express fine grained control over what is and isn't allowed in a Tekton cluster, but I think that makes sense as a separate project vs. one off features added to the Tekton controller. Like you said this is a lot more work, but if it's important enough, I'd rather we put in the work. If it's not important enough, I don't want to put in one off work arounds if we can avoid it.

@gabemontero
Copy link
Contributor Author

I'm not seeing that call out of replacement in https://github.com/tektoncd/community/pull/137/files either

No one has officially proposed completely replacing in cluster Pipelines and Tasks with this. I think we'd want to get some experience with the feature before going that far, however, if using this approach and the authorization requirements that this implies (i.e. access to pull from the image registry) works for your use cases, I wonder if this could solve your problem? (if your users were limited to pulling from the image registry and couldnt refer to Tasks and Pipelines in cluster)

Yes, as I tried to convey at the bottom of #152 (comment) if we eliminated the *Run refs to ClusterTasks, Tasks, etc. that eliminates the auth checks at the heart of this TEP.

Just reposting in case it got lost in the shuffle or was not clear ... thanks

Do we plan to GA pipelines in a form were Tasks/ClusterTasks etc. still exist as k8s objects/CRDs, or will the evolution you note be a gating factor for GA, and those CRDs will no longer be in play when the project reaches GA status?

this doc has been collecting input on what we need for GA

I think the need for some form of the authorization discussed here is needed if Tasks/ClusterTasks, etc. make it to GA.

I am not yet convinced: I really need to understand the use cases that we're trying to avoid. As far as I'm concerned, if a user can create a TaskRun, they can create a pod. Being able to embed a Task spec in a TaskRun means they can make that TaskRun (and pod) do basically anything they want. I don't understand what locking down access to ClusterTasks provides.

I can see wanting to express fine grained control over what is and isn't allowed in a Tekton cluster, but I think that makes sense as a separate project vs. one off features added to the Tekton controller. Like you said this is a lot more work, but if it's important enough, I'd rather we put in the work. If it's not important enough, I don't want to put in one off work arounds if we can avoid it.

@gabemontero
Copy link
Contributor Author

Several threads of discussion manifested during the initial review. I'm going to attempt to summarize them in this single comment and then lay out as concisely as possible next steps.

First, discussion threads:

  • ask for more details on the scenarios itself, lack of consensus on its need
  • lack of consensus on requesting user vs. serviceaccount based authorization
  • debate on "native" admission webhook based solution with pipelines vs. "third party" based solutions (i.e. trying to capture the use of OPA thread here)

Then, next steps:

  • short term, I'll be updating the user scenario with more details on specifics, motivations, where this is coming from, etc.
  • I will also just list alternatives vs. specify a preferred approach and then list the remaining
  • next, I closed my current WIP PR and am pivoting to increase cycles on the "OPA/tekton integration" exploration currently going on at Red Hat (though certainly if others want to contribute cycles here, let me know and we can figure out how to collaborate). My goal is to demo in the coming weks at either the Monday API or Wednesday Global WG's an OPA version of the demo I did a few weeks ago. As a run at I may also take a stab at @jlpettersson 's Security: Protect a ServiceAccount token within a namespace pipeline#2962 as well.
  • on requesting user vs. serviceaccount, I may just hit the pause button on that for now, as the remaining work items themselves will most likely consume the cycles I can budget for tekton; certainly if others can propagate that discussion thread in the interim, by all means, but otherwise ideally we can revisit after I'm able to demo something OPA based

thanks everyone

@bobcatfish
Copy link
Contributor

Sounds great, very excited to see more on the OPA/tekton integration :D thanks @gabemontero !!

@gabemontero
Copy link
Contributor Author

Some good news: today I was able to approximate my RBAC/admission webhook based demo around enforcing which ServiceAccounts could include a given ClusterTask in a TaskRun. I believe it also provides a means for addressing tektoncd/pipeline#2962 as well (or could do so with minor adjustments).

My plan is demo this in the Aug 17 WG meeting (if I can wake up in time ;-) ) or the Aug 24 WG meeting. In the interim my next set of Tekton cycles will go toward updating this TEP based on these findings, along with making the other non-OPA adjustments discussed here to date.

@pritidesai
Copy link
Member

/kind tep

@tekton-robot tekton-robot added the kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). label Aug 20, 2020
@gabemontero
Copy link
Contributor Author

gabemontero commented Aug 24, 2020

ok @bobcatfish @dlorenc I have pushed updates for your various comments from last month ... at least I believe I have gotten to each point of feedback you had ... let me know if something was missed

please review etc. as your time permits

and as quick reminder I'll demo the OPA based approach in the Wednesday WG meeting in two days

of course if discussion around your feedback on this TEP following that demo is warranted, just let me know during the call

thanks

@gabemontero gabemontero force-pushed the sa-access-tep branch 2 times, most recently from b82b00f to 6bbc0ba Compare August 26, 2020 14:54
@gabemontero
Copy link
Contributor Author

bump on review cycles for this authorization/opa related tep

@gabemontero
Copy link
Contributor Author

UPDATE: I've been able to validate a sufficient POC level example of using k8s Subject Access Review (SAR) from a OPA Gatekeeper ConstraintTemplate and associated Constraint as a means of validating whether a TaskRun service account can access a ClusterTask that it references.

I've updated this TEP with new example yaml and text around this.

@gabemontero
Copy link
Contributor Author

gabemontero commented Oct 26, 2020

I've missed the last few API WG between PTO's and conflicts while in the office. I have a few updates, as well as some thoughts on possible adjustments to the path currently proscribed in this TEP, though I'd like if possible to discuss those possible adjustments with reviewers and/or the WG before diving in.

Aside from posting here, I'll make of mention of this in today's API WG when we go over TEPs.

First, some updates:

  1. In working with both the office of the CTO at Red Hat, as well as some of the lead consultants, those folks have taken a to-do of "productizing" some of the OPA based examples I've included here. Things like enabling TLS by default, possibly pull API server creds from the underlying pod if possible, minimally parameterizing those communication elements more robustly, etc., as part of a larger OPA catalog

  2. Similarly, in reviewing recent k8s upstream sig policy WG activity, it became clear to me that OPA is not the only "policy engine" if you will with some presence in the k8s eco system (though it could be fair to say it is the most prominent). There are several out there. Some tend to focus on alternatives to PSP. Others delve into RBAC styled policy.

On the later, one example that allows pattern matching based RBAC (though not the actual use of SAR like is possible with OPA is Kyverno. Although I have not attempted it yet, I believe the non-SAR, SA pattern matching RBAC example I have in this TEP is possible with kyverno, for example.

So, with that background, a couple of thoughts:

  1. Should we strive to be "policy engine agnostic", and have examples from more than one policy engine.

  2. With burgeoning per policy engine catalogs potentially propping up elsewhere, should the Tekton project stop short of providing its own repo of curated/supported policy examples, and rather just cite some POC styled examples (like the ones I have here) to help users get started (vs. saying "you should use this as is"), with possible references to the catalogs provided by orgs more focused on policy in k8s etc. as they evolve to the point we are comfortable referencing them. We could start with OPA and Kyverno POC examples as a starting point, and then add additional policy engine examples as user demand dictates.

All of 2) then could probably be boiled down to a new "rbac.md" or "policy.md" that sits in https://github.com/tektoncd/pipeline/tree/master/docs, and we adjust this TEP, with the doc being the end result for the TEP, and forgo any official catalogs, etc. out of github.com/tektoncd ... aside from referencing these policy engines, I could review the recent changes to the pod security policy in the config dir, and if the details / reasons why we have what we have there and not explicitly cited, we can add some detail on that as well if warranted in these new docs.

Thoughts?

@bobcatfish
Copy link
Contributor

Should we strive to be "policy engine agnostic", and have examples from more than one policy engine.

Good question - I feel strongly "yes". This seems similar to how we integrate CEL support: we want to make it simple for folks to use the expression language of their choosing, however in Triggers we have explicit CEL interceptor support (and will likely have a CEL custom Task in pipelines in the future (tektoncd/pipeline#3149).

So I think:

  1. Definitely policy engine agnostic
  2. If we only have examples for one (or two) policy engine(s) (at least initially) that seems fine too, as long as it's possible for folks to submit examples for more policy engines

With burgeoning per policy engine catalogs potentially propping up elsewhere, should the Tekton project stop short of providing its own repo of curated/supported policy examples, and rather just cite some POC styled examples (like the ones I have here) to help users get started (vs. saying "you should use this as is"), with possible references to the catalogs provided by orgs more focused on policy in k8s etc. as they evolve to the point we are comfortable referencing them.

Good point!

Maybe the way to go is to add some Tekton specific policies to these other catalogs and link to them from our own docs? It'd be nice if we could provide some kind of testing for these polices as well so people know they can rely on them 🤔

All of 2) then could probably be boiled down to a new "rbac.md" or "policy.md" that sits in https://github.com/tektoncd/pipeline/tree/master/docs, and we adjust this TEP, with the doc being the end result for the TEP, and forgo any official catalogs, etc. out of github.com/tektoncd

Sounds good to me!

Thanks again for all your work here @gabemontero 🙏

@gabemontero
Copy link
Contributor Author

Unfortunately, non-tekton responsibilities prevented me from updating this TEP per recent exchanges here and our discussion in the last API WG call on Nov 9.

My hope is to get to it sometime this week.

@tekton-robot tekton-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 23, 2020
@gabemontero
Copy link
Contributor Author

TEP moved to the teps folder @afrittoli @vdemeester

thanks gentlemen

@gabemontero
Copy link
Contributor Author

I don't think the latest tep lint failure is related to this new tep:

ERROR:root:No TEP number title (# TEP-NNNN) in /workspace/source/teps/0001-tekton-enhancement-proposal-process.md
TEP number TEP-0003 from filename does not match TEP number TEP-0002 from title (# TEP-NNNN) in /workspace/source/teps/0003-tekton-catalog-organization.md

but of course if you all know otherwise please advise

@afrittoli
Copy link
Member

/hold cancel

@tekton-robot tekton-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Nov 30, 2020
Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a few nitpicks and comments from me but overall looks great! Thanks for your dedication to this through all the back and forth @gabemontero !! 🙏

- [Proposal](#proposal)
- [User Stories (optional)](#user-stories-optional)
- [Story 1](#story-1)
- [Story 2](#story-2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

total nit: i think the TOC might be a bit out of sync

(side note: @afrittoli @vdemeester do you think ppl are finding the table of contents useful? i dont think ive ever used it and it seems like its a bit annoying to maintain)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yep it is ... I was able to cobble together an update with doctoc ... part of next push

- the ability to inject data into the Tekton controllers/reconcilers outside of the classic, authenticated,
K8s client to server API flow or Pod Security Policy validation:

- A part of triggers is pulling in events over an unauthenticated K8s service
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "unauthenticated k8s service" mean here? maybe you mean that interceptors are optional, so theoretically anything could be making requests to event listeners? im assuming that most folks will want to include at least one interceptor that verifies requests 🤔 (but i guess they can easily not do that?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what does "unauthenticated k8s service" mean here? maybe you mean that interceptors are optional, so theoretically anything could be making requests to event listeners?

yep that is the basic gist ... and even with the example interceptors we provide in tektoncd/triggers, the precise means of authentication those examples provide may not be "robust" enough for some user shops, and so those shops will have to write code to meet their needs ... how "easy" that is for consumers will depend on the consumers in questions and how capable they are ... I do no think it is a slam dunk "easy for everyone"

I'll see about adding some if this elaboration in my next push.

- Extensibility
- Security, policy, authentication, and authorization

All those are a big advantage over pre-existing CI/CD systems, created before Containers and Kubernetes came en vouge, that have
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: vouge -> vogue

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed in next push

markdown files
- References to third party implementations and associated libraries and catalogs of policy definitions that can solve
various concerns around security, policy, authentication, and authorization with respect to Tekton
- There is some preliminary work already underway with the [OPA project](https://github.com/open-policy-agent/gatekeeper-library)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

looks like this is a library for gatekeeper specifically - just a note that folks might want to use OPA without using gatekeeper (im assuming for most use cases tekton users will have gatekeeper makes that most sense tho!)

i think the difference would come down to rego vs. yaml with rego embedded (that is used for k8s specific purposes) eg. https://github.com/open-policy-agent/gatekeeper-library/blob/master/library/pod-security-policy/proc-mount/template.yaml vs https://github.com/open-policy-agent/library/blob/master/docker/example.rego

Looks like there is a general OPA lib as well: https://github.com/open-policy-agent/library

(not sure this is actionable feedback from me but i wanted to bring it to your attention in case it's worth mentioning somewhere)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah this is certainly not a final list .... I fully expect things to evolve over time. The doc that I ultimately write will need to make sure it is clear that lists such as this are "living lists"

That said, we can certainly start evolving now :-) .... I'll add these links you noted here in some manner


We are building upon the fact that prior K8s design choices that allow for plugging in such solutions (i.e. admission
webhooks) is the key design point here.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you plan to start with an .md file in the pipelines repo, or somewhere else, or do you think it could make sense to have a separate repo for this? (even if it was a little sparse) i could see the separate repo making sense if you wanted to have any specialized test automation, but maybe just starting with an .md file in an existing repo makes sense for now?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah I want to start simple with a .md file in the pipelines repo in the docs folder there. Users are presumably already used to going there, and that is then the best place to make this subject matter available to them.

I would not be shocked at all that if things evolve in this space and gain traction, we can then evolve to a separate repo.

And I think you point about specialized test automation in spot on ... if we get to that point, a separate repo really becomes the way we should go.


Having to install a separate components like Gatekeeper into a cluster vs. having solutions land with just the
installing Pipelines may displease some consumers. We'll have to track community response as things progress and
adjust if needed. But so far, there has been more acceptance than pushback.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i was discussing with someone recently (maybe @afrittoli ?) that it would be really need to have an example repo (e.g. something like tektoncd/example) where we have triggers + pipelines setup and folks can pretty much copy it to get up and running - maybe if we have that one day we could include gatekeeper in the installation

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting notion ... yeah let's try to collectively remember to include policy engine based stuff (opa, gatekeeper, or if other vendors happen to gain some traction by then) if such a thing came to fruition.

@gabemontero
Copy link
Contributor Author

OK @bobcatfish I've pushed the updated based on today's comments in a separate commit

based on where you are at with them, I'll squash if appropriate and we'll go from there

thanks !

@gabemontero
Copy link
Contributor Author

the teps-lint error still seems unrelated to this tep:

ERROR:root:No TEP number title (# TEP-NNNN) in /workspace/source/teps/0001-tekton-enhancement-proposal-process.md
TEP number TEP-0003 from filename doe

@vdemeester
Copy link
Member

@afrittoli ^^

@afrittoli
Copy link
Member

@gabemontero I'm still working on automatic rebase, sorry about that.
Would you mind rebasing the PR and refreshing the TEP table if needed? That should fix the issue.

@tekton-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: vdemeester

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@gabemontero
Copy link
Contributor Author

rebase pushed @afrittoli thanks

@gabemontero
Copy link
Contributor Author

OK I think the lint failure now could be related to my tep ... investigating:

diff --git a/teps/README.md b/teps/README.md
index f833933..e85eade 100644
--- a/teps/README.md
+++ b/teps/README.md
@@ -145,3 +145,4 @@ This is the complete list of Tekton teps:
 |[TEP-0030](0030-workspace-paths.md) | workspace-paths | proposed | 2020-10-18 |
 |[TEP-0031](0031-tekton-bundles-cli.md) | tekton-bundles-cli | proposed | 2020-11-18 |
 |[TEP-0032](0032-tekton-notifications.md) | Tekton Notifications | proposed | 2020-11-18 |
+|[TEP-0035](0035-doc-policy-authentication-authorization-rbac.md) | document-tekton-position-around-policy-authentication-authorization | implementable | 2020-11-20 |

@gabemontero
Copy link
Contributor Author

finally success with the lint !!!

I'll squash those commits along with the one from @bobcatfish 's comments once she signs off on those

Copy link
Contributor

@bobcatfish bobcatfish left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great to me @gabemontero !! Thanks again :D Happy to lgtm post commit squash


- Even with in Gatekeeper, there are [Rego policies that stand on their own](https://github.com/open-policy-agent/library/blob/master/docker/example.rego)
- As well as [Rego policies embedded in Kubernets Object YAML](https://github.com/open-policy-agent/gatekeeper-library/blob/master/library/pod-security-policy/proc-mount/template.yaml)
- And finally, [there are example Kubernetes policies as part of core OPA](https://github.com/open-policy-agent/library)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 👍 👍 nice!!

@gabemontero
Copy link
Contributor Author

finally success with the lint !!!

I'll squash those commits along with the one from @bobcatfish 's comments once she signs off on those

Looks great to me @gabemontero !! Thanks again :D Happy to lgtm post commit squash

cool deal thanks @bobcatfish

commits squashed

linter passes

@bobcatfish
Copy link
Contributor

thank YOU @gabemontero ! 🙏

/lgtm
/meow

@tekton-robot
Copy link
Contributor

@bobcatfish: cat image

In response to this:

thank YOU @gabemontero ! 🙏

/lgtm
/meow

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@tekton-robot tekton-robot added the lgtm Indicates that a PR is ready to be merged. label Dec 17, 2020
@tekton-robot tekton-robot merged commit 8462e7c into tektoncd:master Dec 17, 2020
@gabemontero
Copy link
Contributor Author

Awesome thanks @bobcatfish

@gabemontero gabemontero deleted the sa-access-tep branch December 17, 2020 20:14
dlorenc pushed a commit to dlorenc/community that referenced this pull request Oct 27, 2022
…oncd#152)

Signed-off-by: cpanato <ctadeu@gmail.com>

Signed-off-by: cpanato <ctadeu@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/tep Categorizes issue or PR as related to a TEP (or needs a TEP). lgtm Indicates that a PR is ready to be merged. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants