
Add enhancement proposal for selinux-operator #327

Closed
wants to merge 1 commit

Conversation

@JAORMX
Contributor

JAORMX commented May 15, 2020

This operator aims to simplify the installation, usage, removal, and auditing of SELinux policies at the cluster level.

@openshift-ci-robot

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: JAORMX
To complete the pull request process, please assign jcantrill
You can assign the PR to them by writing /assign @jcantrill in a comment when ready.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@JAORMX force-pushed the selinux-operator branch 2 times, most recently from fd13cc2 to 78df048 on May 15, 2020 at 11:02
@ashcrow requested review from ashcrow and cgwalters on May 15, 2020 at 12:55
Member

@ashcrow left a comment

I like this idea. One thing that strikes me, though, is that while exposing SELinux at the cluster level makes sense, SELinux is not the most user-friendly thing. I'd love a future iteration to have groupings of policies that can be reused -- or something else that makes the configuration feel less like modifying hosts (even though that's what's happening under the covers). Just a thought.

One question is why we'd do this without modifying the MachineConfig. Is this to avoid rebooting?

Provide a way for deployers to install, remove and audit SELinux policies for
containers in the cluster.

### Non-Goals
Member

👍 Agreed with the non-goals.

@JAORMX
Contributor Author

JAORMX commented May 15, 2020

> I like this idea. One thing that strikes me, though, is that while exposing SELinux at the cluster level makes sense, SELinux is not the most user-friendly thing. I'd love a future iteration to have groupings of policies that can be reused -- or something else that makes the configuration feel less like modifying hosts (even though that's what's happening under the covers). Just a thought.
>
> One question is why we'd do this without modifying the MachineConfig. Is this to avoid rebooting?

Right, so that definitely plays into this.

The workflow is as follows:

  • Install the policy object
  • Run the workload (create a pod or deployment)

We wouldn't want the nodes to have to restart before you can run your pod.

On the other hand, we don't really persist files directly onto the host with this. The CIL file is just taken into use by the `semodule` command, which would eventually persist the compiled binary representation of it into the module store. So we don't necessarily need a MachineConfig in this case.
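
To make that workflow concrete, here is a minimal sketch. The `SelinuxPolicy` kind, API group/version, image, and CIL snippet are illustrative assumptions based on this discussion rather than the final API; the `seLinuxOptions.type` value follows the `<policy name>_<policy namespace>.process` format described in the proposal.

```yaml
# Step 1: install the policy object (hypothetical CR shape).
apiVersion: selinux.openshift.io/v1alpha1   # assumed group/version
kind: SelinuxPolicy
metadata:
  name: errorlogger
  namespace: default
spec:
  # Raw CIL that the operator's per-node installer hands to semodule.
  policy: |
    (blockinherit container)
    (allow process var_log_t (dir (open read getattr lock search ioctl add_name remove_name write)))
---
# Step 2: run the workload; no node reboot is required in between.
apiVersion: v1
kind: Pod
metadata:
  name: errorlogger-pod
  namespace: default
spec:
  containers:
    - name: logger
      image: quay.io/example/errorlogger:latest   # placeholder image
      securityContext:
        seLinuxOptions:
          # "<policy name>_<policy namespace>.process", per the proposal's usage format.
          type: errorlogger_default.process
```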

Member

@derekwaynecarr left a comment

The use cases around SELinux are also very similar to what you may see with seccomp profile distribution. There is a proposal for a seccomp operator to distribute seccomp profiles in SIG Node upstream as a new sub-project. Please describe in this proposal why the MCO is unable to meet the need to distribute this information, and whether that option was evaluated and rejected.

/cc @mrunalp

enhancements/security/selinux-operator.md
* `status.usage`: This string indicates how the user is supposed to use this
policy in their `Pod` templates. The format is as follows:
`<policy name>_<policy namespace>.process`
* `status.state`: This represents the current status of the policy in the
Member

What does `status.state` mean if the cluster is autoscaled? Is the state actually `INSTALLED` until the daemonset is scheduled to that new node and renders its value? If this was driven through the MCO, we have a way of knowing if a change is applied across a whole pool of nodes.

Contributor Author

The installation is managed via a DaemonSet. If you scale your nodes, the daemonset scales too, so those nodes should get the policy as well.
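
For illustration, a minimal sketch of that installer DaemonSet; the names, image, and host mounts are assumptions, not the operator's actual manifests. Because there is no node selector, every node (including ones added by the autoscaler) gets an installer pod, which is what keeps newly scaled nodes in sync.

```yaml
# Illustrative installer DaemonSet (assumed names, image, and mounts).
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: selinux-policy-installer
  namespace: selinux-operator   # assumed namespace
spec:
  selector:
    matchLabels:
      app: selinux-policy-installer
  template:
    metadata:
      labels:
        app: selinux-policy-installer
    spec:
      containers:
        - name: installer
          image: quay.io/example/selinux-policy-installer:latest  # placeholder
          securityContext:
            privileged: true   # needs to load SELinux modules on the host
          volumeMounts:
            - name: selinux-store
              mountPath: /var/lib/selinux
            - name: selinux-config
              mountPath: /etc/selinux
      volumes:
        - name: selinux-store
          hostPath:
            path: /var/lib/selinux   # the node's SELinux module store (assumed mount)
        - name: selinux-config
          hostPath:
            path: /etc/selinux
```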

`<policy name>_<policy namespace>.process`
* `status.state`: This represents the current status of the policy in the
cluster. The accepted states are: `PENDING`, `IN-PROGRESS`, `INSTALLED`,
`ERROR`. A typical reason why a policy would end up in an ERROR state would
Member

When is this validation performed? Is this asserting that validation is not done at admission time?

Contributor Author

Right, it's not done at admission time. It could! I just haven't coded that yet.
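
For illustration, this is roughly how the `status` block described in the excerpt could look once the installer reports back; everything beyond the `usage` and `state` fields named in the proposal is an assumption.

```yaml
# Illustrative status of a policy custom resource.
status:
  usage: errorlogger_default.process   # "<policy name>_<policy namespace>.process"
  state: INSTALLED                     # one of PENDING, IN-PROGRESS, INSTALLED, ERROR
  # A policy that fails to install on a node would be reported as ERROR here,
  # i.e. the failure surfaces after creation rather than at admission time.
```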

* Given that this uses a custom operator and a custom resource, integration
with `Pods` and standard kubernetes resources doesn't necessarily use the
same names (e.g. why the `usage` section was needed in the first place). In
the future, we could consider adding a section to the `seLinuxOptions`
Member

This would require a change in upstream k8s itself, right? Is it better to explore this upstream, similar to the proposed seccomp sub-project?

Contributor Author

This could be an option. However, I decided to post this here first because it's better to show something that's working and gain adoption before trying to submit it there. After all, SELinux is not available for all kube distros.
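
To make the integration gap concrete: the first snippet shows how a `Pod` consumes the policy today via the generated `usage` string, and the second sketches the kind of dedicated `seLinuxOptions` field the excerpt alludes to. The second shape is purely hypothetical; no such field exists in upstream Kubernetes.

```yaml
# Today: the operator-generated usage string goes into the existing field.
securityContext:
  seLinuxOptions:
    type: errorlogger_default.process
---
# Hypothetical future shape (illustrative only, not an existing API):
# reference the policy object by name/namespace instead of an opaque string.
securityContext:
  seLinuxOptions:
    policyRef:
      name: errorlogger
      namespace: default
```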

@derekwaynecarr
Member

Is there a reason why we would not want this API to be part of the machine config operator? So, a higher-level domain-specific API, but still coordinated via a common delivery mechanism.

@JAORMX
Contributor Author

JAORMX commented May 15, 2020

> Is there a reason why we would not want this API to be part of the machine config operator? So, a higher-level domain-specific API, but still coordinated via a common delivery mechanism.

We wouldn't persist the policy files on the host; these get compiled and installed via SELinux utilities. So it wouldn't be able to use the same delivery mechanism that MachineConfig provides.
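
A sketch of what the install step looks like from the installer container's point of view; the file path is illustrative. The point is that `semodule` compiles the CIL and records it in the node's module store itself, so there is no raw policy file for MachineConfig to deliver and no reboot involved.

```yaml
# Illustrative command for the installer container: the CIL from the custom
# resource is written to a temporary file and handed to semodule, which
# compiles it and records the result in the node's SELinux module store.
command:
  - /bin/sh
  - -c
  - semodule -i /tmp/policies/errorlogger_default.cil   # path is illustrative
```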

@ashcrow requested a review from runcom on May 15, 2020 at 16:46
This operator aims to simplify the installation, usage, removal, and auditing of SELinux policies at the cluster level.
@mrunalp
Member

mrunalp commented May 18, 2020

cc: @rhatdan

@mrunalp
Member

mrunalp commented May 18, 2020

@rhatdan recently added SELinux support for kata containers. @rhatdan, do you think it may be worth only installing that policy on nodes where we end up installing kata, using what we end up with from this proposal?

@JAORMX
Contributor Author

JAORMX commented May 18, 2020

> @rhatdan recently added SELinux support for kata containers. @rhatdan, do you think it may be worth only installing that policy on nodes where we end up installing kata, using what we end up with from this proposal?

@mrunalp do you have a link to that enhancement or docs? I could add support for a `nodeSelector` in the policy Custom Resource, so it would only get installed on the nodes that are specified. This way, you could tell it to only install the policy on nodes that have kata (when needed). Note that this is also meant to address non-kata cases, so it would still be useful without kata.
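
A minimal sketch of what such a `nodeSelector` could look like on the policy custom resource; the field itself, the API group, and the kata node label are all illustrative assumptions (the field did not exist at the time of this comment).

```yaml
# Hypothetical policy CR restricted to kata nodes via an assumed nodeSelector field.
apiVersion: selinux.openshift.io/v1alpha1   # assumed group/version
kind: SelinuxPolicy
metadata:
  name: kata-extras
  namespace: default
spec:
  nodeSelector:                       # proposed field, illustrative
    node-role.kubernetes.io/kata: ""  # illustrative label for kata-enabled nodes
  policy: |
    (blockinherit container)
    ; kata-specific rules would go here (illustrative)
```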

@mrunalp
Member

mrunalp commented May 18, 2020

@JAORMX It went into container-selinux in containers/container-selinux@b321ea4 and follow-on PRs. I am more wondering here whether that should be split out into an optional policy that we only install on kata nodes. Thanks!

@rhatdan

rhatdan commented May 18, 2020

No reason to make this an optional policy. It is in the base and will be used by Podman as well. I would think the selinux-operator would be for custom user policy, not for something that is in the base system.

@JAORMX requested a review from ashcrow on May 20, 2020 at 06:41
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Oct 13, 2020
@openshift-bot

Stale issues rot after 30d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle rotten.
Rotten issues close after an additional 30d of inactivity.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle rotten
/remove-lifecycle stale

@openshift-ci-robot removed the lifecycle/stale label on Nov 12, 2020
@openshift-ci-robot added the lifecycle/rotten label on Nov 12, 2020
@ashcrow
Member

ashcrow commented Nov 12, 2020

/remove-lifecycle rotten

@openshift-ci-robot removed the lifecycle/rotten label on Nov 12, 2020
@JAORMX
Contributor Author

JAORMX commented Nov 12, 2020

/hold

I'll get back to this soon. I'm going to do a bit more user research to present more solid use cases here.

@openshift-ci-robot added the do-not-merge/hold label on Nov 12, 2020
@openshift-bot

Issues go stale after 90d of inactivity.

Mark the issue as fresh by commenting /remove-lifecycle stale.
Stale issues rot after an additional 30d of inactivity and eventually close.
Exclude this issue from closing by commenting /lifecycle frozen.

If this issue is safe to close now please do so with /close.

/lifecycle stale

@openshift-ci-robot added the lifecycle/stale label on Feb 10, 2021
@JAORMX closed this on Feb 11, 2021