
KEP 1981: Windows privileged container KEP updates for alpha #2288

Merged
merged 23 commits into kubernetes:master on Feb 18, 2021

Conversation

marosset
Contributor

@marosset marosset commented Jan 19, 2021

@k8s-ci-robot k8s-ci-robot added the cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. label Jan 19, 2021
@k8s-ci-robot k8s-ci-robot added size/L Denotes a PR that changes 100-499 lines, ignoring generated files. approved Indicates a PR has been approved by an approver from all required OWNERS files. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory sig/windows Categorizes an issue or PR as relevant to SIG Windows. labels Jan 19, 2021
@marosset marosset changed the title Windows privileged container KEP updates for alpha KEP 1981: Windows privileged container KEP updates for alpha Jan 19, 2021
@marosset
Contributor Author

/sig api-machinery
/sig node

@k8s-ci-robot k8s-ci-robot added sig/api-machinery Categorizes an issue or PR as relevant to SIG API Machinery. sig/node Categorizes an issue or PR as relevant to SIG Node. labels Jan 19, 2021
Member

@liggitt liggitt left a comment


suggest adding sig-auth as a participating sig, since we have an interest in the securityContext aspects of the pod.

cc @tallclair for pod security standards intersection
cc @IanColdwater @tabbysable for podsecuritypolicy intersection


A new boolean field named `privileged` will be added to [WindowsSecurityContextOptions](https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.20/#windowssecuritycontextoptions-v1-core).

On Windows, all containers in a pod must be privileged. Because of this behavior, and because `WindowsSecurityContextOptions` already exists on both `PodSecurityContext` and `Container.SecurityContext`, Windows containers will use this new field instead of re-using the existing `privileged` field, which only exists on `SecurityContext`.
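
To make the quoted proposal concrete, a minimal sketch of a pod setting the proposed field at the pod level might look like the following (pod name and image are illustrative):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: privileged-windows-pod        # illustrative name
spec:
  securityContext:
    windowsOptions:
      privileged: true                # the proposed new boolean field
  containers:
  - name: workload
    image: mcr.microsoft.com/windows/nanoserver:1809   # illustrative image
```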
Member

Adding the field to WindowsSecurityContextOptions so it is set at the pod level could be ok, but I would recommend requiring (in validation) that pods that set spec.securityContext.windowsOptions.privileged=true also set securityContext.privileged=true on all containers. Policy tools already look at the container field... letting that be false while adding another field that makes a pod privileged will confuse them.
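
For illustration, a sketch of a pod that the recommended validation would reject, because a container leaves the existing field unset:

```yaml
spec:
  securityContext:
    windowsOptions:
      privileged: true        # pod is marked privileged...
  containers:
  - name: foo
    securityContext: {}       # ...but securityContext.privileged is not true, so validation fails
```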

Member

If we only require/validate that windowsOptions.privileged = true is set on all containers (in addition to the pod-level field), would that be sufficient for policy tools?

no existing policy tools would know about the new field... it seems misleading to allow privileged containers that don't set the existing privileged field in the API

Contributor Author

@liggitt If we require that securityContext.privileged=true is set for all containers, what are your thoughts on the current behavior where pod-wide WindowsSecurityContextOptions are applied to all containers that don't set them?

Would it be OK if the pod-wide WindowsSecurityContextOptions.privileged=true and each container only sets securityContext.privileged=true?

For example, would this be OK:

```yaml
spec:
  securityContext:
    windowsOptions:
      privileged: true
  containers:
  - name: foo
    securityContext:
      privileged: true
```

Or, if the pod-wide privileged field is true, ensure each container sets securityContext.privileged=true and also explicitly sets securityContext.windowsOptions.privileged?

Example:

```yaml
spec:
  securityContext:
    windowsOptions:
      privileged: true
  containers:
  - name: foo
    securityContext:
      privileged: true
      windowsOptions:
        privileged: true
```

Contributor Author

As discussed below, both random-liu and I feel it would be a better user experience if we didn't rely on the existing securityContext.privileged field.
Hopefully, with sufficient documentation and announcements, policy tools can learn about the new windowsOptions.hostProcess flag for Windows containers.
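
A sketch of what that could look like under the new name, assuming the field keeps the windowsOptions placement discussed above:

```yaml
spec:
  securityContext:
    windowsOptions:
      hostProcess: true       # replaces the `privileged` name proposed earlier
  containers:
  - name: foo
    securityContext:
      windowsOptions:
        hostProcess: true     # set per container as well, per the validation discussion above
```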

Member

Yeah, some more context in #2288 (comment)


In beta, there is the possibility of enabling the privileged container to be part of a different network component. If this feature is enabled, we will use the existing Pod HostNetwork field to enable/disable it.
- this pod must run on a Windows host, and kubelets must reject it if it is not on a Windows host
- all pods marked privileged on Windows must have host network enabled; if not, the pod does not validate
Member

this seems good
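
To make the two requirements concrete, a pod satisfying them might look like the following sketch (the nodeSelector is one common way to steer pods onto Windows hosts; kubelets on non-Windows hosts would reject the pod regardless):

```yaml
spec:
  hostNetwork: true              # required: privileged Windows pods must use the host network
  nodeSelector:
    kubernetes.io/os: windows    # schedule onto Windows hosts
  securityContext:
    windowsOptions:
      hostProcess: true          # the privileged/HostProcess marker discussed above
  containers:
  - name: foo
    image: mcr.microsoft.com/windows/nanoserver:1809   # illustrative image
```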

- OS support: 1809/Windows Server 2019 LTSC and 2004
- containerd: v1.5
- Kubernetes: target 1.22 or later
- OS support: 1809/Windows Server 2019 LTSC and 2004
Member

what version of Windows SAC? cc @jeremyje

Contributor Author

Any and all versions of Windows that support k8s + containerd would support this.
Will update the enhancement.

@marosset
Contributor Author

@derekwaynecarr @dchen1107 @mrunalp @mikebrow Would any of you be able to help review the proposed CRI changes?

@marosset
Contributor Author

@liggitt I think I addressed most of your feedback. Can you take another look and let me know if I missed anything?

@marosset
Contributor Author

Also, we have a proof of concept for Windows privileged containers working today.

Currently, to test this out you would need the following:

@liggitt
Member

liggitt commented Feb 17, 2021

the podspec bits lgtm for sig-auth

@yujuhong
Contributor

lgtm for sig-node

@jsturtevant
Contributor

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Feb 17, 2021
@dchen1107
Member

/lgtm
/approve

@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: dchen1107, deads2k, marosset

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@marosset
Contributor Author

/hold cancel
Thanks all!

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Feb 18, 2021
@k8s-ci-robot k8s-ci-robot merged commit d3f05de into kubernetes:master Feb 18, 2021
@k8s-ci-robot k8s-ci-robot added this to the v1.21 milestone Feb 18, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 26, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 27, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 27, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 27, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 27, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 29, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Mar 29, 2021
dcantah added a commit to dcantah/runtime-spec that referenced this pull request Apr 6, 2021
See kubernetes/enhancements#2288 for more background.
To avoid any confusion: the name chosen for this container type in the CRI API and the user-facing k8s settings is HostProcess containers. Internally we've coined these job containers, but it's referring to the same type of container; we'd just like to keep the name the same as the one we use internally at the OCI level and in our code. The CRI HostProcess field being set would be our key to fill in the WindowsJobContainer field on the runtime spec, for example.

There have been asks for Windows privileged containers, or something analogous to them, for quite some time. While in the Linux world this can be achieved just by loosening some of the security restrictions normally in place for containers, this isn't as easy on Windows for many reasons. There's no such thing as just mounting in /dev, to take the easy example.

The model we've landed on to support something akin to privileged containers on Windows is to keep using the container layer technology we currently use for Windows Server and Hyper-V isolated containers, and to simply have the runtime manage a process, or set of processes, in a job object as the container. The work for job containers is open source and lives here:
https://github.com/microsoft/hcsshim/tree/master/internal/jobcontainers

This approach covers all of the use cases we've currently heard that privileged containers would be useful for. Some of these include configuring network settings, administrative tasks, viewing/manipulating storage devices, and the ability to simplify running daemons that need host access (kube-proxy) on Windows. Without these changes we'd likely set an annotation to specify that the runtime should create one of these containers, which isn't ideal.

As for the one optional field, this is really the only thing that actually differs/isn't configurable for normal Windows Server Containers. With job containers the final writable layer (volume) for the container is mounted on the host, so it's accessible and viewable without enumerating the volumes on the host and trying to correlate which volume is the container's. This is in contrast to Windows Server Containers, where the volume is never mounted to a directory anywhere, although it's still accessible from the host for the curious.

Signed-off-by: Daniel Canter <dcanter@microsoft.com>