Create KEP for Windows Node Support #676
Adds a KEP covering Windows support and a sig-windows directory for it to live in.
/milestone v1.14
/assign @bgrant0607
/sig node
/sig windows
This SIG Windows folder will also need an OWNERS file. Please use this as a template and s/azure/windows
/assign @michmike @PatrickLang
Signed-off-by: Ben Moss <bmoss@pivotal.io>
/lgtm Once this is merged as a draft, we can split up the work including adding the test case list and other sections @spiffxp has requested for v1.14 release.
/lgtm
pinging @bgrant0607 @jdumars or @jbeda - can we get this /approve'd? we have multiple people ready to contribute more to the draft and we can't do that until we get our own directory+OWNERS file here merged. KEPs are about merging early and iterating quickly until they're done.
/approve Clearly this is something that folks are taking up and getting the doc checked in will facilitate more focused discussions. Note that this isn't official until it is marked "implementable".
[APPROVALNOTIFIER] This PR is APPROVED
This pull-request has been approved by: benmoss, jbeda, justaugustus, michmike, PatrickLang
The full list of commands accepted by this bot can be found here.
The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/milestone v1.14
Since I don't know of a better way on GitHub, I'm going to review this PR even though it's already merged.
For reference, there was a draft of this sent by email:
For visibility: SIG Testing, Release, and Docs will need followup also
Made a first pass. I'll look again through previous emails and docs.

### Goals

- Enable users to run nodes on Windows servers
I would have written:
- Enable users to run Windows server containers on Windows servers using Kubernetes
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers

<sup id="a1">1</sup> This list should be available at https://k8s-testgrid.appspot.com/sig-windows but this test setup is not currently working. https://k8s-testgrid.appspot.com/google-windows#windows-prototype is also running against a Windows cluster.
Is addressing those issues part of #685?
Current list of skipped tests appears to be here:
https://github.com/kubernetes/test-infra/blob/master/config/jobs/kubernetes-sigs/sig-windows/sig-windows-config.yaml#L69
I'm clarifying those in #685
**User experience**: Users today will need to use some combination of taints and node selectors in order to keep Linux and Windows workloads separated. In the best case this imposes a burden only on Windows users, but this is still less than ideal.

## Graduation Criteria

@craiglpeters agreed to draft these
To use as a starting point, here are some issues discussed in email and in prior SIG Arch meetings:
- There need to be adequate, continuously run, non-flaky tests with publicly accessible results, enabled as part of the release-blocking suite. Without this it's hard to have reasonable discussions about what does and doesn't work, and the release team can't make a judgement about release readiness or risk. Really, this is needed for any feature at any stage of maturity in order for us to make it available to users in a Kubernetes release.
- There needs to be adequate end user and admin documentation that describes what the user does and how to use it. I know there is a start on user documentation (WIP: Windows doc set for v1.13 stable website#10875), which at least covered "how to use it", and I'll take another look at it. One purpose of this KEP was to fill the role that a priori design proposals traditionally fill in providing a deeper level of detail about how a feature works and why.
- Reliability needs to be sufficiently high. Users run GA features in production. Usually we have some mileage on features in beta before they go GA, and at least a quarter or two of e2e test results.
- Compatibility can't be broken in GA features, either for existing users/clusters/features or for the new feature going forward, and the feature needs to adhere to the deprecation policy (https://kubernetes.io/docs/reference/using-api/deprecation-policy/).
Note that a draft document stated "you may want to wait for Windows Server 2019 availability from Microsoft and support in Kubernetes for production workloads", which needs to be clarified.
There were also questions about the user experience, particularly for mixed-OS clusters. Alternatives for ensuring Windows containers land on Windows nodes and Linux containers land on Linux nodes include:
- Manual node labels and selectors for both Linux and Windows workloads
- Manual taints and tolerations just for Windows workloads
- Automatically applied nodeSelectors for both Linux and Windows workloads
  - derived from image manifest
  - derived from something else in PodSpec
- Automatically applied tolerations for at least Windows workloads
  - derived from image manifest
  - derived from something else in PodSpec
Some issues with the above:
- We don't want to break compatibility for existing Linux workloads
- We don't want the UX for Windows apps to be worse than for Linux forever
- Setting first-class os and arch properties by default in the apiserver would break existing use cases, such as ARM
- os and arch node labels appear to still be beta
- Not clear that most container images contain the necessary OS info
- Not clear that extracting the OS info from the container image manifest during admission control is feasible for private image repos
Some of this was discussed in a document:
https://docs.google.com/document/d/1XLs8Mbz1-xOIiDW9XSSuhx9fshpxJM1NDD1a0oVbzfc/edit
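To make the manual alternatives listed above concrete, here is a minimal Python sketch that builds a Pod manifest with a node selector and, optionally, a toleration, plus a helper that derives a selector from an image config. The label key `beta.kubernetes.io/os` is assumed to be the beta OS node label mentioned above; the taint key and value (`os=windows`), the image name, and both function names are hypothetical and chosen only for illustration. This is not what the KEP proposes, just one way the options could look in a manifest.

```python
import json

# Assumption: the beta OS label carried by nodes at the time of this discussion.
OS_LABEL = "beta.kubernetes.io/os"
# Hypothetical taint a cluster admin might place on Windows nodes.
WINDOWS_TAINT = {"key": "os", "value": "windows", "effect": "NoSchedule"}


def pod_manifest(name, image, os_name, use_toleration=False):
    """Build a minimal Pod manifest pinned to nodes of the given OS."""
    spec = {
        "containers": [{"name": name, "image": image}],
        # Manual node selector: needed on both Linux and Windows pods
        # if selectors are the only separation mechanism.
        "nodeSelector": {OS_LABEL: os_name},
    }
    if use_toleration:
        # Manual toleration: lets this pod onto tainted Windows nodes.
        # The taint keeps Linux pods off those nodes without editing them.
        spec["tolerations"] = [{
            "key": WINDOWS_TAINT["key"],
            "operator": "Equal",
            "value": WINDOWS_TAINT["value"],
            "effect": WINDOWS_TAINT["effect"],
        }]
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": spec,
    }


def selector_from_image_config(image_config):
    """Derive a node selector from an image config's "os" field, if present.

    Returns an empty selector when the image carries no OS info, which is
    one of the open issues raised in this comment.
    """
    os_name = image_config.get("os")
    return {OS_LABEL: os_name} if os_name else {}


if __name__ == "__main__":
    pod = pod_manifest("iis", "example.com/iis:1.0", "windows", use_toleration=True)
    print(json.dumps(pod, indent=2))
```

The sketch combines the selector and the toleration because a toleration alone only permits a Windows pod on a tainted node; it does not stop that pod from being scheduled onto a Linux node.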
- Privileged containers
- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
- CSI plugins, which require privileged containers
- [Some parts of the V1 API](https://github.com/kubernetes/kubernetes/issues/70604)
Please inline the contents of that issue into this document
Also, it seems we lost quite a bit of detail compared to previous discussions. Do those issues still hold true?
https://docs.google.com/document/d/1YkLZIYYLMQhxdI2esN5PuTkhQHhO0joNvnbHpW68yg8/edit#heading=h.4khm1q370oiq
For instance, some pod features didn't work due to: Single file volume mappings. No shipped releases of Windows can map a single file, only an entire folder, into a pod/container.
Other persistent issues: uid/gid vs usernames, per-user Linux filesystem permissions, read-only root filesystems
Other resolvable issues: images using Linux-specific tools, hardcoded images with no windows equivalent
Are system OOMs reported?
### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers
- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
I assume that QoS (burstable, best effort) doesn't work
Are there equivalents of any of the shared namespaces (e.g., shareProcessNamespace)? Can containers within a pod see each other in any way?
Does terminationGracePeriodSeconds work?

### What will never work (without underlying OS changes)
- Certain Pod functionality
- Privileged containers
I assume Linux capabilities don't work?
And Linux-specific security features, such as seccomp, SELinux, and AppArmor
We should enumerate all of the fields of PodSecurityContext that don't make sense for Windows

### What works today
- Windows-based containers can be created by kubelet, [provided the host OS version matches the container base image](https://docs.microsoft.com/en-us/virtualization/windowscontainers/deploy-containers/version-compatibility)
- ConfigMap, Secrets: as environment variables or volumes
How about volumes, such as emptyDir, shared between containers within a Pod?
What about storage medium Memory or HugePages?
- Certain Pod functionality
- Privileged containers
- Reservations are not enforced by the OS, but overprovisioning could be blocked with `--enforce-node-allocatable=pods` (pending: tests needed)
- CSI plugins, which require privileged containers
FlexVolume?
- Dockershim CRI
- Many<sup id="a1">[1]</sup> of the e2e conformance tests when run with [alternate Windows-based images](https://hub.docker.com/r/e2eteam/) which are being moved to [kubernetes-sigs/windows-testing](https://www.github.com/kubernetes-sigs/windows-testing)
- Persistent storage: FlexVolume with [SMB + iSCSI](https://github.com/Microsoft/K8s-Storage-Plugins/tree/master/flexvolume/windows), and in-tree AzureFile and AzureDisk providers

Do pod hostname and subdomain fields work? How about hostAliases? dnsConfig?
Basically I expect someone to read through PodSpec field by field to make sure we haven't forgotten something.
https://github.com/kubernetes/kubernetes/blob/master/staging/src/k8s.io/api/core/v1/types.go#L2743
Thanks @bgrant0607. I updated some of the test sections in #685, and will continue working with Craig on the other areas in additional PRs as we finish this KEP.