-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Windows RuntimeClass KEP #1302
Windows RuntimeClass KEP #1302
Conversation
/assign @tallclair |
The KEP looks good to me, but want to make sure SIG Auth know the newly proposed self-labeling here. Will leave @tallclair to make the final call here. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good with some minor nits and clarifications.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for writing the KEP Patrick!
I agree with @dchen1107 that this generally looks good, but I would prefer we have a consistent opinion around osversion across windows and linux hosts. Right now, osversion seems to match the kernel label that we can apply from node-feature discovery centrally via its fleecing daemon. Otherwise, I agree with the approach of RuntimeClass per SIG Node discussion. defer to final approval on label to SIG Auth. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Mostly LGTM with a few nits & a question about the osversion label schema for linux.
|
||
Windows product names such as "Windows Server version 1903" don't align with the actual build numbers, so build numbers will be used instead. Users can choose to reflect the product name instead as shown in the user stories. | ||
|
||
[Windows Update History] has a full list of version numbers by release & patch. Starting from an example of `10.0.17763.805` - the OS major, minor, and build number - `10.0.17763` - should match for containers to be compatible. The final `.805` refers to the monthly patches and isn't required for compatibility. Therefore, a value such as `node.kubernetes.io/osversion = 10.0.17763` will be used. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the schema for linux versions? Or will this be a windows-only label? If the latter, I'd prefer to use something like node.kubernetes.io/windows-build
for the label.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
windows-build is fine w/ me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
alright will change to windows-build. Thanks!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
does this version change on every windows update?
something similar was discussed for linux kernel version in kubernetes/kubernetes#51006 (comment)
were the need for the node to update the label value (as opposed to the set-once labels it sets now), and the issues with fine-grained selection discussed?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Therefore, a value such as
node.kubernetes.io/osversion = 10.0.17763
will be used.
Are we confident the version portion will always fit within label values validation?
The final
.805
refers to the monthly patches and isn't required for compatibility. Therefore, a value such asnode.kubernetes.io/osversion = 10.0.17763
will be used.
Would a container ever have to target a specific fix level?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah I think that validation should be ok for a few reasons:
- The internal Windows build name (the full string) is limited to that character set. Anything we would use is a subset of that. Here's an example from Windows 10, but server follows the same format:
18362.1.amd64fre.19h1_release.190318-1202
- That version is not used as-is today. It's already massaged into
a more readable format returned fromkubectl describe node
https://github.com/kubernetes/kubernetes/blob/0599ca2bcfcae7d702f95284f3c2e2c2978c7772/vendor/github.com/docker/docker/pkg/parsers/operatingsystem/operatingsystem_windows.go#L10 . For the kubelet's purpose, I'll be implementing something like that to not include the patch (UBR) number and instead just return the build number such as10.0.17763
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sounds reasonable. might be worth adding a note with those findings.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure, working on it now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added @ line 290 in latest revision
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added at line 290 below
value: "true" | ||
``` | ||
|
||
1. Once sufficient testing is done, the RuntimeClass from Story 1 could be updated. This would cause new deployments to go to these nodes, without having to update them individually. The nodes running the previous Windows version could be drained and removed from the cluster. As the pods running on them are rescheduled, the new RuntimeClass will be applied. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The recommended approach is to create a new RuntimeClass and update the deployment. The reason is that forcing the deployment update means that the new class will be rolled out in a more controlled and predictable manner (through a rolling update). You sort of get that with drain, but the tooling around a proper rolling update is purpose built for dealing with rollouts.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
so do you think it's better to update an existing RuntimeClass to "work again", or just delete it and fail scheduling if someone mistakenly uses one that doesn't exist?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, I see. That's probably a case-by-case decision. The cluster admin will need to decide if they want people to explicitly opt-in to using hyperv.
/lgtm /hold |
For Kubernetes 1.17, we're proposing three key things to improve the experience deploying Windows containers across multiple versions, and enable experimentation with Hyper-V while ContainerD support is in alpha/beta stages. | ||
|
||
1. Add `node.kubernetes.io/windows-build` label using Windows kubelet | ||
2. Add RuntimeHandler to the CRI pull API |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm concerned we need this for other ImageService
methods as well. For example if we use the ImageStatus
call to determine if an image is present it will also need the RuntimeHandler
to know if the "specific" version of the image is present. Otherwise it could return the appropriate status for the wrong version.
name: mypod | ||
spec: | ||
os: windows | ||
osversion: 1809 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for the same reasons above we want this to be Build
and 1809 -> 10.0.17763
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed, will update for consistency but it shouldn't block merging. This section is detailing out an alternative that was rejected and won't be implemented
|
||
Here's the steps needed to pull and start a pod+container matching a specific os/arch/version: | ||
|
||
- ImageService.PullImage(PullImageRequest) - PullImageRequest includes a `PodSandboxConfig` reference |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any idea what SandboxConfig
is passed here? Is the order for Kubelet
RunPodSandbox
PullImage(worker1 image, config from step 1)
CreateContainer(worker1, config from step 1)
?
Or is PullImage allowed to run before aRunPodSandbox
? The reason I ask is because, if it is a requirement thatPullImage
happens AFTERRunPodSandbox
then I guess we can lookup the sandboxRuntimeHandler
by finding the existingsandbox
from theSandboxConfig.Metadata
. This although seems to be a very light contract and I would prefer we dont do it this way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
during the main sync loop, sandbox is created first https://github.com/kubernetes/kubernetes/blob/344054bf4aa47fbcf1f3776a8210b7992ea08d11/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L631
then the image is checked, pulled if needed
https://github.com/kubernetes/kubernetes/blob/344054bf4aa47fbcf1f3776a8210b7992ea08d11/pkg/kubelet/kuberuntime/kuberuntime_container.go#L93
then container created then started
https://github.com/kubernetes/kubernetes/blob/344054bf4aa47fbcf1f3776a8210b7992ea08d11/pkg/kubelet/kuberuntime/kuberuntime_container.go#L126
Based on that ordering, I think you're right that the other image actions need the runtimeClass defined.
@Random-Liu - what do you think? Should we
- keep this as a loose contract as Justin mentioned where the runtimeHandler is assumed to match that of the sandbox
or
- also add
runtimeClass
to all of the image operations:PullImageRequest
ListImagesRequest
andRemoveImageRequest
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yep. The first thing it does is get the ImageStatus
and use that to determine if a PullImage
is required which means we do need RuntimeHandler
for the others.
--
I think that my idea is a bad one for the record. Because it also doesn't solve the ImageStatus
issue above where it doesnt have the SandboxConfig
. So even though the sandbox is indeed created we would fail the check.
/lgtm |
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: ddebroy, PatrickLang, tallclair The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
|
||
[Windows Update History] has a full list of version numbers by release & patch. Starting from an example of `10.0.17763.805` - the OS major, minor, and build number - `10.0.17763` - should match for containers to be compatible. The final `.805` refers to the monthly patches and isn't required for compatibility. Therefore, a value such as `node.kubernetes.io/windows-build = 10.0.17763` will be used. Each one of these Windows Server version will have corresponding containers released - see [Container Base Images] and [Windows Insider Container Images] for details. | ||
|
||
This will pass the regex validation for labels (references: [src1](https://github.com/kubernetes/kubernetes/blob/release-1.16/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L30-L32) [src2](https://github.com/kubernetes/kubernetes/blob/release-1.16/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L88-L108)). Even the most specific identifier in the Windows registry `BuildLabEx`, for example `18362.1.amd64fre.19h1_release.190318-1202` is within the allowed length and characters to pass the existing validation, although we're not planning to use that whole string. Instead, the kubelet will shorten to just what's needed similar to what's returned today in the output from `kubectl describe node` ([src](https://github.com/kubernetes/kubernetes/blob/0599ca2bcfcae7d702f95284f3c2e2c2978c7772/vendor/github.com/docker/docker/pkg/parsers/operatingsystem/operatingsystem_windows.go#L10)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the kubelet will shorten to just what's needed
Wherever the transformation is implemented, the output of that transforming code needs to be treated as an API, since for a given windows build, consumers will write manifests with selectors against that label value.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure makes sense. I was planning to add a node unit test for it and cover it in the e2e runtimeclass tests for Windows.
mechanics of the proposed label look good to me (interaction with noderestriction plugins, considerations for label value validation and stability over time). based on #1302, node folks were ok with it as well /hold cancel |
This is the KEP for #1301
Adding @tallclair @derekwaynecarr @benmoss @dchen1107 @m2 as reviewers since they provided feedback in the SIG meetings
/sig windows
/sig node
/milestone v1.17