Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Windows RuntimeClass KEP #1302

Merged
merged 6 commits into from
Oct 17, 2019

Conversation

PatrickLang
Copy link
Contributor

@PatrickLang PatrickLang commented Oct 11, 2019

This is the KEP for #1301

Adding @tallclair @derekwaynecarr @benmoss @dchen1107 @m2 as reviewers since they provided feedback in the SIG meetings

/sig windows
/sig node
/milestone v1.17

@k8s-ci-robot k8s-ci-robot added the sig/windows Categorizes an issue or PR as relevant to SIG Windows. label Oct 11, 2019
@k8s-ci-robot k8s-ci-robot added this to the v1.17 milestone Oct 11, 2019
@k8s-ci-robot k8s-ci-robot added sig/node Categorizes an issue or PR as relevant to SIG Node. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. approved Indicates a PR has been approved by an approver from all required OWNERS files. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files. labels Oct 11, 2019
@k8s-ci-robot k8s-ci-robot added the kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory label Oct 11, 2019
@PatrickLang
Copy link
Contributor Author

/assign @tallclair
/assign @dchen1107

@dchen1107
Copy link
Member

The KEP looks good to me, but want to make sure SIG Auth know the newly proposed self-labeling here. Will leave @tallclair to make the final call here.

Copy link
Member

@ddebroy ddebroy left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good with some minor nits and clarifications.

Copy link

@pjh pjh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for writing the KEP Patrick!

keps/sig-windows/windows-runtimeclass-support.md Outdated Show resolved Hide resolved
keps/sig-windows/windows-runtimeclass-support.md Outdated Show resolved Hide resolved
@derekwaynecarr
Copy link
Member

I agree with @dchen1107 that this generally looks good, but I would prefer we have a consistent opinion around osversion across windows and linux hosts. Right now, osversion seems to match the kernel label that we can apply from node-feature discovery centrally via its fleecing daemon. Otherwise, I agree with the approach of RuntimeClass per SIG Node discussion.

defer to final approval on label to SIG Auth.

Copy link
Member

@tallclair tallclair left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly LGTM with a few nits & a question about the osversion label schema for linux.

keps/sig-windows/windows-runtimeclass-support.md Outdated Show resolved Hide resolved

Windows product names such as "Windows Server version 1903" don't align with the actual build numbers, so build numbers will be used instead. Users can choose to reflect the product name instead as shown in the user stories.

[Windows Update History] has a full list of version numbers by release & patch. Starting from an example of `10.0.17763.805` - the OS major, minor, and build number - `10.0.17763` - should match for containers to be compatible. The final `.805` refers to the monthly patches and isn't required for compatibility. Therefore, a value such as `node.kubernetes.io/osversion = 10.0.17763` will be used.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What is the schema for linux versions? Or will this be a windows-only label? If the latter, I'd prefer to use something like node.kubernetes.io/windows-build for the label.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

windows-build is fine w/ me.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

alright will change to windows-build. Thanks!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does this version change on every windows update?

something similar was discussed for linux kernel version in kubernetes/kubernetes#51006 (comment)

were the need for the node to update the label value (as opposed to the set-once labels it sets now), and the issues with fine-grained selection discussed?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Therefore, a value such as node.kubernetes.io/osversion = 10.0.17763 will be used.

Are we confident the version portion will always fit within label values validation?

The final .805 refers to the monthly patches and isn't required for compatibility. Therefore, a value such as node.kubernetes.io/osversion = 10.0.17763 will be used.

Would a container ever have to target a specific fix level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah I think that validation should be ok for a few reasons:

  1. The internal Windows build name (the full string) is limited to that character set. Anything we would use is a subset of that. Here's an example from Windows 10, but server follows the same format: 18362.1.amd64fre.19h1_release.190318-1202
  2. That version is not used as-is today. It's already massaged into
    a more readable format returned from kubectl describe node
    https://github.com/kubernetes/kubernetes/blob/0599ca2bcfcae7d702f95284f3c2e2c2978c7772/vendor/github.com/docker/docker/pkg/parsers/operatingsystem/operatingsystem_windows.go#L10 . For the kubelet's purpose, I'll be implementing something like that to not include the patch (UBR) number and instead just return the build number such as 10.0.17763

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sounds reasonable. might be worth adding a note with those findings.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure, working on it now.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added @ line 290 in latest revision

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

added at line 290 below

keps/sig-windows/windows-runtimeclass-support.md Outdated Show resolved Hide resolved
keps/sig-windows/windows-runtimeclass-support.md Outdated Show resolved Hide resolved
value: "true"
```

1. Once sufficient testing is done, the RuntimeClass from Story 1 could be updated. This would cause new deployments to go to these nodes, without having to update them individually. The nodes running the previous Windows version could be drained and removed from the cluster. As the pods running on them are rescheduled, the new RuntimeClass will be applied.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The recommended approach is to create a new RuntimeClass and update the deployment. The reason is that forcing the deployment update means that the new class will be rolled out in a more controlled and predictable manner (through a rolling update). You sort of get that with drain, but the tooling around a proper rolling update is purpose built for dealing with rollouts.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

so do you think it's better to update an existing RuntimeClass to "work again", or just delete it and fail scheduling if someone mistakenly uses one that doesn't exist?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmm, I see. That's probably a case-by-case decision. The cluster admin will need to decide if they want people to explicitly opt-in to using hyperv.

@PatrickLang PatrickLang changed the title Starting Windows RuntimeClass KEP Windows RuntimeClass KEP Oct 15, 2019
@tallclair
Copy link
Member

/lgtm
For the RuntimeClass changes. I'm also ok with the windows-specific label.

/hold
To let the other reviewers comment on the label schema.

@k8s-ci-robot k8s-ci-robot added do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. lgtm "Looks good to me", indicates that a PR is ready to be merged. labels Oct 16, 2019
For Kubernetes 1.17, we're proposing three key things to improve the experience deploying Windows containers across multiple versions, and enable experimentation with Hyper-V while ContainerD support is in alpha/beta stages.

1. Add `node.kubernetes.io/windows-build` label using Windows kubelet
2. Add RuntimeHandler to the CRI pull API
Copy link

@jterry75 jterry75 Oct 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm concerned we need this for other ImageService methods as well. For example if we use the ImageStatus call to determine if an image is present it will also need the RuntimeHandler to know if the "specific" version of the image is present. Otherwise it could return the appropriate status for the wrong version.

name: mypod
spec:
os: windows
osversion: 1809

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think for the same reasons above we want this to be Build and 1809 -> 10.0.17763

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, will update for consistency but it shouldn't block merging. This section is detailing out an alternative that was rejected and won't be implemented


Here's the steps needed to pull and start a pod+container matching a specific os/arch/version:

- ImageService.PullImage(PullImageRequest) - PullImageRequest includes a `PodSandboxConfig` reference

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any idea what SandboxConfig is passed here? Is the order for Kubelet

  1. RunPodSandbox
  2. PullImage(worker1 image, config from step 1)
  3. CreateContainer(worker1, config from step 1)
    ?
    Or is PullImage allowed to run before a RunPodSandbox? The reason I ask is because, if it is a requirement that PullImage happens AFTER RunPodSandbox then I guess we can lookup the sandbox RuntimeHandler by finding the existing sandbox from the SandboxConfig.Metadata. This although seems to be a very light contract and I would prefer we dont do it this way.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

during the main sync loop, sandbox is created first https://github.com/kubernetes/kubernetes/blob/344054bf4aa47fbcf1f3776a8210b7992ea08d11/pkg/kubelet/kuberuntime/kuberuntime_manager.go#L631

then the image is checked, pulled if needed
https://github.com/kubernetes/kubernetes/blob/344054bf4aa47fbcf1f3776a8210b7992ea08d11/pkg/kubelet/kuberuntime/kuberuntime_container.go#L93

then container created then started
https://github.com/kubernetes/kubernetes/blob/344054bf4aa47fbcf1f3776a8210b7992ea08d11/pkg/kubelet/kuberuntime/kuberuntime_container.go#L126

Based on that ordering, I think you're right that the other image actions need the runtimeClass defined.

@Random-Liu - what do you think? Should we

  1. keep this as a loose contract as Justin mentioned where the runtimeHandler is assumed to match that of the sandbox

or

  1. also add runtimeClass to all of the image operations: PullImageRequest ListImagesRequest and RemoveImageRequest

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep. The first thing it does is get the ImageStatus and use that to determine if a PullImage is required which means we do need RuntimeHandler for the others.

--

I think that my idea is a bad one for the record. Because it also doesn't solve the ImageStatus issue above where it doesnt have the SandboxConfig. So even though the sandbox is indeed created we would fail the check.

@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 16, 2019
@ddebroy
Copy link
Member

ddebroy commented Oct 17, 2019

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Oct 17, 2019
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ddebroy, PatrickLang, tallclair

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment


[Windows Update History] has a full list of version numbers by release & patch. Starting from an example of `10.0.17763.805` - the OS major, minor, and build number - `10.0.17763` - should match for containers to be compatible. The final `.805` refers to the monthly patches and isn't required for compatibility. Therefore, a value such as `node.kubernetes.io/windows-build = 10.0.17763` will be used. Each one of these Windows Server version will have corresponding containers released - see [Container Base Images] and [Windows Insider Container Images] for details.

This will pass the regex validation for labels (references: [src1](https://github.com/kubernetes/kubernetes/blob/release-1.16/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L30-L32) [src2](https://github.com/kubernetes/kubernetes/blob/release-1.16/staging/src/k8s.io/apimachinery/pkg/util/validation/validation.go#L88-L108)). Even the most specific identifier in the Windows registry `BuildLabEx`, for example `18362.1.amd64fre.19h1_release.190318-1202` is within the allowed length and characters to pass the existing validation, although we're not planning to use that whole string. Instead, the kubelet will shorten to just what's needed similar to what's returned today in the output from `kubectl describe node` ([src](https://github.com/kubernetes/kubernetes/blob/0599ca2bcfcae7d702f95284f3c2e2c2978c7772/vendor/github.com/docker/docker/pkg/parsers/operatingsystem/operatingsystem_windows.go#L10))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the kubelet will shorten to just what's needed

Wherever the transformation is implemented, the output of that transforming code needs to be treated as an API, since for a given windows build, consumers will write manifests with selectors against that label value.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure makes sense. I was planning to add a node unit test for it and cover it in the e2e runtimeclass tests for Windows.

@liggitt
Copy link
Member

liggitt commented Oct 17, 2019

/hold
To let the other reviewers comment on the label schema.

mechanics of the proposed label look good to me (interaction with noderestriction plugins, considerations for label value validation and stability over time). based on #1302, node folks were ok with it as well

/hold cancel

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Oct 17, 2019
@k8s-ci-robot k8s-ci-robot merged commit c21bf3a into kubernetes:master Oct 17, 2019
@PatrickLang PatrickLang deleted the windows-runtimeclass branch October 17, 2019 19:53
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/kep Categorizes KEP tracking issues and PRs modifying the KEP directory lgtm "Looks good to me", indicates that a PR is ready to be merged. sig/node Categorizes an issue or PR as relevant to SIG Node. sig/windows Categorizes an issue or PR as relevant to SIG Windows. size/XL Denotes a PR that changes 500-999 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants