-
Notifications
You must be signed in to change notification settings - Fork 222
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add TEP-0057: Windows Support #383
Conversation
|
/assign @imjasonh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you done any experiments to determine what's broken today when you run a TaskRun scheduled to run on a Windows node? It'd be great to get an idea roughly how much work this will take to land.
|
||
## Proposal | ||
Extend Tekton so TaskRuns can request, and be scheduled, to run on Windows nodes. This will require (at least) reworking the following Tekton components: | ||
- Entrypoint binary |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd love to have more detail in this proposal about what sorts of changes you expect to need to make to the entrypoint binary especially, since this is going to be pretty central to getting things to run on Windows. Anything else (e.g., script mode) you can just say aren't available on TaskRuns that are destined run on Windows nodes.
Will this just be as simple as using and testing filepath.Join
everywhere inside the entrypoint, to make sure it signals steps to start correctly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry for the slow response, we've been pretty slammed this past fortnight.
We haven't been able to test this at all yet, we're still working on setting up a cluster with both windows and linux nodes so we can run real tests. While we've been using Tekton for a while now, we're still very new to the development side of things. I suspect the first thing, as you suggest, is modifying the entrypoint binary to be sure it correctly signals steps. Once this is done we'd actually want to attempt to run a TaskRun on a windows node, and see exactly what breaks.
I'd be very happy if it was as simple as filepath.Join
but I'm not sure I'm that optimistic :)
Realistically we'll need some guidance from the Tekton team here, to point us towards the key, core pieces that will need work.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @DrWadsy, that's helpful to know.
It sounds like we're still pretty far from even being able to identify exactly what work we'll need to do to get Windows support working. In that case, let's get the TEP submitted in roughly this form, and iterate on it with some experiments and prototyping.
Would it be helpful to schedule some time where we can chat over video and better understand what the next steps would be? If so I'd be willing to find a time to do that.
- The burden of supporting the Windows platform may be quite high | ||
- Windows containers may introduce some security vulnerabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing these out. They do make me quite nervous 😄
Can you think of any mitigations? We could explicitly state that while Tekton might work on Windows, but that it isn't explicitly supported for now, and until we get a bit more maturity we don't recommend it for real-world usage?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately the security issues around windows containers are not going to go away until a real fix for windows containers is implemented. Until then I think we should see use of windows containers as a 'here be dragons' situation - be careful using them, and absolutely don't run any untrusted workloads in them.
I'd agree that we should avoid recommending it for real-world usage until there is a decent degree of maturity present.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there specific vulnerabilities we're concerned about?
/assign |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking great! I have some questions around Task vs TaskRun changes but generally I feel like this would be great to merge as an initial problem statement and then start iterating on the design :D
|
||
## Summary | ||
The current state of Windows containers requires that they be run on Windows nodes, it is not possible to run Windows containers on a Linux system. For use cases that require the use of Windows containers there is no choice but to run those workloads on Windows nodes. The current state of Tekton does not support Windows nodes in any form, and so there is currently no way to run Tekton Pipelines to support Windows workloads. The goal of this TEP is to provide support for Windows workloads within Tekton Pipelines, meaning that support for Windows nodes will be added. In addition to supporting Windows nodes within Tekton Pipelines we would also need to ensure that all Tekton components (the Operator, Triggers, the CLI/Dashboard and the Catalog/Hub) support, or reflect, Tasks which require Windows nodes. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i think it makes sense to include these are requirements; i would say that (ideally) these components should support the Tasks which require Windows nodes without requiring any changes (i.e. these components should all be decoupled enough that they don't need updates in order to support this)
possible exceptions:
- maybe the operator, if it needs to be aware of when Tasks that support windows should be installed (not sure it does any Task installation tho)
- if we were to update more components to run on windows (i.e. controllers, cli, dashboard) but that's outside the scope of this TEP anyway
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that's definitely what we'd be aiming for - current elements should keep working as is, with no changes necessary, other thank the elements which need to handle the Windows nodes. Everything else should just keep ticking as it already is.
- The ability to run core Tekton components on Windows nodes is not a goal of this TEP | ||
- It is expected that users will have a mix of Linux and Windows nodes and that the core Tekton controller components will run on Linux nodes | ||
- Tekton will not be responsible for determining whether a Pod for a TaskRun should be created on a Windows node. This will be an opt-in feature determined via a nodeSelector. | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another possible non-goal to include is supporting PipelineResources (at least initially) - many of the images we build and publish as part of tekton pipelines (https://github.com/tektoncd/pipeline/tree/main/cmd) are just to support PipelineResources; I think it would be reasonable to not support these in windows tasks which would reduce the number of images we need to update (but maybe you need them anyway?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a fair point - we don't need PipelineResources for our use case, so if we're happy to leave this until later that works for us :)
|
||
Workflows that do not require Windows support should not be changed in any way. | ||
|
||
Tekton Hub and Catalog should allow a Task to indicate that it requires a Windows node to run, if this is absent, the default behaviour will be to assume Linux so that no change to existing Linux workflows is required. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in the API working group yesterday @jerop pointed out that there is an issue suggesting this more generally as well: tektoncd/catalog#661 - i think it makes sense to include this as a requirement, but i also could see hashing out the details of how we indicate the windows requirement in a separate TEP (i.e. so you don't have to tackle everything at once with this TEP!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed - the mechanics of this can probably wait, we just wanted to indicate that if a Task will require a Windows node then this should be flagged as early as possible for user-friendliness.
- Entrypoint binary | ||
- All internal support images used for resource management | ||
- File paths used for step ordering, either via polling or watching files | ||
- Script mode |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@sbwsg pointed out in the API working group notes that pkg/pod/pod.go might be a hotspot for windows related changes
- The burden of supporting the Windows platform may be quite high | ||
- Windows containers may introduce some security vulnerabilities |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are there specific vulnerabilities we're concerned about?
- Windows containers may introduce some security vulnerabilities | ||
|
||
### User Experience (optional) | ||
- User specifies via nodeSelector if they want to run a task in a Windows container |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i dont know much about nodeSelectors and im wondering maybe you could provide an example - im wondering if the "nodeSelector" for windows nodes is always going to look the same, or if it could vary (i.e. will some clusters refer to their windows nodes as "win" and some refer to them as "windows", etc.) - because this user experience would suggest that the tekton pipelines controller needs to be able to look at the node selector and understand when it refers to a windows node (so it uses the appropriate images)
also afaik the nodeselector would likely be supplied at runtime (via a pod template in a TaskRun/Pipeline) - it seems like Tasks should have something in the Task spec that indicates if they are meant to be run on windows nodes or not
(let me know if this isnt making any sense - im not sure im explaining it well - im trying to refer to the difference between runtime and authoring time - this feels like an authoring time thing - which leads me to the conclusion that the nodeselector is maybe not the way the controller detects that it needs to use windows compatible images)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can't respond to the comment above, so I'll answer that here first. The big one is that it is possible to break-out of containers when using Windows (one example is documented here. This is something that requires a fix from Microsoft, there's nothing that can be done at our end, bit obviously this is something we need to be conscious of moving forward.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In terms of nodeSelectors - as I understand it kubernetes provides a fairly well defined way to require that a container be scheduled on a node that meets certain requirements. These can include OS requirements as described here. I'd agree that we'd probably supply the nodeSelector at runtime, but we would definitely want something in the Task (i.e. authoring time) that indicates this need. So maybe we do need something other than a nodeSelector here, but I'd imagine we'd want it to be defined along similar lines...
Another issue we need to face is the lack of flexibility when using windows containers on windows nodes - unless things have changed very recently then Windows requires that the windows version of the node and container be the same.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
re. the vulnerabilites: thanks for the example! might be worth including that link in the content of the TEP as well
re. nodeSelectors:
Thanks for the explanation, I think that makes sense to me. My conclusion is that we ultimately will need to have something in the Task that indicates this need, and then at runtime folks will need to include the appropriate nodeselector for their cluster as well but (correct me if I'm wrong!!) Tekton probably can't be too opinionated about what the node selector looks like (since folks could be using any label they want to identify their windows nodes)? Or maybe there are already known conventions around identifying windows nodes we could embrace
### Flexibility | ||
As described above, Windows support would be explicitly opt-in, meaning that non-Windows users never need to know about this feature. Adding Windows support would actually improve the flexibility of Tekton significantly by supporting workflows that are dependent on Windows containers. | ||
### Conformance | ||
Other than having to specify that a given TaskRun needs to run on a Windows node there really shouldn’t be any significant user-facing changes. With that in mind, we intend that the user experience when creating TaskRuns for Windows nodes would be identical to the experience when creating Linux based TaskRuns, other than using nodeSelector to specify the need for a Windows node. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i mentioned this in a comment above, but im wondering it makes more sense for this info to be in the Task vs the TaskRun (though the TaskRun might ultimately need to provide a nodeSelector as well 😬 )
- Running a Windows VM inside a Linux container. This is a cumbersome alternative and has performance issues that make it untenable in production environments. | ||
|
||
## Infrastructure Needed (optional) | ||
We can work in a Fork of Tekton or the Tekton experiments. Whichever the Tekton contributors consider the better alternative. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we're also going to need (in the long run) a way to test on Windows - which means we'll need to add that to our existing test infra (which is totally reasonable! but worth calling out i think :D)
I'm happy to merge this TEP as is and keep iterating on it! /approve @imjasonh @afrittoli do you have any blockers for this PR? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed with @bobcatfish
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: bobcatfish, vdemeester The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
/lgtm |
Adding and releasing a hold to see if it unblocks the merge queue /hold |
/hold cancel It looks like this PR might need to be rebased. Or at least that's what tide is implying with the error messages in its history:
|
Just a quick ping - rebase on latest |
The label will be removed automatically anyway when the PR is rebased, removing it manually to see if it unblocks things / stops spamming Tide. |
Sorry, I was away for a few days. Rebased this morning, looks like it's just waiting on that lgtm label :) |
/lgtm |
This TEP aims to address the need for Windows support in Tekton.
This was originally raised in #1826