Add initial design proposal for Scheduling Policy #1937
# Scheduling Policy

_Status: Draft_
_Author: @arnaudmz, @yastij_
_Reviewers: @bsalamat, @liggitt_

# Objectives

- Define the concept of scheduling policies
- Propose their initial design and scope
## Non-Goals

- How taints / tolerations work
- How NodeSelector works
- How node / pod affinity / anti-affinity rules work
- How several schedulers can be used within a single cluster
- How priority classes work
# Background

While architecting real-life Kubernetes clusters, we encountered contexts where role isolation (between administration and plain namespace usage in a multi-tenant context) could be improved. So far, no restriction can be placed on toleration usage, priority class usage, nodeSelector, or anti-affinity based on user permissions (RBAC).
Identified use-cases aim to ensure that administrators have a way to restrict users or namespaces when:

- using schedulers,
- placing pods on specific nodes (master roles for instance),
- using specific priority classes,
- expressing pod affinity or anti-affinity rules.

> **Review comment:** These aren't great use cases - I really expected a more end Kubernetes user focus: … etc. Something as critical as policy needs a lot more use case design before we even get into implementation details.
>
> **Review comment:** I'm probably not going to be happy until at least 300 lines of this doc is a detailed justification for the design space and what we are actually trying to build. If that exists elsewhere, please link it here.
>
> **Reply:** @smarterclayton - Indeed, there are other use cases @arnaudmz and I have identified; we'll add them. I'll try to put up a design section to give an overview of this policy, SGTY?
>
> **Review comment:** See also related discussions in kubernetes/kubernetes issues.
>
> **Review comment:** I'm confused by these last three items - those are things that are specified on the pod, not the policy? It seems like you already covered the policy use-cases? (except for the last one, maybe "Enforce anti-affinity requirements between pods in specific namespaces")
>
> **Reply:** Yes, I'll do a rewrite on this.
# Overview
Implementing SchedulingPolicy implies:

- Creating a new resource named **SchedulingPolicy** (schedpol)
- Creating an **AdmissionController** that behaves on a deny-all-except basis (anything not explicitly allowed by a usable policy is rejected)

> **Review comment:** I don't understand this sentence.
>
> **Review comment:** dehaves -> behaves?

- Allowing SchedulingPolicy objects to be used by pods through RoleBindings or ClusterRoleBindings

> **Review comment:** I think we should consider applying the policies to namespaces. That's more aligned with similar K8s policies, such as quota.
>
> **Reply:** The point was to be aligned with the PodSecurityPolicy principles: … If I understand well, you seem more fond of a non-breaking approach.
>
> **Reply:** @bsalamat - I'm not sure we want to do that; integrating with RBAC would be better in terms of experience (e.g. grant cluster-wide usage on a …).
>
> **Review comment:** I think we should support namespaced policy as well. For example, we want … I think we should have 2 policies or at least have a field in this spec. The global scheduling policies are applied to every pod, and some of the fields in a global policy cannot be overridden by a local policy (created at namespace level). Even if the local policy is beyond the scope of the current proposal, we should include the fields which cannot be overridden at namespace level, if we go this route.
>
> **Reply:** @ravisantoshgudimetla - this is all enabled by RBAC (RoleBindings allow the verb «use» on a SchedulingPolicy in a specific namespace; ClusterRoleBindings, on the other hand, allow cluster-wide usage of a SchedulingPolicy; having a namespaceSelector is not viable for this use case).
>
> **Reply:** @ravisantoshgudimetla: to be very precise, enforcing:
>
> ```yaml
> apiVersion: extensions/v1alpha1
> kind: SchedulingPolicy
> metadata:
>   name: policyB
> spec:
>   allowed:
>     schedulerNames: ["default-scheduler", "schedulerB"]
> ---
> apiVersion: rbac.authorization.k8s.io/v1
> kind: ClusterRole
> metadata:
>   name: policyB
> rules:
> - apiGroups: ['extensions']
>   resources: ['schedulingpolicies']
>   verbs: ['use']
>   resourceNames:
>   - policyB
> ---
> apiVersion: rbac.authorization.k8s.io/v1
> kind: RoleBinding
> metadata:
>   name: policyB
>   namespace: namespaceB
> roleRef:
>   apiGroup: rbac.authorization.k8s.io
>   kind: ClusterRole
>   name: policyB
> subjects:
> - kind: Group
>   name: system:serviceaccounts:namespaceB
>   apiGroup: rbac.authorization.k8s.io
> ```
>
> Other service accounts (in any other namespaces) will fall back to the default …
>
> **Review comment:** @arnaudmz @yastij Thanks. Clearly my example is not complex enough. My question is more along the lines of: how do we ensure that certain attributes of scheduling, like nodeSelector, could come from a namespace rather than created by whom, and others created by whom? For example, as of now there is a namespace-level whitelist for tolerations. How can we tell whether this toleration is valid or not until the pod creation happens?
>
> **Review comment:** See https://github.com/tallclair/k8s-community/blob/pod-restrict/contributors/design-proposals/auth/pod-restriction.md#policy-binding. I do not think using RBAC for policy binding is a good user experience.
>
> **Review comment:** The use of RBAC to determine which PodSecurityPolicy applies is one of the most confusing things we've done in the entire system, which is saying a lot. Another model to consider is NetworkPolicy: …
# Detailed Design
SchedulingPolicy resources apply on a deny-all-except basis. They are designed to apply in an additive way (i.e. and'ed). From a pod's perspective, a pod can use one or N of the allowed items.
An AdmissionController must be added to the validating phase and must reject a pod if the serviceaccount running the pod is not allowed to specify the requested nodeSelectors, scheduler name, affinity / anti-affinity rules, priority class, or tolerations.

> **Review comment:** Absence of node selectors is also problematic. The current podnodeselector admission plugin allows admins to force specific nodeSelectors onto pods to constrain them to a subset of nodes. Any replacement for that mechanism would need to provide the same capability.
>
> **Reply:** SGTM
>
> **Reply:** Please see below some proposal which could go this way.
>
> **Review comment:** Will that introduce order dependence of the two admission controllers? For example, it's arguable that the cluster admin should configure it correctly, but that'll take time to troubleshoot :)
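
To make the admission check concrete, below is an illustrative pod (not taken from the proposal; all names are made up) that requests several scheduling constraints. Under this design it would only pass validating admission if the serviceaccount it runs under can `use` at least one SchedulingPolicy covering each requested item.

```yaml
# Illustrative pod: every scheduling-related field below must be covered by a
# SchedulingPolicy usable by the pod's serviceaccount, otherwise the
# validating admission controller rejects the pod.
apiVersion: v1
kind: Pod
metadata:
  name: constrained-pod
  namespace: namespaceB
spec:
  serviceAccountName: default
  schedulerName: my-scheduler        # checked against allowedSchedulerNames
  priorityClassName: high-priority   # checked against allowedPriorityClasseNames
  nodeSelector:
    disk: ssd                        # checked against allowedNodeSelectors
  tolerations:
  - key: dedicated                   # checked against allowedTolerations
    operator: Equal
    value: experimental
    effect: NoSchedule
  containers:
  - name: app
    image: nginx
```
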
All usable scheduling policies (allowed by RBAC) are merged before evaluating whether the scheduling constraints defined in a pod are allowed.

> **Review comment:** Clarify what "merged" means. That seems potentially problematic, especially in case of computing coverage of conflicting scheduling components (policy A allowed this toleration, policy B allowed that toleration, policy C required nodeSelector component master=false, policy D allows nodeSelector component master=true, etc.)
>
> **Reply:** As long as there were no required components nor default values, merging was quite trivial, but given that need, I guess we'll have to work on it. I'm thinking of some ways like: … Any thoughts?
>
> **Review comment:** Prefer to have … That's error-prone and complex, especially some corner cases.
>
> **Reply:** I mean the single term; so the request passes only if all terms pass :)
>
> **Review comment:** This doesn't work with the workflow of having a default cluster-wide policy, and then granting specific users (or namespaces) elevated privileges. See "Policy matching - union or intersection" for a breakdown.
>
> **Review comment:** Other than pod restriction (#1950), are there other policies we're trying to align with here, approach-wise?
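
As a rough sketch of the additive reading described above (field names follow the schema proposed below; policy names and values are hypothetical), a serviceaccount granted `use` on both of the following policies could pick any item allowed by either of them:

```yaml
# Hypothetical illustration of merging two usable policies: a pod run by a
# serviceaccount bound to both may use default-scheduler or my-scheduler,
# and may additionally carry the listed toleration.
apiVersion: extensions/v1alpha1
kind: SchedulingPolicy
metadata:
  name: schedulers-only
spec:
  allowedSchedulerNames: ["default-scheduler", "my-scheduler"]
---
apiVersion: extensions/v1alpha1
kind: SchedulingPolicy
metadata:
  name: dedicated-toleration
spec:
  allowedSchedulerNames: ["default-scheduler"]
  allowedTolerations:
  - keys: ["dedicated"]
    operators: ["Exists"]
    effects: ["NoSchedule"]
```
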
## SchedulingPolicy

Proposed API group: `extensions/v1alpha1`

> **Review comment:** Extensions is locked down. This should be in the …

SchedulingPolicy is a cluster-scoped resource (not namespaced).

### SchedulingPolicy content

The SchedulingPolicy spec is composed of optional fields that allow scheduling rules. If a field is absent from a SchedulingPolicy, that schedpol does not allow any item of the missing field.

> **Review comment:** You're using a policy intersection approach to handle multiple policies, but locking down fields by default breaks composition since there's no way to open them back up.
>
> **Reply:** +1 What composition scenarios do we expect?

```yaml
apiVersion: extensions/v1alpha1
kind: SchedulingPolicy
metadata:
  name: my-schedpol
spec:
  allowedSchedulerNames:      # Describes scheduler names that are allowed
  allowedPriorityClasseNames: # Describes priority class names that are allowed
  allowedNodeSelectors:       # Describes node selectors that can be used
  allowedTolerations:         # Describes tolerations that can be used
  allowedAffinities:          # Describes affinities that can be used
```

> **Review comment:** As Jordan has also mentioned, none of these fields should have the "allowed" prefix. They should be "schedulerNames", "priorityClassNames", etc. Then the spec for each one should have a "condition" (or a similar word) that can be set to one of "allowed", "forbidden", "default", or "required".
>
> **Reply:** Yes, that's what I meant. One more point to add is that, in Kubernetes, we usually apply pod policies at the granularity of namespaces. So, a user should be able to specify the namespace these rules are applied to. For example, the default priority class of pods in namespace "ns-1" is "pc-1".
>
> **Reply:** @yastij: yes, that was the point of mimicking the PSP RBAC principle: using RoleBindings or ClusterRoleBindings to apply the policies at serviceaccount, namespace or cluster scope.
>
> **Review comment:** Discuss default vs required vs allowed vs forbidden. Typically, fencing nodes via selector involves requiring a specific set of labels/values (in addition to whatever else the pod wants), e.g. master=false,compute=true.

### Scheduler name

It should be possible to allow users to use only specific schedulers via the `allowedSchedulerNames` field.

If `allowedSchedulerNames` is absent from the SchedulingPolicy, no scheduler is allowed by this specific policy.
#### Examples

Allow serviceaccounts to use either the default-scheduler (used when `spec.schedulerName` is omitted or set to `default-scheduler` in the pod definition) or the `my-scheduler` scheduler (by specifying `spec.schedulerName: "my-scheduler"`):

```yaml
kind: SchedulingPolicy
spec:
  allowedSchedulerNames:
  - default-scheduler
  - my-scheduler
```
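
For illustration (assuming the pod's serviceaccount is bound to the policy above), such a pod simply picks one of the allowed schedulers:

```yaml
# Illustrative pod admitted under the policy above: it requests one of the
# two allowed schedulers.
apiVersion: v1
kind: Pod
metadata:
  name: scheduled-by-my-scheduler
spec:
  schedulerName: my-scheduler
  containers:
  - name: app
    image: nginx
```
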
Allow all schedulers:

```yaml
kind: SchedulingPolicy
spec:
  allowedSchedulerNames: []
```

> **Review comment:** This second policy needs a description.

### Tolerations

Toleration usage can be allowed using fine-grained rules with the `allowedTolerations` field. If multiple `allowedTolerations` are specified, a pod's toleration is accepted if it satisfies at least one of them.

If `allowedTolerations` is absent from the SchedulingPolicy, no toleration is allowed.
#### Examples

##### Fine-grained allowedTolerations

```yaml
kind: SchedulingPolicy
spec:
  allowedTolerations:
  - keys: ["mykey"]
    operators: ["Equal"]
    values: ["value"]
    effects: ["NoSchedule"]
  - keys: ["other_key"]
    operators: ["Exists"]
    effects: ["NoExecute"]
```

This example allows tolerations in the following forms:
- tolerations that tolerate taints with a key named `mykey`, the value `value`, and a `NoSchedule` effect,
- tolerations that tolerate taints with the key `other_key` and a `NoExecute` effect.
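
As an illustration (not part of the original proposal text), a pod carrying the following tolerations would satisfy the two fine-grained rules above:

```yaml
# Illustrative pod tolerations accepted by the fine-grained policy above,
# assuming the pod's serviceaccount can use that policy.
apiVersion: v1
kind: Pod
metadata:
  name: tolerating-pod
spec:
  tolerations:
  - key: mykey
    operator: Equal
    value: value
    effect: NoSchedule
  - key: other_key
    operator: Exists
    effect: NoExecute
  containers:
  - name: app
    image: nginx
```
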
##### Coarse-grained allowedTolerations

```yaml
kind: SchedulingPolicy
spec:
  allowedTolerations:
  - keys: []
    operators: []
    values: []
    effects: ["PreferNoSchedule"]
  - keys: []
    operators: ["Exists"]
    effects: ["NoSchedule"]
```

This example allows tolerations in the following forms:
- tolerations that tolerate any `PreferNoSchedule` taint with any key and value,
- tolerations that tolerate taints based on the existence of any key, with effect `NoSchedule`.

Also note that this SchedulingPolicy does not allow tolerating `NoExecute` taints.
### Priority classes

We must be able to restrict users to specific priority classes by using the `allowedPriorityClasseNames` field.

If `allowedPriorityClasseNames` is absent from the SchedulingPolicy, no priority class is allowed.
#### Examples

##### Only allow a single priority class

```yaml
kind: SchedulingPolicy
spec:
  allowedPriorityClasseNames:
  - high-priority
```

In this example, only the `high-priority` PriorityClass is allowed.
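
For illustration, a pod bound to this policy could then request that class as usual:

```yaml
# Illustrative pod requesting the only allowed priority class above.
apiVersion: v1
kind: Pod
metadata:
  name: important-pod
spec:
  priorityClassName: high-priority
  containers:
  - name: app
    image: nginx
```
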
##### Allow all priorities

```yaml
kind: SchedulingPolicy
spec:
  allowedPriorityClasseNames: []
```

In this example, all priority classes are allowed.
### Node Selector

We must be able to restrict nodeSelector usage with the `allowedNodeSelectors` field.

If `allowedNodeSelectors` is totally absent from the spec, no node selector is allowed.

> **Review comment:** This doesn't make sense. A pod with no nodeSelector targets the most nodes possible. Adding more selectors constrains a pod. Generally, you want to require a set of nodeSelector labels be present, error if the pod tries to specify nodeSelector components that conflict with that required set, and allow the pod to specify any additional nodeSelector components it wants. That is what the current podnodeselector admission plugin does.
>
> **Reply:** Do you think we could do it this way?
>
> ```yaml
> apiVersion: extensions/v1alpha1
> kind: SchedulingPolicy
> metadata:
>   name: my-schedpol
> spec:
>   nodeSelectors:
>     required:
>       beta.kubernetes.io/arch: ["amd64", "arm"]    # pick one of those mandatory values
>     default:
>       beta.kubernetes.io/os: amd64                 # the default value unless specified
>     allowed:
>       failure-domain.beta.kubernetes.io/region: [] # any value can be specified
> ```
>
> Given the deny-by-default design, some kind of …
>
> **Reply:** @arnaudmz - I agree, given the design, …

#### Examples

##### Fine-grained policy

```yaml
kind: SchedulingPolicy
spec:
  allowedNodeSelectors:
    disk: ["ssd"]
    region: [] # means any value
```

In this example, pods can be scheduled only if they:
- have no nodeSelector,
- or have a `disk: ssd` nodeSelector,
- and / or have a `region` nodeSelector key with any value.
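
As an illustration (the `region` value below is made up, since the policy allows any value for that key), the following pod's nodeSelector would be covered:

```yaml
# Illustrative pod whose nodeSelector is fully covered by the policy above.
apiVersion: v1
kind: Pod
metadata:
  name: ssd-pod
spec:
  nodeSelector:
    disk: ssd
    region: europe-west
  containers:
  - name: app
    image: nginx
```
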
### Affinity rules

As anti-affinity rules are really time-consuming to evaluate, we must be able to restrict their usage with the `allowedAffinities` field.
`allowedAffinities` keeps a coarse-grained approach to allowing affinities. For each type (`nodeAffinities`, `podAffinities`, `podAntiAffinities`), a SchedulingPolicy can list the allowed constraints (`requiredDuringSchedulingIgnoredDuringExecution` or `preferredDuringSchedulingIgnoredDuringExecution`).

If `allowedAffinities` is totally absent from the spec, no affinity of any kind is allowed.
#### Examples

##### Basic policy

```yaml
kind: SchedulingPolicy
spec:
  allowedAffinities:
    nodeAffinities:
    - requiredDuringSchedulingIgnoredDuringExecution
    podAntiAffinities:
    - requiredDuringSchedulingIgnoredDuringExecution
    - preferredDuringSchedulingIgnoredDuringExecution
```
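
For illustration (the label and topology key are made up), a pod using a hard pod anti-affinity rule such as the following would be allowed by the basic policy above, whereas a preferred node affinity would not, since only hard node affinities are listed:

```yaml
# Illustrative pod whose hard podAntiAffinity is allowed by the basic policy.
apiVersion: v1
kind: Pod
metadata:
  name: spread-me
  labels:
    app: web
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            app: web
        topologyKey: kubernetes.io/hostname
  containers:
  - name: app
    image: nginx
```
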
##### Allow-all policy

In this example, all affinities are allowed:

```yaml
kind: SchedulingPolicy
spec:
  allowedAffinities:
    nodeAffinities: []
    podAffinities: []
    podAntiAffinities: []
```
If a sub-item of `allowedAffinities` is absent from the SchedulingPolicy, it is not allowed, e.g.:

```yaml
kind: SchedulingPolicy
spec:
  allowedAffinities:
    nodeAffinities: []
```

In this example, only soft and hard nodeAffinities are allowed.

> **Review comment:** This is confusing. You're saying that if no sub-items are specified they're all allowed, but as soon as you specify one, the others are implicitly denied?
### When both `allowedNodeSelectors` and `nodeAffinities` are specified

Using both `allowedNodeSelectors` and `nodeAffinities` is not recommended, as the latter is far more permissive.
## Default SchedulingPolicies

### Restricted policy

Here is a reasonable policy that might be allowed on any cluster without specific needs:

> **Review comment:** I've already lost track of which fields are closed by default, and which are open. I'm worried this is too difficult to reason about.

```yaml
apiVersion: extensions/v1alpha1
kind: SchedulingPolicy
metadata:
  name: restricted
spec:
  allowedSchedulerNames: ["default-scheduler"]
```

It only allows usage of the default scheduler; no tolerations, nodeSelectors, nor affinities.
Multi-architecture (x86_64, arm) or multi-OS (Linux, Windows) clusters might also allow the following nodeSelectors:

```yaml
apiVersion: extensions/v1alpha1
kind: SchedulingPolicy
metadata:
  name: restricted
spec:
  allowedSchedulerNames: ["default-scheduler"]
  allowedNodeSelectors:
    beta.kubernetes.io/arch: []
    beta.kubernetes.io/os: []
```
### Privileged Policy

This is the privileged SchedulingPolicy; it allows usage of all schedulers, priority classes, nodeSelectors, affinities, and tolerations.

```yaml
apiVersion: extensions/v1alpha1
kind: SchedulingPolicy
metadata:
  name: privileged
spec:
  allowedSchedulerNames: []
  allowedPriorityClasseNames: []
  allowedNodeSelectors: {}
  allowedTolerations:
  - keys: []      # any keys
    operators: [] # equivalent to ["Exists", "Equal"]
    values: []    # any values
    effects: []   # equivalent to ["PreferNoSchedule", "NoSchedule", "NoExecute"]
  allowedAffinities:
    nodeAffinities: []
    podAffinities: []
    podAntiAffinities: []
```
## RBAC

SchedulingPolicy usage is granted via the verb `use`, which applies at pod runtime.

> **Review comment:** I strongly discourage this approach. See Policy Binding.
>
> **Reply:** @tallclair @bsalamat @liggitt @smarterclayton - I'm fine with having a list of namespaces + a namespace selector.

The following default ClusterRoles / ClusterRoleBindings are provisioned to ensure that at least the default-scheduler can be used.

RBAC objects are auto-provisioned at cluster creation / upgrade.
This ClusterRole allows the use of the default scheduler:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: sp:restricted
rules:
- apiGroups: ['extensions']
  resources: ['schedulingpolicies']
  verbs: ['use']
  resourceNames:
  - restricted
```
This ClusterRoleBinding ensures any serviceaccount can use the default-scheduler:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: sp:restricted
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sp:restricted
subjects:
- kind: Group
  name: system:authenticated
  apiGroup: rbac.authorization.k8s.io
```
This RoleBinding ensures that kube-system pods can run with no scheduling restriction:

```yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  annotations:
    rbac.authorization.kubernetes.io/autoupdate: "true"
  labels:
    kubernetes.io/bootstrapping: rbac-defaults
  name: sp:kube-system-privileged
  namespace: kube-system
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sp:privileged
subjects:
- kind: Group
  name: system:serviceaccounts:kube-system
  apiGroup: rbac.authorization.k8s.io
```
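
Beyond these provisioned defaults, an administrator would follow the same ClusterRole + RoleBinding pattern to grant additional policies. A hypothetical example (the `gpu-tolerations` policy and `ml-team` namespace are made-up names) granting a custom policy to one namespace's serviceaccounts:

```yaml
# Hypothetical example, not part of the provisioned defaults: grant 'use' on a
# custom "gpu-tolerations" SchedulingPolicy to all serviceaccounts in "ml-team".
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: sp:gpu-tolerations
rules:
- apiGroups: ['extensions']
  resources: ['schedulingpolicies']
  verbs: ['use']
  resourceNames:
  - gpu-tolerations
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: sp:ml-team-gpu-tolerations
  namespace: ml-team
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: sp:gpu-tolerations
subjects:
- kind: Group
  name: system:serviceaccounts:ml-team
  apiGroup: rbac.authorization.k8s.io
```
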
# References

- [Pod affinity/anti-affinity](https://kubernetes.io/docs/concepts/configuration/assign-pod-node/#affinity-and-anti-affinity)
- [Pod priorities](https://kubernetes.io/docs/concepts/configuration/pod-priority-preemption/)
- [Taints and tolerations](https://kubernetes.io/docs/concepts/configuration/taint-and-toleration/)
- [RBAC](https://kubernetes.io/docs/admin/authorization/rbac/)
- [Using multiple schedulers](https://kubernetes.io/docs/tasks/administer-cluster/configure-multiple-schedulers/)
> **Reply:** true 😄