[WIP] Add a secondary scheduler with policy we can tweak #543
Conversation
Still running into kubernetes/kubernetes#60469, which makes this unusable with an autoscaler. The policy.json content comes from kubernetes/kubernetes#59401.

Fixes #542
@yuvipanda KubeCon 2018 videos are out, and these were very relevant for me to look at while considering implementing a scheduler, especially for the singleuser-server pods! I got excited! :D

Presentations regarding scheduling
Other stuff regarding scheduling
roleRef:
  kind: ClusterRole
  name: {{ .Chart.Name }}-{{ .Release.Name }}-scheduler
  apiGroup: rbac.authorization.k8s.io
We can use name: system:kube-scheduler instead, which means we don't need to define our own ClusterRole; we reuse the one that is already defined.

I picked that up from this presentation, and also found it later in this Kubernetes documentation.
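A minimal sketch of the roleRef in that case, assuming we bind our scheduler's ServiceAccount to the built-in ClusterRole:

```yaml
roleRef:
  kind: ClusterRole
  # reuse the ClusterRole shipped with Kubernetes instead of a chart-defined one
  name: system:kube-scheduler
  apiGroup: rbac.authorization.k8s.io
```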
But... if we want to support running replicas > 1 of our scheduler, we need to add entries to the ClusterRole under resourceNames, as described in configure multiple schedulers - part 3 (see the sketch below). So if that is the case, we may need to keep using a custom-defined ClusterRole.
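A hedged sketch of the extra rule a custom ClusterRole would need; the resourceNames entry should match the --lock-object-name we pass, and the resource would be configmaps rather than endpoints if we switch the leader-election lock type:

```yaml
# allow the scheduler replicas to get/update their leader-election lock object
- apiGroups: [""]
  resources: ["endpoints"]
  resourceNames: ["{{ .Chart.Name }}-{{ .Release.Name }}-scheduler-lock"]
  verbs: ["get", "update"]
```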
metadata:
  name: {{ .Chart.Name }}-scheduler-config
data:
  policy.json: |
Should this be policy.cfg?
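A hedged sketch of the ConfigMap with that key, assuming the scheduler looks for policy.cfg when --policy-configmap is used (the policy body itself is elided here):

```yaml
kind: ConfigMap
apiVersion: v1
metadata:
  name: {{ .Chart.Name }}-scheduler-config
data:
  policy.cfg: |
    {
      "kind": "Policy",
      "apiVersion": "v1"
    }
```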
- --leader-elect=true
- --scheduler-name={{ .Chart.Name }}-{{ .Release.Name }}-scheduler
- --lock-object-namespace={{ .Release.Namespace }}
- --lock-object-name={{ .Chart.Name }}-{{ .Release.Name }}-scheduler-lock
Research note
The --leader-elect flag is supposed to be enabled "[...] when running replicated components for high availability." and it defaults to true.
Reading configure multiple schedulers - section 3, they write that if you want to set up leader election, you must update the following:
--leader-elect
--lock-object-namespace
--lock-object-name
And that one must also add the name of the scheduler to the ClusterRole under rules.apiGroups.resourceNames.
While reading the kube-scheduler documentation I understood it as: the --lock-object-name should refer to an Endpoints object by default, but it can also be a ConfigMap if we also pass --leader-elect-resource-lock=configmaps.
Since we don't have an Endpoints object for our scheduler (we don't have a Service for it currently, and I'm unaware of the need for one), we should specify the ConfigMap we use, for example:
command:
- /usr/local/bin/kube-scheduler
- --address=0.0.0.0
- --scheduler-name=jupyterhub-scheduler
- --policy-configmap=scheduler-config
- --policy-configmap-namespace={{ .Release.Namespace }}
- --leader-elect=true
- --leader-elect-resource-lock=configmaps
- --lock-object-name=scheduler-config
- --lock-object-namespace={{ .Release.Namespace }}
- --v=4
- --scheduler-name={{ .Chart.Name }}-{{ .Release.Name }}-scheduler
- --lock-object-namespace={{ .Release.Namespace }}
- --lock-object-name={{ .Chart.Name }}-{{ .Release.Name }}-scheduler-lock
- -v=4
I did not see information about this in the documentation; it is some verbosity level, I figure. I've seen --v used on the kube-scheduler binary as well (as compared to -v). Does it matter? Hmmm...
Notes
TL;DR: I set the image using this helper, allowing for a potential override of the kube-scheduler version but defaulting to the cluster's version.

{{- /*
Renders the kube-scheduler's image based on .Values.scheduler.image.name and
optionally on .Values.scheduler.image.tag. The default tag is set to the
cluster's Kubernetes version.
*/}}
{{- define "jupyterhub.scheduler.image" -}}
{{- $name := .Values.scheduler.image.name -}}
{{- $valuesVersion := .Values.scheduler.image.tag -}}
{{- $clusterVersion := (split "-" .Capabilities.KubeVersion.GitVersion)._0 -}}
{{- $tag := $valuesVersion | default $clusterVersion -}}
{{ $name }}:{{ $tag }}
{{- end }}
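A hypothetical usage of that helper in the scheduler Deployment's container spec (the container name here is just an example):

```yaml
containers:
  - name: scheduler
    # renders as <scheduler.image.name>:<tag>, with the tag falling back to the
    # cluster's Kubernetes version when scheduler.image.tag is not set
    image: {{ include "jupyterhub.scheduler.image" . }}
```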
About RBAC
I'm not happy about needing to create a ClusterRoleBinding, but I figure we must in order to have a scheduler that works.

About the policies
The scheduler can have performance issues. How can we minimize them? I figure we might be able to remove some node filters, a.k.a. predicates (a hypothetical trimmed policy is sketched below). The heavy work is probably done in the preferences / priorities.
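To illustrate, a hypothetical trimmed-down policy; the predicates and priorities kept here are examples, not what this PR's policy.json actually contains:

```json
{
  "kind": "Policy",
  "apiVersion": "v1",
  "predicates": [
    { "name": "PodFitsResources" },
    { "name": "PodFitsHostPorts" },
    { "name": "MatchNodeSelector" }
  ],
  "priorities": [
    { "name": "MostRequestedPriority", "weight": 1 }
  ]
}
```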
About kube-scheduler
https://github.com/kubernetes/community/blob/master/contributors/devel/scheduler.md
A custom NodeLabelPriority or NodeLabelPredicate does not bother with the value of the label, just whether it is there or not. The MetadataPriority stuff seems to be a mixed bag of affinities etc.

Resources
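Going back to the label-presence point above: a hedged sketch of how such custom predicates/priorities are declared in the policy format; the names and the label used here are hypothetical:

```json
{
  "predicates": [
    {
      "name": "require_purpose_label",
      "argument": {
        "labelsPresence": { "labels": ["hub.jupyter.org/node-purpose"], "presence": true }
      }
    }
  ],
  "priorities": [
    {
      "name": "prefer_purpose_label",
      "weight": 10,
      "argument": {
        "labelPreference": { "label": "hub.jupyter.org/node-purpose", "presence": true }
      }
    }
  ]
}
```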
Wieeeeeeeeeeeeee this took some time, but your checklist was excellent @yuvipanda!
That would probably even allow us to use the default scheduler, but that could influence things at a broader level than we want our chart to do, so we should probably still deploy our own.

Additional documentation of the KubeSchedulerConfiguration object
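For reference, a hedged sketch of what a KubeSchedulerConfiguration could look like; the apiVersion and exact fields varied between Kubernetes versions while this object was alpha, and the namespace/names below are placeholders, not this chart's values:

```yaml
apiVersion: componentconfig/v1alpha1
kind: KubeSchedulerConfiguration
schedulerName: jupyterhub-scheduler
algorithmSource:
  policy:
    configMap:
      namespace: jhub            # placeholder namespace
      name: scheduler-config
leaderElection:
  leaderElect: true
  lockObjectNamespace: jhub      # placeholder namespace
  lockObjectName: scheduler-lock
```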
Continued on in #758