[Part 13 of #758] Added scheduling.userDummy #931

Closed
wants to merge 10 commits

Conversation

consideRatio
Member

@consideRatio consideRatio commented Sep 11, 2018

Warning

Needs rebase before merge. The actual content of this PR is only the following commit.


About

Read about it in schema.yaml.

We are now using a trick to restart the hub whenever the hash of the
hub's configmap changes. This commit modernizes old logic from when
that trick wasn't in use and reduces the need for an inline explanation
of the code.
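
As a reference, a minimal sketch of what such a checksum-annotation trick typically looks like in a Helm chart's deployment template; the annotation key and template path here are assumptions for illustration, not copied from the chart:

```yaml
# Sketch of the Helm checksum-annotation trick: the pod template gets an annotation
# containing the configmap's hash, so any change to the config rolls the hub pods.
spec:
  template:
    metadata:
      annotations:
        # Hypothetical template path; the chart's actual path may differ.
        checksum/config-map: {{ include (print $.Template.BasePath "/hub/configmap.yaml") . | sha256sum }}
```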
Storage resources now get a label describing what kind of storage they hold:
```yaml
hub.jupyter.org/storage-kind: user # or core, for the hub-db-dir
```

Extra storage labels are also configurable through `singleuser.storageExtraLabels` in config.yaml.
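
A config.yaml sketch, assuming `singleuser.storageExtraLabels` takes a map of label keys to values (the label below is made up for illustration):

```yaml
singleuser:
  storageExtraLabels:
    # Hypothetical extra label applied to the user storage (PVCs)
    cost-center: data-science
```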
Core and user pods now get `tolerations` for the node taints
`hub.jupyter.org/dedicated=core:NoSchedule` and
`hub.jupyter.org/dedicated=user:NoSchedule` respectively, which can optionally
be added to individual nodes or to all nodes in a node group (aka node pool).

Note that due to a bug in the `gcloud` CLI, we also add a toleration for the
same taint with `/` replaced by `_`.
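
As a sketch, the tolerations a user pod ends up with under this scheme look something like the following (core pods get the corresponding `core` variants); this is illustrative rather than copied from the templates:

```yaml
tolerations:
  - key: hub.jupyter.org/dedicated
    operator: Equal
    value: user
    effect: NoSchedule
  - key: hub.jupyter.org_dedicated   # workaround for the gcloud CLI issue with "/" in taint keys
    operator: Equal
    value: user
    effect: NoSchedule
```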

This commit also makes `singleuser.extraTolerations` configurable, allowing you
to add your own custom tolerations to the user pods.
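
A config.yaml sketch, assuming `singleuser.extraTolerations` takes a plain list of Kubernetes toleration objects (the taint key below is made up):

```yaml
singleuser:
  extraTolerations:
    - key: example.com/gpu        # hypothetical taint applied to your GPU nodes
      operator: Equal
      value: "true"
      effect: NoSchedule
```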
These affinities allow for more fine-grained control over where a pod will
schedule. Read about them in schema.yaml.
When a user pod is about to schedule, should it favor scheduling on a node that
already has real users on it? A real user here means an actual user pod, as
opposed to a user-placeholder pod.

Enabling this will pack real users together even in the few situations where
the scheduler fails to discern two nodes from a resource request perspective.

Note that this is a minor tweak, and it is not recommended for very large
clusters as it can reduce the scheduler's performance.
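
Concretely, this kind of preference can be expressed as a preferred pod affinity on the user pods towards nodes that already run real user pods; the label selector below is an assumption for illustration:

```yaml
affinity:
  podAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 100
        podAffinityTerm:
          topologyKey: kubernetes.io/hostname
          labelSelector:
            matchLabels:
              component: singleuser-server   # assumed label identifying real user pods
```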
@minrk
Member

minrk commented Sep 12, 2018

Even after reading, I don't quite understand what the user-dummy is for. This isn't the placeholder, is it? Is it strictly a testing facility? If so, I think it probably doesn't belong in the chart.

@consideRatio
Member Author

consideRatio commented Sep 12, 2018

It is there to mimic a real user, as compared to a placeholder which has lower priority. If we as developers, or a person setting up autoscaling, want to verify that the eviction logic works as intended, I think we should have this. If we do not want to help developers / chart users verify this we can certainly delete it, but its worst-case cost currently is being a resource with 0 pods that simply exists in the code base.

This could allow for quite simple automated tests that take very little time to execute.

@minrk
Member

minrk commented Sep 12, 2018

From my reading of #929, placeholders can be used to do the same thing. Is that not true?

@consideRatio
Member Author

@minrk you can test the eviction logic if you have pods with different priorities. You can do it with user-placeholder pods together with either real users or user dummies. Initially I did it by actually signing in and out, clicking start server / stop server, etc. It was very inefficient, and I realized it was costing a huge amount of my time and would cost a bit of cloud resources as well.

So the essence is that one needs both placeholders and real users (or dummies), as they need to be given different priorities through different priorityClassNames. Attempts to use the same deployment / statefulset for both purposes fail because the pods are recreated if you change the spec. The priority field, set by the Kubernetes API server based on the specified priorityClassName, cannot be changed after the resource has been created either.
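
To make the setup concrete, here is a minimal sketch of the two ingredients described above: a lower-priority class for the placeholders and a zero-replica user-dummy workload running at a higher priority. Names, priority values, image, and resource requests are assumptions for illustration, not taken from the chart:

```yaml
# Placeholder pods get a negative priority so they are the first to be evicted
apiVersion: scheduling.k8s.io/v1beta1   # the PriorityClass API was still beta at the time
kind: PriorityClass
metadata:
  name: user-placeholder-priority
value: -10
---
# Pods mimicking real users get a higher priority than the placeholders
apiVersion: scheduling.k8s.io/v1beta1
kind: PriorityClass
metadata:
  name: user-dummy-priority
value: 0
---
# Zero-replica workload mimicking real user pods; scale it up when testing eviction
apiVersion: apps/v1
kind: Deployment
metadata:
  name: user-dummy
spec:
  replicas: 0
  selector:
    matchLabels:
      component: user-dummy
  template:
    metadata:
      labels:
        component: user-dummy
    spec:
      priorityClassName: user-dummy-priority
      terminationGracePeriodSeconds: 0
      containers:
        - name: pause
          image: k8s.gcr.io/pause:3.1   # any tiny image with a resource request will do
          resources:
            requests:
              cpu: 100m
              memory: 256Mi
```

Scaling the deployment up then creates "real user" pods that should displace the lower-priority placeholders, letting you verify the eviction and scale-up behaviour without signing real users in and out.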

@minrk
Member

minrk commented Sep 13, 2018

I think it makes the most sense to move this to documentation / an example rather than a utility we deploy.

@consideRatio
Member Author

@minrk I think it should either be available as a zero-replica controller, or dropped entirely without regret and only documented as a closed PR. Closing!
