[Part 7 of #205] Affinities (node, pod, anti_pod - required and preferred) #239

Merged · 1 commit · Aug 18, 2018

Conversation

consideRatio (Member) commented Aug 10, 2018

There are a lot of affinities that currently cannot be set through kubespawner without using extra_pod_config; they are listed below.

Without having these written out explicitly, I found it very hard to write concise Helm template code that generates proper affinities for the various pod specs declared in the Helm templates, while at the same time configuring kubespawner through traitlets with values hardcoded in a jupyterhub_config.py file alongside a config.yaml file.

All in all, this is therefore required to accomplish the best autoscaling experience in z2jh.

# will be supported by this PR
node_affinity_preferred
node_affinity_required
pod_affinity_preferred
pod_affinity_required
pod_anti_affinity_preferred
pod_anti_affinity_required
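
As an illustration of how these would be used in jupyterhub_config.py (a sketch only: each trait takes a list of the corresponding Kubernetes objects, written here as plain dicts, and the label keys/values are example placeholders):

# jupyterhub_config.py -- illustrative sketch only; the label keys and values
# below are example placeholders, not anything this PR requires.
c.KubeSpawner.node_affinity_required = [
    {
        "matchExpressions": [
            {"key": "hub.jupyter.org/node-purpose", "operator": "In", "values": ["user"]},
        ],
    },
]
c.KubeSpawner.pod_anti_affinity_preferred = [
    {
        "weight": 100,
        "podAffinityTerm": {
            "labelSelector": {"matchLabels": {"component": "singleuser-server"}},
            "topologyKey": "kubernetes.io/hostname",
        },
    },
]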

Fixes #200

minrk (Member) commented Aug 18, 2018

This PR makes me think we don't have the right strategy for exposing Kubernetes options. As much as possible, I would like to pass config through to Kubernetes itself, rather than requiring a new traitlet for every conceivable Kubernetes option. That's never going to be sustainable, as we would need to develop as fast as Kubernetes and duplicate their documentation for every option.

Instead, I think we should, as much as possible, rely on "passthrough" options (e.g. extra_pod_spec) and only directly expose options that are applied in several places or require additional processing (e.g. namespace, pod_name_template).

So I would like to set this goal for exposing new options via traitlets:

  1. is it possible to set the option already, via a passthrough? If so, don't add it.
  2. if it's not possible via a passthrough, is there a passthrough that should be added, instead of an individual option? If so, add the passthrough instead of the individual option.
  3. does special logic need to be added in KubeSpawner to handle it, e.g. templating, re-use in multiple contexts, etc.? If so, can this be done at a higher level?

Only if none of these conditions are met should we expose a new Kubernetes option via a new traitlet. DockerSpawner takes this approach and has very few options, yet it still exposes ~everything you can possibly do with Docker, because it lets you pass arbitrary extra info through to each call to the docker API via config. The advantage of the extra_foo strategy is that documenting the options is not our responsibility. Our responsibility is to document the few passthroughs we have, and then link to the Kubernetes docs themselves for those APIs, because our API is their API.

What do you think? Are there reasons that can't work? As a test case, can any/all of these options be set via extra_pod_config or other options now? If so, I'd vote that we don't merge this or #238 for now, instead opting for documenting extra_pod_config as the official way to do these things.
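
For concreteness, the passthrough route would look roughly like the following in jupyterhub_config.py (a sketch; the affinity value is only an example, and the nested keys are written as the Kubernetes API spells them):

# jupyterhub_config.py -- passthrough sketch: the dict is handed on towards the
# Kubernetes API, so its contents follow the Kubernetes docs rather than any
# KubeSpawner-specific names. The affinity shown is only an example value.
c.KubeSpawner.extra_pod_config = {
    "affinity": {
        "nodeAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": {
                "nodeSelectorTerms": [
                    {"matchExpressions": [
                        {"key": "disktype", "operator": "In", "values": ["ssd"]},
                    ]},
                ],
            },
        },
    },
}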

consideRatio (Member, Author) commented Aug 18, 2018

Thank you @minrk for the review ❤️!

TL;DR Summary
Yes! It isn't great having to map the k8s API to a kubespawner API and keep it updated. Until we have done a bigger rework of how things are done, though, I think we should at least add mappings for fields like the affinity field of a Pod's PodSpec, since it is such a "deep dictionary" where values are dicts whose values are dicts, and so on.

This PR contains essential logic for constructing the affinity field of the PodSpec based on what has been set in six different arrays, and that logic needs to live either here or in z2jh. I'm confident it fits best here and will help the most people this way.


I think the biggest work in creating a field mapping from the kubespawner API to the k8s API was to start supporting most of the available fields in the first place; after that, I think the PodSpec will be quite stable. This is the k8s API definition for PodSpec-V1-Core.

I figure there is a big difference between using extra_pod_spec for a field that sits directly on the PodSpec and for one that is nested deep within it. A deeply nested field is troublesome to set with extra_pod_spec, as one would need quite extensive logic to add or modify it. The affinity field is the most cumbersome field there is in the PodSpec.

node_affinity_required maps to the NodeSelectorTerm objects in the final array:

pod.spec.affinity.nodeAffinity.requiredDuringSchedulingIgnoredDuringExecution.nodeSelectorTerms[]

K8s API types involved in these field names:

  1. affinity aka Affinity
  2. nodeAffinity aka NodeAffinity
  3. requiredDuringSchedulingIgnoredDuringExecution aka NodeSelector
  4. nodeSelectorTerms[] aka NodeSelectorTerm[]

So node_affinity_required, for example, is nested deep within the affinity field of the PodSpec; it is a mess to add it, and updating the field after adding it would be even messier! This logic can either live in kubespawner or be reimplemented by every user of kubespawner, and I'd love to see it in kubespawner rather than in each jupyterhub_config.py that builds up the final configuration. Otherwise we would end up mitigating the issue by adding such a helper function to the z2jh.py file for the hub image instead.
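
To make the point concrete, this is roughly the bookkeeping needed today to add a single required node selector term through extra_pod_config without clobbering whatever is already configured there (a sketch; the helper name and label key are made up for illustration):

# jupyterhub_config.py -- sketch of the dict navigation needed to add one
# required node selector term via extra_pod_config without overwriting
# anything already set. The helper name is made up for illustration.
def add_required_node_selector_term(extra_pod_config, term):
    affinity = extra_pod_config.setdefault("affinity", {})
    node_affinity = affinity.setdefault("nodeAffinity", {})
    required = node_affinity.setdefault(
        "requiredDuringSchedulingIgnoredDuringExecution", {})
    required.setdefault("nodeSelectorTerms", []).append(term)

extra_pod_config = {}  # or whatever has been configured earlier
add_required_node_selector_term(
    extra_pod_config,
    {"matchExpressions": [
        {"key": "hub.jupyter.org/node-purpose", "operator": "In", "values": ["user"]},
    ]},
)
c.KubeSpawner.extra_pod_config = extra_pod_config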


I would really like to avoid asking users to use kubespawner through extra_pod_config systematically, but perhaps we could make it easier in some other way, such as using the API definitions of PodSpec to automatically declare which fields are available. That would be a big project though.


  1. is it possible to set the option already, via a passthrough? If so, don't add it.

Fine with me: if the option is a bool or string sitting directly on the PodSpec, don't add it.

Those kinds of fields are the simplest to add, though, and could help kubespawner users avoid having to know the full Kubernetes API.

  2. if it's not possible via a passthrough, is there a passthrough that should be added, instead of an individual option? If so, add the passthrough instead of the individual option.

There are two passthroughs currently, extra_pod_spec and extra_container_spec; I don't think there will be a need for more.

  3. does special logic need to be added in KubeSpawner to handle it, e.g. templating, re-use in multiple contexts, etc.? If so, can this be done at a higher level?

I think kubespawner could be remade to be more dynamic, flexible, and general, but the change I have in mind would require infrastructure additions that I don't see anyone being willing to make. I'm thinking about dynamically reading a schema the way kubeval does, using kubeval's underlying k8s schema downloader and to-JSON converter.

We could also write logic that navigates the provided pod configuration using kubernetes.client.model without assuming any knowledge of the configuration. That would at least allow for validation and quick failure when the validation fails... Hmmm...
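
A rough sketch of that validation idea, assuming the Python Kubernetes client's ApiClient.deserialize(response, type) can be fed a fake response object whose .data attribute is a JSON string (that is an implementation detail of the generated client, not a documented contract):

import json
from kubernetes import client

class _FakeResponse:
    # ApiClient.deserialize reads response.data as a JSON string -- relying on
    # that here is an assumption about the client's internals.
    def __init__(self, data):
        self.data = json.dumps(data)

def validate_pod_spec(spec_dict):
    # Fails fast if spec_dict cannot be deserialized as a V1PodSpec. The
    # generated client tends to ignore unknown fields, so this is only a rough
    # sanity check rather than full schema validation.
    return client.ApiClient().deserialize(_FakeResponse(spec_dict), "V1PodSpec")

validate_pod_spec({"containers": [{"name": "notebook", "image": "jupyter/base-notebook"}]})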

consideRatio changed the title from [Awaiting part 6] [Part 7 of #205] Affinities (node, pod, anti_pod - required and preferred) to [Part 7 of #205] Affinities (node, pod, anti_pod - required and preferred) on Aug 18, 2018
consideRatio (Member, Author)
Rebased on #238

minrk (Member) commented Aug 18, 2018

I think as much as possible our goal in KubeSpawner should be:

we expose kubernetes options unmodified, so the Kubernetes docs are the KubeSpawner docs

I think it is more complicated for users to have to look at the Kubernetes docs to see how the options behave and what values they can have, and then also check the KubeSpawner docs for how to specify them. If we had a single "the Pod spec goes here, the docs are in Kubernetes", there would be less information that users need, not more.

In particular, I think we should try very hard to have config keys where the name is exactly identical to the Kubernetes name for the same thing (snake_case vs camelCase notwithstanding).

For instance, if I know how to specify pod affinity with Kubernetes, I don't know how to specify pod affinity with KubeSpawner. I think naming things is pretty important, and when you are exposing an already-named public API that you are already linking to for docs, as we are here, there is a nontrivial cost to giving it a different name.

I may know that a pod affinity looks like:

podAffinity:
  preferredDuringSchedulingIgnoredDuringExecution: ...

but with this PR, after I've read the kubernetes docs for what an affinity looks like, I need to also look in the KubeSpawner docs for what we have decided to call the same option (pod_affinity_preferred). I think it's confusing to have two namespaces with similar but different names for the same options, and that's something I would like to work to reduce, not increase.

Fine with me: if the option is a bool or string sitting directly on the PodSpec, don't add it.

I specifically mean deep config values like podAffinity here.

If a user had in values.yaml:

extraPodSpec:
  affinity:
    podAffinity:
      requiredDuringSchedulingIgnoredDuringExecution: ...

in the helm chart and that dict was passed through exactly as-is via jupyterhub_config, I think it would do exactly what we wanted. Is that not true?
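
On the z2jh side, the wiring could be as small as this (a sketch; get_config is the helper in z2jh's hub image, and the singleuser.extraPodSpec key name is hypothetical here):

# jupyterhub_config.py in the hub image -- hypothetical sketch of the
# passthrough described above: whatever lands under extraPodSpec in values.yaml
# is copied into extra_pod_config unchanged. The config key name is made up.
from z2jh import get_config

extra_pod_spec = get_config("singleuser.extraPodSpec", {})
if extra_pod_spec:
    # Hand the dict to KubeSpawner as-is, so its contents follow the
    # Kubernetes docs rather than KubeSpawner-specific names.
    c.KubeSpawner.extra_pod_config = extra_pod_spec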

it is a mess to add it and if you would update the field after adding it it would be even messier!

I don't see how this is the case. What's "a mess" about the above config?

I think kubespawner could be remade to be more dynamic/flexible and general, but the change I consider would require infrastructure additions that I don't see anyone being willing to make.

I don't see what infrastructure is required to do this. Doesn't it already work right now with extra_pod_config? What's missing? The only thing I can think of is deep dictionary merging, so that config from multiple sources can be merged conveniently as in Helm YAML, but that's quite common in Python, with many implementations we can use.
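
For reference, the kind of deep merge meant here is small enough to sketch (many ready-made implementations exist; this is only to show the scale of what's missing):

# Minimal recursive deep merge, of the kind that would let chart-provided and
# user-provided extra_pod_config values coexist.
def deep_merge(base, overrides):
    merged = dict(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged

# e.g. c.KubeSpawner.extra_pod_config = deep_merge(chart_defaults, user_overrides)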

consideRatio (Member, Author)
Yes, I share your view about not duplicating documentation. In this PR I'm referencing which field node_affinity_required maps to, for example, but indeed the name is new. I think we both agree, though, that it would be better not to name it after exactly what it maps to in this case (affinity__node_affinity__required_during_scheduling_ignored_during_execution__node_selector_terms).


In particular, I think we should try very hard to have config keys where the name is exactly identical to the Kubernetes name for the same thing (snake_case vs camelCase notwithstanding).

I totally agree. I'm very frustrated about image_pull_secrets, for example, which in KubeSpawner only accepts a string, while the actual k8s API takes an array of dicts like {'name': 'secretnamehere'}.


When we in z2jh use a partial jupyterhub_config.py to specify things in extra_pod_config, and a user of the chart then uses extraConfig in config.yaml to also modify c.KubeSpawner.extra_pod_config in some way, I figure things get troublesome.

Well, as you say, deep dictionary merging solves a lot of issues, but only if you have access to Python or similar. I won't have access to that in the Helm charts. I have already run into these challenges while writing the z2jh Helm chart, working on user-placeholder pods with lower pod priority to be able to overprovision nodes - a practice recommended by the Kubernetes documentation.

I want my user-placeholder pods to schedule exactly like the actual users; if they don't, they become pointless. To do this we could use kubespawner as well, I guess, but then we would need to add another Hub or a script to manage the spawning etc. instead of controlling them directly through a Kubernetes Deployment.

So, how do I make the user-placeholder pods schedule like the real users? If the real users' settings are nested within extraConfig, which is one big string, it's game over. It only works if I have access to the scheduling configuration more directly.

Using a template to add or not add the affinity
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/758/files#diff-a16251c3768fb14e158dc5118c5c0408R43

      {{- $dummy := include "jupyterhub.prepareScope" $_ }}
      {{- if $_.hasTolerations }}
      {{- include "jupyterhub.tolerations" $_ | nindent 6 }}
      {{- end }}
      nodeSelector: {{ toJson .Values.singleuser.nodeSelector }}
      {{- if $_.hasAffinity }}
      {{- include "jupyterhub.affinity" $_ | nindent 6 }}
      {{- end }}

The helper function called prepareScope
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/758/files#diff-2dfbcb890d0b7f61f2c7c11643f98b58R9

The affinity helper
Doing this while users were specifying preferred pod affinity or similar in extraPodConfig would have been impossible, I think.
https://github.com/jupyterhub/zero-to-jupyterhub-k8s/pull/758/files#diff-2dfbcb890d0b7f61f2c7c11643f98b58R155

consideRatio (Member, Author)
Of all the things I've seen and configured in the k8s API, affinity has been the most troublesome: a lot of nested fields, and some of them are not named like their corresponding object models; for example, requiredDuringSchedulingIgnoredDuringExecution on the nodeAffinity is of type NodeSelector.

minrk (Member) commented Aug 18, 2018

This is an issue that's not specific to this PR, so let's merge this one. But in the future, I think we should try to pursue documenting/using extraPodConfig, and only add new exposing traits if/when that doesn't work, and first try to figure out a way to make it work before adding more traits.

consideRatio (Member, Author)
@minrk I've concluded I was wrong. It was not vital to have kubespawner break down the affinity fields. This PR didn't pan out well :'(

Should I take action and make it one big affinity field instead? Hmm... Will keep processing these things.

consideRatio (Member, Author)
Options

  1. Set affinity using the extra_config traitlet
    It would be good to do everything through extraPodConfig, but starting with only a few things doesn't make much sense to me.
  2. Set affinity using an affinity traitlet (see the sketch after this list)
    Sure, once this is done we are in Python-land and can do advanced merge operations etc., even though it may involve some messy conditional logic if you are extending preexisting affinities.
  3. Set affinity using the parts that build up the spec (this PR)
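
A hypothetical sketch of option 2, assuming a single affinity traitlet that takes the whole Kubernetes Affinity dict (no such traitlet exists at the time of writing, and the label keys are just examples):

# Hypothetical option 2: one `affinity` traitlet holding the full Kubernetes
# Affinity dict. The traitlet name is hypothetical; the dict contents follow
# the Kubernetes API directly.
affinity = {
    "nodeAffinity": {
        "requiredDuringSchedulingIgnoredDuringExecution": {
            "nodeSelectorTerms": [
                {"matchExpressions": [
                    {"key": "hub.jupyter.org/node-purpose", "operator": "In", "values": ["user"]},
                ]},
            ],
        },
    },
}
# Extending a preexisting affinity means reaching into the nested structure,
# which is where the conditional logic gets messy:
affinity["nodeAffinity"]["requiredDuringSchedulingIgnoredDuringExecution"][
    "nodeSelectorTerms"][0]["matchExpressions"].append(
        {"key": "disktype", "operator": "In", "values": ["ssd"]})
c.KubeSpawner.affinity = affinity  # hypothetical traitlet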
