Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Shouldn't KubeSpawner's cmd/args map to the K8s container's command/args? #493

Open
consideRatio opened this issue Mar 9, 2021 · 13 comments
Labels

Comments

@consideRatio
Copy link
Member

consideRatio commented Mar 9, 2021

I saw that c.KubeSpawner.default_url was reported to not have an impact and decided to investigate.

I end up concluding that configuring the command that the container starts with is very complicated. The Spawner, K8s, and the Dockerfile all have various names for similar things that all combine into a k8s containers args field where the command field apparently isn't configurable to my surprise.

  • cmd, args, notebook_dir, default_url, and get_args(...) are defined in in the Spawner base class
  • get_args(...) combines notebook_dir, default_url, and args into a single appendix to cmd. There are changes to this in JupyterHub version 2.0.0 though, see the get_args definition.
  • KubeSpawner sets the containers k8s field args to the variable real_cmd decided below, which changed in 5ee6dc6. Note that in k8s manifests command and args are the equivalent of Dockerfiles entrypoint and cmd.
        async def get_pod_manifest(self):
            # ...
            if self.cmd:
                real_cmd = self.cmd + self.get_args()
            else:
                # change commit comment:
                # fix use of get_args() (calling get_args() alone would omit the command)
    -           real_cmd = self.get_args()
    +           real_cmd = None
  • Z2JH's default config of KubeSpawner's cmd is [] for z2jh >= 2.0.0 but jupyterhub-singleuser for z2jh < 2.0.0. This means that by default, z2jh 2.0.0+ will expose this bug by default and silently ignore args, notebook_dir, or default_url unless cmd is set explicitly as well. (Related PR: Unset singleuser.cmd, previously jupyterhub-singleuser, to instead rely on the image's CMD by default zero-to-jupyterhub-k8s#2449)

Investigation conclusion

  1. I'm confused we opt to set args of the k8s container based on the traitlets cmd, args, notebook_dir, default_url instead of setting command to cmd, and setting args to default_url, notebook_dir and args.
  2. I think the logic to silently ignore the traitlets args, notebook_dir, default_url when cmd is unset is problematic and should lead to a warning at least.
@consideRatio consideRatio changed the title Is there a bug?cmd, args, notebook_dir, default_url Shouldn't KubeSpawner's cmd/args map to the K8s container's command/args? Mar 9, 2021
@behrica
Copy link

behrica commented Apr 15, 2021

I am trying to launch a rstudio image with this, and I am very much struggling that I can not "completely control" the command and args passed to starting the container

@manics
Copy link
Member

manics commented May 1, 2021

Do you think we can resolve this issue before Z2JH 1.0.0 is released? Either by adding docs and warnings, or perhaps by adding a new entrypoint parameter, and making it clear that entrypoint and cmd have their Docker semantics.

Although this is a K8s project I think it's fine to use the Docker terminology, especially as it's also the Open Container Initiative terminology. What's more important is that we make it clear and use consistent names throughout.

@consideRatio
Copy link
Member Author

I agree that we must be consistent, and in this case we must be consistent against the Spawner base class that is k8s unaware, so then adding entrypoint would make sense alongside cmd.

We would then document that cmd and entrypoint refers to the Dockerfile names, and that we will set the k8s Pod's container's args and command section when we set cmd and entrypoint in KubeSpawner.

To conclude: 👍 for adding entrypoint as a KubeSpawner traitlet to configure the containers command.

@mapsacosta
Copy link

@consideRatio is there going to be a separate fix for c.KubeSpawner.default_url ?

@manics
Copy link
Member

manics commented May 2, 2021

@mapsacosta if you set cmd the other args such as default_url should be passed. If it's not working can you show us your full configuration?

@consideRatio
Copy link
Member Author

consideRatio commented May 2, 2021

Update

In the initial post I wrote investigative conclusions, and I want to give an update about those two points.

  1. I'm confused we opt to set args of the k8s container based on the traitlets cmd, args, notebook_dir, default_url instead of setting command to cmd, and setting args to default_url, notebook_dir and args.
  2. I think the logic to silently ignore the traitlets args, notebook_dir, default_url when cmd is unset is problematic and should lead to a warning at least.
  • Point 1 is addressed by:
    1. defining the new configuration entrypoint alongside cmd, where we also document that these relate to the Dockerfile's ENTRYPOINT and CMD rather than k8s command and args equivalent Container specifications.
    2. clarifying that the cmd configuration if set also augments itself with spawner.get_args() which considers the configurations ip, port, default_url, notebook_dir, debug, disable_user_config, args.
  • No clear decision on how to address point 2 isn't taken and I'm struggling to suggest a sensible solution.
    • Not setting cmd currently influence a lot to a weird undocumented state by omitting all the associated configuration silently.
    • We could pass spawner.get_args() no matter what, but then we probably introduce a breaking change for someone or start flooding their startup script with weird command line flags.
    • Ideas:
      • to toggle passing get_args() by another logic than if cmd is set or not
      • to always pass get_args() no matter if cmd is set or not

About spawner.get_args()

Note that the args considered by get_args() are as defined in JupyterHub's spawner.py:

Configuration Resulting argument
spawner.ip --ip=...
spawner.port --port=...
spawner.default_url --SingleUserNotebookApp.default_url=...
spawner.notebook_dir --notebook-dir=...
spawner.debug --debug
spawner.disable_user_config --disable-user-config
spawner.args <user defined args>

Action points

  • Add entrypoint configuration (@consideRatio works on this)
  • Update docs etc to bring more clarity to cmd, entrypoint, and args (@consideRatio works on this)
  • Clarify how to resolve point 2 (@manics @minrk could you help deliberate about this?)
  • Resolve point 2

Related

@manics
Copy link
Member

manics commented May 2, 2021

A non-breaking way to achieve (2) could be to treat the empty string as a special value, so cmd = null preserves the existing behaviour, and cmd = "" means add all args?

On the other hand if we go for a breaking change I think either we should keep the ability to not pass additional arguments, or clearly document the command arguments interface that should be supported.

Originally jupyter-notebook was the only client, but now we have jupyter-lab, jupyter-server, and people have written their own alternatives such as https://discourse.jupyter.org/t/new-package-to-run-arbitrary-web-service-in-jupyterhub-jhsingle-native-proxy/3493 (which is also one of the motivations for jupyterhub/zero-to-jupyterhub-k8s#2138)

@consideRatio
Copy link
Member Author

consideRatio commented May 2, 2021

A non-breaking way to achieve (2) could be to treat the empty string as a special value, so cmd = null preserves the existing behaviour, and cmd = "" means add all args?

Hmmm... I think we should opt for a potentially breaking change to resolve the future needs properly.

  1. Ability to configure ENTRYPOINT and CMD very clearly without interference from get_args()
  2. Ability to opt-in/opt-out of augmenting get_args()

I realize now that we can't augment get_args() to the Dockerfile's defined CMD because it would simply be overridden entirely, and that is probably the logic for now always passing get_args().

So... setting spawner.cmd will override the Dockerfile's CMD, so get_args() won't augment with it and would often not make sense unless something like jupyterhub-singleuser was defined in the ENTRYPOINT.

Also... you may want to configure spawner.cmd without ending up having get_args() augmented.

I think what we must support is opt-in/out-out of get_args(), and this can be in three modes:

  • always append if cmd is set (current KubeSpawner behavior)
  • always append to cmd independent if its set or not (Dockerfile CMD will never be considered)
  • never append to cmd independent if its set or not

I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that? The current behavior in KubeSpawner is KubeSpawner specific, and I think the Spawner base class current behavior is to always append get_args() to cmd.

@minrk
Copy link
Member

minrk commented May 3, 2021

It is extremely confusing that Kubernetes renames Docker's Entrypoint and Command as Command and Args, respectively!

While Kubernetes' names more accurately describe what happens (a command is passed arguments), I think the docker names more accurately describe what they are for (an entrypoint prepares an environment in which to launch a command).

In that sense, cmd and args are used to build a single command list, which we pass to kubernetes container arguments, just like we would with docker run $image command ... and I think we should never interact with the entrypoint at all, other than requiring that it allow arbitrary commands.

This is part of why I think things like docker stacks should implement their environment customization in entrypoint instead of CMD. Overriding CMD shouldn't be hugely consequential, but it is for docker-stacks.

I think what we must support is opt-in/out-out of get_args(), and this can be in three modes:

  • always append if cmd is set (current KubeSpawner behavior)
  • always append to cmd independent if its set or not (Dockerfile CMD will never be considered)
  • never append to cmd independent if its set or not

I'd add that option 2 would also require restoring the default cmd = 'jupyterhub-singleuser' because it doesn't make sense to append args without starting with the command itself. This would also make KubeSpawner consistent with all the non-container-based Spawners.

I wonder if perhaps the Spawner base class should add some configuration about this though rather than KubeSpawner, and then we make use and adjust around that?

I think this is really a KubeSpawner-specific issue, and should probably be a kubespawner option. It is really only the "get the default command from the image with kubespawner" case that is affected, because kubespawner cannot actually retrieve this info from the image. If the command is specified, everything works.

The Spawner API is that cmd ultimately launches jupyterhub-singleuser, and args will be passed to it. It is part of the API that this command accepts the CLI args of jupyterhub-singleuser, even if it may be a wrapper or adapter. Launching a command other than that is not currently supported by JupyterHub. What makes KubeSpawner (and DockerSpawner) unique and require special handling is that they have another source for the default command: the CMD field of the image itself. Dockerspawner deals with this by inspecting the image to extract CMD before appending arguments, if Spawner.cmd is unset. I don't know how feasible that would be in KubeSpawner (we'd have to deal with all kinds of registry and pull secret stuff to get it, I think). So I'm not sure what we can do to support that here. Maybe this means using the image's CMD by default is not something that's feasible in KubeSpawner.

This is related to why I opened jupyterhub/jupyterhub#3381 - it would be nice if we could stop specifying CLI args (by default) in Spawners, and do everything through the env, but it's trickier to remove than I realized. I would like to get to a point where JupyterHub internally does not turn any options into CLI args, other than explicit CLI args from user config, and only communicates options through the environment. Then this problem would go away, I think...

@meeseeksmachine
Copy link

This issue has been mentioned on Jupyter Community Forum. There might be relevant details there:

https://discourse.jupyter.org/t/kubespawner-and-ldapauthentication-run-under-users-ldap-uid/8988/7

@minrk
Copy link
Member

minrk commented Jun 7, 2022

I think this issue is mostly resolved by the fact that jupyterhub 2.0 does not specify any CLI arguments, nor place any requirements on the CLI signature of the launch command, and never will again. That means that Spawner.args no longer needs to conform to jupyterhub-singleuser, it only needs to conform to whatever the user chooses for cmd. A key point is that there's no difference between specifying singleuser.cmd and singleuser.args vs just concatenating the two in singleuser.cmd. In fact, given that the only case that doesn't work is specifying singleuser.args when using the command from the image (i.e. singleuser.cmd unspecified), arguably with KubeSpawner one should always specify singleuser.cmd as the full list with any args they want to pass, and can safely ignore singleuser.args.

This simplifies the configuration explanation for z2jh in 2.0:

If you want to specify the command to launch, there are two possibilities:

  1. (default) specify nothing to use the image's default command, which is assumed to eventually launch jupyterhub-singleuser, or
  2. customize the launch command and any arguments, via Spawner.cmd, for example:
    c.Spawner.cmd = ["jupyterhub-singleuser", "--some-arg=5", "..."]

Customizing only the arguments without specifying the command to launch (i.e. setting Spawner.args without Spawner.cmd) is not supported.

@consideRatio
Copy link
Member Author

consideRatio commented Jun 7, 2022

Thank you for clarifying things @minrk! I'm not yet comfortable with the situation about cmd in Spawner, KubeSpawner, and z2jh and consider if we should make some change before z2jh 2.0.0 about that.

Renewed overview

Brainstorming

  • Do we want KubeSpawner to override cmd?

    I'm thinking using ["jupyterhub-singleuser"] as a default cmd could be fine, and removing the override in kubespawner would reduce some complexity perhaps?

    If we do this though, we need z2jh to be able to set the command to None and [] respectively, which both are different. I think the way to do that is via a new helper set_config_if_defined() in z2jh's provided utility functions for use by z2jh's jupyterhub_config.py.

  • Why do we set c.Spawner.cmd and c.Spawner.default_url instead of c.KubeSpawner.cmd and c.KubeSpawner.default_url in z2jh's jupyterhub_config.py? It seems weird that KubeSpawner overrides cmd, but we set c.Spawner.cmd.

  • Is there an action point to take pre z2jh 2.0.0 release?

    Overall, I hope to resolve this once and for all and don't mind doing work to make changes if we can come up with changes that makes sense to make.

Related

@minrk
Copy link
Member

minrk commented Jun 7, 2022

Do we want KubeSpawner to override cmd?

The default is less important to me; what's important to me is that the "use the image's command" case is clear and simple. If we can accomplish that, however we accomplish that, I'm okay with switching back to jupyterhub-singleuser to match other Spawners.

Whatever we do with the default, I think it's probably still better for the z2jh approach to specify either the full singleuser.cmd, or whatever special value (None, empty list, '$imageCommand'), but not expose (via documentation / schema) Spawner.args due to the fact that it only works when singleuser.cmd is also specified.

Why do we set c.Spawner.cmd and c.Spawner.default_url instead of c.KubeSpawner.cmd and c.KubeSpawner.default_url in z2jh's jupyterhub_config.py?

It doesn't make a difference in our case, but it's typical to set config either on the class that defines the trait, or on the subclass you know you are using. It makes no difference when there is only one class you are configuring, but if another class came up that was a sibling of KubeSpawner (inherits from Spawner but not KubeSpawner), there is a distinction:

  • Setting it on Spawner would affect both KubeSpawner and CustomSpawner
  • Setting on KubeSpawner would only affect KubeSpawner, not CustomSpawner

That's mostly hypothetical in the current situation where there's really only one class to configure (or possible custom subclasses of KubeSpaner), so if things are clearer to set them on KubeSpawner only, that's okay, too. I would typically apply config to the class that defines the trait, though.

Traitlets config loading looks through the class inheritance, so it'll find it on Spawner, KubeSpawner, or anything in between that has the trait (if there were something in between).

Is there an action point to take pre z2jh 2.0.0 release?

I guess we need to decide if we want to revert the "use image by default" change. If not, then document the two choices (singleuser.cmd or 'default').

If we go back to jupyterhub-singleuser, then we need to revert the change, and make sure to implement and document the 'use the image' option, since that not working is what started the whole process.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

6 participants