
Define a process for helping communities with image changes #1145

Open
yuvipanda opened this issue Mar 24, 2022 · 9 comments


@yuvipanda
Member

Problem

The software and configuration setup in the user image is a very important part of the end user experience. It controls:

  1. The languages available, and their versions
  2. Packages available in those languages
  3. Setup for more complicated packages that might require setting environment variables, resolving conflicts, etc.
  4. Additional applications (such as RStudio, Linux Desktop, etc.) that are enabled, and how they are configured

The Jupyter community has developed many tools, particularly repo2docker and repo2docker-action, that make managing this more tractable. Simple package installs and language version changes are often doable by editing the repo2docker configuration files when they are present, and complex setups are possible via the Dockerfile escape hatch.
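
As a rough illustration of the two paths described above, the sketch below inspects a repository and reports whether repo2docker would take the Dockerfile escape hatch or build from its simpler configuration files. It relies on repo2docker's documented behavior that a `Dockerfile`, when present, causes the other configuration files to be ignored, and that configuration may live in a `binder/` or `.binder/` subdirectory instead of the root. The file list is only a subset of what repo2docker recognizes, and the function names are hypothetical.

```python
from pathlib import Path

# A subset of the configuration files repo2docker recognizes.
SIMPLE_CONFIG_FILES = [
    "environment.yml",
    "requirements.txt",
    "install.R",
    "apt.txt",
    "postBuild",
    "start",
    "runtime.txt",
]


def config_dir(repo: Path) -> Path:
    """Return the directory repo2docker would read configuration from.

    A `binder/` or `.binder/` subdirectory takes precedence over the
    repository root. This sketch does not handle both existing at once.
    """
    for name in ("binder", ".binder"):
        candidate = repo / name
        if candidate.is_dir():
            return candidate
    return repo


def describe_build(repo: Path) -> str:
    """Summarize which build path (simple files vs. Dockerfile) applies."""
    root = config_dir(repo)
    if (root / "Dockerfile").is_file():
        # Escape hatch: the Dockerfile wins and other config files are ignored.
        return "Dockerfile"
    found = [name for name in SIMPLE_CONFIG_FILES if (root / name).is_file()]
    if found:
        return "repo2docker files: " + ", ".join(found)
    return "no configuration files found"
```

A repository containing only `requirements.txt` and `apt.txt` would be reported as using repo2docker files; adding a `Dockerfile` next to them switches the result to the escape hatch.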

However, some issues, particularly around (3), often require deeper expertise. The same goes for 'why is this not behaving the way I think it should?' situations.

As 2i2c, we want to support our users in customizing their user image in two ways:

  1. Making it as easy as possible to 'self-serve' most changes, leveraging the ecosystem built and maintained by the Jupyter community
  2. Availing themselves of the deeper expertise of 2i2c engineers when dealing with complex problems

Currently, there are no specific limits on (2), which seems unsustainable.

Proposed solution

This issue proposes the following specific actions:

  1. In all contracts, we will set up users with the ability to use https://github.com/2i2c-org/hub-user-image-template to set up their own user image, which they can update and use without 2i2c engineer intervention. There is work to be done to improve this self-serve capability.
  2. In all contracts, we will provide a set number of hours of 2i2c engineer time that can be dedicated to (2) listed above. Because it is difficult to predict how long a particular request will take, time-based work seems a better fit than unit-based work. Requests will need to come through the community representative of that particular hub. Hours can be topped up if necessary, but at additional cost.
@yuvipanda
Member Author

I think part of this will involve us improving the repo2docker docs as well. For example, #1147 is helped by writing jupyterhub/repo2docker#1147.

@GeorgianaElena
Member

I think this is a really good idea, and defining this process would really help both us and the communities. The specific actions proposed are very good, @yuvipanda! I'll leave some of my thoughts/questions about them below:

First action point 🔽

  1. In all contracts, we will set up users with the ability to use https://github.com/2i2c-org/hub-user-image-template to set up their own user image, which they can update and use without 2i2c engineer intervention. There is work to be done to improve this self-serve capability.

Question:

So basically, what this means, and how it differs from our current process, is that all of the hubs will use their own image instead of defaulting to https://github.com/2i2c-org/2i2c-hubs-image?

A few thoughts:

  1. If the answer to the question above is yes, then I would propose we do our best to keep https://github.com/2i2c-org/2i2c-hubs-image up to date so it can serve as a template/model for communities that don't want to start "from scratch". This way we provide a more solid base for them, while also empowering them to "own" that image and modify it as they wish.

    My guess is that most of them will use that image or build on top of it. And while I understand that this will require us to maintain that image, I also believe it would reduce the number of "how do I install x" questions for the most common packages.

  2. Another idea would be to keep a record of all the communities' images, which others could reuse or adapt.
    This way, any new community will be presented with the requirement to use https://github.com/2i2c-org/hub-user-image-template for their image (documented as well as possible) and a list of images that other 2i2c hubs use (including the 2i2c-maintained one).

Second action point 🔽

  1. In all contracts, we will provide a set number of hours of 2i2c engineer time that can be dedicated to (2) listed above. Because it is difficult to predict how long a particular request will take, time-based work seems a better fit than unit-based work. Requests will need to come through the community representative of that particular hub. Hours can be topped up if necessary, but at additional cost.

A few thoughts:

  1. I like the idea of having a set number of hours dedicated to setting up or helping maintain community user images. Since more hours translate into a greater cost per hub, I think we should keep this number as small as possible while also being realistic and sensible about it (I have no idea what this means in hours, however 😕). I really hope/believe we can provide an affordable base hub setup.

  2. +1 on having the request come from community representatives

  3. +1 on providing time-based work, rather than unit-based work.

Question:

What would happen if we were to apply this last step to the tf+keras utoronto request? When should we "stop trying" and conclude "this request is not yet possible" or "this image is too broken to fix it, we need a different approach/start fresh"?

@yuvipanda
Member Author

Thanks for the awesome feedback, @GeorgianaElena! The question with https://github.com/2i2c-org/2i2c-hubs-image is: will we upgrade it? If we bump, say, the Python version there, we will need to co-ordinate with everyone using it, and that is a lot of work.

Instead, perhaps we can leverage the work being done in https://jupyter-docker-stacks.readthedocs.io/en/latest/, start offering that as the default, and make the configurator more robust so people can pick from one of the jupyter-docker-stacks defaults? That way, they get to control which version they are using, and we can help with upstream maintenance too.

@GeorgianaElena
Member

If we bump up say, the Python version there, we will need to co-ordinate with everyone using it - and that is a lot of work.

What I had envisioned was that we maintain that image only for use on, say, our staging hubs, not as a default for communities. But the important thing is that it needs to work as it is and that there are no incompatibilities.

But the communities would have their own version of the repo, that pushes the image to a quay account that they maintain and control. They shouldn't use our image directly, but rather start from that and alter it as they wish and use it only as a model.
Not sure this makes sense though.

I love the docker-stacks idea too, especially because I believe it can help more people than just the communities involved with 2i2c 🚀

But bottom line is that I believe there are benefits in maintaining a list of working images that communities could use as a base or just as an example if their use-case is similar.

@damianavila
Contributor

I would be totally 💯 on using some of the Jupyter Docker Stack images as our base images!
I envision a sort of hierarchical model where we inherit from the Jupyter Docker Stack images... and we add our specific stuff... and then each community adds their stuff as they need.

For instance, suppose some of our communities want an image containing scipy stuff...
The chain would be: jupyter/scipy-notebook >> 2i2c-jupyter/scipy-notebook >> communityA-2i2c-jupyter/scipy-notebook
So we would have a set of 2i2c repo templates inheriting from each of the Jupyter Docker Stack images and adding our specific 2i2c stuff... then each community can self-serve any customization they want.
If we need to push a fix for all of our communities, we can "easily" do that by modifying the repo template so they can consume those fixes.
If some fix happens upstream (or we fix something there, hopefully), we can consume those fixes in our 2i2c templates and then make those fixes available to be consumed by each of our communities...
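
The inheritance chain above can be modeled as a tree of images in which each image records the parent it is built `FROM`. The sketch below uses hypothetical image names following the `jupyter/scipy-notebook >> 2i2c-jupyter/scipy-notebook >> communityA-...` pattern and computes which downstream images would need rebuilding after a fix lands in one layer. It illustrates the propagation idea only; it is not existing 2i2c tooling.

```python
from collections import deque

# Hypothetical image hierarchy: child image -> parent image it is built FROM.
PARENTS = {
    "2i2c-jupyter/scipy-notebook": "jupyter/scipy-notebook",
    "communityA-2i2c-jupyter/scipy-notebook": "2i2c-jupyter/scipy-notebook",
    "communityB-2i2c-jupyter/scipy-notebook": "2i2c-jupyter/scipy-notebook",
}


def rebuild_order(changed: str, parents: dict[str, str]) -> list[str]:
    """Return the images that must be rebuilt after `changed` is updated.

    Images are listed breadth-first, so every parent appears before its
    children and can be rebuilt first.
    """
    # Invert the child -> parent map into parent -> children.
    children: dict[str, list[str]] = {}
    for child, parent in parents.items():
        children.setdefault(parent, []).append(child)

    order: list[str] = []
    queue = deque(children.get(changed, []))
    while queue:
        image = queue.popleft()
        order.append(image)
        queue.extend(children.get(image, []))
    return order
```

This also makes the trade-off visible: a change to the intermediate 2i2c layer fans out to every community image built on top of it.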

@yuvipanda
Member Author

So in my head, there are two different tracks here:

Track 1: jupyter-docker-stacks based images

  1. Use an unmodified jupyter-docker-stacks or pangeo-docker-stacks image. Community rep gets to choose the tag, and upgrade at their leisure. There's no github repo specific to this on a per-community basis. We contribute to upstream maintenance however we can.
  2. A community wants a customization. We create a repo from a template, add a Dockerfile inheriting from whatever image they were using earlier, and add their customization. I have ideas for contributions to jupyter-docker-stacks that would make this easier.
  3. When more customization is needed, the Dockerfile is further customized or plain repo2docker files are used.

repo2docker isn't really used here: going from (1) to (2) involves only minor modifications. repo2docker-action is still used to build and push the image, and repo2docker is used when (3) requires further customization.
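
For step (2) of Track 1, the per-community Dockerfile can be very small: it inherits from the image and tag the community was already using and adds the customization. The sketch below renders such a Dockerfile as text. The function name, the placeholder tag `example-tag`, the refusal of `latest`, and the use of `pip install` for the customization are all assumptions for illustration; a real image might use `mamba` or repo2docker files instead.

```python
def render_dockerfile(base_image: str, tag: str, pip_packages: list[str]) -> str:
    """Render a minimal Dockerfile that extends a pinned upstream image.

    Pinning `tag` keeps the community on the exact upstream image it chose;
    upgrading means changing the tag and rebuilding.
    """
    if not tag or tag == "latest":
        # Assumed policy: an explicit tag lets the community rep upgrade deliberately.
        raise ValueError("pin an explicit tag instead of 'latest'")
    lines = [f"FROM {base_image}:{tag}"]
    if pip_packages:
        # Sort for a stable, reviewable diff when packages are added.
        packages = " ".join(sorted(pip_packages))
        lines.append(f"RUN pip install --no-cache-dir {packages}")
    return "\n".join(lines) + "\n"
```

For example, `render_dockerfile("jupyter/scipy-notebook", "example-tag", ["xarray"])` yields a two-line Dockerfile: a `FROM` line for the pinned image and a single `RUN pip install` line.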

Track 2: repo2docker based images

  1. Use an unmodified jupyter-docker-stacks or pangeo-docker-stacks image. Community rep gets to choose the tag, and upgrade at their leisure. There's no github repo specific to this on a per-community basis. We contribute to upstream maintenance however we can.
  2. A community wants customization. This triggers a fork of https://github.com/2i2c-org/hub-user-image-template/, and a new image is basically constructed using repo2docker files from scratch. This is what we currently do for pretty much any community that wants customization. This will primarily use repo2docker based files, while a Dockerfile can be used for advanced customizations.

There are a few questions here:

  1. What happens when a community is on a pre-built image and wants a little customization (a package addition)? Most likely this can't be done upstream, and we don't want to do it in a way that affects other communities: down that way lie fire-breathing, unfriendly dragons, where you unintentionally break someone else because you added or upgraded a package for some user. What is the process when someone is on an unmodified image and wants a customization?
  2. Is an 'intermediate 2i2c layer' really necessary? I'd love to avoid it completely! Any changes we make to it will affect a lot of users and will become a big maintenance burden for us. So I'd really like to retire https://github.com/2i2c-org/2i2c-hubs-image eventually, somehow, and just rely on one of our upstreams.
  3. What kind of 'fixes' would we want to push to all our communities? I think they would primarily be around JupyterHub, JupyterLab, etc. versions. Who is responsible for these? It makes sense that 2i2c is, since it ties into the version of JupyterHub, etc. that we use. We'd need to figure out automation to help with this. It also ties into (2). A more general version of this question is: who is responsible for which part of the image?
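
One possible shape for the automation mentioned in (3) is a check that flags community images whose `jupyterhub` version has drifted from the version the hub runs. The sketch below assumes a hypothetical policy that the image's `jupyterhub` major version should match the hub's, and uses made-up community names and version numbers. It parses only plain `X.Y.Z` strings, so a real tool would likely use a proper version parser.

```python
def major(version: str) -> int:
    """Return the major component of a plain 'X.Y.Z' version string."""
    return int(version.split(".")[0])


def find_mismatches(hub_version: str, image_pins: dict[str, str]) -> list[str]:
    """List communities whose image has a different jupyterhub major version.

    `image_pins` maps a (hypothetical) community name to the jupyterhub
    version installed in its user image.
    """
    hub_major = major(hub_version)
    return sorted(
        community
        for community, pinned in image_pins.items()
        if major(pinned) != hub_major
    )
```

A scheduled job could run a check like this and open an issue or pull request for each flagged image; deciding who acts on those is the responsibility question raised above.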

@yuvipanda
Member Author

@GeorgianaElena

But the important thing is that it needs to work as it is and that there are no incompatibilities.

100%, I totally agree! But that should work with the jupyter-docker-stacks images too, right? If not, we can work on fixing it upstream...

@yuvipanda
Member Author

In general, I'd love for us to try and see if we can get away from a '2i2c maintained image', and focus that energy towards helping maintain upstream image stacks.

@damianavila
Contributor

Cross-reference: 2i2c-org/hub-user-image-template#11

Labels
None yet
Projects
No open projects
Status: Needs Shaping / Refinement
Development

No branches or pull requests

3 participants