Define a process for helping communities with image changes #1145

yuvipanda · 2022-03-24T20:39:03Z

Problem

The software and configuration setup in the user image is a very important part of the end user experience. It controls:

1. The languages available, and their versions
2. Packages available in those languages
3. Setup for more complicated packages, that might require setting up environment variables, dealing with conflicts, etc
4. Additional applications (such as RStudio, Linux Desktop, etc) that are enabled, and how they are configured

The Jupyter community has developed a lot of tools, particularly repo2docker and repo2docker-action, to make managing this more tractable. Simple package installs and language version changes are often doable by editing repo2docker files when they are present, and complex setups are possible by using the Dockerfile escape hatch.

However, there are more complex issues, particularly around (3), that often require deeper expertise. There are also times when a user reports 'this is not behaving the way I think it should', and diagnosing those requires deeper expertise too.

As 2i2c, we want to support our users in customizing their user image in two ways:

1. Making it as easy as possible to 'self serve' most changes, leveraging the ecosystem built and maintained by the Jupyter community
2. Making the deeper expertise of the 2i2c engineers available when dealing with complex problems

Currently, there are no specific limits on (2), which seems unsustainable.

Proposed solution

This issue proposes the following specific actions:

1. In all contracts, we will set up users with the ability to use https://github.com/2i2c-org/hub-user-image-template to set up their own user image that they can update and use without 2i2c engineer intervention. There's work to be done in improving this self-serve capability.
2. In all contracts, we provide a set number of hours of 2i2c engineer time that can be dedicated to (2) listed above. Unfortunately, as it is difficult to predict how long a particular request will take, time-based work seems to fit better than unit-based work. Requests will need to come through the community representative of that particular hub. Hours can be topped up if necessary, but will cost money.


yuvipanda · 2022-03-25T20:18:41Z

I think a part of this will involve us improving docs on repo2docker as well. For example, #1147 is helped by writing jupyterhub/repo2docker#1147

GeorgianaElena · 2022-03-29T10:19:27Z

I think this is a really good idea, and defining this process would really help both us and the communities. The specific actions proposed are very good @yuvipanda! I'll leave some of my thoughts/questions about them below:

First action point 🔽

> In all contracts, we will set up users with the ability to use https://github.com/2i2c-org/hub-user-image-template to set up their own user image that they can update and use without 2i2c engineer intervention. There's work to be done in improving this self-serve capability.

Question:

So basically what this means and how it differs from our current process is that we will have all of the hubs use their own image instead of defaulting to https://github.com/2i2c-org/2i2c-hubs-image?

A few thoughts:

If the answer is yes to the question above, then I would propose we do our best to keep https://github.com/2i2c-org/2i2c-hubs-image up to date in order to provide it as a template/model for communities not wanting to start "from scratch". This way we provide a more solid base for them, while also empowering them to "own" that image and modify it as they wish.

My guess is that the majority of them will probably use that image or build on top of it. And while I understand that this will require us to maintain that image, I also believe that it would help reduce the "how do I install x" questions for most of the common pkgs.
Another idea would be to keep a record of all the communities images, that others could re-use or adapt?
This way any new community will be presented with the requirement to use https://github.com/2i2c-org/hub-user-image-template for their image (documented as best as possible) and a list of images that other 2i2c hubs use (including the 2i2c maintained one).

Second action point 🔽

> In all contracts, we provide a set number of hours of 2i2c engineer time that can be dedicated to (2) listed above. Unfortunately, as it is difficult to predict how long a particular request will take, time-based work seems to fit better than unit-based work. Requests will need to come through the community representative of that particular hub. Hours can be topped up if necessary, but will cost money.

A few thoughts:

I like the idea of having a set number of hours dedicated to setting up or helping maintain community user images. Since more hours translate into a greater cost per hub, I think we should keep this number as small as possible, while also being realistic and sensible about it (I have no idea what this means in hours, however 😕). I really hope/believe we can provide an affordable base hub setup.
+1 on having the request come from community representatives
+1 on providing time based work, rather than unit-based work.

Question:

What would happen if we were to apply this last step to the tf+keras utoronto request? When should we "stop trying" and conclude "this request is not yet possible" or "this image is too broken to fix, we need a different approach/start fresh"?

yuvipanda · 2022-03-29T20:12:02Z

Thanks for the awesome feedback, @GeorgianaElena! The question with https://github.com/2i2c-org/2i2c-hubs-image is - will we upgrade it? If we bump up say, the Python version there, we will need to co-ordinate with everyone using it - and that is a lot of work.

Instead, perhaps we can leverage the work being done in https://jupyter-docker-stacks.readthedocs.io/en/latest/? And start offering that as the default, and make the configurator more robust so people can pick from one of the jupyter-docker-stacks defaults? That way, they get to control what version they are using - and we can help with upstream maintenance too.

GeorgianaElena · 2022-03-30T14:19:32Z

> If we bump up say, the Python version there, we will need to co-ordinate with everyone using it - and that is a lot of work.

I think what I had envisioned was that we only maintain that image in order to use it for our staging hubs, let's say, not as a default for communities. But the important thing is that it needs to work as it is and that there are no incompatibilities.

But the communities would have their own version of the repo, that pushes the image to a quay account that they maintain and control. They shouldn't use our image directly, but rather start from that and alter it as they wish and use it only as a model.
Not sure this makes sense though.

I love the docker-stacks idea too, esp because I believe it can help more people than just the communities involved with 2i2c 🚀

But bottom line is that I believe there are benefits in maintaining a list of working images that communities could use as a base or just as an example if their use-case is similar.

damianavila · 2022-04-04T21:29:22Z

I would be totally 💯 on using some of the Jupyter Docker Stack images as our base images!
I envision a sort of hierarchical model where we inherit from the Jupyter Docker Stack images... and we add our specific stuff... and then each community adds their stuff as they need.

For instance, suppose some of our communities want an image containing scipy stuff...
jupyter/scipy-notebook >> 2i2c-jupyter/scipy-notebook >> communityA-2i2c-jupyter/scipy-notebook
So we would have a set of 2i2c repo templates inheriting from each of the Jupyter Docker Stack images and adding our specific 2i2c stuff... then each community can self-serve any customization they want.
If we need to push a fix for all of our communities, we can "easily" do that by modifying the repo template so they can consume those fixes.
If some fix happens upstream (or we fix something there, hopefully), we can consume those fixes in our 2i2c templates and then make those fixes available to be consumed by each of our communities...
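A minimal sketch of that inheritance chain, assuming a hand-written parent map (the `PARENTS` dict and both helper functions are hypothetical, not existing 2i2c tooling). It shows how a fix landing in one layer determines which downstream images would need to be rebuilt to consume it:

```python
# Hypothetical parent map: each image names the image it inherits from.
PARENTS = {
    "jupyter/scipy-notebook": None,  # upstream Jupyter Docker Stacks image
    "2i2c-jupyter/scipy-notebook": "jupyter/scipy-notebook",
    "communityA-2i2c-jupyter/scipy-notebook": "2i2c-jupyter/scipy-notebook",
}


def lineage(image, parents=PARENTS):
    """Return the chain of images from the upstream root down to `image`."""
    chain = []
    while image is not None:
        chain.append(image)
        image = parents[image]
    return list(reversed(chain))


def needs_rebuild(changed, parents=PARENTS):
    """Return every image that inherits, directly or indirectly, from `changed`."""
    return sorted(
        img for img in parents
        if img != changed and changed in lineage(img, parents)
    )


if __name__ == "__main__":
    print(lineage("communityA-2i2c-jupyter/scipy-notebook"))
    print(needs_rebuild("jupyter/scipy-notebook"))
```

Every extra layer in the chain is one more set of images to rebuild whenever something changes above it, which is the maintenance cost of an intermediate 2i2c layer.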

yuvipanda · 2022-04-06T21:03:32Z

So in my head, there are two different tracks here:

Track 1: jupyter-docker-stacks based images

1. Use an unmodified jupyter-docker-stacks or pangeo-docker-stacks image. Community rep gets to choose the tag, and upgrade at their leisure. There's no github repo specific to this on a per-community basis. We contribute to upstream maintenance however we can.
2. A community wants a customization. We create a repo using a template, add a Dockerfile inheriting from whatever image they were using earlier, and add their customization. I have ideas on contributions to jupyter-docker-stacks that would make this easier.
3. When more customization is needed, the Dockerfile is further customized or plain repo2docker files are used.

repo2docker isn't really used here - primarily, the step between (1) and (2) is just minor modifications. repo2docker-action is still used to build and push, and repo2docker is used when customizations are needed for (3).
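To illustrate how small the step from an unmodified image to a customized one can be, here is a rough sketch that renders the kind of Dockerfile a community repo might contain. The `render_dockerfile` helper, the tag, and the package name are hypothetical placeholders; only the `FROM` inheritance and a plain `pip install` are assumed.

```python
def render_dockerfile(base_image, tag, pip_packages):
    """Render a Dockerfile that inherits from base_image:tag and adds pip packages."""
    # Pin the tag so the community controls when they pick up upstream changes.
    lines = [f"FROM {base_image}:{tag}"]
    if pip_packages:
        # Sort for a stable, diff-friendly file.
        lines.append(
            "RUN pip install --no-cache-dir " + " ".join(sorted(pip_packages))
        )
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # "some-pinned-tag" and "xarray" are placeholders, not recommendations.
    print(render_dockerfile("jupyter/scipy-notebook", "some-pinned-tag", ["xarray"]))
```

With an empty package list the output is just the `FROM` line, which is equivalent to using the unmodified upstream image.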

Track 2: repo2docker based images

1. Use an unmodified jupyter-docker-stacks or pangeo-docker-stacks image. Community rep gets to choose the tag, and upgrade at their leisure. There's no github repo specific to this on a per-community basis. We contribute to upstream maintenance however we can.
2. A community wants customization. This triggers a fork of https://github.com/2i2c-org/hub-user-image-template/, and a new image is basically constructed using repo2docker files from scratch. This is what we currently do for pretty much any community that wants customization. This will primarily use repo2docker based files, while a Dockerfile can be used for advanced customizations.

There are a few questions here:

1. What happens when a community is on a pre-built image and wants a little customization (a package addition)? Most likely this can't be done upstream, and we don't want to do this in a way that affects other communities - down that way lie fire-breathing unfriendly dragons where you unintentionally break someone else because you added / upgraded a package for some user. What is the process when someone is on an unmodified image and wants a customization?
2. Is an 'intermediate 2i2c layer' really necessary? I'd love to avoid it completely! Any changes we make to this will affect a lot of users, and will become a big maintenance burden for us. So I'd really like to retire https://github.com/2i2c-org/2i2c-hubs-image eventually somehow, and just rely on one of our upstreams.
3. What kind of 'fixes' would we want to push to all our communities? I think it would be primarily around jupyterhub, lab, etc versions. Who is responsible for these? It makes sense that 2i2c is, since it ties into the version of JupyterHub, etc that we use. We'd need to figure out automation to help with this. It also ties into (2). A more general version of this question is: who is responsible for what part of the image?

yuvipanda · 2022-04-06T21:04:27Z

@GeorgianaElena

> But the important thing is that it needs to work as it is and that there are no incompatibilities.

100% I totally agree! But that should work with the jupyter-docker-stacks images too, right? If not, we can work on fixing it upstream...

yuvipanda · 2022-04-06T21:05:12Z

In general, I'd love for us to try and see if we can get away from a '2i2c maintained image', and focus that energy towards helping maintain upstream image stacks.

damianavila · 2022-04-12T23:05:35Z

Cross-reference: 2i2c-org/hub-user-image-template#11

yuvipanda mentioned this issue Mar 25, 2022

rstudio on utexas image? #1147

Closed

jameshowison mentioned this issue Mar 31, 2022

Issue on page /howto/customize/custom-image.html #1158

Closed

damianavila mentioned this issue Apr 5, 2022

How do we push changes in this repo downstream to our communities? 2i2c-org/hub-user-image-template#11

Open

damianavila added this to DEPRECATED Engineering and Product Backlog Apr 12, 2022

damianavila moved this to Needs Shaping / Refinement in DEPRECATED Engineering and Product Backlog Apr 12, 2022

damianavila self-assigned this Apr 27, 2022

damianavila assigned damianavila and unassigned damianavila May 19, 2022

choldgraf mentioned this issue May 30, 2022

Define a multi-hub service offering 2i2c-org/team-compass#429

Open

damianavila removed their assignment Jun 14, 2022

consideRatio mentioned this issue Mar 11, 2023

Intervention to have new hubs not couple to the 2i2c-hubs-image, but something more up to date #2336

Closed
