Repo2docker image for Cryo cloud community #1
Comments
Heya, just to say there is no requirement from 2i2c that this image repo be under our org - although we can put it there if other locations are not appropriate. The only thing we need to know is the image name and tag so we can pull it from a container registry :) |
Thanks @weiji14! It will be best to have it in the CryoCloud github for sure. I'll answer definitively on the rest tomorrow. Would you like to meet to discuss? |
Sure, I should be free to chat after 12 noon (MDT) tomorrow. Would be good to discuss things like names for the docker image, which core Python packages are needed, etc.
After looking at the 2i2c template and the Hackweek GitHub template, the key difference seems to be in terms of dependency lockfiles (Hackweek uses conda-lock but the 2i2c template doesn't). I have a slight preference for having lockfiles because it will be more reproducible long-term, though it might require a bit more work/expertise. No reason we can't combine the 2i2c and Hackweek styles though! Let's discuss this tomorrow perhaps. |
If you're going to take anything from the 2i2c template, take the GitHub Actions workflows that automate the building and pushing of the image using repo2docker. Once you have that in place, everything else is just tweaking environment files. |
Yes, those GitHub Actions CI workflows will definitely be needed, just pushed a commit at c783191! The Hackweek repo does have a similar workflow with some extra customizations at https://github.com/uwhackweek/jupyterbook-template/blob/0d7ca36851ae399438433a8f3b48a8af90f40e30/.github/workflows/repo2docker.yaml, e.g. setting a CalVer tag for the built docker image. We might look at bringing in some of those extra customizations to align with the Cryo community's standards. |
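For illustration, here is a minimal sketch of how a CalVer tag for a built image could be derived. The `YYYY.MM.DD` layout, the function names, and the registry path in the usage note are assumptions made for this example; the thread does not say which format the Hackweek workflow uses.

```python
from datetime import date


def calver_tag(day: date) -> str:
    """Return a calendar-version tag for the given day.

    The zero-padded YYYY.MM.DD layout is an assumption for this sketch.
    """
    return day.strftime("%Y.%m.%d")


def image_reference(registry_path: str, day: date) -> str:
    """Join a registry path and a CalVer tag into a full image reference."""
    return f"{registry_path}:{calver_tag(day)}"
```

Calling `image_reference("quay.io/example-org/example-image", date.today())` (a hypothetical image name) would give a reference tagged with the build date, which is the image name and tag a hub operator needs in order to pull it.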
I have added the test and other workflows so you should have what you need now for actions. |
Thanks @tsnow03 for setting up the accounts on https://quay.io/user/cryointhecloud and https://hub.docker.com/u/cryointhecloud (and configuring the related
I haven't tested those docker images (tagged |
Tracking some extra packages requested by @dfelikson and @tsnow03:
Likely upgrades on the horizon: |
While not essential, did you want to consider splitting off your image into a separate repo from this website repo? I can see the possibility in the future that you will have GitHub Actions set up that need to respond only to changes in the image, without the website also being redeployed. Again, not a firm requirement if using a mono-repo works better, but it is just a suggestion. Relatedly, it looks like you are primarily thinking about building your image using repo2docker and having to define everything in a conda environment file. While this is fine for many groups, you can also manage your own |
Yes, we did consider a dedicated docker image repo (see #2 (comment)). The current mono-repo approach was meant to keep things simple since there are only a few maintainers, but if things get more complicated, then splitting the Jupyter Book content from the conda environment specification seems wise.
True, using a Dockerfile would be necessary if VNC/Linux is needed. Maybe @tsnow03 can chime in on what the use case for VNC/Linux would be, since this would require some non-trivial changes to the current repo2docker setup using conda environment.yml files. |
I had a meeting with @jmunroe about the CryoCloud. Between @jmunroe, @fperez, and Fernando, and others, it sounds like 2i2c will be helping to do two things:
Both of these items are targeted to be complete before the first onboarding training session for CryoCloud, which has been pushed back to Dec 2 @ 9am PT. |
@tsnow03 I created a new repo based on a message from @fperez at https://github.com/2i2c-org/nasa-cryocloud-image. I'd love to move this repo into this github org :) Can you give me org-level permissions so I can do that? Would also need to update the QUAY credentials on GitHub actions for it to work |
I've added everything in #1 (comment), except the linux desktop - which I'll add later today. |
Update: I didn't find time to add the desktop environment today. Let's do the following, and I'll add the desktop env?
|
Thanks @yuvipanda! I sent you an invite with access to both teams but member role - do you need owner role (I can change it if needed, of course)? |
The quay credentials update will need @tsnow03's input, I don't have those. |
@fperez looks like I need owner perms. |
Done @yuvipanda! |
@fperez great, I've moved it here now: https://github.com/CryoInTheCloud/hub-image. Will need the quay.io creds next! |
Awesome, thx! Should I add an apt.txt like the stat159 one?? And I'd want to pull in also all the packages we keep around in the stat159 and JMTE envs to form a "happy home in the clouds", though we'd probably want to do a bit of weed trimming for anything there that's gone stale, and run your version update script over those files. |
Also, do we have a way to set up env variables without touching the dockerfile? There's a few useful ones in the JMTE setup but I think most of that shouldn't require touching the Dockerfile itself. |
@fperez I've added all the packages requested in #1 (comment), as well as all the apt packages from stat159 into the image. I've also combed through the stat159 image and added what I think of as baseline packages to the image. However, at this point, I would strongly recommend against a wholesale copying of conda packages from stat159 to this image - additional packages always cause additional work, now or in the future :) I'd say add them as the need arises. If you look at the packages in https://github.com/CryoInTheCloud/hub-image/blob/main/environment.yml#L4, you'll see that I've added a comment to each one on why it is there. In our experience managing large images for Berkeley, this is absolutely important, as without it it becomes impossible to know why something is there, and what might break if it is bumped or removed. So my suggestion is to try to run through your demo, and add packages as needed if it fails. If there's something that you think should be counted as 'baseline' that I didn't pull in, I think adding that with a comment on why it's important would be helpful. Once the quay.io setup is done, it doesn't block on us either - @tsnow03 can add them as necessary! I've amended the README with information on how you can do this. This is actually much better than the setup you are used to with stat159 @fperez, as you can test package additions here purely via the GitHub web UI + mybinder. I think TODO still is:
I'd also personally heavily recommend against |
I also updated the README in https://github.com/CryoInTheCloud/hub-image. The Linux desktop still seems broken, I'll look at that probably after thanksgiving. |
Thx @yuvipanda! Done here, we can probably now continue working directly on that repo via PRs and issues. |
Oh, and agreed on But that pattern is a bit unusual and not something I'm keen to explain to new users for now, so let's keep it simple :) Still, setting env vars without touching the dockerfile could be useful (aside from |
Thanks Yuvi and Fernando for starting the new docker image repo! Just wanted to throw out a question on reproducibility. Will there be lockfiles generated from the I realize that the packages in |
@weiji14 - absolutely!! My take on this issue is that we should take a "two-level" approach:
Regarding versioning, my suggestion would be to use an As for the apt packages, I think that's a bit less of a concern, since they change more slowly and we can ensure that a given Speaking of apt - are there any desktop packages you'd really want to have there from day 1? Only the most basic stuff has been installed, mostly CLI tools. But I was thinking of adding at least qgis and grass that I imagine many people might want to use (I know Whyjay in our group uses qgis, and I'm a light user of it too). Very happy to see this taking shape! |
ps - @weiji14, I don't know if qgis and/or grass are in the default ubuntu repos, so I won't add them to the apt file unless you confirm they'd be useful, as they may require adding extra sources lists. I won't have time to dive into using it during the Dec 2 demo, so unless you need it right away, we might just make an issue for other desktop tools needed by the users and sort that out after some users have a chance to kick the tires a bit. |
I hadn't planned on setting up lockfiles in the new image, mostly due to lack of capacity. The primary goal was to remove the requirement for folks needing a local docker setup or similar to be able to make changes to the repo. The pangeo-docker-stacks project does that with this action (https://github.com/pangeo-data/pangeo-docker-images/blob/master/.github/workflows/CondaLock.yml) that I think @scottyhq wrote. If we can generalize that, or even make a copy of that here to autogenerate lock files, I'd be happy to review! Unfortunately I won't have time to work on it myself this year. I would also say it's probably ok to just get desktop apps (outside of the base env, like XFCE and X) from conda, as the apt versions can be quite outdated. qgis is definitely much newer in conda-forge than in apt, for example. So @weiji14, if you wanna take a shot at the conda-lock automation, I'd love to see that :) I completely agree it is super important for long-term (or even medium-term) reproducibility, and could be generally useful in a lot of places. |
np @yuvipanda - if @weiji14 has bandwidth to set up the conda-lock automation that would be awesome, else we can put it on the backlog for later. And in a fun turn that shows my brain is mush, I just realized I ended up rewriting your script and had totally forgotten! Oh, fun... I've named the env and will try to build it locally, then will run my script over it and will thus finalize this PR with full versions of the packages. Not quite the conda-lock that @weiji14 is asking for, but it will go a ways in that direction. |
Ok, using my little script I added version numbers so at least now I think this part is in reasonable shape (and we can improve with conda-locks later, obviously). |
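The version-pinning script itself is not shown in this thread. As a rough sketch of the general idea only, and not the actual script, the snippet below takes `name=version=build` lines as printed by `conda list --export` and turns a list of package names into `name=version` pins. Both function names are hypothetical.

```python
def parse_conda_export(export_text: str) -> dict[str, str]:
    """Map package name to version from `conda list --export` style lines."""
    versions = {}
    for line in export_text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines of the export.
        if not line or line.startswith("#"):
            continue
        parts = line.split("=")
        if len(parts) >= 2:
            versions[parts[0]] = parts[1]
    return versions


def pin_dependencies(names: list[str], versions: dict[str, str]) -> list[str]:
    """Attach a version to each name; leave unknown packages unpinned."""
    pinned = []
    for name in names:
        version = versions.get(name)
        pinned.append(f"{name}={version}" if version else name)
    return pinned
```

Pinning this way records only the top-level versions that happened to be installed. Unlike a conda-lock file it does not fix transitive dependencies or build strings, which is why it goes only part of the way toward the lockfiles discussed above.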
It will be nice to add the conda-locks later. I'm happy to help with that as well. I think the |
Cool, that CondaLock.yml action is pretty much the automated GitHub Actions version of what we've been using at https://github.com/CryoInTheCloud/CryoCloudWebsite/blob/main/conda/lock-environment.sh. Let me see if I can port that over to the hub-image.
A quick scan through the packages in apt.txt does show quite a few packages on conda-forge already. I'd definitely recommend QGIS from conda-forge, though that can wait until after Dec 2.
This is steering towards the 'Binder for Everything' vision mentioned at https://discourse.pangeo.io/t/future-of-pangeo-cloud-i-binder-for-everything/1574 😄. I know 2i2c is looking at operating a BinderHub on AWS us-west-2 for the general Pangeo community, but I'm not sure if we want something similar for CryoCloud. There's typically a tradeoff between having too many packages in the base image (which leads to slow server startup times) and too few packages (users then need to install them themselves). But anyways, this can be discussed later. |
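For readers unfamiliar with the lock step being discussed, here is a minimal sketch of invoking conda-lock from Python. It is not a copy of `lock-environment.sh` or of the `CondaLock.yml` action; the function names, the default file name, and the platform list are assumptions, and it requires the `conda-lock` CLI to be installed.

```python
import subprocess


def conda_lock_command(environment_file: str, platforms: list[str]) -> list[str]:
    """Build the argument list for locking one environment file."""
    command = ["conda-lock", "lock", "--file", environment_file]
    for platform in platforms:
        # One --platform flag per target, e.g. linux-64.
        command += ["--platform", platform]
    return command


def write_lockfile(environment_file: str = "environment.yml") -> None:
    """Run conda-lock for linux-64 (the platform choice is an assumption)."""
    subprocess.run(conda_lock_command(environment_file, ["linux-64"]), check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes it possible to inspect or test the arguments without having conda-lock available.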
This is deployed in the hub now, and I've fixed the linux desktop environment :) Please test it out and let me know! |
Startup time seems a bit slower, but it works after a bit of light testing on JupyterLab! Not quite sure how to access the desktop apps (e.g. QGIS) though, but I'll find out on Dec 2 😄 Oh, and I think I've got the conda-lock automation working, see CryoInTheCloud/hub-image#5 (comment). So people will need to write |
The hub is looking great so far! The desktop worked great for me and QGIS came up perfectly when I ran it from the command line. From my current pull request, here are the few errors I've found so far with the environment. I'll start working on those things once the open hub-image pull requests are settled. |
@weiji14 is there anything else on this issue we should do, or can we close it now, and start opening new more targeted ones as new ideas/issues come up? I just updated the image in this PR, and though I get an icepyx error in the tutorial, I think most packages are now in good shape and all the tutorial imports work. |
Yes, let's close it since we should have most if not all of the packages added. New discussions can continue over at https://github.com/CryoInTheCloud/hub-image/issues. |
Create an initial conda virtual environment specification for the NASA Cryo community! This will be used in the default docker image for the 2i2c Cryo deployment. From 2i2c-org/infrastructure#1702 (comment), the idea appears to be to combine the dependencies listed in:
Note that 2/3 of these are derived from https://github.com/uwhackweek/jupyterbook-template/tree/main/conda
List of example packages to include:
Template repository and instructions from 2i2c:
Question: Where to put the conda virtual environment specification?
under the 2i2c org, e.g. github.com/2i2c-org/cryo-image following https://github.com/2i2c-org?q=&type=all&language=dockerfile&sort= (not needed, see Repo2docker image for Cryo cloud community #1 (comment))