lobbying for official / supported docker images from scientific software projects #111
More concretely, for everpub I'm thinking maybe we can have a collection of good base images to build from, some of which may be these official base images. Somewhat similar to how e.g. heroku / travis provide some base environment (either by autodetecting in the former case, or via the […]
Approaching projects which don't already want to provide these images is tricky. Packaging is a lot of dirty and boring work, so if you approach a project their default answer will most likely be "if you think this is such a great idea, we welcome PRs". Which isn't so surprising. I think it would be super if this existed (but not so super that I will work on it). For […]
I'd say the best moment to approach others asking them to make official Dockerfiles is when we have a prototype to show. Then we can ask "Do you want this for your users?" and we are in a much stronger position.
👍 So, keeping this in mind, I would like it to be possible for images to be based on any base image. This has been somewhat tricky with both everware and binder, in that the only way to get code in there is to build docker images on top of their respective bases.
👍 👍 on this. I agree it would be nice if we could invert the layers for binder and everware. Currently you extend the binder base image with your stuff, instead of binder adding what it needs to an arbitrary base image. Not sure how hard that is.
I think it should be reasonably easy. In any case, for any […]
I don't know why binder and everware insist on "being first", but I wonder if the best way out is to allow multiple Docker images (see #51), of which one is reserved for the everpub infrastructure.
Yes, I'm thinking something similar. In my mind the best point of view is that an everpub "instance" will work as a mini-cluster / swarm.
In short, I'd like to be able to call (let's say from the notebook instance in the master container) something like: […]
where the containers can share some state using shared volume-binds. Docker already has a functional Python API, but maybe everpub can provide a nicer layer on top that takes care of binding a common filesystem etc. @betatim, thoughts?
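To make the "nicer layer on top" idea concrete, here is a minimal sketch of what such a helper could look like. All names (`sibling_run`, `sibling_run_kwargs`, the `/everpub/shared` mount point) are made up for illustration; only the `docker.from_env()` / `client.containers.run()` calls come from the real docker-py API.

```python
# Hypothetical everpub helper: launch a "sibling" container next to the
# master notebook container, sharing state through a common volume bind.
# The argument construction is split out so it can be inspected without a
# running Docker daemon.

def sibling_run_kwargs(image, command, shared_dir="/everpub/shared"):
    """Build keyword arguments for docker-py's client.containers.run()
    so the sibling container mounts the same shared directory as the
    master container, read-write, at the same path."""
    return {
        "image": image,
        "command": command,
        "volumes": {shared_dir: {"bind": shared_dir, "mode": "rw"}},
        "detach": True,
    }

def sibling_run(client, image, command, shared_dir="/everpub/shared"):
    """client is a docker.DockerClient, e.g. docker.from_env()."""
    return client.containers.run(**sibling_run_kwargs(image, command, shared_dir))
```

From the master notebook this could then be called as e.g. `sibling_run(docker.from_env(), "some/atlas-release-image", "run_selection.sh")`, with results appearing in the shared volume.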
I like the idea of inverting the layers. Adding the everpub stuff at the […]

I would first build a system that is much simpler. Only one container that […]

Why start simple? Most work can be done on a single host (given enough CPUs […]

You need to keep track of these parts of an analysis for sure. However I […]

Usually the data volumes are much, much smaller, require less CPU and are […]

(apparently adding horizontal lines confuses the markdown/email parser that thinks it indicates that the signature starts)

On Wed, Mar 2, 2016 at 6:32 PM Lukas notifications@github.com wrote:
+1. If we get awarded the prize, we should focus on the smaller compute things first, because I think that will be of more value to biomed people. I can justify that more if/when the time comes :)
👍 I agree as well, i.e. […]
I'm fine if we start with 1) and 2). I might have a prototype example notebook of 3) and 4) based on our workflow stuff (which does indeed run in this master/sibling setup on a docker cluster) in the next few days. The reason I want to emphasize this is that I have our current analysis workflows in mind, where for one stage we are bound to the ATLAS software releases (the 'dull' stuff @betatim mentions: event selection / reduction etc.), but later on the more hands-on analysis / result-presentation stuff lives in a completely different environment. So I want to be able to run the dull stuff, but not chain myself to the software choices of that environment for my later stuff.
I guess it is not a matter of the order of inheritance, but a matter of the entrypoints of an [analysis] container, which should somehow be agreed upon so the analysis is suitable for running in different environments:
Each of those environments requires a different entrypoint: Makefile, jupyter notebook, jupyterhub, .travis.yml + test scripts. In the case of REP we did it simply with a script that launches different things depending on environment variables. I'm not sure what would be the best way to generalize this approach without much risk of being cut by Occam's razor.
So when the analysis is started within a certain environment (given those environments are not difficult to discriminate), the corresponding entrypoint becomes available, say, via an everpub command-line utility. Does it make sense?
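The REP-style "script that launches different stuff depending on environment variables" could be sketched like this. The variable name `EVERPUB_ENV`, the environment labels, and the commands per environment are all illustrative assumptions, not anything the projects actually ship.

```shell
#!/bin/sh
# Hypothetical dispatcher: one container entrypoint that maps an
# environment label to the command appropriate for that environment.
pick_entrypoint() {
    case "${1:-local}" in
        local)    echo "make all" ;;                       # plain checkout: drive the Makefile
        jupyter)  echo "jupyter notebook --ip=0.0.0.0" ;;  # interactive single-user session
        everware) echo "jupyterhub-singleuser" ;;          # spawned by JupyterHub / everware
        ci)       echo "sh ./test_scripts/run_tests.sh" ;; # Travis-style CI run
        *)        echo "unknown environment: $1" >&2; return 1 ;;
    esac
}

# At container start the script would do something like:
#   exec $(pick_entrypoint "$EVERPUB_ENV")
```

Splitting selection (`pick_entrypoint`) from execution keeps the mapping testable and lets an everpub CLI print the chosen entrypoint without running it.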
I talked about this a bit with @cranmer et al. and maybe this is a good forum; this is also related to #51.
A lot of software products are already a good fit for the Docker paradigm of wrapping a single entry-point / program / command-line tool together with all its dependencies. I think lobbying for large, widely used software products (as opposed to e.g. libraries that are meant primarily for re-mixing) to build official docker images can help in two ways: 1) it gives at the very least a reference dockerfile showing how the authors of that project would install their own software, and 2) it already gives a useful […]
A perfect example in HEP would be ROOT. I think a ROOT docker base image would already go a long way for a lot of the scientific code that lives exclusively in the ROOT ecosystem.
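To make "a reference dockerfile showing how the authors would install their own software" concrete, here is a purely illustrative sketch of what a ROOT base image could look like. Every detail (base tag, package list, repository URL, release branch, build flags) is an assumption, not an official recipe.

```dockerfile
# Hypothetical sketch of a project-maintained ROOT base image.
FROM ubuntu:16.04

# Build dependencies (compiler, cmake, X11 headers, etc.)
RUN apt-get update && apt-get install -y \
        build-essential cmake git python \
        libx11-dev libxpm-dev libxft-dev libxext-dev \
    && rm -rf /var/lib/apt/lists/*

# Fetch and build a pinned ROOT release, then remove the build tree
# so it does not bloat the image layer.
RUN git clone --branch v6-06-00 --depth 1 \
        https://github.com/root-project/root.git /tmp/root \
    && mkdir /tmp/root-build && cd /tmp/root-build \
    && cmake /tmp/root && cmake --build . -- -j4 \
    && cmake --build . --target install \
    && rm -rf /tmp/root /tmp/root-build

# Downstream analyses would then start from: FROM <org>/root:v6-06-00
CMD ["root", "-b"]
```

Even if the project never publishes the image itself, a file like this in the repository documents the canonical install procedure.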
Other HEP examples are Monte Carlo generators. These are also used almost exclusively (at least by experimenters) as black boxes that eat a couple of configuration files and spit out events in some format. Maybe another example could be GEANT? Maybe there are similar examples in biomed fields?
Should we approach such projects and try to get them to have official docker images?