Meta-issue re composability #51
I think we touched on this also a bit in https://github.com/betatim/openscienceprize/issues/16. There is a little bit of tension between an efficient day-to-day workflow / research development and making it accessible / recomposable / remixable to a wider audience. For example, I think most of the discussion in … So, Docker is great for packaging stuff, but it won't absolve us from listing the dependencies in a human/machine-readable form (something like …).
I think this is the wrong approach. Docker is a deployment tool, period. You don't extract anything from a Dockerfile. You need some other build system, which could produce Dockerfiles among other outputs. Many Dockerfiles just start from Debian and install packages with apt-get; in that case the build system is Debian. Another way to put this is that Docker is the code equivalent of PDF, not of LaTeX.
I think it would be straightforward to have the Dockerfile use the specfile to figure out that conda is what should be run. No reason to put the actual deps directly in the Dockerfile...
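A minimal sketch of that idea, assuming the specfile is a conda environment.yml (the base image and file names here are illustrative, not from the thread):

```dockerfile
# Hypothetical sketch: the Dockerfile never lists packages itself;
# it only knows to hand the specfile to conda.
FROM continuumio/miniconda3

# environment.yml is the single human/machine-readable dependency list.
COPY environment.yml /tmp/environment.yml

# conda resolves and installs everything named in the specfile.
RUN conda env create -f /tmp/environment.yml
```

The same Dockerfile would then work unchanged for any project; only the specfile varies.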
@khinsen I don't disagree with that. I took the Dockerfile as an example because in practice it is often the one place where all the dependencies are spelled out (across all the kinds of dependencies you might have: Python deps via pip, system deps via apt-get, and so on). I would be very happy if there were a way to specify those from another source. I agree that the Dockerfile (and the resulting image) should be just one project output among others (I think the PDF/LaTeX analogy is apt). All that library code should be independently installable, with perhaps a container image being a reference implementation / installation.
Isn’t the analogy: Dockerfile ~ .tex? I agree with the point that you need apt-get etc. for it to run, but the Dockerfile is much more self-documenting than the resulting image.
Agreed on that. I am not sure you can make something as flexible as a … In my experience a … Does someone know of successful cross-platform (as in Gentoo vs. Red Hat vs. Alpine vs. CentOS vs. SUSE) package managers? I can't think of any right now.
@cranmer The difficulty with the analogy to TeX->PDF is that the latter is a single-layer operation, whereas code building and deployment have become such a mess that we use several layers. Take a basic application written in Python. It would typically come with build instructions for setup.py. I'd argue that in this case TeX <-> Python, tex-command-line <-> setup.py, and PDF <-> installed code. The Debian and Docker layers are just conversions to different build systems. The real question here is: which layer is the best notation from which to extract dependency information for other uses? Not the Dockerfile, in my opinion, because you can't be sure to find anything useful in there. A Dockerfile that says "FROM debian, ADD apt-get..." is of little help: you have to parse the Debian package spec after that. But the Debian spec is hardly better; it refers to yet another underlying build system. And both Docker and Debian use imperative specs, which are notoriously hard to analyze. I suspect that it's best to introduce another notation for dependencies, rather than work around the problems with notations made for something else. Something simple and declarative. Perhaps there is something around that we can reuse, of course.
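To make "simple and declarative" concrete, here is a purely hypothetical sketch of what such a notation might look like; none of these field names belong to any real tool, they only illustrate stating what is needed rather than how to install it:

```yaml
# Hypothetical declarative dependency spec (invented for illustration).
# Dockerfiles, Debian packages, or conda environments could all be
# generated from a single declarative source of truth like this.
name: my-analysis
dependencies:
  python: ">=2.7"
  numpy: ">=1.10"
  root: "6.*"
system:
  - libhdf5
```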
@betatim No, but I don't think anyone has tried. Packaging is so difficult that doing it in a portable way is probably asking for too much.
Anyone interested in the composability aspects should have a look at this writeup of a recent Twitter conversation.
I think the core issue is that we want projects to have multiple kinds of research output that live on a spectrum between these two ends:

- full deployments that run as-is (e.g. a complete container image): good for reproducibility, but of limited re-usability
- individual pieces (libraries, code) that can be installed manually into a larger project: good for re-usability, but more work to deploy

There is valuable space in between those edges, in the sense that one can deploy 'black boxes' with well-defined input and output. Think: a library function with all necessary dependencies included. ... and now I see @khinsen link to a document that shows exactly this spectrum 👍 I think all those things are not mutually exclusive. A good project should provide useful full deployments for reproducibility / limited re-usability, but also provide the possibility to install manually into a larger project.
@lukasheinrich I completely agree with your description. And nothing is mutually exclusive, as you say, but the constraint of repurposing existing technology makes it hard to satisfy them all. In fact, my ActivePapers approach does everything you list, at the price of being incompatible with 99% of existing research software. But then, ActivePapers is research software for exploring these issues, not a tool made for widespread use in real life. I am rather pessimistic about satisfying all criteria while being fully compatible with the past. I think this requires too much accidental complexity for anyone to handle. But I'd be happy to be proven wrong.
> I suspect that it's best to introduce another notation for dependencies, rather than work around the problems with notations made for something else. Something simple and declarative. Perhaps there is something around that we can reuse, of course.
Aieeeeeeeeeeeeeeeeee <head explodes>
<obligatory reference to xkcd on standards>
I think we need some real, concrete use cases here to ground this discussion...
@ctb I said "introduce", not "invent". Introduce something else in addition to the Dockerfiles used for building images.
There is some prior art on wrapping stuff around Dockerfiles / auto-generating Dockerfiles from a more machine-readable spec, like here.
What I feel is currently missing from the Docker ecosystem is a clean way to compose layers from different pieces. Say you have a couple of layers that you know will be compatible (in that they would not overwrite files within the other layers, respectively); you should be able to compose those as if you had just bind-mounted them during docker run. I think the ADD keyword works in some sense like that if it is followed by a tar archive (see the note in the Dockerfile reference),
but it's not really a first-class citizen. If one wanted to merge two Docker images, one would need to export both to an archive (see the sketch below) and add them back,
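A sketch of the export step (image names are placeholders; note that docker export operates on containers, so each image first gets a throwaway container):

```sh
# Create throwaway containers from the two (hypothetical) images
# and dump each container's filesystem to a tar archive.
docker create --name tmp-a image-a
docker create --name tmp-b image-b
docker export tmp-a > a.tar
docker export tmp-b > b.tar
docker rm tmp-a tmp-b
```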
and then build a new image from a Dockerfile along these lines,
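for example (only a sketch; ADD auto-extracts a local tar archive into the image):

```dockerfile
# Rebuild a merged image from the two exported filesystems.
# ADD unpacks each archive at the root of the new image.
FROM scratch
ADD a.tar /
ADD b.tar /
```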
and then hope for the best :-/. Maybe there is no better solution, but currently lots of people are writing very similar Dockerfiles, and streamlining the process somehow must be possible. The proposal did include a statement that we do want to be opinionated in certain ways. So maybe we can carve out an idea of what it means to be an "everpub"-compatible Docker image, which then has some guarantees (via conventions in how it is built) to be composable.
Last thing I wanted to add: within the high-energy physics community there has been a quite successful effort to streamline software distribution, housed in a single global filesystem tree called CVMFS (https://cernvm.cern.ch/portal/filesystem), with an additional toolchain to easily set up the shell with software from /cvmfs. In ATLAS it's called ATLASLocalRootBase (not sure why, and not sure what other LHC experiments use; see https://twiki.atlas-canada.ca/bin/view/AtlasCanada/ATLASLocalRootBase; maybe @betatim and @anaderi can comment on how this is done in LHCb). This allows me to e.g. set up various software products (across a wide range, from compilers to very specific software used by a single experiment); some examples are sketched below. Since this is a global filesystem, installation there is done via some political process and you need the entire thing plus a network connection, but maybe we can re-use some of the insights as a model for how to build ad-hoc coherent filesystems (which then could be …)
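For readers unfamiliar with it, the setup typically looks something like the following; the exact paths and version strings are illustrative, not quoted from the thread:

```sh
# ATLASLocalRootBase lives on the read-only /cvmfs tree.
export ATLAS_LOCAL_ROOT_BASE=/cvmfs/atlas.cern.ch/repo/ATLASLocalRootBase
source ${ATLAS_LOCAL_ROOT_BASE}/user/atlasLocalSetup.sh

# 'lsetup' then pulls individual products into the shell environment,
# e.g. a specific ROOT build (version string illustrative):
lsetup "root 6.04.16-x86_64-slc6-gcc49-opt"
```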
Provocative mode: if the two images would layer perfectly, couldn't we just use one as the base for the other? My guess as to why there is so much duplication right now is that it is too hard to re-use images made by others as a base.
Yeah, maybe that's stretching it, and I can think of various things that can go wrong even if it worked from a filesystem-layering point of view (think conflicting ENV statements, etc.). As to the point that one should use the other as a base, I think this is exactly the core of the problem. With Docker these two things don't commute (A based on B and B based on A result in different images), so this leads to the point where we would create long chains of image layers like cern/slc6-base + grid middleware + ROOT installation + custom library + analysis code + ..., which takes a lot of discipline and upfront thought on how to layer everything together if you want these to be re-used. So I actually like this "(composable) mini-cvmfs" idea. The CVMFS maintainers seem to at least have a workable model, and maybe they have some insights on how to provision e.g. things like /usr/lib etc. so that they can fulfill the (perhaps conflicting?) requirements.
@lukasheinrich Considering at which level Docker containers operate (everything but the kernel), I'd be surprised if you could find two working Docker images that have no files in common. Every Docker image contains some Linux infrastructure stuff. In my paper on ActivePapers, I use the concept of a "platform" able to accommodate "contents". The platform is the infrastructure you rely on; the contents are what you generate on top of that. Think of "MP3 player" vs. "MP3 files" as a simple example. In the case of Docker, the platform provides the Linux kernel, and the contents are application software with all the Linux elements they require, except the kernel. This is inherently not composable. In fact, the real problem is that the traditional Unix model of software installation is not composable (see this blog post). One main motivation behind Docker was to work around the non-composability of Linux software installations. You cannot compose containers either, but at least you can run multiple containers on the same host system. In an OS with composable software installation (such as Nix/Guix), there is much less need for containers. They still have their place as sandboxes for secure execution, but that's a much less fundamental role.
@betatim Your summary of Python Web frameworks is pretty much the standard story of innovation across human history. New technology starts out in a "high-temperature" state with lots of variation, which then "cools down" as consensus is reached on how to do things. Since this is also the history of the universe, our solar system, and our planet, perhaps we should accept it as a law of nature and learn to live with it 😄 In fact, I wonder if the specific problems of computing are perhaps the consequence of a dysfunction in this process. The complexity of dealing with compatibility issues forces consensus much too early, leading to de-facto standards that aren't really mature but that people prefer to live with. Another problem is that consensus is often reached through market dominance rather than by technical merit. That happens elsewhere as well, of course, but I suppose it's more pronounced in computing.
Before we start turning in circles, maybe we can propose a couple of options for how we can help with composability. I see two main areas:

1) Have tools in the everpub toolchain that allow easy use / execution of multiple Docker images / containers. E.g. for my projects I am expecting that I will be managing multiple Docker images / containers that each encapsulate different parts of my work (stuff based on ATLAS software in one image, post-processing in an image using only ROOT + Python), so I think a good starting point is to assume that the everpub tools are executed on a machine that has e.g. the docker client + $DOCKER_HOST set (perhaps using a Carina setup).

2) Have tools in the everpub toolchain that allow easy creation of good(TM)/best-practice Dockerfiles (or Docker images directly). I'm spending way too much time creating Dockerfiles / building images from them. Things like sharing provisioning scripts seem sensible (even if they won't work for everyone). There's no reason everyone needs to figure out how to build ROOT like the sketch below.
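As an illustration of the kind of recipe meant here, a from-source ROOT build looks roughly like this (URL, paths, and flags are illustrative, not the author's original script):

```sh
# Fetch the ROOT sources (repository URL as of the time of the thread).
git clone http://root.cern.ch/git/root.git root-src

# ROOT 6 builds with CMake; use an out-of-source build directory.
mkdir root-build && cd root-build
cmake -DCMAKE_INSTALL_PREFIX=/opt/root ../root-src
cmake --build . -- -j4
cmake --build . --target install
```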
I do think that Guix (as a build system within a Docker image) and packer.io (as a tool to build Docker images from a machine-readable spec) are very interesting options that go beyond Dockerfiles.
@lukasheinrich Point 1) is clearly something we need to address. I don't see any particular difficulty either, but that may come :-( As for point 2), I am not sure what problem you want to solve exactly. If you want to facilitate building ROOT, just put your image on Docker Hub, or bundle your Dockerfile with ROOT itself. I suppose you are suggesting that others could profit from your ROOT Dockerfile in doing something similar but different, but I fail to see how exactly that would work. In any case, the problems of building and using Docker images are largely independent, so they can be attacked in parallel. And I strongly suspect that someone has already thought about best practices and tutorials for building Docker images for science.
I just published a blog post about composition, with background information relevant to everpub.
Interesting read. Do you have any thoughts on IPFS as a solution to global content-addressable storage? I've been thinking this could be a solution (certainly not built by us, but by the community) to the commutation problem in Docker images (A after B is different from B after A, even if they were compatible). They are teasing as much on their blog, but there has been no news on this since https://ipfs.io/blog/1-run-ipfs-on-docker/
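For context, IPFS content addressing works roughly like this (a sketch using the public ipfs CLI; the hash is a placeholder):

```sh
# Initialize a local IPFS node (one-time).
ipfs init

# Adding a file returns a hash derived purely from its content, so
# identical content gets the same address wherever it is stored.
ipfs add analysis-results.tar
# -> added Qm... analysis-results.tar   (placeholder hash)

# Anyone can then fetch the content by that hash.
ipfs cat Qm... > copy-of-results.tar
```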
I have mentioned IPFS before in a blog post on data management, and I do see it as the right direction. Whether IPFS as an implementation will work out remains an open question; there is not enough experience with it. But the principles behind it are certainly very promising. Do you know anyone who has actually used IPFS in real life?
Not personally. I myself just recently stumbled on it and think it's a promising effort I want to keep an eye on.
I noticed that #18 veered into some really great discussions of composability and I want to close that issue (because most of it has been dealt with by #41) but retain a link to composability.
So, put links to good comments about composability in this issue and we'll revisit if/when people want to talk about it more :).