-
-
Notifications
You must be signed in to change notification settings - Fork 5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Organising the conda communities and establishing best practices. #299
Comments
@kyleabeauchamp, @jchodera, @jakirkham @msarahan, @johanneskoester, @daler, @chapmanb, @jxtx, @jmchilton: please feel free to ping others and invite them :) |
Definitely interested in learning more! For now, pinging @rmcgibbo, @mpharrigan, @cxhernandez, @marscher, @franknoe, @pgrinaway, @bas-rustenburg. |
@bgruening thanks for the message! In fact we just discussed that yesterday!! Conda-forge was born from two communities similar to bioconda and omnia (the SciTools and IOOS channels) with the goal to reduce redundancy and join forces to produce high quality recipes and binaries. I would love to see more communities join us here. We are not the dark side but we do have cookies 😉 (Well... a cookie cutter... sorry for the false advertisement.) I am trying to put a blog post online next week with more info. We are also planning on public (Google?) hangouts so we can have some online face-time and QnA sessions. Meanwhile feel free to ask anything here, or in new issues, if the you have a very specific question. Here is the gist of conda-forge:
There are many details I am leaving out and much more to talk about, but I will stop here for now. The number one question we get is: why multiple repositories instead of one with all the recipes? We had (and still have) many discussions like this. However, all I have to say is: We tried the single repo model and now we are trying the multiple repos model. So far, the multiple repos has scaled much better, and none of the major worries we had became true. |
This sounds great. @rmcgibbo is much more qualified to comment than I am here---he pioneered most of the omnia conda framework---but we ended up converging on our own build system (modeled loosely on Where should we look for all the gory technical details about the build systems and automation? This was the hardest part for us, since we needed (1) broad platform support (hence the use of a phusion/holy-build-box-64 build system for I'd love to understand more about how the |
Which ones? For humans or browsers? 😆 Ok, it was terrible, but I had no self-control. Yes, welcome all. 😄 Please feel free to peruse what is going on at conda-forge and ask questions. The best place to get acquainted with or propose general discussion topics is probably the website repo (in particular the issue tracker). There are many issues there that are likely of interest and welcome to healthy discussion of thoughts and personal experiences. Also, there may be a few closed issues there worth reading up on just to get a little bit of history (we are still quiet young 😉). If you would like, feel free to submit a simple recipe or a few to get a feel for how everything works here. Also, feel free to check out our gitter channel for any generic questions or you may have. Once everyone has had a chance to get a feel for how everything works and what seems personally relevant, we can figure out meeting discussion topics in some place YTBD. Again welcome. |
Welcome @jchodera.
This varies depending on the question. Let's try and direct you based on the points raised.
This issue has basically moved in the direction of various proposals of how to move the Linux build system forward. Though there is current strategy in place, as well.
This is under active discussion. The reason being this is tied to several issues including build system constraints, how features work, how and what of these libraries get distributed, etc.. See this issue. There is a proposed example there of how we might get this to work. However, we haven't settled on something yet.
This is all over the map. 😄 In general, we use AppVeyor (Windows), Travis CI (Mac), and Circle CI (Dockerized Linux) If you just want to read code, we can point you there. Proper documentation isn't quite there yet. Also, there isn't one singular issue for this, but it is discussed at various points in various issues. What sort of things would you like to know? |
Hi all, checking in from bioconda. I've been poking around the conda-forge code and can't pin down where the magic is happening. Could you point to some code or to a description of what's happening to aggregate the one-recipe-per-repos? To further the discussion, here's a description of the bioconda build system and where you can find the code.
The workflow is just like most anything else on github: submit a PR and wait for it to be tested. Once it passes, someone on the team merges it into master. Upon merging, travis-ci then runs again but on the master branch and this time upon completing, the built packages are uploaded to anaconda. Aside from differences in the moving parts of the build systems, it sounds like we're all dealing with similar issues with respect to CUDA and gcc, etc. Would be nice to work out some best-practices that we could all use. |
Welcome @daler.
Sorry I'm not following this question. Could you please clarify what you are meaning by aggregate? It is a little unclear and I am a bit worried that there may be some misunderstanding of what is going on here. I'll try to clarify the big picture below.
Yes, SciTools and IOOS behave in a similar manner. However, those recipes along with many from conda-recipes are being ported over here as people from those groups seem to like this model. Just to clarify, the model for building is very different here than the many recipes in a single repo. The reasons are varied, but I think the biggest difference is it allows people to take ownership of recipes/packages that are important to them and the tools (CIs) used to test, build, and deploy. This includes making bug fixes, releases, feature support, etc. Similarly it allows relevant discussion to break along those lines. In practice, this appears to be a huge asset. However, there are plenty of other reasons for one to consider this model. How this works:
While understanding this infrastructure may at first seem daunting, it is actually not so bad and is not really necessary. However, if you are curious, we are more than happy to explain the details. Maybe if you could please rephrase your question in terms of these steps, we can do a better job at answering your questions and providing you places to look for more information.
Absolutely, we would be happy to point you to relevant issues where these are being discussed. Just please let me know which of these you would like to know more about. |
@daler, aggregation is done at the https://github.com/conda-forge/feedstocks/tree/master/feedstocks repo. This is created with conda-smithy, particularly this module: https://github.com/conda-forge/conda-smithy/blob/master/conda_smithy/feedstocks.py Continuum is very interested in this particular aspect (I am Continuum's representative here, though others are also involved in contributing recipes and discussing build tools). The one-repo-per-recipe model is necessary, I think, for two reasons:
The latter is the bigger issue here, since you all have had reasonable success with CI. Continuum has started a community channel (https://anaconda.org/pycommunity), with the long-term plan to have that as a package aggregation center. In my mind, the most important facet of this effort is to unite the recipes and have a single canonical source for each recipe. I don't care whether it's on some project's page (i.e. matplotlib), or on conda-forge, or whatever - so long as one place is the official source, and finding that source and contributing to it is straightforward. Conda forge is a great place to host recipes because it provides the CI of those recipes, and I like the distributed maintainer model, but I also think that hosting recipes directly at projects, and having conda-forge build from indirectly-hosted sources would be the ideal - that way the recipe would be holistically managed by the package originators. For the pycommunity channel, we'll mirror or link packages from other channels. In the case of multiple package sources, we haven't quite figured out how to prioritize them (activity level? origin of package?) The hope is that rather than many organizations having to say "add our channel!" we'd instead have just one, and that one may be enabled by default for some "community edition" of miniconda/anaconda - or otherwise could be enabled with |
@jakirkham and @msarahan thanks for your pointers. One missing piece for me was that submitting a PR to @msarahan -- Wholeheartedly agree that a single canonical source for each recipe is critical, and that finding that source and contributing needs to be straightforward. conda-forge/conda-smithy and pycommunity look like great tools to make that happen. |
Glad to help, @daler. Hope it wasn't too much. Just wanted to make sure we had common context for our discussion. 😄
When a PR is submitted all CIs (Travis/Mac, Circle CI/Linux, AppVeyor/Windows) are run and used to attempt to build the recipe, but do not release it.
Once the PR is merged, a Linux job in the Travis CI build matrix does the setup for the feedstock. It goes something like this for each recipe unless otherwise specified (steps 7, 8, and 9).
As you have mentioned, this all basically happens through After generating a feedstock, a global feedstock update is run. It is pretty simple. It updates the feedstocks with the latest commit of each feedstock on |
Perfect, these were just the kinds of details I was looking for. Thanks. Hopefully it can help get others up to speed as they join the discussion as well. |
Hi guys,
|
Yes, especially for Windows builds. Mapping conda-forge's model to Anaconda.org should be OK - the organization would be conda-forge, and each package would be a different build. Maybe I'm missing how this is different from the other CI services? Anyway, the hangup has been that anaconda.org has some kinks that need to be worked out.
ATM, I think the answer is "we don't." There has been discussion about coming up with networkx-driven guidance of what recipes to work on next, but that has been for human consumption more than automated buildout of dependency trees. Before getting involved in conda-forge, Continuum developed a build script that also uses networkx, and builds out these trees. That code assumes a single folder of packages, which can be created from conda-forge using conda-smithy. The dependency building code is part of ProtoCI: https://github.com/ContinuumIO/ProtoCI/blob/master/protoci/build2.py |
Thanks for the clarification. |
I think separate repos per recipe are still a good thing, because it gives you complete control over who has permission to accept changes to a recipe. I don't know how we'd do that with many recipes under one umbrella. |
Would this work on the |
Sure, I think so. It would need to be adapted to look into the nested recipes folder, but I think otherwise, it would work fine. It may also have trouble with jinja vs. static version numbers - but again, that's tractable. |
@msarahan I agree, this is in general a nice advantage. I asked, because the situation is different for bioconda. There, we have a rather controlled collaborative community, and it is much more convenient to have all recipes in one repository (e.g. for toposorting builds). |
Yeah, the one thing we don't have figured out well yet is how to edit multiple recipes at once. For aggregating them and building them as a set, I think conda-smithy + ProtoCI abstract away the difficulties with one repo per recipe. |
But if you build them as a set, you have the problem with job limits in the CI again, haven't you? |
Yeah, I figure the nested directory structure needs to be addressed. Otherwise adding jinja template handing is probably valuable no matter where it is used, no? |
Absolutely. In case you missed it, @pelson has a nice snippet at conda-forge/shapely-feedstock#5 (comment) |
Well, one could consider some sort of debouncing to handle this. Namely even though one has made the change together and one is submitting them all ultimately, we manage submissions/builds somehow so that they are staggered. This will likely require some thought, but it is useful for some workflows with the recipes. |
With anaconda.org, we don't have artificial limits. There are still strange practical limits - like logs that get too large end up making web servers time out. These are tractable problems. |
Interesting, thanks for the link. I'll take a closer look. |
@msarahan, I know, you don't have these limits, but my understanding was that anaconda.org cannot out of the box build recipes as a set, right? You have to register an individual trigger for each of them? And then, their order of execution is no longer determined, and they can't depend on each other. Or am I missing something here? |
@johanneskoester there would need to be some intermediate representation as a collection of recipes. Then that ProtoCI tool would then be able to build things that changed. It is written to build packages based on which packages are affected by a git commit. Here, obviously only one recipe could trigger, rather than many changing at once. That does not affect its ability to build requirements dependencies, though - and they'll be built in topological order. |
Hello! I am wanting to let you all know that the Right now our plan is to handle the migration in stages, with the end goal of fully moving off the
With regards to the 3rd item on the list, I was hoping to get a discussion as to how to handle the additional tools some of our packages would need into conda-forge. Here are the capabilities as I see them we would need and would like to discuss:
I would also like to discuss the possibility of supporting extended Docker images which is how we are currently handling step 1 of our migration process, although this would have to be highly regulated to ensure compatibility across conda-forge. There is alot to digest here, so I'm more than happy to split off parts of this to other issues as needed. Pinging @jchodera |
@Lnaden this will made my day! Awesome! A few of the omnia packages are already in BioConda which is using the same Docker Image as conda-forge, so it might be easier to port the BioConda package or use BioConda as well. |
@Lnaden that is really good news. We are going to hold meeting tomorrow see https://gitter.im/conda-forge/conda-forge.github.io?at=58fe073ccfec9192726d5141 and https://conda-forge.hackpad.com/conda-forge-meetings-2YkV96cvxPG for the details. If you are available at that time try to participate. |
@bgruening It may take a while to migrate all the packages since we will be relying on individual package developers to move their own packages, but the first step will be to force them to add To see everything we are doing differently, you can see the extended linux-anvil we are preparing to use not merged in yet. Its a bit crude since I know several packages are already installed but here is the list of things we need as of now: yum: of which I know libXext, libSM, libXrender, and groff are already installed based on our Docker build log. dkms and libvdpau are for the CUDA toolkits texlive GPU Related
Not every package however needs all of those additions. The big one is OpenMM (openmm.org) which requires the GPU files, lots of the TeX, and some of the yum packages. Because we have a number of tools which rely on OpenMM though, we won't be able to move a number of the packages until we can move it first. The packages which do not depend on OpenMM should be much easier to move over, and some packages such as mdtraj moved quite a while go. |
@ocefpaf I should be able to attend the meeting. I also want to ping @peastman @mpharrigan and @rmcgibbo who may also be interested in being on that meeting if they can (5-6 PM UTC) |
I suspect we may be able to avoid requiring |
@Lnaden any reason you need those libraries inside the container, from a short look it seems that many of your libs already package, like the X.org stack, wget, perl - so no need to put this into the container. |
@bgruening The split Docker image is a port from the CentOS 5-based Docker file here and here and I have not refined it yet. I fully acknowledge there are packages which are already part of the base conda-forge |
I tried some time ago to get this to install |
Yeah, just add a
We should talk more about this in a separate issue. Maybe on the webpage repo. It would be good to hear what packages are required from this source. Generally we have been moving to use |
I would love if all distros actually packaged their own tex packages themselves, and I would very much prefer not to rely on a conda-based TeX install. From my very brief testing of the The only thing I think the packages in omnia use TeX for is building the docs to ship with the build. Maybe an option is to make TeXLive available on the image so packages can add
I have seen several packages do this. My larger concern is the lack of documentation on this feature, the best I could find was mention of it on conda-smithy's page of the docs.
Happy to. I know at least DKMS and libvdpau from EPEL is needed to make the CUDA toolkits work correctly, so if we can solve the GPU libraries instead, that may remove the need to get EPEL and allow us to shift away from heavy |
Yeah, documentation is a weak point for conda-forge still. That said, @pelson and I wrote the |
Basically this has been a problem of licensing. The same issue occurs with MKL for instance. It's been awhile since I've looked into it. I know that Continuum was able to get permission to build and distribute CUDA and cuDNN libraries. Though they are using CUDA 7.5 not 8 currently. If there is only one or two low-level packages like that, we could request they get build and added to Alternatively we could try emailing NVIDIA directly and asking how to use their toolchain in CIs and how to distribute the CUDA libraries. Considering the size and audience of conda-forge, I think NVIDIA would be interested in a mutually beneficial solution. Would you have interest in writing such a letter? We could put this in a Google Doc to get feedback from others. Haven't looked into AMD's OpenCL SDK yet. Though am getting the impression that sending an email their way wouldn't hurt either. In any event, let's move this discussion over to issue ( conda-forge/conda-forge.github.io#63 ). |
Continuum is distributing the CUDA and cuDNN libraries and headers in the |
To answer @bgruening's question:
When we set up our
@jakirkham : The first thing I would do would be to bring in Mark Berger, the Senior Alliance Manager for Life/Materials Sciences at NVIDIA, to help connect us directly with someone at NVIDIA who can help negotiate any technical/licensing issues. He is incredibly supportive of building an ecosystem for scientific computing that can exploit NVIDIA hardware (that's his job!), and making CUDA-enabled packages more accessible could only further that goal. We can help initiate that contact by email on our side as needed---@Lnaden can help coordinate. |
Thanks for the details @jjhelmus. Maybe we should come up with some scripts to automate this process on the CIs. If they just live in CI builds, that should minimize the chances of accidentally distributing these.
That's fantastic @jchodera. Please do. |
Do any of the CI providers have nodes with NVIDIA GPUs? I do not know if these would be strictly needed to compile GPU packages but they would be needed to properly test the packages. |
That is one problem we have with OpenMM right now, our builds on Travis and Appveyor cannot test the GPU components correctly. Our current process involves building and uploading OpenMM on a IIRC you can set up local Jenkins tests to handle the GPU, but I think that requires local physical boxes you can set up with private access, and clearly not a viable solution for conda-forge |
At Bioconda, we are currently experimenting with Buildkite. Basically, Buildkite provides an interface and management framework for CI agents that are deployed to local machines. This has two major advantages
With such an approach, security of the used systems becomes important of course. For example, they should always be in some kind of DMZ, and build jobs should be executed in a volatile, containerized or virtualized environment. |
How does Buildkite compare to say Concourse CI, @johanneskoester? Asking because @msarahan has been working on using Concourse CI to build ref: https://github.com/conda/conda-concourse-ci |
On the surface, they look much the same. The main reason I chose Concourse
was the ability to dynamically create batch jobs. We have the ability to
monitor a git repo of many recipes (or subrepos), and build all the ones
that have changed. Having the aggregate view be the working set of recipes
also means we can do neat things like testing n steps downstream from a
newly built package to make sure things still work.
…On Mon, Jun 26, 2017 at 1:28 PM, jakirkham ***@***.***> wrote:
How does Buildkite compare to say Concourse CI <https://concourse.ci/>,
@johanneskoester <https://github.com/johanneskoester>? Asking because
@msarahan <https://github.com/msarahan> has been working on using
Concourse CI to build conda packages.
ref: https://github.com/conda/conda-concourse-ci
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#299 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACV-ZnHKpinzMEFPuoSLq75dDclHNPEks5sH_hMgaJpZM4IDngh>
.
|
Indeed, looks similar. With buildkite, dynamic batch jobs are possible as well (via their agent tool). I don't know if concouse makes this easier though. Note that we have stopped our experiments with builkite though. We mainly were interested because of the build time limits on Travis, but we found a way to circumvent that nicely within Travis CI. For large bulk updates, we have simply one fixed branch. When we update our global pinnings on that branch, all affected recipes are rebuilt. Whenever a built succeeds, the package is uploaded to anaconda. Since hundreds of recipes can be affected, the build time limits can be exceeded. However, since we upload immediately after success, no recipe has to be built twice. Instead, we can simply fix all failed recipes, and trigger a rebuild by pushing the fixes. Now, in the next iteration, new recipes will be built in addition to the fixed ones, and so on. By this, we can simply reduce the number of remaining recipes to zero with a couple of iterations on the bulk branch. This worked really well when recently updating Python to 3.6 and R to 3.3.2. And we have all recipes in a single repository, so it is easy to keep track of the dependency DAG in order to build in the correct order. |
Closing as I think this has served its purpose. Happy to continue relevant discussions in new issues. |
We all love conda and there are many communities that build awesome packages that are easy to use. I would like to see more exchange between these communities to finally share more build-scripts, to develop one best-practice guide and finally to have channels that can be used together without breaking recipes - a list of trusted channels with similar guidelines.
For example the bioconda community - specialised on bioinformatic software. They have some very nice guides how to develop packages, they are reviewing and bulk-patches recipes if there are new features in conda to make the overall experience even better.
ping @johanneskoester, @daler and @chapmanb from BioConda fame
Omnia has a lot of cheminformatic software and a nice build-box based on phusion/holy-build-box-64 + CUDA and AMD APP SDK.
ping @kyleabeauchamp, @jchodera
With conda-forge there is now a new one and it would be great to get all interested people together to join forces here and don't replicate our recipes or copy them from one channel to the other just to make them compatible.
Another point is that we probably want to move recipes to
default
at some point and deliver our work back to Continuum - so that we can benefit from each other.I can imagine that we all form a group of trusted communities and channels and activate them by default in our unified build-box - or we have one giant community channel. All this I would like to discuss with everyone that is interested and come up with a plan how to make this happen :)
What do you all think about this?
As a next step I would like to create a doodle to find a meeting data where at least one representative from all communities can participate.
Many thanks to Continuum Analytics for there continues support and the awesome development behind scientific python and this package manager.
ping @jakirkham @msarahan
The text was updated successfully, but these errors were encountered: