
Policy: One recipe per repo or multiple recipes per repo? #25

Closed
jankatins opened this issue Feb 10, 2016 · 10 comments

Comments

@jankatins
Contributor

[This is mainly pulled from: https://github.com/conda-forge/conda-smithy/pull/51#issuecomment-181910644 and the following discussion.]

Single recipe per repo

Pro

  • tooling for the builds (conda-smithy and the CI scripts) scales, as it is only one recipe in each repo and build attempt
  • fine grained permissions per recipe (=repo)
  • no broken builds due to unrelated packages
  • build status per package

Con:

  • needs tooling to set up (how does an outsider without rights in the conda-forge org set up a new recipe?)
  • multi-package changes need to be coordinated
  • high cost of learning all the setup steps for new contributors
  • high complexity of the setup scripts (e.g. setting up a new recipe needs changes in the conda-forge org, which are managed by conda-smithy)
  • one ends up with hundreds of GitHub repos -> finding the right repo for the package can be error-prone for the user
  • higher cost when the CI scripts need updating (or a new Python version comes out) -> all repos need to update
  • high monitoring cost for maintainers, as they have to look into a lot of repos for PRs/issues instead of just one

Both

  • needs tooling to build a website which lists all packages
  • on the local file system: confusing, either because of hundreds of checked out git repos or because of hundreds of recipe-subdirs

Multiple recipes per repo

Pro

  • Multi-package changes work out of the box (conda-build-all handles the build order)
  • Changes by contributors can be reviewed by maintainers in one place
  • Only one repo to handle for updates of the CI scripts on the GitHub side
  • easier for new contributors: simply add a PR for the new recipe in a subdir
  • needs less tooling, as the usual GitHub UI can be used (add a new committer with push access to a repo) -> also more similar to the rest of the GitHub workflow

~

  • can be done with hand-crafted CI scripts (= without conda-smithy). This gives more flexibility but also means that it's a manual workflow from that point on...

Con

  • Not sure how many recipes per repo can be handled by the CI code and conda-build-all. As long as each PR submits one recipe (and all earlier recipes are built) it's probably fine, but what happens on the release of Python 3.6...?
  • unrelated changes can break the builds for all recipes in the channel
  • build badges only for the whole repo, not individual recipes
  • Only broad permissions: committer with push access for all recipes and normal contributor who can submit PRs [this could be managed by splitting into multiple repos by themes, but this will give some of the disadvantages from a single recipe per repo policy]

[I will add pro and cons as they become available in the comments]

@jankatins
Contributor Author

I'm in the "multi recipes per repo" camp, mainly for the ease when adding a new recipe: one PR which submits 3 files in a new subdir is usually enough. This makes it easier for new contributors and it does not need so much new tooling (both to develop and to learn).

@ChrisBarker-NOAA
Contributor

How would they be grouped?

by discipline? by maintainer? ???

I know it was Phil's inspiration to do one recipe per repo -- I know he had some good reasons....

@pelson
Member

pelson commented Feb 11, 2016

The single repo, many recipes, ship sailed long ago ⛵. @ocefpaf, others and myself have maintained these kinds of repositories for over a year with great success. We know how they work, and they remain completely viable as individual silos (e.g. IOOS, conda-recipes-scitools, etc.). Their huge drawback is lack of reach. Anybody who knows how to make a conda recipe and use GitHub should be able to contribute and maintain a recipe. There is no group in the world with the resources and expertise to maintain a large range of build recipes on their own - the only way of doing it is to have a collaborative, de-centralised approach to maintenance (just ask Enthought and Continuum how hard it is).

The only real discussion here is whether some feedstocks should have more than one recipe in them - that may be a pragmatic decision we make in the future for things like meta-packages, but we are by no means there yet. When the use case arises, we will have to re-assess, and determine whether to fix the problem with tooling, or with multi-recipe feedstocks.

As you well know, the tools to make a self-building repository containing many recipes are well developed (i.e. conda-build-all), and it isn't hard to take a feedstock's CI scripts and re-purpose them (as you have proposed in conda-forge/conda-smithy#51). The reason that is a limited idea is that it simply does not scale beyond 10s of recipes, and even then, it can only work when continuous integration doesn't build every recipe each time. By separating to one recipe per repository (and allowing branches, it should be said), any change to the repository triggers the recipe to be built. It's still not perfect (because dependencies can still move under your feet), but it gives a degree of confidence that any change to a repository doesn't have an adverse effect on the built distribution.

@JanSchulz - take a look at https://conda-forge.github.io - there is not a single mention of conda-smithy. Sure, if you want to know how conda-forge works, you need to know of its existence, but there have already been feedstocks created by users without them even knowing about conda-smithy. The key here is tooling and automation - it has been clear from day one that if conda-forge is to fly, there has to be a huge amount of automation. There is still plenty to do, but the riskiest parts are now complete. We already have automated: github repository creation, CI registration, CI scripting, github team management, feedstock listing (http://conda-forge.github.io/feedstocks.html), build matrix computation etc. etc..

As I say, for me, this simply isn't a discussion on policy. The decision was made before a line of code was written for conda-forge; it is worth remembering that the capability that multi-recipe repositories gives has been available for over a year without a canonical community led channel being established.

@jankatins
Contributor Author

The reason that is a limited idea is that it simply does not scale beyond 10s of recipes, and even then, it can only work when continuous integration doesn't build every recipe each time.

This is a bummer :-(

My main concern with the "one recipe per repo" policy is that it also doesn't scale, but on the human side: attracting new contributors, and from a security standpoint.

There is currently no mention of "how to submit your own recipe" on https://conda-forge.github.io/ and actually I have no real idea how you would do that (and others seem to have similar problems: #8, conda-forge/staged-recipes#59). I currently have two guesses about how this might work:

Submit as a new PR to staged-recipes

This is from my experience of submitting msinttypes, which got turned into a feedstock.

  • It gets reviews (+) and the contributor does not need to setup any infrastructure (e.g. get an access token to the conda-forge anaconda channel; nor fiddle with github tokens, etc)
  • It gets turned into a feedstock by some magic (manual? Is there a server running which does this?)
  • The magic sets the conda-forge access token in encrypted form, so the original contributor does not have access to it (+).

But now I'm subscribed to a GitHub repo I have no idea about, apart from a few GitHub info mails. It's also a completely different repo than the one I initially contributed to: where do I add future patches? To the staged-recipes repo (which now has this recipe removed) or to the new feedstock repo? And if it is to the feedstock, I now have to learn the conda-smithy / CI internals to understand how it works if I want to maintain that part of the repo -> this is confusing for new contributors.

Submit a Request to get a new repo

  • Add a request somewhere (probably here) to get access to a new repo or the org to create new repos
  • Push your recipe there
  • Now some magic (e.g. manual?) has to run the init code to set up the repo (or you need to give the conda-forge access token to the new contributor)

Alternative:

  • submit a request for a new repo with a pointer to the recipe (which you have to host somewhere)
  • some magic does the setup by downloading the recipe and doing the complete conda smithy on it.

This means that the initial request is not properly visible as a PR in github, which makes it harder to review.

In both scenarios, some magic (e.g. tooling or manual) has to do the major steps of setting up the feedstock and transfer the rights.

[Edit: changed, as it seems I only have pull access to the repos I'm a team member of. Sorry, I misread the sent-out mails as saying I have access now to the msinttypes repo]
In both scenarios, the group of maintainers now has to work with a lot of small repos to monitor for issues and PRs instead of one big one (if you give push access to the repo, you basically give each contributor access to the whole conda-forge channel).

@jankatins
Contributor Author

I wonder if it would make sense to take the resolver from conda-build-all and, instead of building with conda-build-all, simply submit the recipes one by one in the right order via the Python equivalent of anaconda build submit ..... This would mean that the main repo only needs one CI service...

That would give the easy handling of the single repo and the relatively easy and safe building by Continuum. If a build fails, we also have a simple way to restart by using the same command locally.
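The ordering step proposed above can be sketched with a topological sort over the recipes' dependency graph. A minimal sketch, assuming a hand-written dependency map and a hypothetical submit step (the actual resolver in conda-build-all derives dependencies from each recipe's meta.yaml, which is not shown here):

```python
# Sketch: topologically sort recipes so each one is submitted only
# after everything it depends on. Recipe names below are illustrative.
from graphlib import TopologicalSorter  # Python 3.9+

# recipe -> set of dependencies that are also recipes in this repo
recipes = {
    "libgeos": set(),
    "gdal": {"libgeos"},
    "shapely": {"libgeos"},
    "fiona": {"gdal"},
}

# static_order() yields each node after all of its predecessors
build_order = list(TopologicalSorter(recipes).static_order())
print(build_order)  # dependencies come before their dependents

for name in build_order:
    # Hypothetical submission step, one recipe at a time, e.g.:
    # subprocess.run(["anaconda", "build", "submit", name], check=True)
    pass
```

`graphlib.TopologicalSorter` also raises `CycleError` on circular dependencies, which would surface broken recipe graphs before any CI time is spent.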

@ChrisBarker-NOAA
Contributor

I've been thinking that while a large repo with more than tens of recipes is untenable, there are also times where a closely related set of recipes might be better served by being together: say a c-lib and the python package that wraps it, or...

libgeos, gdal, shapely, fiona

comes to mind (even though those are ideally maintained by Continuum...)

So it would be nice if the tooling supported either option.

@pelson
Member

pelson commented Feb 15, 2016

There is currently no mention of "how to submit your own recipe" on https://conda-forge.github.io/

[screenshot: add_a_recipe]

if you give push access to the repo, you basically give each contributor access to the whole conda-forge channel

This is a concern I share. In actual fact, the small feedstock repos give a better opportunity of mitigating this in the future.

But now I'm subscribed to a github repo I have no idea about apart from a few github info mails.

Agreed. Docs are the answer to this - it isn't an unreasonable requirement for the capability provided.

It's also a completely different repo than the one I initially contributed to: where do I add future patches?

The feedstocks. This is a documentation issue.

And if it is to the feedstock, I now have to learn the conda-smithy / CI internals to understand how it works if I want to maintain that part of the repo

This isn't true. It is not necessary to use conda-smithy to maintain a feedstock, though from time to time, a PR will go by which updates the CI scripts generated by conda-smithy.

As I've said before @JanSchulz, your concerns aren't unfounded, and I've already felt many of them (though it is always good to see a different perspective on them). I implore you to have a go with conda-forge, remember that we are still on the sharp edge of capability, and note down your experiences - it is the rough edges that you come across that we need to smooth, be it through documentation or processes/tooling.

I'm feeling kind of done with this issue, and won't be commenting on it again in the short term. The only way that my stance will change on the single-recipe-per-repo policy is when it is absolutely clear that it doesn't work, or adds an untenable degree of complexity. For now, the benefits of the approach are beginning to bear fruit (e.g. 12 maintainers of conda-forge recipes, compared to 7 on IOOS and SciTools combined), and I'd like to give the approach a genuine shot at success.

@ocefpaf
Member

ocefpaf commented Feb 15, 2016

I'm feeling kind of done with this issue, and won't be commenting on it again in the short term. The only way that my stance will change on the single-recipe-per-repo policy is when it is absolutely clear that it doesn't work, or adds an untenable degree of complexity.

I am here only to add another data point. I pretty much feel the same. We tried the single repo and now it is time to try multiple repos. I see advantages and disadvantages to both approaches, but without a real experience with multiple repos I can't really say anything for sure.

For now, the benefits of the approach are beginning to bear fruit.

👍

@jankatins
Contributor Author

Re how to add:

I only see this:

[screenshot: 2016-02-15_123121]

@pelson
Member

pelson commented Feb 18, 2016

I only see this

Try clicking the down arrow next to the words "report an issue with".
