Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upgrade binder to v2.1.0 #5

Open
wants to merge 1 commit into
base: master
Choose a base branch
from
Open

Conversation

alimanfoo
Copy link
Member

This PR upgrades the binder submodule to v2.1.0. This brings in a number of minor version increases and deals with some compatibility issues.

@alimanfoo alimanfoo requested review from idwright and slejdops August 7, 2019 13:34
Copy link

@idwright idwright left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about having the pinned and unpinned versions of the requirements files

Have you considered using either the upgrade option to install or stripping out the pinned versions using e.g. sed first rather than duplicating the package lists?

@alimanfoo
Copy link
Member Author

alimanfoo commented Aug 7, 2019

I'm not sure about having the pinned and unpinned versions of the requirements files

Have you considered using either the upgrade option to install or stripping out the pinned versions using e.g. sed first rather than duplicating the package lists?

Hi @idwright, only the pinned files are used here. The pinned files are needed to ensure a reproducible environment between different systems.

The unpinned requirements files are there to capture the set of packages that we want to be included. They are not used here, the only place they are used is to periodically recreate the pinned files, which we do periodically when we need to upgrade to newer versions, or when packages need to be added. I.e., they are an internal tool used only to help with maintenance of binder.

Happy to explain further, there's a good reason I promise :-)

@alimanfoo
Copy link
Member Author

Just to explain further, there is a CI-driven process which generates the pinned files from the unpinned files. Only the pinned files should ever be used when creating environments, to ensure reproducibility.

@idwright
Copy link

idwright commented Aug 7, 2019

I got that the unpinned files are only used for upgrading.

I just don't like having more than one file with a list of packages, especially unpinned.

Can see there's a bit of a problem with more than one OS...

@alimanfoo
Copy link
Member Author

As a maintainer of malariagen/binder, I need to be able to specify which packages should be present in the environment. I do that by manually editing the unpinned requirements files. These are split into three files (requirements-conda-forge.txt, requirements-bioconda.txt, requirements-pypi.txt) to help me remember which packages are drawn from which software channels.

Again as a maintainer of malariagen/binder, I need a process for generating a fully pinned environment definition, including both the packages I know I need and all of their dependencies. This pinned file is what is then provided to users, to create an environment on their own systems. This is pinned and includes all dependencies because it needs to be reproducible so that all users have exactly the same environment.

We have to support users on both linux and mac. This adds a complication because conda has separate channels for linux and mac. So it's not possible to have exactly the same environments for linux and mac. However we want to get as close as we can.

So we have a CI process which takes (requirements-conda-forge.txt, requirements-bioconda.txt, requirements-pypi.txt) as inputs and generates (environment-pinned-linux.yml, environment-pinned-osx.yml) as outputs. Using a CI process means we can get as close as possible to the same environment for both linux and mac.

Users should only ever use the pinned files. This is handled here in datalab-image because the dockerfile only uses environment-pinned-linux.yml. Any users installing malariagen/binder on their own systems will do so via a bash script we provide (install-conda.sh) which again ensures that the appropriate pinned environment file is used. We are using malariagen/binder both for datalab and for environments on individual's computers, and we are trying to get the same environment across all systems, so analyses can be run reproducibly anywhere.

Believe me I have considered many alternative strategies here and sunk many hours into this (hence the rant :-). This is the best solution I have found so far that can give us fully reproducible environments across both linux and mac computers and on datalab, and is also reasonable to maintain. It's not perfect but it does what we need. But if you can come up with a better solution that satisfies the same requirements, you are more than welcome to try!

@idwright
Copy link

idwright commented Aug 8, 2019

OK - thanks - so long as it works for you.
It's not always easy to tell what's hard from a quick review, but doesn't seem like it's worth more time
I can see that the end product is pinned files, and trying to get as close as possible to the same environment across ecosystems, which is what matters.

@alimanfoo
Copy link
Member Author

@idwright thanks, sorry for the rant, my frustration comes from how hard this has been to achieve. Maybe we should adapt that old computer science joke, something like, "There are only two hard things in bioinformatics: reproducibility, naming things and off-by-one errors."

@idwright
Copy link

idwright commented Aug 8, 2019

@alimanfoo I think that's one reason why docker has become so popular - makes it easier to have the same environment especially when a lot of developers seem to have macs but servers are linux.
Still feels like package management isn't really a solved problem even with conda, pip, maven, gradle, npm, yarn etc, etc - there's not really a solution to sub-dependencies especially if a particular package stops being maintained.

@alimanfoo
Copy link
Member Author

alimanfoo commented Aug 8, 2019 via email

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants