-
Notifications
You must be signed in to change notification settings - Fork 2
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Upgrade binder to v2.1.0 #5
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure about having the pinned and unpinned versions of the requirements files
Have you considered using either the upgrade option to install or stripping out the pinned versions using e.g. sed first rather than duplicating the package lists?
Hi @idwright, only the pinned files are used here. The pinned files are needed to ensure a reproducible environment between different systems. The unpinned requirements files are there to capture the set of packages that we want to be included. They are not used here, the only place they are used is to periodically recreate the pinned files, which we do periodically when we need to upgrade to newer versions, or when packages need to be added. I.e., they are an internal tool used only to help with maintenance of binder. Happy to explain further, there's a good reason I promise :-) |
Just to explain further, there is a CI-driven process which generates the pinned files from the unpinned files. Only the pinned files should ever be used when creating environments, to ensure reproducibility. |
I got that the unpinned files are only used for upgrading. I just don't like having more than one file with a list of packages, especially unpinned. Can see there's a bit of a problem with more than one OS... |
As a maintainer of malariagen/binder, I need to be able to specify which packages should be present in the environment. I do that by manually editing the unpinned requirements files. These are split into three files (requirements-conda-forge.txt, requirements-bioconda.txt, requirements-pypi.txt) to help me remember which packages are drawn from which software channels. Again as a maintainer of malariagen/binder, I need a process for generating a fully pinned environment definition, including both the packages I know I need and all of their dependencies. This pinned file is what is then provided to users, to create an environment on their own systems. This is pinned and includes all dependencies because it needs to be reproducible so that all users have exactly the same environment. We have to support users on both linux and mac. This adds a complication because conda has separate channels for linux and mac. So it's not possible to have exactly the same environments for linux and mac. However we want to get as close as we can. So we have a CI process which takes (requirements-conda-forge.txt, requirements-bioconda.txt, requirements-pypi.txt) as inputs and generates (environment-pinned-linux.yml, environment-pinned-osx.yml) as outputs. Using a CI process means we can get as close as possible to the same environment for both linux and mac. Users should only ever use the pinned files. This is handled here in datalab-image because the dockerfile only uses environment-pinned-linux.yml. Any users installing malariagen/binder on their own systems will do so via a bash script we provide (install-conda.sh) which again ensures that the appropriate pinned environment file is used. We are using malariagen/binder both for datalab and for environments on individual's computers, and we are trying to get the same environment across all systems, so analyses can be run reproducibly anywhere. Believe me I have considered many alternative strategies here and sunk many hours into this (hence the rant :-). This is the best solution I have found so far that can give us fully reproducible environments across both linux and mac computers and on datalab, and is also reasonable to maintain. It's not perfect but it does what we need. But if you can come up with a better solution that satisfies the same requirements, you are more than welcome to try! |
OK - thanks - so long as it works for you. |
@idwright thanks, sorry for the rant, my frustration comes from how hard this has been to achieve. Maybe we should adapt that old computer science joke, something like, "There are only two hard things in bioinformatics: reproducibility, naming things and off-by-one errors." |
@alimanfoo I think that's one reason why docker has become so popular - makes it easier to have the same environment especially when a lot of developers seem to have macs but servers are linux. |
Yes docker is the only other viable option I think. We tried it for a while
but there were challenges both for users and for maintainers, and we backed
off in favour of using conda directly. At least with the pinned conda
environment files we have now, they include all dependencies of all
requested packages, pinned to version and build numbers, i.e., they are
complete, and so should give reproducible results no matter where or when
they are used to create a conda environment.
…On Thu, 8 Aug 2019, 09:38 idwright, ***@***.***> wrote:
@alimanfoo <https://github.com/alimanfoo> I think that's one reason why
docker has become so popular - makes it easier to have the same environment
especially when a lot of developers seem to have macs but servers are linux.
Still feels like package management isn't really a solved problem even
with conda, pip, maven, gradle, npm, yarn etc, etc - there's not really a
solution to sub-dependencies especially if a particular package stops being
maintained.
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub
<#5?email_source=notifications&email_token=AAFLYQR35QL6YJE2E4Z4FWTQDPLO5A5CNFSM4IKAOQU2YY3PNVWWK3TUL52HS4DFVREXG43VMVBW63LNMVXHJKTDN5WW2ZLOORPWSZGOD3243YA#issuecomment-519425504>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AAFLYQQ3HBG3RPIFA2ADDVDQDPLO5ANCNFSM4IKAOQUQ>
.
|
This PR upgrades the binder submodule to v2.1.0. This brings in a number of minor version increases and deals with some compatibility issues.