
add R packages from the Galaxy TS #574

Closed
wants to merge 1 commit into from

Conversation

bgruening
Member

No description provided.

@daler
Member

daler commented Jan 8, 2016

Hi Björn -

We've been separating bioconductor packages from CRAN packages using bioconductor- and r- recipe prefixes. We don't have ballgown, but DESeq2 is already under
https://github.com/bioconda/bioconda-recipes/tree/master/recipes/bioconductor-deseq2.

Check out the scripts/bioconductor dir for some tools that make bioconductor recipes easier -- for example, all those wget calls in this PR's build.sh can be replaced by dependencies in the meta.yaml.
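To illustrate the point (a minimal sketch; the package names and structure below are assumptions, not this PR's actual recipe): instead of wgetting each dependency tarball inside build.sh, the recipe's meta.yaml can declare the dependencies and let conda resolve and install them:

```yaml
# Hypothetical meta.yaml fragment: dependencies declared here replace
# the wget calls in build.sh, since conda installs them before the build.
package:
  name: bioconductor-ballgown
  version: "1.0.3"

requirements:
  build:
    - r
    - bioconductor-genomicranges   # previously fetched via wget in build.sh
    - bioconductor-rtracklayer
  run:
    - r
    - bioconductor-genomicranges
    - bioconductor-rtracklayer
```

The build.sh then shrinks to the single R CMD INSTALL step for the package itself.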

@bgruening
Member Author

@daler thanks for the info. I have seen this and will use it for new versions. What I'm trying here is to migrate Galaxy versions to bioconda. This means that most of the packages are old ones and have old dependencies. The aim is to keep the installation identical to the Galaxy ones, i.e. every single package at the exact version. It will get more crazy as I migrate more packages. This was only a test - more to come :)

Do you think this is a workable solution as a migration path?
New packages (that do not need to be migrated) will for sure use your scripts. Talking about these scripts: you do not pin exact versions, and we have had a lot of trouble with that in the past. Can we switch over to making strict versioning required?

I will change the prefix in my migration scripts, to fit the naming schema.

Talking about the deseq2 package, it has some additional dependencies on rjson that are needed for Galaxy integration. Do you have any idea how we can solve this? A new namespace, maybe gx at the end? A completely new conda channel?

@bgruening
Member Author

@daler I changed the namespace and added a gx tag if there is a version clash. Please let me know what you think about it :)

@daler
Member

daler commented Jan 8, 2016

Ah, I see. Is the rjson dependency you mention an additional dependency required by Galaxy? That is, is this a patched version of DESeq2 that only Galaxy would use, or is it the default v1.8.2?

If the latter, can the recipe go in older-version-subdirectories as you've done in the other parts of the migration you're doing? That way others could easily find the older DESeq2 versions without having to know they need to add the gx tag to get the older version.

I guess it boils down to either wgetting all specific dependencies in the gx-deseq2 package's build.sh for a version that only Galaxy would use, or creating all the older dependency recipes for general use from bioconda. As long as these older dependencies are standard Bioconductor packages and not patched specifically for Galaxy, I think the latter might be better. For example, how many of the Galaxy recipes depend on IRanges? Might as well have a conda recipe for it for speedy package building and use by other packages.

But, as you mention, you need to be able to pin versions in the bioconductor recipe scripts to make this work easily. This would be a really helpful feature and it should be simple to add. If a version is specified on the command line, the skeleton script can grab it directly from bioaRchive (as long as that version is available).

@johanneskoester
Contributor

I deleted my previous comments because I misunderstood something.
I agree with Ryan: We don't want to have all the dependencies integrated into one package. This defeats the purpose of a package manager.

@bgruening
Member Author

> Ah, I see. Is the rjson dependency you mention an additional dependency required by Galaxy? That is, is this a patched version of DESeq2 that only Galaxy would use, or is it the default v1.8.2?

No, no patched version. Everything is the standard package, mirrored to various places. Because of this we created bioaRchive and the cargo-port. It's just a different mix of packages that we would like to keep available forever.

> If the latter, can the recipe go in older-version-subdirectories as you've done in the other parts of the migration you're doing? That way others could easily find the older DESeq2 versions without having to know they need to add the gx tag to get the older version.

It already is in a subdir. Should I overwrite the version you currently have with this one, increasing the build number?

> I guess it boils down to either wgetting all specific dependencies in the gx-deseq2 package's build.sh for a version that only Galaxy would use, or creating all the older dependency recipes for general use from bioconda. As long as these older dependencies are standard Bioconductor packages and not patched specifically for Galaxy, I think the latter might be better.

I totally agree, it's just a little bit hard to do this for all the old packages we currently have, for little gain, because most of them are old.

> For example, how many of the Galaxy recipes depend on IRanges? Might as well have a conda recipe for it for speedy package building and use by other packages.

Sure, I will add this, especially if we create new packages primarily with conda and not with our own build system. The real problem comes into play with our strict versioning. The current R packages don't have this, which makes it easy to update them. But if you add a strict version requirement, you need to update all DESeq2 dependencies in a new version, and the dependencies thereof. In practice this means that with every update of DESeq2 all packages need to be updated (if you want the latest version of all dependencies). This is hard to do; that's the reason we are shipping such bundles in Galaxy.

The entire bioaRchive project is related to this issue but also this PR: bioarchive/aRchive_source_code#20

> But, as you mention, you need to be able to pin versions in the bioconductor recipe scripts to make this work easily. This would be a really helpful feature and it should be simple to add. If a version is specified on the command line, the skeleton script can grab it directly from bioaRchive (as long as that version is available).

Yes, I think so. But do we want to have so many versions? My guess is that most of the small dependencies (+ version) are only used once, for this specific package.

Btw. I have the same problem with Python packages. If I depend on pbr ==1.8.0 and there is no package, should I add it to bioconda, wait until conda accepts a PR, or include it in the build.sh script?

Requiring strict versioning is such a pain :(

@bgruening
Member Author

> I deleted my previous comments because I misunderstood something.
> I agree with Ryan: We don't want to have all the dependencies integrated into one package. This defeats the purpose of a package manager.

Please see my other post. This is hard to achieve if you want strict versioning, and it will create a massive amount of new packages. Moreover, it means we will have packages in bioconda that are already in conda with a different version.

That said it can be done. Just not sure if it's worth the effort.

Regarding your comment about the package manager: this is not so uncommon; remember your LaTeX packages. Debian also ships them in larger units, and when Fedora changed this there was a huge discussion. It just makes things unnecessarily hard, especially if you require strict versioning.

@daler
Member

daler commented Jan 8, 2016

> If the latter, can the recipe go in older-version-subdirectories as you've done in the other parts of the migration you're doing? That way others could easily find the older DESeq2 versions without having to know they need to add the gx tag to get the older version.

> It already is in a subdir. Should I overwrite the version you currently have with this one, increasing the build number?

Sorry, I missed that gx was the suffix for the version subdir, as in recipes/bioconductor-deseq2/1.8.2gx. Never mind that part.

> For example, how many of the Galaxy recipes depend on IRanges? Might as well have a conda recipe for it for speedy package building and use by other packages.

> Sure, I will add this, especially if we create new packages primarily with conda and not with our own build system. The real problem comes into play with our strict versioning. The current R packages don't have this, which makes it easy to update them. But if you add a strict version requirement, you need to update all DESeq2 dependencies in a new version, and the dependencies thereof. In practice this means that with every update of DESeq2 all packages need to be updated (if you want the latest version of all dependencies). This is hard to do; that's the reason we are shipping such bundles in Galaxy.

Metapackages might solve this. A deseq2-gx metapackage could specify all of the strict dependencies, each of which would exist individually as a bioconda recipe. Meanwhile the main bioconda DESeq2 package could remain as the up-to-date one with non-strict dependencies.
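A sketch of what such a metapackage recipe could look like (illustrative only; the package name, pinned versions, and the rjson dependency shown here are assumptions, not an actual Galaxy requirement set):

```yaml
# Hypothetical meta.yaml for a deseq2-gx metapackage: no source, no build
# script, just run dependencies pinned to the exact versions Galaxy ships.
package:
  name: deseq2-gx
  version: "1.8.2"

build:
  number: 0

requirements:
  run:
    - bioconductor-deseq2 ==1.8.2   # the generic bioconda recipe, pinned here
    - r ==3.1.2
    - r-rjson                       # extra dependency needed for Galaxy integration
```

Because a metapackage carries no payload of its own, the generic bioconductor-deseq2 recipe can keep its loose constraints while this one freezes the exact combination.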

> But, as you mention, you need to be able to pin versions in the bioconductor recipe scripts to make this work easily. This would be a really helpful feature and it should be simple to add. If a version is specified on the command line, the skeleton script can grab it directly from bioaRchive (as long as that version is available).

> Yes, I think so. But do we want to have so many versions? My guess is that most of the small dependencies (+ version) are only used once, for this specific package.

I'm not familiar with the dependencies in Galaxy. To use the IRanges example, does this mean that different Galaxy Bioconductor packages need different IRanges versions? That is, Galaxy is not frozen to a specific Bioconductor release, but rather each package uses a different custom set of version dependencies?

> Btw. I have the same problem with Python packages. If I depend on pbr ==1.8.0 and there is no package, should I add it to bioconda, wait until conda accepts a PR, or include it in the build.sh script?

The model we've been using is to add it to bioconda.

> Requiring strict versioning is such a pain :(

No kidding!

@bgruening
Member Author

> Sure, I will add this, especially if we create new packages primarily with conda and not with our own build system. The real problem comes into play with our strict versioning. The current R packages don't have this, which makes it easy to update them. But if you add a strict version requirement, you need to update all DESeq2 dependencies in a new version, and the dependencies thereof. In practice this means that with every update of DESeq2 all packages need to be updated (if you want the latest version of all dependencies). This is hard to do; that's the reason we are shipping such bundles in Galaxy.

> Metapackages might solve this. A deseq2-gx metapackage could specify all of the strict dependencies, each of which would exist individually as a bioconda recipe. Meanwhile the main bioconda DESeq2 package could remain as the up-to-date one with non-strict dependencies.

Yes, meta-package is probably the correct name.

> Yes, I think so. But do we want to have so many versions? My guess is that most of the small dependencies (+ version) are only used once, for this specific package.

> I'm not familiar with the dependencies in Galaxy. To use the IRanges example, does this mean that different Galaxy Bioconductor packages need different IRanges versions? That is, Galaxy is not frozen to a specific Bioconductor release, but rather each package uses a different custom set of version dependencies?

Galaxy has no concept of a BioC version. Every tool should be reproducible, so the DESeq2 Galaxy tool depends on a specific R version, with a specific BioC version, with ... hence the bioaRchive idea :)

> Btw. I have the same problem with Python packages. If I depend on pbr ==1.8.0 and there is no package, should I add it to bioconda, wait until conda accepts a PR, or include it in the build.sh script?

> The model we've been using is to add it to bioconda.

Great, then I will add basic Python packages as well.

> Requiring strict versioning is such a pain :(

> No kidding!

@daler @johanneskoester, to exaggerate a little bit: you are voting to have every R package in every version in bioconda, is this correct? Would this even be possible from a technical point of view?

Let's assume we have a package deseq that depends on foo ==3.4 and bar ==1.0. foo, on the other hand, depends on Rcurl ==2.0, but bar depends on Rcurl ==2.1. This would result in a conflict, right?

If conda can handle such cases, I guess it would be easier for us to integrate this directly into bioaRchive instead of doing this on demand and pushing regularly to bioconda.

To defend this PR a little bit: this does not mean we will not have an IRanges package; we will have it as soon as a tool uses it as a primary dependency.

Thoughts?

@daler
Member

daler commented Jan 8, 2016

So in the example dependency conflict:

deseq ==1.8.2
  - foo ==3.4
    - Rcurl ==2.0
  - bar ==1.0
    - Rcurl ==2.1

How do you currently resolve this conflict in Galaxy / bioarchive?

As for all versions of all packages . . . I believe the limiting factor is the storage quota of the bioconda channel. But it also depends on Johannes' vision of bioconda (storage depot/archive vs latest packages).

@bgruening
Member Author

> So in the example dependency conflict:
>
> deseq ==1.8.2
>   - foo ==3.4
>     - Rcurl ==2.0
>   - bar ==1.0
>     - Rcurl ==2.1
>
> How do you currently resolve this conflict in Galaxy / bioarchive?

bioaRchive is just storage and does not have this problem; it's a tarball archive.
In Galaxy we don't have this problem because we have these repository capsules, which include all packages.

> As for all versions of all packages . . . I believe the limiting factor is the storage quota of the bioconda channel.

But also a technical one, if the above really results in a conflict.

> But it also depends on Johannes' vision of bioconda (storage depot/archive vs latest packages).

Yes :) I guess it boils down to if we envision complete reproducibility here.
@johanneskoester?

@johanneskoester
Contributor

So, our original idea was that Bioconda should contain the latest versions and keep the old versions unless we run out of space. I don't think we should now start adding all old versions of all R/BioC packages.
Older versions in subdirectories should only be added if there is a particular need for them. Hence if a package is updated, normally, the primary version will be updated without creating a copy of the recipe in a subdirectory.

Regarding the strict versioning: is there really a need for this? Anaconda does not seem to do it for its own R packages and they work totally fine. I think conda won't be able to deal with the conflicting situation you describe. I am pretty sure even R itself is not capable of something like that, is it? I would prefer to add specific fixed-version dependencies only if they are really needed (e.g. known API incompatibilities). Regarding your need for specific versions in Galaxy, meta-packages or uploading anaconda environments with fixed versions are the way to go. In both cases I would think that a separate Galaxy channel would be a good idea (only for the meta-packages or the environments, not for the packages themselves).
This would allow you to do

conda env create galaxy/someenvironment

to get the combination of versions you need from Bioconda.
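Such a hosted environment would just be a spec file with every version fixed. A minimal sketch, assuming a hypothetical package set (the name, channel layout, and pins below are illustrative, not an actual Galaxy environment):

```yaml
# Hypothetical environment file a Galaxy channel could host; every entry
# is pinned, so recreating the environment always yields the same install.
name: deseq2-galaxy
channels:
  - bioconda
dependencies:
  - r=3.1.2
  - bioconductor-deseq2=1.8.2
  - r-rjson=0.2.15
```

Locally, the same file could be used with something like `conda env create -f environment.yml`.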

@bgruening
Member Author

> So, our original idea was that Bioconda should contain the latest versions and keep the old versions unless we run out of space. I don't think we should now start adding all old versions of all R/BioC packages.

As I said, this was exaggerated :) and will not really happen; we would just need to add all packages that we currently maintain in the Galaxy community.

> Older versions in subdirectories should only be added if there is a particular need for them. Hence if a package is updated, normally, the primary version will be updated without creating a copy of the recipe in a subdirectory.

Oops, I thought this was ok because it's written in the readme. I have added a lot of packages into subdirs over the last few days.

> Regarding the strict versioning: is there really a need for this?

Depends on your needs. If you want reproducibility, yes. The question is usually how far you want to go, but I guess it's always nice to be able to reproduce a manuscript that used some old R 2.x packages / tools in general.

> Anaconda does not seem to do it for its own R packages and they work totally fine. I think conda won't be able to deal with the conflicting situation you describe.

I thought so :(

> I am pretty sure even R itself is not capable of something like that, is it?

Oh no, this is the reason we do all this stuff. R is in this regard a real nightmare, since even the tarballs disappear after some time.

> I would prefer to add specific fixed-version dependencies only if they are really needed (e.g. known API incompatibilities).

You never know beforehand, and it's not really about API compatibility; it's about reproducible results.

> Regarding your need for specific versions in Galaxy, meta-packages or uploading anaconda environments with fixed versions are the way to go.

Meta-packages only work if I put every single R package I need, in a specific version, into the channel, and we need the guarantee that they stay available.

> In both cases I would think that a separate Galaxy channel would be a good idea (only for the meta-packages or the environments, not for the packages themselves).

See above; in this case we need more than the most recent version.
A separate channel is what I wanted to avoid :(

@jmchilton
Contributor

> Older versions in subdirectories should only be added if there is a particular need for them.

Is this only regarding R packages, or all software? I'm happy to open a Galaxy channel for R or Python, because these are libraries and what we do is a bit peculiar. How about general bioinformatics applications, though? The dependency hell you land in when you maintain older versions of software is not as bad in that case. There will be a lot more stuff added to your channel, and you may be uncomfortable with that in some ways, but what you gain in the process is a large community of developers who work hard to maintain recipes, many new users, and stronger claims of reproducibility.

@chapmanb
Member

chapmanb commented Jan 9, 2016

Björn, John, Johannes and Ryan;
It would be awesome to have Galaxy using bioconda for package management. I'm sympathetic to improving how we include previous versions, but also a bit nervous about the maintainability of those build descriptions. Jamming a ton of specific dependencies into a shell script seems tough to maintain and debug.

Could we solve the issue of reproducibility by having Galaxy install from conda dependency files? So for ballgown, instead of that giant recipe you'd have old packages installed separately and do:

bioconductor-ballgown=1.0.3
bioconductor-biocgenerics=0.12.1
bioconductor-s4vectors=0.4.0
bioconductor-iranges=2.0.1

and install via those without needing to add the specific dependency versions to the meta.yaml file (which I agree will probably make conda start on fire).

This way we don't need to maintain all package descriptions separately for all versions for all time, since the pre-built old versions will be available in anaconda.org to reproduce and we never need to build them again. We might need to figure out with Continuum about storage as we get more but I assume they will accept money understanding the noble purposes of Galaxy. You'll also need to do some insane amount of back-porting to get in all the old versions you want, but I know y'all like to torture yourselves so that should be fun.

@bgruening
Member Author

Hi Brad!

I'm not sure I understand your proposal. bioconductor-ballgown=1.0.3 will randomly install the latest versions of all defined dependencies of ballgown. If we install additional packages there is no guarantee that these will be used, will there? I also think we need to take the R version into consideration: specific versions of a package only work with a very specific R version.

Assuming that DESeq2 requires R 3.1, the following code does not work, because of conflicting dependencies:

requirements:
  build:
    - r ==3.1.2
    - bioconductor-shortread
  run:
    - r ==3.1.2
    - bioconductor-shortread

This was one of the reasons we decided to create such collection packages in Galaxy. We can be sure they work in the composition we are shipping them in (R version + package versions ...). In this regard it is not different from the Anaconda installer.
Imho, this is easier to debug and maintain than the meta-package idea, where one package can bring down the entire chain.

Admittedly, this is made for users that want to use ballgown/deseq as a tool and not as a library, and I understand if this does not fit the bioconda philosophy.

@daler @johanneskoester @chapmanb thanks for this discussion! I really appreciate this and I see that both communities have a slightly different focus. It would be awesome if we can find some middle ground we are all happy with.

@daler
Member

daler commented Jan 9, 2016

> I'm not sure I understand your proposal. bioconductor-ballgown=1.0.3 will randomly install the latest versions of all defined dependencies of ballgown. If we install additional packages there is no guarantee that these will be used, will there?

Depends on how the dependencies in bioconductor-ballgown==1.0.3 are defined. For example, say bioconductor-ballgown specifies a minimum GenomicAlignments version but will work with any rtracklayer:

# bioconductor-ballgown
- bioconductor-genomicalignments >=1.0
- bioconductor-rtracklayer

Then this metapackage would work:

# ballgown-gx
- bioconductor-ballgown ==1.0.3
- bioconductor-genomicalignments ==1.2.1
- bioconductor-rtracklayer ==1.26.2

and so will this one:

# ballgown-gx
- bioconductor-ballgown ==1.0.3
- bioconductor-genomicalignments ==1.2.0
- bioconductor-rtracklayer ==1.26.2

but not this one:

# ballgown-gx
- bioconductor-ballgown ==1.0.3
- bioconductor-genomicalignments ==0.9
- bioconductor-rtracklayer ==1.26.2

Currently, the existing bioconductor recipes specify minimum dependency versions only if the original bioconductor package does. Since the author specified it, we assume that the package won't work correctly if the version is too low. But as long as the pinned versions in the metapackage satisfy those minimum versions you're good.
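The constraint-checking logic being described can be mimicked with plain version-tuple comparison. This is a toy model for illustration only, not conda's actual solver, and the function names are made up:

```python
# Toy illustration of why the first two ballgown-gx metapackages above
# resolve while the third does not: a pinned version satisfies a ">="
# constraint exactly when its version tuple is at least the minimum.

def parse(version):
    """Turn a dotted version string like '1.2.1' into a tuple (1, 2, 1)."""
    return tuple(int(part) for part in version.split("."))

def pin_satisfies_minimum(pin, minimum):
    """Does a metapackage's ==pin satisfy a recipe's >=minimum constraint?"""
    return parse(pin) >= parse(minimum)

# bioconductor-ballgown requires bioconductor-genomicalignments >=1.0
print(pin_satisfies_minimum("1.2.1", "1.0"))  # True  -> first metapackage resolves
print(pin_satisfies_minimum("1.2.0", "1.0"))  # True  -> second resolves too
print(pin_satisfies_minimum("0.9", "1.0"))    # False -> third conflicts
```

Real conda version ordering also handles pre-release tags and build strings, but the tuple comparison captures the essential behavior for plain numeric versions.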

Hopefully that clarifies how things are currently set up. But all of this doesn't solve the original example conflict where we want both Rcurl versions to be simultaneously installed so that foo and bar can each use a different version. Conda can't handle it:

deseq ==1.8.2
  - foo ==3.4
    - Rcurl ==2.0
  - bar ==1.0
    - Rcurl ==2.1

I think I'm missing something in this case though. Probably because I don't understand the internals of R library loading and how Galaxy is handling it. This seems like an unsolvable dependency.

That is, in the Galaxy environment/repository capsule for the above environment, which Rcurl version am I using if I open R and run library(Rcurl)? Do I get different Rcurl versions if I run library(foo); library(bar) vs library(bar); library(foo)?

@johanneskoester
Contributor

Yes, what Ryan and Brad describe is what I meant. There is no need to specify versions of R packages in dependency chains. A flat list of specific versions for all R packages will tell conda what to do. This flat list can be either a file (as seen above), an environment (as I proposed) or a meta-package. This is 100% reproducible, without making individual recipes less generic.

I second Ryan's last question. That seems not to be possible in R.

@johanneskoester
Contributor

@bgruening, @jmchilton, Galaxy having the need for a particular older version is totally fine as motivation for adding them as subdirectories. What I meant was that, in general, when a package is updated, we don't move the old version to a subdirectory, but rather rely on it staying available in our anaconda channel. That way, it stays installable without additional maintenance effort.

@bgruening
Member Author

Ha, ok, I missed that flat-list part. So if DEXSeq depends on DESeq2 and additional dependencies, I would need to list all DESeq2 dependencies plus the additional ones? I cannot simply depend on DESeq2, because DESeq2 has loosely defined dependencies.

This makes sense, thanks for the explanation :)

Actually, this is what we tried initially in Galaxy, but it was not practical, as you end up with a lot of packages you will never need; maybe this is different here. The maintenance overhead scares me a little bit in comparison to this simple PR.

Before I try this: is there any way to offer packages for multiple R versions? This PR explicitly targets R 3.1, but I guess you want to have all these dependencies also for the latest R. This is not super important, as this will only happen for migration packages, and if it is not possible we can have a galaxy-migrate channel or something similar.

You don't want to have these meta-packages hosted here, right?

Leaving the R discussion for a moment, as this is really the worst case to discuss :) I really want to use bioconda for classical binaries; can we keep the subdirectories so that it is more obvious what is available, and make it possible to fix old recipes easily?

P. S.

deseq ==1.8.2
  - foo ==3.4
    - Rcurl ==2.0
  - bar ==1.0
    - Rcurl ==2.1

This was an example of how, if you get the >= and == constraints slightly wrong, you end up with an unresolvable dependency. The flat list solves this problem. Thanks.

@chapmanb
Member

Björn;
Thanks for the additional details and thoughts. Ryan and Johannes said things better than I did and I think we're all agreed. I'm a little confused by your needs/use cases for old packages:

  • Why do you need/want to target multiple versions of R for older packages? My understanding is that you need the R version + packages used for an old analysis and be able to pull them up from the conda archive so someone could repeat an analysis. Do we need something more than that?
  • What is the use case for needing to edit/fix old recipes? If you have a pre-built package in anaconda.org and used that for an analysis then you shouldn't ever need to rebuild it. Keeping one recipe updated moving forward reduces a lot of clutter and potential confusion in the recipe directories.
  • I'm confused about your dependency example. From the Galaxy side you only need to keep a conda requirements file (like we're doing with GEMINI: https://github.com/arq5x/gemini/blob/master/versioning/0.18.0/requirements_conda.txt) and then can always reinstall from this with the exact packages. If it installed once for an analysis and nothing got deleted, all should work again, right?
  • I don't really like these meta packages but am also sympathetic to the pain of trying to port all this over. My vote is that whatever is easiest for back compatibility you should do, and then try to use the cleaner approach going forward. I'm agnostic if you store these old things here or in a different project but would like to have y'all using bioconda going forward for Galaxy tool installs so we're all working on one set of package porting.

Thanks again for all the discussion.

@bgruening
Member Author

Brad;

Sorry for causing you all so much trouble. Most of my concerns are about long-term sustainability and migrating old packages.

>   • Why do you need/want to target multiple versions of R for older packages? My understanding is that you need the R version + packages used for an old analysis and be able to pull them up from the conda archive so someone could repeat an analysis. Do we need something more than that?

I just thought that if I add packages for older versions, they should in addition also target newer versions of R, shouldn't they? This is not needed by us; it's more a general question of how to deal with this to support the community.

>   • What is the use case for needing to edit/fix old recipes? If you have a pre-built package in anaconda.org and used that for an analysis then you shouldn't ever need to rebuild it. Keeping one recipe updated moving forward reduces a lot of clutter and potential confusion in the recipe directories.

There are a lot of different reasons, also mentioned here: #612 (comment)

From my experience a recipe is rarely perfect from the beginning, e.g.:

  • you have forgotten to enable a compile switch for one rarely used feature
  • you want to add a dependency that gets auto-detected and adds another feature to the package (depending on sqlite is a famous example here)
  • you need to adapt the recipe to support newer versions of some OS (OS X has dropped some pre-installed libraries in the past)
  • links are broken / not sustainable (not a problem with current conda binaries, but relevant if packages are rebuilt from source at some point)

I know this is very specific to Galaxy, or reproducibility in general, but if we care about this we should be able to fix old recipes easily.
I have read your comment about the additional sub-dirs (#232 (comment)), and if your only concern is ease of use, I think we could hack something up that automatically creates a new directory as soon as you change the version number at merge time.

Yes! But only for new packages, and only if nothing gets deleted and we can rely on anaconda.org.
Following example, please correct me if I'm doing something stupid here:

conda create -y -c bioconda --name bx bx-python==0.7.3 numpy==1.9.2 pyyaml==3.11

This will install fine but creates an unusable bx-python. The reason is that bx-python was compiled against numpy 1.10, as this was the latest version in the conda channel at the time the bx-python package was created. If you run bx-python, Python will complain that this version was compiled against a different numpy version, which is true. It seems that even if you haven't specified a strict version in the meta.yaml definition, you have implicitly defined one, which prevents you from recombining different versions of packages. Even worse, if this is correct, we will not notice the error unless we run the tool.

>   • I don't really like these meta packages but am also sympathetic to the pain of trying to port all this over. My vote is that whatever is easiest for back compatibility you should do, and then try to use the cleaner approach going forward. I'm agnostic if you store these old things here or in a different project but would like to have y'all using bioconda going forward for Galaxy tool installs so we're all working on one set of package porting.

I admit that this porting effort was too early and I caused you all too much pain. Maybe it's easier if we create our own repo for the old packages, something like an IUC-migrate channel, and store all this old cruft there. It's your call.

What really worries me is the above example; I really hope I'm doing something silly here. If this is true, it means the order of submitting packages to conda determines the implicit dependencies, and in case anaconda.org goes down we cannot replicate the packages. Moreover, the flat-file idea would only work for a specific set of packages, not all.

Thanks!

@johanneskoester
Contributor

> Following example, please correct me if I'm doing something stupid here:
>
> conda create -y -c bioconda --name bx bx-python==0.7.3 numpy==1.9.2 pyyaml==3.11
>
> This will install fine but creates an unusable bx-python. The reason is that bx-python was compiled against numpy 1.10, as this was the latest version in the conda channel at the time the bx-python package was created. If you run bx-python, Python will complain that this version was compiled against a different numpy version, which is true. It seems that even if you haven't specified a strict version in the meta.yaml definition, you have implicitly defined one, which prevents you from recombining different versions of packages. Even worse, if this is correct, we will not notice the error unless we run the tool.

Actually, conda has a way of handling this properly. As you can see here, when you add an x.x to the numpy dependency, the same numpy version that was used for building will be required as a runtime dependency. The latter can be seen here: the numpy version is encoded into the package archive name similar to the Python version, and there are multiple builds for different numpy versions.

Hence, we can easily fix the bx-python recipe to properly handle your example @bgruening.
Now, the thing is that such mechanisms are available for numpy, python, perl and R. There might still be cases out there that are not covered by these, e.g. other libraries without a stable ABI. However, strict dependencies are only a hack, in contrast to promoting the generalization of this already available mechanism in conda. I would suggest that we work on the latter instead of artificially constraining our recipes.
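As a sketch, the x.x pinning described above looks roughly like this in a recipe (illustrative fragment; consult conda-build's documentation for the exact semantics):

```yaml
# Hypothetical meta.yaml fragment using conda-build's "x.x" pin for numpy:
# at build time, x.x resolves to the numpy version present in the build
# environment, and that same version becomes a runtime requirement, so the
# package can never be combined with a numpy it was not compiled against.
requirements:
  build:
    - python
    - numpy x.x
  run:
    - python
    - numpy x.x
```

The resulting package archive then carries the numpy version in its build string, allowing multiple builds of the same recipe against different numpy versions to coexist in the channel.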

@johanneskoester
Contributor

I have been looking into this. I think the place to start would be here. Currently, the mechanism for fixing a version from the build is limited here. A generalization would take the currently available version of each dependency, and add it to the metadata of the package if the dependency was specified with "x.x".

@bgruening
Member Author

@johanneskoester x.x sounds great! Trying this here: #623

@johanneskoester
Contributor

This PR contains the needed changes in conda-build to generalize the mechanism. I hope it will get the attention of Continuum soon.

@bgruening
Member Author

I will close this. Thanks for the discussion :)

@bgruening bgruening closed this Sep 24, 2016
@bgruening bgruening deleted the r-packages branch September 24, 2016 21:29
@bgruening bgruening restored the r-packages branch September 27, 2016 12:28
@bgruening bgruening deleted the r-packages branch September 27, 2016 12:32
aerijman pushed a commit to aerijman/bioconda-recipes that referenced this pull request Apr 20, 2021