Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update the bioconductor skeleton to pin to packages in the same bioco… #353

Merged
merged 18 commits into from
Nov 23, 2018

Conversation

dpryan79
Copy link
Contributor

@dpryan79 dpryan79 commented Nov 3, 2018

…nductor release. This closes bioconda/bioconda-recipes#11456 as best I can tell.

This also adds the ability to update/create all bioconductor packages in a single command, which is quite handy when new bioconductor releases come out. Further, this changes the r_base pinning project wide to 3.5.1. The update process for bioconductor 3.7 really drove home that we should just build for the most recent R release, since otherwise it's a massive annoyance to track down the various packages that need a custom conda_build_config.yaml.

@bgruening
Copy link
Member

@nturaga can you please look at this one from the bioc side? Thanks.

Thanks @dpryan79

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 4, 2018

@nturaga If you have a way to get package descriptions without actually downloading the package tarballs please let us know. When I tested generating all of the bioconductor 3.8 skeletons it took quite a while simply because the various BSgenome and similar packages take a while to download. PACKAGES seems to have everything we need except that.

@dpryan79 dpryan79 mentioned this pull request Nov 4, 2018
5 tasks
@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 4, 2018

I think build-docs should work now, there was some doi string in a package that was messed up.

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 6, 2018

One thing that should probably be changed with this is that update-all-packages should skip packages with unchanged versions by default. The data packages in particular rarely change and there's no reason to create new builds with every release needlessly.

@nturaga
Copy link
Contributor

nturaga commented Nov 8, 2018

Hi @dpryan79 ,

@nturaga If you have a way to get package descriptions without actually downloading the package tarballs please let us know. When I tested generating all of the bioconductor 3.8 skeletons it took quite a while simply because the various BSgenome and similar packages take a while to download. PACKAGES seems to have everything we need except that.

https://bioconductor.org/packages/release/bioc/VIEWS

@nturaga
Copy link
Contributor

nturaga commented Nov 8, 2018

@dpryan79

A way to get a list of all the packages is,

$$ git clone https://git@git.bioconductor.org/admin/manifest

$$ git checkout -b RELEASE_3_8 origin/RELEASE_3_8

$$ ls

data-annotation.txt	data-experiment.txt	software.txt		workflows.txt

The files there list all the packages in RELEASE_3_8, which you'll need to create/update all packages for R-3.5.1.

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 8, 2018

@nturaga Thanks, but I think fetching PACKAGES is a bit more useful for that part, since it includes versions, dependencies, etc. What would really be awesome is to be able to fetch package descriptions without getting it from tarballs.

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 8, 2018

Oh, Views I missed, thanks!

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 8, 2018

I've updated this to use VIEWS instead of packages. Now tarballs are only downloaded if a package requires compilation (to try and determine which compilers are actually needed). This is still being tested.

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 9, 2018

The tests only failed due to a time out, maybe someone can rerun that.

nturaga and others added 5 commits November 9, 2018 10:02
1. how the bioconductor versions are obtained. It is a more
straightforward way.

2. refractor 'latest_bioconductor_version()' to
'latest_bioconductor_release_version()' which is more accurate.

3. add 'latest_bioconductor_devel_version()' for future purposes.
@dpryan79 dpryan79 changed the title Update the bioconductor skeleton to pin to packages in the same bioco… [WIP] Update the bioconductor skeleton to pin to packages in the same bioco… Nov 10, 2018
@dpryan79
Copy link
Contributor Author

I suspect conda-build will need to be updated. I've found a recipe (bioconductor-hgu95av2cdf) where it seems to think that mercurial is being in the meta yaml. Of course it's not, but that prevents it from passing linting (and the linting process crashes). I'm checking into the cause of that now. Otherwise I have a local branch that speeds things up a bit more. I'm hoping to get that all finalized early this week.

@nturaga For packages requiring compilation, is there any way to determine what language they're written in (C, C++, fortran, etc.) without downloading the packages and looking?

As an aside, the single slowest step in skeleton creation is checking URLs for tarballs. For some reason, some of the servers seem to sporadically delay for 15-20 minutes before replying to a head request (I presume they're rebuilding a cache of some sort). If it weren't for that, we could update all of the bioconductor recipes in maybe 3 hours (as opposed to 6-7 hours).

@dpryan79
Copy link
Contributor Author

I've updated the PR to replace instances of HG_ with HG to get around the aforementioned conda-build API issue. This is probably as far as this PR can go, I'll remove WIP from the title once the last tests pass. Please see bioconda/bioconda-recipes#12058 for examples of the results for all bioconductor packages.

@dpryan79 dpryan79 changed the title [WIP] Update the bioconductor skeleton to pin to packages in the same bioco… Update the bioconductor skeleton to pin to packages in the same bioco… Nov 12, 2018
@dpryan79
Copy link
Contributor Author

FYI, I count about 200 CRAN packages that will need to be updated/added in conda-forge if we use update-all-packages. Granted, that's a one-off update and no different than what was done for the bioconductor 3.7 release. Of note, biocmanager also needs to be added. It's debatable whether that belongs in bioconda or conda-forge (it's a borderline case).

@nturaga
Copy link
Contributor

nturaga commented Nov 12, 2018

FYI, I count about 200 CRAN packages that will need to be updated/added in conda-forge if we use update-all-packages. Granted, that's a one-off update and no different than what was done for the bioconductor 3.7 release. Of note, biocmanager also needs to be added. It's debatable whether that belongs in bioconda or conda-forge (it's a borderline case).

HI @dpryan79 ,

The package BiocManager needs to come from conda-forge since it's primarily hosted on CRAN.

@dpryan79
Copy link
Contributor Author

There are a number of CRAN packages in bioconda, it's mostly a matter of how "directly related to biology" we consider that package to be. I agree that it's probably best to put it in conda-forge, but then again it's then the only part of the bioconductor ecosystem that's not then part of bioconda.

@nturaga
Copy link
Contributor

nturaga commented Nov 13, 2018

There are a number of CRAN packages in bioconda, it's mostly a matter of how "directly related to biology" we consider that package to be. I agree that it's probably best to put it in conda-forge, but then again it's then the only part of the bioconductor ecosystem that's not then part of bioconda.

I believe @mtmorgan might be able to provide some suggestion for this. I understand your point of view and agree that it since it's part of the bioconductor ecosystem it should be part of bioconda.

My understanding while using conda is, we install packages only through bioconda/conda-forge viaconda install <bioconductor-package>, so I'm not sure why we need BiocManager at all really. This would allow users to have more than one type of installation for packages,

  1. using BiocManager::install("package-name")

  2. conda install bioconductor-package

I'm not sure if this will create any problems, but it is worth checking to see if it does. i.e make sure they are both installing in the same R-lib (this can be checked with the command .libPaths() in R.

Regarding your other comment, I believe it is worth updating all the conda-forge recipes which need to be updated as a result of the update-all-packages function. Since bioconductor RELEASE_3_8 packages work on specific versions of CRAN packages, this will be worth it.

@dpryan79
Copy link
Contributor Author

I can't comment on how well it works to mix conda and traditional installation methods for packages, that's often a recipe for headaches for other languages (e.g., python). I was also surprised to see some bioconductor recipes depending on biocmanager, I wonder if they're incorrectly listing it.

I'm happy to add/update the relevant CRAN packages. For what it's worth, I haven't checked to see how many bioc packages actually specify versions for all of their CRAN dependencies (when they do we specify that in the recipe).

@mtmorgan
Copy link

If you capture 'Suggests:' in the recipe then authors will often use BiocManager() in an (unevaluated) code chunk to illustrate installation; a few packages, including core packages maintained by us, do actually try to install, e.g., annotation packages 'for' the user; not a very good practice!

I think of BiocManager as a Bioconductor package that happens to be hosted on CRAN for expediency (to avoid users copy / pasting URLs into their sessions).

@dpryan79
Copy link
Contributor Author

They're not from Suggests: (we capture it, but just add it as a comment), biocmanager tends to be in imports, such as in ChIPpeakAnno, perhaps that's an example of what you mentioned. At the end of the day it's neither here nor there to me where we put biocmanager, so long as everything works properly. There are actually only 28 packages that import biocmanager.

@nturaga
Copy link
Contributor

nturaga commented Nov 14, 2018

I suspect conda-build will need to be updated. I've found a recipe (bioconductor-hgu95av2cdf) where it seems to think that mercurial is being in the meta yaml. Of course it's not, but that prevents it from passing linting (and the linting process crashes). I'm checking into the cause of that now. Otherwise I have a local branch that speeds things up a bit more. I'm hoping to get that all finalized early this week.

@nturaga For packages requiring compilation, is there any way to determine what language they're written in (C, C++, fortran, etc.) without downloading the packages and looking?

As an aside, the single slowest step in skeleton creation is checking URLs for tarballs. For some reason, some of the servers seem to sporadically delay for 15-20 minutes before replying to a head request (I presume they're rebuilding a cache of some sort). If it weren't for that, we could update all of the bioconductor recipes in maybe 3 hours (as opposed to 6-7 hours).

Hi @dpryan79 ,

I've looked for ways if this is possible. It seems like the answer is no. There is no way to figure it out without downloading the package.

@dpryan79
Copy link
Contributor Author

@nturaga Too bad, but understandable. That might be something to think about for future bioconductor releases (I guess you'd just copy over that info from here, since it's then already been done).

nturaga and others added 2 commits November 14, 2018 15:14
SystemRequirements in the DESCRIPTION file provide dependencies
which are usually not fulfilled by the bioconductor-skeleton.py.

These usually represent external dependencies.

Take for instance the package 'rsbml', which has the SystemRequirement,
libsbml (==5.10.2). This is not fulfilled by this script, and should
throw a warning to the user that there is a recipe for libsbml is also
needed.

There are many packages which have such SystemRequirements with
Bioconductor, as shown in this gist.

    https://gist.github.com/nturaga/39185116b8681264e7405b868e00d405

If there are suggestions from the Bioconda team to standardizing the format of
SystemRequirements, I could try to get them standardized across
Bioconductor packages, so we have a better chance of fulfilling them
automatically.

I'm cc-ing @mtmorgan, and @vobencha to help answer add context if
needed.
@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 15, 2018

This is now updated with the most recent PR from @nturaga. It additionally includes Suggests: and SystemRequirements: sections in the meta.yaml as comments (the CRAN skeleton does the same for Suggests: already I think).

Example from DESeq2:

build:
  number: 0
  rpaths:
    - lib/R/lib/
    - lib/
# Suggests: testthat, knitr, rmarkdown, vsn, pheatmap, RColorBrewer, IHW, apeglm, ashr, tximport, tximportData, readr, pbapply, airway, pasilla (>= 0.2.10)
requirements:

And from rsbml:

build:
  number: 0
  rpaths:
    - lib/R/lib/
    - lib/
# SystemRequirements: libsbml (==5.10.2)
requirements:

Obviously both can appear. SystemRequirements: is definitely the more important of those.

@nturaga
Copy link
Contributor

nturaga commented Nov 19, 2018

Hi @dpryan79,

Just wanted to bring another issue to notice. In a fresh environment, i'm trying to install conda install bioconductor-biocparallel. It seems to be defaulting to R-3.4.1 even though a recipe for R-3.5.1 is available.

(demo2) /tmp ❯❯❯ conda install bioconductor-biocparallel
Solving environment: done

## Package Plan ##

  environment location: /Users/ni41435_ca/miniconda/envs/demo2

  added / updated specs:
    - bioconductor-biocparallel


The following packages will be downloaded:

    package                    |            build
    ---------------------------|-----------------
    curl-7.62.0                |       h74213dd_0         131 KB  conda-forge
    r-formatr-1.5              |   r341h6115d3f_1         133 KB  conda-forge
    r-lambda.r-1.2.3           |   r341h6115d3f_0          85 KB  conda-forge
    ncurses-5.9                |               10         1.1 MB  conda-forge
    readline-7.0               |                0         383 KB  conda-forge
    r-bh-1.66.0_1              |        r341_1001         9.9 MB  conda-forge
    cairo-1.14.12              |       he6fea26_5         1.2 MB  conda-forge
    harfbuzz-1.9.0             |       h08d66d9_0         1.0 MB  conda-forge
    libcurl-7.62.0             |       hbdb9355_0         455 KB  conda-forge
    libgcc-4.8.5               |                1         785 KB  conda-forge
    r-base-3.4.1               |                4        22.4 MB  conda-forge
    krb5-1.16.2                |       hbb41f41_0         1.2 MB  conda-forge
    pcre-8.39                  |                0         268 KB  conda-forge
    libssh2-1.8.0              |       h5b517e9_3         232 KB  conda-forge
    r-futile.options-1.0.1     |   r341h6115d3f_0          19 KB  conda-forge
    r-snow-0.4_3               |   r341h6115d3f_0          78 KB  conda-forge
    libpng-1.6.34              |       ha92aebf_2         321 KB  conda-forge
    r-futile.logger-1.4.3      |   r341h6115d3f_1         106 KB  conda-forge
    bioconductor-biocparallel-1.14.2|   r341h26a2512_0         1.0 MB  bioconda
    tk-8.6.9                   |       ha92aebf_0         3.0 MB  conda-forge
    libedit-3.1.20170329       |                0         155 KB  conda-forge
    ------------------------------------------------------------
                                           Total:        44.0 MB

The following NEW packages will be INSTALLED:

    bioconductor-biocparallel: 1.14.2-r341h26a2512_0 bioconda
    ca-certificates:           2018.10.15-ha4d7672_0 conda-forge
    cairo:                     1.14.12-he6fea26_5    conda-forge
    curl:                      7.62.0-h74213dd_0     conda-forge
    fontconfig:                2.13.1-hce039c3_0     conda-forge
    freetype:                  2.9.1-h6debe1e_4      conda-forge
    gettext:                   0.19.8.1-h1f1d5ed_1   conda-forge
    glib:                      2.55.0-h464dc38_2     conda-forge
    graphite2:                 1.3.12-h7d4d677_1     conda-forge
    gsl:                       2.1-2                 conda-forge
    harfbuzz:                  1.9.0-h08d66d9_0      conda-forge
    icu:                       58.2-hfc679d8_0       conda-forge
    jpeg:                      9c-h470a237_1         conda-forge
    krb5:                      1.16.2-hbb41f41_0     conda-forge
    libcurl:                   7.62.0-hbdb9355_0     conda-forge
    libedit:                   3.1.20170329-0        conda-forge
    libffi:                    3.2.1-1               bioconda
    libgcc:                    4.8.5-1               conda-forge
    libiconv:                  1.15-h470a237_3       conda-forge
    libpng:                    1.6.34-ha92aebf_2     conda-forge
    libssh2:                   1.8.0-h5b517e9_3      conda-forge
    libtiff:                   4.0.9-he6b73bb_2      conda-forge
    libxml2:                   2.9.8-h422b904_5      conda-forge
    ncurses:                   5.9-10                conda-forge
    openssl:                   1.0.2p-h470a237_1     conda-forge
    pango:                     1.40.14-he752989_2    conda-forge
    pcre:                      8.39-0                conda-forge
    pixman:                    0.34.0-h470a237_3     conda-forge
    r-base:                    3.4.1-4               conda-forge
    r-bh:                      1.66.0_1-r341_1001    conda-forge
    r-formatr:                 1.5-r341h6115d3f_1    conda-forge
    r-futile.logger:           1.4.3-r341h6115d3f_1  conda-forge
    r-futile.options:          1.0.1-r341h6115d3f_0  conda-forge
    r-lambda.r:                1.2.3-r341h6115d3f_0  conda-forge
    r-snow:                    0.4_3-r341h6115d3f_0  conda-forge
    readline:                  7.0-0                 conda-forge
    tk:                        8.6.9-ha92aebf_0      conda-forge
    xz:                        5.2.4-h470a237_1      conda-forge
    zlib:                      1.2.11-h470a237_3     conda-forge

Is there a reason this is happening which i'm not thinking about? My initial guess was, it is because bioconductor-biocparallel doesn't have the latest recipe yet, for R-3.5.1, and all bioconductor-packages have not been updated. Is this correct?

@dpryan79
Copy link
Contributor Author

I have no clue how conda prioritizes packages when there are two equal versions. If you specified "r-base==3.5.1" in addition this wouldn't happen.

@dpryan79
Copy link
Contributor Author

dpryan79 commented Nov 19, 2018

Actually, the 3.4.1 version may have been uploaded slightly later, so it's likely that suffices to be considered the "newest" version (when the actual versions are otherwise the same).

@nturaga
Copy link
Contributor

nturaga commented Nov 19, 2018

This is now updated with the most recent PR from @nturaga. It additionally includes Suggests: and SystemRequirements: sections in the meta.yaml as comments (the CRAN skeleton does the same for Suggests: already I think).

Example from DESeq2:

build:
  number: 0
  rpaths:
    - lib/R/lib/
    - lib/
# Suggests: testthat, knitr, rmarkdown, vsn, pheatmap, RColorBrewer, IHW, apeglm, ashr, tximport, tximportData, readr, pbapply, airway, pasilla (>= 0.2.10)
requirements:

And from rsbml:

build:
  number: 0
  rpaths:
    - lib/R/lib/
    - lib/
# SystemRequirements: libsbml (==5.10.2)
requirements:

Obviously both can appear. SystemRequirements: is definitely the more important of those.

Regarding the SystemRequirements, i'm going to take a look at these two packages to find out if they are of use to us in standardizing the installation of these libraries to ensure the package installation succeeds.

https://github.com/r-hub/sysreqs

https://github.com/r-hub/sysreqsdb

Do you know if there is an easy way to find conda recipes for the requirements mentioned in the sysreqdb repo?

@dpryan79
Copy link
Contributor Author

Most packages have the same name. For finding just a few packages conda search suffices. For more it's best to just parse the various repodata.json files (primarily for conda-forge, but maybe bioconda too). Basically, one could make a similar dictionary to what sysreqsdb makes.

@dpryan79
Copy link
Contributor Author

@bioconda/core I think @nturaga are both happy with the current state of things, so as long as he/you all agree this can get merged and we can start the build for bioconductor 3.8.

@nturaga
Copy link
Contributor

nturaga commented Nov 20, 2018

I concur with @dpryan79. We can deal with the SystemRequirements in a later PR. I believe it's important to get the RELEASE_3_8 packages all built to gauge any further improvements required.

@bgruening
Copy link
Member

Will look at it now ....

Copy link
Member

@bgruening bgruening left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great! Thanks for working on this @dpryan79 and @nturaga

@bgruening bgruening merged commit d54a6d8 into bioconda:master Nov 23, 2018
Given a package name, return the version pin string.

For example, if version 1.2.3 is in the current bioconductor release,
then return >=1.2.0,<1.4.0.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dpryan79 If I understand correctly, odd minor version numbers are used in Bioconductor for development versions, so wouldn't it be safer to pin >=1.2.0,<1.3.0 (i.e. use + 1 in line 527) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could go either way on that. In practice development versions aren't released, but at least with the current set up if someone makes a package for one it'll use the at-the-time most-recent bioconductor release rather than one that hasn't come out yet. I don't really have a preference there myself, but obviously if @nturaga and the bioconductor folks would prefer a narrower pinning we can easily make that happen.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ignore this comment, if its too confusing:

We could also define run_exports for every bioc package like, https://github.com/conda-forge/qd-feedstock/blob/ca38df9a1c4bd61ed32dce2e83807cf37e85bd51/recipe/meta.yaml#L15

This would give us the correct pinning also for other package (not only bioc).

However, the current approach is more explicit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @nsoranzo , I agree with your comment.

@dpryan79 I think I would prefer a narrower pinning because 1.4.0 honestly doesn't do anything of consequence. It is saying that you are looking for a package in the "upcoming" release, which doesn't yet exist. Maybe @mtmorgan can add more?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, we'll narrow the pinning, that's not a problem at all.

@dpryan79 dpryan79 deleted the update_bioconductor_skeleton branch December 4, 2018 12:19
@epruesse
Copy link
Member

epruesse commented Dec 4, 2018

If we want to avoid users mixing packages from different bioconductor releases, we could have a meta-package bioconductor versioned with the bioconductor release number on which all the bioconductor-* packages depend. Catch is, however, that mixing would then become impossible (or require forcing install).

@nturaga
Copy link
Contributor

nturaga commented Dec 4, 2018

If we want to avoid users mixing packages from different bioconductor releases, we could have a meta-package bioconductor versioned with the bioconductor release number on which all the bioconductor-* packages depend. Catch is, however, that mixing would then become impossible (or require forcing install).

I don't think this is needed, we currently take the Bioconductor release version information from https://bioconductor.org/config.yaml, which is kept up to date by the Bioconductor team. This is how the skeleton works in this PR, so going forward the release is not something that is hard-coded.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

R 3.5 pinning
6 participants