Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Update busco to v4.1.3 #103

Merged
merged 14 commits into from
Sep 28, 2020
Merged

Update busco to v4.1.3 #103

merged 14 commits into from
Sep 28, 2020

Conversation

skrakau
Copy link
Member

@skrakau skrakau commented Sep 23, 2020

First part: add extra container with updated BUSCO version.

Will contain fixes for #77.

I added an extra container for BUSCO, because

  • it requires some extra dependencies and adjustments to the Dockerfile to avoid some problems with R (for BUSCO generated plots), which in turn cause conflicts with packages in the main container
  • to keep the changes in the main container as few as possible (for the next release)

(The previous BUSCO version 4.1.2 was also incompatible with python 3.6.7 and required python 3.7, which on the other side is incompatible with centrifuge in the main container, but v4.1.3. now also works with python 3.6.7.)

I am not exactly sure, if the are any further packages which can be removed from the main environment.yml file, but I would suggest we address this in a follow-up PR if necessary.

PR checklist

  • This comment contains a description of changes (with reason)
  • If you've fixed a bug or added code that should be tested, add tests!
  • If necessary, also make a PR on the nf-core/mag branch on the nf-core/test-datasets repo
  • Ensure the test suite passes (nextflow run . -profile test,docker).
  • Make sure your code lints (nf-core lint .).
  • Documentation in docs is updated
  • CHANGELOG.md is updated
  • README.md is updated

Learn more about contributing: https://github.com/nf-core/mag/tree/master/.github/CONTRIBUTING.md

Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

edit: probably an update of the CHANGELOG would be good too.

@skrakau
Copy link
Member Author

skrakau commented Sep 24, 2020

Thanks @d4straub. I wasn't aware that with the new template one can do the container update and the actual workflow update in one PR, so that the test use the new container. I will add the other changes here too.

@skrakau
Copy link
Member Author

skrakau commented Sep 24, 2020

So I adjusted the BUSO process code for the new BUSCO version (addresses #77). So far only to allow the basic functionality, i.e. no automatic lineage selection or offline running possible.

I renamed the variable ${assembly} to ${bin} within the busco processes, and also within the process quast_bins.

Additionally I changed the busco_plot process and the channel logic, so that the process now runs on each assembler-sample combination (instead of calling the busco script generate_plot.py for each combination within the process). This also fixes the grouping of BUSCO summaries for the plot, which was wrong in case one sample name was the prefix of another one.

For the final busco summary for all bins I created an extra process.

Since current versions of the BUSCO generate_plot.py (v4.1.3) can not handle additional dots within the summary file names, I unfortunately had to replace also dots with underscores in the bin names, which will also appear as such in the BUSCO plots.

And I added an option --save_busco_reference to save the used BUSCO reference. Useful for later re-use, since BUSCO datasets are frequently updated and old versions not always remain accessible.

The error handling of BUSCO should be improved (https://gitlab.com/ezlab/busco/-/issues/296), so no further handling of this will be required.

@skrakau skrakau marked this pull request as draft September 24, 2020 08:49
@skrakau skrakau changed the title Update busco to v4.1.3 in extra container Update busco to v4.1.3 Sep 24, 2020
@d4straub d4straub self-requested a review September 25, 2020 13:06
@skrakau skrakau marked this pull request as ready for review September 25, 2020 13:10
Copy link
Collaborator

@d4straub d4straub left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could not see any problem (besides that cosmetic edit), looks good to me!

Copy link
Member

@ggabernet ggabernet left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me too. The additional container should be a bridge towards porting the pipeline to DSL2. But I agree that it would be great to release it first once :)

@skrakau
Copy link
Member Author

skrakau commented Sep 28, 2020

Regarding the extra container: busco 4.1.3 (in contrast to busco 4.1.2) is now compatible with python 3.6.7 in the main container, however, the required augustus version etc. cause conflicts with the main container. Thus even if we would skip the plot generation scripts from busco (process busco_plot), we would need an extra container.

@skrakau
Copy link
Member Author

skrakau commented Sep 28, 2020

Hi @nf-core/core , we need an extra container for BUSCO in this pipeline, because it causes conflicts with the other required packages in the main container. And we need a stable release of the pipeline, before we can migrate to DSL2.
I added extra steps in the .github/workflows/ci.yml to build or pull the BUSCO container. It works smoothly for the test with the current nextflow version, but somehow fails for nextflow 19.10.0 (locally the tests work for nextflow 19.10.0). So it seems to be a problem in context of the ci.yml.

Does anyone have an idea what the problem might be here?

@drpatelh
Copy link
Member

It looks like that version of NF is trying to use Conda instead of the Docker container as you have defined in this line. If you specify the conda directive then it gets used which is why we have added an extra param in nf-core/modules. Try pushing a commit without that line and see if it works. Be good if you can see if this has been fixed in the latest release because we can remove that logic from modules 🎉 Another option is to bump the minimum required version of Nextflow which you will have to do eventually for DSL2 anyway.

@ggabernet ggabernet merged commit 3ba26d1 into nf-core:dev Sep 28, 2020
@drpatelh
Copy link
Member

Note: The pipeline won't work with -profile conda now 😅

@skrakau
Copy link
Member Author

skrakau commented Sep 28, 2020

Thanks @drpatelh ! Without specifying the conda directive both tests run through.

But also with the latest nextflow release it seems the conda environment is activated even if the docker profile is used. Just somehow the tests do not fail. At least when looking into the .command.run, the conda environment is activated, and then docker is used. Not sure.

@drpatelh
Copy link
Member

Ok. Probably best to avoid specifying both then just in case it causes a software conflict somehow with the container. As I mentioned in the previous comment though this means you will have to strip out any docs suggesting that -profile conda works. Can't remember if removing the install with bioconda badge will break the linting now 🤔

@skrakau
Copy link
Member Author

skrakau commented Sep 28, 2020

@drpatelh thought maybe one can enable the conda profile again using something like if (hasProperty("conda")) conda = "$baseDir/containers/busco/environment.yml"? Had some conda problems, so couldn't test it further yet

@drpatelh
Copy link
Member

Ummm, possibly but that will depend on whether the -profile variables for the workflow are available in the relevant scope e.g here. Don't think they are available in configs or modules but definitely worth a proper test 👍

@skrakau skrakau mentioned this pull request Sep 29, 2020
@skrakau skrakau deleted the update_busco4.1.3 branch May 31, 2021 13:38
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants