Enhance cwltoil to support SoftwareRequirements & BioContainers. #1943
I'll add a couple of tests based on the examples above. I just found cwlTest.py and it looks like a very clean interface for testing.
I finally got my two new tests to pass. It looks like the current failure is transient and unrelated to these changes, right? (It didn't crop up on the other runs for this PR.) I don't have a retry button on your Jenkins server. Do you want me to rebase with an arbitrary change to rerun the tests, or is it okay the way it is?
Jenkins, test this please.
Tests passed, but got this error: From the end of:
This enables the reproducibility stack described in [this preprint](https://www.biorxiv.org/content/early/2017/10/11/200683) and [presented at BOSC 2017](http://jmchilton.github.io/writing/bosc2017slides/biocontainers.html) under Toil. Concretely, this enables all the same options in cwltoil as were added to cwltool in common-workflow-language/cwltool#214, including ``--beta-conda-dependencies``, ``--beta-dependency-resolvers-configuration``, and ``--beta-use-biocontainers``. The first two of these are documented in depth in cwltool's README (https://github.com/common-workflow-language/cwltool/#leveraging-softwarerequirements-beta).

Here I will quickly review a couple of the available options against test examples available in cwltool's ``tests`` directory using this branch of Toil.

```
git clone https://github.com/common-workflow-language/cwltool.git
cd cwltool
```

From here we can quickly demonstrate installation and resolution of CWL ``SoftwareRequirement`` hints with Conda, using the ``tests/seqtk_seq.cwl`` tool. This tool doesn't define an explicit ``DockerRequirement`` but does define the following ``SoftwareRequirement`` in its ``hints``:

```
hints:
  SoftwareRequirement:
    packages:
    - package: seqtk
      version:
      - r93
```

We can try this tool out with ``cwltoil`` using the following command. By default we probably don't have the ``seqtk`` binary on our ``PATH``, so the tool fails:

```
cwltoil tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

This should result in a tool execution failure. We can then instruct ``cwltoil`` to install the required package from Bioconda into an isolated environment and use it as needed by passing it the ``--beta-conda-dependencies`` flag as follows:

```
cwltoil --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

The tool should now be successful.

The Conda support can be endlessly tweaked, but the defaults target the best-practice Conda channels that work well for the Galaxy project.
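The choice between the two invocations above can be sketched as a small shell helper. This is only an illustration, not part of this PR: the function name `cwltoil_cmd` is hypothetical, and it prints the command line instead of running it so the sketch has no side effects.

```shell
# Hypothetical helper (not part of Toil or cwltool): print the cwltoil
# command line to use, depending on whether the tool's binary is already
# on PATH. Arguments: <binary> <tool.cwl> <job.json>
cwltoil_cmd() {
    if command -v "$1" >/dev/null 2>&1; then
        # Binary is already available; no dependency resolution needed.
        echo "cwltoil $2 $3"
    else
        # Binary is missing; ask cwltoil to resolve it through Conda.
        echo "cwltoil --beta-conda-dependencies $2 $3"
    fi
}

cwltoil_cmd seqtk tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

On a machine without ``seqtk`` installed, the printed command is the ``--beta-conda-dependencies`` variant shown above.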
Additional ``SoftwareRequirement`` resolution options are available, including Software Modules, lmod, Homebrew, and simple scripts called "Galaxy packages". All of these options can be specified and configured with a YAML file passed to cwltoil using the ``--beta-dependency-resolvers-configuration`` option instead of the simple shortcut ``--beta-conda-dependencies``. The cwltool documentation walks through a few examples of adapting infrastructure to tools and tools to package managers. Reference documentation is available in [galaxy-lib's documentation](http://galaxy-lib.readthedocs.io/en/latest/topics/dependency_resolution.html).

In addition to options that allow configuring tool execution environments, containers themselves can be discovered and/or built from these software requirements. The [Biocontainers](https://github.com/BioContainers) project (previously Biodocker) contains a registry we use for this purpose. Every version of every Bioconda package has a corresponding best-practice (very lightweight, very small) Docker container on quay.io. There are over 3000 such containers currently.

Continuing with the example above, the new ``--beta-use-biocontainers`` flag instructs ``cwltoil`` to fetch the corresponding Biocontainers container from quay.io automatically, or to build one to use locally (required, for instance, for tools with multiple software requirements - fat tools).

```
cwltoil --beta-use-biocontainers tests/seqtk_seq.cwl tests/seqtk_seq_job.json
```

These containers contain the same binaries that the package would use locally (outside of Docker). Therefore this technique allows cross-platform reproducibility/remixability across cwltool, cwltoil, Galaxy, and the CLI - both inside and outside of containers.
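As a rough illustration of the resolver configuration file mentioned above, the sketch below writes a small YAML file that tries a "Galaxy packages" directory first and falls back to Conda. It is modeled on the example in cwltool's README. The file name ``resolvers_conf.yml`` is an assumption made here, and the resolver types and keys should be checked against galaxy-lib's reference documentation before relying on them.

```shell
# Write a hypothetical resolver configuration. Resolvers are listed in the
# order they should be tried. The file name and base_path are assumptions
# for illustration, based on the layout of cwltool's tests directory.
cat > resolvers_conf.yml <<'EOF'
- type: galaxy_packages
  base_path: ./tests/test_deps_env
- type: conda
EOF
```

The file would then be passed in place of the Conda shortcut, for example ``cwltoil --beta-dependency-resolvers-configuration resolvers_conf.yml tests/seqtk_seq.cwl tests/seqtk_seq_job.json``.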
@ejacox I guess this is expected: the tests run from that directory, and by default the Conda packages get installed into the working directory (cwltool does the same thing). I've just added this file to .gitignore for now. If there is something more "Toil-ish" to be done, let me know: I can add an override point for the tests but keep the current default for the CLI, I can modify it to install things into $HOME by default, or I could try to run the CWL tests with a different CWD.
Thank you @jmchilton