@@ -139,6 +139,212 @@ The easiest way to use cwltool to run a tool or workflow from Python is to use a
139139
140140 # result["out"] == "foo"
141141
142+ Leveraging SoftwareRequirements (Beta)
143+ --------------------------------------
144+
145+ CWL tools may be decorated with ``SoftwareRequirement`` hints that cwltool
146+ can in turn resolve to packages in various package managers or
147+ dependency management systems such as `Environment Modules
148+ <http://modules.sourceforge.net/>`__.
149+
150+ Using ``SoftwareRequirement`` hints with cwltool requires an optional
151+ dependency, so be sure to specify the ``deps`` modifier when
152+ installing cwltool. For instance::
153+
154+   $ pip install 'cwltool[deps]'
155+
156+ Installing cwltool in this fashion enables several new command line options.
157+ The most general of these options is ``--beta-dependency-resolvers-configuration``.
158+ This option allows one to specify a dependency resolvers configuration file.
159+ This file may be specified as either XML or YAML and simply describes the
160+ plugins to enable to "resolve" ``SoftwareRequirement`` dependencies.
161+
162+ To discuss some of these plugins and how to configure them, first consider the
163+ following ``hint`` definition for an example CWL tool.
164+
165+ .. code:: yaml
166+
167+   SoftwareRequirement:
168+     packages:
169+     - package: seqtk
170+       version:
171+       - r93
172+
173+ Now imagine deploying cwltool on a cluster with Environment Modules installed
174+ and that a ``seqtk`` module is available at version ``r93``. This means cluster
175+ users likely won't have the ``seqtk`` binary on their ``PATH`` by default, but after
176+ sourcing this module with the command ``modulecmd sh load seqtk/r93``, ``seqtk`` is
177+ available on the ``PATH``. A simple dependency resolvers configuration file, called
178+ ``dependency-resolvers-conf.yml`` for instance, that would enable cwltool to source
179+ the correct module environment before executing the above tool would simply be:
180+
181+ .. code:: yaml
182+
183+   - type: module
184+
185+ The outer list indicates that one plugin is being enabled; the plugin parameters are
186+ defined as a dictionary for this one list item. There is only one required parameter
187+ for the plugin above: ``type``, which defines the plugin type. This parameter
188+ is required for all plugins. The available plugins and the parameters
189+ available for each are documented (incompletely) `here
190+ <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__.
191+ Unfortunately, this documentation is in the context of Galaxy tool ``requirement``\ s instead of CWL ``SoftwareRequirement``\ s, but the concepts map fairly directly.
192+
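As a rough illustration of the structure just described, the following Python sketch checks that a parsed resolvers configuration is a list of dictionaries that each carry a ``type`` key. The function name ``check_resolvers_conf`` is hypothetical and is not part of cwltool or galaxy-lib; the configuration is written as a plain Python literal so the sketch needs no YAML parser.

```python
def check_resolvers_conf(conf):
    """Return the plugin types named in a parsed resolvers configuration.

    Hypothetical helper for illustration only; not a cwltool API.
    """
    if not isinstance(conf, list):
        raise ValueError("configuration must be a list of plugin entries")
    types = []
    for index, entry in enumerate(conf):
        # Every plugin entry is a dictionary and must name its plugin type.
        if not isinstance(entry, dict) or "type" not in entry:
            raise ValueError(
                "entry %d is missing the required 'type' parameter" % index
            )
        types.append(entry["type"])
    return types


# Python equivalent of the one-item YAML configuration for the module plugin.
print(check_resolvers_conf([{"type": "module"}]))
```

Any further keys in an entry (such as ``base_path``) would simply be plugin-specific parameters that this sketch does not inspect.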
193+ cwltool is distributed with an example of such a seqtk tool and a corresponding
194+ sample job. It can be executed from the cwltool root using a dependency resolvers
195+ configuration file such as the one above with the command::
196+
197+   cwltool --beta-dependency-resolvers-configuration /path/to/dependency-resolvers-conf.yml \
198+       tests/seqtk_seq.cwl \
199+       tests/seqtk_seq_job.json
200+
201+ This example demonstrates that cwltool can both leverage
202+ existing software installations and handle workflows with dependencies
203+ on different versions of the same software and libraries. However, the above
204+ example requires an existing module setup, so it is impossible to test this example
205+ "out of the box" with cwltool. For a more isolated test that demonstrates all
206+ the same concepts, the resolver plugin type ``galaxy_packages`` can be used.
207+
208+ "Galaxy packages" are a lighter-weight alternative to Environment Modules that are
209+ really just a convention for laying out directories by package and version
210+ so that small scripts can be found and sourced to modify the environment. They have
211+ been used for years in the Galaxy community to adapt Galaxy tools to cluster
212+ environments but require neither knowledge of Galaxy nor any special tools to
213+ set up. These should work just fine for CWL tools.
214+
215+ The cwltool source code repository's test directory is set up with a very simple
216+ directory that defines a set of "Galaxy packages" (but really just defines one
217+ package named ``random-lines``). The directory layout is simply::
218+
219+   tests/test_deps_env/
220+     random-lines/
221+       1.0/
222+         env.sh
223+
224+ Suppose the ``galaxy_packages`` plugin is enabled and pointed at the
225+ ``tests/test_deps_env`` directory in cwltool's root and a ``SoftwareRequirement``
226+ such as the following is encountered:
227+
228+ .. code:: yaml
229+
230+   hints:
231+     SoftwareRequirement:
232+       packages:
233+       - package: 'random-lines'
234+         version:
235+         - '1.0'
236+
237+ Then cwltool will simply find that ``env.sh`` file and source it before executing
238+ the corresponding tool. That ``env.sh`` script is only responsible for modifying
239+ the job's ``PATH`` to add the required binaries.
240+
241+ This is a complete, working example, since resolving "Galaxy packages" has no
242+ external requirements. Try it out by executing the following command from cwltool's
243+ root directory::
244+
245+   cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf.yml \
246+       tests/random_lines.cwl \
247+       tests/random_lines_job.json
248+
249+ The resolvers configuration file in the above example was simply:
250+
251+ .. code:: yaml
252+
253+   - type: galaxy_packages
254+     base_path: ./tests/test_deps_env
255+
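The directory convention behind ``galaxy_packages`` can be sketched in a few lines of Python. This is only an approximation of the lookup described above (``<base_path>/<package>/<version>/env.sh``); the helper name ``find_env_script`` is hypothetical, and the real resolver in galaxy-lib may handle more cases than this.

```python
import os
import tempfile


def find_env_script(base_path, package, version):
    """Return the path to <base_path>/<package>/<version>/env.sh, or None.

    Hypothetical helper mirroring the directory layout shown earlier;
    not the galaxy-lib implementation.
    """
    candidate = os.path.join(base_path, package, version, "env.sh")
    return candidate if os.path.isfile(candidate) else None


# Build a throwaway copy of the layout so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as base:
    os.makedirs(os.path.join(base, "random-lines", "1.0"))
    with open(os.path.join(base, "random-lines", "1.0", "env.sh"), "w") as handle:
        handle.write("# placeholder env.sh\n")

    found = find_env_script(base, "random-lines", "1.0")
    missing = find_env_script(base, "random-lines", "2.0")
    print(found is not None, missing is None)
```

A requirement whose package or version has no matching directory is simply left unresolved by this sketch, which returns ``None``.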
256+ It is possible that the ``SoftwareRequirement``\ s in a given CWL tool will not
257+ match the module names for a given cluster. Such requirements can be re-mapped
258+ to specific deployed packages and/or versions using another file specified using
259+ the resolver plugin parameter ``mapping_files``. We will
260+ demonstrate this using ``galaxy_packages``, but the concepts apply equally well
261+ to Environment Modules or Conda packages (described below), for instance.
262+
263+ Consider the resolvers configuration file
264+ (``tests/test_deps_env_resolvers_conf_rewrite.yml``):
265+
266+ .. code:: yaml
267+
268+   - type: galaxy_packages
269+     base_path: ./tests/test_deps_env
270+     mapping_files: ./tests/test_deps_mapping.yml
271+
272+ And the corresponding mapping configuration file (``tests/test_deps_mapping.yml``):
273+
274+ .. code:: yaml
275+
276+   - from:
277+       name: randomLines
278+       version: 1.0.0-rc1
279+     to:
280+       name: random-lines
281+       version: '1.0'
282+
283+ This says that if cwltool encounters a requirement of ``randomLines`` at version
284+ ``1.0.0-rc1`` in a tool, it should rewrite it for our specific plugin as ``random-lines`` at
285+ version ``1.0``. cwltool has such a test tool called ``random_lines_mapping.cwl``
286+ that contains such a source ``SoftwareRequirement``. To try out this example with
287+ mapping, execute the following command from the cwltool root directory::
288+
289+   cwltool --beta-dependency-resolvers-configuration tests/test_deps_env_resolvers_conf_rewrite.yml \
290+       tests/random_lines_mapping.cwl \
291+       tests/random_lines_job.json
292+
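The rewrite rule expressed by the mapping file can be sketched as follows. This is a minimal illustration under the assumption that each mapping entry carries ``from`` and ``to`` dictionaries with ``name`` and ``version`` keys, exactly as in the file above; ``apply_mapping`` is a hypothetical name, not a cwltool or galaxy-lib function.

```python
def apply_mapping(name, version, mappings):
    """Rewrite (name, version) using the first matching mapping entry.

    Hypothetical helper; assumes each entry has 'from' and 'to' dictionaries
    with 'name' and 'version' keys.
    """
    for entry in mappings:
        source = entry["from"]
        # Compare versions as strings so that 1.0 and '1.0' are treated alike.
        if source["name"] == name and str(source["version"]) == str(version):
            target = entry["to"]
            return target["name"], str(target["version"])
    # No mapping applies; leave the requirement unchanged.
    return name, version


MAPPINGS = [
    {
        "from": {"name": "randomLines", "version": "1.0.0-rc1"},
        "to": {"name": "random-lines", "version": "1.0"},
    }
]

print(apply_mapping("randomLines", "1.0.0-rc1", MAPPINGS))
print(apply_mapping("seqtk", "r93", MAPPINGS))
```

Requirements that match no entry pass through untouched, so a mapping file only needs to list the names that differ between the tool and the deployment.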
293+ The previous examples demonstrated leveraging existing infrastructure to
294+ provide requirements for CWL tools. If instead a real package manager is used,
295+ cwltool has the opportunity to install requirements as needed. While initial
296+ support for Homebrew/Linuxbrew plugins is available, the most developed such
297+ plugin is for the `Conda <https://conda.io/docs/#>`__ package manager. Conda has the nice properties
298+ of allowing multiple versions of a package to be installed simultaneously,
299+ not requiring elevated permissions to install Conda itself or packages using
300+ Conda, and being cross-platform. For these reasons, cwltool may run as a normal
301+ user, install its own Conda environment, and manage multiple versions of Conda packages
302+ on both Linux and Mac OS X.
303+
304+ The Conda plugin is highly configurable, but a sensible set of defaults
305+ that has proven a powerful stack for dependency management within the Galaxy tool
306+ development ecosystem can be enabled by simply passing cwltool the
307+ ``--beta-conda-dependencies`` flag.
308+
309+ With this we can use the seqtk example above without Docker and without
310+ any externally managed services: cwltool should install everything it needs
311+ and create an environment for the tool. Try it out with the following command::
312+
313+   cwltool --beta-conda-dependencies tests/seqtk_seq.cwl tests/seqtk_seq_job.json
314+
315+ The CWL specification allows URIs to be attached to ``SoftwareRequirement``\ s
316+ that allow disambiguation of package names. Whereas the mapping files described above
317+ allow deployers to adapt tools to their infrastructure, this mechanism allows
318+ tools to adapt their requirements to multiple package managers. To demonstrate
319+ this within the context of the seqtk example, we can simply break the package name we
320+ use and then specify a specific Conda package as follows:
321+
322+ .. code:: yaml
323+
324+   hints:
325+     SoftwareRequirement:
326+       packages:
327+       - package: seqtk_seq
328+         version:
329+         - '1.2'
330+         specs:
331+         - https://anaconda.org/bioconda/seqtk
332+         - https://packages.debian.org/sid/seqtk
333+
334+ The example can be executed using the command::
335+
336+   cwltool --beta-conda-dependencies tests/seqtk_seq_wrong_name.cwl tests/seqtk_seq_job.json
337+
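One way to picture how a ``specs`` URI can disambiguate a package name is the sketch below, which picks the first ``anaconda.org`` URI and reads a channel and package name from its path. This is an assumption made for illustration: the helper ``conda_target_from_specs`` is hypothetical, and the actual matching logic in galaxy-lib may differ.

```python
from urllib.parse import urlparse


def conda_target_from_specs(specs):
    """Return (channel, package) from the first anaconda.org URI, or None.

    Hypothetical helper; assumes URIs of the form
    https://anaconda.org/<channel>/<package>.
    """
    for uri in specs:
        parsed = urlparse(uri)
        parts = [part for part in parsed.path.split("/") if part]
        if parsed.netloc == "anaconda.org" and len(parts) == 2:
            return parts[0], parts[1]
    # No Conda-specific URI was supplied.
    return None


SPECS = [
    "https://anaconda.org/bioconda/seqtk",
    "https://packages.debian.org/sid/seqtk",
]

print(conda_target_from_specs(SPECS))
```

Under this reading, the deliberately wrong package name ``seqtk_seq`` no longer matters to a Conda-aware resolver, because the URI names the intended package directly.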
338+ The plugin framework for managing resolution of these software requirements
339+ is maintained as part of `galaxy-lib <https://github.com/galaxyproject/galaxy-lib>`__, a small, portable subset of the Galaxy
340+ project. More information on configuration and implementation can be found
341+ at the following links:
342+
343+ - `Dependency Resolvers in Galaxy <https://docs.galaxyproject.org/en/latest/admin/dependency_resolvers.html>`__
344+ - `Conda for [Galaxy] Tool Dependencies <https://docs.galaxyproject.org/en/latest/admin/conda_faq.html>`__
345+ - `Mapping Files - Implementation <https://github.com/galaxyproject/galaxy/commit/495802d229967771df5b64a2f79b88a0eaf00edb>`__
346+ - `Specifications - Implementation <https://github.com/galaxyproject/galaxy/commit/81d71d2e740ee07754785306e4448f8425f890bc>`__
347+ - `Initial cwltool Integration Pull Request <https://github.com/common-workflow-language/cwltool/pull/214>`__
142348
143349Cwltool control flow
144350--------------------