Making Snakemake workflows into full-fledged command line tools since 1999.
Warning: this project is under active development; the API will change!
- I wanted a single command line utility that could organize and execute multiple Snakemake workflows
- I wanted to have my workflow-specific arguments be parsed on the command line (ex. with argparse)
- I wanted an API or library to configure how to group and organize the workflows, and how to display them on the command line.
- I didn't want to write this library more than once.
Python 3.6 or higher is required.
Snakemake 4.4.0 or higher is required.
To clone the repository: git clone https://github.com/nh13/snakeparse.git
To install locally: python setup.py install
The command line utility can be run with snakeparse
See the recipe on Bioconda.
See the following documentation.
To run tests, run coverage run -m unittest discover -s src
To obtain test coverage, run codecov
To get started, run snakeparse --help
More documentation is coming soon, but see the source API documentation in the meantime.
1. Using argparse and a custom method named snakeparser.
2. Using argparse and a concrete sub-class of SnakeParser.
For more examples, see this link.
The example below is from (1).
Consider this simple Snakemake file (snakefile) that has a required configuration option:
message = config['message']

rule all:
    input:
        'message.txt'

# A simple rule to write the message to the output
rule message:
    output: 'message.txt'
    shell: 'echo {message} > {output}'
You would need to run snakemake with the --config option:
snakemake --snakefile </path/to/snakefile> --config message='Hello World!'
If you forget to add the correct key/value pairs with the --config option, you'll get a KeyError exception, which is not user-friendly to non-programmers.
At that point, the only way to see the required and optional config key/value pairs is to examine the snakefile, when what you really want is a help message.
Have fun adding each configuration option one-by-one and gleaning their meaning.
Even if users do read the source, the snakefile needs clear documentation for every argument.
Why can't we just use argparse as we normally would for our command-line Python scripts?
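To make the contrast concrete, here is a minimal sketch in plain Python, independent of Snakemake and Snakeparse. The function names and the write-message program name are made up for illustration. A bare dictionary lookup fails with a KeyError, while an argparse parser with a required option reports the missing option in a usage message and provides --help for free.

```python
import argparse


def lookup_message(config):
    """Mimic `message = config['message']` from the snakefile above."""
    return config['message']  # raises KeyError if the key was never supplied


def build_parser():
    """Declare the same option with argparse (hypothetical program name)."""
    parser = argparse.ArgumentParser(
        prog='write-message',
        description='Write a message to message.txt.',
    )
    parser.add_argument('--message', required=True, help='The message.')
    return parser


if __name__ == '__main__':
    try:
        lookup_message({})
    except KeyError as error:
        print('Plain config lookup failed with KeyError:', error)

    # With argparse, leaving out --message would print a usage line and
    # exit with status 2 instead of raising a bare KeyError.
    args = build_parser().parse_args(['--message', 'Hello World!'])
    print(args.message)
```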
Furthermore, if you have multiple snakefiles, setting the --config key/value pairs can get quite painful, not to mention that you need to specify the path to the specific snakefile you're interested in each time.
Why can't we put all the snakefiles in one place, and have an easy way to specify which to run on the command line?
So many other command-line tools do it (ex. bwa, samtools, fgbio, Picard), and even other workflow software does it (ex. dagr), so why can't we do it?
This is why I wrote Snakeparse.
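The "one tool, many named workflows" layout that tools like samtools use can be sketched with argparse sub-commands. This is only an illustration of the idea using the standard library, not how Snakeparse is implemented; the program name workflows and the CountLines workflow are hypothetical.

```python
import argparse


def build_cli():
    """One entry point, one sub-command per workflow (names are hypothetical)."""
    parser = argparse.ArgumentParser(prog='workflows')
    subparsers = parser.add_subparsers(dest='workflow', metavar='WORKFLOW')
    # Set as an attribute so that it also works on Python 3.6.
    subparsers.required = True

    write = subparsers.add_parser('WriteMessage', help='Write a message to a file.')
    write.add_argument('--message', required=True, help='The message.')

    count = subparsers.add_parser('CountLines', help='Count lines in a file.')
    count.add_argument('--input', required=True, help='The file to count.')
    return parser


if __name__ == '__main__':
    args = build_cli().parse_args(['WriteMessage', '--message', 'Hello World!'])
    print(args.workflow, args.message)
```

Each workflow gets its own options and its own help text, and the top-level help lists every available workflow.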
Source: examples/argparse/method/write_message.smk
Modify the above snakefile by prepending the following:
# Import the parser from snakeparse
from snakeparse.parser import argparser

def snakeparser(**kwargs):
    ''' The method that returns the parser with all arguments added. '''
    p = argparser(**kwargs)
    p.parser.add_argument('--message', help='The message.', required=True)
    return p

# Get the arguments from the config file; this should always succeed.
args = snakeparser().parse_config(config=config)

# NB: you could use `args.message` directly.
message = args.message
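The comment "this should always succeed" makes sense if the arguments were already validated on the command line before Snakemake started, so that the config only carries known-good values. The sketch below illustrates that round trip with the standard library alone. It is an assumption about the general approach, not Snakeparse's implementation; namespace_to_config and namespace_from_config are hypothetical helpers.

```python
import argparse


def build_parser():
    """The same parser is used on the command line and inside the snakefile."""
    parser = argparse.ArgumentParser(prog='write-message')
    parser.add_argument('--message', required=True, help='The message.')
    return parser


def namespace_to_config(args):
    """Flatten parsed arguments into key/value pairs, as --config would carry them."""
    return dict(vars(args))


def namespace_from_config(parser, config):
    """Rebuild a namespace by re-parsing the config entries as command-line options."""
    argv = []
    for key, value in config.items():
        argv.extend(['--' + key.replace('_', '-'), str(value)])
    return parser.parse_args(argv)


if __name__ == '__main__':
    # Step 1: validate on the command line, before any workflow runs.
    cli_args = build_parser().parse_args(['--message', 'Hello World!'])
    config = namespace_to_config(cli_args)

    # Step 2: inside the workflow, recover the same arguments from the config.
    args = namespace_from_config(build_parser(), config)
    print(args.message)
```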
You can run the installed snakeparse utility as follows:
snakeparse --snakefile examples/argparse/method/write_message.smk -- --message 'Hello World!'
or
snakeparse --snakefile-globs examples/argparse/method/*smk -- WriteMessage --message 'Hello World!'
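or from your own Python script:
import sys
# Import path assumed from the package layout; check the source API documentation.
from snakeparse.api import SnakeParse, SnakeParseConfig

config = SnakeParseConfig(snakefile_globs='~/examples/argparse/method/*smk')
SnakeParse(args=sys.argv[1:], config=config).run()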
or alternatively SnakeParse accepts leading configuration arguments:
args = ['--snakefile-globs', '~/examples/argparse/method/*smk'] + sys.argv[1:]
SnakeParse(args=args, config=config).run()
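In the --snakefile-globs command above, the workflow is selected as WriteMessage although the file is named write_message.smk, which suggests the workflow name is derived from the file name. A minimal sketch of such a mapping, assuming a plain snake_case to CamelCase conversion; workflow_name is a hypothetical helper, not part of the Snakeparse API:

```python
from pathlib import Path


def workflow_name(snakefile):
    """Turn a snakefile path into a CamelCase workflow name (assumed convention)."""
    stem = Path(snakefile).stem  # 'write_message.smk' -> 'write_message'
    return ''.join(part.capitalize() for part in stem.split('_') if part)


if __name__ == '__main__':
    print(workflow_name('examples/argparse/method/write_message.smk'))  # WriteMessage
```

If the real naming rule differs, for example for file names containing dashes or digits, the source API documentation is the place to check.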