A pipeline for processing reads from a sequencing run. Currently supports Illumina or Ion Torrent, but it can be expanded to other platforms.
# Run SneakerNet on the example data
SneakerNetPlugins.pl --numcpus 4 t/M00123-18-001-test
The default workflow in v0.14 runs the steps below; other workflows are available, as described in PLUGINS.md.
- Parse sample entries - create an input file samples.tsv
- Read metrics - get raw read yields and raw read QC summary (CG-Pipeline)
- Assembly - assemble each genome (Shovill/skesa)
- MLST - 7-gene MLST (mlst)
- Run Kraken
- Contamination detection - check that all reads come from one taxon for each genome (Kraken)
- Contamination detection - check that all seven MLST genes have only one instance in the genome as expected (ColorID)
- Base balance - check that the ratio of A/T is approximately 1 and same with C/G
- Antimicrobial resistance gene prediction - detect genotype and predict phenotype (staramr)
- Pass/fail - list all genomes that have failed Q/C
- Transfer Files - files are copied to a remote folder
- HTML summary report
- Email the report
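To make the base balance step concrete, here is a minimal sketch of the idea in shell. It is not the code the SneakerNet plugin uses; the function name `base_balance` is hypothetical, and it assumes an uncompressed four-line-per-record FASTQ file.

```shell
# Hypothetical helper, not part of SneakerNet: print A/T and C/G ratios
# for an uncompressed FASTQ file with four lines per record.
base_balance() {
  awk 'NR % 4 == 2 {
         # Sequence lines are the second line of each record
         seq = toupper($0)
         # gsub returns the number of matches it replaced
         a += gsub(/A/, "", seq); t += gsub(/T/, "", seq)
         c += gsub(/C/, "", seq); g += gsub(/G/, "", seq)
       }
       END {
         if (t == 0 || g == 0) { print "not enough data"; exit 1 }
         printf "A/T=%.2f C/G=%.2f\n", a / t, c / g
       }' "$1"
}
```

For gzipped reads, something like `gunzip -c reads.fastq.gz > reads.fastq` first would be needed. Ratios far from 1 would be the signal to look more closely at a sample.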
- Install and configure SneakerNet - from source or with a container
- Make an input folder from your MiSeq run (see docs/SneakerNetInput.md)
- Run SneakerNetPlugins.pl on the input folder.
See docs/INSTALL.md
NOTE: to ensure all dependencies are met, please follow the dependencies section under the installation document.
SneakerNet has been containerized and is at dockerhub. For more information, please see our containers documentation.
Here is a summary of Docker commands, from the containers documentation.
# Pull image
docker pull lskatz/sneakernet:latest
# Import data directly from the MiSeq machine, where $MISEQ is a raw run folder exported by the MiSeq machine
# and $INDIR is the newly created SneakerNet input folder
docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNet.roRun.pl /data/$MISEQ -o /data/$INDIR
# Run SneakerNet on the $INDIR (SneakerNet formatted folder)
docker run --rm -v $PWD:/data -v $KRAKEN_DEFAULT_DB:/kraken-database -u $(id -u):$(id -g) lskatz/sneakernet:latest SneakerNetPlugins.pl --numcpus 12 --no email --no transfer --no save /data/$INDIR
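The Docker commands above rely on the shell variables `$MISEQ`, `$INDIR`, and `$KRAKEN_DEFAULT_DB` being set. As an optional preflight, a small helper like the following could catch an unset variable before Docker mounts an empty path. The function name `require_vars` is hypothetical and not part of SneakerNet.

```shell
# Hypothetical helper, not part of SneakerNet: report any variable
# named in the arguments that is unset or empty.
require_vars() {
  missing=0
  for v in "$@"; do
    # Indirect lookup of the variable whose name is stored in $v
    eval "val=\${$v:-}"
    if [ -z "$val" ]; then
      echo "missing: $v" >&2
      missing=1
    fi
  done
  return $missing
}
```

A typical call before the `docker run` lines would be `require_vars MISEQ INDIR KRAKEN_DEFAULT_DB || exit 1`.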
For more information on a SneakerNet-style folder, see docs/SneakerNetInput.md
SneakerNet requires a project directory that is in a certain format already.
To create the project, you can use SneakerNet.roRun.pl. For example,
SneakerNet.roRun.pl --createsamplesheet -o M1234-18-001-test miseq/working/directory
M1234-18-001-test is the project folder name. It is dash-delimited and contains the machine name, year, ordinal, and optionally a name.
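As an illustration of the naming convention, the sketch below splits a project folder name into its dash-delimited fields. The function name `parse_run_name` is hypothetical, and this is not how SneakerNet itself parses the name; it only mirrors the description above.

```shell
# Hypothetical helper, not part of SneakerNet: split a project folder
# name into machine, year, ordinal, and an optional trailing name.
parse_run_name() {
  run=$(basename "$1")
  case "$run" in
    *-*-*) ;;   # need at least machine-year-ordinal
    *) echo "unexpected folder name: $run" >&2; return 1 ;;
  esac
  machine=${run%%-*}; rest=${run#*-}
  year=${rest%%-*};   rest=${rest#*-}
  ordinal=${rest%%-*}
  name=""
  # Anything after the ordinal is treated as the optional name
  case "$rest" in *-*) name=${rest#*-} ;; esac
  echo "machine=$machine year=$year ordinal=$ordinal name=$name"
}
```

For example, `parse_run_name M1234-18-001-test` prints `machine=M1234 year=18 ordinal=001 name=test`.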
Fastq files must be in the format of _R1_ instead of _1 and _R2_ instead of _2 for this particular script to parse the files properly.
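A quick way to spot files that do not follow this convention is sketched below. The function name `check_fastq_names` is hypothetical, and the `.fastq.gz` extension is an assumption about how the reads are stored.

```shell
# Hypothetical helper, not part of SneakerNet: list FASTQ files in a
# directory whose names contain neither _R1_ nor _R2_.
# Assumes reads are stored as *.fastq.gz.
check_fastq_names() {
  status=0
  for f in "$1"/*.fastq.gz; do
    [ -e "$f" ] || continue   # glob matched nothing
    case "$(basename "$f")" in
      *_R1_*|*_R2_*) ;;
      *) echo "not parseable: $(basename "$f")"; status=1 ;;
    esac
  done
  return $status
}
```

The function returns a non-zero status when at least one file would need renaming, so it can be used in a condition such as `check_fastq_names miseq/working/directory || echo "rename these first"`.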
It is generally a good idea to edit a file snok.txt to configure the run further.
For more information on the workflow, see the configuration section in INSTALL.md.
For example,
echo "emails = example@example.com, blah@example.com" > t/data/M00123-18-001/snok.txt
echo "workflow = default" >> t/data/M00123-18-001/snok.txt
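As an optional sanity check, the `key = value` lines written to snok.txt can be read back before launching. This is a minimal sketch; the function name `snok_get` is hypothetical and this is not how SneakerNet parses the file internally.

```shell
# Hypothetical helper, not part of SneakerNet: print the value for a
# key from a file of "key = value" lines.
snok_get() {
  awk -v key="$2" '
    index($0, "=") > 0 {
      # Split on the first "=" only, then trim surrounding whitespace
      k = substr($0, 1, index($0, "=") - 1)
      v = substr($0, index($0, "=") + 1)
      gsub(/^[ \t]+|[ \t]+$/, "", k)
      gsub(/^[ \t]+|[ \t]+$/, "", v)
      if (k == key) print v
    }' "$1"
}
```

With the two lines from the example, `snok_get t/data/M00123-18-001/snok.txt workflow` prints `default`.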
And then run SneakerNet like so (optionally following the log with tail -f):
SneakerNetPlugins.pl --numcpus 8 t/data/M00123-18-001 > t/data/M00123-18-001/SneakerNet.log 2>&1 &
tail -f t/data/M00123-18-001/SneakerNet.log
For more information, please see docs/SneakerNetOutput.md
SneakerNet produces a subfolder SneakerNet/ in your run directory.
It also emails a report. To view a sample report, please go to t/report.html in this repository.
SneakerNet is based on plugins. In this context, a plugin is an independent script that can run an analysis on a run directory, accept standard inputs (e.g., --help), and create standard output files.
For more details, see the plugins readme.
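To give a feel for the shape of a plugin, here is a hypothetical skeleton in shell. It only illustrates the three traits described above (runs on a run directory, accepts options such as --help, writes an output file). The function name, the output location, and the file name are illustrative assumptions; the actual plugin contract is defined in the plugins readme.

```shell
# Hypothetical plugin skeleton; names and paths are assumptions,
# and the real requirements are in the plugins readme.
example_plugin() {
  numcpus=1
  while [ $# -gt 0 ]; do
    case "$1" in
      --help) echo "Usage: example_plugin [--numcpus N] run_dir"; return 0 ;;
      --numcpus) numcpus=$2; shift 2 ;;
      --*) echo "unknown option: $1" >&2; return 1 ;;
      *) break ;;
    esac
  done
  run_dir=${1:-}
  [ -d "$run_dir" ] || { echo "run directory not found: $run_dir" >&2; return 1; }
  # Illustrative output location (assumption): a subfolder of SneakerNet/
  out_dir="$run_dir/SneakerNet/example_plugin"
  mkdir -p "$out_dir"
  # Placeholder "analysis": count FASTQ files at the top of the run directory
  n=$(find "$run_dir" -maxdepth 1 -name '*.fastq.gz' | wc -l | tr -d ' ')
  printf 'fastq_files\t%s\n' "$n" > "$out_dir/summary.tsv"
}
```

The `numcpus` value is parsed but unused here; a real analysis step would pass it on to whatever tool it wraps.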
You too can develop for SneakerNet! For more information, please look at the readme for plugins and the contributing doc.
Please see the docs subfolder for more specific documentation.
For inline documentation on some of the perl code, run perldoc lib/perl5/SneakerNet.pm.
Griswold, T., Kapsak, C., Chen, J. C., den Bakker, H. C., Williams, G., Kelley, A., Vidyaprakash, E., & Katz, L. S. (2021). SneakerNet: A modular quality assurance and quality check workflow for primary genomic and metagenomic read data. Journal of Open Source Software, 6(60). https://doi.org/10.21105/joss.02334