This is a simple, Snakemake-based pipeline that takes an NCBI accession list for a given SRA accession number and performs de novo OTU clustering via Qiime2's vsearch wrapper. End results are given as Qiime2 Artifacts (.qza files), though these can be extracted the same as any .zip file, if you so choose.
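Since .qza files are ordinary zip archives, Python's standard `zipfile` module is enough to inspect or unpack them. The following is a minimal sketch, not part of this repository; the function names `list_artifact_members` and `extract_artifact` are made up for illustration.

```python
import zipfile
from pathlib import Path


def list_artifact_members(qza_path):
    """Return the sorted member names stored inside a .qza archive."""
    with zipfile.ZipFile(qza_path) as archive:
        return sorted(archive.namelist())


def extract_artifact(qza_path, out_dir):
    """Unpack a .qza archive into out_dir and return the output path."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(qza_path) as archive:
        archive.extractall(out_dir)
    return out_dir
```

For example, `extract_artifact("OTUs/table.qza", "extracted")` would unpack an artifact into an "extracted" folder; the file name "table.qza" is hypothetical and depends on what the pipeline actually writes.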
- Clone this repository and create & activate a new conda environment with the provided environment file
git clone --depth 1 git@github.com:Pinjontall94/asd-q2.git /your/new/analysis/folder
mamba env create -f environment.yaml
conda activate snakeqiimer
Note: the standard conda tool that comes with Anaconda will work, but, as Snakemake itself recommends, I highly encourage you to use mamba (whether on its own or via the mambaforge distribution).
- Download the NCBI Accession List (e.g. "SRR_Acc_List.txt") and move it into the asd-q2 folder
- Run the following in the asd-q2 folder:
python scripts/srr_munch.py -i SRR_Acc_List.txt -o data
- Modify the config file ("config.yaml") to fit your analysis. Update the following parameters, in plain text, unless otherwise specified:
  - "AUTHOR": a string containing no spaces (e.g. "Franklin_53")
  - "primers", "FWD" and "REV": the forward and reverse primer sequences (e.g. FWD: GTGCCAGCMGCCGCGGTAA)
  - Optional: "offset", "FWD" and "REV": integer values only (e.g. FWD: 5), giving the number of bases to trim from the reads' 5' and 3' ends, respectively
  - Optional: "THREADS": the number of CPU threads to allocate to the pipeline (e.g. THREADS: 8)
AUTHOR: "Franklin_53"
primers:
FWD: GTGCCAGCMGCCGCGGTAA
REV: ATTAGASACCCBDGTAGTCC
# Number of nucleotides to trim from reads' 5' (FWD) and 3' (REV) ends
offset:
FWD: 5
REV: 4
THREADS: 8
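As a rough sanity check before launching the pipeline, the rules above can be expressed in a few lines of Python. This is only a sketch under stated assumptions: the config is assumed to have already been loaded into a plain dict, and `validate_config` and `IUPAC` are hypothetical names, not part of this repository or of Snakemake.

```python
import re

# IUPAC nucleotide codes, including the degenerate ones (M, S, B, D, ...)
IUPAC = set("ACGTURYSWKMBDHVN")


def validate_config(config):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    author = config.get("AUTHOR")
    if not isinstance(author, str) or not author or re.search(r"\s", author):
        problems.append("AUTHOR must be a non-empty string without spaces")

    primers = config.get("primers") or {}
    for key in ("FWD", "REV"):
        seq = primers.get(key)
        if not isinstance(seq, str) or not seq or set(seq.upper()) - IUPAC:
            problems.append(f"primers.{key} must be an IUPAC nucleotide sequence")

    # offset is optional; when present, values must be non-negative integers
    for key, value in (config.get("offset") or {}).items():
        if key not in ("FWD", "REV"):
            problems.append(f"offset.{key} is not a recognised key")
        elif isinstance(value, bool) or not isinstance(value, int) or value < 0:
            problems.append(f"offset.{key} must be a non-negative integer")

    # THREADS is optional; when present, it must be a positive integer
    threads = config.get("THREADS")
    if threads is not None and (
        isinstance(threads, bool) or not isinstance(threads, int) or threads < 1
    ):
        problems.append("THREADS must be a positive integer")

    return problems


# The example config from above, written as a Python dict
example = {
    "AUTHOR": "Franklin_53",
    "primers": {"FWD": "GTGCCAGCMGCCGCGGTAA", "REV": "ATTAGASACCCBDGTAGTCC"},
    "offset": {"FWD": 5, "REV": 4},
    "THREADS": 8,
}
print(validate_config(example))  # prints []
```

Returning a list of problems rather than raising on the first one makes it easier to fix several config mistakes in one pass.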
Optionally, render the workflow DAG as an SVG (note: requires graphviz to be installed):
(snakeqiimer) /your/new/analysis/folder/asd-q2 $ snakemake --dag | dot -Tsvg > dag.svg
Run with:
(snakeqiimer) /your/new/analysis/folder/asd-q2 $ snakemake -cN # where N = number of cores
Your output files will be stored in a newly created "OTUs" folder.
- Download and unzip all fastq.gz's listed in the accession list as SRR numbers, and place them in a "data" folder
- Generate a Qiime2-compatible manifest file for the resulting fastqs (Note: only tested on PHRED33 fastqs)
- Import Seqs
- Merge paired-end reads with q2-vsearch's join pairs
- Dereplicate the SampleData[Sequences] artifact
- De novo cluster FeatureTable[Frequency] and FeatureData[Sequence] artifacts
- Generate FeatureTable and FeatureData summaries
- Create a tree for phylogenetic diversity analyses
- Determine alpha and beta diversity
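To make the first two steps more concrete, here is a minimal sketch of turning an accession list into a paired-end manifest. It is not the repository's actual code. It assumes Qiime2's tab-separated "V2" paired-end manifest layout and assumes the fastqs are named `<accession>_1.fastq` and `<accession>_2.fastq`; the pipeline itself may use a different manifest format or naming scheme. `read_accessions` and `build_manifest` are hypothetical names.

```python
import re
from pathlib import Path

# Run accessions look like SRR/ERR/DRR followed by digits
RUN_ACCESSION = re.compile(r"^[SED]RR\d+$")
HEADER = ("sample-id", "forward-absolute-filepath", "reverse-absolute-filepath")


def read_accessions(text):
    """Return run accessions from accession-list text, one per line."""
    accessions = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if not RUN_ACCESSION.match(line):
            raise ValueError(f"not a run accession: {line!r}")
        accessions.append(line)
    return accessions


def build_manifest(accessions, data_dir):
    """Return tab-separated manifest text for paired-end fastq files."""
    data_dir = Path(data_dir).resolve()  # the manifest needs absolute paths
    rows = ["\t".join(HEADER)]
    for acc in accessions:
        # Assumed naming: <accession>_1.fastq (forward), <accession>_2.fastq (reverse)
        fwd = data_dir / f"{acc}_1.fastq"
        rev = data_dir / f"{acc}_2.fastq"
        rows.append("\t".join((acc, str(fwd), str(rev))))
    return "\n".join(rows) + "\n"
```

A typical use would be `build_manifest(read_accessions(Path("SRR_Acc_List.txt").read_text()), "data")`, with the result written to a manifest file of your choosing.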
- Add conditional to handle all PHRED values compatible with Qiime2
- Add rule for Qiime2 that uses the artifact api
- Add examples folder showing sample workflows
- Organize rules into a separate folder? (Maybe not necessary)
- Add instructions for running remotely via Slurm and/or GCP