viralFlye is a pipeline to recover high-quality viral genomes from long-read metagenomic sequencing.
You need the conda
package manager for the installation process.
To install viralFlye locally, run:
git clone https://github.com/Dmitry-Antipov/viralFlye
cd viralFlye
install.sh
This will create a conda environment viralFlye which contain all dependencies.
Alternatively, viralFlye can be installed through bioconda, https://bioconda.github.io/recipes/viralflye/README.html
You also need the Pham HMM database for viral genome identification. If you don't have it yet, download using:
wget http://ftp.ebi.ac.uk/pub/databases/Pfam/releases/Pfam34.0/Pfam-A.hmm.gz
To run, activate the conda environemnt first, then invoke the pipeline:
conda activate viralFlye
./viralFlye.py
viraFlye takes an existing metaFlye assembly directory as input. Use metaFlye v2.9+ with --meta
option.
You will also need the original reads that were used to assemble.
Then, pipeline could be invoked as follows:
./viralFlye.py --dir flye_assembly_dir --hmm path_to_Pfam-A.hmm.gz --reads path_to_reads --outdir output_dir
Additional options are shown when running viralFlye script without parameters.
viralFlye output can be found in the outdir directory (by default, equal to the input directory)
It consists of 3 fasta files, linears_viralFlye.fasta
, components_viralFlye.fasta
and circulars_viralFlye.fasta
,
and a txt file that lists all erroneously circularized components.
Prediction of hosts within the sample is performed by a separate script crispr_host_match.py
.
It takes metaFlye result as an input, extracts circular and linear isolated contigs,predicts viruses and CRISPR spacers and matches them using BLAST.
Result (BLAST output format 6) can be found in blast.out
file in the output folder.
viralFlye package depends on the following software
- viralVerify
- viralComplete
- prodigal
- samtools
- seqtk
- minced
- minimap2
- biopython
- pysam
- samtools
- freebayes
- bcftools
- numpy
- scipy
- BLAST
viralFlye is distributed under a BSD license. See the LICENSE file for details.
Code contributors:
- Dmitry Antipov (Center for Algorithmic Biotechnology, Saint Petersburg State University)
- Mikhail Rayko (Center for Algorithmic Biotechnology, Saint Petersburg State University)
- Mikhail Kolmogorov (University of California, Santa Cruz)