Scripts for the identification of taxon-specific k-mers from plant genomes and for the detection and counting of these k-mers directly from WGS reads of a metagenomic sample.
PlantTaxSeeker scripts are licensed under the GPLv3 license.
The scripts consist predominantly of Python code (tested on a UNIX server with Python versions 2.7 and 3.3) and also use glistmaker, glistcompare, glistquery, MakeUnion.pl and gmer_counter from the GenomeTester4 package.
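1. To identify target taxon-specific k-mers from target and nontarget genome sequences, use the command:
python identification_of_taxon_specific_kmers.py <Targets.fasta> <Nontargets.fasta> [optional_arguments]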
The optional arguments can also be specified:
- -w Length of the k-mer (default value 32)
- -f The minimum number of target sequences that must contain each specific k-mer (default value 1)
Input files:
- Target taxon genome sequences as a FASTA file
- Nontarget taxa genome sequences as a FASTA file
Output files:
- The list of target taxon-specific k-mers (with the counts of k-mers and sequences) as a binary file
- The list of target taxon-specific k-mers (with the counts of k-mers and sequences) as a text file
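The idea behind step 1 can be illustrated in plain Python. This is a minimal sketch, not the script's actual implementation (which relies on the GenomeTester4 tools): it assumes sequences are already loaded as strings, reads only the forward strand (reverse complements are ignored for simplicity), and the function names `kmers` and `target_specific_kmers` are hypothetical. A k-mer is kept if it occurs in at least `min_targets` target sequences (the role of `-f`) and in no nontarget sequence; `k` plays the role of `-w`.

```python
from collections import Counter


def kmers(seq, k):
    """Return the set of distinct k-mers in seq (forward strand only)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def target_specific_kmers(targets, nontargets, k=32, min_targets=1):
    """Hypothetical sketch of the -w / -f logic.

    Returns {k-mer: number of target sequences containing it} for k-mers
    present in at least `min_targets` target sequences and absent from
    every nontarget sequence.
    """
    # Count in how many target sequences each k-mer occurs (once per sequence).
    presence = Counter()
    for seq in targets:
        presence.update(kmers(seq, k))
    # Union of all k-mers seen in any nontarget sequence.
    nontarget_kmers = set()
    for seq in nontargets:
        nontarget_kmers |= kmers(seq, k)
    return {km: n for km, n in presence.items()
            if n >= min_targets and km not in nontarget_kmers}


if __name__ == "__main__":
    # Toy data with k=4 instead of the default 32.
    print(target_specific_kmers(["ACGTACGTAA", "ACGTACGTCC"],
                                ["TTTTACGT"], k=4, min_targets=2))
```

In the toy run, ACGT and TACG occur in both targets but are discarded because the nontarget sequence contains them, leaving CGTA and GTAC.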
2. To filter out additional nonspecific k-mers using whole genome sequencing raw reads or assembled sequences of nontarget taxa, use the command:
python filtering_with_nontargets.py <Specific_kmers.list> <Nontarget1.fastq> [Nontargets fastqs] [optional_arguments]
The optional arguments can also be specified:
- -w Length of the k-mer (bases, by default 32)
- -f The k-mer frequency cutoff: k-mers that occur in the nontarget sequences at least this many times are removed from the target k-mer list (by default 10)
Input files:
- Unfiltered target taxon-specific k-mers list as a binary file (the output file of identification_of_taxon_specific_kmers.py)
- FASTQ files of nontarget taxa used for filtering out nonspecific k-mers
Output files:
- Target taxon-specific k-mers list as a binary file (containing only k-mers that did not pass the frequency cutoff in the nontarget FASTQ files)
- Target taxon-specific k-mers list as a text file
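The filtering in step 2 can be sketched as follows. This is an illustrative sketch under stated assumptions, not the script itself: the k-mer list is assumed to be a Python dict, FASTQ input is assumed to use the plain 4-line record layout, only the forward strand of each read is scanned, and `read_fastq_sequences` and `filter_with_nontarget_reads` are hypothetical names. A target k-mer is dropped when it occurs at least `min_freq` times (the role of `-f`) in the nontarget reads.

```python
from collections import Counter


def read_fastq_sequences(lines):
    """Yield the sequence line of each 4-line FASTQ record (assumed layout)."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def filter_with_nontarget_reads(specific, reads, k=32, min_freq=10):
    """Hypothetical sketch: drop k-mers from `specific` (a dict) that occur
    at least `min_freq` times in the nontarget reads."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in specific:  # only count k-mers that could be removed
                counts[kmer] += 1
    return {km: v for km, v in specific.items() if counts[km] < min_freq}


if __name__ == "__main__":
    fastq = ["@r1", "ACGTAC", "+", "IIIIII",
             "@r2", "ACGTTT", "+", "IIIIII"]
    reads = list(read_fastq_sequences(fastq))
    print(filter_with_nontarget_reads({"ACGT": 2, "GTAC": 2, "CCCC": 1},
                                      reads, k=4, min_freq=2))
```

With `min_freq=2`, ACGT (seen twice in the toy reads) is removed, while GTAC (seen once) and CCCC (not seen) are kept. A cutoff above 1 tolerates occasional sequencing errors or contamination in the nontarget reads.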
A README file for running the scripts to identify Solanum lycopersicum-specific k-mers is available in GitHub.
3. To detect and count plant taxon-specific k-mers in whole genome sequencing raw reads of a metagenomic sample, use the command:
python plant_taxa_kmers_counter.py <Specific_kmers.list> <Metagenomic_sample.fastq> [optional_argument]
The optional argument can also be specified:
- -f The k-mer frequency cutoff: only k-mers that occur in the metagenomic sequencing reads at least this many times are counted (by default 1)
Input files:
- Target taxon-specific k-mers list as a text file (the output file of identification_of_taxon_specific_kmers.py)
- FASTQ file of WGS reads from the metagenomic sample
Output:
- The count of target plant taxon-specific k-mers detected in WGS reads from the metagenomic sample.
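The counting in step 3 can be sketched in a few lines. This is a hypothetical illustration, not the script's implementation (which uses gmer_counter): it assumes the specific k-mers are held in a Python set or dict, scans only the forward strand of each read, and `count_specific_kmers` is an invented name. Each specific k-mer is counted across all reads, and only k-mers reaching `min_freq` (the role of `-f`) are reported, together with their total.

```python
from collections import Counter


def count_specific_kmers(specific, reads, k=32, min_freq=1):
    """Hypothetical sketch: count occurrences of taxon-specific k-mers in
    reads; return ({k-mer: count} for k-mers with count >= min_freq, total)."""
    counts = Counter()
    for read in reads:
        read = read.upper()
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            if kmer in specific:
                counts[kmer] += 1
    detected = {km: n for km, n in counts.items() if n >= min_freq}
    return detected, sum(detected.values())


if __name__ == "__main__":
    detected, total = count_specific_kmers({"ACGT", "TTTT"},
                                           ["ACGTACGT", "GGGG", "TTTTT"],
                                           k=4, min_freq=2)
    print(detected, total)
```

Raising `min_freq` trades sensitivity for robustness: with `min_freq=3`, neither toy k-mer (each seen twice) would be reported.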
An example: the identification of Lupinus spp.-specific k-mers and the counting of these k-mers in WGS reads of a lupin-containing cookie.
A README file for running the scripts to identify Lupinus spp.-specific k-mers and to count them in the cookie WGS data is available in GitHub.