This repository contains all supplementary material for the paper "Spaced seeds improve k-mer-based metagenomic classification" by K. Brinda, M. Sykulski, and G. Kucherov. The current version is available at http://arxiv.org/abs/1502.06256.
Snakemake scripts used in Sections 3.1.2 (Classifying unaligned reads) and 3.3 (Correlation on real genomes) are available here.
rank.cor.seed..weight.*.pdf
- Spearman rank correlation between alignment (dis)similarity and score (hits or coverage), alignment (read) lengths 100 and 250, various spaced seed weights
- read length 100: RCS_rl100_w18, RCS_rl100_w20, RCS_rl100_w22, RCS_rl100_w24, RCS_rl100_w26
- read length 250: RCS_rl250_w18, RCS_rl250_w20, RCS_rl250_w22, RCS_rl250_w24, RCS_rl250_w26
relative.mutual.information..seed.weight.*.pdf
- mutual information divided by entropy is plotted as a measure of interdependence between alignment (dis)similarity and score (hits or coverage)
- read length 100: RMI_rl100_w18, RMI_rl100_w20, RMI_rl100_w22, RMI_rl100_w24, RMI_rl100_w26
- read length 250: RMI_rl250_w18, RMI_rl250_w20, RMI_rl250_w22, RMI_rl250_w24, RMI_rl250_w26
- smooth.scatter..spaced.vs.contig.pdf - scatter plots of alignment (dis)similarity vs score (hits or coverage), alignment length 100
- smooth.scatter..spaced.vs.contig.zoom.rl100.pdf - as above, zoomed region
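The two statistics named in the plots above can be sketched in a few lines of plain Python. This is only an illustration, not the code used to produce the PDFs: all function names are hypothetical, the inputs are assumed to be already discretized (binned) values, and normalizing the mutual information by the entropy of the first variable is an assumption, since the text above does not say which entropy is used.

```python
import math
from collections import Counter


def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())


def mutual_information(xs, ys):
    """Mutual information in bits, via I(X;Y) = H(X) + H(Y) - H(X,Y)."""
    return entropy(xs) + entropy(ys) - entropy(list(zip(xs, ys)))


def relative_mutual_information(xs, ys):
    """I(X;Y) / H(X). Dividing by H(X) is an assumption of this sketch."""
    hx = entropy(xs)
    return mutual_information(xs, ys) / hx if hx > 0 else 0.0


def ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    rx, ry = ranks(xs), ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)
```

Here `xs` would hold the alignment (dis)similarity of each read and `ys` its score (hits or coverage); `spearman` is undefined (division by zero) when either input is constant.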
Three report files with scatter plots of alignment (dis)similarity vs score (hits or coverage), in several plot flavors, from experiments on three real genomes.
Plots comparing Seed-Kraken with the original Kraken: performance and sensitivity on several data sets for spaced seeds of various weights and spans, plus tables with all results and the seeds used.
Seed-Kraken, a modification of Kraken utilizing spaced k-mers instead of contiguous k-mers, is located in a standalone repository. For more details, see its documentation.
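To illustrate the difference between contiguous and spaced k-mers, here is a minimal Python sketch. It is not taken from Seed-Kraken: the function name `spaced_kmers` and the seed notation (a string of `1` for positions that are kept and `0` for positions that are ignored) are assumptions made for this example. In this notation the weight of a seed is its number of `1` characters and its span is its total length.

```python
def spaced_kmers(sequence, seed):
    """Yield the spaced k-mer at every position of `sequence`.

    `seed` is a string over {'1', '0'} (hypothetical notation):
    '1' keeps the character at that offset, '0' skips it.
    """
    span = len(seed)
    keep = [i for i, c in enumerate(seed) if c == "1"]
    for start in range(len(sequence) - span + 1):
        window = sequence[start:start + span]
        yield "".join(window[i] for i in keep)


# A seed made only of '1' characters gives ordinary contiguous k-mers.
contiguous = list(spaced_kmers("ACGTAC", "111"))
# A seed of weight 3 and span 4 ignores the third position of each window.
spaced = list(spaced_kmers("ACGTAC", "1101"))
```

With the seed `1101`, two windows that differ only at the ignored position produce the same spaced k-mer, so a single mismatch there does not destroy the hit.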