Studying Acidobacteria reads from a Nanopore metagenomic data-set | Python v3.5 | PyPI (see version)
Author Samantha C Pendleton, Data Science MSc Aberystwyth University, Twitter | GitHub
Follow the Twitter bot I created, acido_bot, that dispenses daily facts about Acidobacteria!
The GC content of Acidobacteria genomes is consistent with their placement: species in the same subdivision have similar GC content (above 60% for group V fragments and roughly 10% lower for group III fragments), which displays the diversity within the phylum [1]. How the abundance of a subdivision correlates with pH depends on the subdivision: subdivisions 1, 2, 3, 12, and 13 become less abundant as pH increases, whilst subdivisions 4, 6, 7, 10, 11, 16, 17, 18, 22, and 25 are sparse at low pH and become more abundant as pH increases [2].
This package studies a collection of reads and gathers the ones that Kaiju assigned to Acidobacteria. It reports various statistics and GC plots. Furthermore, the unclassified Acidobacteria reads are grouped into subdivisions based on the pH of the soil sample and visualised.
The Kaiju output provides the taxon ID and the corresponding sequence; my package outputs the Acidobacteria species alongside annotation, plots, and information on the unclassified reads.
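Since the statistics and plots revolve around GC content, a minimal sketch of how a GC percentage could be computed per read may help. The function names `gc_content` and `mean_gc` are illustrative assumptions and are not taken from the package source.

```python
def gc_content(sequence):
    """Return the percentage of G and C bases in a nucleotide sequence."""
    sequence = sequence.upper()
    # Count only unambiguous bases so that N or gap characters
    # do not skew the ratio.
    gc = sequence.count("G") + sequence.count("C")
    at = sequence.count("A") + sequence.count("T")
    total = gc + at
    if total == 0:
        return 0.0
    return 100.0 * gc / total


def mean_gc(sequences):
    """Return the mean GC percentage over an iterable of sequences."""
    values = [gc_content(s) for s in sequences]
    return sum(values) / len(values) if values else 0.0
```

Ignoring ambiguous bases is a design choice of this sketch; the package itself may treat them differently.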
- FASTA format of all the reads.
- Kaiju output after extracting the two columns: sequence ID and NCBI taxIDs.
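As a rough illustration of how these two inputs fit together, the sketch below reads a FASTA file and the two-column CSV, then keeps the reads whose taxon ID is in a chosen set. It uses only the standard library rather than pysam, and the function names and the taxon IDs in the example are hypothetical.

```python
import csv
import io


def read_fasta(handle):
    """Yield (sequence_id, sequence) pairs from a FASTA file handle."""
    seq_id, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if seq_id is not None:
                yield seq_id, "".join(chunks)
            # The ID is the first whitespace-delimited token after ">".
            parts = line[1:].split()
            seq_id, chunks = (parts[0] if parts else ""), []
        else:
            chunks.append(line)
    if seq_id is not None:
        yield seq_id, "".join(chunks)


def read_taxon_table(handle):
    """Map each sequence ID to its NCBI taxon ID (both kept as strings)."""
    return {row[0]: row[1] for row in csv.reader(handle) if len(row) >= 2}


def select_reads(fasta_handle, table_handle, wanted_taxa):
    """Return the reads whose taxon ID is in wanted_taxa."""
    taxa = read_taxon_table(table_handle)
    return [(sid, seq) for sid, seq in read_fasta(fasta_handle)
            if taxa.get(sid) in wanted_taxa]


# Example with in-memory data; the taxon IDs "111" and "222" are made up.
fasta = io.StringIO(">read1 extra\nACGT\nGGCC\n>read2\nATAT\n")
table = io.StringIO("read1,111\nread2,222\n")
selected = select_reads(fasta, table, {"111"})
```

With real files, the handles would come from `open(...)` instead of `io.StringIO`.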
import os
import csv
import pysam
import collections
import matplotlib.pyplot as plt
import matplotlib.patches as mpatches
import random
from termcolor import colored
from colorama import init
import click
$ pip3 install matplotlib
GitClone
$ git clone https://github.com/sap218/acidoseq.git
pip
$ pip install acidoseq
Kaiju
I used columns 2 and 3 of the Kaiju output, which hold the sequence references and the NCBI taxon IDs.
- Filter the output to keep only the classified reads
$ awk '$1 == "C"' kaiju.out > kaijuC.out
- Cut the columns
$ cut -f2,3 kaijuC.out > results.txt
- Convert the txt to csv (comma-delimited)
$ sed 's/\s\+/,/g' results.txt > result_seqid_taxon.csv
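For readers without awk, cut, and sed to hand, the three steps above can be approximated in Python. This is a sketch that assumes a tab-separated Kaiju output with the classification flag in column 1; `kaiju_to_csv` is a hypothetical helper and not part of acidoseq.

```python
import csv
import io


def kaiju_to_csv(kaiju_lines, out_handle):
    """Write 'sequence ID,taxon ID' rows for classified reads only.

    Returns the number of rows written.
    """
    writer = csv.writer(out_handle, lineterminator="\n")
    kept = 0
    for line in kaiju_lines:
        fields = line.rstrip("\n").split("\t")
        # Keep rows flagged "C" (classified), then columns 2 and 3.
        if len(fields) >= 3 and fields[0] == "C":
            writer.writerow(fields[1:3])
            kept += 1
    return kept


# Example with made-up rows; real input would be open("kaiju.out").
raw = ["C\tread1\t111\n", "U\tread2\t0\n", "C\tread3\t333\textra\n"]
out = io.StringIO()
kept = kaiju_to_csv(raw, out)
```

To write `result_seqid_taxon.csv` directly, the output handle could be opened with `open("result_seqid_taxon.csv", "w", newline="")`.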
If you are unsure of the pH of your soil samples, you may want to use the map script first - default city is Aberystwyth.
Please note: due to the fact that the Earth is spherical and maps are 2-dimensional, there will be some distortion when plotting locations.
$ acidomap --city Birmingham
The CLI needs the Kaiju and FASTA files; all other options have defaults, e.g. pH = 5.
If no plot style is provided, or it is entered incorrectly, a random one is chosen.
Run as follows on Linux (find how to run with other operating systems here):
$ acidoseq --help
Usage: acidoseq [OPTIONS]
Options:
--taxdumptype TEXT Study "ALL" or only unclassified "U"?
--kaijufile TEXT Place edited Kaiju (csv) in directory for ease.
--fastapath TEXT Place FASTA in directory for ease.
--style TEXT ['seaborn-bright', 'seaborn-poster', 'seaborn-white',
'bmh', 'seaborn-darkgrid', 'seaborn-pastel',
'grayscale', '_classic_test', 'ggplot', 'seaborn-
whitegrid', 'seaborn-dark', 'seaborn-muted', 'seaborn-
colorblind', 'seaborn-ticks', 'Solarize_Light2',
'seaborn-notebook', 'dark_background', 'fast',
'seaborn', 'fivethirtyeight', 'seaborn-paper', 'seaborn-
dark-palette', 'seaborn-talk', 'classic', 'seaborn-
deep']
--plottype TEXT "span" range of GC means OR "line" average mean GC
--ph TEXT pH of soil, use map script for assistance.
--help Show this message and exit.
$ acidoseq --kaijufile result_seqid_taxon.csv --fastapath all.fa
$ acidoseq --taxdumptype ALL --kaijufile result_seqid_taxon.csv --fastapath all.fa --style ggplot --plottype span --ph 4.92
$ acidoseq --taxdumptype U --kaijufile result_seqid_taxon.csv --fastapath all.fa --style seaborn --plottype line --ph 7.14
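The fallback to a random plot style when `--style` is missing or mistyped could look roughly like the sketch below. The helper name `choose_style` is an assumption, and the actual package code may differ.

```python
import random


def choose_style(requested, available, rng=random):
    """Return the requested style if it is valid, else a random valid one."""
    if requested in available:
        return requested
    # Sort first so the candidates are in a stable order for rng.choice.
    return rng.choice(sorted(available))


# Example with a short, hand-written list of style names.
styles = ["ggplot", "bmh", "classic"]
style = choose_style("not-a-style", styles)
```

With matplotlib installed, `matplotlib.pyplot.style.available` could be passed as `available` and the result handed to `matplotlib.pyplot.style.use`.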
Output
- FASTA file: a collection of reads which were identified as Acidobacteria
- Plot of AT and GC ratio comparison with means
- In-depth plot of GC ratio with subdivisions labelled (regions with 'span' and means with 'line')
- Separate FASTA files of the unclassified reads assigned to subdivisions based on the pH, e.g. a file of sequences which reside in the subdivision 1 GC span if the pH is low
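The pH-based assignment of unclassified reads can be pictured with the sketch below. It takes the subdivision groups from the relationship reported in [2], but the pH threshold of 5.5, the function names, and the GC spans in the example are placeholders chosen for illustration, not values used by the package or measured from data.

```python
# Subdivisions whose abundance falls as pH rises, and those that are
# sparse at low pH, as listed in the introduction.
LOW_PH_SUBDIVISIONS = (1, 2, 3, 12, 13)
HIGH_PH_SUBDIVISIONS = (4, 6, 7, 10, 11, 16, 17, 18, 22, 25)


def candidate_subdivisions(ph, threshold=5.5):
    """Return the subdivisions expected at this pH.

    The threshold is a hypothetical cut-off between "low" and "high" pH.
    """
    return LOW_PH_SUBDIVISIONS if ph < threshold else HIGH_PH_SUBDIVISIONS


def bin_read(gc_percent, ph, spans):
    """Return candidate subdivisions whose GC span contains gc_percent.

    spans maps a subdivision number to a (low, high) GC percentage range.
    """
    candidates = candidate_subdivisions(ph)
    return [sub for sub, (low, high) in sorted(spans.items())
            if sub in candidates and low <= gc_percent <= high]


# Placeholder spans for illustration only; these are NOT real GC ranges.
example_spans = {1: (50.0, 60.0), 3: (58.0, 65.0), 4: (50.0, 70.0)}
low_ph_bins = bin_read(59.0, 4.9, example_spans)
high_ph_bins = bin_read(59.0, 7.1, example_spans)
```

A read can fall into more than one span, so this sketch returns a list instead of a single subdivision.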
- Amanda Clare, senior lecturer, MSc supervisor at Aberystwyth University, Twitter | GitHub | Staff Profile
- Sam Nicholls, postdoc at University of Birmingham, Twitter | GitHub
- Arwyn Edwards, senior lecturer at Aberystwyth University, provided the data-set, Twitter | Staff Profile
Don't hesitate to create an issue or make a suggestion!
- Make available
- Improve descriptions and comments
- Look into command line interface
- Fix code to output unclassified subdivisions based on pH
- Alter code so the input file can be the original Kaiju output
- Make available on Conda
[1] Quaiser, A., Ochsenreiter, T., Lanz, C., Schuster, S. C., Treusch, A. H., Eck, J., & Schleper, C. (2003). Acidobacteria form a coherent but highly diverse group within the bacterial domain: evidence from environmental genomics. Molecular microbiology, 50(2), 563-575.
[2] Eichorst, S. A., Breznak, J. A., & Schmidt, T. M. (2007). Isolation and characterization of soil bacteria that define Terriglobus gen. nov., in the phylum Acidobacteria. Applied and environmental microbiology, 73(8), 2708-2717.