Home
This repository is used as a journal to track Roberto Rossini's analysis steps in reproducing the paper "Multi-omics Reveals the Lifestyle of the Acidophilic, Mineral-Oxidizing Model Species Leptospirillum ferriphilumT" by Stephan Christel et al. (2018), doi:10.1128/AEM.02091-17, as part of the Genome Analysis course at Uppsala University (Bioinformatics Programme 2018/2019).
Leptospirillum ferriphilum is a Gram-negative prokaryote that plays an important role in acidic, metal-rich environments, where it is one of the main organisms responsible for iron oxidation. Until about two years ago (at the time of writing), no complete genome was available for this organism. In 2017, Stephan Christel et al. sequenced the genome of Leptospirillum ferriphilum using PacBio SMRT long-read sequencing in an attempt to produce a high-quality genome assembly that could serve as a reference for other studies. Transcript and protein levels were also measured to explore how the metabolism of Leptospirillum ferriphilum differs between two growth conditions: continuous culture with ferrous iron and bioleaching culture with chalcopyrite (CuFeS2).
The main aims of the analysis are to:
- Produce a high-quality genome assembly that can be used as a reference in other studies
- Annotate the genome to study genes involved in environmental adaptation and stress response
- Study how different environmental conditions affect gene expression by comparing RNA-Seq expression levels with mass-spectrometry protein-level data
The data used in this analysis consist of:
- Raw DNA read data (PacBio SMRT)
- Raw RNA read data (HiSeq2500)
For more information about the data, head over to the data section of the wiki.
The analysis workflow can be summarized in the following steps:
- Genome Assembly
- Genome Annotation
- Transcriptome Assembly
- Differential expression analysis
For more detail, have a look at the analysis section of the wiki.
For a complete list of software and settings, refer to the software section of the wiki.
Compute-intensive tasks were run on the Rackham computing cluster (UPPMAX). Data visualization and simple computing tasks were carried out on a consumer laptop running Manjaro KDE 18.0.4 Illyria (Linux kernel 5.1.4).
Genome assembly and annotation (DNA):
- FastQC: raw read quality check
- kraken2: identification and removal of contaminant reads
- Canu: read pre-processing (read correction and trimming) and genome assembly
- QUAST, Gepard: evaluation of assembly quality
- Prokka, eggNOG: genome annotation
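QUAST summarizes assembly contiguity with statistics such as N50. As an illustration only (this is not QUAST's code), here is a minimal Python sketch of how N50 can be computed from the contig lengths of an assembly FASTA. The helpers `contig_lengths_from_fasta` and `n50` are hypothetical, and the toy FASTA records are made up.

```python
# Hypothetical helpers for illustration; not QUAST's implementation.

def contig_lengths_from_fasta(lines):
    """Return the sequence length of each record in an iterable of FASTA lines."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths):
    """Length of the shortest contig among the largest contigs that
    together cover at least half of the total assembly length."""
    ordered = sorted(lengths, reverse=True)
    if not ordered:
        raise ValueError("at least one contig length is required")
    half = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half:
            return length
    return ordered[-1]  # not reached: running ends at sum(ordered) >= half


# Toy assembly with three made-up contigs.
toy_fasta = [">contig_1", "ACGTACGT", "ACGT",
             ">contig_2", "ACGTAC",
             ">contig_3", "ACGTACGTACGTACGTACGT"]
lengths = contig_lengths_from_fasta(toy_fasta)
print(lengths, n50(lengths))
```

For a single, fully closed chromosome, N50 simply equals the chromosome length, so the statistic is most informative when comparing fragmented assemblies.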
Transcriptome analysis (RNA):
- FastQC: raw read quality check
- kraken2: identification and removal of contaminant reads
- BBDuk: read trimming and adapter removal
- HISAT2: read mapping
- samtools: SAM-to-BAM conversion, BAM sorting and merging
- Trinity: de novo transcriptome assembly
- TransDecoder: identification of candidate coding regions
- HMMER: protein database searches using profile HMMs
- Diamond: faster alternative to BLAST, used to query protein databases
- Salmon: transcript quantification
- edgeR: differential expression analysis
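Salmon already reports TPM in its `quant.sf` output (tab-separated columns `Name`, `Length`, `EffectiveLength`, `TPM`, `NumReads`). The sketch below only illustrates how TPM relates to the `NumReads` and `EffectiveLength` columns; it is not Salmon's implementation, which estimates `NumReads` itself with an EM-based procedure. `read_quant_sf` and `tpm_from_counts` are hypothetical helpers, and the toy table uses made-up numbers rather than data from this analysis.

```python
import csv
import io

# Hypothetical helpers for illustration; not part of Salmon or edgeR.

def tpm_from_counts(num_reads, effective_lengths):
    """Convert read counts to TPM: divide by effective length, then rescale to 1e6."""
    if len(num_reads) != len(effective_lengths):
        raise ValueError("num_reads and effective_lengths must have the same length")
    # Reads per base of effective transcript length (length-normalized rate).
    rates = [reads / length if length > 0 else 0.0
             for reads, length in zip(num_reads, effective_lengths)]
    total = sum(rates)
    if total == 0:
        return [0.0] * len(rates)
    # Rescale so that the TPM values of a sample sum to one million.
    return [rate / total * 1e6 for rate in rates]


def read_quant_sf(handle):
    """Parse a quant.sf-style table into {Name: (NumReads, EffectiveLength, TPM)}."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["Name"]: (float(row["NumReads"]),
                      float(row["EffectiveLength"]),
                      float(row["TPM"]))
        for row in reader
    }


# Toy quant.sf content with made-up values.
example = io.StringIO(
    "Name\tLength\tEffectiveLength\tTPM\tNumReads\n"
    "tx1\t1200\t1000\t250000\t10\n"
    "tx2\t2200\t2000\t750000\t60\n"
)
quant = read_quant_sf(example)
names = list(quant)
recomputed = tpm_from_counts([quant[n][0] for n in names],
                             [quant[n][1] for n in names])
for name, tpm in zip(names, recomputed):
    # Recomputed TPM next to the TPM column of the table.
    print(name, round(tpm), quant[name][2])
```

TPM is convenient for comparing expression within a sample, but edgeR models raw (or estimated) counts, so the `NumReads` column rather than `TPM` is the usual input for the differential expression step.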