Home

Welcome to the lferriphilum wiki

This repository serves as a journal tracking the steps taken by Roberto Rossini to reproduce the analysis in the paper "Multi-omics Reveals the Lifestyle of the Acidophilic, Mineral-Oxidizing Model Species Leptospirillum ferriphilum^T" by Stephan Christel et al. (2018), doi:10.1128/AEM.02091-17, as part of the Genome Analysis course at Uppsala University (Bioinformatics Programme 2018/2019).

Background

Leptospirillum ferriphilum is a Gram-negative prokaryote that plays an important role in acidic, metal-rich environments, where it is one of the main organisms responsible for iron oxidation. Until about two years ago, no complete genome for this organism was available. In 2017, Stephan Christel et al. sequenced the genome of Leptospirillum ferriphilum using PacBio SMRT long-read sequencing in an attempt to produce a high-quality genome assembly that could serve as a reference for other studies. Transcript and protein levels were also measured to explore differences in the metabolism of Leptospirillum ferriphilum grown under different conditions, namely continuous culture with ferrous iron and bioleaching culture with chalcopyrite (CuFeS₂).

Research questions and objectives

Produce a high-quality genome assembly that can be used as a reference in other studies
Annotate the genome to study genes involved in environmental adaptation and stress response
Study how different environmental conditions affect gene expression by comparing RNA-Seq expression levels and mass spectrometry protein-level data

Data

The data used in this analysis consist of:

Raw DNA read data (PacBio SMRT)
Raw RNA read data (HiSeq2500)

For more information about the data, head over to the data section of the wiki.

Analysis outline

The analysis workflow can be summarized as:

Genome Assembly
Genome Annotation
Transcriptome Assembly
Differential expression analysis

For more detail, have a look at the analysis section of the wiki.

Software

For a complete list of software and settings, refer to the software section of the wiki.

Compute-intensive tasks were run on the Rackham computing cluster (UPPMAX). Data visualization and simple computing tasks were carried out on a consumer laptop running Manjaro-KDE 18.0.4 Illyria (Linux kernel 5.1.4).

Genome Assembly and Annotation

FastQC: raw reads quality check
kraken2: identification and removal of contaminant reads
Canu: Read pre-processing (read quality check and trimming) and genome assembly
QUAST, Gepard: Evaluation of assembly quality
Prokka, eggNOG: Genome Annotation
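
Assembly evaluation tools such as QUAST report contiguity metrics, of which N50 is one of the most commonly cited. As a minimal sketch of what that number means (not QUAST's own code), the Python function below computes N50 from a list of contig lengths. The function name and the example lengths are hypothetical and chosen only for illustration.

```python
def n50(contig_lengths):
    """Return the N50 of an assembly.

    N50 is the length L such that contigs of length >= L together
    cover at least half of the total assembly length.
    """
    if not contig_lengths:
        raise ValueError("need at least one contig length")
    total = sum(contig_lengths)
    running = 0
    # Walk contigs from longest to shortest until half the assembly is covered.
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length


if __name__ == "__main__":
    # Hypothetical contig lengths in bp, not taken from the actual assembly.
    example_lengths = [100, 200, 300, 400, 500]
    print(n50(example_lengths))
```

For a bacterial genome assembled into a single contig, N50 simply equals the length of that contig, which is why the number of contigs is usually read alongside it.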

Transcriptome assembly

FastQC: raw reads quality check
kraken2: identification and removal of contaminant reads
BBDuk: read trimming and adapter removal
HISAT2: read mapping
samtools: SAM to BAM conversion, BAM sorting and merging
Trinity: De-Novo transcriptome assembly
TransDecoder: Identification of candidate coding regions
HMMER: Searching protein database using profile HMM
Diamond: Faster alternative to BLAST. Used to query protein databases

RNA-Seq Differential Expression Analysis

Salmon: Transcript quantification
edgeR: Differential expression analysis
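
To make the two steps above more concrete, the sketch below shows the arithmetic behind two quantities that appear in this kind of analysis: transcripts per million (TPM), a normalized abundance unit that Salmon reports, and a log2 fold change between two conditions. It is a simplified illustration only. edgeR applies its own normalization and fits a negative binomial model, none of which is reproduced here. All function names, the pseudocount, and the example numbers are assumptions made for this example.

```python
import math


def tpm(counts, effective_lengths):
    """Convert per-transcript read counts to transcripts per million.

    Each count is divided by the transcript's effective length, and the
    resulting rates are rescaled so that they sum to one million.
    """
    rates = [c / l for c, l in zip(counts, effective_lengths)]
    total = sum(rates)
    return [r / total * 1e6 for r in rates]


def log2_fold_change(control, treatment, pseudocount=1.0):
    """Log2 ratio of treatment over control.

    The pseudocount (an assumption of this sketch) avoids division by zero
    for transcripts with no expression in one condition.
    """
    return math.log2((treatment + pseudocount) / (control + pseudocount))


if __name__ == "__main__":
    # Hypothetical counts and effective lengths for two transcripts.
    print(tpm([10, 20], [1000, 2000]))
    # Hypothetical abundances in continuous vs. bioleaching culture.
    print(log2_fold_change(3.0, 7.0))
```

A positive log2 fold change here means higher abundance in the second condition. Statistical significance still has to come from a proper model such as the one edgeR fits.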
