Description:
Repository for scripts and resources used for genome-centric metagenomics of anaerobic digester sludge (and Zymo Mock) with different sequencing approaches (Nanopore R9/R10, PacBio HiFi, Illumina short reads).
Quick links:
- Published paper featuring the presented results can be accessed here.
- Sequencing read datasets are available at ENA for the Zymo and anaerobic digester samples.
- Anaerobic digester MAGs and Zymo assembly sequences can be downloaded from Figshare.
- Summary of using Nanopore R10.4 for MAG recovery.
- Summary of using Nanopore R10.4.1 for MAG recovery.
- For an updated workflow for microbial genome recovery, check out mmlong2-lite.
- A high-complexity metagenomic sample (anaerobic digester sludge) was sequenced with Nanopore R10.4, as well as with Illumina MiSeq, Nanopore R9.4.1 and PacBio HiFi, to compare the different sequencing platforms. An overview of the bioinformatic processing steps is presented below:
- Using the PacBio HiFi assembly polished with Illumina reads as a reference, the Nanopore R10.4 assembly was found to have improved homopolymer calling compared to Nanopore R9.4.1, especially for guanines and cytosines:
- The improvement in homopolymer calling for Nanopore R10.4 data is significant for genome-centric metagenomics, as most microbial genomes contain few homopolymers longer than 10 bases. To illustrate this, homopolymer frequencies were counted in genomes from the RefSeq database:
- The IDEEL test showed that, in contrast to Nanopore R9.4.1 data, Illumina read polishing did not substantially improve the IDEEL score of MAGs from Nanopore R10.4 data with coverage above 40x, suggesting that artificial protein truncations are rare in the R10.4 consensus sequences:
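The RefSeq homopolymer counts described above can be illustrated by scanning a genome for runs of identical bases. The Python sketch below is a minimal illustration under stated assumptions and is not the script used in this repository. The helper names `count_homopolymers` and `fraction_up_to`, and the minimum run length of 2, are hypothetical.

```python
from collections import Counter
from itertools import groupby


def count_homopolymers(sequence: str, min_length: int = 2) -> Counter:
    """Count homopolymer runs in a sequence, keyed by (base, run_length).

    Hypothetical helper for illustration: runs shorter than `min_length`
    and runs of non-ACGT symbols (e.g. N) are ignored.
    """
    counts: Counter = Counter()
    for base, run in groupby(sequence.upper()):
        length = sum(1 for _ in run)
        if base in "ACGT" and length >= min_length:
            counts[(base, length)] += 1
    return counts


def fraction_up_to(counts: Counter, max_length: int) -> float:
    """Fraction of counted homopolymer runs that are at most `max_length` bases long."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    short_runs = sum(n for (_, length), n in counts.items() if length <= max_length)
    return short_runs / total


if __name__ == "__main__":
    # Toy sequence; a real analysis would iterate over every RefSeq genome.
    toy = "AAAA" + "C" + "GGG" + "T" * 12 + "C"
    runs = count_homopolymers(toy)
    print(sorted(runs.items()))
    print(fraction_up_to(runs, max_length=10))
```

To run this on a FASTA genome, join the sequence lines of each record before calling `count_homopolymers`. Otherwise, runs that span line breaks would be split.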
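IDEEL compares the length of each predicted protein with the length of its best hit in a reference protein database. Proteins that are much shorter than their hit point to frameshifts, for example from homopolymer indels. The sketch below summarises such hits as a single score. It assumes the score is the fraction of proteins reaching at least 95% of their hit length. That cut-off and the function name `ideel_score` are assumptions for illustration, not a description of the exact pipeline used here.

```python
from typing import Iterable, Tuple


def ideel_score(length_pairs: Iterable[Tuple[int, int]], threshold: float = 0.95) -> float:
    """Fraction of predicted proteins whose length is at least `threshold`
    times the length of their best reference hit.

    `length_pairs` holds one (query_length, subject_length) tuple per protein,
    for example parsed from DIAMOND tabular output requested with the
    `qlen` and `slen` fields. The 0.95 cut-off is an assumption for illustration.
    """
    pairs = list(length_pairs)
    if not pairs:
        raise ValueError("no protein hits supplied")
    near_full_length = sum(1 for qlen, slen in pairs if qlen / slen >= threshold)
    return near_full_length / len(pairs)


if __name__ == "__main__":
    # Hypothetical hits: the third protein is truncated to half its expected length.
    hits = [(300, 300), (290, 300), (150, 300), (310, 300)]
    print(ideel_score(hits))
```

Under this definition, a drop in the score after Illumina polishing would indicate that polishing repaired many truncated proteins. The absence of such a drop for R10.4 MAGs above 40x coverage is the pattern reported in the text.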
Conclusion: Nanopore R10.4 chemistry is a significant improvement over Nanopore R9.4.1 in terms of homopolymer calling, which enables the recovery of microbial genomes with far fewer systematic errors in the consensus sequences.
- The R10.4 Nanopore chemistry has been superseded by the R10.4.1 chemistry, which can run at different sequencing speeds, allowing users to balance sequencing yield and read accuracy. To test the different run modes, we sequenced anaerobic digester sludge DNA on the P2 sequencer and generated approximately 47 Gbp (400 bps mode) and 29 Gbp (260 bps mode) of simplex read data on the same PromethION R10.4.1 flow cell from the same library (no reloading).
- Mapping the simplex reads to the Illumina-polished PacBio HiFi metagenome assembly yielded a modal read accuracy of 99% (Q20) for the 260 bps mode. As expected, read accuracy for the 400 bps mode was slightly lower:
- Despite the difference in read accuracy, homopolymer calling rates in simplex reads were estimated to be largely the same for the 400 and 260 bps modes:
- Similarly, most homopolymers up to 10 bases long in consensus sequences with coverage above 20x were estimated to be correctly resolved in R10.4.1 data, regardless of sequencing speed and without the need for short-read polishing:
- The IDEEL test for clustered MAGs also showed no major differences in the estimated rates of protein truncation between the Nanopore R10.4.1 sequencing speed modes when MAG coverage was comparable:
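The Q20 figure for the 260 bps mode follows from the Phred scale, Q = -10 log10(1 - accuracy), so a 99% accuracy corresponds to Q20. The sketch below shows this conversion. It also shows one common way of computing per-read accuracy from alignment counts and of taking the mode of the resulting distribution. The function names, the identity definition (matches over all alignment columns) and the rounding to three decimal places are assumptions, not necessarily the approach used in this repository.

```python
import math
from collections import Counter
from typing import Iterable


def phred_quality(accuracy: float) -> float:
    """Convert a fractional accuracy (e.g. 0.99) into a Phred-scaled quality score."""
    if not 0.0 <= accuracy < 1.0:
        raise ValueError("accuracy must be in the range [0, 1)")
    return -10.0 * math.log10(1.0 - accuracy)


def read_accuracy(matches: int, mismatches: int, insertions: int, deletions: int) -> float:
    """Alignment identity: matches divided by all alignment columns (one common definition)."""
    return matches / (matches + mismatches + insertions + deletions)


def modal_accuracy(accuracies: Iterable[float], decimals: int = 3) -> float:
    """Most frequent per-read accuracy after rounding to `decimals` places."""
    binned = Counter(round(acc, decimals) for acc in accuracies)
    if not binned:
        raise ValueError("no read accuracies supplied")
    value, _count = binned.most_common(1)[0]
    return value


if __name__ == "__main__":
    # Hypothetical per-read alignment counts (matches, mismatches, insertions, deletions).
    reads = [(990, 5, 3, 2), (9904, 50, 26, 20), (985, 10, 3, 2), (970, 20, 5, 5)]
    accuracies = [read_accuracy(*counts) for counts in reads]
    mode = modal_accuracy(accuracies)
    print(mode, phred_quality(mode))
```

The modal accuracy is less sensitive to a tail of very poor reads than the mean accuracy. This makes it a convenient single number for comparing the 260 and 400 bps modes.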
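The homopolymer calling rates for simplex reads and consensus sequences can be read as the fraction of reference homopolymers of each length that are reproduced with exactly the same length. The sketch below assumes that homopolymers have already been matched between the reference (the Illumina-polished PacBio HiFi assembly) and the reads or consensus. The `HomopolymerCall` record and the `correct_call_rates` function are hypothetical. The optional coverage filter mirrors the "above 20x" criterion used for consensus sequences.

```python
from collections import defaultdict
from typing import Dict, Iterable, NamedTuple, Optional


class HomopolymerCall(NamedTuple):
    """One reference homopolymer and the length observed in a read or consensus (hypothetical record)."""
    ref_length: int      # length in the reference assembly
    called_length: int   # length in the aligned read or consensus sequence
    coverage: float      # coverage of the contig carrying the call (ignored for reads)


def correct_call_rates(
    calls: Iterable[HomopolymerCall], min_coverage: Optional[float] = None
) -> Dict[int, float]:
    """Fraction of homopolymers called with the exact reference length, per reference length.

    If `min_coverage` is given, only calls from contexts with coverage above it are kept.
    """
    totals: Dict[int, int] = defaultdict(int)
    correct: Dict[int, int] = defaultdict(int)
    for call in calls:
        if min_coverage is not None and call.coverage <= min_coverage:
            continue  # skip low-coverage contexts
        totals[call.ref_length] += 1
        if call.called_length == call.ref_length:
            correct[call.ref_length] += 1
    return {length: correct[length] / totals[length] for length in sorted(totals)}


if __name__ == "__main__":
    calls = [
        HomopolymerCall(5, 5, 30.0),
        HomopolymerCall(5, 4, 30.0),  # one-base deletion
        HomopolymerCall(5, 5, 10.0),  # low-coverage context
        HomopolymerCall(8, 8, 25.0),
        HomopolymerCall(8, 8, 25.0),
    ]
    print(correct_call_rates(calls))                   # all contexts
    print(correct_call_rates(calls, min_coverage=20))  # coverage above 20x only
```

Comparing these per-length rates between the 400 and 260 bps runs, or between polished and unpolished consensus sequences, is one way to frame the comparisons described above.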
Conclusion: Nanopore R10.4.1 chemistry preserves the improvements introduced by the R10.4 chemistry. Furthermore, we did not observe significant benefits of using the 260 bps mode over the 400 bps mode for MAG recovery.