Reads Loss during DADA2 run #2027

mentorwan · 2024-09-25T20:10:50Z

We ran a full-length PacBio DADA2 analysis and have a question about something we noticed during the process.
There is some minor read loss during the DADA2 process. For example, in one sample, stats.tsv shows 24,049 non-chimera reads, but the DADA2-generated biom file or qzv file or taxonomy table shows only 24,025 reads, a loss of 24 reads.

I had assumed the read count in the final table would match the number of non-chimera reads after QC. The loss is minimal, but I checked other samples: some show no loss, while others lose only a few reads.

Maybe it’s not a significant issue. Could you clarify our understanding or provide any related information we might be missing? Thanks.

benjjneb · 2024-09-26T00:52:15Z

> For example, in one sample, stats.tsv shows 24,049 non-chimera reads, but the DADA2-generated biom file or qzv file or taxonomy table shows only 24,025 reads

Can you clarify what workflow you are using and how these different numbers are being generated?

mentorwan · 2024-09-27T12:51:19Z

The workflow we use is the HiFi Full-length 16S workflow: https://github.com/PacificBiosciences/HiFi-16S-workflow

Both numbers come from the output of this pipeline.
Here is the row in stats.tsv for this sample:

sample-id	input	filtered	denoised	non-chimeric	percentage of input non-chimeric
SC830317	39431	24600	24132	24049	60.99

But in the DADA2_table.qzv file, only 24,025 reads are assigned to this sample, a difference of 24 reads.
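
For anyone wanting to run the same comparison across samples, here is a minimal Python sketch of the check. It only reproduces the arithmetic above: the stats.tsv row is the one quoted in this comment, and the table total is the value read off DADA2_table.qzv. The names `STATS_TSV`, `TABLE_TOTALS`, and `compare` are hypothetical and not part of the pipeline or of DADA2, and the sketch does not attempt to explain where the missing reads go.

```python
import csv
import io

# The stats.tsv rows quoted in this thread, embedded inline so the
# sketch is self-contained (in practice this would be read from the file).
STATS_TSV = (
    "sample-id\tinput\tfiltered\tdenoised\tnon-chimeric\t"
    "percentage of input non-chimeric\n"
    "SC830317\t39431\t24600\t24132\t24049\t60.99\n"
)

# Per-sample totals read manually from DADA2_table.qzv (hypothetical
# container; only the value reported in this thread is included).
TABLE_TOTALS = {"SC830317": 24025}


def compare(stats_text, table_totals):
    """Return {sample: (non_chimeric, table_total, difference)}."""
    result = {}
    reader = csv.DictReader(io.StringIO(stats_text), delimiter="\t")
    for row in reader:
        sample = row["sample-id"]
        if sample not in table_totals:
            continue  # no table total available for this sample
        non_chimeric = int(row["non-chimeric"])
        total = table_totals[sample]
        result[sample] = (non_chimeric, total, non_chimeric - total)
    return result


report = compare(STATS_TSV, TABLE_TOTALS)
for sample, (non_chimeric, total, diff) in report.items():
    print(f"{sample}: stats.tsv={non_chimeric} table={total} missing={diff}")
```

Adding more rows to `STATS_TSV` and more entries to `TABLE_TOTALS` would list which samples show a difference and by how much.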
