Low number of good pairs after filtering #52

myxotheles · 2023-09-11T13:37:25Z

Hi,

I am trying to get duplex sequencing up and running, but I get a very low number of 'good pairs' after filtering and, consequently, a very low number of called duplex reads. For example:

Total Reads 28801102
Read pairs (n) 10901142
Paired (%) 75
Good pairs 1151139
Good pairs (%) 4
BAM Duplex reads 1002726
Percentage of original reads (%) 3.48
Mapped 94%

So in this example, I am left with only 4% of the original reads.
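As a sanity check on the percentages above, here is a minimal Python sketch that recomputes them from the reported counts. The variable names are mine, and it assumes that "Paired (%)" counts both reads of each pair against the total. It suggests the 4% is good pairs relative to total reads; relative to candidate pairs, the filter keeps roughly 10.6%.

```python
# Counts copied from the summary above.
total_reads = 28_801_102
read_pairs = 10_901_142
good_pairs = 1_151_139
duplex_reads = 1_002_726


def pct(part, whole):
    """Return part as a percentage of whole."""
    return 100.0 * part / whole


# Assumption: each pair consumes two reads from the total.
paired_pct = pct(2 * read_pairs, total_reads)
good_vs_total = pct(good_pairs, total_reads)      # matches the reported ~4%
good_vs_pairs = pct(good_pairs, read_pairs)       # fraction of pairs passing the filter
duplex_vs_total = pct(duplex_reads, total_reads)  # matches the reported 3.48%

print(f"Paired: {paired_pct:.1f}% of reads")
print(f"Good pairs: {good_vs_total:.2f}% of reads, {good_vs_pairs:.1f}% of pairs")
print(f"Duplex reads: {duplex_vs_total:.2f}% of reads")
```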

I am following the recommended basic usage:

duplex_tools pairs_from_summary $output_dir/sequencing_summary.txt $output_dir

duplex_tools filter_pairs $output_dir/pair_ids.txt $output_dir

nanopore_guppy guppy_basecaller_duplex \
        --input_path $input_dir \
        -r --save_path $duplex_dir \
        --device auto \
        --config $model \
        --duplex_pairing_mode from_pair_list \
        --duplex_pairing_file $output_dir/pair_ids_filtered.txt \
        --align_ref $ref \
        --bam_out

Questions:

Why do I get so few good pairs, and subsequently so few good reads? A 4% yield is not very useful.
Should I skip the filtering step and run the second guppy run with pair_ids.txt instead?
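To see how much the filtering step itself removes, the two pair lists can be compared directly. This is a rough sketch, not part of duplex_tools: the helper names are hypothetical, and it assumes each pair is one line of two whitespace-separated read IDs (a two-field header line, if present, would be counted in both files).

```python
def count_pairs(path):
    """Count lines holding exactly two whitespace-separated fields (assumed format)."""
    n = 0
    with open(path) as fh:
        for line in fh:
            if len(line.split()) == 2:
                n += 1
    return n


def retained_fraction(all_pairs_path, filtered_path):
    """Fraction of candidate pairs that survive filtering."""
    total = count_pairs(all_pairs_path)
    kept = count_pairs(filtered_path)
    return kept / total if total else 0.0
```

For example, `retained_fraction("pair_ids.txt", "pair_ids_filtered.txt")`, run in the output directory, gives the share of candidate pairs that filter_pairs kept.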

Lastly, the duplex basecalling workflow could benefit from simplification. Dorado usage looks good, but I am getting errors, so it's not working at the moment. It would be great if guppy could be simplified!


ollenordesjo · 2023-09-27T12:19:55Z

Hi @myxotheles,

Apologies for the late reply. We're phasing out duplex-tools in favour of dorado, which has all batteries included.

Sorry to hear you're having issues. It would be excellent to know which errors you are getting with dorado, as that is the method we currently recommend.

Just a couple of sanity checks for the run and dataset:

Was the flow cell a high-duplex flow cell?
What is the read length of the sample?
Is the sample native human or something else?
For the basecalling, were both the pass and fail reads included in the input dir?

Lastly, which tool do the summary metrics you're reporting come from?
