Document the relationship between ont-guppy-duplex-pipeline and duplex_tools #18

tbooth · 2022-08-26T13:22:21Z

The current advice from ONT regarding how to perform duplex basecalling is here:

https://community.nanoporetech.com/posts/guppy-v6-0-0-release (dated 6th December 2021 - login required to view)

It makes no mention of duplex-tools, but says to pip install ont-guppy-duplex-pipeline and then run the script from that package, guppy_duplex, on the original fast5 files.

As far as I can see, this script is a rather clunky wrapper that calls guppy in simplex mode, then performs the equivalent of duplex_tools pairs_from_summary, and then runs guppy_basecaller_duplex to get the final result. The pairing code is in ont_guppy_duplex_pipeline/channel_neighbours.py and looks related to your duplex_tools/pairs_from_summary.py, but the logic is not quite the same.
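To make the pairing step concrete for other readers, here is a minimal sketch of the general idea of finding candidate pairs from a sequencing summary: consecutive reads on the same channel where the second starts shortly after the first ends. This is an illustration only, not the logic of either package. The function name `candidate_pairs` and the `max_gap_s` cut-off are hypothetical, and the reads are assumed to be dicts carrying the `read_id`, `channel`, `start_time` and `duration` fields from the summary file.

```python
from collections import defaultdict


def candidate_pairs(reads, max_gap_s=5.0):
    """Return (first_id, second_id) tuples for consecutive same-channel reads.

    Hypothetical sketch: max_gap_s is an assumed cut-off, not a value taken
    from ont-guppy-duplex-pipeline or duplex_tools.
    """
    # Group the reads by the channel (pore) they came from.
    by_channel = defaultdict(list)
    for read in reads:
        by_channel[read["channel"]].append(read)

    pairs = []
    for channel_reads in by_channel.values():
        # Order the reads on this channel by their start time.
        channel_reads.sort(key=lambda r: r["start_time"])
        # Compare each read with the one that follows it on the same channel.
        for first, second in zip(channel_reads, channel_reads[1:]):
            gap = second["start_time"] - (first["start_time"] + first["duration"])
            if 0 <= gap <= max_gap_s:
                pairs.append((first["read_id"], second["read_id"]))
    return pairs


example_reads = [
    {"read_id": "a", "channel": 1, "start_time": 0.0, "duration": 10.0},
    {"read_id": "b", "channel": 1, "start_time": 10.5, "duration": 9.0},
    {"read_id": "c", "channel": 1, "start_time": 60.0, "duration": 5.0},
    {"read_id": "d", "channel": 2, "start_time": 10.2, "duration": 8.0},
]
example_pairs = candidate_pairs(example_reads)
```

A time-and-channel pass like this only produces candidates. Further filtering, such as the alignment-based step, would still be needed to weed out false positives.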

My main interest just now is to get a good but quick assessment of the approximate number of duplex reads in each dataset, for QC purposes, and so duplex-tools seems the more useful approach. But to save others from having to peer through source code like I've been doing, could you please add some info to the README.md explaining the relationship between these two ONT-developed packages?

Cheers!


tbooth · 2022-08-26T16:10:35Z

Sorry, my mistake - I see ont-guppy-duplex-pipeline does also incorporate an alignment-based filtering step, but it does not yield the same results as this package. I get about twice the number of candidate duplex pairs. I guess I'll need to actually basecall these to see how many are false positives.

cjw85 · 2022-08-26T16:20:51Z

Hi @tbooth,

The scripts in the current version of Guppy were taken from an earlier version of this repository, hence the similarities. Guppy needs updating; IIRC the major difference is the compute performance. @ollenordesjo can comment on the output differences.

onordesjo · 2022-09-05T09:22:40Z

Hi @tbooth, sorry, have been on vacation, just seeing this now.

The option that most affects the output is the min_qscore filter (https://github.com/nanoporetech/duplex-tools/blob/master/duplex_tools/pairs_from_summary.py#L320). Without re-running the same data through ont-guppy-duplex-pipeline and duplex_tools and checking carefully, that would be my first guess as to why the number of candidate duplex pairs is different.

Depending on your requirements, you may want to set this threshold lower than the default (I would suggest including the best ~85% of reads or something similar, wherever that threshold may be for your dataset). We had some discussions about setting this threshold more adaptively, but decided that a constant threshold would keep it more reproducible on a per-read level.
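As a rough illustration of the "best ~85% of reads" suggestion, the cut-off can be read off the ranked q-scores of a dataset. This is a minimal sketch, not part of duplex_tools. The function name `qscore_threshold` is hypothetical, and the input is assumed to be a plain list of per-read mean q-scores (for example, taken from the sequencing summary).

```python
def qscore_threshold(qscores, keep_fraction=0.85):
    """Return the q-score cut-off that retains roughly the best keep_fraction of reads.

    Hypothetical helper for illustration: reads with a q-score greater than
    or equal to the returned value make up about keep_fraction of the input.
    """
    if not qscores:
        raise ValueError("qscores must not be empty")
    if not 0 < keep_fraction <= 1:
        raise ValueError("keep_fraction must be in (0, 1]")

    # Rank the reads from best to worst q-score.
    ranked = sorted(qscores, reverse=True)
    # Number of reads to retain (always at least one).
    n_keep = max(1, round(len(ranked) * keep_fraction))
    # The q-score of the last retained read is the threshold.
    return ranked[n_keep - 1]


example_qscores = [float(q) for q in range(1, 21)]  # 20 reads, q-scores 1..20
example_cutoff = qscore_threshold(example_qscores)
example_kept = [q for q in example_qscores if q >= example_cutoff]
```

The resulting value could then be used as the min_qscore setting. Note that a cut-off derived per dataset like this is exactly the adaptive approach mentioned above, so a given read may pass in one dataset and fail in another.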
